PDF Parsing API

Parse PDFs to structured JSON using any AI provider: Gemini, Claude, OpenAI, or Ollama for fully local processing. The output uses the same typed blocks as Office formats, with no vendor lock-in.

PDFs Have No Semantic Structure

A PDF encodes where to draw glyphs on a page. It has no concept of "heading", "paragraph", or "table cell". Extracting document structure from PDF requires visual understanding — recognizing that larger bold text at the top of a page is a heading, that aligned columns of text form a table, that reading order flows left-to-right then top-to-bottom.
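To make the point concrete, here is a small sketch of what a parser has to work with. The `Run` type, the coordinates, and the `gutter_x` parameter are illustrative assumptions, not part of AILANG Parse; coordinates are simplified to "distance from the top of the page". It shows that sorting positioned text by coordinates alone interleaves the columns of a two-column page, and that recovering the intended order needs layout knowledge the PDF does not carry.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """A positioned text run (hypothetical, simplified model of PDF text)."""
    x: float   # left edge, in points
    y: float   # distance from the top of the page, in points
    text: str


# A two-column page: the left column holds A1, A2; the right holds B1, B2.
runs = [
    Run(300, 100, "B1"),
    Run(50, 100, "A1"),
    Run(50, 120, "A2"),
    Run(300, 120, "B2"),
]


def naive_reading_order(runs):
    """Sort top-to-bottom, then left-to-right, ignoring columns."""
    return [r.text for r in sorted(runs, key=lambda r: (r.y, r.x))]


def column_aware_order(runs, gutter_x):
    """Read the whole left column before the right one.

    gutter_x is supplied by hand here; inferring it is the layout
    knowledge that is missing from the file itself.
    """
    left = sorted((r for r in runs if r.x < gutter_x), key=lambda r: r.y)
    right = sorted((r for r in runs if r.x >= gutter_x), key=lambda r: r.y)
    return [r.text for r in left + right]


print(naive_reading_order(runs))       # ['A1', 'B1', 'A2', 'B2']
print(column_aware_order(runs, 200))   # ['A1', 'A2', 'B1', 'B2']
```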

AILANG Parse delegates this visual understanding to the AI provider of your choice, then normalizes the output into the same Block ADT used for DOCX, XLSX, and PPTX parsing. Your downstream code handles all formats identically.

If your source document is a DOCX that was converted to PDF, parse the DOCX directly instead. You'll get track changes, comments, merged cells, and metadata that the PDF lost. See why DOCX > PDF for parsing.

How It Works

AILANG Parse sends the PDF to the configured AI provider with a structured extraction prompt. The AI returns content in a normalized format, which AILANG Parse validates and converts into typed blocks. The result is the same Block ADT — headings, text, tables, images — regardless of which AI provider performed the extraction.
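The validate-and-convert step can be pictured roughly as follows. This is a Python sketch under stated assumptions, not AILANG Parse's actual implementation: the four block kinds come from this page, but the field names (`level`, `headers`, `rows`, `caption`, `alt_text`), the 1 to 6 heading range, and the dict-shaped provider response are all assumed for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Text:
    text: str


@dataclass
class Table:
    headers: list[str]
    rows: list[list[str]]


@dataclass
class Image:
    caption: str
    alt_text: str


# The "Block ADT": a value is exactly one of these four shapes.
Block = Union[Heading, Text, Table, Image]


def to_block(raw: dict) -> Block:
    """Validate one provider-returned item and convert it to a typed block.

    Missing required fields surface as KeyError; bad values as ValueError.
    """
    kind = raw.get("type")
    if kind == "heading":
        level = int(raw["level"])
        if not 1 <= level <= 6:  # assumed range, for illustration
            raise ValueError(f"heading level out of range: {level}")
        return Heading(level, str(raw["text"]))
    if kind == "text":
        return Text(str(raw["text"]))
    if kind == "table":
        headers = [str(h) for h in raw["headers"]]
        rows = [[str(c) for c in row] for row in raw["rows"]]
        for row in rows:
            if len(row) != len(headers):
                raise ValueError("table row width does not match headers")
        return Table(headers, rows)
    if kind == "image":
        return Image(str(raw.get("caption", "")), str(raw.get("alt_text", "")))
    raise ValueError(f"unknown block type: {kind!r}")


print(to_block({"type": "heading", "level": 2, "text": "Results"}))
```

Because every provider's response passes through the same conversion, anything that does not fit one of the four shapes is rejected at this boundary and never reaches downstream code.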

Switch providers with a single flag. No code changes, no output format differences.

# Gemini (fastest, most cost-effective)
ailang run --entry main --caps IO,FS,Env,AI --ai gemini-2.5-flash \
  docparse/main.ail document.pdf

# Claude
ailang run --entry main --caps IO,FS,Env,AI --ai claude-sonnet-4-5-20250514 \
  docparse/main.ail document.pdf

# Ollama (fully local)
ailang run --entry main --caps IO,FS,Env,AI --ai ollama/llama3.2-vision \
  docparse/main.ail document.pdf

AI Providers

Gemini

Best for high-volume processing. gemini-2.5-flash offers the fastest response times and lowest per-page cost. Use gemini-2.5-pro for complex documents with dense tables or multi-column layouts.

Claude

Strong on nuanced document understanding. claude-sonnet-4-5-20250514 excels at extracting meaning from complex layouts and academic papers. Good for documents where context matters.

OpenAI

Solid general-purpose extraction. gpt-4o handles standard business documents well. Use when your team already has an OpenAI API key and wants a single provider.

Ollama (Local)

Fully local processing: no data leaves your machine. Use any vision-capable model, such as llama3.2-vision or llava. Ideal for sensitive documents, air-gapped environments, or zero-cost development.

What Gets Extracted

Headings & Text

The AI identifies heading hierarchy and body text from visual cues (font size, weight, position). Extracted as heading and text blocks with inferred levels.

Tables

Visually aligned data is extracted as structured tables with headers and rows. Tables that span page breaks are handled within the AI's visual context window.

Images & Figures

Embedded images are identified with their captions and alt text. The AI describes figure content for downstream indexing or accessibility.

Uniform Output

Regardless of AI provider, the output is the same Block ADT: heading, text, table, image. Your pipeline code never changes when you switch providers.
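As an illustration of provider-independent pipeline code, the sketch below renders a list of blocks to Markdown. It assumes blocks arrive as plain dicts with a `type` key and the field names shown (`level`, `text`, `headers`, `rows`, `caption`); the real JSON field names may differ, so treat these as placeholders.

```python
def render_block(block: dict) -> str:
    """Render one block to Markdown. Field names are assumed, not confirmed."""
    kind = block["type"]
    if kind == "heading":
        return "#" * block["level"] + " " + block["text"]
    if kind == "text":
        return block["text"]
    if kind == "table":
        header = "| " + " | ".join(block["headers"]) + " |"
        rule = "| " + " | ".join("---" for _ in block["headers"]) + " |"
        body = ["| " + " | ".join(row) + " |" for row in block["rows"]]
        return "\n".join([header, rule, *body])
    if kind == "image":
        return "*Figure: " + block.get("caption", "") + "*"
    raise ValueError(f"unknown block type: {kind!r}")


def render_document(blocks: list) -> str:
    """Join rendered blocks with blank lines between them."""
    return "\n\n".join(render_block(b) for b in blocks)


# Made-up sample data; nothing here depends on which provider produced it.
sample = [
    {"type": "heading", "level": 1, "text": "Quarterly Report"},
    {"type": "text", "text": "Revenue grew in all regions."},
    {"type": "table", "headers": ["Region", "Revenue"], "rows": [["EMEA", "10"]]},
]
print(render_document(sample))
```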

Local Processing with Ollama

For sensitive documents that can't leave your network, use Ollama as the AI backend. Install Ollama, pull a vision-capable model, and point AILANG Parse at it. Zero cloud calls.

# Install Ollama and pull a vision model
ollama pull llama3.2-vision

# Parse PDF fully locally
ailang run --entry main --caps IO,FS,Env,AI \
  --ai ollama/llama3.2-vision \
  docparse/main.ail confidential-report.pdf

The output format is identical to cloud providers. You can develop locally with Ollama and deploy to production with Gemini or Claude — same code, same output structure.
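One way to keep the provider choice out of your code is to read the model name from the environment and build the same command in both places. The sketch below does that in Python; the `DOCPARSE_AI` variable name is a hypothetical convention, while the command-line arguments are the ones shown on this page.

```python
import os
import shlex


def build_parse_command(pdf_path: str, ai_model: str) -> list:
    """Build the ailang argv shown on this page for a given model."""
    return [
        "ailang", "run", "--entry", "main",
        "--caps", "IO,FS,Env,AI",
        "--ai", ai_model,
        "docparse/main.ail", pdf_path,
    ]


# DOCPARSE_AI is a made-up variable name: unset on a developer machine
# (falls back to Ollama), set to e.g. gemini-2.5-flash in production.
model = os.environ.get("DOCPARSE_AI", "ollama/llama3.2-vision")
cmd = build_parse_command("confidential-report.pdf", model)
print(shlex.join(cmd))

# To actually run it, pass the list to subprocess:
#   import subprocess
#   subprocess.run(cmd, check=True)
```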

Use Cases

Scanned Document Digitization

OCR-level extraction from scanned PDFs using AI vision. The model reads handwritten notes, stamp marks, and degraded print that rule-based OCR struggles with. Output is structured JSON, not raw text.

Invoice Processing

Extract line items, totals, dates, and vendor details from PDF invoices. The AI identifies table structure from visual layout, handling the wide variation in invoice formats across vendors.

Academic Paper Ingestion

Parse research papers into structured sections: abstract, methodology, results, references. The AI handles multi-column layouts, figure captions, and equation blocks that trip up rule-based PDF parsers.

Legacy Archive Migration

Convert PDF archives to structured JSON for database ingestion or knowledge base population. Process thousands of documents with batch mode, using the most cost-effective AI provider for your volume.

Compliance Document Review

Parse regulatory filings, audit reports, and policy documents from PDF into structured blocks. Feed to an LLM for automated compliance checking, clause extraction, or cross-document comparison.

Air-Gapped Environments

Process classified or restricted documents entirely on-premises using Ollama. No internet connection required after initial model download. The same Block ADT output integrates with your existing pipeline.
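For the invoice processing case above, downstream code might total a column of an extracted table. The sketch below assumes a table block shaped as a dict with `headers` and `rows` of strings; that shape and the sample invoice data are invented for illustration.

```python
from decimal import Decimal


def column_total(table: dict, column: str) -> Decimal:
    """Sum a numeric column, tolerating '$' signs and thousands separators."""
    idx = table["headers"].index(column)  # ValueError if the column is absent
    total = Decimal("0")
    for row in table["rows"]:
        cell = row[idx].replace(",", "").replace("$", "").strip()
        if cell:  # skip empty cells
            total += Decimal(cell)
    return total


# Made-up invoice table, in the assumed block shape.
invoice_table = {
    "type": "table",
    "headers": ["Description", "Qty", "Amount"],
    "rows": [
        ["Widget", "2", "$1,200.00"],
        ["Shipping", "1", "35.50"],
    ],
}
print(column_total(invoice_table, "Amount"))  # 1235.50
```

Using `Decimal` instead of `float` avoids rounding drift when the computed total is compared against the total printed on the invoice.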

Try It

CLI

# Parse with Gemini (set GOOGLE_API_KEY)
GOOGLE_API_KEY="your-key" ailang run --entry main \
  --caps IO,FS,Env,AI --ai gemini-2.5-flash \
  docparse/main.ail report.pdf

# Parse with Ollama (fully local)
ailang run --entry main --caps IO,FS,Env,AI \
  --ai ollama/llama3.2-vision \
  docparse/main.ail report.pdf

API

curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -H "Content-Type: application/json" \
  -d '{"filepath":"sample_pdf","outputFormat":"markdown","apiKey":"YOUR_API_KEY","ai":"gemini-2.5-flash"}'
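The same request can be built from Python with only the standard library. The endpoint and JSON fields below are copied from the curl example; the response format is not documented on this page, so the sketch stops at constructing the request and leaves sending it as a commented-out step. The helper name `build_parse_request` is hypothetical.

```python
import json
import urllib.request

API_URL = "https://docparse.ailang.sunholo.com/api/v1/parse"


def build_parse_request(filepath, api_key, ai="gemini-2.5-flash",
                        output_format="markdown"):
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "filepath": filepath,
        "outputFormat": output_format,
        "apiKey": api_key,
        "ai": ai,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_parse_request("sample_pdf", "YOUR_API_KEY")
print(req.get_method(), req.full_url)

# To send it (requires network access and a valid key):
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       body = resp.read().decode("utf-8")
```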

Python SDK

from ailang_parse import DocParse

client = DocParse(api_key="YOUR_API_KEY")

# Parse with AI provider
result = client.parse("report.pdf", output_format="json", ai="gemini-2.5-flash")

# Same Block ADT as Office formats
for block in result.blocks:
    if block.type == "table":
        print(f"Table: {len(block.rows)} rows")
    elif block.type == "heading":
        print(f"H{block.level}: {block.text}")

Frequently Asked Questions

How do I parse a PDF to JSON?

Send the PDF to the AILANG Parse API with an AI provider specified. The AI extracts content into the same structured JSON used for Office formats.

Which AI providers are supported for PDF parsing?

Gemini (gemini-2.5-flash, gemini-2.5-pro), Claude (claude-sonnet-4-5-20250514), OpenAI (gpt-4o), and Ollama for fully local processing. Switch with a single flag.

Can I parse PDFs locally without sending data to the cloud?

Yes. Use Ollama as the AI backend. Set --ai ollama/llama3.2-vision or any vision-capable model. No data leaves your machine.

Why does PDF parsing require AI?

PDFs encode glyph positions, not document structure. Extracting headings, tables, and reading order requires visual understanding. If the original DOCX is available, parse it directly instead.

Is the output format the same as Office parsing?

Yes. Same Block ADT (headings, text, tables, images) regardless of provider. Your downstream code handles all formats identically.

How much does PDF parsing cost?

AILANG Parse charges per API request. AI provider costs are separate (your own key). Gemini Flash is the most cost-effective for volume. Ollama is free (your hardware).