
Build With AILANG Parse

Parse 13 formats, generate 9 — one API, zero dependencies. 1,000 requests/month free.

Get Your API Key

Sign in to generate an API key and try endpoints live.

Using a CLI or AI agent? See the Device Auth section below.

Quick Start

Parse your own file in one command:

# Upload and parse your file (all tiers — Free, Pro, Business)
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -F "filepath=@your-document.docx" \
  -F "outputFormat=markdown" \
  -F "apiKey=dp_YOUR_KEY"
File upload works on all tiers (Free: up to 10 MB, Pro: 25 MB, Business: 50 MB). Use -F "filepath=@file" — the field name must be filepath. Supported formats: DOCX, PPTX, XLSX, ODT, ODP, ODS, CSV, HTML, Markdown, EPUB, EML, MBOX, PDF, PNG, JPG.
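You can also build the same upload without curl. The sketch below uses only Python's standard library to build the multipart request. It does not send it. `build_parse_upload` is a hypothetical helper, and the sketch assumes the endpoint accepts an ordinary multipart body carrying the three fields from the curl command.

```python
import urllib.request
import uuid

PARSE_URL = "https://docparse.ailang.sunholo.com/api/v1/parse"


def build_parse_upload(filename: str, content: bytes, output_format: str,
                       api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /api/v1/parse."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields, mirroring -F "outputFormat=..." and -F "apiKey=..."
    for name, value in (("outputFormat", output_format), ("apiKey", api_key)):
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The file part: the field name must be "filepath"
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="filepath"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        PARSE_URL,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, read the file bytes with `open(path, "rb").read()`, build the request, and pass it to `urllib.request.urlopen`.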

Or test with a built-in sample (no file needed):

# Parse a built-in sample — 26 samples available via GET /api/v1/samples
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -H "Content-Type: application/json" \
  -d '{"sample_id":"sample_docx_basic","outputFormat":"markdown","apiKey":"dp_YOUR_KEY"}'

Need an API key? Get one in seconds via device auth; no account creation is needed. See the full agent guide below.

Tip: Add ?api=http://localhost:8080 to any docs page URL to rewrite all examples to point at your local or self-hosted instance.


Supported Formats

The /api/v1/parse endpoint accepts all of these formats. Office and text formats are parsed deterministically; PDFs and images use pluggable AI.


Video and audio parsing is available via CLI/WASM. See Self-Host.


Pricing Tiers

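| Limit | Browser | Free | Pro (€29/mo) | Business (€99/mo) | Custom |
|---|---|---|---|---|---|
| Daily rate limit | Unlimited | 50 | 5,000 | 20,000 | Tailored |
| Requests / month | Unlimited | 1,000 | 100,000 | 500,000 | Any volume |
| AI parses / month | Unlimited* | 50 | 500 | 2,000 | Any volume |
| Max file size | 20 MB | 10 MB | 25 MB | 50 MB | Tailored |
| AI format file size limits: PDF max | 20 MB | 10 MB | 25 MB | 50 MB | Tailored |
| AI format file size limits: Image max | 20 MB | 10 MB | 25 MB | 50 MB | Tailored |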
Need higher limits? Upgrade to Pro or Business, or talk to us about a Custom plan. We're flexible on volume, on-prem deployment, SLAs, and pricing shape.
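To pick a tier for a known workload, you can encode the limits from the pricing table above as data. This is a hypothetical planning helper, not part of any client library. It assumes "MB" means 1024 × 1024 bytes, and it ignores the daily rate limits and the Browser column.

```python
# Per-tier limits transcribed from the pricing table.
# Assumption: "MB" means 1024 * 1024 bytes; the service may use decimal megabytes.
MB = 1024 * 1024

# (name, max file size in bytes, requests/month, AI parses/month), cheapest first
TIERS = [
    ("free", 10 * MB, 1_000, 50),
    ("pro", 25 * MB, 100_000, 500),
    ("business", 50 * MB, 500_000, 2_000),
]


def cheapest_tier_for(file_size: int, requests_per_month: int = 0,
                      ai_parses_per_month: int = 0) -> str:
    """Return the cheapest self-serve tier that covers the workload, else "custom"."""
    for name, max_bytes, max_requests, max_ai in TIERS:
        if (file_size <= max_bytes
                and requests_per_month <= max_requests
                and ai_parses_per_month <= max_ai):
            return name
    return "custom"
```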

Maximize your quota: Office formats (DOCX, PPTX, XLSX, ODF) are parsed deterministically — instant, exact, and zero AI cost. Converting a 5 MB DOCX to PDF typically inflates it to 10–30 MB due to font embedding, then requires an AI parse. Send original Office files whenever possible — save your AI quota for scanned documents and images.
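One quick way to budget AI quota before a batch is to classify files by extension. The grouping follows the Format Categories table later on this page. This is a sketch only: `parse_strategy` and `ai_parses_needed` are hypothetical names, and it assumes `.htm` and `.jpeg` are handled like `.html` and `.jpg`. The /api/v1/estimate endpoint remains the authoritative answer.

```python
from pathlib import Path

# Extensions grouped per the format categories on this page.
# Assumption: .htm and .jpeg behave like .html and .jpg.
DETERMINISTIC = {".docx", ".pptx", ".xlsx", ".odt", ".odp", ".ods",
                 ".csv", ".html", ".htm", ".md", ".epub", ".eml", ".mbox"}
AI_PARSED = {".pdf", ".png", ".jpg", ".jpeg"}


def parse_strategy(path: str) -> str:
    """Return "deterministic" or "ai" for a file path, by extension."""
    ext = Path(path).suffix.lower()
    if ext in DETERMINISTIC:
        return "deterministic"
    if ext in AI_PARSED:
        return "ai"
    raise ValueError(f"unsupported extension: {ext or '(none)'}")


def ai_parses_needed(paths) -> int:
    """Count how many files in a batch would consume an AI parse."""
    return sum(1 for p in paths if parse_strategy(p) == "ai")
```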

Capability budgets (AI/FS limits) are enforced at the type level by AILANG — a hard guarantee, not middleware rate limiting.

Self-Host Option

Want to run AILANG Parse on your own infrastructure? We offer self-hosted deployments for teams that need full control over data residency and processing. Get in touch to discuss self-hosting options.


Data Handling

Your documents stay yours. Here's exactly what happens when you use the API:

| Data type | What happens | Retention |
|---|---|---|
| Small documents (< 375 KB raw) | Processed entirely in memory. File content is never written to disk or cloud storage. | None; discarded immediately |
| Large PDFs (≥ 375 KB raw) | Temporarily uploaded to regional cloud storage for efficient page-by-page AI extraction. Deleted immediately after parsing completes. | Deleted on completion; 24-hour auto-delete safety net |
| Parsed output | Returned to you in the API response. Not stored server-side. | None |
| Request metadata (timestamp, format, size, block count) | Logged for billing and your request history dashboard. No document content is stored. | 30 days (planned TTL) |
| API keys | Only a SHA-256 hash is stored. The raw key is shown once at creation and never stored. | Until you revoke |
Infrastructure: All processing runs in europe-west1 (Belgium). Temporary file storage uses a co-located regional bucket with 24-hour lifecycle deletion. No document content is used for AI model training or analytics.
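The hash-only key storage in the table works like this in principle. The service keeps only a SHA-256 digest and compares digests on each request. That is why a lost raw key cannot be recovered, only revoked and replaced. This Python sketch is an illustration, not AILANG Parse's actual implementation; the key length is a hypothetical choice.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_key() -> Tuple[str, str]:
    """Create a raw key (shown to the user once) and the hash to persist."""
    raw = "dp_" + secrets.token_urlsafe(32)  # hypothetical key length
    return raw, hashlib.sha256(raw.encode()).hexdigest()


def verify_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```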


API Explorer

Endpoint cards, schemas, and badges are generated from the capability manifest. Adding an endpoint to the API automatically adds it here.

Sign in to try POST endpoints live with your API key.

Migrating from Unstructured

Already using the Unstructured API? For Office formats, AILANG Parse is a drop-in replacement. Change your server URL and you're done:

# Before (Unstructured)
from unstructured_client import UnstructuredClient
client = UnstructuredClient(server_url="https://api.unstructured.io")

# After (AILANG Parse) — one line change
client = UnstructuredClient(
    server_url="https://docparse.ailang.sunholo.com"
)

The /general/v0/general endpoint returns identical element JSON. All existing code works unchanged.
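The endpoint is documented as returning the same element JSON as Unstructured. That means code written against Unstructured's element schema should carry over. In that schema, each element is an object with `type`, `element_id`, `text` and `metadata`. This is a sketch: `summarize_elements` is a hypothetical helper, and the test elements are illustrative, not real AILANG Parse output.

```python
from collections import Counter
from typing import Dict, List


def summarize_elements(elements: List[dict]) -> Dict[str, object]:
    """Count element types and collect Title texts from Unstructured-style JSON."""
    counts = Counter(el.get("type", "Unknown") for el in elements)
    titles = [el.get("text", "") for el in elements if el.get("type") == "Title"]
    return {"counts": dict(counts), "titles": titles}
```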


Connect Your AI

Three ways for AI systems to use AILANG Parse:

CLI

Direct invocation from scripts and pipelines. Best for batch processing.

ailang run --entry main \
  --caps IO,FS,Env \
  docparse/main.ail myfile.docx

REST API

Standard HTTP with named JSON parameters. API key in request body.

curl -X POST .../api/v1/parse \
  -H "Content-Type: application/json" \
  -d '{"filepath":"file.docx","outputFormat":"blocks","apiKey":"dp_..."}'

MCP Server

AILANG Parse as a tool for AI assistants (Claude, etc.) via Model Context Protocol.

{
  "name": "docparse",
  "description": "Parse documents",
  "server": "ailang serve-api ..."
}

Claude Code Skill

Install the AILANG Parse skill in Claude Code to parse documents directly from your terminal.

claude install github:sunholo-data/docparse-skill

Once installed, just ask Claude to parse a document. The skill provides scripts for parsing, cost estimation, sample listing, and headless device authentication.

# Set your API key
export DOCPARSE_API_KEY="dp_your_key_here"

# Or get one via device auth (no browser needed)
bash skills/docparse/scripts/device-auth.sh

# Parse a document
bash skills/docparse/scripts/parse.sh document.docx blocks

# Check cost before parsing
bash skills/docparse/scripts/estimate.sh report.pdf blocks

# List available test files
bash skills/docparse/scripts/samples.sh

Agent Integration Guide

Step-by-step workflow for AI agents integrating with AILANG Parse.

Step 1: Discover the API

# Health check — confirms service is up and version
curl https://docparse.ailang.sunholo.com/api/v1/health

# Full capability manifest — every endpoint, schema, auth, costs
curl https://docparse.ailang.sunholo.com/api/v1/capabilities

# Supported formats
curl https://docparse.ailang.sunholo.com/api/v1/formats

# Machine-readable tool definitions (Claude/OpenAI/MCP)
curl https://docparse.ailang.sunholo.com/api/v1/tools

Step 2: Get an API Key (Device Auth)

Agents use the RFC 8628 Device Authorization flow — no browser session needed after initial approval.

# 1. Request a device code
curl -X POST https://docparse.ailang.sunholo.com/api/v1/auth/device \
  -H "Content-Type: application/json" \
  -d '{"label":"my-agent","scope":"parse"}'
# Returns: device_code, user_code (e.g. ABCD-1234), verification_url

# 2. Display the verification_url to the user. They open it and sign in (authorization is automatic).

# 3. Poll every 5 seconds until approved:
curl -X POST https://docparse.ailang.sunholo.com/api/v1/auth/device/poll \
  -H "Content-Type: application/json" \
  -d '{"deviceCode":"DEVICE_CODE_FROM_STEP_1"}'
# Returns: {"status":"approved", "api_key":"dp_...", "tier":"free"}

Save the api_key — it won't be shown again. The key persists across sessions.
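Here is a Python sketch of the polling step, using only the standard library. It makes two assumptions. First, each poll reply carries either `status: "approved"` with `api_key`, or an error code under a top-level `code` key; the exact error shape is not documented here. Second, pending and expired replies may arrive as HTTP error responses with a JSON body. `poll_for_api_key` and `json_post` are hypothetical helpers. `post` and `sleep` are injectable, so you can test the loop without network access.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

BASE = "https://docparse.ailang.sunholo.com/api/v1"


class DeviceCodeExpired(Exception):
    """Raised when the server reports DEVICE_CODE_EXPIRED."""


def json_post(url: str, body: dict) -> dict:
    """POST a JSON body and decode the JSON reply, including 4xx error bodies."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: pending/expired states may arrive as HTTP errors with a JSON body.
        return json.load(err)


def poll_for_api_key(device_code: str,
                     post: Callable[[str, dict], dict] = json_post,
                     interval: float = 5.0,
                     max_attempts: int = 120,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll /auth/device/poll until the user approves; return the api_key."""
    for _ in range(max_attempts):
        resp = post(f"{BASE}/auth/device/poll", {"deviceCode": device_code})
        if resp.get("status") == "approved":
            return resp["api_key"]
        # Assumption: error codes appear under a top-level "code" key.
        code = resp.get("code")
        if code == "DEVICE_CODE_EXPIRED":
            raise DeviceCodeExpired(resp.get("suggested_fix", "request a new device code"))
        if code not in (None, "AUTHORIZATION_PENDING"):
            raise RuntimeError(f"unexpected poll response: {resp}")
        sleep(interval)  # wait the documented 5 seconds between polls
    raise TimeoutError("device code was not approved in time")
```

With the defaults, the loop gives up after 120 polls at 5-second intervals, roughly ten minutes.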


Step 3: List Test Samples

Built-in sample files let you test parsing without uploading anything.

curl https://docparse.ailang.sunholo.com/api/v1/samples
| Sample ID | Format | AI? |
|---|---|---|
| sample_docx_formatting | DOCX (rich styles) | No |
| sample_docx_tables | DOCX (merged cells) | No |
| sample_docx_comments | DOCX (comments) | No |
| sample_docx_track_changes | DOCX (insertions/deletions/moves) | No |
| sample_docx_footnotes | DOCX (footnotes) | No |
| sample_docx_hyperlinks | DOCX (hyperlinks) | No |
| sample_docx_real_world | DOCX (complex real-world) | No |
| sample_pptx_notes | PPTX (speaker notes) | No |
| sample_pptx_formatting | PPTX (rich text) | No |
| sample_xlsx_merged | XLSX (merged cells) | No |
| sample_xlsx_formulas | XLSX (formulas) | No |
| sample_xlsx_formats | XLSX (number formats) | No |
| sample_csv | CSV (format matrix) | No |
| sample_markdown | Markdown (architecture guide) | No |
| sample_html | HTML (getting started) | No |
| sample_odt | ODT (styled tables) | No |
| sample_odp | ODP (text frames) | No |
| sample_ods | ODS (merged cells) | No |
| sample_epub | EPUB (Alice in Wonderland) | No |
| sample_eml_welcome | EML (welcome email + CSV attachment) | No |
| sample_eml_release | EML (release notes, text + HTML) | No |
| sample_eml_bug | EML (encoded headers, CC, attachments) | No |
| sample_mbox_thread | MBOX (4-message reply thread) | No |
| sample_pdf | PDF (table report) | Yes |
| sample_mp3 | Audio (transcription) | Yes |
| sample_mp4 | Video (visual + audio) | Yes |

Use GET /api/v1/samples to download test files programmatically.

Step 4: Estimate Cost

curl -X POST https://docparse.ailang.sunholo.com/api/v1/estimate \
  -H "Content-Type: application/json" \
  -d '{"filepath":"data/test_files/sample.docx","outputFormat":"markdown"}'
# Returns: {"format":"zip-office", "strategy":"deterministic", "ai_required":false, "estimated_ms":15}
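An agent can use estimates to spend its AI quota deliberately. This is a minimal sketch that assumes the caller tracks its own remaining AI parses. It reads only the `ai_required` field of the documented estimate response, and `plan_batch` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def plan_batch(estimates: Dict[str, dict],
               ai_parses_remaining: int) -> Tuple[List[str], List[str]]:
    """Split documents into (parse now, defer) using /api/v1/estimate responses."""
    parse_now: List[str] = []
    deferred: List[str] = []
    for name, est in estimates.items():
        # Treat a missing ai_required flag as "needs AI", to stay conservative.
        if not est.get("ai_required", True):
            parse_now.append(name)  # deterministic: no AI quota used
        elif ai_parses_remaining > 0:
            parse_now.append(name)  # spends one monthly AI parse
            ai_parses_remaining -= 1
        else:
            deferred.append(name)   # wait for the monthly reset
    return parse_now, deferred
```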

Step 5: Parse a Document

Option A: Upload your own file (all tiers)

# Upload and parse any file via multipart/form-data
# The field name MUST be "filepath"
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -F "filepath=@report.docx" \
  -F "outputFormat=markdown" \
  -F "apiKey=dp_YOUR_KEY"

# Works with any supported format
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -F "filepath=@data.xlsx" \
  -F "outputFormat=blocks" \
  -F "apiKey=dp_YOUR_KEY"

Option B: Parse a built-in sample (no upload needed)

# Deterministic (Office/Web/Text) — same output every time, <15ms
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -H "Content-Type: application/json" \
  -d '{"sample_id":"sample_docx_basic","outputFormat":"markdown","apiKey":"dp_YOUR_KEY"}'

# AI-powered (PDF/images) — content depends on model
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -H "Content-Type: application/json" \
  -d '{"sample_id":"sample_pdf","outputFormat":"markdown","apiKey":"dp_YOUR_KEY"}'

Output formats: blocks (structured JSON), markdown, html, a2ui (rich UI components).

All formats return the same block types: Text, Heading, Table, Image, Audio, Video, List, Section, Change.

File size limits: Free 10 MB, Pro 25 MB, Business 50 MB. Files over 32 MB, which only the Business tier accepts, are uploaded via GCS.

Step 6: Handle Errors

All errors include a suggested_fix field — a plain-text instruction you can act on directly.

| Code | Retryable | When |
|---|---|---|
| INVALID_API_KEY | No | Key missing, malformed, or revoked |
| QUOTA_EXCEEDED | Yes (after reset) | Daily or monthly request limit hit |
| AI_QUOTA_EXCEEDED | Yes (after reset) | Monthly AI parse limit hit |
| FILE_TOO_LARGE | No | File exceeds format-specific size limit |
| INPUT_NOT_FOUND | No | File path doesn't exist |
| UNSUPPORTED_FORMAT | No | File extension not recognized |
| PARSE_FAILED | Maybe | Corrupt or empty file |
| AI_UNAVAILABLE | Yes | AI backend down |
| AUTHORIZATION_PENDING | Yes (poll) | Device flow: user hasn't approved yet |
| DEVICE_CODE_EXPIRED | No | Device code timed out |
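A client can turn the error table above into a retry policy. The mapping below transcribes the Retryable column. Two choices belong to this sketch, not to documented server behavior: PARSE_FAILED is retried exactly once, and backoff is capped at 30 seconds. Whatever the action, the most useful next step is usually to show `suggested_fix` to the user or agent.

```python
# Action per error code, transcribed from the Retryable column.
ACTIONS = {
    "INVALID_API_KEY": "fail",
    "QUOTA_EXCEEDED": "wait_for_reset",
    "AI_QUOTA_EXCEEDED": "wait_for_reset",
    "FILE_TOO_LARGE": "fail",
    "INPUT_NOT_FOUND": "fail",
    "UNSUPPORTED_FORMAT": "fail",
    "PARSE_FAILED": "retry_once",  # "Maybe": one more attempt, then give up
    "AI_UNAVAILABLE": "backoff",
    "AUTHORIZATION_PENDING": "poll",
    "DEVICE_CODE_EXPIRED": "fail",
}


def next_step(code: str, attempt: int, max_attempts: int = 5) -> str:
    """Map an error code and zero-based attempt number to the client's next move."""
    action = ACTIONS.get(code, "fail")  # treat unknown codes as fatal
    if action == "retry_once":
        return "retry" if attempt == 0 else "fail"
    if action == "backoff":
        return "retry" if attempt < max_attempts else "fail"
    return action


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Capped exponential backoff: 1, 2, 4, 8, ... seconds, never above cap."""
    return min(cap, base * (2 ** attempt))
```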

Step 7: Recommended Workflow

1. GET  /api/v1/capabilities     → learn the full API contract
2. POST /api/v1/auth/device      → acquire credentials (one-time)
3. GET  /api/v1/samples          → find test files to verify integration
4. POST /api/v1/estimate         → check cost before committing
5. POST /api/v1/parse            → parse the document
6. Verify response has request_id in meta (confirms v0.9.0 API)
7. Monitor meta.quota_remaining between requests
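Steps 6 and 7 can be folded into one response check. This sketch assumes `meta.quota_remaining` is an integer count; `check_meta` and the `low_water` threshold are hypothetical.

```python
from typing import Optional


def check_meta(response: dict, low_water: int = 10) -> Optional[int]:
    """Require meta.request_id and warn when meta.quota_remaining runs low.

    Assumption: quota_remaining is an integer count of remaining requests.
    """
    meta = response.get("meta") or {}
    if "request_id" not in meta:
        raise ValueError("response has no meta.request_id; expected the v0.9.0 API")
    remaining = meta.get("quota_remaining")
    if isinstance(remaining, int) and remaining <= low_water:
        print(f"warning: only {remaining} requests left in the current period")
    return remaining
```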

Format Categories

| Format | Type | Speed |
|---|---|---|
| Office (DOCX, PPTX, XLSX, ODT, ODP, ODS) | Deterministic | <15 ms |
| Text (HTML, MD, CSV, EPUB) | Deterministic | <15 ms |
| PDF | AI | ~2 s |
| Image (PNG, JPG, etc.) | AI | ~1.5 s |

Deterministic formats have zero AI cost and are unlimited within your request quota. AI formats count against your monthly AI parse limit with per-format file size caps. Free tier: 1,000 requests/month + 50 AI parses. GET /api/v1/pricing for full tier details.

Runnable Examples

# Quickstart: auth + first parse in one command
bash examples/quickstart.sh

# Full 7-step walkthrough
bash examples/agent_workflow.sh

# With existing key (skip auth)
DOCPARSE_API_KEY=dp_... bash examples/agent_workflow.sh

Client Libraries

Thin HTTP wrappers for the AILANG Parse API. Compatible with unstructured-client arguments.

Python (PyPI)
pip install ailang-parse
Parse files, manage keys, Unstructured-compatible. Typed dataclasses.

JavaScript / TypeScript (npm)
npm install @ailang/parse
Browser + Node.js. Native fetch, zero dependencies, full TypeScript types.

Go
go get github.com/sunholo-data/ailang-parse-go
Context support, stdlib HTTP only. Typed structs for all 9 block variants.

Python:

from ailang_parse import DocParse

client = DocParse(api_key="dp_your_key_here")

result = client.parse("report.docx")
for block in result.blocks:
    if block.type == "heading":
        print(f"H{block.level}: {block.text}")
    elif block.type == "table":
        print(f"Table: {len(block.rows)} rows")
    elif block.type == "change":
        print(f"{block.change_type} by {block.author}")

# Unstructured migration — one import change:
from ailang_parse import UnstructuredClient
client = UnstructuredClient(server_url="https://docparse.ailang.sunholo.com")
elements = client.general.partition(file="report.docx")

JavaScript / TypeScript:

import { DocParse } from '@ailang/parse';

const client = new DocParse({ apiKey: 'dp_your_key_here' });

const result = await client.parse('report.docx');
for (const block of result.blocks) {
  if (block.type === 'heading') console.log(`H${block.level}: ${block.text}`);
  if (block.type === 'table') console.log(`Table: ${block.rows?.length} rows`);
  if (block.type === 'change') console.log(`${block.changeType} by ${block.author}`);
}

// Unstructured migration — one import change:
import { UnstructuredClient } from '@ailang/parse';
const uc = new UnstructuredClient({ serverUrl: 'https://docparse.ailang.sunholo.com' });
const elements = await uc.general.partition({ file: 'report.docx' });

Go:

// Assumes imports of context, fmt, log, and
// github.com/sunholo-data/ailang-parse-go (referenced here as docparse).
ctx := context.Background()
client := docparse.New("dp_your_key_here")

result, err := client.Parse(ctx, "report.docx")
if err != nil {
    log.Fatal(err)
}
for _, block := range result.Blocks {
    switch block.Type {
    case "heading":
        fmt.Printf("H%d: %s\n", block.Level, block.Text)
    case "table":
        fmt.Printf("Table: %d rows\n", len(block.Rows))
    case "change":
        fmt.Printf("%s by %s\n", block.ChangeType, block.Author)
    }
}

// Unstructured migration:
uc := docparse.NewUnstructuredClient("https://docparse.ailang.sunholo.com")
elements, err := uc.Partition(ctx, "report.docx")
if err != nil {
    log.Fatal(err)
}

Frequently Asked Questions

Is AILANG Parse a drop-in replacement for Unstructured?

Yes, for Office formats. AILANG Parse implements a compatible REST API at /general/v0/general that accepts the same multipart file upload format as Unstructured. Swap the endpoint URL and get higher-fidelity output with preserved track changes, merged cells, and comments. See the integrations page for migration guides.

What output formats does the AILANG Parse API support?

The API supports structured JSON (Block ADT), Markdown, HTML, plain text, Quarto (QMD), A2UI, and the Unstructured-compatible element format. Specify the output format with the outputFormat parameter. All formats preserve the same structural information.

Does the AILANG Parse API require an API key?

Yes, all parse requests require an API key. The Free tier provides 1,000 requests/month including 50 AI parses. Generate a key via the dashboard or the device auth flow (/api/v1/auth/device). Pro and Business tiers increase limits to 100,000 and 500,000 requests/month respectively.

Can I run AILANG Parse locally instead of using the cloud API?

Yes. Install the CLI for unlimited local parsing, or clone the repo and use the included Dockerfile for containerised batch processing. Office parsing has zero external dependencies. See the self-host guide for details.