Get Your API Key
Sign in to generate an API key and try endpoints live.
Quick Start
Parse your own file in one command:
# Upload and parse your file (all tiers — Free, Pro, Business)
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
-F "filepath=@your-document.docx" \
-F "outputFormat=markdown" \
-F "apiKey=dp_YOUR_KEY"
The upload field must be named filepath, as in -F "filepath=@file".
Supported formats: DOCX, PPTX, XLSX, ODT, ODP, ODS, CSV, HTML, Markdown, EPUB, EML, MBOX, PDF, PNG, JPG.
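For scripts that cannot shell out to curl, the same multipart upload can be sketched with Python's standard library. This is a minimal sketch, not an official client: the helper names `build_multipart` and `parse_file` are hypothetical, and it assumes the endpoint returns a JSON body. The field names `filepath`, `outputFormat`, and `apiKey` are taken from the curl example above.

```python
import json
import os
import urllib.request
import uuid

PARSE_URL = "https://docparse.ailang.sunholo.com/api/v1/parse"


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one file under the required 'filepath' field name."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="filepath"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def parse_file(path, api_key, output_format="markdown"):
    """Upload a local file and return the decoded response (assumed to be JSON)."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            {"outputFormat": output_format, "apiKey": api_key},
            os.path.basename(path),
            fh.read(),
        )
    req = urllib.request.Request(
        PARSE_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `parse_file("your-document.docx", "dp_YOUR_KEY")` would send the same three form fields as the curl command.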
Or test with a built-in sample (no file needed):
# Parse a built-in sample — 26 samples available via GET /api/v1/samples
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
-H "Content-Type: application/json" \
-d '{"sample_id":"sample_docx_basic","outputFormat":"markdown","apiKey":"dp_YOUR_KEY"}'
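The JSON variant of the request can be built the same way in Python. The sketch below only constructs the request object, so it can be inspected before anything is sent; `sample_parse_request` is a hypothetical helper name, while the URL and payload keys mirror the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://docparse.ailang.sunholo.com"


def sample_parse_request(sample_id, api_key, output_format="markdown"):
    """Build (but do not send) the JSON POST shown in the curl example."""
    payload = {"sample_id": sample_id, "outputFormat": output_format, "apiKey": api_key}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` performs the call.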
Need an API key? Get one in seconds via device auth, with no account creation needed. See the Agent Integration Guide below.
Tip: Add ?api=http://localhost:8080 to any docs page URL to rewrite all examples to point at your local or self-hosted instance.
Supported Formats
The /api/v1/parse endpoint accepts all of these formats. Office and text formats are parsed deterministically; PDFs and images use pluggable AI.
Video and audio parsing is available via CLI/WASM. See Self-Host.
Pricing Tiers
| | Browser | Free | Pro (€29/mo) | Business (€99/mo) | Custom |
|---|---|---|---|---|---|
| Daily rate limit | Unlimited | 50 | 5,000 | 20,000 | Tailored |
| Requests / month | Unlimited | 1,000 | 100,000 | 500,000 | Any volume |
| AI parses / month | Unlimited* | 50 | 500 | 2,000 | Any volume |
| Max file size | 20 MB | 10 MB | 25 MB | 50 MB | Tailored |
| AI format file size limits | | | | | |
| PDF max | 20 MB | 10 MB | 25 MB | 50 MB | Tailored |
| Image max | 20 MB | 10 MB | 25 MB | 50 MB | Tailored |
Maximize your quota: Office formats (DOCX, PPTX, XLSX, ODF) are parsed deterministically — instant, exact, and zero AI cost. Converting a 5 MB DOCX to PDF typically inflates it to 10–30 MB due to font embedding, then requires an AI parse. Send original Office files whenever possible — save your AI quota for scanned documents and images.
Capability budgets (AI/FS limits) are enforced at the type level by AILANG — a hard guarantee, not middleware rate limiting.
Self-Host Option
Want to run AILANG Parse on your own infrastructure? We offer self-hosted deployments for teams that need full control over data residency and processing. Get in touch to discuss self-hosting options.
Data Handling
Your documents stay yours. Here's exactly what happens when you use the API:
| Data type | What happens | Retention |
|---|---|---|
| Small documents (< 375 KB raw) | Processed entirely in-memory. File content is never written to disk or cloud storage. | None (discarded immediately) |
| Large PDFs (≥ 375 KB raw) | Temporarily uploaded to regional cloud storage for efficient AI page-by-page extraction. Deleted immediately after parsing completes. | Deleted on completion. 24-hour auto-delete safety net. |
| Parsed output | Returned to you in the API response. Not stored server-side. | None |
| Request metadata (timestamp, format, size, block count) | Logged for billing and your request history dashboard. No document content is stored. | 30 days (planned TTL) |
| API keys | Only a SHA-256 hash is stored. The raw key is shown once at creation and never stored. | Until you revoke |
Processing region: europe-west1 (Belgium). Temporary file storage uses a co-located regional bucket with 24-hour lifecycle deletion. No document content is used for AI model training or analytics.
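Hash-only key storage, as described in the API keys row, can be illustrated with a short sketch. This shows the general technique and is not the service's actual implementation; the in-memory `KEY_HASHES` set and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets

KEY_HASHES = set()  # hypothetical store: holds only hex digests, never raw keys


def issue_key():
    """Create a key, keep only its SHA-256 digest, and return the raw key once."""
    raw = "dp_" + secrets.token_urlsafe(24)
    KEY_HASHES.add(hashlib.sha256(raw.encode()).hexdigest())
    return raw


def is_valid(candidate):
    """Hash the presented key and compare it against the stored digests."""
    digest = hashlib.sha256(candidate.encode()).hexdigest()
    return any(hmac.compare_digest(digest, stored) for stored in KEY_HASHES)
```

Because only digests are kept, a lost key cannot be recovered from the store; it can only be revoked and replaced.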
API Explorer
Endpoint cards, schemas, and badges are generated from the capability manifest. Adding an endpoint to the API automatically adds it here.
Migrating from Unstructured
Already using the Unstructured API? AILANG Parse is a drop-in replacement. Change your server URL and you're done:
# Before (Unstructured)
from unstructured_client import UnstructuredClient
client = UnstructuredClient(server_url="https://api.unstructured.io")
# After (AILANG Parse) — one line change
client = UnstructuredClient(
server_url="https://docparse.ailang.sunholo.com"
)
The /general/v0/general endpoint returns identical element JSON. All existing code works unchanged.
Connect Your AI
Three ways for AI systems to use AILANG Parse:
CLI
Direct invocation from scripts and pipelines. Best for batch processing.
ailang run --entry main \
--caps IO,FS,Env \
docparse/main.ail myfile.docx
REST API
Standard HTTP with named JSON parameters. API key in request body.
curl -X POST .../api/v1/parse \
-H "Content-Type: application/json" \
-d '{"filepath":"file.docx","outputFormat":"blocks","apiKey":"dp_..."}'
MCP Server
AILANG Parse as a tool for AI assistants (Claude, etc.) via Model Context Protocol.
{
"name": "docparse",
"description": "Parse documents",
"server": "ailang serve-api ..."
}
Claude Code Skill
Install the AILANG Parse skill in Claude Code to parse documents directly from your terminal.
claude install github:sunholo-data/docparse-skill
Once installed, just ask Claude to parse a document. The skill provides scripts for parsing, cost estimation, sample listing, and headless device authentication.
# Set your API key
export DOCPARSE_API_KEY="dp_your_key_here"
# Or get one via device auth (no browser needed)
bash skills/docparse/scripts/device-auth.sh
# Parse a document
bash skills/docparse/scripts/parse.sh document.docx blocks
# Check cost before parsing
bash skills/docparse/scripts/estimate.sh report.pdf blocks
# List available test files
bash skills/docparse/scripts/samples.sh
Agent Integration Guide
Step-by-step workflow for AI agents integrating with AILANG Parse, in seven steps.
Step 1: Discover the API
# Health check — confirms service is up and version
curl https://docparse.ailang.sunholo.com/api/v1/health
# Full capability manifest — every endpoint, schema, auth, costs
curl https://docparse.ailang.sunholo.com/api/v1/capabilities
# Supported formats
curl https://docparse.ailang.sunholo.com/api/v1/formats
# Machine-readable tool definitions (Claude/OpenAI/MCP)
curl https://docparse.ailang.sunholo.com/api/v1/tools
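An agent might collect all four discovery documents in one pass. The sketch below assumes each endpoint returns JSON; `discover` and `http_get_json` are hypothetical names, and the fetch function is injectable so the logic can be exercised without network access.

```python
import json
import urllib.request

BASE_URL = "https://docparse.ailang.sunholo.com"
DISCOVERY_PATHS = ("health", "capabilities", "formats", "tools")


def http_get_json(url):
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def discover(fetch=http_get_json, base_url=BASE_URL):
    """Fetch each discovery endpoint and return {name: decoded JSON}."""
    return {name: fetch(f"{base_url}/api/v1/{name}") for name in DISCOVERY_PATHS}
```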
Step 2: Get an API Key (Device Auth)
Agents use the RFC 8628 Device Authorization flow — no browser session needed after initial approval.
# 1. Request a device code
curl -X POST https://docparse.ailang.sunholo.com/api/v1/auth/device \
-H "Content-Type: application/json" \
-d '{"label":"my-agent","scope":"parse"}'
# Returns: device_code, user_code (e.g. ABCD-1234), verification_url
# 2. Display the verification_url to the user. They open it and sign in (authorization is automatic).
# 3. Poll every 5 seconds until approved:
curl -X POST https://docparse.ailang.sunholo.com/api/v1/auth/device/poll \
-H "Content-Type: application/json" \
-d '{"deviceCode":"DEVICE_CODE_FROM_STEP_1"}'
# Returns: {"status":"approved", "api_key":"dp_...", "tier":"free"}
Save the api_key — it won't be shown again. The key persists across sessions.
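The polling step can be sketched as a small loop. Several details here are assumptions rather than documented behaviour: that pending and expired states arrive as a `code` field holding AUTHORIZATION_PENDING or DEVICE_CODE_EXPIRED, and that the caller supplies a `post(path, payload)` function returning the decoded JSON body. The `deviceCode` key, the 5-second interval, and the approved response shape come from the example above.

```python
import time


def poll_for_key(post, device_code, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll /api/v1/auth/device/poll until the user approves, then return the key."""
    for _ in range(max_attempts):
        body = post("/api/v1/auth/device/poll", {"deviceCode": device_code})
        if body.get("status") == "approved":
            return body["api_key"]
        # Assumption: error states are reported in a `code` field.
        if body.get("code") == "DEVICE_CODE_EXPIRED":
            raise RuntimeError("device code expired; request a new one")
        sleep(interval)  # anything else is treated as still pending
    raise TimeoutError("user did not approve in time")
```

Injecting `sleep` keeps the loop testable and lets a caller substitute its own scheduler.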
Step 3: List Test Samples
Built-in sample files let you test parsing without uploading anything.
curl https://docparse.ailang.sunholo.com/api/v1/samples
| Sample ID | Format | AI? |
|---|---|---|
| sample_docx_formatting | DOCX (rich styles) | No |
| sample_docx_tables | DOCX (merged cells) | No |
| sample_docx_comments | DOCX (comments) | No |
| sample_docx_track_changes | DOCX (insertions/deletions/moves) | No |
| sample_docx_footnotes | DOCX (footnotes) | No |
| sample_docx_hyperlinks | DOCX (hyperlinks) | No |
| sample_docx_real_world | DOCX (complex real-world) | No |
| sample_pptx_notes | PPTX (speaker notes) | No |
| sample_pptx_formatting | PPTX (rich text) | No |
| sample_xlsx_merged | XLSX (merged cells) | No |
| sample_xlsx_formulas | XLSX (formulas) | No |
| sample_xlsx_formats | XLSX (number formats) | No |
| sample_csv | CSV (format matrix) | No |
| sample_markdown | Markdown (architecture guide) | No |
| sample_html | HTML (getting started) | No |
| sample_odt | ODT (styled tables) | No |
| sample_odp | ODP (text frames) | No |
| sample_ods | ODS (merged cells) | No |
| sample_epub | EPUB (Alice in Wonderland) | No |
| sample_eml_welcome | EML (welcome email + CSV attachment) | No |
| sample_eml_release | EML (release notes, text + HTML) | No |
| sample_eml_bug | EML (encoded headers, CC, attachments) | No |
| sample_mbox_thread | MBOX (4-message reply thread) | No |
| sample_pdf | PDF (table report) | Yes |
| sample_mp3 | Audio (transcription) | Yes |
| sample_mp4 | Video (visual+audio) | Yes |
Use GET /api/v1/samples to download test files programmatically.
Step 4: Estimate Cost
curl -X POST https://docparse.ailang.sunholo.com/api/v1/estimate \
-H "Content-Type: application/json" \
-d '{"filepath":"data/test_files/sample.docx","outputFormat":"markdown"}'
# Returns: {"format":"zip-office", "strategy":"deterministic", "ai_required":false, "estimated_ms":15}
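One way to act on the estimate is to gate AI parses on remaining quota. This is a sketch under the assumption that the caller tracks its own remaining AI parses; `should_parse` is a hypothetical helper, and only the `ai_required` field from the response above is used.

```python
def should_parse(estimate, ai_parses_left):
    """Decide whether to proceed, given an /api/v1/estimate response dict."""
    if not estimate.get("ai_required", False):
        return True  # deterministic parse: no AI quota consumed
    return ai_parses_left > 0
```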
Step 5: Parse a Document
Option A: Upload your own file (all tiers)
# Upload and parse any file via multipart/form-data
# The field name MUST be "filepath"
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
-F "filepath=@report.docx" \
-F "outputFormat=markdown" \
-F "apiKey=dp_YOUR_KEY"
# Works with any supported format
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
-F "filepath=@data.xlsx" \
-F "outputFormat=blocks" \
-F "apiKey=dp_YOUR_KEY"
Option B: Parse a built-in sample (no upload needed)
# Deterministic (Office/Web/Text) — same output every time, <15ms
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
-H "Content-Type: application/json" \
-d '{"sample_id":"sample_docx_basic","outputFormat":"markdown","apiKey":"dp_YOUR_KEY"}'
# AI-powered (PDF/images) — content depends on model
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
-H "Content-Type: application/json" \
-d '{"sample_id":"sample_pdf","outputFormat":"markdown","apiKey":"dp_YOUR_KEY"}'
Output formats: blocks (structured JSON), markdown, html, a2ui (rich UI components).
All formats return the same block types: Text, Heading, Table, Image, Audio, Video, List, Section, Change.
File size limits: Free 10 MB, Pro 25 MB, Business 50 MB. Files over 32 MB use GCS upload (Business tier).
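A client can check these limits before uploading. The sketch below encodes the figures stated above; it assumes the limits are binary megabytes and uses hypothetical return labels (`direct`, `gcs`, `too_large`) that are not part of the API.

```python
MB = 1024 * 1024  # assumption: limits are measured in binary megabytes
TIER_LIMIT_MB = {"free": 10, "pro": 25, "business": 50}
GCS_THRESHOLD_MB = 32


def upload_plan(size_bytes, tier):
    """Return 'direct', 'gcs', or 'too_large' for a file of the given size."""
    if size_bytes > TIER_LIMIT_MB[tier] * MB:
        return "too_large"
    if size_bytes > GCS_THRESHOLD_MB * MB:
        return "gcs"
    return "direct"
```

Only the Business tier can reach the `gcs` branch, since the other tiers cap out below 32 MB.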
Step 6: Handle Errors
All errors include a suggested_fix field — a plain-text instruction you can act on directly.
| Code | Retryable | When |
|---|---|---|
| INVALID_API_KEY | No | Key missing, malformed, or revoked |
| QUOTA_EXCEEDED | Yes (after reset) | Daily or monthly request limit hit |
| AI_QUOTA_EXCEEDED | Yes (after reset) | Monthly AI parse limit hit |
| FILE_TOO_LARGE | No | File exceeds format-specific size limit |
| INPUT_NOT_FOUND | No | File path doesn't exist |
| UNSUPPORTED_FORMAT | No | File extension not recognized |
| PARSE_FAILED | Maybe | Corrupt or empty file |
| AI_UNAVAILABLE | Yes | AI backend down |
| AUTHORIZATION_PENDING | Yes (poll) | Device flow: user hasn't approved |
| DEVICE_CODE_EXPIRED | No | Device code timed out |
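The table above translates naturally into a lookup that tells an agent what to do next. This is a sketch: it assumes the error body exposes the code in a `code` field, and the action labels, including treating "Maybe" as a single retry, are illustrative choices rather than documented behaviour.

```python
RETRY_POLICY = {
    "INVALID_API_KEY": "fail",
    "QUOTA_EXCEEDED": "wait_for_reset",
    "AI_QUOTA_EXCEEDED": "wait_for_reset",
    "FILE_TOO_LARGE": "fail",
    "INPUT_NOT_FOUND": "fail",
    "UNSUPPORTED_FORMAT": "fail",
    "PARSE_FAILED": "retry_once",  # table says "Maybe"; one retry is a judgment call
    "AI_UNAVAILABLE": "retry",
    "AUTHORIZATION_PENDING": "poll",
    "DEVICE_CODE_EXPIRED": "fail",
}


def next_action(error):
    """Map an error dict to an action and surface its suggested_fix text."""
    action = RETRY_POLICY.get(error.get("code"), "fail")  # unknown codes fail safe
    return action, error.get("suggested_fix", "")
```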
Step 7: Recommended Workflow
1. GET /api/v1/capabilities → learn the full API contract
2. POST /api/v1/auth/device → acquire credentials (one-time)
3. GET /api/v1/samples → find test files to verify integration
4. POST /api/v1/estimate → check cost before committing
5. POST /api/v1/parse → parse the document
6. Verify response has request_id in meta (confirms v0.9.0 API)
7. Monitor meta.quota_remaining between requests
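Steps 6 and 7 can be folded into one post-response check. The sketch assumes `meta.quota_remaining` is a plain number, which the text above does not confirm; `check_response` and the `low_water` threshold are hypothetical.

```python
def check_response(response, low_water=10):
    """Return (request_id, warn) from a parse response's meta block."""
    meta = response.get("meta", {})
    if "request_id" not in meta:
        raise ValueError("no request_id in meta; response may predate the v0.9.0 API")
    remaining = meta.get("quota_remaining")
    # Assumption: quota_remaining is a plain number; adapt if it is a nested object.
    warn = isinstance(remaining, (int, float)) and remaining <= low_water
    return meta["request_id"], warn
```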
Format Categories
| Format | Type | Speed |
|---|---|---|
| Office (DOCX, PPTX, XLSX, OD*) | Deterministic | <15ms |
| Text (HTML, MD, CSV, EPUB) | Deterministic | <15ms |
| PDF | AI | ~2s |
| Image (PNG, JPG, etc.) | AI | ~1.5s |
Deterministic formats have zero AI cost and are unlimited within your request quota. AI formats count against your monthly AI parse limit and are subject to per-format file size caps. The Free tier includes 1,000 requests/month, of which up to 50 may be AI parses. Call GET /api/v1/pricing for full tier details.
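A client can predict locally whether a file will consume AI quota by classifying its extension. The sketch below is built from the format lists on this page; the extension spellings (for example `.md` for Markdown) are assumptions, and POST /api/v1/estimate remains the authoritative answer.

```python
import os

DETERMINISTIC = {
    ".docx", ".pptx", ".xlsx", ".odt", ".odp", ".ods",
    ".csv", ".html", ".md", ".epub", ".eml", ".mbox",
}
AI_FORMATS = {".pdf", ".png", ".jpg"}


def parse_strategy(filename):
    """Return 'deterministic', 'ai', or 'unknown' based on the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in DETERMINISTIC:
        return "deterministic"
    if ext in AI_FORMATS:
        return "ai"
    return "unknown"
```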
Runnable Examples
# Quickstart: auth + first parse in one command
bash examples/quickstart.sh
# Full 7-step walkthrough
bash examples/agent_workflow.sh
# With existing key (skip auth)
DOCPARSE_API_KEY=dp_... bash examples/agent_workflow.sh
Client Libraries
Thin HTTP wrappers for the AILANG Parse API. Compatible with unstructured-client arguments.
pip install ailang-parse
npm install @ailang/parse
from ailang_parse import DocParse
client = DocParse(api_key="dp_your_key_here")
result = client.parse("report.docx")
for block in result.blocks:
    if block.type == "heading":
        print(f"H{block.level}: {block.text}")
    elif block.type == "table":
        print(f"Table: {len(block.rows)} rows")
    elif block.type == "change":
        print(f"{block.change_type} by {block.author}")
# Unstructured migration — one import change:
from ailang_parse import UnstructuredClient
client = UnstructuredClient(server_url="https://docparse.ailang.sunholo.com")
elements = client.general.partition(file="report.docx")
import { DocParse } from '@ailang/parse';
const client = new DocParse({ apiKey: 'dp_your_key_here' });
const result = await client.parse('report.docx');
for (const block of result.blocks) {
  if (block.type === 'heading') console.log(`H${block.level}: ${block.text}`);
  if (block.type === 'table') console.log(`Table: ${block.rows?.length} rows`);
  if (block.type === 'change') console.log(`${block.changeType} by ${block.author}`);
}
// Unstructured migration — one import change:
import { UnstructuredClient } from '@ailang/parse';
const uc = new UnstructuredClient({ serverUrl: 'https://docparse.ailang.sunholo.com' });
const elements = await uc.general.partition({ file: 'report.docx' });
ctx := context.Background()
client := docparse.New("dp_your_key_here")
result, err := client.Parse(ctx, "report.docx")
if err != nil {
	log.Fatal(err) // stop before ranging over an unusable result
}
for _, block := range result.Blocks {
	switch block.Type {
	case "heading":
		fmt.Printf("H%d: %s\n", block.Level, block.Text)
	case "table":
		fmt.Printf("Table: %d rows\n", len(block.Rows))
	case "change":
		fmt.Printf("%s by %s\n", block.ChangeType, block.Author)
	}
}
// Unstructured migration:
uc := docparse.NewUnstructuredClient("https://docparse.ailang.sunholo.com")
elements, err := uc.Partition(ctx, "report.docx")
if err != nil {
	log.Fatal(err)
}
_ = elements // process the returned elements here
Frequently Asked Questions
Is AILANG Parse a drop-in replacement for Unstructured?
Yes, for Office formats. AILANG Parse implements a compatible REST API at /general/v0/general that accepts the same multipart file upload format as Unstructured. Swap the endpoint URL and get higher-fidelity output with preserved track changes, merged cells, and comments. See the integrations page for migration guides.
What output formats does the AILANG Parse API support?
The API supports structured JSON (Block ADT), Markdown, HTML, plain text, Quarto (QMD), A2UI, and the Unstructured-compatible element format. Specify the output format with the outputFormat parameter, as in the request examples on this page. All formats preserve the same structural information.
Does the AILANG Parse API require an API key?
Yes, all parse requests require an API key. The Free tier provides 1,000 requests/month including 50 AI parses. Generate a key via the dashboard or the device auth flow (/api/v1/auth/device). Pro and Business tiers increase limits to 100,000 and 500,000 requests/month respectively.
Can I run AILANG Parse locally instead of using the cloud API?
Yes. Install the CLI for unlimited local parsing, or clone the repo and use the included Dockerfile for containerised batch processing. Office parsing has zero external dependencies. See the self-host guide for details.