Integrations

Enhance your existing pipeline instead of replacing it.

Unstructured Compatibility

Use AILANG Parse for Office formats alongside your existing PDF pipeline. Route Office formats through AILANG Parse for structural fidelity. Route PDFs through your preferred tools. Merge the results.

AILANG Parse exposes a drop-in compatible endpoint at POST /general/v0/general that accepts the same multipart file upload format as the Unstructured API. Existing Unstructured clients can point at AILANG Parse with just a URL change.

curl

# Same request format as Unstructured API
curl -X POST https://docparse.ailang.sunholo.com/general/v0/general \
  -F "files=@report.docx" \
  -H "Accept: application/json"

Python — before and after

# Before: Unstructured
from unstructured.partition.auto import partition
elements = partition(filename="report.docx")

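# After: AILANG Parse (same element schema, returned as JSON over HTTP)
import requests
with open("report.docx", "rb") as f:  # close the file handle after the upload
    resp = requests.post(
        "https://docparse.ailang.sunholo.com/general/v0/general",
        files={"files": f},
    )
resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
elements = resp.json()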

The response format follows the Unstructured element schema: each element has type, text, and metadata fields. For Office formats (DOCX, PPTX, XLSX), AILANG Parse extracts structural features that Unstructured misses: track changes, merged cells, text boxes, comments, and headers/footers.
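As a rough illustration of consuming that response, the sketch below groups element text by element type. The sample payload is hypothetical and assumes only the three fields named above; the type values (Title, NarrativeText, Table) are borrowed from Unstructured's element types, and real metadata will differ.

```python
from collections import defaultdict

# Hypothetical sample shaped like the element schema described above.
sample_elements = [
    {"type": "Title", "text": "Quarterly Report", "metadata": {"filename": "report.docx"}},
    {"type": "NarrativeText", "text": "Revenue grew in Q3.", "metadata": {"filename": "report.docx"}},
    {"type": "Table", "text": "Region Sales", "metadata": {"filename": "report.docx"}},
    {"type": "NarrativeText", "text": "Costs were flat.", "metadata": {"filename": "report.docx"}},
]


def group_text_by_type(elements):
    """Collect element text under each element type, preserving document order."""
    grouped = defaultdict(list)
    for element in elements:
        # Fall back to "Unknown" so a missing type never raises.
        grouped[element.get("type", "Unknown")].append(element.get("text", ""))
    return dict(grouped)


grouped = group_text_by_type(sample_elements)
```

In a real pipeline, sample_elements would be replaced by the parsed JSON body of the response.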

With Docling

Docling excels at PDF layout analysis with its DocLayNet model. AILANG Parse excels at deterministic Office parsing. Use both — route by file type:

  • DOCX, PPTX, XLSX, ODT, ODP, ODS, CSV, HTML, Markdown, EPUB → AILANG Parse (free, fast, deterministic)
  • PDF, scanned images → Docling (layout analysis, table detection)

Python routing example

import requests
from pathlib import Path

AILANG_PARSE_URL = "https://docparse.ailang.sunholo.com"
OFFICE_EXTS = {".docx", ".pptx", ".xlsx", ".odt", ".odp", ".ods",
               ".csv", ".html", ".md", ".epub"}

def parse_document(filepath: str) -> dict:
    ext = Path(filepath).suffix.lower()

    if ext in OFFICE_EXTS:
        # AILANG Parse: deterministic, no per-page billing, full structural fidelity
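        with open(filepath, "rb") as f:  # ensure the upload handle is closed
            resp = requests.post(
                f"{AILANG_PARSE_URL}/api/v1/parse",
                files={"file": f},
            )
        resp.raise_for_status()  # fail loudly on HTTP errors
        return resp.json()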

    elif ext in {".pdf", ".png", ".jpg", ".jpeg", ".tiff"}:
        # Docling: layout-aware PDF and image parsing
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()
        result = converter.convert(filepath)
        return result.document.export_to_dict()

    else:
        raise ValueError(f"Unsupported format: {ext}")

This gives you structural Office parsing from AILANG Parse and layout-aware PDF parsing from Docling.
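For batch jobs it can help to decide the route for every file before making any calls, so unsupported files surface up front. The following is a minimal sketch of that idea; plan_routes and the route labels are hypothetical names, and the extension sets simply mirror the routing example above.

```python
from pathlib import Path

# Extension sets mirror the routing example; adjust to your own pipeline.
OFFICE_EXTS = {".docx", ".pptx", ".xlsx", ".odt", ".odp", ".ods",
               ".csv", ".html", ".md", ".epub"}
PDF_IMAGE_EXTS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff"}


def plan_routes(paths):
    """Bucket file paths by the parser that should handle them (no I/O)."""
    plan = {"ailang_parse": [], "docling": [], "unsupported": []}
    for path in paths:
        ext = Path(path).suffix.lower()  # case-insensitive extension match
        if ext in OFFICE_EXTS:
            plan["ailang_parse"].append(path)
        elif ext in PDF_IMAGE_EXTS:
            plan["docling"].append(path)
        else:
            plan["unsupported"].append(path)
    return plan
```

Each bucket can then be handed to the matching branch of parse_document, and the unsupported bucket logged or rejected before any work starts.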

With LlamaParse

LlamaParse is a cloud LLM-powered parser — strong on PDFs, but every file costs API credits. AILANG Parse handles Office formats locally for free, saving your LlamaParse budget for PDFs that actually need LLM analysis.

  • Office formats → AILANG Parse (free, local, deterministic, no per-page bill)
  • PDFs → LlamaParse (cloud, LLM-powered, paid per page)

Python routing example

import requests
from pathlib import Path

AILANG_PARSE_URL = "https://docparse.ailang.sunholo.com"
OFFICE_EXTS = {".docx", ".pptx", ".xlsx", ".odt", ".odp", ".ods",
               ".csv", ".html", ".md", ".epub"}

def parse_document(filepath: str) -> dict:
    ext = Path(filepath).suffix.lower()

    if ext in OFFICE_EXTS:
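        # AILANG Parse: deterministic, no per-page billing
        with open(filepath, "rb") as f:  # ensure the upload handle is closed
            resp = requests.post(
                f"{AILANG_PARSE_URL}/api/v1/parse",
                files={"file": f},
            )
        resp.raise_for_status()  # fail loudly on HTTP errors
        return resp.json()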

    elif ext == ".pdf":
        # LlamaParse: cloud LLM-powered PDF parsing
        from llama_parse import LlamaParse
        parser = LlamaParse(result_type="markdown")
        documents = parser.load_data(filepath)
        return {"text": documents[0].text, "metadata": documents[0].metadata}

    else:
        raise ValueError(f"Unsupported format: {ext}")

This routing pattern keeps your existing PDF pipeline intact while adding structural Office parsing at zero marginal cost.
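Because the two branches return differently shaped payloads, merging the results usually means wrapping each one in a common envelope. The sketch below shows one possible envelope; the record keys (source_file, parser, content) and the merge_results helper are assumptions for illustration, not part of either tool's API.

```python
def merge_results(results):
    """Wrap per-file parser outputs in a uniform record and sort by file.

    `results` is an iterable of (filepath, parser_name, payload) tuples,
    where payload is whatever the chosen parser returned.
    """
    merged = [
        {"source_file": filepath, "parser": parser_name, "content": payload}
        for filepath, parser_name, payload in results
    ]
    # Sort by path so repeated runs produce the same ordering.
    return sorted(merged, key=lambda record: record["source_file"])
```

Keeping the parser name on every record makes it easy to trace a downstream issue back to the tool that produced the content.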

Claude Code Skill

AILANG Parse is available as a Claude Code skill for parsing documents directly within your coding workflow. Install it with a single command:

claude install github:sunholo-data/ailang-parse-skill

Once installed, Claude Code can parse Office documents, extract tables, and convert between formats as part of your development session.

See the Claude Code Skill guide for the full setup walkthrough, usage examples, and supported commands.

REST API

AILANG Parse exposes a full REST API for programmatic access. The API is agent-discoverable — every endpoint is described in the capability manifest with typed errors, cost metadata, and replayability.

Key Endpoints

Endpoint               Method  Description
/api/v1/parse          POST    Parse a document (file upload)
/api/v1/capabilities   GET     Capability manifest: formats, limits, costs
/api/v1/formats        GET     Supported parse and generate formats
/api/v1/health         GET     Health check with version info
/api/v1/estimate       POST    Cost and latency estimation before parsing
/general/v0/general    POST    Unstructured API compatibility endpoint

# Health check
curl https://docparse.ailang.sunholo.com/api/v1/health

# List supported formats
curl https://docparse.ailang.sunholo.com/api/v1/formats

# Parse a document
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -F "file=@report.docx"

# Capability manifest (agent discovery)
curl https://docparse.ailang.sunholo.com/api/v1/capabilities

See the API Reference for the full endpoint documentation, authentication, and OpenAPI spec.
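The read-only endpoints can also be called from Python using only the standard library. This is a minimal sketch that assumes those endpoints return JSON bodies and need no authentication for a basic call; endpoint_url and get_json are hypothetical helper names, and the opener parameter exists only so the HTTP call can be swapped out in tests.

```python
import json
import urllib.request

BASE_URL = "https://docparse.ailang.sunholo.com"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, opener=urllib.request.urlopen, timeout: float = 10.0):
    """GET an endpoint and decode its body as JSON (assumed response format)."""
    with opener(endpoint_url(path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling get_json("/api/v1/health") or get_json("/api/v1/formats") then mirrors the curl commands above; check the API Reference for the actual response fields before relying on them.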

Self-Hosting

Prefer containers? Build and run AILANG Parse locally with Docker for batch CLI processing:

# Build the image
git clone https://github.com/sunholo-data/ailang-parse.git
cd ailang-parse
docker build -t docparse .

# Parse a file (mount your documents into /data)
docker run -v "$(pwd)":/data docparse /data/report.docx

# Parse with AI (pass your API key)
docker run -e GOOGLE_API_KEY="your-key" \
  -v "$(pwd)":/data docparse --ai gemini-2.5-flash /data/document.pdf

The image builds AILANG from source and uses a CLI entrypoint for direct parsing. For REST API access, use the hosted API with Python, JS, or Go SDKs. See the Run Locally guide for full installation options.
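For batch processing, the docker run command above can be wrapped in a small Python loop. This sketch assumes the image is tagged docparse as in the build step and that the CLI writes its result to standard output; docker_parse_command and parse_directory are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def docker_parse_command(doc_dir, filename, image="docparse"):
    """Build the argv for parsing one file mounted under /data."""
    host_dir = Path(doc_dir).resolve()  # Docker bind mounts need an absolute path
    return ["docker", "run", "-v", f"{host_dir}:/data", image, f"/data/{filename}"]


def parse_directory(doc_dir, pattern="*.docx"):
    """Run the container once per matching file and collect stdout by filename."""
    outputs = {}
    for path in sorted(Path(doc_dir).glob(pattern)):
        cmd = docker_parse_command(doc_dir, path.name)
        # check=True raises CalledProcessError if the container exits non-zero.
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs[path.name] = done.stdout
    return outputs
```

Passing the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.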

Frequently Asked Questions

How do I improve DOCX parsing in my Unstructured pipeline?

Route Office formats through AILANG Parse while keeping Unstructured for PDFs. AILANG Parse implements a compatible API, so you can swap the endpoint for Office files with a one-line change. Your pipeline gets full structural fidelity for DOCX/PPTX/XLSX while keeping Unstructured's ML-powered PDF parsing for scanned documents.

Is AILANG Parse an alternative to Unstructured?

For Office formats, yes: AILANG Parse provides higher structural fidelity (93.9% vs 62.1% on OfficeDocBench). For PDFs and scanned documents, Unstructured's ML pipeline is still excellent. The recommended architecture is to use both: AILANG Parse for Office formats, Unstructured for PDFs.

Can I use AILANG Parse instead of LlamaParse for Office documents?

Yes. Cloud-based parsers upload Office files to an API and bill per page. AILANG Parse parses Office files locally with no AI calls and no per-page billing. Use AILANG Parse for Office formats, cloud parsers for PDFs.

Does AILANG Parse work with LangChain and LlamaIndex?

Yes. The REST API and MCP server integrate with any LangChain or LlamaIndex pipeline. You can use AILANG Parse as a custom document loader or register it as a tool in your agent chain. The /api/v1/tools endpoint serves tool definitions in Claude, OpenAI, and MCP formats for automatic agent discovery.
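One way to approach the custom loader idea is to map parsed elements onto the page_content and metadata fields that LangChain's Document class uses. The sketch below stays framework-free and returns plain dicts; elements_to_records is a hypothetical helper, and the input shape assumes the type, text, and metadata fields of the Unstructured-compatible response.

```python
def elements_to_records(elements, source):
    """Convert parsed elements into loader-style records, skipping empty text."""
    records = []
    for index, element in enumerate(elements):
        text = (element.get("text") or "").strip()
        if not text:
            continue  # nothing useful to index
        # Copy the metadata so the caller's element dicts are left untouched.
        metadata = dict(element.get("metadata") or {})
        metadata.update({
            "source": source,
            "element_type": element.get("type", "Unknown"),
            "element_index": index,
        })
        records.append({"page_content": text, "metadata": metadata})
    return records
```

With LangChain installed, each record can be turned into a document with Document(**record); other frameworks may expect different field names.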