AILANG Parse

Stop parsing photos of your documents.

AILANG Parse is AILANG's first SaaS -- a production showcase of what the language can ship. It reads Office XML directly (DOCX, PPTX, XLSX), preserving track changes, merged cells, and comments that PDF conversion destroys. Free in-browser demo, no account required.

SaaS, Document AI, Open Source, Built with AILANG
15 formats: DOCX, PPTX, XLSX, PDF, HTML, CSV, MD, ODT, ODS, ODP, EML, EPUB, TEX, images
93.9%: OfficeDocBench accuracy (nearest competitor: 68.0%)
Free in-browser: WebAssembly -- files never leave your browser
No AI required: deterministic XML parsing, not LLM guesswork

The Problem with Document Parsing Today

Most tools convert your DOCX to a photo first.

Your DOCX, PPTX, and XLSX files are already structured XML. Every heading, cell, comment, and track change is explicitly labeled in the file. But the dominant approach in 2026 is still to convert Office documents to PDF, render them as images, then run OCR and vision models to guess the structure back out.

That approach:

  • Destroys structure that was already there -- merged cells flatten, track changes disappear, comments vanish
  • Bills per page at $0.001-$0.01, making high-volume pipelines expensive
  • Relies on AI inference for what should be deterministic parsing
  • Hallucinates content where OCR confidence is low

Read the XML directly.

AILANG Parse extracts the structure that's already in your files, deterministically. AI is opt-in, used only for genuinely unstructured sources (scanned PDFs, images).
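
To make "read the XML directly" concrete, here is a minimal Python sketch, not AILANG Parse's own code. It assumes only that a DOCX file is a ZIP package whose main body lives in word/document.xml; the function name read_paragraphs is hypothetical. It returns the style and text of each top-level paragraph and ignores tables, headers, and everything else a real parser handles.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a DOCX package.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_paragraphs(docx_bytes: bytes) -> list[tuple[str, str]]:
    """Return (style_id, text) for each top-level paragraph in the body.

    style_id is the paragraph's w:pStyle value, or "" when none is set.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for p in root.iterfind(".//w:body/w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val", "") if style is not None else ""
        # A paragraph's text is split across runs; join every w:t in order.
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        paragraphs.append((style_id, text))
    return paragraphs
```

No rendering, OCR, or model call is involved: a heading is whatever paragraph the file itself labels with a heading style.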


Built on AILANG

AILANG Parse is the first production SaaS built on AILANG -- and proof that the language is ready for real workloads.

Typed Block ADT

Every parsed document returns a typed algebraic data structure: headings, paragraphs, tables, lists, comments, track changes. Downstream code gets exhaustive pattern matching, not string-matching on loosely-typed JSON.

Eval-Driven Development

AILANG Parse is measured against OfficeDocBench on every change. The 93.9% score isn't a snapshot -- it's a continuous benchmark that prevents regressions as formats are added.

Deterministic Parsing

Same input, same output, every time. No model calls in the hot path for Office formats -- just XML traversal. That makes it replayable, debuggable, and cheap to run at scale.
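
One way to check the "same input, same output" property is to hash a canonical serialization of the parse result and compare digests across runs. The Python sketch below assumes the output can be represented as plain dictionaries and that parse is any callable from bytes to a list of blocks; both are stand-ins, not the AILANG Parse API.

```python
import hashlib
import json


def output_digest(blocks: list[dict]) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(
        blocks, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_deterministic(parse, data: bytes, runs: int = 3) -> bool:
    """Parse the same bytes several times and check that all digests agree."""
    digests = {output_digest(parse(data)) for _ in range(runs)}
    return len(digests) == 1
```

Storing the digest next to each input also makes replay cheap: a later run either reproduces the recorded digest or points at the exact document whose parse changed.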

What You Get That PDF-Conversion Loses

Six structural features that are impossible to recover once you've flattened a document to images. AILANG Parse keeps them because it never flattens in the first place.

Track Changes

Insert, delete, and move blocks with author attribution and timestamps. Structured ChangeBlock data, not rendered accept/reject text.
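
For a sense of what is already in the file, the Python sketch below reads insertions and deletions straight from WordprocessingML, where they are stored as w:ins and w:del elements carrying w:author and w:date attributes. The Change record and extract_changes are hypothetical names for this example and are not the ChangeBlock type itself. Moves (w:moveFrom, w:moveTo) and revision marks on paragraph or row properties are left out for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


@dataclass(frozen=True)
class Change:
    kind: str    # "insert" or "delete"
    author: str
    date: str    # timestamp string exactly as stored in the file
    text: str


def extract_changes(document_xml: str) -> list[Change]:
    """Collect tracked insertions and deletions in document order."""
    root = ET.fromstring(document_xml)
    changes = []
    for element in root.iter():
        if element.tag == f"{{{W}}}ins":
            kind, text_tag = "insert", "w:t"
        elif element.tag == f"{{{W}}}del":
            # Deleted run text is stored in w:delText, not w:t.
            kind, text_tag = "delete", "w:delText"
        else:
            continue
        text = "".join(
            t.text or "" for t in element.iterfind(f".//{text_tag}", NS)
        )
        changes.append(
            Change(
                kind,
                element.get(f"{{{W}}}author", ""),
                element.get(f"{{{W}}}date", ""),
                text,
            )
        )
    return changes
```

Author and timestamp come from attributes in the source, so nothing has to be inferred from how the change was rendered.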

Merged Cells

Table headers, rows, and cells with colspan/rowspan merge info preserved -- not atomized into individual cells.
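
Merge information is likewise explicit in the source. In a DOCX table, a horizontal merge is recorded as w:gridSpan on the cell and a vertical merge as w:vMerge. The Python sketch below reads both from a single w:tbl element; cell_spans and its dictionary output are assumptions for illustration, not the AILANG Parse output format.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def cell_spans(table_xml: str) -> list[list[dict]]:
    """For each row, list every cell's text, colspan, and vertical-merge state."""
    tbl = ET.fromstring(table_xml)
    rows = []
    for tr in tbl.iterfind("w:tr", NS):
        cells = []
        for tc in tr.iterfind("w:tc", NS):
            span = tc.find("w:tcPr/w:gridSpan", NS)
            vmerge = tc.find("w:tcPr/w:vMerge", NS)
            if vmerge is None:
                vstate = "none"
            else:
                # w:vMerge without w:val continues the merge from the cell above.
                vstate = vmerge.get(f"{{{W}}}val", "continue")
            cells.append(
                {
                    "text": "".join(
                        t.text or "" for t in tc.iterfind(".//w:t", NS)
                    ),
                    "colspan": int(span.get(f"{{{W}}}val", "1"))
                    if span is not None
                    else 1,
                    "vmerge": vstate,
                }
            )
        rows.append(cells)
    return rows
```

A rowspan can then be derived by counting the "continue" cells that follow a "restart" cell in the same grid column.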

Headers & Footers

Extracted as semantic SectionBlocks with typed sub-blocks. Not flattened into body text.

Text Boxes & Shapes

Content extracted from DrawingML and VML shapes, including legacy VML images that other parsers silently drop.

Embedded Images

Images are extracted as base64 data with their MIME type. Optional AI descriptions via Gemini, Claude, or Ollama multimodal models.

Guaranteed Correctness

No silent data loss. 63 structural contracts guarantee every block, merged cell, and track change from the source appears in the output.

Two Paths to Parsed Documents

Converting DOCX to PDF is like photographing a spreadsheet. Track changes, merged cells, comments -- all gone. Then ML spends seconds reconstructing what the XML already had.

Everyone else: DOCX (structured XML) -> PDF (track changes, merged cells, and comments lost) -> ML rebuild (2-5 sec) -> flat text. 53-73% on OfficeDocBench.

AILANG Parse: DOCX (structured XML) -> direct XML read (instant) -> Block ADT (track changes, merged cells, and comments kept) -> your format (JSON, Markdown, HTML, Quarto, +5 more). 93.9% on OfficeDocBench, eval-driven.

Benchmark: 93.9% on OfficeDocBench across 69 test files and 11 formats. Nearest competitor 68.0% adjusted (Kreuzberg). AILANG Parse is the only parser with 100% format coverage.

Why This Matters for Sunholo Clients

Even if you never use AILANG Parse directly, it changes what we can build for you.

Proof that AILANG ships. AILANG Parse is the first production SaaS built on AILANG. It demonstrates that the language, its effect system, and its eval-driven development workflow are ready for real customer workloads -- not just research demos.

For document-heavy workflows

If your pipeline processes contracts, reports, or spreadsheets, AILANG Parse is available as a standalone SaaS, as a managed API, or embedded directly into your Multivac deployment.

  • Drop-in replacement for PDF-conversion parsers
  • Preserves structure that feeds cleaner RAG
  • Deterministic outputs -- same doc, same parse

For everyone else

AILANG Parse is a reference point. The same approach -- typed ADTs, eval-driven development, effect boundaries -- is how we build every AI system for Sunholo clients.

  • Benchmarks that catch regressions before production
  • Deterministic cores with AI at the edges
  • Systems that survive model upgrades

Drop a Document. See the Structure.

The full product -- including live demo, pricing, API docs, benchmarks, and per-format guides -- lives at sunholo.com/ailang-parse. Your files never leave the browser.