Batch Mode: Compile Once, Parse Many
AILANG Parse compiles 38 modules before parsing starts. In batch mode, this happens once, and every file after that is parsed with zero recompilation. This is the most important performance optimisation for document parsing.
Batch (multiple files or folder)
```bash
# Pass multiple files
docparse *.docx *.pptx *.xlsx
# Or an entire folder
docparse ~/Documents/
# Both auto-enable batch mode
```
58 files in 2.5s (44ms/file)
Loop (recompiles each time)
```bash
# Don't do this: each invocation
# recompiles all 38 modules
for f in *.docx; do
  docparse "$f"
done
```
58 files in ~25s (~430ms/file)
The docparse CLI detects multiple file arguments or a folder and passes --batch to AILANG automatically. No extra flags are needed.
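When the files you want are spread across subdirectories, or you only want certain formats, you can still keep to a single invocation by letting `find` build the argument list. A minimal sketch: the function name `parse_office_tree` is hypothetical, and it relies only on the documented behaviour that passing multiple files to docparse enables batch mode.

```bash
# Hypothetical helper: batch-parse every DOCX, PPTX and XLSX file under a directory tree.
# `-exec docparse {} +` appends as many matching paths as fit on one command line,
# so docparse normally runs once and auto-enables batch mode. A very large tree can
# still be split across several invocations, each paying the startup cost again.
parse_office_tree() {
  local root="${1:-.}"
  find "$root" -type f \( -name '*.docx' -o -name '*.pptx' -o -name '*.xlsx' \) \
    -exec docparse {} +
}
```

Usage: `parse_office_tree ~/Documents`. Unlike the shell loop, `-exec ... {} +` hands docparse the whole list at once and does not run it at all when nothing matches.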
Tracing Control
AILANG's auto-trace collector creates ~2.7M objects per run when OTEL_EXPORTER_OTLP_ENDPOINT or GOOGLE_CLOUD_PROJECT is set. The docparse CLI disables it by default (it sets AILANG_NO_TRACE=1), which gives a 2–5x speedup in the measurements below.
| File | With Trace | No Trace | Speedup |
|---|---|---|---|
| Alice EPUB (185KB) | 2.59s | 1.26s | 2.1x |
| Moby Dick EPUB (797KB) | 9.79s | 2.82s | 3.5x |
| 10MB DOCX | 1.95s | 0.40s | 4.9x |
If you call ailang run directly (bypassing the CLI wrapper), set the environment variable yourself:
```bash
# Direct ailang invocation with tracing disabled
AILANG_NO_TRACE=1 ailang run --batch --entry main --caps IO,FS,Env \
  --max-recursion-depth 50000 docparse/main.ail file1.docx file2.pptx

# Re-enable tracing for debugging
AILANG_NO_TRACE=0 docparse report.docx
```
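If you script direct `ailang run` calls, wrapping the command above in a shell function keeps tracing off by default while still letting a caller turn it back on. A minimal sketch: the name `ailang_parse` is hypothetical, and the flags are copied unchanged from the command above.

```bash
# Hypothetical wrapper around a direct `ailang run` call.
# AILANG_NO_TRACE defaults to 1 (tracing off) unless the caller already set it,
# so `AILANG_NO_TRACE=0 ailang_parse report.docx` still re-enables tracing.
ailang_parse() {
  AILANG_NO_TRACE="${AILANG_NO_TRACE:-1}" ailang run --batch --entry main --caps IO,FS,Env \
    --max-recursion-depth 50000 docparse/main.ail "$@"
}
```

Usage: `ailang_parse file1.docx file2.pptx`, or `AILANG_NO_TRACE=0 ailang_parse report.docx` when you need traces.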
Folder Benchmark
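58 files across 11 formats (DOCX, PPTX, XLSX, ODT, ODP, ODS, EPUB, HTML, Markdown, CSV, TSV), including a 10MB DOCX and an 11MB PPTX. Every tool ran on the same files on the same machine; times are wall-clock. Per File is total time divided by the number of files that tool parsed successfully.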
| Tool | Files Parsed | Total Time | Per File | Quality |
|---|---|---|---|---|
| Kreuzberg v4.7.2 | 55/58 | 318ms | 6ms | 71.3% |
| MarkItDown v0.1.5 | 46/58 | 905ms | 20ms | 67.9% |
| AILANG Parse v0.3.0 | 58/58 | 2.54s | 44ms | 92.2% |
| Unstructured v0.22.16 | 39/58 | 3.46s | 89ms | 62.1% |
| Pandoc v3.9.0 | 51/58 | 3.64s | 71ms | 74.6% |
| Docling v2.84.0 | 42/58 | 8.64s | 206ms | 64.0% |
CLI vs API Server
CLI (docparse)
Best for: batch processing folders, CI/CD pipelines, one-off conversions.
- Batch mode amortises compilation
- Zero network overhead
- Processes local files directly
API Server
Best for: web apps, real-time parsing, multi-user access.
- AILANG modules stay compiled in memory
- No startup cost per request
- Sub-100ms response for most files
For sustained throughput (e.g., processing uploads), the API server avoids compilation overhead entirely. For batch jobs against local files, the CLI with folder input is optimal.
Single-File Overhead
Parsing a single file with docparse report.docx costs ~400–500ms of AILANG runtime startup (module compilation) before parsing begins, regardless of file size. For small files this dominates: the actual parse of a 5KB DOCX takes <1ms.
| File | Single-File Run | Per File in Batch | Startup Overhead |
|---|---|---|---|
| sample.docx (5KB) | 411ms | ~2ms | ~409ms |
| tables.docx (31KB) | 466ms | ~5ms | ~461ms |
| 10MB DOCX | 472ms | ~70ms | ~402ms |
| Moby Dick EPUB (797KB) | 2.78s | ~2.3s | ~480ms |
AILANG incremental compilation caching (M-INCREMENTAL-TYPECHECK) will reduce startup further in a future release.
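The batch-versus-loop timings at the top of this page are consistent with a simple additive cost model. As a rough sketch, assume every docparse invocation pays a fixed startup cost $C \approx 0.4$ s (the low end of the overhead column above) and that each file's parse time $p_i$ is the same in either mode:

$$
\begin{aligned}
T_{\text{loop}}(n) &= nC + \sum_{i=1}^{n} p_i, \qquad T_{\text{batch}}(n) = C + \sum_{i=1}^{n} p_i,\\
\sum_{i=1}^{58} p_i &\approx T_{\text{batch}}(58) - C \approx 2.5\,\text{s} - 0.4\,\text{s} = 2.1\,\text{s},\\
T_{\text{loop}}(58) &\approx 58 \times 0.4\,\text{s} + 2.1\,\text{s} \approx 25.3\,\text{s}.
\end{aligned}
$$

The estimate is close to the ~25s measured for the shell loop. The saving, $T_{\text{loop}} - T_{\text{batch}} = (n-1)C$, grows linearly with the number of files.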
Quick Performance Tips
- Always batch. Pass folders or globs: `docparse ~/inbox/` or `docparse *.docx *.xlsx`.
- Use the CLI wrapper. `docparse` sets `AILANG_NO_TRACE=1` and auto-enables batch mode.
- Use the API for real-time. Compilation stays in memory, so every request is pure parse time.
- Skip AI for Office files. DOCX/PPTX/XLSX/ODF parsing is deterministic and needs no AI model. Images still require `--ai`; PDFs default to `--ai` but can opt into local deterministic backends via `--pdf-backend docling` (~5× faster) or `--pdf-backend liteparse` (~40× faster). See PDF parsing.
- Profile if needed. Pass `-cpuprofile profile.out` or `-memprofile mem.out` to `ailang run` for Go pprof analysis.
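A profiling run might look like the sketch below. The tips above only say to pass `-cpuprofile` to `ailang run`, so placing it among the other run flags, before the entry file, is an assumption (it matches Go-style flag parsing, which stops at the first positional argument). `go tool pprof -top` is the standard pprof text report. The function name `profile_docparse` is hypothetical.

```bash
# Hypothetical helper: CPU-profile a batch run, then print the hottest functions.
# Assumption: -cpuprofile is accepted alongside the other `ailang run` flags,
# before the entry file docparse/main.ail.
profile_docparse() {
  AILANG_NO_TRACE=1 ailang run -cpuprofile profile.out --batch --entry main --caps IO,FS,Env \
    --max-recursion-depth 50000 docparse/main.ail "$@" || return
  # Text report of functions ranked by flat CPU time.
  go tool pprof -top profile.out
}
```

If the parse run fails, the function returns ailang's exit status and skips the report. Swap in `-memprofile mem.out` (and `go tool pprof -top mem.out`) to look at allocations instead.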
Frequently Asked Questions
How fast is AILANG Parse compared to other document parsers?
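58 mixed document files (22.9MB, 11 formats) in 2.5 seconds using batch mode. That is faster than Unstructured (3.5s), Pandoc (3.6s), and Docling (8.6s), and AILANG Parse is the only tool in the benchmark that parsed every file. Kreuzberg (318ms) and MarkItDown (905ms) finish sooner but fail on 3 and 12 files respectively.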
Why is AILANG Parse slow on a single file?
Single-file invocations pay ~400ms of AILANG runtime startup. Use batch mode (pass multiple files or a folder) to amortise this — per-file time drops to ~44ms.
How do I enable batch mode?
The docparse CLI enables it automatically when you pass multiple files or a folder. Just run docparse ~/Documents/. No flags needed.
What does AILANG_NO_TRACE do?
It disables AILANG's auto-trace collector, which otherwise creates ~2.7M objects per run when OTEL_EXPORTER_OTLP_ENDPOINT or GOOGLE_CLOUD_PROJECT is set. Disabling it gives a 2–5x speedup. The docparse CLI sets it by default.