Three Ways to Parse
AILANG Parse offers three ways to parse documents. Choose what fits your needs:
| | Browser (WASM) | API + SDKs | Run Locally |
|---|---|---|---|
| Install required | None | SDK only | AILANG CLI |
| Office parsing | Yes | Yes | Yes |
| AI / PDF parsing | With API key | Yes | Yes |
| Offline capable | Yes (Office) | No | Yes (Office) |
| Throughput | Browser-limited | Unlimited (paid) | Unlimited |
| Cost | Free | Free tier, then paid | Free (bring your own AI keys) |
| Data privacy | Stays in browser | Regional servers (EU); see Data handling | Stays on your machine |
Quick Start
Install AILANG and AILANG Parse with two commands:
```bash
# Install AILANG
curl -fsSL https://ailang.sunholo.com/install.sh | bash
# Install AILANG Parse from the package registry
ailang install sunholo/ailang_parse
```
Now parse a document:
```bash
# Parse a DOCX file
ailang run --entry main --caps IO,FS,Env \
  ~/.ailang/cache/registry/sunholo/ailang_parse/*/docparse/main.ail your-file.docx
```
For the docparse CLI wrapper (./bin/docparse on macOS/Linux, bin\docparse.cmd on Windows), test files, and benchmarks, clone the repository:
```bash
git clone https://github.com/sunholo-data/ailang-parse.git
cd ailang-parse
./bin/docparse data/test_files/sample.docx
```
Convert between formats:
```bash
# DOCX to HTML
./bin/docparse input.docx --convert output.html
# CSV to DOCX report
./bin/docparse data.csv --convert report.docx
# Markdown to PowerPoint slides
./bin/docparse notes.md --convert slides.pptx
# Any format to Quarto Markdown (for Quarto rendering)
./bin/docparse report.docx --convert report.qmd
```
Generate documents with AI:
```bash
# Generate a DOCX report from a prompt
ailang run --entry main --caps IO,FS,Env,AI --ai gemini-2.5-flash \
  docparse/main.ail --generate report.docx --prompt "Q1 sales report with revenue table"
```
Install AILANG
Step 1: Install the AILANG CLI
AILANG is a single binary with no dependencies. Install it for your platform:
```bash
# macOS / Linux
curl -fsSL https://ailang.sunholo.com/install.sh | bash
```
```powershell
# Windows (PowerShell)
irm https://ailang.sunholo.com/install.ps1 | iex
```
Or download directly from GitHub releases for any platform.
Verify the installation:
```bash
ailang --version
# ailang v0.9.2 (or later)
```
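If you provision machines with a script, you may want to check the version automatically. The sketch below is a hypothetical helper, not part of AILANG: it assumes `ailang --version` prints a string of the form `ailang vMAJOR.MINOR.PATCH` on stdout, as in the sample output above, and compares it against 0.9.2 as a tuple.

```python
import re
import subprocess
from typing import Optional, Tuple

# Minimum version from the "ailang v0.9.2 (or later)" note above.
MIN_VERSION: Tuple[int, int, int] = (0, 9, 2)


def parse_ailang_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from text such as 'ailang v0.9.2'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_supported(output: str, minimum: Tuple[int, int, int] = MIN_VERSION) -> bool:
    """True if the reported version is at least `minimum` (tuple comparison)."""
    version = parse_ailang_version(output)
    return version is not None and version >= minimum


def installed_ailang_ok() -> bool:
    """Run `ailang --version` and check it; raises if the CLI is not on PATH."""
    result = subprocess.run(
        ["ailang", "--version"], capture_output=True, text=True, check=True
    )
    return is_supported(result.stdout)
```

Comparing integer tuples (rather than strings) keeps 0.10.0 correctly ordered after 0.9.2.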
Step 2: Install AILANG Parse
Install the parsing package from the AILANG registry:
```bash
ailang install sunholo/ailang_parse
```
This downloads all parsing modules and their dependencies. No build step needed.
Optional: Clone the Source
For development, testing, and benchmarks, clone the repository:
```bash
# macOS / Linux
git clone https://github.com/sunholo-data/ailang-parse.git
cd ailang-parse
./bin/docparse data/test_files/sample.docx
# Type-check all modules
./bin/docparse --check
```
```powershell
# Windows
git clone https://github.com/sunholo-data/ailang-parse.git
cd ailang-parse
bin\docparse.cmd data\test_files\sample.docx
# Type-check all modules
bin\docparse.cmd --check
```
Batch Mode & Folder Parsing
Parse multiple files in one command. The CLI compiles once and then processes every input, which is 5–15x faster than parsing the files one at a time:
```bash
# Parse all emails in a directory
./bin/docparse ~/inbox/
# Parse specific files (auto-detects batch mode)
./bin/docparse report.docx slides.pptx data.xlsx
# Glob patterns work too
./bin/docparse *.eml
```
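When you drive batch mode from a script instead of a shell, nothing expands `*.eml` or `~` for you. The sketch below (hypothetical helper names; it assumes you run it from the repository root so that `./bin/docparse` resolves) expands those itself and then makes a single docparse call, so the compile-once benefit described above is kept.

```python
import glob
import os
import subprocess
from typing import Iterable, List

# Repository wrapper from the clone step; on Windows use bin\docparse.cmd instead.
DOCPARSE = "./bin/docparse"

GLOB_CHARS = set("*?[")


def expand_args(args: Iterable[str]) -> List[str]:
    """Expand ~ and glob patterns ourselves, since no shell is involved.

    Directories and plain file names are passed through unchanged so that
    docparse can handle them itself (e.g. a folder such as ~/inbox/).
    """
    expanded: List[str] = []
    for arg in args:
        arg = os.path.expanduser(arg)
        if GLOB_CHARS & set(arg):
            expanded.extend(sorted(glob.glob(arg)))
        else:
            expanded.append(arg)
    return expanded


def batch_command(args: Iterable[str]) -> List[str]:
    """One docparse invocation for every input, so the CLI compiles only once."""
    return [DOCPARSE, *expand_args(args)]


def run_batch(args: Iterable[str]) -> int:
    """Run the batch and return docparse's exit code."""
    return subprocess.run(batch_command(args)).returncode
```

For example, `run_batch(["~/inbox/*.eml", "report.docx"])` issues one command instead of one per file.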
What is AILANG?
AILANG is a general-purpose programming language designed for AI-powered applications. It treats AI calls as a first-class effect — just like file I/O or network access — with hard capability budgets that cap how many AI calls, file reads, or network requests a program can make. These limits are enforced at compile time, not just by convention.
AILANG also supports formal contracts verified by Z3, deterministic execution for non-AI code paths, and a package registry for sharing modules. The result: programs that are auditable, reproducible, and cost-predictable.
AILANG Parse is one application built on AILANG. Once you have the CLI installed, you can build your own AI-powered tools — document processors, data pipelines, code generators, agents — using the same language and toolchain.
Run ailang prompt for the full language reference, or visit ailang.sunholo.com for documentation, examples, and the package registry.
AI Models
AILANG Parse uses AI for PDFs, images, audio, and video. Office formats (DOCX, PPTX, XLSX, etc.) are parsed deterministically with zero AI calls.
Google (Vertex AI / AI Studio)
```bash
# Via Application Default Credentials (Cloud Run, GKE)
GOOGLE_API_KEY="" ailang run --entry main --caps IO,FS,Env,AI \
  --ai gemini-2.5-flash docparse/main.ail document.pdf
# Via API key (local development)
GOOGLE_API_KEY="your-key" ailang run --entry main --caps IO,FS,Env,AI \
  --ai gemini-2.5-flash docparse/main.ail document.pdf
```
Anthropic (Claude)
```bash
ANTHROPIC_API_KEY="sk-ant-..." ailang run --entry main --caps IO,FS,Env,AI \
  --ai claude-haiku-4-5 docparse/main.ail document.pdf
```
Ollama (Local, Free)
```bash
# Start Ollama and pull a model
ollama pull granite3.2-vision
# Run AILANG Parse with local model (no API key needed)
ailang run --entry main --caps IO,FS,Env,AI \
  --ai granite-docling docparse/main.ail document.pdf
```
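The three provider commands above differ only in the `--ai` model name and which key is set. As a rough sketch, a wrapper script could choose a model from the environment and build the same `ailang run` command. The preference order (Gemini, then Claude, then local Ollama) and the helper names are assumptions for illustration; note that this simple check treats an empty GOOGLE_API_KEY as "no Gemini", so it does not cover the Application Default Credentials setup shown above.

```python
import os
import subprocess
from typing import List, Mapping

CAPS = "IO,FS,Env,AI"
ENTRY_MODULE = "docparse/main.ail"  # path inside the cloned repository


def pick_model(env: Mapping[str, str]) -> str:
    """Hypothetical preference order: Gemini, then Claude, then local Ollama."""
    if env.get("GOOGLE_API_KEY"):
        return "gemini-2.5-flash"
    if env.get("ANTHROPIC_API_KEY"):
        return "claude-haiku-4-5"
    return "granite-docling"  # local Ollama model, no API key needed


def ai_parse_command(document: str, model: str) -> List[str]:
    """Same argv as the shell examples above, with the model filled in."""
    return ["ailang", "run", "--entry", "main", "--caps", CAPS,
            "--ai", model, ENTRY_MODULE, document]


def run_ai_parse(document: str) -> None:
    """Parse one document; the child process inherits this process's environment."""
    model = pick_model(os.environ)
    subprocess.run(ai_parse_command(document, model), check=True)
```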
Audio & Video
```bash
# Transcribe audio: returns transcription, speaker count, language
./bin/docparse recording.mp3 --ai gemini-2.5-flash
# Extract video content: visual scenes, tables, transcription
./bin/docparse presentation.mp4 --ai gemini-2.5-flash
# Convert audio transcription to DOCX
./bin/docparse interview.wav --ai gemini-2.5-flash --convert transcript.docx
# Convert video content to HTML report
./bin/docparse tutorial.mp4 --ai gemini-2.5-flash --convert report.html
```
Audio formats: WAV, MP3, AIFF, AAC, OGG, FLAC. Video formats: MP4, MOV, AVI, WebM, WMV, MPEG, MPG, FLV, 3GPP.
Recommended model: gemini-2.5-flash offers the best balance of accuracy (92% on the PDF benchmark) and speed. Ollama models score much lower (0–3%) because of limitations in structured JSON output.
SDKs (API, not local)
The Python, JavaScript, and Go SDKs connect to the hosted API — they don't run parsing locally. Use them when you want zero-maintenance scaling without installing AILANG.
```bash
# Python
pip install ailang-parse
# JavaScript / TypeScript
npm install @ailang/parse
# Go
go get github.com/sunholo-data/ailang-parse-go
```
Example (Python):
```python
from ailang_parse import AilangParse

client = AilangParse()
result = client.parse("report.docx")
print(result.markdown)
```
SDKs include a free tier (1,000 requests/month + 50 AI parses). See the API documentation for full details, authentication, and pricing.
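To process several files with the SDK, you can loop over the same two calls used in the example above, `client.parse(path)` and `result.markdown`. The sketch below relies on nothing else from the SDK; the function name and the `<name>.md` output convention are hypothetical. It accepts any object with a matching `parse` method, which also makes it easy to test without network access.

```python
from pathlib import Path
from typing import Iterable, List, Protocol


class ParseResult(Protocol):
    markdown: str


class Parser(Protocol):
    def parse(self, path: str) -> ParseResult: ...


def parse_to_markdown(client: Parser, paths: Iterable[str], out_dir: str) -> List[Path]:
    """Parse each file with client.parse() and save result.markdown as <name>.md."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for path in paths:
        result = client.parse(path)  # one API request per file
        out_file = target / (Path(path).stem + ".md")
        out_file.write_text(result.markdown, encoding="utf-8")
        written.append(out_file)
    return written
```

With the real client this would be `parse_to_markdown(AilangParse(), ["report.docx", "slides.pptx"], "out")`. Each file is a separate request, so large batches count against the monthly free-tier allowance.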
Configuration
Local parsing is configured via environment variables and AILANG capability budgets.
Tip: append ?api=http://localhost:8080 to any API docs page URL, and all curl examples, code snippets, and live playgrounds will automatically point at your local instance.
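If you share docs links with teammates who run a local instance, rewriting the URL by hand gets tedious. A small standard-library sketch (the helper name is hypothetical) sets or replaces the `api` query parameter while keeping other parameters and any `#fragment`. `safe=":/"` keeps the value readable as `http://localhost:8080` rather than percent-encoding it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_local_api(docs_url: str, api: str = "http://localhost:8080") -> str:
    """Return docs_url with its ?api= query parameter set to `api`."""
    parts = urlsplit(docs_url)
    # Keep every existing parameter except a previous api= value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "api"]
    query.append(("api", api))
    return urlunsplit(parts._replace(query=urlencode(query, safe=":/")))
```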
Environment Variables
| Variable | Default | Description |
|---|---|---|
| GOOGLE_API_KEY | (empty) | Google AI Studio API key for Gemini models (PDF, image, audio, video parsing) |
| ANTHROPIC_API_KEY | (empty) | Anthropic API key for Claude models |
| DOCPARSE_OUTPUT_DIR | docparse/data | Output directory for parsed results |
Office formats (DOCX, PPTX, XLSX, ODT, ODP, ODS) need no API keys — parsing is fully deterministic and offline.
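A wrapper script can mirror the variables and defaults in the table above so it knows, before calling the CLI, whether a cloud AI key is available. This is an illustrative sketch (hypothetical class and function names), not how AILANG itself reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DocparseConfig:
    google_api_key: str
    anthropic_api_key: str
    output_dir: str

    @property
    def has_cloud_ai(self) -> bool:
        """True if a Gemini or Claude key is set (Office formats need neither)."""
        return bool(self.google_api_key or self.anthropic_api_key)


def load_config(env: Optional[Mapping[str, str]] = None) -> DocparseConfig:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return DocparseConfig(
        google_api_key=env.get("GOOGLE_API_KEY", ""),
        anthropic_api_key=env.get("ANTHROPIC_API_KEY", ""),
        output_dir=env.get("DOCPARSE_OUTPUT_DIR", "docparse/data"),
    )
```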
Capability Budgets
AILANG enforces hard limits on what each program can do. These are compile-time guarantees — exceed them and the runtime halts deterministically.
| Capability | Budget | Purpose |
|---|---|---|
| IO @limit | 50,000 | Console output operations |
| FS @limit | 5,000 | File system reads (ZIP entries, output writes) |
| AI @limit | 30 | AI model calls (PDF pages, image descriptions) |
| Net | unlimited | HTTP requests (metadata, AI endpoints) |
| Rand | unlimited | Random number generation |
| Clock | unlimited | Timestamp access |
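To build intuition for what a hard budget means, here is a toy Python model of the limits in the table above. It is only an analogy: AILANG enforces budgets in its own compiler and runtime, not with code like this. Each use of a capability is counted, `None` means unlimited, and the first use that would exceed the limit is refused.

```python
from typing import Dict, Mapping, Optional


class BudgetExceeded(RuntimeError):
    """Raised when a capability is used more times than its budget allows."""


class CapabilityBudget:
    """Toy model of hard per-capability limits; None means unlimited."""

    def __init__(self, limits: Mapping[str, Optional[int]]) -> None:
        self.limits: Dict[str, Optional[int]] = dict(limits)
        self.used: Dict[str, int] = {name: 0 for name in self.limits}

    def spend(self, capability: str, amount: int = 1) -> None:
        """Record `amount` uses; refuse (without recording) if that would pass the limit."""
        limit = self.limits[capability]  # KeyError: capability was never granted
        total = self.used[capability] + amount
        if limit is not None and total > limit:
            raise BudgetExceeded(
                f"{capability} budget of {limit} exhausted ({total} requested)"
            )
        self.used[capability] = total


# Budgets from the table above.
DOCPARSE_BUDGETS: Dict[str, Optional[int]] = {
    "IO": 50_000, "FS": 5_000, "AI": 30, "Net": None, "Rand": None, "Clock": None,
}
```

Under the simplifying assumption of one AI call per PDF page, a 40-page PDF would stop at the 31st call in this model.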
Supported Formats
| Format | Strategy | Features |
|---|---|---|
| DOCX | Deterministic | Track changes, merged cells, text boxes, comments, headers/footers, footnotes, endnotes, images |
| PPTX | Deterministic | Slides, tables, images, metadata |
| XLSX | Deterministic | Shared strings, multiple sheets, merged cells |
| ODT | Deterministic | ODF text, headers/footers, images |
| ODP | Deterministic | ODF presentation slides |
| ODS | Deterministic | ODF spreadsheet sheets |
| HTML | Deterministic | XHTML via std/xml, AI fallback for dirty HTML |
| Markdown | Deterministic | Headings, lists, tables, code blocks, links |
| CSV / TSV | Deterministic | Delimiter detection, quoted fields |
| EPUB | Deterministic | ZIP + XHTML chapters, images, metadata |
| PDF | AI Required | Page-by-page extraction via pluggable AI model |
| PNG / JPG | AI Required | Document image extraction or description |
| Audio .wav .mp3 .aiff .aac .ogg .flac | AI Required | Full transcription, speaker detection, language identification, content summary |
| Video .mp4 .mov .avi .webm .wmv .mpeg .mpg .flv .3gpp | AI Required | Visual content extraction (headings, tables, text, images), spoken-word transcription |
| QMD (Quarto) | Generate only | Quarto Markdown with YAML front matter, CriticMarkup for track changes, pipe/grid tables |
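The Strategy column tells you in advance whether a file needs an AI key. The sketch below transcribes the table into extension sets so a script can check a batch before running it. The extension spellings for HTML (`.html`, `.htm`), Markdown (`.md`, `.markdown`), and JPG (`.jpg`, `.jpeg`) are assumptions, and formats not listed in the table (for example `.eml` from the batch example) simply come back as `"unknown"`.

```python
from pathlib import Path
from typing import Iterable

# Transcribed from the Supported Formats table; HTML has an AI fallback for
# malformed markup but is parsed deterministically in the normal case.
DETERMINISTIC = {".docx", ".pptx", ".xlsx", ".odt", ".odp", ".ods",
                 ".html", ".htm", ".md", ".markdown", ".csv", ".tsv", ".epub"}
AI_REQUIRED = {".pdf", ".png", ".jpg", ".jpeg",
               ".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac",
               ".mp4", ".mov", ".avi", ".webm", ".wmv", ".mpeg", ".mpg", ".flv", ".3gpp"}
GENERATE_ONLY = {".qmd"}


def strategy_for(path: str) -> str:
    """Classify a file by extension (case-insensitive)."""
    ext = Path(path).suffix.lower()
    if ext in DETERMINISTIC:
        return "deterministic"
    if ext in AI_REQUIRED:
        return "ai"
    if ext in GENERATE_ONLY:
        return "generate-only"
    return "unknown"


def needs_ai_key(paths: Iterable[str]) -> bool:
    """True if any input needs an AI model (Gemini, Claude, or local Ollama)."""
    return any(strategy_for(p) == "ai" for p in paths)
```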
Docker
Prefer containers? A CLI-only Dockerfile is included in the repository:
```bash
# Build the image
git clone https://github.com/sunholo-data/ailang-parse.git
cd ailang-parse
docker build -t docparse .
# Parse a file (mount your documents into /data)
docker run -v "$(pwd)":/data docparse /data/report.docx
# Parse with AI (pass your API key)
docker run -e GOOGLE_API_KEY="your-key" \
  -v "$(pwd)":/data docparse --ai gemini-2.5-flash /data/document.pdf
```
The image builds AILANG from source, installs all dependencies, and uses an ENTRYPOINT for direct CLI parsing. No API server included — this is for local batch processing in containers.
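If you launch the container from a script, a sketch like the following (hypothetical helper; it assumes the image is tagged `docparse` as in the build step and covers only the Gemini key) builds the same `docker run` command. Passing `-e GOOGLE_API_KEY` without a value tells Docker to copy the variable from the caller's environment, so the key never appears on the command line or in the process list.

```python
import os
from typing import List, Optional


def docker_parse_command(file_name: str, host_dir: str = ".",
                         model: Optional[str] = None,
                         image: str = "docparse") -> List[str]:
    """Build a `docker run` argv that mounts host_dir at /data and parses one file."""
    cmd = ["docker", "run", "-v", f"{os.path.abspath(host_dir)}:/data"]
    if model is not None:
        # Name only: Docker forwards GOOGLE_API_KEY from the caller's environment.
        cmd += ["-e", "GOOGLE_API_KEY"]
    cmd.append(image)
    if model is not None:
        cmd += ["--ai", model]
    cmd.append(f"/data/{file_name}")
    return cmd
```

Run the result with `subprocess.run(cmd, check=True)` after exporting GOOGLE_API_KEY in the calling environment.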
Need a managed deployment? Use the hosted API (1,000 free requests/month), or contact Sunholo for a custom managed deployment, either standalone or as part of Sunholo Multivac, our AI platform with dedicated support.
Frequently Asked Questions
How do I install AILANG Parse?
Run curl -fsSL https://ailang.sunholo.com/install.sh | bash to install AILANG, then ailang install sunholo/ailang_parse to download the parsing package. For the convenience CLI wrapper and test files, clone the repo: git clone https://github.com/sunholo-data/ailang-parse.git.
What are the system requirements?
AILANG is a single Go binary. It runs on macOS, Linux (amd64/arm64), and Windows. For Office-only parsing, it needs approximately 50MB RAM and no GPU. AI-powered PDF parsing needs connectivity to Gemini/Claude, or a local Ollama model.
What's the difference between the API and running locally?
The hosted API gives you zero-maintenance scaling with Python, JS, and Go SDKs — 1,000 free requests/month included. Running locally gives you unlimited free parsing with full privacy, but requires installing the AILANG CLI. Both produce identical output.
Does local parsing require an internet connection?
For Office formats (DOCX, PPTX, XLSX, ODF), parsing is fully offline; no internet connection is needed. AI-powered PDF parsing requires connectivity to Gemini or Claude, unless you use a local Ollama model, which also runs offline.
Can I use AILANG for other projects?
Yes. AILANG is a general-purpose programming language for AI-powered applications. Once installed, you can build your own tools with capability budgets, formal contracts, and pluggable AI models. Run ailang prompt for the full language reference, or visit ailang.sunholo.com to explore the ecosystem.