Run Locally

Install AILANG and parse documents from your terminal. Zero dependencies for Office formats, no AI calls, and no per-page billing.

Three Ways to Parse

AILANG Parse offers three ways to parse documents. Choose what fits your needs:

| | Browser (WASM) | API + SDKs | Run Locally |
|---|---|---|---|
| Install required | None | SDK only | AILANG CLI |
| Office parsing | Yes | Yes | Yes |
| AI / PDF parsing | With API key | Yes | Yes |
| Offline capable | Yes (Office) | No | Yes (Office) |
| Throughput | Browser-limited | Unlimited (paid) | Unlimited |
| Cost | Free | Free tier, then paid | Free (bring your own AI keys) |
| Data privacy | Stays in browser | Regional servers (EU); see Data handling | Stays on your machine |

Quick Start

Install everything in two commands:

# Install AILANG
curl -fsSL https://ailang.sunholo.com/install.sh | bash

# Install AILANG Parse from the package registry
ailang install sunholo/ailang_parse

Now parse a document:

# Parse a DOCX file
ailang run --entry main --caps IO,FS,Env \
  ~/.ailang/cache/registry/sunholo/ailang_parse/*/docparse/main.ail your-file.docx

Want the convenience wrapper? Clone the repo for the docparse CLI (./bin/docparse on macOS/Linux, bin\docparse.cmd on Windows), test files, and benchmarks:

git clone https://github.com/sunholo-data/ailang-parse.git
cd ailang-parse
./bin/docparse data/test_files/sample.docx

Convert between formats:

# DOCX to HTML
./bin/docparse input.docx --convert output.html

# CSV to DOCX report
./bin/docparse data.csv --convert report.docx

# Markdown to PowerPoint slides
./bin/docparse notes.md --convert slides.pptx

# Any format to Quarto Markdown (for Quarto rendering)
./bin/docparse report.docx --convert report.qmd

Generate documents with AI:

# Generate a DOCX report from a prompt
ailang run --entry main --caps IO,FS,Env,AI --ai gemini-2.5-flash \
  docparse/main.ail --generate report.docx --prompt "Q1 sales report with revenue table"

Install AILANG

Step 1: Install the AILANG CLI

AILANG is a single binary with no dependencies. Install it for your platform:

# macOS / Linux
curl -fsSL https://ailang.sunholo.com/install.sh | bash

# Windows (PowerShell)
irm https://ailang.sunholo.com/install.ps1 | iex

Or download directly from GitHub releases for any platform.

Verify the installation:

ailang --version
# ailang v0.9.2 (or later)

Step 2: Install AILANG Parse

Install the parsing package from the AILANG registry:

ailang install sunholo/ailang_parse

This downloads all parsing modules and their dependencies. No build step needed.

Optional: Clone the Source

For development, testing, and benchmarks, clone the repository:

# macOS / Linux
git clone https://github.com/sunholo-data/ailang-parse.git
cd ailang-parse
./bin/docparse data/test_files/sample.docx

# Type-check all modules
./bin/docparse --check

# Windows
git clone https://github.com/sunholo-data/ailang-parse.git
cd ailang-parse
bin\docparse.cmd data\test_files\sample.docx

# Type-check all modules
bin\docparse.cmd --check
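
If you call the wrapper from a script that has to run on several platforms, the choice between the two wrapper paths above can be made in code. This is a minimal sketch, not part of the repository: the function name `wrapper_command` and its parameters are hypothetical, and only the two paths themselves come from this page.

```python
import platform
from pathlib import PurePosixPath, PureWindowsPath


def wrapper_command(repo_root=".", system=None):
    """Return the docparse wrapper path for an OS name.

    `system` follows platform.system(): "Windows", "Linux", or "Darwin".
    When omitted, the current platform is used.
    """
    system = system or platform.system()
    if system == "Windows":
        # This page names bin\docparse.cmd as the Windows wrapper.
        return str(PureWindowsPath(repo_root) / "bin" / "docparse.cmd")
    # This page names ./bin/docparse as the macOS/Linux wrapper.
    return str(PurePosixPath(repo_root) / "bin" / "docparse")
```

The returned string can be used as the first element of an argument list passed to `subprocess.run`, followed by the file to parse.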

Batch Mode & Folder Parsing

Parse multiple files in one command. The CLI compiles once and runs all inputs — 5–15x faster than parsing sequentially:

# Parse all emails in a directory
docparse ~/inbox/

# Parse specific files (auto-detects batch mode)
docparse report.docx slides.pptx data.xlsx

# Glob patterns work too
docparse *.eml
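
Because batch mode compiles once, a driver script benefits from collecting its inputs first and issuing a single call rather than looping over files. The sketch below builds that one argument list. It is an illustration only: `build_batch_command` is a hypothetical helper, the extension filter is an assumption limited to the Office formats listed on this page, and `docparse` is assumed to be on your PATH as in the examples above.

```python
from pathlib import PurePath

# Office formats this page describes as deterministic and offline.
OFFICE_EXTENSIONS = {".docx", ".pptx", ".xlsx", ".odt", ".odp", ".ods"}


def build_batch_command(paths, executable="docparse"):
    """Return one argv list covering every Office file in `paths`.

    Duplicates are dropped and the order is sorted so runs are repeatable.
    Returns an empty list when nothing matches, so callers can skip the call.
    """
    selected = sorted(
        {str(p) for p in paths if PurePath(p).suffix.lower() in OFFICE_EXTENSIONS}
    )
    return [executable, *selected] if selected else []
```

A caller might pass `Path("reports").rglob("*")` as `paths` and hand a non-empty result to `subprocess.run(cmd, check=True)`.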

What is AILANG?

AILANG is a general-purpose programming language designed for AI-powered applications. It treats AI calls as a first-class effect — just like file I/O or network access — with hard capability budgets that cap how many AI calls, file reads, or network requests a program can make. These limits are enforced at compile time, not just by convention.

AILANG also supports formal contracts verified by Z3, deterministic execution for non-AI code paths, and a package registry for sharing modules. The result: programs that are auditable, reproducible, and cost-predictable.

AILANG Parse is one application built on AILANG. Once you have the CLI installed, you can build your own AI-powered tools — document processors, data pipelines, code generators, agents — using the same language and toolchain.

Explore AILANG: Run ailang prompt for the full language reference, or visit ailang.sunholo.com for documentation, examples, and the package registry.

AI Models

AILANG Parse uses AI for PDFs, images, audio, and video. Office formats (DOCX, PPTX, XLSX, etc.) are parsed deterministically with zero AI calls.

Video and audio parsing is available via CLI, WASM, and the browser demo using your own AI key. The hosted API focuses on Office formats, PDFs, and images. Run locally for full format support with no per-request charges.

Google (Vertex AI / AI Studio)

# Via Application Default Credentials (Cloud Run, GKE)
GOOGLE_API_KEY="" ailang run --entry main --caps IO,FS,Env,AI \
  --ai gemini-2.5-flash docparse/main.ail document.pdf

# Via API key (local development)
GOOGLE_API_KEY="your-key" ailang run --entry main --caps IO,FS,Env,AI \
  --ai gemini-2.5-flash docparse/main.ail document.pdf

Anthropic (Claude)

ANTHROPIC_API_KEY="sk-ant-..." ailang run --entry main --caps IO,FS,Env,AI \
  --ai claude-haiku-4-5 docparse/main.ail document.pdf

Ollama (Local, Free)

# Start Ollama and pull a model
ollama pull granite3.2-vision

# Run AILANG Parse with local model (no API key needed)
ailang run --entry main --caps IO,FS,Env,AI \
  --ai granite-docling docparse/main.ail document.pdf
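
The provider commands above share one shape: the AI capability is added to `--caps` and the model is named with `--ai`, while Office-only runs omit both. A small helper can assemble that argument list. This is a sketch under assumptions: `ailang_run_command` is a hypothetical name, and the default script path `docparse/main.ail` is the relative path used in this page's examples, so it only resolves from a checkout where that path exists.

```python
def ailang_run_command(document, model=None, script="docparse/main.ail"):
    """Build the argv for one `ailang run` call, mirroring this page's commands.

    model=None reproduces the Office-format command (no AI capability).
    A model name adds the AI capability and the --ai flag.
    """
    caps = ["IO", "FS", "Env"]
    if model is not None:
        caps.append("AI")

    argv = ["ailang", "run", "--entry", "main", "--caps", ",".join(caps)]
    if model is not None:
        argv += ["--ai", model]
    # The script path comes before the document, as in the examples above.
    return argv + [script, document]
```

API keys are not part of the argument list. As in the shell examples, they travel through the environment, for instance via the `env` argument of `subprocess.run`.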

Audio & Video

# Transcribe audio — returns transcription, speaker count, language
./bin/docparse recording.mp3 --ai gemini-2.5-flash

# Extract video content — visual scenes, tables, transcription
./bin/docparse presentation.mp4 --ai gemini-2.5-flash

# Convert audio transcription to DOCX
./bin/docparse interview.wav --ai gemini-2.5-flash --convert transcript.docx

# Convert video content to HTML report
./bin/docparse tutorial.mp4 --ai gemini-2.5-flash --convert report.html

Audio formats: WAV, MP3, AIFF, AAC, OGG, FLAC. Video formats: MP4, MOV, AVI, WebM, WMV, MPEG, MPG, FLV, 3GPP.

Recommended model: gemini-2.5-flash, which offers the best balance of accuracy (92% on the PDF benchmark) and speed. Ollama models score lower (0–3%) because of limitations in their structured JSON output.

SDKs (API, not local)

The Python, JavaScript, and Go SDKs connect to the hosted API — they don't run parsing locally. Use them when you want zero-maintenance scaling without installing AILANG.

# Python
pip install ailang-parse

# JavaScript / TypeScript
npm install @ailang/parse

# Go
go get github.com/sunholo-data/ailang-parse-go

Example (Python):

from ailang_parse import AilangParse

client = AilangParse()
result = client.parse("report.docx")
print(result.markdown)

The hosted API behind the SDKs includes a free tier (1,000 requests/month plus 50 AI parses). See the API documentation for full details, authentication, and pricing.

Local vs API: For local parsing with full control, use the AILANG CLI described above. For programmatic integration without managing infrastructure, use the SDKs.

Configuration

Local parsing is configured via environment variables and AILANG capability budgets.

Docs tip: Add ?api=http://localhost:8080 to any API docs page URL and all curl examples, code snippets, and live playgrounds will automatically point at your local instance.

Environment Variables

| Variable | Default | Description |
|---|---|---|
| GOOGLE_API_KEY | (empty) | Google AI Studio API key for Gemini models (PDF, image, audio, video parsing) |
| ANTHROPIC_API_KEY | (empty) | Anthropic API key for Claude models |
| DOCPARSE_OUTPUT_DIR | docparse/data | Output directory for parsed results |

Office formats (DOCX, PPTX, XLSX, ODT, ODP, ODS) need no API keys — parsing is fully deterministic and offline.
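
A wrapper script may want to read the same variables before launching the CLI, for example to decide whether AI parsing is available at all. The sketch below mirrors the table's defaults. It is not how the CLI itself loads configuration: `LocalConfig` and `load_config` are hypothetical names, and treating an empty `DOCPARSE_OUTPUT_DIR` like an unset one is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalConfig:
    google_api_key: str
    anthropic_api_key: str
    output_dir: str

    @property
    def has_cloud_ai_key(self):
        """True when at least one cloud AI key is set."""
        return bool(self.google_api_key or self.anthropic_api_key)


def load_config(env=None):
    """Read the three documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    return LocalConfig(
        google_api_key=env.get("GOOGLE_API_KEY", ""),
        anthropic_api_key=env.get("ANTHROPIC_API_KEY", ""),
        # Assumption: an empty value falls back to the default, like an unset one.
        output_dir=env.get("DOCPARSE_OUTPUT_DIR") or "docparse/data",
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.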

Capability Budgets

AILANG enforces hard limits on what each program can do. The budgets are fixed at compile time, and a program that exceeds one is halted deterministically by the runtime.

| Capability | Budget | Purpose |
|---|---|---|
| IO @limit | 50,000 | Console output operations |
| FS @limit | 5,000 | File system operations (ZIP entry reads, output writes) |
| AI @limit | 30 | AI model calls (PDF pages, image descriptions) |
| Net | unlimited | HTTP requests (metadata, AI endpoints) |
| Rand | unlimited | Random number generation |
| Clock | unlimited | Timestamp access |
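
To make the halting behaviour concrete, here is a toy Python model of a hard budget using the numbers from the table above. It only illustrates the idea of a counter that refuses to go past its limit. AILANG enforces its budgets itself, and nothing here reflects its actual implementation; `BudgetTracker` and `BudgetExceeded` are hypothetical names.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a capability is used more often than its limit allows."""


# Limits copied from the table above; None stands for "unlimited".
DEFAULT_LIMITS = {
    "IO": 50_000, "FS": 5_000, "AI": 30,
    "Net": None, "Rand": None, "Clock": None,
}


class BudgetTracker:
    """Toy model of hard capability budgets (not AILANG's implementation)."""

    def __init__(self, limits=DEFAULT_LIMITS):
        self.limits = dict(limits)
        self.used = {name: 0 for name in self.limits}

    def spend(self, capability, amount=1):
        """Record a use, refusing any spend that would pass the limit."""
        if capability not in self.limits:
            raise KeyError(f"unknown capability: {capability}")
        limit = self.limits[capability]
        total = self.used[capability] + amount
        if limit is not None and total > limit:
            raise BudgetExceeded(f"{capability} budget of {limit} exceeded")
        self.used[capability] = total
        return total
```

In this model the 31st `spend("AI")` raises `BudgetExceeded`. If each PDF page costs one AI call, which the Purpose column suggests but does not state, a 31-page PDF would stop at the same point.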

Supported Formats

| Format | Strategy | Features |
|---|---|---|
| DOCX | Deterministic | Track changes, merged cells, text boxes, comments, headers/footers, footnotes, endnotes, images |
| PPTX | Deterministic | Slides, tables, images, metadata |
| XLSX | Deterministic | Shared strings, multiple sheets, merged cells |
| ODT | Deterministic | ODF text, headers/footers, images |
| ODP | Deterministic | ODF presentation slides |
| ODS | Deterministic | ODF spreadsheet sheets |
| HTML | Deterministic | XHTML via std/xml, AI fallback for dirty HTML |
| Markdown | Deterministic | Headings, lists, tables, code blocks, links |
| CSV / TSV | Deterministic | Delimiter detection, quoted fields |
| EPUB | Deterministic | ZIP + XHTML chapters, images, metadata |
| PDF | AI Required | Page-by-page extraction via pluggable AI model |
| PNG / JPG | AI Required | Document image extraction or description |
| Audio (.wav .mp3 .aiff .aac .ogg .flac) | AI Required | Full transcription, speaker detection, language identification, content summary |
| Video (.mp4 .mov .avi .webm .wmv .mpeg .mpg .flv .3gpp) | AI Required | Visual content extraction (headings, tables, text, images), spoken-word transcription |
| QMD (Quarto) | Generate only | Quarto Markdown with YAML front matter, CriticMarkup for track changes, pipe/grid tables |
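
The Strategy column is what decides whether a run needs an AI model and key. A script that handles mixed input can look it up by file extension before choosing a command. The sketch below encodes the table as two sets. `parse_strategy` is a hypothetical helper, and the extension spellings for HTML (`.html`) and Markdown (`.md`) are assumptions, since the table names those formats without extensions.

```python
from pathlib import PurePath

# Parsed without AI calls, per the table above.
DETERMINISTIC = {
    ".docx", ".pptx", ".xlsx", ".odt", ".odp", ".ods",
    ".html", ".md", ".csv", ".tsv", ".epub",
}

# Marked "AI Required" in the table above.
AI_REQUIRED = {
    ".pdf", ".png", ".jpg",
    ".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac",
    ".mp4", ".mov", ".avi", ".webm", ".wmv", ".mpeg", ".mpg", ".flv", ".3gpp",
}


def parse_strategy(filename):
    """Return "deterministic", "ai", or "unsupported" for a file name."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in DETERMINISTIC:
        return "deterministic"
    if suffix in AI_REQUIRED:
        return "ai"
    # Includes .qmd, which the table lists as generate-only.
    return "unsupported"
```

Note that HTML is listed as deterministic with an AI fallback for dirty HTML, so a result of "deterministic" does not rule out an AI call for that one format.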

Docker

Prefer containers? A CLI-only Dockerfile is included in the repository:

# Build the image
git clone https://github.com/sunholo-data/ailang-parse.git
cd ailang-parse
docker build -t docparse .

# Parse a file (mount your documents into /data)
docker run -v "$(pwd)":/data docparse /data/report.docx

# Parse with AI (pass your API key)
docker run -e GOOGLE_API_KEY="your-key" \
  -v "$(pwd)":/data docparse --ai gemini-2.5-flash /data/document.pdf

The image builds AILANG from source, installs all dependencies, and uses an ENTRYPOINT for direct CLI parsing. No API server included — this is for local batch processing in containers.

Need a managed API with authentication, rate limiting, and SDKs?
Use the hosted API (1,000 free requests/month), or contact us about a custom managed deployment, either standalone or as part of Sunholo Multivac, our AI platform with dedicated support.


Frequently Asked Questions

How do I install AILANG Parse?

Run curl -fsSL https://ailang.sunholo.com/install.sh | bash to install AILANG, then ailang install sunholo/ailang_parse to download the parsing package. For the convenience CLI wrapper and test files, clone the repo: git clone https://github.com/sunholo-data/ailang-parse.git.

What are the system requirements?

AILANG is a single Go binary. It runs on macOS, Linux (amd64/arm64), and Windows. For Office-only parsing, it needs approximately 50 MB of RAM and no GPU. AI-powered PDF parsing needs connectivity to Gemini or Claude, or a local Ollama model.

What's the difference between the API and running locally?

The hosted API gives you zero-maintenance scaling with Python, JS, and Go SDKs, with 1,000 free requests/month included. Running locally gives you unlimited free parsing with full privacy, but requires installing the AILANG CLI. Both produce identical output for the formats they share.

Does local parsing require an internet connection?

For Office formats (DOCX, PPTX, XLSX, ODF), parsing is fully offline, with no internet needed. AI-powered PDF parsing requires connectivity to Gemini or Claude, unless you use a local Ollama model, which also runs offline.

Can I use AILANG for other projects?

Yes. AILANG is a general-purpose programming language for AI-powered applications. Once installed, you can build your own tools with capability budgets, formal contracts, and pluggable AI models. Run ailang prompt for the full language reference, or visit ailang.sunholo.com to explore the ecosystem.