AILANG

DocParse

AILANG-powered document parsing via WebAssembly

Optional Gemini API key: stored in your browser only. Free keys are available.

Parse Office documents, PDFs, images, audio, video, and text files — powered by AILANG WebAssembly
DOCX PPTX XLSX PDF PNG JPG MP3 MP4 TXT MD CSV

Supported Formats

DocParse uses 8 AILANG modules running in WebAssembly to parse documents entirely in your browser. Office documents are handled by pure std/xml functions; PDFs, images, audio, and video use AILANG's ! {AI} effect with Gemini.

Office Documents (AILANG std/xml)

DOCX, PPTX, and XLSX files are ZIP archives containing XML. JSZip extracts entries, then AILANG's std/xml module parses document structure into typed blocks — headings, paragraphs, tables, lists, and embedded images. Pure functions, no AI needed.
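Since DOCX, PPTX, and XLSX files are ordinary ZIP archives, a parser can recognize them by the ZIP signature and then locate the document body by its conventional part paths. The TypeScript sketch below is illustrative only: `isZipArchive` and `mainXmlParts` are hypothetical helpers, not DocParse functions, and the entry names are assumed to have been listed already (for example by JSZip).

```typescript
// Illustrative sketch: these helpers are hypothetical, not part of DocParse.

type OfficeExt = "docx" | "pptx" | "xlsx";

// A non-empty ZIP archive (and so a DOCX/PPTX/XLSX file) normally starts with a
// local file header whose signature is the bytes "PK\x03\x04" (0x50 0x4B 0x03 0x04).
function isZipArchive(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 &&
    bytes[1] === 0x4b &&
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}

// Conventional locations of the main XML parts inside each Office format.
const PART_PATTERNS: Record<OfficeExt, RegExp> = {
  docx: /^word\/document\.xml$/,
  pptx: /^ppt\/slides\/slide\d+\.xml$/,
  // XLSX cell text is often stored once in sharedStrings.xml and referenced by index.
  xlsx: /^xl\/(sharedStrings|worksheets\/sheet\d+)\.xml$/,
};

// Select the body parts and sort numbered parts numerically (slide2 before slide10).
// Parts without a trailing number, such as sharedStrings.xml, sort first.
function mainXmlParts(ext: OfficeExt, entryNames: string[]): string[] {
  const partNumber = (name: string): number => {
    const match = /(\d+)\.xml$/.exec(name);
    return match ? Number(match[1]) : 0;
  };
  return entryNames
    .filter((name) => PART_PATTERNS[ext].test(name))
    .sort((a, b) => partNumber(a) - partNumber(b));
}
```

With JSZip, the entry names could come from `Object.keys(zip.files)` after `JSZip.loadAsync(bytes)`, and each selected part could then be read with `zip.file(path).async("string")`.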

.docx .pptx .xlsx

PDFs, Images, Audio & Video (! {AI})

Binary formats use AILANG's AI effect system with Gemini multimodal extraction. Add a Gemini API key to enable them, or use the CLI with --caps AI --ai gemini-3-flash-preview. Embedded images in Office documents also get AI descriptions when a key is set.
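To make the multimodal path concrete, here is a TypeScript sketch that builds, but does not send, a request to Google's public Gemini REST `generateContent` endpoint with the file attached as base64 inline data. This page does not show how DocParse itself calls Gemini, so the request shape is an assumption about the general approach; `buildGeminiRequest`, `toBase64`, and the prompt text are hypothetical.

```typescript
// Sketch only: DocParse's actual Gemini call is not shown on this page.

interface GeminiRequest {
  url: string;
  headers: Record<string, string>;
  body: string; // JSON payload
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with "=" padding, written out so the sketch has no dependencies.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (bytes[i] << 16) | (b1 << 8) | b2; // 24 bits -> four 6-bit digits
    out += B64[(triple >> 18) & 63] + B64[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[triple & 63] : "=";
  }
  return out;
}

// Hypothetical helper: one text part (the instruction) plus one inline_data part (the file).
function buildGeminiRequest(
  apiKey: string,
  model: string,
  mimeType: string,
  data: Uint8Array,
  prompt: string,
): GeminiRequest {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({
      contents: [
        { parts: [{ text: prompt }, { inline_data: { mime_type: mimeType, data: toBase64(data) } }] },
      ],
    }),
  };
}
```

In a browser the request could then be sent with `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`; the model's text comes back under `candidates[0].content.parts`.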

.pdf .png .jpg .webp .mp3 .wav .mp4 .webm

Text Files (pure)

Plain text, Markdown, CSV, HTML, and XML files are read directly as text blocks. No extraction or AI needed.

.txt .md .csv .html .xml

How It Works

Step 1: Upload

Drop a file or click to browse. Everything runs in your browser via WebAssembly.

Step 2: Format Detect (AILANG)

AILANG's pure getFormatInfo function identifies the format and routes the file to the right parser.
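A hypothetical TypeScript stand-in for `getFormatInfo` shows the routing idea: map a file extension to one of the three parser families described in Supported Formats. The real AILANG function's signature and return type are not shown on this page, so `detectFormat`, `Route`, and `FormatInfo` are illustrative names; only the extension lists come from the page.

```typescript
// Hypothetical router; not the actual AILANG getFormatInfo signature.

type Route = "office-xml" | "ai" | "text" | "unsupported";

interface FormatInfo {
  ext: string;
  route: Route;
  needsApiKey: boolean; // only the AI route requires a Gemini key
}

// Extension lists as given in the Supported Formats section.
const GROUPS: [Route, string[]][] = [
  ["office-xml", ["docx", "pptx", "xlsx"]],
  ["ai", ["pdf", "png", "jpg", "webp", "mp3", "wav", "mp4", "webm"]],
  ["text", ["txt", "md", "csv", "html", "xml"]],
];

// A Map avoids accidental matches on Object.prototype keys such as "constructor".
const ROUTES = new Map<string, Route>();
for (const [route, exts] of GROUPS) {
  for (const ext of exts) ROUTES.set(ext, route);
}

function detectFormat(filename: string): FormatInfo {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  const route = ROUTES.get(ext) ?? "unsupported";
  return { ext, route, needsApiKey: route === "ai" };
}
```

Routing by extension is the simplest choice; a more defensive router could also check magic bytes (for example, PDFs start with `%PDF-`).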

Step 3: ZIP + XML Parse (AILANG)

JSZip extracts the archive entries, then AILANG's std/xml parses the XML structure via findAll, getText, and getAttr.
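The names findAll, getText, and getAttr come from AILANG's std/xml; the TypeScript sketch below only mirrors them over a simple, already-parsed tree to show how a DOCX body could become heading and paragraph blocks. `XmlNode`, the helper signatures, and `paragraphsToBlocks` are assumptions, not the std/xml API. The sketch relies on two WordprocessingML conventions: each `<w:p>` is a paragraph whose text sits in `<w:t>` runs, and `<w:pStyle w:val="Heading1"/>` marks a heading (English built-in style IDs are assumed; tables and lists are ignored here).

```typescript
// Illustrative analogs of findAll/getText/getAttr; not AILANG's std/xml API.

interface XmlNode {
  tag: string; // qualified name, e.g. "w:p"
  attrs: Record<string, string>; // e.g. { "w:val": "Heading1" }
  children: XmlNode[];
  text?: string; // character data, e.g. inside <w:t>
}

// Depth-first search returning the node itself (if it matches) and all matching descendants.
function findAll(node: XmlNode, tag: string): XmlNode[] {
  const found: XmlNode[] = node.tag === tag ? [node] : [];
  for (const child of node.children) found.push(...findAll(child, tag));
  return found;
}

// Concatenate all character data beneath a node, in document order.
function getText(node: XmlNode): string {
  return (node.text ?? "") + node.children.map(getText).join("");
}

function getAttr(node: XmlNode, name: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(node.attrs, name) ? node.attrs[name] : undefined;
}

type Block =
  | { type: "heading"; level: number; text: string }
  | { type: "paragraph"; text: string };

function paragraphsToBlocks(body: XmlNode): Block[] {
  const blocks: Block[] = [];
  for (const p of findAll(body, "w:p")) {
    // A paragraph's visible text may be split across several <w:t> runs.
    const text = findAll(p, "w:t").map(getText).join("");
    if (text === "") continue; // skip empty paragraphs
    const styleNode = findAll(p, "w:pStyle")[0];
    const style = styleNode ? (getAttr(styleNode, "w:val") ?? "") : "";
    const heading = /^Heading([1-9])$/.exec(style);
    blocks.push(
      heading
        ? { type: "heading", level: Number(heading[1]), text }
        : { type: "paragraph", text },
    );
  }
  return blocks;
}
```

In a browser, such a tree could be built from `DOMParser` output; a plain structure is used here to keep the sketch self-contained.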

Step 4: AI Describe (! {AI})

Embedded images get AI descriptions via Gemini (optional; requires an API key).
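Embedded images in Office archives conventionally sit under a `media/` folder (`word/media/`, `ppt/media/`, `xl/media/`). The hypothetical TypeScript helper below collects them and flags whether each should be sent for a Gemini description. Treating images as still listed, just not described, when no key is set is an assumption; the page only says this step is optional.

```typescript
// Hypothetical helper; DocParse's own image handling is not shown on this page.

interface ImageTask {
  path: string;
  mimeType: string;
  describe: boolean; // assumption: false means "keep the image, skip the AI description"
}

// Raster formats a multimodal model can typically accept; vector formats
// such as .emf/.wmf are skipped in this sketch.
const IMAGE_MIME = new Map<string, string>([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
]);

function collectEmbeddedImages(entryNames: string[], hasApiKey: boolean): ImageTask[] {
  const tasks: ImageTask[] = [];
  for (const path of entryNames) {
    if (!/^(word|ppt|xl)\/media\//.test(path)) continue; // only media parts
    const ext = path.slice(path.lastIndexOf(".") + 1).toLowerCase();
    const mimeType = IMAGE_MIME.get(ext);
    if (mimeType === undefined) continue; // unsupported image type
    tasks.push({ path, mimeType, describe: hasApiKey });
  }
  return tasks;
}
```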

Step 5: Block Output (AILANG)

AILANG's output_formatter module produces structured JSON blocks, rendered HTML, or markdown.
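The block schema is not documented on this page, so the TypeScript sketch below assumes a small discriminated union of block types and renders it to Markdown; JSON output would simply be `JSON.stringify(blocks)`. The `Block` shape and the `toMarkdown` function are hypothetical, not output_formatter's API.

```typescript
// Assumed block model; output_formatter's real field names are not shown here.
type Block =
  | { type: "heading"; level: number; text: string }
  | { type: "paragraph"; text: string }
  | { type: "list"; items: string[] }
  | { type: "table"; rows: string[][] }; // rows[0] is treated as the header row

function blockToMarkdown(block: Block): string {
  switch (block.type) {
    case "heading": {
      const level = Math.min(Math.max(block.level, 1), 6); // Markdown has six heading levels
      return `${"#".repeat(level)} ${block.text}`;
    }
    case "paragraph":
      return block.text;
    case "list":
      return block.items.map((item) => `- ${item}`).join("\n");
    case "table": {
      if (block.rows.length === 0) return "";
      const row = (cells: string[]): string => `| ${cells.join(" | ")} |`;
      const [header, ...body] = block.rows;
      // Cell text is not escaped (e.g. "|"); a real formatter would need to handle that.
      return [row(header), row(header.map(() => "---")), ...body.map(row)].join("\n");
    }
  }
}

// Blocks are separated by blank lines so each renders as its own Markdown element.
function toMarkdown(blocks: Block[]): string {
  return blocks.map(blockToMarkdown).join("\n\n");
}
```

A discriminated union lets the compiler check that every block type has a rendering case, which is one plausible reason to model typed blocks this way; an HTML renderer would follow the same switch.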

DocParse Powers the Document Extractor

DocParse is a general-purpose document parser built with AILANG. It is also integrated into the AILANG Document Extractor, where Office files are first parsed by DocParse and then processed through AI extraction and AILANG contract validation.