Upload a document to see it parsed into structured blocks
Supported Formats
DocParse uses 8 AILANG modules running in WebAssembly to parse documents entirely in your browser.
Office documents are handled by pure std/xml functions; PDFs, images, audio, and video use
AILANG's ! {AI} effect with Gemini.
Office Documents (AILANG std/xml)
DOCX, PPTX, and XLSX files are ZIP archives containing XML.
JSZip extracts the archive entries, then AILANG's std/xml module parses
the document structure into typed blocks: headings, paragraphs,
tables, lists, and embedded images. These are pure functions; no AI is needed.
PDFs, Images, Audio & Video (! {AI})
Binary formats use AILANG's AI effect system with Gemini multimodal extraction.
To enable them, add a Gemini API key above, or use the CLI with
--caps AI --ai gemini-3-flash-preview.
Embedded images in Office docs also get AI descriptions when a key is set.
Text Files (pure)
Plain text, markdown, CSV, HTML, and XML files are read directly as text blocks. No extraction or AI needed.
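The Office document path described above (unzip the archive, then parse the XML into typed blocks) can be sketched outside the browser. This is a minimal illustration, not DocParse's implementation: Python's standard zipfile and xml.etree.ElementTree modules stand in for JSZip and AILANG's std/xml. The names parse_docx and make_sample_docx, the shape of the block dictionaries, and the "Heading1"-style check are assumptions made for the example; style IDs in real documents vary by template and locale.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in DOCX files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def make_sample_docx() -> bytes:
    """Build a tiny DOCX-like ZIP in memory so the sketch runs without a file."""
    document_xml = (
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        '<w:r><w:t>Quarterly Report</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Revenue grew </w:t></w:r><w:r><w:t>12%.</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
    return buf.getvalue()


def parse_docx(data: bytes) -> list[dict]:
    """Extract word/document.xml from the archive and map paragraphs to blocks."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    blocks = []
    for para in root.findall(".//w:p", NS):
        # Concatenate the text of every <w:t> element in the paragraph.
        text = "".join(t.text or "" for t in para.findall(".//w:t", NS))
        if not text:
            continue
        # The paragraph style, if present, is the w:val attribute of <w:pStyle>.
        style_el = para.find("./w:pPr/w:pStyle", NS)
        style = style_el.get(f"{{{W}}}val", "") if style_el is not None else ""
        if style.startswith("Heading"):
            # Assumption: heading styles are named Heading1, Heading2, ...
            suffix = style[len("Heading"):]
            level = int(suffix) if suffix.isdigit() else 1
            blocks.append({"type": "heading", "level": level, "text": text})
        else:
            blocks.append({"type": "paragraph", "text": text})
    return blocks


blocks = parse_docx(make_sample_docx())
for block in blocks:
    print(block)
```

Running the sketch prints one heading block and one paragraph block. Real Office files carry far more structure (tables, lists, embedded images), which this example deliberately ignores.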
How It Works
Upload
Drop a file or click to browse. Everything runs in your browser via WebAssembly.
Format Detect (AILANG)
AILANG's getFormatInfo pure function identifies format and routes to the right parser.
ZIP + XML Parse (AILANG)
JSZip extracts the archive entries, then AILANG's std/xml parses the XML structure via findAll, getText, and getAttr.
AI Describe (! {AI})
Embedded images get AI descriptions via Gemini. This step is optional and requires an API key.
Block Output (AILANG)
AILANG's output_formatter module produces structured JSON blocks, rendered HTML, or markdown.
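The first and last steps above, format detection and block output, can be sketched as a pair of small pure functions. This is a hedged illustration in Python rather than AILANG. The routing table, the route names (zip_xml, ai, text), the specific image, audio, and video extensions, and the block dictionary shape are all assumptions; the real getFormatInfo function and output_formatter module may use different criteria and field names.

```python
import json
from pathlib import PurePath

# Hypothetical routing table: extension -> (format name, parser route).
# The three routes mirror the Office, AI, and plain-text paths on this page.
FORMATS = {
    ".docx": ("docx", "zip_xml"),
    ".pptx": ("pptx", "zip_xml"),
    ".xlsx": ("xlsx", "zip_xml"),
    ".pdf": ("pdf", "ai"),
    ".png": ("image", "ai"),   # assumed extension for the image path
    ".mp3": ("audio", "ai"),   # assumed extension for the audio path
    ".mp4": ("video", "ai"),   # assumed extension for the video path
    ".txt": ("text", "text"),
    ".md": ("markdown", "text"),
    ".csv": ("csv", "text"),
    ".html": ("html", "text"),
    ".xml": ("xml", "text"),
}


def get_format_info(filename: str) -> dict:
    """Pure function: the same filename always yields the same routing info."""
    ext = PurePath(filename).suffix.lower()
    name, route = FORMATS.get(ext, ("unknown", "unsupported"))
    return {"format": name, "route": route}


def to_json(blocks: list[dict]) -> str:
    """Serialize blocks as structured JSON."""
    return json.dumps(blocks, indent=2)


def to_markdown(blocks: list[dict]) -> str:
    """Render heading and paragraph blocks as markdown text."""
    lines = []
    for block in blocks:
        if block["type"] == "heading":
            lines.append("#" * block["level"] + " " + block["text"])
        else:
            lines.append(block["text"])
    return "\n\n".join(lines)


sample_blocks = [
    {"type": "heading", "level": 1, "text": "Quarterly Report"},
    {"type": "paragraph", "text": "Revenue grew 12%."},
]
print(get_format_info("Report.DOCX"))
print(to_markdown(sample_blocks))
```

Keeping detection as a lookup on the filename makes it trivially pure and easy to test. A production detector would likely also inspect file contents, since extensions can be missing or wrong.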
DocParse Powers the Document Extractor
DocParse is a general-purpose document parser built with AILANG. It is also integrated into the AILANG Document Extractor, where Office files are first parsed by DocParse, then processed through AI extraction and AILANG contract validation.