AILANG Document Extractor

AI extraction & Office parsing via AILANG WebAssembly

Pipeline

DocParse (AILANG): Office docs
Compile Schema (pure)
Load Module (WASM)
Extract Fields (! {AI})
Validate (contracts)
Done

AILANG Effects & Stdlib in Action

See how AILANG's effect system, contracts, and stdlib modules work together to create a trustworthy AI extraction pipeline. Every line below runs in this demo via WebAssembly — from parsing Office documents with std/xml to validating AI output with contracts.

DocParse: Office Documents via std/xml

DOCX, PPTX, and XLSX files are ZIP archives containing XML. DocParse uses AILANG's std/xml module (parseXml, findAll, getText, getAttr) to extract document structure directly in WebAssembly, with no server and no heavy dependencies. Eight AI-generated AILANG modules handle the hard parts: merged table cells, text boxes, headers and footers, footnotes, tracked changes, and hyperlinks. All are pure functions and all are contract-verified. All 17 real-world test files parse successfully.
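
The ZIP-plus-XML layout described above can be illustrated outside AILANG. The following is a minimal Python sketch, not DocParse itself: it builds a tiny in-memory archive containing only word/document.xml and reads paragraph text back out. The helper names build_sample_docx and extract_paragraphs are hypothetical, and a real DOCX contains additional parts that this stand-in omits.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a DOCX file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_sample_docx() -> bytes:
    """Build a tiny in-memory archive holding only word/document.xml.

    A real DOCX has more parts (content types, relationships); this
    stand-in exists only so the example runs without a file on disk.
    """
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>from </w:t></w:r><w:r><w:t>DOCX</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def extract_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the text of each w:p paragraph, joining its w:t runs."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        runs = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs
```

Calling extract_paragraphs(build_sample_docx()) walks the same two layers DocParse is described as handling: first the archive entry, then the XML tree inside it.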

Raw DOCX XML (inside .zip)
<w:tbl>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:gridSpan w:val="2"/>
      </w:tcPr>
      <w:p><w:r><w:t>Merged Header</w:t></w:r></w:p>
    </w:tc>
    <w:tc>
      <w:tcPr><w:vMerge/></w:tcPr>
      <w:p/>
    </w:tc>
  </w:tr>
</w:tbl>
DocParse Output (structured JSON)
{
  "type": "table",
  "headers": [
    { "text": "Merged Header",
      "colSpan": 2,
      "merged": false },
    { "text": "",
      "colSpan": 1,
      "merged": true }
  ]
}
import std/xml (parseXml, findAll, findFirst, getText, getAttr)
import std/list (map, flatMap, filter, length as listLength)

-- Algebraic data type: every document becomes typed blocks
export type Block = TextBlock({text: string, style: string, level: int})
                  | TableBlock({rows: [[TableCell]], headers: [TableCell]})
                  | ImageBlock({data: string, description: string, mime: string})
                  | HeadingBlock({text: string, level: int})
                  | SectionBlock({kind: string, blocks: [Block]})

-- Merged cell handling: reads gridSpan + vMerge from XML
pure func parseTableCell(node: XmlNode) -> TableCell {
  let props = findFirst(node, "w:tcPr")
  let span = optMap(\p. getAttr(findFirst(p, "w:gridSpan"), "w:val"), props)
  let merge = optMap(\p. findFirst(p, "w:vMerge"), props)
  { text: getText(node), colSpan: getOrElse(span, 1),
    rowSpan: 1, merged: isSome(merge) }
}

-- Contract: filters never grow the list
pure func filterHeadings(blocks: [Block]) -> [Block]
  ensures { listLength(result) <= listLength(blocks) }
{ filter(isHeading, blocks) }
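
For readers who want to try the merged-cell logic without an AILANG toolchain, here is a rough Python equivalent of parseTableCell, written as a sketch under assumptions: it mirrors the rule used above (any w:vMerge element marks the cell as merged, and w:gridSpan supplies the column span), and the names parse_table_cell, parse_header_row, and SAMPLE_TABLE are made up for illustration.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # ElementTree spells qualified names as {namespace}local

SAMPLE_TABLE = f"""
<w:tbl xmlns:w="{W_NS}">
  <w:tr>
    <w:tc>
      <w:tcPr><w:gridSpan w:val="2"/></w:tcPr>
      <w:p><w:r><w:t>Merged Header</w:t></w:r></w:p>
    </w:tc>
    <w:tc>
      <w:tcPr><w:vMerge/></w:tcPr>
      <w:p/>
    </w:tc>
  </w:tr>
</w:tbl>
"""


def parse_table_cell(cell: ET.Element) -> dict:
    """Read text, column span, and merge flag from one w:tc element."""
    span = None
    merge = None
    props = cell.find(f"{W}tcPr")
    if props is not None:
        grid_span = props.find(f"{W}gridSpan")
        if grid_span is not None:
            span = grid_span.get(f"{W}val")
        # Same simplification as the AILANG version: any vMerge counts.
        merge = props.find(f"{W}vMerge")
    text = "".join(t.text or "" for t in cell.iter(f"{W}t"))
    return {
        "text": text,
        "colSpan": int(span) if span else 1,  # default span is one column
        "merged": merge is not None,
    }


def parse_header_row(table_xml: str) -> list[dict]:
    """Parse the cells of the first w:tr row of a table."""
    table = ET.fromstring(table_xml)
    first_row = table.find(f"{W}tr")
    if first_row is None:
        return []
    return [parse_table_cell(cell) for cell in first_row.findall(f"{W}tc")]
```

Running parse_header_row(SAMPLE_TABLE) on the sample table yields the same two header entries as the structured JSON shown earlier in this section.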

The AI Effect: ! {AI}

AILANG tracks side effects in the type system. The AI effect marks functions that call an external AI oracle. The host grants this capability — in the browser, a JavaScript callback calls Gemini Flash. In production, ailang run --ai gemini-2-5-flash --caps AI handles it natively.

import std/ai (call)

-- Effectful: calls AI oracle for field extraction
-- The ! {AI} annotation declares this function has the AI effect
func extractFields(document: string) -> string ! {AI} {
  let prompt = "Extract these fields as JSON..." ++ document
  call(prompt)   -- std/ai.call invokes host AI handler
}

-- Main pipeline: effectful extraction + pure validation
export func processDocument(doc: string) -> string ! {AI} {
  let raw = extractFields(doc)
  validateOnly(raw)   -- pure validation, no effects
}

-- Pure fallback: validate pre-extracted data
export pure func validateOnly(json: string) -> string = ...
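
The split between an effectful extraction step and a pure validation step can be approximated in Python by passing the AI handler in as an argument. This is only an analogy, since Python has no effect system: the handler parameter plays the role of the host-granted capability, and stub_oracle is a hypothetical stand-in for a real model call.

```python
import json
from typing import Callable

# The host-supplied handler: takes a prompt, returns the model's reply.
AIHandler = Callable[[str], str]


def extract_fields(document: str, ai_call: AIHandler) -> str:
    """Effectful step: the only place the AI handler is invoked."""
    prompt = "Extract these fields as JSON..." + document
    return ai_call(prompt)


def validate_only(raw: str) -> str:
    """Pure step: depends only on its input, never calls the handler."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        return json.dumps({"valid": False, "error": "Invalid JSON: " + exc.msg})
    return json.dumps({"valid": True})


def process_document(doc: str, ai_call: AIHandler) -> str:
    """Pipeline: effectful extraction followed by pure validation."""
    return validate_only(extract_fields(doc, ai_call))


def stub_oracle(prompt: str) -> str:
    """Hypothetical handler used for testing; a real host would call a model."""
    return '{"vendor": "Acme", "total_cents": 1250}'
```

Because validate_only never receives the handler, it can be tested and reused on pre-extracted data, which is the same property the pure annotation expresses in the AILANG version.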

Contracts: requires / ensures

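Preconditions and postconditions are declared in the function signature. AILANG checks them at call time: a requires clause is checked against the arguments, so invalid inputs are caught before any computation happens, and an ensures clause is checked against the result. No defensive checks are scattered through the code.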

-- Contract: non-empty input, non-empty output
pure func validateExtraction(jsonString: string) -> string
  requires { length(trim(jsonString)) > 0 }
  ensures  { result != "" }
{
  match decode(jsonString) {
    Err(e) => encodeError("Invalid JSON: " ++ e),
    Ok(obj) =>
      match parseRecord(obj) {
        None => encodeError("Missing fields"),
        Some(r) =>
          match validateFields(r) {
            Some(err) => encodeError(err),
            None => encodeResult(r)
          }
      }
  }
}
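
To make the requires/ensures idea concrete for readers outside AILANG, the sketch below emulates it in Python with a decorator. It is an approximation under assumptions: the contract decorator and ContractError are invented for this example, and the checks run at runtime only, whereas the text above describes contracts as part of the AILANG function signature.

```python
import functools
import json


class ContractError(Exception):
    """Raised when a precondition or postcondition does not hold."""


def contract(requires=None, ensures=None):
    """Attach an optional precondition and postcondition to a function.

    requires receives the call arguments; ensures receives the result
    followed by the call arguments.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if requires is not None and not requires(*args, **kwargs):
                raise ContractError(f"requires failed in {func.__name__}")
            result = func(*args, **kwargs)
            if ensures is not None and not ensures(result, *args, **kwargs):
                raise ContractError(f"ensures failed in {func.__name__}")
            return result
        return wrapper
    return decorate


@contract(
    requires=lambda json_string: len(json_string.strip()) > 0,
    ensures=lambda result, json_string: result != "",
)
def validate_extraction(json_string: str) -> str:
    """Non-empty input in, non-empty JSON verdict out."""
    try:
        json.loads(json_string)
    except json.JSONDecodeError as exc:
        return json.dumps({"valid": False, "error": "Invalid JSON: " + exc.msg})
    return json.dumps({"valid": True})


@contract(ensures=lambda result, blocks: len(result) <= len(blocks))
def filter_headings(blocks: list) -> list:
    """Filtering never grows the list, as the postcondition states."""
    return [b for b in blocks if b.get("type") == "heading"]
```

A whitespace-only input to validate_extraction raises ContractError before the body runs, which is the behavior the requires clause is meant to capture.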

std/json: Type-Safe JSON

AILANG's JSON library returns Option types for every access, forcing you to handle missing fields. No undefined, no null surprises.

import std/json (Json, decode, encode, getString, getInt, jo, kv, js, jb)

-- Parse a record from JSON: every field access returns Option
pure func parseRecord(j: Json) -> Option[MyRecord] =
  match getString(j, "vendor") {
    None => None,              -- field missing? return None
    Some(vendor) =>
      match getInt(j, "total_cents") {
        None => None,
        Some(total) =>
          Some({ vendor: vendor, total_cents: total })
      }
  }

-- Encode result as JSON
pure func encodeResult(r: MyRecord) -> string =
  encode(jo([kv("valid", jb(true)), kv("vendor", js(r.vendor))]))
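
The Option-returning accessors can be imitated in Python with Optional return types. The sketch below is an illustration only: get_string, get_int, and MyRecord are re-created here as hypothetical Python helpers and are not the std/json API, and the field names are taken from the example above.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class MyRecord:
    vendor: str
    total_cents: int


def get_string(obj: Any, key: str) -> Optional[str]:
    """Return the string at key, or None if absent or not a string."""
    value = obj.get(key) if isinstance(obj, dict) else None
    return value if isinstance(value, str) else None


def get_int(obj: Any, key: str) -> Optional[int]:
    """Return the integer at key, or None if absent or not an integer."""
    value = obj.get(key) if isinstance(obj, dict) else None
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value


def parse_record(obj: Any) -> Optional[MyRecord]:
    """Build a record only when every required field is present and typed."""
    vendor = get_string(obj, "vendor")
    if vendor is None:
        return None
    total = get_int(obj, "total_cents")
    if total is None:
        return None
    return MyRecord(vendor=vendor, total_cents=total)
```

Unlike the AILANG version, Python does not force callers to handle the None case; a type checker such as mypy can flag unchecked Optional values, but that is opt-in.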

std/option & std/result: No Null, No Exceptions

Option[a] is Some(value) or None. Result[a, e] is Ok(value) or Err(error). Pattern matching forces you to handle both cases. No forgotten null checks.

import std/option (Option, Some, None, getOrElse)
import std/result (Result, Ok, Err)

-- Every JSON decode returns Result: you MUST handle both
match decode(jsonString) {
  Err(e) => -- parse failed, e is the error message
    encodeError("Invalid JSON: " ++ e),
  Ok(obj) => -- parse succeeded, obj is the JSON value
    processObj(obj)
}

-- Optional fields with defaults
let discount = getOrElse(getInt(j, "discount"), 0)
-- ^ If "discount" is missing, defaults to 0. No NaN. No undefined.
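
The Result pattern and the getOrElse default can be sketched in Python as follows. This is an analogy rather than the std/result or std/option API: Ok, Err, decode, get_or_else, and describe_discount are hypothetical names defined only for this example.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ok:
    value: Any


@dataclass(frozen=True)
class Err:
    error: str


Result = Union[Ok, Err]


def decode(text: str) -> Result:
    """Parse JSON, returning Err with a message instead of raising."""
    try:
        return Ok(json.loads(text))
    except json.JSONDecodeError as exc:
        return Err(exc.msg)


def get_or_else(option: Optional[T], default: T) -> T:
    """Return the wrapped value, or the default when it is None."""
    return default if option is None else option


def describe_discount(text: str) -> str:
    """Handle both Result cases, then default a missing optional field."""
    result = decode(text)
    if isinstance(result, Err):
        return "Invalid JSON: " + result.error
    obj = result.value
    raw = obj.get("discount") if isinstance(obj, dict) else None
    return f"discount={get_or_else(raw, 0)}"
```

The caller never sees an exception or an unset value: a parse failure becomes an Err, and a missing discount becomes 0.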

Capability-Based Security

AILANG uses deny-by-default capabilities. Code can only perform effects that are explicitly granted by the host. The WASM sandbox has no network or filesystem access — the AI handler is injected by JavaScript.

-- CLI: explicitly grant capabilities
-- $ ailang run --ai gemini-2-5-flash --caps AI,IO module.ail

-- Effect budgets: limit operations for cost control
func pipeline() -> string ! {AI @limit=10, IO @limit=50} {
  -- At most 10 AI calls, 50 IO operations
  -- Budget exceeded? Runtime error with clear message
  ...
}

-- In the browser (WASM):
-- No FS, no Net, no IO by default
-- AI effect only works if host registers a handler:
--   ailangSetAIHandler(jsCallback)
-- This is the capability model in action:
--   the language controls the pipeline,
--   the host provides capabilities
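
Deny-by-default capabilities with per-effect budgets can be modeled in a few lines of Python. The sketch below is a toy model under assumptions, not the AILANG runtime: the Capabilities class, its use method, and CapabilityError are hypothetical, and the error messages are illustrative rather than the ones AILANG prints.

```python
class CapabilityError(RuntimeError):
    """Raised when an effect is not granted or its budget is exhausted."""


class Capabilities:
    """Tracks which effects the host granted and how many uses remain."""

    def __init__(self, budgets: dict[str, int]):
        # Effects missing from this mapping are denied by default.
        self._remaining = dict(budgets)

    def use(self, effect: str) -> None:
        """Consume one unit of an effect's budget, or raise."""
        if effect not in self._remaining:
            raise CapabilityError(f"effect {effect} was not granted")
        if self._remaining[effect] <= 0:
            raise CapabilityError(f"effect {effect} exceeded its budget")
        self._remaining[effect] -= 1

    def remaining(self, effect: str) -> int:
        """Report unused budget; ungranted effects have none."""
        return self._remaining.get(effect, 0)


def run_pipeline(caps: Capabilities, documents: list[str]) -> int:
    """Spend one AI call per document; returns how many were processed."""
    processed = 0
    for _ in documents:
        caps.use("AI")  # raises once the AI budget is spent
        processed += 1
    return processed
```

Constructing Capabilities({"AI": 10, "IO": 50}) corresponds roughly to the budget annotation in the pipeline signature above; any other effect, such as Net, fails on first use.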

Language Reference | Why AILANG | ailang docs std/ai

How It Works

1. Upload document
   Paste text, or upload PDFs, images, or Office documents (DOCX, PPTX, XLSX).

2. DocParse (AILANG)
   Office files are parsed locally via DocParse: AILANG's std/xml extracts text from the ZIP+XML archives.

3. Define schema
   Specify the fields to extract, and AILANG generates typed validation code.

4. AI extracts (! {AI})
   Gemini Flash extracts structured data via AILANG's AI effect system.

5. AILANG validates (contracts)
   Contracts, types, and std/json check the structure and types of the AI output.
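
The schema-then-validate steps above can be sketched in Python. The page does not show the demo's schema format or the AILANG code it generates, so everything here is an assumption made for illustration: the schema is a plain mapping from field name to a type label, and validate_against_schema is a hypothetical helper.

```python
import json

# Hypothetical type labels; the demo's real schema format is not shown.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so it is excluded explicitly.
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "bool": lambda v: isinstance(v, bool),
}


def validate_against_schema(raw: str, schema: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["Invalid JSON"]
    if not isinstance(data, dict):
        return ["Expected a JSON object"]
    errors = []
    for field, type_name in schema.items():
        if field not in data:
            errors.append(f"Missing field: {field}")
        elif not TYPE_CHECKS[type_name](data[field]):
            errors.append(f"Wrong type for {field}: expected {type_name}")
    return errors


INVOICE_SCHEMA = {"vendor": "string", "total_cents": "int"}
```

Checking the AI's reply this way keeps the model's output untrusted until it matches the declared fields, which is the role the validation step plays in the pipeline.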

Try AILANG in Your Project

AILANG is open source and AI-native. Build reliable applications where AI handles implementation and AILANG guarantees correctness.