ODS Parsing API — Parse OpenDocument Spreadsheets to Structured JSON

Q: How do I parse an ODS file to JSON?

Drop the .ods file into the AILANG Parse Workbench, send it to the API, or use the Python/JS/Go/R SDK. The parser reads OpenDocument Spreadsheet XML directly from the .ods zip — no LibreOffice subprocess and no XLSX conversion.

Q: Are sheets extracted individually?

Yes. Every table:table becomes a SectionBlock with kind: "sheet:<name>" wrapping a TableBlock with rows, headers, and merged cells. Empty trailing rows are filtered.

Q: Are merged cells preserved?

Yes. ODF stores horizontal merges in table:number-columns-spanned on the cell that owns the merge. AILANG Parse resolves these into explicit colSpan metadata, matching the XLSX parser's contract.

Q: Does ODS parsing need LibreOffice installed?

No. AILANG Parse reads ODF content.xml directly from the .ods zip archive. There is no soffice subprocess and no XLSX conversion step.

Q: Does ODS parsing run in the browser?

Yes. ODS parsing runs in WebAssembly via the same AILANG parser used by the CLI. Files never leave the browser tab — try it in the Workbench.

Spreadsheets Without the Subprocess

An .ods file is a zip archive with one content.xml describing every sheet as a table:table element. Cells, rows, headers, and merge metadata all live in the ODF table: namespace. None of it maps to Excel's x: schema.
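The zip-plus-content.xml layout described above can be inspected with the Python standard library alone. The following is a minimal sketch, not the AILANG parser: the helper names list_sheet_names and make_demo_ods are hypothetical, and the demo archive contains only content.xml (a real .ods also carries a mimetype entry, a manifest, and other parts).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI bound to the table: prefix in OpenDocument files.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def list_sheet_names(ods_bytes: bytes) -> list[str]:
    """Return the table:name of every table:table in content.xml, in document order."""
    with zipfile.ZipFile(io.BytesIO(ods_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return [
        table.get(f"{{{TABLE_NS}}}name", "")
        for table in root.iter(f"{{{TABLE_NS}}}table")
    ]


def make_demo_ods() -> bytes:
    """Build a tiny in-memory archive holding only content.xml (hypothetical demo data)."""
    content = (
        "<office:document-content"
        ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        f' xmlns:table="{TABLE_NS}">'
        "<office:body><office:spreadsheet>"
        '<table:table table:name="Q1"/><table:table table:name="Q2"/>'
        "</office:spreadsheet></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    return buffer.getvalue()


print(list_sheet_names(make_demo_ods()))  # ['Q1', 'Q2']
```

No spreadsheet application is involved at any point, which is the property the section title refers to.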

Many parsers handle ODS by piping it through LibreOffice to convert to XLSX or CSV first. That adds seconds of latency, and the CSV route also loses per-sheet structure and merge metadata. AILANG Parse reads ODF directly: each sheet becomes a SectionBlock wrapping a TableBlock with the same shape used by the XLSX parser.

The output uses the same Block ADT as the XLSX parser, so one downstream pipeline handles both formats without branching on file extension.

Raw ODF XML vs Structured Output

Raw content.xml

<office:body>
 <office:spreadsheet>
  <table:table table:name="Q1">
   <table:table-row>
    <table:table-cell
      table:number-columns-spanned="2">
     <text:p>Region</text:p>
    </table:table-cell>
    <table:table-cell>
     <text:p>Total</text:p>
    </table:table-cell>
   </table:table-row>
   <table:table-row>
    <table:table-cell>
     <text:p>EMEA</text:p>
    </table:table-cell>
    <table:table-cell>
     <text:p>UK</text:p>
    </table:table-cell>
    <table:table-cell>
     <text:p>125000</text:p>
    </table:table-cell>
   </table:table-row>
  </table:table>
 </office:spreadsheet>
</office:body>

Structured output

{
  "metadata": {
    "title": "Q1 Targets",
    "author": "Finance Team"
  },
  "blocks": [
    {
      "type": "section",
      "kind": "sheet:Q1",
      "blocks": [
        {
          "type": "table",
          "headers": [
            {"text": "Region", "colSpan": 2},
            {"text": "Total"}
          ],
          "rows": [
            [
              {"text": "EMEA"},
              {"text": "UK"},
              {"text": "125000"}
            ]
          ]
        }
      ]
    }
  ]
}

What Gets Extracted

Per-Sheet SectionBlocks

Every table:table becomes a SectionBlock with kind: "sheet:<name>". Sheet names come from table:name attributes.

Tables with Headers

Each sheet's first non-empty row is identified as the header. Subsequent rows become data rows. Empty trailing rows are filtered (ODS files often pad sheets with thousands of empty rows).
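The header rule and trailing-row filter described above can be sketched in a few lines. This is an illustration under assumptions, not the parser's actual code: rows are modelled as lists of {"text": ...} dicts (the shape shown in the structured output), and the helper names is_empty and split_header_and_rows are hypothetical. Empty rows between data rows are kept here, which is also an assumption.

```python
def is_empty(row: list[dict]) -> bool:
    """A row is empty when it has no cells or every cell's text is blank."""
    return all(not cell.get("text", "").strip() for cell in row)


def split_header_and_rows(rows: list[list[dict]]) -> tuple[list[dict], list[list[dict]]]:
    """Drop trailing empty rows, then treat the first non-empty row as the header."""
    end = len(rows)
    while end > 0 and is_empty(rows[end - 1]):
        end -= 1
    trimmed = rows[:end]
    for index, row in enumerate(trimmed):
        if not is_empty(row):
            return row, trimmed[index + 1:]
    return [], []  # the sheet had no content at all


raw_rows = [
    [],                                      # leading blank row
    [{"text": "Region"}, {"text": "Total"}],
    [{"text": "EMEA"}, {"text": "125000"}],
    [{"text": ""}],                          # trailing padding
    [],
]
headers, data = split_header_and_rows(raw_rows)
print([cell["text"] for cell in headers])  # ['Region', 'Total']
print(len(data))                           # 1
```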

Merged Cells

table:number-columns-spanned on the owning cell is resolved into explicit colSpan metadata. Same shape as the XLSX parser, so downstream code is identical.
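To make the mapping concrete, here is a minimal sketch that turns the header row from the raw XML example into cells with colSpan. It uses only xml.etree.ElementTree; row_to_cells is a hypothetical helper name, and the sketch looks only at table:table-cell children, so any table:covered-table-cell placeholders that real files place after a spanning cell are simply not matched.

```python
import xml.etree.ElementTree as ET

NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
SPAN_ATTR = f"{{{NS['table']}}}number-columns-spanned"

# Header row taken from the raw content.xml example, with namespaces declared inline.
ROW_XML = f"""<table:table-row xmlns:table="{NS['table']}" xmlns:text="{NS['text']}">
  <table:table-cell table:number-columns-spanned="2"><text:p>Region</text:p></table:table-cell>
  <table:table-cell><text:p>Total</text:p></table:table-cell>
</table:table-row>"""


def row_to_cells(row: ET.Element) -> list[dict]:
    """Convert one table:table-row into cell dicts, adding colSpan only when it exceeds 1."""
    cells = []
    for cell in row.findall("table:table-cell", NS):
        paragraphs = cell.findall("text:p", NS)
        entry = {"text": "\n".join("".join(p.itertext()) for p in paragraphs)}
        span = int(cell.get(SPAN_ATTR, "1"))
        if span > 1:
            entry["colSpan"] = span
        cells.append(entry)
    return cells


print(row_to_cells(ET.fromstring(ROW_XML)))
# [{'text': 'Region', 'colSpan': 2}, {'text': 'Total'}]
```

The result matches the headers array in the structured output example.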

Cell Text

Cell content is extracted as text (the displayed value). Numbers, dates, and formulas come through as their rendered representation — same as the XLSX parser.
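The difference between the displayed value and the stored value is easiest to see on a single cell. In ODF a numeric cell carries its typed value in office:value and its rendered form in a text:p child. The sketch below assumes a hypothetical cell formatted with a thousands separator, and displayed_text is an illustrative helper, not part of any SDK.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical cell: stored as the float 125000, rendered as "125,000".
CELL_XML = (
    f'<table:table-cell xmlns:table="{TABLE_NS}" xmlns:office="{OFFICE_NS}"'
    f' xmlns:text="{TEXT_NS}" office:value-type="float" office:value="125000">'
    "<text:p>125,000</text:p></table:table-cell>"
)


def displayed_text(cell: ET.Element) -> str:
    """Join the text of every text:p child: the value as the spreadsheet rendered it."""
    paragraphs = cell.findall(f"{{{TEXT_NS}}}p")
    return "\n".join("".join(p.itertext()) for p in paragraphs)


cell = ET.fromstring(CELL_XML)
print(displayed_text(cell))                # 125,000  (what a text-based extractor sees)
print(cell.get(f"{{{OFFICE_NS}}}value"))   # 125000   (the stored value, not extracted as text)
```

Downstream code that needs numbers should therefore expect to parse locale-formatted strings.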

Multi-Sheet Workbooks

Workbooks with dozens of sheets come through as an ordered list of SectionBlocks, one per sheet. Iterate them in workbook order without losing sheet boundaries.

Metadata

Document metadata is read from meta.xml: dc:title, dc:creator, and dc:date. This is useful for indexing spreadsheets by author or date in bulk pipelines.
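A minimal sketch of reading those three Dublin Core fields, assuming a small hand-written meta.xml. The title and author values come from the structured output example; the date value, the "date" output key, and the read_metadata helper are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written stand-in for the meta.xml entry of an .ods archive.
META_XML = f"""<office:document-meta xmlns:office="{NS['office']}" xmlns:dc="{NS['dc']}">
  <office:meta>
    <dc:title>Q1 Targets</dc:title>
    <dc:creator>Finance Team</dc:creator>
    <dc:date>2024-04-02T09:30:00</dc:date>
  </office:meta>
</office:document-meta>"""

# Output key -> source element. Mapping dc:creator to "author" follows the example output.
FIELDS = {"title": "dc:title", "author": "dc:creator", "date": "dc:date"}


def read_metadata(meta_xml: str) -> dict:
    """Return whichever of the three fields are present and non-empty."""
    meta = ET.fromstring(meta_xml).find("office:meta", NS)
    if meta is None:
        return {}
    result = {}
    for key, tag in FIELDS.items():
        value = meta.findtext(tag, default=None, namespaces=NS)
        if value:
            result[key] = value
    return result


print(read_metadata(META_XML))
```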

Sheets & Merges

Because each sheet becomes its own SectionBlock, downstream code can iterate sheets cleanly without flattening the workbook. Want a single sheet by name? Filter on kind. Want every table in the file? Walk all sections.

from ailang_parse import DocParse

result = DocParse(api_key="...").parse_file("targets.ods")

for sheet in result.blocks:
    # Only section blocks represent sheets; skip anything else at the top level.
    if sheet.type != "section":
        continue
    # kind is "sheet:<name>"; strip only the leading prefix (Python 3.9+).
    sheet_name = sheet.kind.removeprefix("sheet:")
    table = sheet.blocks[0]    # one TableBlock per sheet
    print(sheet_name, "-", len(table.rows), "rows")
    for row in table.rows:
        print("  ", [cell.text for cell in row])
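The two access patterns mentioned above (one sheet by name, or every table in the file) can also be written against the plain JSON output, without the SDK. This sketch assumes the dict shape from the structured output example; find_sheet and iter_tables are hypothetical helper names.

```python
from typing import Iterator, Optional

# Parsed JSON in the shape shown under "Structured output".
document = {
    "metadata": {"title": "Q1 Targets", "author": "Finance Team"},
    "blocks": [
        {
            "type": "section",
            "kind": "sheet:Q1",
            "blocks": [
                {
                    "type": "table",
                    "headers": [{"text": "Region", "colSpan": 2}, {"text": "Total"}],
                    "rows": [[{"text": "EMEA"}, {"text": "UK"}, {"text": "125000"}]],
                }
            ],
        }
    ],
}


def find_sheet(doc: dict, name: str) -> Optional[dict]:
    """Return the section whose kind is exactly "sheet:<name>", or None."""
    wanted = f"sheet:{name}"
    for block in doc.get("blocks", []):
        if block.get("type") == "section" and block.get("kind") == wanted:
            return block
    return None


def iter_tables(blocks: list[dict]) -> Iterator[dict]:
    """Yield every table block, descending into sections."""
    for block in blocks:
        if block.get("type") == "table":
            yield block
        elif block.get("type") == "section":
            yield from iter_tables(block.get("blocks", []))


print(find_sheet(document, "Q1")["kind"])                  # sheet:Q1
print(find_sheet(document, "Q9"))                          # None
print(sum(1 for _ in iter_tables(document["blocks"])))     # 1
```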

Use Cases

Open Data Pipelines

Many government open-data portals publish ODS alongside CSV. ODS preserves multi-sheet structure that CSV flattens. Native parsing means you can ingest the structured version directly.

Mixed-Office Pipelines

Organisations with a Linux/LibreOffice contingent end up with mixed XLSX/ODS archives. One parser, one Block ADT. See XLSX parsing for the Excel counterpart.

Financial Data Migration

Migrate legacy ODS workbooks to a database, data warehouse, or modern spreadsheet format. Sheet structure, column headers, and merged-cell hints survive intact.
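As one possible shape for such a migration, the sketch below loads a table block into an in-memory SQLite database. It is an illustration under several assumptions: the table dict follows the structured output example, a header spanning N columns is expanded into N numbered column names, every data row has exactly one cell per resulting column, and column names are unique. The helpers column_names, quote, and load_table are hypothetical.

```python
import sqlite3

table = {
    "type": "table",
    "headers": [{"text": "Region", "colSpan": 2}, {"text": "Total"}],
    "rows": [[{"text": "EMEA"}, {"text": "UK"}, {"text": "125000"}]],
}


def column_names(headers: list[dict]) -> list[str]:
    """Expand merged headers so each physical column gets its own name."""
    names = []
    for cell in headers:
        base = cell.get("text", "") or "column"
        span = cell.get("colSpan", 1)
        if span == 1:
            names.append(base)
        else:
            names.extend(f"{base}_{i}" for i in range(1, span + 1))
    return names


def quote(identifier: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def load_table(conn: sqlite3.Connection, name: str, block: dict) -> None:
    """Create a TEXT-typed table from the headers and insert every data row."""
    columns = column_names(block["headers"])
    column_sql = ", ".join(f"{quote(c)} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE {quote(name)} ({column_sql})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {quote(name)} VALUES ({placeholders})",
        [[cell.get("text", "") for cell in row] for row in block["rows"]],
    )


conn = sqlite3.connect(":memory:")
load_table(conn, "Q1", table)
print(column_names(table["headers"]))              # ['Region_1', 'Region_2', 'Total']
print(conn.execute('SELECT * FROM "Q1"').fetchall())
```

All values are stored as text because the parser emits rendered cell text; type conversion is left to the migration step.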

Tabular RAG

Each sheet becomes its own retrievable chunk. Embed the sheet as a unit so the model gets a coherent table instead of a windowed slice of cells from the wrong rows.
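A sketch of sheet-level chunking, assuming the section dict shape from the structured output example. The chunk layout (a "Sheet:" title line followed by pipe-separated rows) and the helper names expand and sheet_to_chunk are hypothetical choices; merged cells are padded with empty strings so columns stay aligned.

```python
section = {
    "type": "section",
    "kind": "sheet:Q1",
    "blocks": [
        {
            "type": "table",
            "headers": [{"text": "Region", "colSpan": 2}, {"text": "Total"}],
            "rows": [[{"text": "EMEA"}, {"text": "UK"}, {"text": "125000"}]],
        }
    ],
}


def expand(cells: list[dict]) -> list[str]:
    """Flatten cells to text, padding merged cells so every row has the same width."""
    out = []
    for cell in cells:
        out.append(cell.get("text", ""))
        out.extend([""] * (cell.get("colSpan", 1) - 1))
    return out


def sheet_to_chunk(sheet: dict) -> dict:
    """Render one sheet as a single text chunk keyed by its kind."""
    name = sheet["kind"].removeprefix("sheet:")  # Python 3.9+
    table = sheet["blocks"][0]
    lines = [f"Sheet: {name}", " | ".join(expand(table["headers"]))]
    lines += [" | ".join(expand(row)) for row in table["rows"]]
    return {"id": sheet["kind"], "text": "\n".join(lines)}


chunk = sheet_to_chunk(section)
print(chunk["text"])
```

Each chunk can then be embedded as one unit, with the id kept alongside it so retrieval results point back to a specific sheet.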

Compliance Archives

ODS is the standard for long-term preservation of spreadsheet data in many EU jurisdictions. Index, search, and audit ODS archives without converting them to a proprietary format.

Browser-Based Review

Drop an ODS into the Workbench — the parser runs in WebAssembly, files never leave your browser. Useful when financial data is confidential.

Try It

Workbench (in-browser)

Drop an ODS file into the Workbench — parsing runs in WebAssembly with the same AILANG parser used by the CLI. No upload, no signup.

CLI

# Parse an ODS file
./bin/docparse data.ods

# Convert ODS to CSV
./bin/docparse data.ods --convert data.csv

# Convert ODS to XLSX
./bin/docparse data.ods --convert data.xlsx

API

curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -H "Content-Type: application/json" \
  -d '{"filepath":"data.ods","outputFormat":"json","apiKey":"YOUR_API_KEY"}'
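The same request can be built from Python with the standard library. This sketch mirrors the curl call field for field and only constructs the request; build_parse_request is a hypothetical helper, and the shape of the response beyond "JSON" is not documented here, so decoding it is left as a commented assumption.

```python
import json
import urllib.request

PARSE_URL = "https://docparse.ailang.sunholo.com/api/v1/parse"


def build_parse_request(filepath: str, api_key: str) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    payload = {"filepath": filepath, "outputFormat": "json", "apiKey": api_key}
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_parse_request("data.ods", "YOUR_API_KEY")
print(request.get_method(), request.full_url)

# To actually send it (requires network access and a valid key):
# with urllib.request.urlopen(request) as response:
#     document = json.load(response)  # assumed to be the structured JSON shown earlier
```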


Frequently Asked Questions

How do I parse an ODS file to JSON?

Drop it into the Workbench, send it to the API, or use the Python/JS/Go/R SDK. The parser reads OpenDocument Spreadsheet XML directly — no LibreOffice subprocess.

Are sheets extracted individually?

Yes. Every table:table becomes a SectionBlock with kind: "sheet:<name>" wrapping a TableBlock with rows, headers, and merge metadata.

Are merged cells preserved?

Yes. ODF encodes horizontal merges via table:number-columns-spanned. AILANG Parse resolves these into explicit colSpan metadata, matching the XLSX parser's contract.

Does ODS parsing need LibreOffice installed?

No. content.xml is read directly from the .ods zip. No soffice subprocess, no XLSX conversion.

Does ODS parsing run in the browser?

Yes. The same AILANG parser used by the CLI compiles to WebAssembly. Files never leave the browser tab.