Slides as Structured Data
An .odp file is a zip archive with one big content.xml describing every slide as a draw:page element. Inside each slide, text frames, tables, lists, and images sit as draw:frame children with the ODF text:, table:, and draw: namespaces. None of this maps to PowerPoint's p: schema.
Parsers that pipe ODP through LibreOffice to convert to PPTX or PDF lose slide structure, lose master-slide context, and add seconds of latency per file. AILANG Parse reads ODF directly: each draw:page becomes a SectionBlock with a slide name, and the contents come through as typed blocks within it.
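The zip-plus-content.xml layout described above can be inspected with nothing but the Python standard library. The sketch below is not the AILANG Parse implementation; `list_slide_names` is a hypothetical helper that opens the archive, parses content.xml, and returns the draw:name of each draw:page.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace URIs for the prefixes used in content.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
}
DRAW_NAME = "{%s}name" % NS["draw"]


def list_slide_names(odp):
    """Return the draw:name of every draw:page, in document order.

    `odp` is a path or a binary file-like object holding an .odp archive.
    Pages without a draw:name attribute yield None.
    """
    with zipfile.ZipFile(odp) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    pages = root.findall("office:body/office:presentation/draw:page", NS)
    return [page.get(DRAW_NAME) for page in pages]
```

Calling `list_slide_names("deck.odp")` on a real deck should return one entry per slide, with no LibreOffice process involved.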
Raw ODF XML vs Structured Output
<office:body>
  <office:presentation>
    <draw:page draw:name="Slide1">
      <draw:frame>
        <draw:text-box>
          <text:h text:outline-level="1">Q1 Roadmap</text:h>
        </draw:text-box>
      </draw:frame>
      <draw:frame>
        <draw:text-box>
          <text:list>
            <text:list-item>
              <text:p>Ship v0.9</text:p>
            </text:list-item>
            <text:list-item>
              <text:p>OfficeDocBench</text:p>
            </text:list-item>
          </text:list>
        </draw:text-box>
      </draw:frame>
    </draw:page>
  </office:presentation>
</office:body>
{
  "metadata": {
    "title": "Q1 Roadmap",
    "author": "Alice Chen"
  },
  "blocks": [
    {
      "type": "section",
      "kind": "slide:Slide1",
      "blocks": [
        {
          "type": "heading",
          "level": 1,
          "text": "Q1 Roadmap"
        },
        {
          "type": "list",
          "ordered": false,
          "items": [
            "Ship v0.9",
            "OfficeDocBench"
          ]
        }
      ]
    }
  ]
}
What Gets Extracted
Per-Slide SectionBlocks
Every draw:page becomes a SectionBlock with kind: "slide:<name>". Slide names come from draw:name attributes when present, falling back to a generic label otherwise.
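As a rough illustration of the naming rule, the sketch below builds one section dictionary per draw:page. The "Slide N" fallback is a hypothetical label chosen for this example, since the generic label AILANG Parse uses is not specified here, and `slide_sections` is not part of any published API.

```python
import xml.etree.ElementTree as ET

DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"


def slide_sections(presentation: ET.Element) -> list:
    """Build one empty section dict per draw:page under `presentation`."""
    sections = []
    pages = presentation.iter("{%s}page" % DRAW)
    for index, page in enumerate(pages, start=1):
        name = page.get("{%s}name" % DRAW)
        # Assumption: "Slide N" stands in for the unspecified generic label.
        label = name if name else "Slide %d" % index
        sections.append({"type": "section", "kind": "slide:" + label, "blocks": []})
    return sections
```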
Text Frames & Headings
Text inside draw:text-box elements comes through as text and heading blocks. text:h elements preserve their outline level (1–6); text:p elements become normal text blocks.
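A minimal sketch of that mapping, using plain dictionaries in the shape of the JSON output shown earlier. `text_box_blocks` is a hypothetical helper, and the "text" type name for paragraphs is an assumption.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def text_box_blocks(text_box: ET.Element) -> list:
    """Turn the direct text:h and text:p children of a text box into blocks."""
    blocks = []
    for child in text_box:
        content = "".join(child.itertext()).strip()
        if child.tag == "{%s}h" % TEXT:
            # text:outline-level carries the heading depth; default to 1.
            level = int(child.get("{%s}outline-level" % TEXT, "1"))
            blocks.append({"type": "heading", "level": level, "text": content})
        elif child.tag == "{%s}p" % TEXT:
            blocks.append({"type": "text", "text": content})
    return blocks
```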
Tables on Slides
Tables can sit directly inside slides or inside frames. Both paths come through as a TableBlock with rows, headers, and merged cells preserved, the same shape the DOCX and PPTX parsers produce.
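The sketch below shows one way to find tables on both paths and read merged-cell spans from the ODF attributes table:number-columns-spanned and table:number-rows-spanned. The cell dictionaries are an assumed shape, not the actual TableBlock layout, and repeated columns and header rows are ignored to keep the example short.

```python
import xml.etree.ElementTree as ET

TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def slide_tables(page: ET.Element) -> list:
    """Find every table:table under a slide, whether direct or inside a frame."""
    return list(page.iter("{%s}table" % TABLE))


def table_rows(table: ET.Element) -> list:
    """Return rows of cell dicts with text and span information."""
    rows = []
    for row in table.iter("{%s}table-row" % TABLE):
        cells = []
        for cell in row:
            # table:covered-table-cell placeholders are skipped; the span
            # on the owning cell already accounts for them.
            if cell.tag != "{%s}table-cell" % TABLE:
                continue
            cells.append({
                "text": "".join(cell.itertext()).strip(),
                "colspan": int(cell.get("{%s}number-columns-spanned" % TABLE, "1")),
                "rowspan": int(cell.get("{%s}number-rows-spanned" % TABLE, "1")),
            })
        rows.append(cells)
    return rows
```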
Bullet & Numbered Lists
text:list elements become typed list blocks. Each text:list-item becomes an entry in the list; empty items are filtered out.
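The empty-item filter can be sketched as follows. `list_block` is a hypothetical helper; it omits the "ordered" flag because deciding between bullets and numbers requires resolving the list style, which this example does not attempt, and it flattens any nested lists into their parent item.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def list_block(text_list: ET.Element) -> dict:
    """Collect the non-empty text:list-item entries of a text:list."""
    items = []
    for item in text_list.findall("{%s}list-item" % TEXT):
        # Collapse runs of whitespace so layout-only items become empty.
        content = " ".join("".join(item.itertext()).split())
        if content:
            items.append(content)
    return {"type": "list", "items": items}
```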
Images
Embedded images (draw:image) are extracted with their xlink:href path and an inferred MIME type. Image references stay anchored to the slide they appeared on.
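One plausible way to infer the MIME type is from the file extension in xlink:href, as in the sketch below. It relies on the standard-library `mimetypes` module; whether AILANG Parse infers types the same way is an assumption, and the dictionary shape is illustrative only.

```python
import mimetypes
import xml.etree.ElementTree as ET

DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK = "http://www.w3.org/1999/xlink"


def image_refs(page: ET.Element) -> list:
    """List the images referenced on one slide, with a guessed MIME type."""
    refs = []
    for image in page.iter("{%s}image" % DRAW):
        href = image.get("{%s}href" % XLINK)
        if href is None:
            continue
        # guess_type works from the extension alone; it returns None
        # for extensions it does not recognise.
        mime, _encoding = mimetypes.guess_type(href)
        refs.append({"type": "image", "href": href, "mime": mime})
    return refs
```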
Metadata
Document metadata is read from meta.xml: dc:title, dc:creator, and dc:date. Useful for indexing slide decks by author or date in bulk pipelines.
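For bulk indexing, the same three fields can be read straight from the archive. The sketch below is a standard-library approximation; `read_metadata` is hypothetical, the "title" and "author" keys follow the JSON sample above, and the "date" key is an assumption.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_metadata(odp):
    """Read dc:title, dc:creator and dc:date from meta.xml in an .odp archive."""
    with zipfile.ZipFile(odp) as archive:
        root = ET.fromstring(archive.read("meta.xml"))

    def field(path):
        element = root.find("office:meta/" + path, NS)
        return element.text if element is not None else None

    return {
        "title": field("dc:title"),
        "author": field("dc:creator"),
        "date": field("dc:date"),  # None when the deck carries no dc:date
    }
```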
Slide-by-Slide Structure
Because each slide becomes its own SectionBlock, downstream code can iterate slides cleanly without flattening the deck into one continuous stream. Want just the headings? Filter to type: "section" with kind matching "slide:*", then walk each slide's children for type: "heading". Want just slides with tables? Same shape, different filter.
from ailang_parse import DocParse

result = DocParse(api_key="...").parse_file("deck.odp")

for slide in result.blocks:
    # Skip any top-level block that is not a slide section.
    if slide.type != "section":
        continue
    print("---", slide.kind, "---")
    # Walk the slide's children and print headings and list items.
    for block in slide.blocks:
        if block.type == "heading":
            print("H" + str(block.level), block.text)
        elif block.type == "list":
            for item in block.items:
                print(" -", item)
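The "same shape, different filter" idea can also be applied to the JSON output once it is loaded into plain dictionaries. This is a sketch: `slides_with` is a hypothetical helper, and "table" as the type name of a table block is an assumption, since the sample output above contains no table.

```python
def slides_with(blocks, block_type):
    """Return the slide sections that contain at least one child of block_type."""
    return [
        section
        for section in blocks
        if section.get("type") == "section"
        and section.get("kind", "").startswith("slide:")
        and any(child.get("type") == block_type for child in section.get("blocks", []))
    ]
```

With the parsed JSON in `doc`, `slides_with(doc["blocks"], "table")` would keep only the slides that hold a table, and `slides_with(doc["blocks"], "heading")` only those with a heading.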
Use Cases
Conference Talk Archives
Many academic and open-source conferences publish slide decks in ODP. Index them by speaker, search across hundreds of decks, or feed slide outlines to an LLM without losing slide boundaries.
Mixed-Office Pipelines
Organisations with a Linux/LibreOffice contingent end up with mixed PPTX/ODP archives. One parser, one Block ADT, one downstream pipeline. See PPTX parsing for the PowerPoint counterpart.
Slide-Level RAG
Each slide becomes its own retrievable chunk with explicit boundaries. Embed the slide as a unit instead of a windowed chunk of flattened text — the model gets a coherent slide instead of half of slide 4 plus half of slide 5.
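A minimal sketch of slide-level chunking over the JSON output, assuming the dictionary shape shown earlier. `slide_chunks` is a hypothetical helper and the chunk format (an id plus newline-joined text) is one choice among many; the embedding step itself is left out.

```python
def slide_chunks(blocks):
    """Build one text chunk per slide section, keyed by the slide's kind."""
    chunks = []
    for slide in blocks:
        if slide.get("type") != "section":
            continue
        lines = []
        for block in slide.get("blocks", []):
            if block.get("type") in ("heading", "text"):
                lines.append(block["text"])
            elif block.get("type") == "list":
                lines.extend("- " + item for item in block["items"])
        chunks.append({"id": slide["kind"], "text": "\n".join(lines)})
    return chunks
```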
Outline Extraction
Pull every slide title for an automatic table of contents. Generate PPTX or QMD output from an ODP source for cross-tool collaboration.
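For the table-of-contents case, one simple rule is to treat the first heading on each slide as its title. The sketch below assumes that rule and the JSON shape shown earlier; `slide_outline` is a hypothetical helper, and falling back to the section's kind is this example's choice.

```python
def slide_outline(blocks):
    """Return one title per slide: its first heading, else the section kind."""
    outline = []
    for slide in blocks:
        if slide.get("type") != "section":
            continue
        headings = (
            block["text"]
            for block in slide.get("blocks", [])
            if block.get("type") == "heading"
        )
        # next() with a default avoids StopIteration on slides with no heading.
        outline.append(next(headings, slide.get("kind", "")))
    return outline
```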
Browser-Based Review
Drop an ODP into the Workbench — the parser runs in WebAssembly, the file never leaves your browser. Useful when slide decks are confidential.
Format Migration
Convert ODP archives to PPTX, HTML, or Markdown. Slide structure, text frames, lists, and images survive the conversion intact — no rendering pass required.
Try It
Workbench (in-browser)
Drop an ODP file into the Workbench — parsing runs in WebAssembly with the same AILANG parser used by the CLI. No upload, no signup.
CLI
# Parse an ODP file
./bin/docparse deck.odp
# Convert ODP to Markdown
./bin/docparse deck.odp --convert outline.md
# Convert ODP to PPTX
./bin/docparse deck.odp --convert deck.pptx
API
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -H "Content-Type: application/json" \
  -d '{"filepath":"deck.odp","outputFormat":"json","apiKey":"YOUR_API_KEY"}'
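The same request can be assembled from Python with the standard library. This sketch mirrors the curl call field for field and makes no assumption about the response format; `build_parse_request` is a hypothetical helper, not part of any published client.

```python
import json
import urllib.request

PARSE_URL = "https://docparse.ailang.sunholo.com/api/v1/parse"


def build_parse_request(filepath, api_key, output_format="json"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "filepath": filepath,
        "outputFormat": output_format,
        "apiKey": api_key,
    }
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; the body of the response should be read according to the API reference.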
Frequently Asked Questions
How do I parse an ODP file to JSON?
Use any of the routes under Try It: drop the file into the Workbench, run ./bin/docparse deck.odp from the CLI, or POST to the /api/v1/parse endpoint with "outputFormat" set to "json".
Are slides extracted individually?
Yes. Every draw:page becomes its own SectionBlock with a slide name. Text frames, tables, lists, and images inside that slide come through as typed blocks within the section.
Does ODP parsing need LibreOffice installed?
No. content.xml is read directly from the .odp zip. No soffice subprocess, no headless rendering, no PPTX conversion.
What slide elements are extracted?
Text frames (draw:text-box), headings (text:h), paragraphs (text:p), tables (table:table), lists (text:list), and images (draw:image) — each anchored to the slide it lives on.
Does ODP parsing run in the browser?
Yes. The same AILANG parser used by the CLI compiles to WebAssembly. Files never leave the browser tab.