Presentations Are Visual Containers That Defeat Text Extraction
A .pptx file is a zip archive containing PresentationML — XML that encodes slides, shapes, text frames, tables, and speaker notes. Most parsers render slides to images or flatten them to plain text, losing the distinction between title, body, notes, and table content.
AILANG Parse reads PresentationML directly. Each slide becomes a section block with typed content: titles, body text, tables, and speaker notes are all separate blocks. The parser extracts text from shape placeholders, title first and then by position on the slide, so content appears in reading sequence.
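Because a .pptx is an ordinary zip archive, its parts can be inspected with Python's standard library. The sketch below is illustrative only and is not AILANG Parse's implementation. `list_presentation_parts` and `PART_RE` are hypothetical names, and the code assumes the conventional `ppt/slides/` and `ppt/notesSlides/` folder layout that PowerPoint writes.

```python
import io
import re
import zipfile

# Matches ppt/slides/slideN.xml and ppt/notesSlides/notesSlideN.xml (conventional paths)
PART_RE = re.compile(r"^ppt/(slides/slide|notesSlides/notesSlide)(\d+)\.xml$")


def list_presentation_parts(source):
    """Return (slide_parts, notes_parts) from a .pptx given as a path, file object, or bytes.

    Parts are sorted by the number in their file name. That is NOT guaranteed to be
    presentation order, which is defined by ppt/presentation.xml.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    slides, notes = [], []
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            match = PART_RE.match(name)
            if match is None:
                continue  # layouts, masters, media, _rels, etc.
            bucket = slides if match.group(1) == "slides/slide" else notes
            bucket.append((int(match.group(2)), name))
    return [name for _, name in sorted(slides)], [name for _, name in sorted(notes)]
```

Sorting by file-name number is only a convenience for inspection: slides can be reordered without renaming their parts.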
Raw PresentationML vs Structured Output
<!-- slide1.xml -->
<p:sld>
  <p:cSld>
    <p:spTree>
      <p:sp>
        <p:nvSpPr><p:nvPr>
          <p:ph type="title"/>
        </p:nvPr></p:nvSpPr>
        <p:txBody>
          <a:p><a:r>
            <a:t>Q1 Results</a:t>
          </a:r></a:p>
        </p:txBody>
      </p:sp>
      <p:sp>
        <p:nvSpPr><p:nvPr>
          <p:ph idx="1"/>
        </p:nvPr></p:nvSpPr>
        <p:txBody>
          <a:p><a:r>
            <a:t>Revenue up 15% YoY</a:t>
          </a:r></a:p>
          <a:p><a:r>
            <a:t>New markets: APAC, LATAM</a:t>
          </a:r></a:p>
        </p:txBody>
      </p:sp>
    </p:spTree>
  </p:cSld>
</p:sld>
<!-- notesSlide1.xml -->
<p:notes>
  <p:cSld>
    <p:spTree>
      <p:sp><p:txBody>
        <a:p><a:r>
          <a:t>Mention APAC expansion timeline: Q3 launch</a:t>
        </a:r></a:p>
      </p:txBody></p:sp>
    </p:spTree>
  </p:cSld>
</p:notes>
{
  "metadata": {
    "title": "Q1 Results",
    "author": "Marketing Team",
    "slides": 24
  },
  "blocks": [
    {
      "type": "section",
      "kind": "slide",
      "blocks": [
        {
          "type": "heading",
          "level": 1,
          "text": "Q1 Results"
        },
        {
          "type": "text",
          "text": "Revenue up 15% YoY"
        },
        {
          "type": "text",
          "text": "New markets: APAC, LATAM"
        },
        {
          "type": "text",
          "style": "speaker-notes",
          "text": "Mention APAC expansion timeline: Q3 launch"
        }
      ]
    }
  ]
}
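The mapping from raw slide XML to the block structure above can be approximated in a few lines of Python. This is a minimal sketch, not the parser's actual code. `slide_to_section` and `paragraph_texts` are hypothetical; only `title`/`ctrTitle` placeholders become headings; shapes are taken in `spTree` order rather than sorted; and the notes text is assumed to have been extracted separately. Real PPTX files declare the `p:` and `a:` namespaces (the XML above omits them for brevity), so the sketch expects namespaced XML.

```python
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
TITLE_TYPES = {"title", "ctrTitle"}  # subtitle handling omitted in this sketch


def paragraph_texts(sp):
    """Yield the concatenated a:t runs of each non-empty paragraph in a shape."""
    for para in sp.iterfind("p:txBody/a:p", NS):
        text = "".join(t.text or "" for t in para.iterfind(".//a:t", NS))
        if text.strip():
            yield text


def slide_to_section(slide_xml, notes_text=None):
    """Map one slide's XML (and optional notes text) to a section dict."""
    root = ET.fromstring(slide_xml)
    blocks = []
    # Shapes are visited in spTree (z-order) sequence; no positional sorting here
    for sp in root.iterfind("p:cSld/p:spTree/p:sp", NS):
        ph = sp.find("p:nvSpPr/p:nvPr/p:ph", NS)
        texts = list(paragraph_texts(sp))
        if not texts:
            continue
        if ph is not None and ph.get("type") in TITLE_TYPES:
            blocks.append({"type": "heading", "level": 1, "text": " ".join(texts)})
        else:
            # Each body paragraph becomes its own text block
            blocks.extend({"type": "text", "text": t} for t in texts)
    if notes_text:
        blocks.append({"type": "text", "style": "speaker-notes", "text": notes_text})
    return {"type": "section", "kind": "slide", "blocks": blocks}
```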
What Gets Extracted
Slide Text & Titles
Title and subtitle placeholders are extracted as headings. Body content placeholders become text blocks. Text is ordered by placeholder type (title first) then by position on the slide.
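One way to express "placeholder type first, then position" is a composite sort key. This sketch assumes shapes have already been reduced to dicts with hypothetical keys `ph_type`, `x`, and `y` (offsets in EMUs, 914,400 per inch, read from `a:xfrm/a:off`). The exact ranking AILANG Parse applies is not documented here, so the ranks below are an assumption.

```python
# Assumed rank of placeholder types: titles first, then subtitles, then everything else
PLACEHOLDER_RANK = {"title": 0, "ctrTitle": 0, "subTitle": 1}


def reading_order(shapes):
    """Sort shape dicts by placeholder rank, then top-to-bottom, then left-to-right.

    Each shape is a dict with illustrative keys: "ph_type" (placeholder type or None)
    and "x"/"y" (EMU offsets, or None when the position is inherited from the layout).
    """
    def key(shape):
        rank = PLACEHOLDER_RANK.get(shape.get("ph_type"), 2)
        x, y = shape.get("x"), shape.get("y")
        # Shapes without an explicit offset go after positioned shapes of the same rank
        return (rank, y is None, y or 0, x or 0)

    return sorted(shapes, key=key)
```

Comparing raw `y` values orders shapes that sit visually side by side, but a few EMUs apart, as top-to-bottom; a more robust version might bucket `y` into bands before comparing `x`.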
Speaker Notes
Notes from notesSlide XML are extracted as text blocks with style: "speaker-notes", associated with their parent slide. This captures what presenters actually say, not just what the audience sees.
Tables within Slides
Slide-embedded tables are extracted with full row/column structure, merged cells, and cell content. They appear as table blocks within their parent slide section.
Shape & Text Box Content
All text-bearing shapes are extracted: content placeholders, freeform text boxes, callout shapes. Text within grouped shapes is flattened into reading order.
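Group shapes (`p:grpSp`) nest their own `p:sp` and `p:grpSp` children, so flattening them amounts to a depth-first walk in document order. A sketch with the standard library; `shape_texts` and `flatten_slide` are hypothetical names, and pictures, connectors, and table frames are simply skipped.

```python
import xml.etree.ElementTree as ET

P = "{http://schemas.openxmlformats.org/presentationml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def shape_texts(container):
    """Depth-first walk of p:sp and p:grpSp children, yielding each shape's text."""
    for child in container:
        if child.tag == P + "sp":
            paragraphs = ("".join(t.text or "" for t in para.iter(A + "t"))
                          for para in child.iter(A + "p"))
            text = "\n".join(paragraphs).strip()
            if text:
                yield text
        elif child.tag == P + "grpSp":
            # A group nests its own p:sp / p:grpSp children; recurse in document order
            yield from shape_texts(child)
        # Other children (pictures, connectors, graphic frames) are ignored here


def flatten_slide(slide_xml):
    """Return the text of every text-bearing shape, with groups flattened."""
    root = ET.fromstring(slide_xml)
    c_sld = root.find(P + "cSld")
    tree = c_sld.find(P + "spTree") if c_sld is not None else None
    return list(shape_texts(tree)) if tree is not None else []
```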
Slide Ordering
Slides are parsed in presentation order as defined in presentation.xml. Each slide becomes a numbered SectionBlock with kind: "slide".
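Presentation order lives in the `p:sldIdLst` of `ppt/presentation.xml`: each `p:sldId` carries an `r:id` that resolves to a slide part through `ppt/_rels/presentation.xml.rels`. A minimal sketch of that lookup, assuming the presentation part sits at its conventional path; `slide_order` and `resolve` are hypothetical helpers.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

P = "{http://schemas.openxmlformats.org/presentationml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
PKG = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def resolve(base_dir, target):
    """Resolve a relationship Target against the folder of the part that owns it."""
    if target.startswith("/"):
        return target.lstrip("/")  # absolute package path
    return posixpath.normpath(posixpath.join(base_dir, target))


def slide_order(source):
    """Return slide part names in presentation order (p:sldIdLst), not file-name order."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        pres = ET.fromstring(zf.read("ppt/presentation.xml"))
        rels = ET.fromstring(zf.read("ppt/_rels/presentation.xml.rels"))
    # Relationship Id -> part name, e.g. "rId2" -> "ppt/slides/slide1.xml"
    targets = {rel.get("Id"): resolve("ppt", rel.get("Target"))
               for rel in rels.iter(PKG + "Relationship")}
    id_list = pres.find(P + "sldIdLst")
    if id_list is None:
        return []
    return [targets[sld.get(R + "id")] for sld in id_list.iter(P + "sldId")]
```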
Comments & Metadata
Slide-level comments are extracted with author and date. Presentation metadata (title, author, slide count, created date) comes from core properties.
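Core properties are stored as Dublin Core elements in `docProps/core.xml`, and PowerPoint also records a slide count in `docProps/app.xml`. The sketch below reads those conventional paths directly instead of resolving them through the package's `_rels/.rels`. `read_metadata` and its output keys are illustrative, chosen to mirror the `metadata` object shown earlier; comment extraction is not covered.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}
# Output key -> element path inside docProps/core.xml
CORE_FIELDS = (("title", "dc:title"), ("author", "dc:creator"), ("created", "dcterms:created"))


def read_metadata(source):
    """Collect title, author, created date, and slide count from docProps/."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    meta = {}
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for key, path in CORE_FIELDS:
                el = core.find(path, NS)
                if el is not None and el.text:
                    meta[key] = el.text.strip()
        if "docProps/app.xml" in names:
            app = ET.fromstring(zf.read("docProps/app.xml"))
            slides = app.find("ep:Slides", NS)
            if slides is not None and slides.text:
                meta["slides"] = int(slides.text)
    return meta
```

The `Slides` value in `app.xml` is written by the authoring application and may be stale if another tool edited the file; counting `p:sldId` entries in `presentation.xml` is the more direct source.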
Speaker Notes
PPTX stores speaker notes in separate notesSlide XML files, at most one per slide. These contain the presenter's talking points, data sources, and context that doesn't appear on the projected slide. AILANG Parse extracts notes as text blocks with style: "speaker-notes".
This is critical for AI applications: the notes often contain more actionable content than the slide bullet points. A training deck's slide says "Security Best Practices", while the notes contain the actual procedures and exceptions.
[
{"type": "heading", "level": 1, "text": "Security Best Practices"},
{"type": "text", "text": "Follow the principle of least privilege"},
{"type": "text", "text": "Rotate credentials every 90 days"},
{"type": "text", "style": "speaker-notes",
"text": "Exception: service accounts in prod-critical path use 180-day rotation per SEC-2024-003. Mention the incident from January if anyone pushes back on rotation frequency."}
]
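A slide reaches its notes through its own relationships part: `ppt/slides/_rels/slideN.xml.rels` contains a relationship of type `.../relationships/notesSlide`. Inside the notes slide, the presenter's text sits in the placeholder with `type="body"`, alongside placeholders such as the slide image. A minimal sketch of that path; `speaker_notes` and `rels_path` are hypothetical helpers, not part of the AILANG Parse SDK.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
PKG = "{http://schemas.openxmlformats.org/package/2006/relationships}"
NOTES_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/notesSlide"


def rels_path(part):
    """ppt/slides/slide1.xml -> ppt/slides/_rels/slide1.xml.rels"""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def speaker_notes(source, slide_part):
    """Return the notes text linked to slide_part, or None if it has no notes."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        rels_name = rels_path(slide_part)
        if rels_name not in zf.namelist():
            return None
        rels = ET.fromstring(zf.read(rels_name))
        notes_part = None
        for rel in rels.iter(PKG + "Relationship"):
            if rel.get("Type") == NOTES_REL:
                # Targets are relative to the slide's folder, e.g. ../notesSlides/notesSlide1.xml
                notes_part = posixpath.normpath(
                    posixpath.join(posixpath.dirname(slide_part), rel.get("Target")))
                break
        if notes_part is None:
            return None
        notes = ET.fromstring(zf.read(notes_part))
    # Only the placeholder with type="body" carries the presenter's text
    for sp in notes.iterfind("p:cSld/p:spTree/p:sp", NS):
        ph = sp.find("p:nvSpPr/p:nvPr/p:ph", NS)
        if ph is not None and ph.get("type") == "body":
            paras = ["".join(t.text or "" for t in para.iterfind(".//a:t", NS))
                     for para in sp.iterfind("p:txBody/a:p", NS)]
            return "\n".join(paras).strip() or None
    return None
```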
Tables in Slides
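Presentation slides frequently contain data tables such as quarterly results, comparison matrices, and feature lists. These use DrawingML a:tbl markup inside a p:graphicFrame rather than the w:tbl markup of DOCX tables, but the merge model is similar: gridSpan and rowSpan mark the cell where a merge starts, and hMerge and vMerge flag the cells it covers.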
{
"type": "table",
"headers": [
{"text": "Feature"},
{"text": "Free"},
{"text": "Pro"},
{"text": "Business"}
],
"rows": [
[{"text": "API Requests"}, {"text": "1K/mo"}, {"text": "100K/mo"}, {"text": "500K/mo"}],
[{"text": "File Size"}, {"text": "10 MB"}, {"text": "25 MB"}, {"text": "50 MB"}],
[{"text": "Support"}, {"text": "Community"}, {"text": "Email"}, {"text": "Dedicated"}]
]
}
See Tables & Merged Cells for the full specification.
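A sketch of reading merge information from a DrawingML `a:tbl`, assuming the table element has already been located inside its `p:graphicFrame`. `parse_table` is a hypothetical helper, and its `colspan`/`rowspan` keys are illustrative rather than the documented table block schema.

```python
import xml.etree.ElementTree as ET

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
TRUE_VALUES = ("1", "true")  # xsd:boolean spellings


def parse_table(tbl_xml):
    """Return rows of {"text", "colspan", "rowspan"} dicts, dropping covered cells."""
    tbl = ET.fromstring(tbl_xml)
    rows = []
    for tr in tbl.iter(A + "tr"):
        row = []
        for tc in tr.findall(A + "tc"):
            # Cells covered by a merge are flagged with hMerge/vMerge and hold no own content
            if tc.get("hMerge") in TRUE_VALUES or tc.get("vMerge") in TRUE_VALUES:
                continue
            text = "\n".join(
                "".join(t.text or "" for t in para.iter(A + "t"))
                for para in tc.iter(A + "p")
            ).strip()
            row.append({
                "text": text,
                "colspan": int(tc.get("gridSpan", "1")),  # horizontal merge origin
                "rowspan": int(tc.get("rowSpan", "1")),   # vertical merge origin
            })
        rows.append(row)
    return rows
```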
Use Cases
Training Material Indexing
Parse corporate training decks into structured blocks. Speaker notes contain the detailed procedures and context that slides only summarize. Index both slide content and notes for enterprise search and knowledge base systems.
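For indexing, each slide section can be collapsed into one search record that keeps slide text and notes in separate fields, so a search backend can weight or filter them independently. A sketch over the JSON block shape shown on this page; `index_records` and the `file#slide-N` ID scheme are hypothetical, and table blocks are ignored for brevity.

```python
def index_records(doc, source_name):
    """Turn each slide section into {"id", "title", "body", "notes"}."""
    records = []
    slide_no = 0
    for section in doc.get("blocks", []):
        if section.get("type") != "section" or section.get("kind") != "slide":
            continue
        slide_no += 1
        title, body, notes = None, [], []
        for block in section.get("blocks", []):
            if block.get("type") == "heading" and title is None:
                title = block.get("text")  # first heading is treated as the slide title
            elif block.get("style") == "speaker-notes":
                notes.append(block.get("text", ""))  # checked before plain text blocks
            elif block.get("type") == "text":
                body.append(block.get("text", ""))
        records.append({
            "id": f"{source_name}#slide-{slide_no}",  # hypothetical ID scheme
            "title": title,
            "body": "\n".join(body),
            "notes": "\n".join(notes),
        })
    return records
```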
Meeting Deck Summarization
Feed structured slide JSON to an LLM for meeting prep summaries. The model sees title hierarchy, bullet points, data tables, and speaker notes as separate typed blocks, which generally yields better summaries than working from flattened text.
Compliance Presentation Review
Regulatory presentations (board reports, audit findings, risk assessments) need systematic review. Structured parsing enables automated checks: required sections present, data tables match expected formats, comments from reviewers captured.
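Checks like these can run over the parsed JSON without reopening the original file. The sketch assumes the block shape shown on this page; the function names, the case-insensitive title match, and the exact header comparison are all illustrative policy choices.

```python
def slide_sections(doc):
    """Yield (slide_number, section) for each slide section, numbered from 1."""
    slides = (s for s in doc.get("blocks", []) if s.get("kind") == "slide")
    return enumerate(slides, start=1)


def missing_sections(doc, required_titles):
    """Return required titles that no slide heading matches (case-insensitive)."""
    found = {
        block.get("text", "").strip().casefold()
        for _, section in slide_sections(doc)
        for block in section.get("blocks", [])
        if block.get("type") == "heading"
    }
    return [t for t in required_titles if t.strip().casefold() not in found]


def tables_with_unexpected_headers(doc, expected_headers):
    """Return numbers of slides containing a table whose header row differs from expected."""
    expected = list(expected_headers)
    flagged = []
    for number, section in slide_sections(doc):
        for block in section.get("blocks", []):
            if block.get("type") != "table":
                continue
            headers = [cell.get("text", "") for cell in block.get("headers", [])]
            if headers != expected:
                flagged.append(number)
                break  # report each slide once
    return flagged
```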
Sales Enablement Search
Sales teams maintain hundreds of decks. Parse them into structured JSON for semantic search: find slides about specific products, pricing tables, competitive comparisons. Speaker notes surface the positioning and objection handling that isn't on the slide.
Presentation-to-Document
Convert PPTX to Markdown, HTML, or Quarto for documentation. Slide titles become headings, bullet points become lists, tables are preserved, and speaker notes can be included as annotations or footnotes.
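A rough sketch of the Markdown mapping described above, working from the block JSON: headings keep their level, text blocks without a style become list items, tables become pipe tables, and notes become blockquotes. This is not the output of `--convert output.md`; `slides_to_markdown` and `cell_text` are hypothetical, and their formatting choices are assumptions.

```python
def cell_text(cell):
    """Escape pipes and newlines so a cell stays on one Markdown table row."""
    return cell.get("text", "").replace("|", "\\|").replace("\n", " ")


def slides_to_markdown(doc, include_notes=True):
    """Render slide sections as Markdown (an illustrative mapping, not the CLI's)."""
    lines = []

    def blank():
        # Keep exactly one empty line between blocks
        if lines and lines[-1] != "":
            lines.append("")

    for section in doc.get("blocks", []):
        if section.get("kind") != "slide":
            continue
        for block in section.get("blocks", []):
            kind, style = block.get("type"), block.get("style")
            if kind == "heading":
                blank()
                lines.append("#" * block.get("level", 1) + " " + block["text"])
            elif kind == "text" and style is None:
                # Consecutive plain text blocks form one bullet list
                if lines and not lines[-1].startswith("- "):
                    blank()
                lines.append("- " + block["text"])
            elif kind == "table":
                blank()
                headers = [cell_text(c) for c in block.get("headers", [])]
                lines.append("| " + " | ".join(headers) + " |")
                lines.append("|" + "---|" * len(headers))
                for row in block.get("rows", []):
                    lines.append("| " + " | ".join(cell_text(c) for c in row) + " |")
            elif style == "speaker-notes" and include_notes:
                blank()
                lines.append("> **Speaker notes:** " + block["text"].replace("\n", "\n> "))
        blank()
    return "\n".join(lines).rstrip("\n") + "\n"
```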
AI Slide Analysis
Give LLMs the full slide context: title, body, table data, and speaker notes as typed blocks. The model can identify key messages, extract action items from notes, and cross-reference data tables across slides, tasks that are hard to do reliably with flat text extraction.
Try It
CLI
# Parse a PPTX file
ailang run --entry main --caps IO,FS,Env \
docparse/main.ail presentation.pptx
# Convert PPTX to Markdown
./bin/docparse presentation.pptx --convert output.md
# Convert PPTX to HTML
./bin/docparse presentation.pptx --convert output.html
API
curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
-H "Content-Type: application/json" \
-d '{"filepath":"sample_pptx_formatting","outputFormat":"markdown","apiKey":"YOUR_API_KEY"}'
Python SDK
from ailang_parse import DocParse

client = DocParse(api_key="YOUR_API_KEY")
result = client.parse("presentation.pptx", output_format="json")

# Iterate slides
for block in result.blocks:
    # Defensive lookups in case a block type does not carry kind or style
    if getattr(block, "kind", None) == "slide":
        for child in block.blocks:
            if child.type == "heading":
                print(f"Slide: {child.text}")
            elif getattr(child, "style", None) == "speaker-notes":
                print(f"  Notes: {child.text[:80]}...")
Frequently Asked Questions
How do I parse a PPTX file to JSON?
Send the .pptx file to the AILANG Parse API or use the Python/JS/Go SDK. The parser reads PresentationML directly and returns structured JSON with slide content, notes, and tables.
Are speaker notes extracted from PPTX files?
Yes. Notes from notesSlide XML are extracted as text blocks with style: "speaker-notes", associated with their parent slide.
How are tables in slides handled?
Tables are extracted with full row/column structure, merged cells, and cell content. See table extraction docs.
Does PPTX parsing preserve slide order?
Yes. Slides are parsed in presentation order from presentation.xml. Each slide becomes a numbered section block.
What about shapes and text boxes in slides?
All text-bearing shapes (titles, subtitles, content placeholders, freeform text boxes) are extracted and ordered by placeholder type then position.
How fast is PPTX parsing?
Typical presentations parse in under 50 ms, with no rendering engine or LibreOffice dependency. The parser scores 93.9% composite on OfficeDocBench.