PPTX Parsing API — Parse PowerPoint to Structured JSON

Q: How do I parse a PPTX file to JSON?

Send the .pptx file to the AILANG Parse API or use the Python/JS/Go SDK. The parser reads PresentationML directly and returns structured JSON with slide content, speaker notes, tables, and metadata.

Q: Are speaker notes extracted from PPTX files?

Yes. Speaker notes from notesSlide XML are extracted as separate text blocks with style: "speaker-notes", associated with their parent slide. The notes text is preserved in full.

Q: How are tables in slides handled?

Tables within slides are extracted with full row/column structure, merged cells (gridSpan/vMerge), and cell content. They appear as table blocks within their parent slide section.

Q: Does PPTX parsing preserve slide order?

Yes. Slides are parsed in presentation order as defined in presentation.xml. Each slide becomes a section block numbered by its position.

Q: How fast is PPTX parsing?

Typical presentations parse in under 50ms. The parser reads PresentationML directly, with no rendering engine or LibreOffice dependency, and scores 93.9% composite on OfficeDocBench.

Presentations Are Visual Containers That Defeat Text Extraction

A .pptx file is a zip archive containing PresentationML — XML that encodes slides, shapes, text frames, tables, and speaker notes. Most parsers render slides to images or flatten them to plain text, losing the distinction between title, body, notes, and table content.

AILANG Parse reads PresentationML directly. Each slide becomes a section block with typed content: titles, body text, tables, and speaker notes are all separate blocks. The parser extracts from shape placeholders in layout order, so content appears in reading sequence.
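To make "typed blocks" concrete, here is a minimal sketch using only Python's standard library. It is not AILANG Parse's implementation. It assumes one slide's XML is already available as a string, treats title, ctrTitle and subTitle placeholders as headings (a hypothetical mapping), and emits every other non-empty paragraph as a text block, headings first. The function name slide_blocks is illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
# Hypothetical mapping from placeholder type to heading level
HEADING_LEVELS = {"title": 1, "ctrTitle": 1, "subTitle": 2}


def slide_blocks(slide_xml: str) -> list:
    """Turn one slide's XML into heading/text blocks, with headings first."""
    root = ET.fromstring(slide_xml)
    headings, body = [], []
    for sp in root.iterfind(".//p:sp", NS):
        ph = sp.find("p:nvSpPr/p:nvPr/p:ph", NS)
        ph_type = ph.get("type") if ph is not None else None
        for para in sp.iterfind("p:txBody/a:p", NS):
            # A paragraph's text is split across runs (a:r/a:t); join them
            text = "".join(t.text or "" for t in para.iterfind(".//a:t", NS))
            if not text:
                continue
            if ph_type in HEADING_LEVELS:
                headings.append({"type": "heading", "level": HEADING_LEVELS[ph_type], "text": text})
            else:
                body.append({"type": "text", "text": text})
    return headings + body
```

Fed the slide1.xml shown below (with its namespaces declared), this yields the same heading and text blocks as the structured output, minus the notes, which live in a separate part.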

Speaker notes often contain more substance than the slide itself — talking points, data sources, follow-up actions. PDF-first parsers drop them entirely because notes aren't part of the rendered slide.

Raw PresentationML vs Structured Output

Raw slide1.xml + notesSlide1.xml

<p:sld>
  <p:cSld>
    <p:spTree>
      <p:sp>
        <p:nvSpPr><p:nvPr>
          <p:ph type="title"/>
        </p:nvPr></p:nvSpPr>
        <p:txBody>
          <a:p><a:r>
            <a:t>Q1 Results</a:t>
          </a:r></a:p>
        </p:txBody>
      </p:sp>
      <p:sp>
        <p:nvSpPr><p:nvPr>
          <p:ph idx="1"/>
        </p:nvPr></p:nvSpPr>
        <p:txBody>
          <a:p><a:r>
            <a:t>Revenue up 15% YoY</a:t>
          </a:r></a:p>
          <a:p><a:r>
            <a:t>New markets: APAC, LATAM</a:t>
          </a:r></a:p>
        </p:txBody>
      </p:sp>
    </p:spTree>
  </p:cSld>
</p:sld>

<!-- notesSlide1.xml -->
<p:notes>
  <p:cSld>
    <p:spTree>
      <p:sp><p:txBody>
        <a:p><a:r>
          <a:t>Mention APAC expansion timeline: Q3 launch</a:t>
        </a:r></a:p>
      </p:txBody></p:sp>
    </p:spTree>
  </p:cSld>
</p:notes>

Structured output

{
  "metadata": {
    "title": "Q1 Results",
    "author": "Marketing Team",
    "slides": 24
  },
  "blocks": [
    {
      "type": "section",
      "kind": "slide",
      "blocks": [
        {
          "type": "heading",
          "level": 1,
          "text": "Q1 Results"
        },
        {
          "type": "text",
          "text": "Revenue up 15% YoY"
        },
        {
          "type": "text",
          "text": "New markets: APAC, LATAM"
        },
        {
          "type": "text",
          "style": "speaker-notes",
          "text": "Mention APAC expansion timeline: Q3 launch"
        }
      ]
    }
  ]
}

What Gets Extracted

Slide Text & Titles

Title and subtitle placeholders are extracted as headings. Body content placeholders become text blocks. Text is ordered by placeholder type (title first) then by position on the slide.
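The "placeholder type, then position" rule can be sketched as a sort key: rank (title, then subtitle, then everything else), then the shape's top offset, then its left offset, read from p:spPr/a:xfrm/a:off in EMUs. This is an illustration under one stated assumption: shapes with no explicit offset (common for placeholders that inherit their position from the slide layout) are treated as sitting at the top-left, whereas a complete parser would resolve the layout. The names shape_sort_key and ordered_shape_texts are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
RANK = {"title": 0, "ctrTitle": 0, "subTitle": 1}  # any other shape ranks 2


def shape_sort_key(sp: ET.Element) -> tuple:
    """(placeholder rank, top offset, left offset), offsets in EMUs."""
    ph = sp.find("p:nvSpPr/p:nvPr/p:ph", NS)
    rank = RANK.get(ph.get("type"), 2) if ph is not None else 2
    off = sp.find("p:spPr/a:xfrm/a:off", NS)
    # Assumption: no a:off on the slide means (0, 0); the layout is not consulted
    y = int(off.get("y", "0")) if off is not None else 0
    x = int(off.get("x", "0")) if off is not None else 0
    return (rank, y, x)


def ordered_shape_texts(slide_xml: str) -> list:
    """Text of each shape, sorted by shape_sort_key (stable for ties)."""
    root = ET.fromstring(slide_xml)
    texts = []
    for sp in sorted(root.iterfind(".//p:sp", NS), key=shape_sort_key):
        paras = (
            "".join(t.text or "" for t in p.iterfind(".//a:t", NS))
            for p in sp.iterfind("p:txBody/a:p", NS)
        )
        text = " ".join(paras).strip()
        if text:
            texts.append(text)
    return texts
```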

Speaker Notes

Notes from notesSlide XML are extracted as text blocks with style: "speaker-notes", associated with their parent slide. This captures what presenters actually say, not just what the audience sees.

Tables within Slides

Slide-embedded tables are extracted with full row/column structure, merged cells, and cell content. They appear as table blocks within their parent slide section.

Shape & Text Box Content

All text-bearing shapes are extracted: content placeholders, freeform text boxes, callout shapes. Text within grouped shapes is flattened into reading order.
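Flattening grouped shapes amounts to a depth-first walk of the slide's shape tree: a p:sp contributes its text, and a p:grpSp is expanded in place so its members keep their document position. The sketch below shows that walk with Python's standard library; it only handles p:sp and p:grpSp (not graphic frames such as tables), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

P = "{http://schemas.openxmlformats.org/presentationml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def shape_text(sp: ET.Element) -> str:
    """Text of one p:sp, one line per non-empty a:p paragraph."""
    lines = ("".join(t.text or "" for t in para.iter(f"{A}t")) for para in sp.iter(f"{A}p"))
    return "\n".join(line for line in lines if line)


def flatten_shapes(container: ET.Element) -> list:
    """Depth-first walk: p:sp yields its text, p:grpSp is expanded in place."""
    texts = []
    for child in container:
        if child.tag == f"{P}sp":
            text = shape_text(child)
            if text:
                texts.append(text)
        elif child.tag == f"{P}grpSp":
            texts.extend(flatten_shapes(child))  # recurse into the group's members
    return texts


def slide_texts(slide_xml: str) -> list:
    root = ET.fromstring(slide_xml)
    tree = root.find(f"{P}cSld/{P}spTree")
    return flatten_shapes(tree) if tree is not None else []
```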

Slide Ordering

Slides are parsed in presentation order as defined in presentation.xml. Each slide becomes a numbered SectionBlock with kind: "slide".
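Presentation order comes from p:sldIdLst in ppt/presentation.xml: each p:sldId carries an r:id that ppt/_rels/presentation.xml.rels resolves to a slide part. Part names such as slide3.xml are not guaranteed to match that order. A minimal standard-library sketch of the lookup (the function name is hypothetical, and Target values are handled both as relative to ppt/ and as absolute package paths):

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

P_NS = "{http://schemas.openxmlformats.org/presentationml/2006/main}"
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"
REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def slide_parts_in_order(pptx: zipfile.ZipFile) -> list:
    """Slide part names in presentation order (p:sldIdLst), not file-name order."""
    rels = ET.fromstring(pptx.read("ppt/_rels/presentation.xml.rels"))
    targets = {rel.get("Id"): rel.get("Target") for rel in rels.iter(REL_TAG)}
    pres = ET.fromstring(pptx.read("ppt/presentation.xml"))
    parts = []
    for sld_id in pres.iter(f"{P_NS}sldId"):
        target = targets[sld_id.get(R_ID)]
        if target.startswith("/"):
            parts.append(target.lstrip("/"))  # absolute package path
        else:
            # Relative targets resolve against ppt/, e.g. "slides/slide3.xml"
            parts.append(posixpath.normpath(posixpath.join("ppt", target)))
    return parts
```

Numbering each section by position is then just enumerate(slide_parts_in_order(z), start=1).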

Comments & Metadata

Slide-level comments are extracted with author and date. Presentation metadata (title, author, created date) comes from the core properties, alongside the slide count.
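Core properties are Dublin Core fields stored in docProps/core.xml. The sketch below reads title, creator and created date from there and counts slides from p:sldIdLst. Two assumptions: it uses the conventional docProps/core.xml path rather than resolving it through the package's _rels/.rels, and counting p:sldId entries is this sketch's choice of slide count. The function name and output keys are illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}


def presentation_metadata(pptx: zipfile.ZipFile) -> dict:
    """Title/author/created from core properties plus a slide count from presentation.xml."""
    core = ET.fromstring(pptx.read("docProps/core.xml"))
    pres = ET.fromstring(pptx.read("ppt/presentation.xml"))

    def field(path: str):
        el = core.find(path, NS)
        return el.text if el is not None else None

    return {
        "title": field("dc:title"),
        "author": field("dc:creator"),
        "created": field("dcterms:created"),
        "slides": len(pres.findall("p:sldIdLst/p:sldId", NS)),
    }
```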

Speaker Notes

PPTX stores speaker notes in separate notesSlide XML files, one per slide. These contain the presenter's talking points, data sources, and context that doesn't appear on the projected slide. AILANG Parse extracts notes as text blocks with style: "speaker-notes".

This is critical for AI applications: the notes often contain more actionable content than the slide bullet points. A training deck's slide says "Security Best Practices" — the notes contain the actual procedures and exceptions.

[
  {"type": "heading", "level": 1, "text": "Security Best Practices"},
  {"type": "text", "text": "Follow the principle of least privilege"},
  {"type": "text", "text": "Rotate credentials every 90 days"},
  {"type": "text", "style": "speaker-notes",
   "text": "Exception: service accounts in prod-critical path use 180-day rotation per SEC-2024-003. Mention the incident from January if anyone pushes back on rotation frequency."}
]
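Finding a slide's notes means following its relationships part (for ppt/slides/slide1.xml, that is ppt/slides/_rels/slide1.xml.rels) to the target of type notesSlide, then reading only the body placeholder, since a notes page also holds a slide thumbnail and slide-number placeholders. A standard-library sketch; the function names are hypothetical and the logic is an illustration, not AILANG Parse's code:

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"
NOTES_REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/notesSlide"


def notes_text(notes_root: ET.Element) -> Optional[str]:
    """Collect paragraphs from the notes page's body placeholder only."""
    paras = []
    for sp in notes_root.iterfind(".//p:sp", NS):
        ph = sp.find("p:nvSpPr/p:nvPr/p:ph", NS)
        # Skip the slide thumbnail, slide number and header/footer placeholders
        if ph is None or ph.get("type") != "body":
            continue
        for para in sp.iterfind("p:txBody/a:p", NS):
            paras.append("".join(t.text or "" for t in para.iterfind(".//a:t", NS)))
    return "\n".join(paras).strip() or None


def speaker_notes(pptx: zipfile.ZipFile, slide_part: str) -> Optional[str]:
    """Follow the slide's notesSlide relationship and return its notes text, if any."""
    folder, name = posixpath.split(slide_part)  # "ppt/slides", "slide1.xml"
    rels_part = posixpath.join(folder, "_rels", name + ".rels")
    if rels_part not in pptx.namelist():
        return None
    for rel in ET.fromstring(pptx.read(rels_part)).iter(REL_TAG):
        if rel.get("Type") == NOTES_REL_TYPE:
            # Targets are relative to the slide's folder, e.g. "../notesSlides/notesSlide1.xml"
            notes_part = posixpath.normpath(posixpath.join(folder, rel.get("Target")))
            return notes_text(ET.fromstring(pptx.read(notes_part)))
    return None
```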

Tables in Slides

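Presentation slides frequently contain data tables: quarterly results, comparison matrices, feature lists. These use DrawingML a:tbl markup inside a graphic frame (not the w:tbl markup DOCX uses), but merges work in a similar way: gridSpan and rowSpan on a cell set how far it extends horizontally and vertically, and the cells it covers are flagged with hMerge or vMerge.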

{
  "type": "table",
  "headers": [
    {"text": "Feature"},
    {"text": "Free"},
    {"text": "Pro"},
    {"text": "Business"}
  ],
  "rows": [
    [{"text": "API Requests"}, {"text": "1K/mo"}, {"text": "100K/mo"}, {"text": "500K/mo"}],
    [{"text": "File Size"}, {"text": "10 MB"}, {"text": "25 MB"}, {"text": "50 MB"}],
    [{"text": "Support"}, {"text": "Community"}, {"text": "Email"}, {"text": "Dedicated"}]
  ]
}

See Tables & Merged Cells for the full specification.
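A minimal sketch of turning an a:tbl into the headers/rows shape above, assuming the first row is the header row. Covered cells (hMerge or vMerge set to "1" or "true") are skipped, and the origin cell records its span. The colspan/rowspan keys are hypothetical additions for this sketch, not part of the documented schema.

```python
import xml.etree.ElementTree as ET

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
TRUE = {"1", "true"}  # xsd:boolean spellings of true


def parse_table(tbl_xml: str) -> dict:
    """Parse a DrawingML a:tbl into headers/rows; merged-away cells are skipped."""
    tbl = ET.fromstring(tbl_xml)
    grid = []
    for tr in tbl.iter(f"{A}tr"):
        row = []
        for tc in tr.findall(f"{A}tc"):
            # hMerge/vMerge mark cells covered by a merge that starts in another cell
            if tc.get("hMerge") in TRUE or tc.get("vMerge") in TRUE:
                continue
            text = " ".join(
                "".join(t.text or "" for t in p.iter(f"{A}t")) for p in tc.iter(f"{A}p")
            ).strip()
            cell = {"text": text}
            colspan = int(tc.get("gridSpan", "1"))
            rowspan = int(tc.get("rowSpan", "1"))
            if colspan > 1:
                cell["colspan"] = colspan
            if rowspan > 1:
                cell["rowspan"] = rowspan
            row.append(cell)
        grid.append(row)
    return {"type": "table", "headers": grid[0] if grid else [], "rows": grid[1:]}
```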

Use Cases

Training Material Indexing

Parse corporate training decks into structured blocks. Speaker notes contain the detailed procedures and context that slides only summarize. Index both slide content and notes for enterprise search and knowledge base systems.
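As an illustration of indexing both layers, the sketch below flattens parsed output (the section/heading/text/speaker-notes shape shown earlier, as a plain dict) into one record per slide, keeping slide text and notes in separate fields so a search index can weight them differently. The function name and record fields are hypothetical; table blocks are skipped for brevity.

```python
def slide_records(doc: dict) -> list:
    """One search record per slide, keeping slide text and speaker notes apart."""
    records = []
    slides = [b for b in doc.get("blocks", []) if b.get("kind") == "slide"]
    for number, slide in enumerate(slides, start=1):
        title, body, notes = None, [], []
        for child in slide.get("blocks", []):
            if child.get("style") == "speaker-notes":
                notes.append(child.get("text", ""))
            elif child.get("type") == "heading" and title is None:
                title = child.get("text")  # first heading is the slide title
            elif child.get("type") in ("heading", "text"):
                body.append(child.get("text", ""))
            # Table blocks are skipped here; index them separately if needed
        records.append({
            "slide": number,
            "title": title,
            "body": "\n".join(body),
            "notes": "\n".join(notes),
        })
    return records
```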

Meeting Deck Summarization

Feed structured slide JSON to an LLM for meeting prep summaries. The model sees title hierarchy, bullet points, data tables, and speaker notes as separate typed blocks — producing better summaries than working from flattened text.

Compliance Presentation Review

Regulatory presentations (board reports, audit findings, risk assessments) need systematic review. Structured parsing enables automated checks: required sections present, data tables match expected formats, comments from reviewers captured.
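Two such checks can be sketched over the parsed JSON as a plain dict: which required section titles no slide uses, and which slides hold a table whose header row differs from an expected list. These are hypothetical helpers under simple assumptions: a slide's title is its first heading, titles match case-insensitively, and one expected header row applies to every table.

```python
def slide_titles(doc: dict) -> list:
    """First heading of every slide section, in slide order ("" if none)."""
    titles = []
    for block in doc.get("blocks", []):
        if block.get("kind") != "slide":
            continue
        heading = next((c for c in block.get("blocks", []) if c.get("type") == "heading"), None)
        titles.append(heading.get("text", "") if heading else "")
    return titles


def missing_sections(doc: dict, required: list) -> list:
    """Required section titles (case-insensitive) that no slide uses as its title."""
    present = {t.strip().casefold() for t in slide_titles(doc)}
    return [name for name in required if name.strip().casefold() not in present]


def tables_with_wrong_headers(doc: dict, expected: list) -> list:
    """1-based slide numbers containing a table whose header texts differ from `expected`."""
    bad = []
    slides = [b for b in doc.get("blocks", []) if b.get("kind") == "slide"]
    for number, slide in enumerate(slides, start=1):
        for child in slide.get("blocks", []):
            if child.get("type") == "table":
                headers = [cell.get("text") for cell in child.get("headers", [])]
                if headers != expected:
                    bad.append(number)
    return bad
```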

Sales Enablement Search

Sales teams maintain hundreds of decks. Parse them into structured JSON for semantic search: find slides about specific products, pricing tables, competitive comparisons. Speaker notes surface the positioning and objection handling that isn't on the slide.

Presentation-to-Document

Convert PPTX to Markdown, HTML, or Quarto for documentation. Slide titles become headings, bullet points become lists, tables are preserved, and speaker notes can be included as annotations or footnotes.
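The mapping can be sketched as a small renderer over the parsed JSON (as a plain dict): headings become # lines at their level, text blocks become bullets, tables become pipe tables, and notes become optional blockquotes. This is an illustrative sketch, not the output of docparse --convert; the function names and the "> Notes:" convention are assumptions.

```python
def _cell(text: str) -> str:
    # Escape pipes so cell text cannot break the Markdown table row
    return text.replace("|", "\\|")


def table_to_markdown(table: dict) -> str:
    header = [_cell(c["text"]) for c in table.get("headers", [])]
    lines = ["| " + " | ".join(header) + " |", "|" + " --- |" * len(header)]
    for row in table.get("rows", []):
        lines.append("| " + " | ".join(_cell(c["text"]) for c in row) + " |")
    return "\n".join(lines)


def to_markdown(doc: dict, include_notes: bool = True) -> str:
    """Render slide sections as Markdown: headings, bullets, tables, notes as quotes."""
    parts = []
    for slide in doc.get("blocks", []):
        if slide.get("kind") != "slide":
            continue
        for block in slide.get("blocks", []):
            if block.get("style") == "speaker-notes":
                if include_notes:
                    parts.append("> Notes: " + block["text"])
            elif block.get("type") == "heading":
                parts.append("#" * block.get("level", 1) + " " + block["text"])
            elif block.get("type") == "table":
                parts.append(table_to_markdown(block))
            elif block.get("type") == "text":
                parts.append("- " + block["text"])
    # Blank line between blocks keeps tables and quotes from merging with bullets
    return "\n\n".join(parts) + "\n"
```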

AI Slide Analysis

Give LLMs the full slide context: title, body, table data, and speaker notes as typed blocks. The model can identify key messages, extract action items from notes, and cross-reference data tables across slides — tasks impossible with flat text extraction.

Try It

CLI

# Parse a PPTX file
ailang run --entry main --caps IO,FS,Env \
  docparse/main.ail presentation.pptx

# Convert PPTX to Markdown
./bin/docparse presentation.pptx --convert output.md

# Convert PPTX to HTML
./bin/docparse presentation.pptx --convert output.html

API

curl -X POST https://docparse.ailang.sunholo.com/api/v1/parse \
  -H "Content-Type: application/json" \
  -d '{"filepath":"sample_pptx_formatting","outputFormat":"markdown","apiKey":"YOUR_API_KEY"}'
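For environments without curl or the SDK, the same request can be built with Python's standard library. The endpoint and JSON fields are copied from the curl call; the function names are hypothetical, and parse_remote assumes the response body is JSON, which this page does not spell out.

```python
import json
import urllib.request

API_URL = "https://docparse.ailang.sunholo.com/api/v1/parse"


def build_parse_request(filepath: str, api_key: str, output_format: str = "markdown") -> urllib.request.Request:
    """Same POST body and Content-Type header as the curl example."""
    body = json.dumps(
        {"filepath": filepath, "outputFormat": output_format, "apiKey": api_key}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_remote(filepath: str, api_key: str, output_format: str = "markdown", timeout: float = 30.0):
    """Send the request; assumes the service answers with a JSON body."""
    req = build_parse_request(filepath, api_key, output_format)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```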

Python SDK

from ailang_parse import DocParse

client = DocParse(api_key="YOUR_API_KEY")
result = client.parse("presentation.pptx", output_format="json")

# Iterate slide sections; getattr() guards blocks that lack kind/style fields
for block in result.blocks:
    if getattr(block, "kind", None) != "slide":
        continue
    for child in block.blocks:
        if child.type == "heading":
            print(f"Slide: {child.text}")
        elif getattr(child, "style", None) == "speaker-notes":
            print(f"  Notes: {child.text[:80]}")  # first 80 characters only


Frequently Asked Questions


What about shapes and text boxes in slides?

All text-bearing shapes (titles, subtitles, content placeholders, freeform text boxes) are extracted and ordered by placeholder type then position.
