The Problem
Track changes are the negotiation history of a document. In legal contract review, every insertion and deletion carries meaning: who changed what, when, and why.
Yet every other document parser we tested discards them.
How It Works
A DOCX file is a ZIP archive containing XML files governed by the ECMA-376 (Office Open XML) standard. Track changes are stored as inline revision markup within the document body XML (word/document.xml).
AILANG Parse reads these XML elements directly:
- w:ins: Insertions. The element wraps the inserted content and carries w:author and w:date attributes.
- w:del: Deletions. Wraps the deleted content (which Word preserves in the XML even though it is struck through in the UI).
- w:rPrChange: Run property changes. Records formatting modifications (bold, italic, font size, color) with before/after state.
Each revision element includes the author name and an ISO 8601 timestamp, both extracted verbatim. AILANG Parse does not interpret or resolve the changes — it preserves them as structured data so downstream consumers can decide how to handle them.
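The reading step described above can be sketched with nothing but the Python standard library. This is an illustrative sketch, not AILANG Parse's implementation: the function names `read_revisions` and `make_sample_docx` are hypothetical, the sample archive contains only word/document.xml (a real DOCX has more parts), and the output keys simply mirror the example output shown later on this page.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace defined by ECMA-376.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_revisions(docx_bytes: bytes) -> list[dict]:
    """Collect w:ins and w:del elements from word/document.xml, in document order."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    revisions = []
    for element in root.iter():
        if element.tag == f"{{{W}}}ins":
            kind, text_tag = "insertion", f"{{{W}}}t"
        elif element.tag == f"{{{W}}}del":
            # Deleted runs keep their text in w:delText rather than w:t.
            kind, text_tag = "deletion", f"{{{W}}}delText"
        else:
            continue
        revisions.append({
            "changeType": kind,
            "author": element.get(f"{{{W}}}author"),  # copied verbatim
            "date": element.get(f"{{{W}}}date"),      # copied verbatim
            "content": "".join(t.text or "" for t in element.iter(text_tag)),
        })
    return revisions


def make_sample_docx() -> bytes:
    """Build a tiny in-memory archive holding only word/document.xml (illustrative)."""
    document_xml = (
        f'<w:document xmlns:w="{W}"><w:body><w:p>'
        '<w:r><w:t xml:space="preserve">Notice period: </w:t></w:r>'
        '<w:del w:id="1" w:author="Sarah Chen" w:date="2026-03-15T14:32:00Z">'
        '<w:r><w:delText>30 calendar days</w:delText></w:r></w:del>'
        '<w:ins w:id="2" w:author="Sarah Chen" w:date="2026-03-15T14:32:00Z">'
        '<w:r><w:t>45 business days</w:t></w:r></w:ins>'
        '</w:p></w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


SAMPLE_REVISIONS = read_revisions(make_sample_docx())
```

The sketch covers only w:ins and w:del; the other revision elements would need their own handling.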
The Block ADT
AILANG Parse represents all document content as a flat list of typed blocks. The Block ADT has 9 variants: Text, Heading, Table, Image, Audio, Video, List, Section, and Change.
Track changes map to the Change variant, which carries:
- changeType: "insertion", "deletion", or "formatChange"
- author: the author name from the revision markup
- date: the ISO 8601 timestamp
- content: the affected text
This means track changes are first-class citizens in the output, not annotations or metadata bolted onto text blocks. They flow through the same pipeline as every other block type — into JSON, into format conversion, into Quarto Markdown (which renders them as CriticMarkup).
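To make the "flat list of typed blocks" idea concrete, here is a minimal sketch of how such a model could look in Python. It is an assumption-laden illustration, not the actual AILANG definition: only three of the nine variants are modeled, the class layout is hypothetical, and the field names are borrowed from the example output on this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class Heading:
    level: int
    content: str
    type: str = "Heading"


@dataclass(frozen=True)
class Text:
    content: str
    type: str = "Text"


@dataclass(frozen=True)
class Change:
    changeType: Literal["insertion", "deletion", "formatChange"]
    author: str    # copied verbatim from w:author
    date: str      # ISO 8601 timestamp, kept as the original string
    content: str
    type: str = "Change"


# Only three of the nine variants are modeled; the rest would follow the same pattern.
Block = Union[Heading, Text, Change]


def changes_only(blocks: list[Block]) -> list[Change]:
    """Changes sit in the same flat list as other blocks, so a type check finds them."""
    return [b for b in blocks if isinstance(b, Change)]


def to_json(blocks: list[Block]) -> str:
    """Serialize every block the same way, whatever its variant."""
    return json.dumps([asdict(b) for b in blocks], indent=2)


DOCUMENT = [
    Heading(1, "Service Agreement"),
    Text("This agreement is entered into between Party A and Party B."),
    Change("deletion", "Sarah Chen", "2026-03-15T14:32:00Z", "30 calendar days"),
    Change("insertion", "Sarah Chen", "2026-03-15T14:32:00Z", "45 business days"),
]
```

Because a Change is just another element of the list, `to_json(DOCUMENT)` needs no special case for revisions, which is the point of treating them as blocks rather than annotations.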
What Gets Extracted
| Change Type | OOXML Element | Extracted Data |
|---|---|---|
| Insertions | w:ins | Author, date, inserted text content |
| Deletions | w:del | Author, date, deleted text content (preserved from XML) |
| Formatting | w:rPrChange | Author, date, property change description |
| Move from | w:moveFrom | Author, date, original location content |
| Move to | w:moveTo | Author, date, destination content |
| Section props | w:sectPrChange | Author, date, section property modification |
| Paragraph props | w:pPrChange | Author, date, paragraph formatting change |
All change types preserve the full author string and ISO 8601 date from the XML. Nothing is inferred, approximated, or dropped.
Example Output
Given a DOCX contract with tracked edits from two authors, AILANG Parse produces output like this:
[
  {
    "type": "Heading",
    "level": 1,
    "content": "Service Agreement"
  },
  {
    "type": "Text",
    "content": "This agreement is entered into between Party A and Party B."
  },
  {
    "type": "Change",
    "changeType": "deletion",
    "author": "Sarah Chen",
    "date": "2026-03-15T14:32:00Z",
    "content": "30 calendar days"
  },
  {
    "type": "Change",
    "changeType": "insertion",
    "author": "Sarah Chen",
    "date": "2026-03-15T14:32:00Z",
    "content": "45 business days"
  },
  {
    "type": "Text",
    "content": "written notice to the other party."
  },
  {
    "type": "Change",
    "changeType": "insertion",
    "author": "James Park",
    "date": "2026-03-17T09:15:00Z",
    "content": "Termination for cause requires documented breach with a 15-day cure period."
  },
  {
    "type": "Change",
    "changeType": "formatChange",
    "author": "James Park",
    "date": "2026-03-17T09:16:00Z",
    "content": "Indemnification clause modified (bold applied)"
  }
]
Every change is a discrete block with its own author and timestamp. Downstream consumers can filter by author, reconstruct the document at any point in its revision history, or render the changes visually (as AILANG Parse does when converting to Quarto Markdown with CriticMarkup).
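A downstream consumer might handle these three uses roughly as follows. This is a sketch under stated assumptions: the helper names (`changes_by`, `snapshot`, `to_critic_markup`) are hypothetical, blocks are joined with single spaces for simplicity, and the CriticMarkup rendering uses the standard `{++ ++}` and `{-- --}` syntax rather than reproducing AILANG Parse's own Quarto output.

```python
from datetime import datetime

# A subset of the example output above.
BLOCKS = [
    {"type": "Text",
     "content": "This agreement is entered into between Party A and Party B."},
    {"type": "Change", "changeType": "deletion", "author": "Sarah Chen",
     "date": "2026-03-15T14:32:00Z", "content": "30 calendar days"},
    {"type": "Change", "changeType": "insertion", "author": "Sarah Chen",
     "date": "2026-03-15T14:32:00Z", "content": "45 business days"},
    {"type": "Text", "content": "written notice to the other party."},
    {"type": "Change", "changeType": "insertion", "author": "James Park",
     "date": "2026-03-17T09:15:00Z",
     "content": "Termination for cause requires documented breach with a 15-day cure period."},
]


def parse_date(value: str) -> datetime:
    # A trailing "Z" is only accepted by fromisoformat from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changes_by(blocks: list[dict], author: str) -> list[dict]:
    """Filter Change blocks by their verbatim author string."""
    return [b for b in blocks if b["type"] == "Change" and b["author"] == author]


def snapshot(blocks: list[dict], cutoff: str) -> str:
    """Text as of `cutoff`: changes dated on or before it are applied, later ones are not."""
    limit = parse_date(cutoff)
    parts = []
    for block in blocks:
        if block["type"] == "Text":
            parts.append(block["content"])
        elif block["type"] == "Change":
            applied = parse_date(block["date"]) <= limit
            if block["changeType"] == "insertion" and applied:
                parts.append(block["content"])
            elif block["changeType"] == "deletion" and not applied:
                parts.append(block["content"])
            # formatChange blocks describe formatting, not text, so they add nothing.
    return " ".join(parts)


def to_critic_markup(blocks: list[dict]) -> str:
    """Render insertions as {++...++} and deletions as {--...--}."""
    parts = []
    for block in blocks:
        if block["type"] == "Text":
            parts.append(block["content"])
        elif block["type"] == "Change" and block["changeType"] == "insertion":
            parts.append("{++" + block["content"] + "++}")
        elif block["type"] == "Change" and block["changeType"] == "deletion":
            parts.append("{--" + block["content"] + "--}")
    return " ".join(parts)
```

For example, `snapshot(BLOCKS, "2026-03-16T00:00:00Z")` applies Sarah Chen's edits but leaves out James Park's later insertion.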
Other Parsers
No other document parser we tested extracts track changes from DOCX files. This is not a matter of degree — the feature is entirely absent.
| Parser | Track Changes | Notes |
|---|---|---|
| AILANG Parse | Full extraction — author, date, type, content | Reads w:ins, w:del, w:rPrChange directly from OOXML |
| Unstructured | Not extracted | Converts DOCX to text; revision markup is discarded |
| Docling | Not extracted | Converts DOCX to PDF internally; track changes lost in conversion |
| LlamaParse | Not extracted | Cloud API returns accepted-view text only |
| MarkItDown | Not extracted | Converts DOCX to Markdown; revision markup is stripped |
The reason is architectural. PDF-first parsers lose track changes in the rendering step, and text extraction libraries do not handle revision markup. AILANG Parse reads the OOXML revision elements directly.
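One way to see how revision data can disappear is to compare a reader that only collects w:t text nodes with one that looks at the revision elements. The sketch below is a generic illustration on a hand-written paragraph; it is not a description of how any of the parsers listed above is implemented, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace defined by ECMA-376.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# A hand-written paragraph with one tracked deletion and one tracked insertion.
PARAGRAPH_XML = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t xml:space="preserve">Notice period: </w:t></w:r>'
    '<w:del w:id="1" w:author="Sarah Chen" w:date="2026-03-15T14:32:00Z">'
    '<w:r><w:delText>30 calendar days</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="Sarah Chen" w:date="2026-03-15T14:32:00Z">'
    '<w:r><w:t>45 business days</w:t></w:r></w:ins>'
    '</w:p>'
)


def text_only(xml: str) -> str:
    """Collect w:t nodes only: deleted text (w:delText) and all authorship are lost."""
    root = ET.fromstring(xml)
    return "".join(t.text or "" for t in root.iter(f"{{{W}}}t"))


def revision_count(xml: str) -> int:
    """Count the w:ins and w:del wrappers that a text-only pass never looks at."""
    root = ET.fromstring(xml)
    wanted = (f"{{{W}}}ins", f"{{{W}}}del")
    return sum(1 for element in root.iter() if element.tag in wanted)
```

Here `text_only(PARAGRAPH_XML)` returns the accepted-view text with no trace of the deleted wording or of who made the edit, even though `revision_count(PARAGRAPH_XML)` finds two revision elements in the same XML.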
Try It
Upload a DOCX with track changes and see the Change blocks appear in the output. Compare it to what your current parser returns — the revision data is either there or it is not.
Frequently Asked Questions
How do I extract track changes from a DOCX file programmatically?
AILANG Parse reads w:ins, w:del, and w:moveTo/w:moveFrom revision markup directly from the DOCX XML, returning each revision with author, timestamp, and change type.
Do other document parsers preserve track changes from DOCX files?
No. All major parsers either convert to PDF (stripping revisions) or use libraries that ignore revision elements. See the comparison page for details.
Can I get track change author and date information from a DOCX?
Yes. AILANG Parse extracts w:author and w:date from every revision element, with ISO 8601 timestamps for filtering, sorting, or audit trails.
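As a rough illustration of the filtering, sorting, and audit-trail uses mentioned above, the sketch below works on Change blocks shaped like the example output. The helper names are hypothetical, and the timestamps are assumed to carry a trailing "Z" or an explicit offset.

```python
from datetime import datetime, timezone

CHANGES = [
    {"changeType": "formatChange", "author": "James Park",
     "date": "2026-03-17T09:16:00Z"},
    {"changeType": "deletion", "author": "Sarah Chen",
     "date": "2026-03-15T14:32:00Z"},
    {"changeType": "insertion", "author": "James Park",
     "date": "2026-03-17T09:15:00Z"},
]


def parse_w_date(value: str) -> datetime:
    # A trailing "Z" is only accepted by fromisoformat from Python 3.11 on,
    # so it is normalized to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def audit_trail(changes: list[dict]) -> list[str]:
    """One line per change, oldest first, keeping the original date string."""
    ordered = sorted(changes, key=lambda c: parse_w_date(c["date"]))
    return [f'{c["date"]} {c["author"]} {c["changeType"]}' for c in ordered]


def changed_since(changes: list[dict], since: datetime) -> list[dict]:
    """Keep only changes made at or after `since` (a timezone-aware datetime)."""
    return [c for c in changes if parse_w_date(c["date"]) >= since]


RECENT = changed_since(CHANGES, datetime(2026, 3, 16, tzinfo=timezone.utc))
```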
What does deterministic document parsing mean?
Same input always produces the same output. AILANG Parse uses rule-based extraction verified by Z3 formal contracts — no ML models, no variability.
Format Guides
Track changes are supported in DOCX and PPTX files. See the full format guides for everything AILANG Parse extracts: