The artificial complexity of OOXML files (the DOCX case)
The complexity of the OOXML format is linked to its design and was deliberately created to make the format more difficult for non-Microsoft software developers to implement. Compatibility issues are caused by a veritable “maze” of tags used even for the simplest content, which binds users to the Microsoft ecosystem in the first example of standard-based lock-in.
The DOCX case
To demonstrate the difference in complexity between the XML schemas of Writer and Word text documents in the ODF and OOXML formats, I used two classic English theatre plays: William Shakespeare’s Hamlet and Oscar Wilde’s The Importance of Being Earnest. I downloaded the text versions of these works from Project Gutenberg — a library of classic texts for which US copyright has expired — and deleted the introductions and conclusions added by Project Gutenberg, without making any other changes.
I then repeated this process for both documents.
I copied all the text and pasted it, without any formatting, into two newly created blank documents. For Writer, I used the template that I usually use for unstructured documents; for Word, I did not use a template. This means that, in Writer, the XML schema contains information about the template (margins, paragraph and font formatting), but this does not increase its complexity.
To perform the analysis, I duplicated and renamed the two files, replacing the original extension with “ZIP”, and then decompressed them to create two folders containing all the files of the respective XML schemas.
The LibreOffice folder contained three subfolders and six files with the same names as those in the ODS file examined last week, as would be expected of a standard aiming to simplify life for developers and users. All the content is actually in the content.xml file, while the other files contain instructions for displaying the text document correctly.
The Microsoft 365 folder contains three subfolders and the [Content_Types].xml file, as with the XLSX file examined last week. One of the subfolders has a different name, but this is related to the application and does not increase complexity. Opening the [Content_Types].xml file provides information about the other files, including those in the subfolders.
In this case, the content is in the document.xml file inside the Word folder, which contains folders and files that differ completely from those in the XLSX file. Again, there is no technical reason for this difference in the XML schemas of the two files other than to make their internal structures different and more complex.
Let’s now analyse William Shakespeare’s Hamlet and then Oscar Wilde’s The Importance of Being Earnest.
Here is the PDF of Hamlet:
hamlet
The difference in complexity between the document.xml and content.xml files is striking when you compare their lengths: the content.xml file has 6,802 lines, while the document.xml file has 60,245 lines, compared to a text document of 5,566 lines.
Let us now compare the two files’ XML schemas from the beginning to the end of the introduction.
CONTENT.XML
<office:body>
<office:text text:use-soft-page-breaks=”true”>
<office:forms form:automatic-focus=”false” form:apply-design-mode=”false”/>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level=”0″ text:name=”Illustration”/>
<text:sequence-decl text:display-outline-level=”0″ text:name=”Table”/>
<text:sequence-decl text:display-outline-level=”0″ text:name=”Text”/>
<text:sequence-decl text:display-outline-level=”0″ text:name=”Drawing”/>
<text:sequence-decl text:display-outline-level=”0″ text:name=”Figure”/>
</text:sequence-decls>
<text:p text:style-name=”P1″>THE TRAGEDY OF HAMLET, PRINCE OF DENMARK</text:p>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″>by William Shakespeare</text:p>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″>Contents</text:p>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″><text:s/>ACT I</text:p>
<text:p text:style-name=”P1″><text:s/>Scene I. Elsinore. A platform before the Castle</text:p>
<text:p text:style-name=”P1″><text:s/>Scene II. Elsinore. A room of state in the Castle</text:p>
<text:p text:style-name=”P1″><text:s/>Scene III. A room in Polonius’s house</text:p>
<text:p text:style-name=”P1″><text:s/>Scene IV. The platform</text:p>
<text:p text:style-name=”P1″><text:s/>Scene V. A more remote part of the Castle</text:p>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″><text:s/>ACT II</text:p>
<text:p text:style-name=”P1″><text:s/>Scene I. A room in Polonius’s house</text:p>
<text:p text:style-name=”P1″><text:s/>Scene II. A room in the Castle</text:p>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″><text:s/>ACT III</text:p>
<text:p text:style-name=”P1″><text:s/>Scene I. A room in the Castle</text:p>
<text:p text:style-name=”P1″><text:s/>Scene II. A hall in the Castle</text:p>
<text:p text:style-name=”P1″><text:s/>Scene III. A room in the Castle</text:p>
<text:p text:style-name=”P1″><text:s/>Scene IV. Another room in the Castle</text:p>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″><text:s/>ACT IV</text:p>
<text:p text:style-name=”P1″><text:s/>Scene I. A room in the Castle</text:p>
<text:p text:style-name=”P1″><text:s/>Scene II. Another room in the Castle</text:p>
<text:p text:style-name=”P1″><text:s/>Scene III. Another room in the Castle</text:p>
<text:p text:style-name=”P1″><text:s/>Scene IV. A plain in Denmark</text:p>
<text:p text:style-name=”P1″><text:s/>Scene V. Elsinore. A room in the Castle</text:p>
<text:p text:style-name=”P1″><text:soft-page-break/><text:s/>Scene VI. Another room in the Castle</text:p>
<text:p text:style-name=”P1″><text:s/>Scene VII. Another room in the Castle</text:p>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″><text:s/>ACT V</text:p>
<text:p text:style-name=”P1″><text:s/>Scene I. A churchyard</text:p>
<text:p text:style-name=”P1″><text:s/>Scene II. A hall in the Castle</text:p>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″>Dramatis Personæ</text:p>
<text:p text:style-name=”P1″/>
<text:p text:style-name=”P1″>HAMLET, Prince of Denmark</text:p>
<text:p text:style-name=”P1″>CLAUDIUS, King of Denmark, Hamlet’s uncle</text:p>
<text:p text:style-name=”P1″>The GHOST of the late king, Hamlet’s father</text:p>
<text:p text:style-name=”P1″>GERTRUDE, the Queen, Hamlet’s mother, now wife of Claudius</text:p>
<text:p text:style-name=”P1″>POLONIUS, Lord Chamberlain</text:p>
<text:p text:style-name=”P1″>LAERTES, Son to Polonius</text:p>
<text:p text:style-name=”P1″>OPHELIA, Daughter to Polonius</text:p>
<text:p text:style-name=”P1″>HORATIO, Friend to Hamlet</text:p>
<text:p text:style-name=”P1″>FORTINBRAS, Prince of Norway</text:p>
<text:p text:style-name=”P1″>VOLTEMAND, Courtier</text:p>
<text:p text:style-name=”P1″>CORNELIUS, Courtier</text:p>
<text:p text:style-name=”P1″>ROSENCRANTZ, Courtier</text:p>
<text:p text:style-name=”P1″>GUILDENSTERN, Courtier</text:p>
<text:p text:style-name=”P1″>MARCELLUS, Officer</text:p>
<text:p text:style-name=”P1″>BARNARDO, Officer</text:p>
<text:p text:style-name=”P1″>FRANCISCO, a Soldier</text:p>
<text:p text:style-name=”P1″>OSRIC, Courtier</text:p>
<text:p text:style-name=”P1″>REYNALDO, Servant to Polonius</text:p>
<text:p text:style-name=”P1″>Players</text:p>
<text:p text:style-name=”P1″>A Gentleman, Courtier</text:p>
<text:p text:style-name=”P1″>A Priest</text:p>
<text:p text:style-name=”P1″><text:soft-page-break/>Two Clowns, Grave-diggers</text:p>
<text:p text:style-name=”P1″>A Captain</text:p>
<text:p text:style-name=”P1″>English Ambassadors.</text:p>
<text:p text:style-name=”P1″>Lords, Ladies, Officers, Soldiers, Sailors, Messengers, and Attendants</text:p>
It is a reasonably complex XML file. After the initial instructions on the sequence of content, the text of the tragedy can easily be located alongside the sequence of the five acts and the descriptions of the dramatis personae.
DOCUMENT.XML
<w:body>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” wp14:paraId=”73C9069B” wp14:textId=”09294AE1″>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>THE TRAGEDY OF HAMLET, PRINCE OF DENMARK</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”3AA342A9″ wp14:textId=”00E76CB9″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”1B479704″ wp14:textId=”129900F6″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>by William Shakespeare</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”6C6F02DD” wp14:textId=”1D8A204F”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”1F7D25FF” wp14:textId=”4619853B”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”50471715″ wp14:textId=”708F3004″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”7E2BE7BA” wp14:textId=”48A7F848″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”5B16A927″ wp14:textId=”10A9E3F9″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Contents</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”429A556E” wp14:textId=”6416D4DB”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”1C38A773″ wp14:textId=”4F3F8ED2″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>ACT I</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”59328897″ wp14:textId=”21C9F129″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene I. Elsinore. A platform before the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”30C9E582″ wp14:textId=”0A7616FF”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene II. Elsinore. A room of state in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”5EAB6C01″ wp14:textId=”70B75214″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene III. A room in Polonius’s house</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”232393A3″ wp14:textId=”069440B2″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene IV. The platform</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”65E1A75F” wp14:textId=”1E769B73″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene V. A more remote part of the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”6C6D2F5C” wp14:textId=”13700863″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”339DBFF3″ wp14:textId=”4AF718C4″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>ACT II</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”774AAE76″ wp14:textId=”3F8EE2B8″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene I. A room in Polonius’s house</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”526BCABF” wp14:textId=”441F6801″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene II. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”6A1841AB” wp14:textId=”1FBE8D34″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”577B4504″ wp14:textId=”1BF167DB”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>ACT III</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”2724CC9A” wp14:textId=”293764E9″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene I. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”04FF9ABE” wp14:textId=”30F918C2″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene II. A hall in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”338872C6″ wp14:textId=”1F0AFFE6″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene III. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”46D240C2″ wp14:textId=”3D28AE8B”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene IV. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”64F40DC7″ wp14:textId=”16C2A388″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”4B538D6F” wp14:textId=”7CB11368″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>ACT IV</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”2508ABE7″ wp14:textId=”4925909D”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene I. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”5ABD1B8F” wp14:textId=”68A02D9E”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene II. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”57D2E145″ wp14:textId=”08927478″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene III. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”2BA12E96″ wp14:textId=”1E35C8BC”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene IV. A plain in Denmark</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”4DF8BEC9″ wp14:textId=”67676CF3″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene V. Elsinore. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”17EE90DC” wp14:textId=”708C9696″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene VI. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”3302F704″ wp14:textId=”2ADB2A66″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene VII. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”0F7C9E68″ wp14:textId=”5D706618″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”1091F950″ wp14:textId=”2EE5201C”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>ACT V</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”6E162B67″ wp14:textId=”10199C37″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene I. A churchyard</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”1A2FA647″ wp14:textId=”683EF1FA”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Scene II. A hall in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”62A90ACE” wp14:textId=”156F1611″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”557F5426″ wp14:textId=”05194972″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”011BF8B2″ wp14:textId=”175BE494″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”4BB65B79″ wp14:textId=”7256A412″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”1EEEFC18″ wp14:textId=”2D4F2D20″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Dramatis Personæ</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”69D361D6″ wp14:textId=”0A66ADE7″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t xml:space=”preserve”> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”7198BA63″ wp14:textId=”0ECB601B”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>HAMLET, Prince of Denmark</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”7A30698D” wp14:textId=”2A3EE787″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>CLAUDIUS, King of Denmark, Hamlet’s uncle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”7D437DFF” wp14:textId=”0C3AFC43″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>The GHOST of the late king, Hamlet’s father</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”663C7E0E” wp14:textId=”4F1E93F2″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>GERTRUDE, the Queen, Hamlet’s mother, now wife of Claudius</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”1EE14B03″ wp14:textId=”567F43B4″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>POLONIUS, Lord Chamberlain</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”7A4F8A78″ wp14:textId=”39759F7E”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>LAERTES, Son to Polonius</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”11E371D7″ wp14:textId=”36CD515A”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>OPHELIA, Daughter to Polonius</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”2D438C1E” wp14:textId=”7211E8E5″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>HORATIO, Friend to Hamlet</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”4E6B50D6″ wp14:textId=”559117D7″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>FORTINBRAS, Prince of Norway</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”1B5B4955″ wp14:textId=”599A64FC”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>VOLTEMAND, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”32BA9096″ wp14:textId=”6E8C2728″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>CORNELIUS, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”60FD9B45″ wp14:textId=”2F2E3956″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>ROSENCRANTZ, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”0CC7985B” wp14:textId=”56DED383″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>GUILDENSTERN, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”775EA68F” wp14:textId=”089F9982″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>MARCELLUS, Officer</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”4E2AEAC2″ wp14:textId=”34855F77″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>BARNARDO, Officer</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”6DB5A437″ wp14:textId=”146C2E48″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>FRANCISCO, a Soldier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”389BDBAC” wp14:textId=”0B30EC2E”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>OSRIC, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”12730B2E” wp14:textId=”60DC1BFE”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>REYNALDO, Servant to Polonius</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”7FA85C5A” wp14:textId=”3D66976B”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Players</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”2F38E070″ wp14:textId=”309A60BF”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>A Gentleman, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”53493710″ wp14:textId=”48B3D2A5″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>A Priest</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”611C5F94″ wp14:textId=”22FB27D4″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Two Clowns, Grave-diggers</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”577DC4BA” wp14:textId=”2FD3CAA0″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>A Captain</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”0BAF6209″ wp14:textId=”35658011″>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>English Ambassadors.</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14=”http://schemas.microsoft.com/office/word/2010/wordml” w:rsidP=”03F02A92″ wp14:paraId=”260F5D8D” wp14:textId=”0FC10ABC”>
<w:pPr>
<w:pStyle w:val=”Normal”/>
</w:pPr>
<w:r w:rsidR=”0C2508E9″>
<w:rPr/>
<w:t>Lords, Ladies, Officers, Soldiers, Sailors, Messengers, and Attendants</w:t>
</w:r>
</w:p>
This XML file contains a lot of repetition. Is it really necessary to include the same information about the schema and style for every line of content, including empty ones? I doubt it personally, but since I am not a technician, I am willing to listen to reasons from those who argue that this content is essential and not artificial complexity.
Let’s see if the same thing happens with Oscar Wilde’s ‘The Importance of Being Earnest’. Here is the PDF:
earnest
In this case, the artificial complexity of the Word document is less apparent, as the content.xml file has 3,974 lines compared to a text document of 3,885 lines, and the document.xml file has 8,610 lines. Therefore, we have gone from a file that is almost ten times longer in terms of the number of lines to a file that is just over twice as long. This difference can be explained by comparing the first lines of the two files’ XML schemas (only those with content).
CONTENT.XML
<text:p text:style-name=”P1″>The Importance of Being Earnest</text:p>
<text:p text:style-name=”P1″>A Trivial Comedy for Serious People</text:p>
<text:p text:style-name=”P1″>THE PERSONS IN THE PLAY</text:p>
<text:p text:style-name=”P1″>John Worthing, J.P.</text:p>
<text:p text:style-name=”P1″>Algernon Moncrieff</text:p>
<text:p text:style-name=”P1″>Rev. Canon Chasuble, D.D.</text:p>
<text:p text:style-name=”P1″>Merriman, Butler</text:p>
<text:p text:style-name=”P1″>Lane, Manservant</text:p>
<text:p text:style-name=”P1″>Lady Bracknell</text:p>
<text:p text:style-name=”P1″>Hon. Gwendolen Fairfax</text:p>
<text:p text:style-name=”P1″>Cecily Cardew</text:p>
<text:p text:style-name=”P1″>Miss Prism, Governess</text:p>
<text:p text:style-name=”P1″>THE SCENES OF THE PLAY</text:p>
<text:p text:style-name=”P1″>ACT I. Algernon Moncrieff’s Flat in Half-Moon Street, W.</text:p>
<text:p text:style-name=”P1″>ACT II. The Garden at the Manor House, Woolton.</text:p>
<text:p text:style-name=”P1″>ACT III. Drawing-Room at the Manor House, Woolton.</text:p>
<text:p text:style-name=”P1″>TIME: The Present.</text:p>
<text:p text:style-name=”P1″>LONDON: ST. JAMES’S THEATRE</text:p>
<text:p text:style-name=”P1″>Lessee and Manager: Mr. George Alexander</text:p>
<text:p text:style-name=”P1″>February 14th, 1895</text:p>
<text:p text:style-name=”P1″>John Worthing, J.P.: Mr. George Alexander.</text:p>
<text:p text:style-name=”P1″>Algernon Moncrieff: Mr. Allen Aynesworth.</text:p>
<text:p text:style-name=”P1″>Rev. Canon Chasuble, D.D.: Mr. H. H. Vincent.</text:p>
<text:p text:style-name=”P1″>Merriman: Mr. Frank Dyall.</text:p>
<text:p text:style-name=”P1″>Lane: Mr. F. Kinsey Peile.</text:p>
<text:p text:style-name=”P1″>Lady Bracknell: Miss Rose Leclercq.</text:p>
<text:p text:style-name=”P1″>Hon. Gwendolen Fairfax: Miss Irene Vanbrugh.</text:p>
<text:p text:style-name=”P1″>Cecily Cardew: Miss Evelyn Millard.</text:p>
<text:p text:style-name=”P1″>Miss Prism: Mrs. George Canninge.</text:p>
<text:p text:style-name=”P1″>FIRST ACT</text:p>
<text:p text:style-name=”P1″>SCENE</text:p>
<text:p text:style-name=”P1″>Morning-room in Algernon’s flat in Half-Moon Street. The room is</text:p>
<text:p text:style-name=”P1″>luxuriously and artistically furnished. The sound of a piano is heard</text:p>
<text:p text:style-name=”P1″>in the adjoining room.</text:p>
<text:p text:style-name=”P1″>[Lane is arranging afternoon tea on the table, and after the music has</text:p>
<text:p text:style-name=”P1″>ceased, Algernon enters.]</text:p>
<text:p text:style-name=”P1″>ALGERNON.</text:p>
<text:p text:style-name=”P1″>Did you hear what I was playing, Lane?</text:p>
<text:p text:style-name=”P1″>LANE.</text:p>
<text:p text:style-name=”P1″>I didn’t think it polite to listen, sir.</text:p>
<text:p text:style-name=”P1″><text:soft-page-break/>ALGERNON.</text:p>
<text:p text:style-name=”P1″>I’m sorry for that, for your sake. I don’t play accurately—any one can</text:p>
<text:p text:style-name=”P1″>play accurately—but I play with wonderful expression. As far as the</text:p>
<text:p text:style-name=”P1″>piano is concerned, sentiment is my forte. I keep science for Life.</text:p>
<text:p text:style-name=”P1″>LANE.</text:p>
<text:p text:style-name=”P1″>Yes, sir.</text:p>
<text:p text:style-name=”P1″>ALGERNON.</text:p>
<text:p text:style-name=”P1″>And, speaking of the science of Life, have you got the cucumber</text:p>
<text:p text:style-name=”P1″>sandwiches cut for Lady Bracknell?</text:p>
<text:p text:style-name=”P1″>LANE.</text:p>
<text:p text:style-name=”P1″>Yes, sir. [Hands them on a salver.]</text:p>
<text:p text:style-name=”P1″>ALGERNON.</text:p>
<text:p text:style-name=”P1″>[Inspects them, takes two, and sits down on the sofa.] Oh! . . . by the</text:p>
<text:p text:style-name=”P1″>way, Lane, I see from your book that on Thursday night, when Lord</text:p>
<text:p text:style-name=”P1″>Shoreman and Mr. Worthing were dining with me, eight bottles of</text:p>
<text:p text:style-name=”P1″>champagne are entered as having been consumed.</text:p>
<text:p text:style-name=”P1″>LANE.</text:p>
<text:p text:style-name=”P1″>Yes, sir; eight bottles and a pint.</text:p>
<text:p text:style-name=”P1″>ALGERNON.</text:p>
<text:p text:style-name=”P1″>Why is it that at a bachelor’s establishment the servants invariably</text:p>
<text:p text:style-name=”P1″>drink the champagne? I ask merely for information.</text:p>
<text:p text:style-name=”P1″>LANE.</text:p>
<text:p text:style-name=”P1″>I attribute it to the superior quality of the wine, sir. I have often</text:p>
<text:p text:style-name=”P1″>observed that in married households the champagne is rarely of a</text:p>
<text:p text:style-name=”P1″>first-rate brand.</text:p>
<text:p text:style-name=”P1″><text:soft-page-break/>ALGERNON.</text:p>
<text:p text:style-name=”P1″>Good heavens! Is marriage so demoralising as that?</text:p>
DOCUMENT.XML
<w:t>The Importance of Being Earnest</w:t>
<w:t>A Trivial Comedy for Serious People</w:t>
<w:t>THE PERSONS IN THE PLAY</w:t>
<w:t>John Worthing, J.P. Algernon Moncrieff Rev. Canon Chasuble, D.D. Merriman, Butler Lane, Manservant Lady Bracknell Hon. Gwendolen Fairfax Cecily Cardew Miss Prism, Governess</w:t>
<w:t>THE SCENES OF THE PLAY</w:t>
<w:t>ACT I. Algernon Moncrieff’s Flat in Half-Moon Street, W.</w:t>
<w:t>ACT II. The Garden at the Manor House, Woolton.</w:t>
<w:t>ACT III. Drawing-Room at the Manor House, Woolton.</w:t>
<w:t>TIME: The Present.</w:t>
<w:t>LONDON: ST. JAMES’S THEATRE</w:t>
<w:t>Lessee and Manager: Mr. George Alexander</w:t>
<w:t>February 14th, 1895</w:t>
<w:t>John Worthing, J.P.: Mr. George Alexander. Algernon Moncrieff: Mr. Allen Aynesworth. Rev. Canon Chasuble, D.D.: Mr. H. H. Vincent. Merriman: Mr. Frank Dyall. Lane: Mr. F. Kinsey Peile. Lady Bracknell: Miss Rose Leclercq. Hon. Gwendolen Fairfax: Miss Irene Vanbrugh. Cecily Cardew: Miss Evelyn Millard. Miss Prism: Mrs. George Canninge.</w:t>
<w:t>FIRST ACT</w:t>
<w:t>SCENE</w:t>
<w:t>Morning-room in Algernon’s flat in Half-Moon Street. The room is luxuriously and artistically furnished. The sound of a piano is heard in the adjoining room.</w:t>
<w:t>[Lane is arranging afternoon tea on the table, and after the music has ceased, Algernon enters.]</w:t>
<w:t>ALGERNON. Did you hear what I was playing, Lane?</w:t>
<w:t>LANE. I didn’t think it polite to listen, sir.</w:t>
<w:t>ALGERNON. I’m sorry for that, for your sake. I don’t play accurately—any one can play accurately—but I play with wonderful expression. As far as the piano is concerned, sentiment is my forte. I keep science for Life.</w:t>
<w:t>LANE. Yes, sir.</w:t>
<w:t>ALGERNON. And, speaking of the science of Life, have you got the cucumber sandwiches cut for Lady Bracknell?</w:t>
<w:t>LANE. Yes, sir. [Hands them on a salver.]</w:t>
<w:t>ALGERNON. [Inspects them, takes two, and sits down on the sofa.] Oh! . . . by the way, Lane, I see from your book that on Thursday night, when Lord Shoreman and Mr. Worthing were dining with me, eight bottles of champagne are entered as having been consumed.</w:t>
<w:t>LANE. Yes, sir; eight bottles and a pint.</w:t>
<w:t>ALGERNON. Why is it that at a bachelor’s establishment the servants invariably drink the champagne? I ask merely for information.</w:t>
<w:t>LANE. I attribute it to the superior quality of the wine, sir. I have often observed that in married households the champagne is rarely of a first-rate brand.</w:t>
<w:t>ALGERNON. Good heavens! Is marriage so demoralising as that?</w:t>
While the content.xml file retains all the line breaks (hard returns) of the text document, the document.xml file “reinterprets” the text, reconstructing all the paragraphs even when this makes no sense, as with lists of characters and the actors who play them. It also adds punctuation that does not exist in the text file, such as commas to replace hard returns. This is why the file is shorter than the “Hamlet” file, but it introduces an arbitrary “simplification” that does not respect the original document.
Until today, I was convinced that the XML schema of OOXML files was unnecessarily complex for the reasons I have explained at length on several occasions. However, it is not only unnecessarily complex, but also unnecessarily “creative” (always complicating the lives of developers and users).
Conclusions
Unfortunately, the reality is what I have explained several times, without going into technical detail. This has been confirmed by more technical analyses of XLSX and DOCX files, and I believe it will also be confirmed by next week’s PPTX file analysis. Microsoft has created an unnecessarily complex and incomprehensibly creative file format, which complicates the lives of developers and users more than I thought.
Indeed, while it is challenging to manage artificial complexity, it is arguably impossible to manage “creativity” that reinterprets the contents of a document by inventing paragraphs where it might make sense — albeit with a faithful format — and where it makes no sense, as with lists.
Perhaps, in my personal opinion, “creativity” was introduced to make it difficult for companies based in countries where reverse engineering is not illegal to emulate the OOXML format, as I don’t believe “creative” reverse engineering is possible, even with the help of AI.
Users should protect their rights by choosing an open standard format, such as ODF, which gives them control over their content and everything that this entails, including privacy protection, proper management of sensitive data and the ability to decide what to share and with whom.
This is a format whose development process, characteristics, and version are known; whose description corresponds to what happens on the user’s PC; and which faithfully reproduces the contents of the displayed document. It is a format that enables even less experienced users to identify and, in many cases, solve problems.
In short, it is the only open and standard document format that we would all like to have, but which only a minority use due to a lack of knowledge about the reality of the OOXML format, and the messianic trust that too many users place in Microsoft. This leads them to believe that there cannot be a commercial strategy behind a document format that is hostile to users’ interests.