The artificial complexity of OOXML files (the DOCX case)

The complexity of the OOXML format stems from its design: it was deliberately created to make the format harder for non-Microsoft software developers to implement. Compatibility issues are caused by a veritable “maze” of tags used even for the simplest content, which binds users to the Microsoft ecosystem in what is the first example of standards-based lock-in.

The DOCX case

To demonstrate the difference in complexity between the XML that Writer and Word produce for text documents in the ODF and OOXML formats, I used two classic English plays: William Shakespeare’s Hamlet and Oscar Wilde’s The Importance of Being Earnest. I downloaded the plain-text versions of these works from Project Gutenberg (a library of classic texts for which US copyright has expired) and deleted the introductions and conclusions added by Project Gutenberg, without making any other changes.

I then followed the same procedure for both plays.

I copied all the text and pasted it, without any formatting, into a newly created blank document in each application. For Writer, I used the template that I usually use for unstructured documents; for Word, I did not use a template. This means that the Writer file contains information about the template (margins, paragraph and font formatting), but this does not increase its complexity.

To perform the analysis, I duplicated the two files, replaced the original extension with “ZIP”, and then decompressed them to create two folders containing all the XML files that make up each document.
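Renaming the files is a convenience for graphical archive tools: both ODT and DOCX files are ordinary ZIP archives. As a minimal sketch, assuming Python's standard zipfile module and hypothetical file names (hamlet.odt and hamlet.docx are placeholders, not the files used for this article), the same unpacking can be done without renaming anything:

```python
import zipfile
from pathlib import Path


def unpack_office_file(source: str, destination: str) -> list[str]:
    """Extract an ODF or OOXML file (both are ZIP archives) into a folder.

    Returns the sorted member names so the package structure can be inspected.
    """
    with zipfile.ZipFile(source) as archive:
        archive.extractall(destination)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Hypothetical file names: replace them with your own documents.
    for name in ("hamlet.odt", "hamlet.docx"):
        path = Path(name)
        if path.exists():
            folder = f"{path.stem}_{path.suffix.lstrip('.')}"
            for member in unpack_office_file(name, folder):
                print(f"{name}: {member}")
```

Printing the member names side by side makes it easy to compare the two folder layouts described in the following paragraphs.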

The LibreOffice folder contains three subfolders and six files with the same names as those in the ODS file examined last week, as would be expected of a standard aiming to simplify life for developers and users. All the content is in the content.xml file, while the other files contain instructions for displaying the text document correctly.

The Microsoft 365 folder contains three subfolders and the [Content_Types].xml file, as with the XLSX file examined last week. One of the subfolders has a different name, but this is related to the application and does not increase complexity. Opening the [Content_Types].xml file provides information about the other files, including those in the subfolders.
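To illustrate what [Content_Types].xml provides, here is a minimal sketch that lists the parts it declares. The sample below is a hand-trimmed, illustrative fragment rather than the file from the Hamlet document, and the function name list_parts is an assumption made for this example; the namespace and the Override element are those of the Open Packaging Conventions.

```python
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in OOXML packages.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Illustrative sample, trimmed by hand; a real file lists more parts.
SAMPLE = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
  <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
</Types>"""


def list_parts(content_types_xml: str) -> dict[str, str]:
    """Map each explicitly declared part name to its content type."""
    root = ET.fromstring(content_types_xml)
    return {
        override.get("PartName"): override.get("ContentType")
        for override in root.findall(f"{{{CT_NS}}}Override")
    }


if __name__ == "__main__":
    for part, content_type in list_parts(SAMPLE).items():
        print(part, "->", content_type)
```

The part whose content type ends in document.main+xml is the one that holds the text of the document.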

In this case, the content is in the document.xml file inside the word folder, which contains folders and files that differ completely from those in the XLSX file. Again, there is no technical reason for this difference in the internal layout of the two file types other than to make their structures different and more complex.

Let’s now analyse William Shakespeare’s Hamlet and then Oscar Wilde’s The Importance of Being Earnest.

Here is the PDF of Hamlet:

hamlet

The difference in complexity between the document.xml and content.xml files is striking when you compare their lengths: the plain-text document has 5,566 lines, the content.xml file has 6,802 lines (about 1.2 times as many), and the document.xml file has 60,245 lines (almost eleven times as many).

Let us now compare the XML of the two files from the beginning of the body to the end of the introduction.
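Line counts for XML depend on how the file is laid out, and this article does not say how the files were formatted before counting. The sketch below assumes one element per line, as produced by pretty-printing with Python's standard xml.dom.minidom module; the function name count_pretty_lines and the sample string are illustrative assumptions, and the result also counts the XML declaration line that minidom adds.

```python
from xml.dom import minidom


def count_pretty_lines(xml_text: str) -> int:
    """Count non-blank lines after pretty-printing with one element per line."""
    pretty = minidom.parseString(xml_text).toprettyxml(indent="  ")
    return sum(1 for line in pretty.splitlines() if line.strip())


if __name__ == "__main__":
    # Illustrative sample stored on a single line, as XML often is inside archives.
    sample = '<body><p style="P1">THE TRAGEDY OF HAMLET</p><p style="P1"/></body>'
    print("lines as stored:", sample.count("\n") + 1)
    print("lines after pretty-printing:", count_pretty_lines(sample))
```

Applying the same formatting to both content.xml and document.xml keeps the comparison fair, since any difference in line count then reflects the number of elements rather than the layout of the file.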

CONTENT.XML

<office:body>
<office:text text:use-soft-page-breaks="true">
<office:forms form:automatic-focus="false" form:apply-design-mode="false"/>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
<text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
</text:sequence-decls>
<text:p text:style-name="P1">THE TRAGEDY OF HAMLET, PRINCE OF DENMARK</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1">by William Shakespeare</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1">Contents</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"><text:s/>ACT I</text:p>
<text:p text:style-name="P1"><text:s/>Scene I. Elsinore. A platform before the Castle</text:p>
<text:p text:style-name="P1"><text:s/>Scene II. Elsinore. A room of state in the Castle</text:p>
<text:p text:style-name="P1"><text:s/>Scene III. A room in Polonius’s house</text:p>
<text:p text:style-name="P1"><text:s/>Scene IV. The platform</text:p>
<text:p text:style-name="P1"><text:s/>Scene V. A more remote part of the Castle</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"><text:s/>ACT II</text:p>
<text:p text:style-name="P1"><text:s/>Scene I. A room in Polonius’s house</text:p>
<text:p text:style-name="P1"><text:s/>Scene II. A room in the Castle</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"><text:s/>ACT III</text:p>
<text:p text:style-name="P1"><text:s/>Scene I. A room in the Castle</text:p>
<text:p text:style-name="P1"><text:s/>Scene II. A hall in the Castle</text:p>
<text:p text:style-name="P1"><text:s/>Scene III. A room in the Castle</text:p>
<text:p text:style-name="P1"><text:s/>Scene IV. Another room in the Castle</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"><text:s/>ACT IV</text:p>
<text:p text:style-name="P1"><text:s/>Scene I. A room in the Castle</text:p>
<text:p text:style-name="P1"><text:s/>Scene II. Another room in the Castle</text:p>
<text:p text:style-name="P1"><text:s/>Scene III. Another room in the Castle</text:p>
<text:p text:style-name="P1"><text:s/>Scene IV. A plain in Denmark</text:p>
<text:p text:style-name="P1"><text:s/>Scene V. Elsinore. A room in the Castle</text:p>
<text:p text:style-name="P1"><text:soft-page-break/><text:s/>Scene VI. Another room in the Castle</text:p>
<text:p text:style-name="P1"><text:s/>Scene VII. Another room in the Castle</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"><text:s/>ACT V</text:p>
<text:p text:style-name="P1"><text:s/>Scene I. A churchyard</text:p>
<text:p text:style-name="P1"><text:s/>Scene II. A hall in the Castle</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1">Dramatis Personæ</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1">HAMLET, Prince of Denmark</text:p>
<text:p text:style-name="P1">CLAUDIUS, King of Denmark, Hamlet’s uncle</text:p>
<text:p text:style-name="P1">The GHOST of the late king, Hamlet’s father</text:p>
<text:p text:style-name="P1">GERTRUDE, the Queen, Hamlet’s mother, now wife of Claudius</text:p>
<text:p text:style-name="P1">POLONIUS, Lord Chamberlain</text:p>
<text:p text:style-name="P1">LAERTES, Son to Polonius</text:p>
<text:p text:style-name="P1">OPHELIA, Daughter to Polonius</text:p>
<text:p text:style-name="P1">HORATIO, Friend to Hamlet</text:p>
<text:p text:style-name="P1">FORTINBRAS, Prince of Norway</text:p>
<text:p text:style-name="P1">VOLTEMAND, Courtier</text:p>
<text:p text:style-name="P1">CORNELIUS, Courtier</text:p>
<text:p text:style-name="P1">ROSENCRANTZ, Courtier</text:p>
<text:p text:style-name="P1">GUILDENSTERN, Courtier</text:p>
<text:p text:style-name="P1">MARCELLUS, Officer</text:p>
<text:p text:style-name="P1">BARNARDO, Officer</text:p>
<text:p text:style-name="P1">FRANCISCO, a Soldier</text:p>
<text:p text:style-name="P1">OSRIC, Courtier</text:p>
<text:p text:style-name="P1">REYNALDO, Servant to Polonius</text:p>
<text:p text:style-name="P1">Players</text:p>
<text:p text:style-name="P1">A Gentleman, Courtier</text:p>
<text:p text:style-name="P1">A Priest</text:p>
<text:p text:style-name="P1"><text:soft-page-break/>Two Clowns, Grave-diggers</text:p>
<text:p text:style-name="P1">A Captain</text:p>
<text:p text:style-name="P1">English Ambassadors.</text:p>
<text:p text:style-name="P1">Lords, Ladies, Officers, Soldiers, Sailors, Messengers, and Attendants</text:p>

It is an XML file of reasonable complexity. After the initial sequence declarations, the text of the tragedy is easy to locate, starting with the list of the five acts and the dramatis personae.
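As a rough illustration of how directly the text can be located, the sketch below pulls the paragraphs out of a content.xml fragment with Python's standard ElementTree module. The sample reuses three paragraphs from the excerpt above; the namespace declarations on the root element are added so the fragment parses on its own, and the function name paragraphs is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# ODF namespaces, normally declared once on the root element of content.xml.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

SAMPLE = f"""<office:body xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
<office:text>
<text:p text:style-name="P1">THE TRAGEDY OF HAMLET, PRINCE OF DENMARK</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1">by William Shakespeare</text:p>
</office:text>
</office:body>"""


def paragraphs(xml_text: str) -> list[str]:
    """Return the text of every text:p element, in document order."""
    root = ET.fromstring(xml_text)
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


if __name__ == "__main__":
    for line in paragraphs(SAMPLE):
        print(repr(line))
```

This simplified version ignores elements such as text:s, which stand for spaces, so a complete extractor would need to handle them explicitly.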

DOCUMENT.XML

<w:body>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" wp14:paraId="73C9069B" wp14:textId="09294AE1">
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>THE TRAGEDY OF HAMLET, PRINCE OF DENMARK</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="3AA342A9" wp14:textId="00E76CB9">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="1B479704" wp14:textId="129900F6">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>by William Shakespeare</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="6C6F02DD" wp14:textId="1D8A204F">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="1F7D25FF" wp14:textId="4619853B">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="50471715" wp14:textId="708F3004">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="7E2BE7BA" wp14:textId="48A7F848">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="5B16A927" wp14:textId="10A9E3F9">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Contents</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="429A556E" wp14:textId="6416D4DB">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="1C38A773" wp14:textId="4F3F8ED2">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>ACT I</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="59328897" wp14:textId="21C9F129">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene I. Elsinore. A platform before the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="30C9E582" wp14:textId="0A7616FF">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene II. Elsinore. A room of state in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="5EAB6C01" wp14:textId="70B75214">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene III. A room in Polonius’s house</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="232393A3" wp14:textId="069440B2">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene IV. The platform</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="65E1A75F" wp14:textId="1E769B73">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene V. A more remote part of the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="6C6D2F5C" wp14:textId="13700863">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="339DBFF3" wp14:textId="4AF718C4">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>ACT II</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="774AAE76" wp14:textId="3F8EE2B8">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene I. A room in Polonius’s house</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="526BCABF" wp14:textId="441F6801">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene II. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="6A1841AB" wp14:textId="1FBE8D34">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="577B4504" wp14:textId="1BF167DB">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>ACT III</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="2724CC9A" wp14:textId="293764E9">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene I. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="04FF9ABE" wp14:textId="30F918C2">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene II. A hall in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="338872C6" wp14:textId="1F0AFFE6">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene III. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="46D240C2" wp14:textId="3D28AE8B">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene IV. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="64F40DC7" wp14:textId="16C2A388">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="4B538D6F" wp14:textId="7CB11368">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>ACT IV</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="2508ABE7" wp14:textId="4925909D">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene I. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="5ABD1B8F" wp14:textId="68A02D9E">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene II. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="57D2E145" wp14:textId="08927478">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene III. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="2BA12E96" wp14:textId="1E35C8BC">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene IV. A plain in Denmark</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="4DF8BEC9" wp14:textId="67676CF3">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene V. Elsinore. A room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="17EE90DC" wp14:textId="708C9696">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene VI. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="3302F704" wp14:textId="2ADB2A66">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene VII. Another room in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="0F7C9E68" wp14:textId="5D706618">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="1091F950" wp14:textId="2EE5201C">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>ACT V</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="6E162B67" wp14:textId="10199C37">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene I. A churchyard</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="1A2FA647" wp14:textId="683EF1FA">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Scene II. A hall in the Castle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="62A90ACE" wp14:textId="156F1611">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="557F5426" wp14:textId="05194972">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="011BF8B2" wp14:textId="175BE494">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="4BB65B79" wp14:textId="7256A412">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="1EEEFC18" wp14:textId="2D4F2D20">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Dramatis Personæ</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="69D361D6" wp14:textId="0A66ADE7">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="7198BA63" wp14:textId="0ECB601B">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>HAMLET, Prince of Denmark</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="7A30698D" wp14:textId="2A3EE787">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>CLAUDIUS, King of Denmark, Hamlet’s uncle</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="7D437DFF" wp14:textId="0C3AFC43">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>The GHOST of the late king, Hamlet’s father</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="663C7E0E" wp14:textId="4F1E93F2">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>GERTRUDE, the Queen, Hamlet’s mother, now wife of Claudius</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="1EE14B03" wp14:textId="567F43B4">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>POLONIUS, Lord Chamberlain</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="7A4F8A78" wp14:textId="39759F7E">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>LAERTES, Son to Polonius</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="11E371D7" wp14:textId="36CD515A">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>OPHELIA, Daughter to Polonius</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="2D438C1E" wp14:textId="7211E8E5">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>HORATIO, Friend to Hamlet</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="4E6B50D6" wp14:textId="559117D7">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>FORTINBRAS, Prince of Norway</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="1B5B4955" wp14:textId="599A64FC">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>VOLTEMAND, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="32BA9096" wp14:textId="6E8C2728">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>CORNELIUS, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="60FD9B45" wp14:textId="2F2E3956">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>ROSENCRANTZ, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="0CC7985B" wp14:textId="56DED383">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>GUILDENSTERN, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="775EA68F" wp14:textId="089F9982">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>MARCELLUS, Officer</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="4E2AEAC2" wp14:textId="34855F77">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>BARNARDO, Officer</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="6DB5A437" wp14:textId="146C2E48">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>FRANCISCO, a Soldier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="389BDBAC" wp14:textId="0B30EC2E">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>OSRIC, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="12730B2E" wp14:textId="60DC1BFE">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>REYNALDO, Servant to Polonius</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="7FA85C5A" wp14:textId="3D66976B">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Players</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="2F38E070" wp14:textId="309A60BF">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>A Gentleman, Courtier</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="53493710" wp14:textId="48B3D2A5">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>A Priest</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="611C5F94" wp14:textId="22FB27D4">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Two Clowns, Grave-diggers</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="577DC4BA" wp14:textId="2FD3CAA0">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>A Captain</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="0BAF6209" wp14:textId="35658011">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>English Ambassadors.</w:t>
</w:r>
</w:p>
<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="03F02A92" wp14:paraId="260F5D8D" wp14:textId="0FC10ABC">
<w:pPr>
<w:pStyle w:val="Normal"/>
</w:pPr>
<w:r w:rsidR="0C2508E9">
<w:rPr/>
<w:t>Lords, Ladies, Officers, Soldiers, Sailors, Messengers, and Attendants</w:t>
</w:r>
</w:p>

This XML file contains a lot of repetition. Is it really necessary to repeat the same namespace declaration and style information for every paragraph, including empty ones? Personally, I doubt it, but since I am not a technician, I am willing to listen to those who argue that this content is essential rather than artificial complexity.
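One way to put a number on the repetition is to measure how many characters of markup surround the same piece of text in each format. The sketch below uses the “by William Shakespeare” paragraph exactly as it appears in the two excerpts above, with line breaks removed; the function name markup_overhead is an assumption made for this example, and a character count is only a rough proxy for complexity.

```python
TEXT = "by William Shakespeare"

# The same paragraph as stored in content.xml (ODF).
ODF_PARAGRAPH = '<text:p text:style-name="P1">by William Shakespeare</text:p>'

# The same paragraph as stored in document.xml (OOXML), joined onto one line.
OOXML_PARAGRAPH = (
    '<w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" '
    'w:rsidP="03F02A92" wp14:paraId="1B479704" wp14:textId="129900F6">'
    '<w:pPr><w:pStyle w:val="Normal"/></w:pPr>'
    '<w:r w:rsidR="0C2508E9"><w:rPr/><w:t>by William Shakespeare</w:t></w:r>'
    "</w:p>"
)


def markup_overhead(fragment: str, text: str) -> int:
    """Number of characters in the fragment that are markup, not content."""
    if text not in fragment:
        raise ValueError("the fragment does not contain the expected text")
    return len(fragment) - len(text)


if __name__ == "__main__":
    for label, fragment in (("ODF", ODF_PARAGRAPH), ("OOXML", OOXML_PARAGRAPH)):
        overhead = markup_overhead(fragment, TEXT)
        print(f"{label}: {overhead} markup characters for {len(TEXT)} text characters")
```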

Let’s see if the same thing happens with Oscar Wilde’s ‘The Importance of Being Earnest’. Here is the PDF:

earnest

In this case, the artificial complexity of the Word document is less apparent: the text document has 3,885 lines, the content.xml file has 3,974 lines and the document.xml file has 8,610 lines. We have therefore gone from a document.xml file that is almost nine times longer than content.xml to one that is just over twice as long. This difference can be explained by comparing the first lines of the XML in the two files (only those with content).
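The comparison can be checked directly from the line counts reported in this article. The short sketch below only does the arithmetic; the dictionary layout and the function name ratio are choices made for this example.

```python
# Line counts as reported in this article.
LINE_COUNTS = {
    "Hamlet": {"text": 5566, "content.xml": 6802, "document.xml": 60245},
    "The Importance of Being Earnest": {
        "text": 3885,
        "content.xml": 3974,
        "document.xml": 8610,
    },
}


def ratio(counts: dict[str, int], numerator: str, denominator: str) -> float:
    """How many times longer one file is than another, by line count."""
    return counts[numerator] / counts[denominator]


if __name__ == "__main__":
    for title, counts in LINE_COUNTS.items():
        print(title)
        print(f"  content.xml / text:         {ratio(counts, 'content.xml', 'text'):.2f}")
        print(f"  document.xml / text:        {ratio(counts, 'document.xml', 'text'):.2f}")
        print(f"  document.xml / content.xml: {ratio(counts, 'document.xml', 'content.xml'):.2f}")
```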

CONTENT.XML

<text:p text:style-name="P1">The Importance of Being Earnest</text:p>
<text:p text:style-name="P1">A Trivial Comedy for Serious People</text:p>
<text:p text:style-name="P1">THE PERSONS IN THE PLAY</text:p>
<text:p text:style-name="P1">John Worthing, J.P.</text:p>
<text:p text:style-name="P1">Algernon Moncrieff</text:p>
<text:p text:style-name="P1">Rev. Canon Chasuble, D.D.</text:p>
<text:p text:style-name="P1">Merriman, Butler</text:p>
<text:p text:style-name="P1">Lane, Manservant</text:p>
<text:p text:style-name="P1">Lady Bracknell</text:p>
<text:p text:style-name="P1">Hon. Gwendolen Fairfax</text:p>
<text:p text:style-name="P1">Cecily Cardew</text:p>
<text:p text:style-name="P1">Miss Prism, Governess</text:p>
<text:p text:style-name="P1">THE SCENES OF THE PLAY</text:p>
<text:p text:style-name="P1">ACT I. Algernon Moncrieff’s Flat in Half-Moon Street, W.</text:p>
<text:p text:style-name="P1">ACT II. The Garden at the Manor House, Woolton.</text:p>
<text:p text:style-name="P1">ACT III. Drawing-Room at the Manor House, Woolton.</text:p>
<text:p text:style-name="P1">TIME: The Present.</text:p>
<text:p text:style-name="P1">LONDON: ST. JAMES’S THEATRE</text:p>
<text:p text:style-name="P1">Lessee and Manager: Mr. George Alexander</text:p>
<text:p text:style-name="P1">February 14th, 1895</text:p>
<text:p text:style-name="P1">John Worthing, J.P.: Mr. George Alexander.</text:p>
<text:p text:style-name="P1">Algernon Moncrieff: Mr. Allen Aynesworth.</text:p>
<text:p text:style-name="P1">Rev. Canon Chasuble, D.D.: Mr. H. H. Vincent.</text:p>
<text:p text:style-name="P1">Merriman: Mr. Frank Dyall.</text:p>
<text:p text:style-name="P1">Lane: Mr. F. Kinsey Peile.</text:p>
<text:p text:style-name="P1">Lady Bracknell: Miss Rose Leclercq.</text:p>
<text:p text:style-name="P1">Hon. Gwendolen Fairfax: Miss Irene Vanbrugh.</text:p>
<text:p text:style-name="P1">Cecily Cardew: Miss Evelyn Millard.</text:p>
<text:p text:style-name="P1">Miss Prism: Mrs. George Canninge.</text:p>
<text:p text:style-name="P1">FIRST ACT</text:p>
<text:p text:style-name="P1">SCENE</text:p>
<text:p text:style-name="P1">Morning-room in Algernon’s flat in Half-Moon Street. The room is</text:p>
<text:p text:style-name="P1">luxuriously and artistically furnished. The sound of a piano is heard</text:p>
<text:p text:style-name="P1">in the adjoining room.</text:p>
<text:p text:style-name="P1">[Lane is arranging afternoon tea on the table, and after the music has</text:p>
<text:p text:style-name="P1">ceased, Algernon enters.]</text:p>
<text:p text:style-name="P1">ALGERNON.</text:p>
<text:p text:style-name="P1">Did you hear what I was playing, Lane?</text:p>
<text:p text:style-name="P1">LANE.</text:p>
<text:p text:style-name="P1">I didn’t think it polite to listen, sir.</text:p>
<text:p text:style-name="P1"><text:soft-page-break/>ALGERNON.</text:p>
<text:p text:style-name="P1">I’m sorry for that, for your sake. I don’t play accurately—any one can</text:p>
<text:p text:style-name="P1">play accurately—but I play with wonderful expression. As far as the</text:p>
<text:p text:style-name="P1">piano is concerned, sentiment is my forte. I keep science for Life.</text:p>
<text:p text:style-name="P1">LANE.</text:p>
<text:p text:style-name="P1">Yes, sir.</text:p>
<text:p text:style-name="P1">ALGERNON.</text:p>
<text:p text:style-name="P1">And, speaking of the science of Life, have you got the cucumber</text:p>
<text:p text:style-name="P1">sandwiches cut for Lady Bracknell?</text:p>
<text:p text:style-name="P1">LANE.</text:p>
<text:p text:style-name="P1">Yes, sir. [Hands them on a salver.]</text:p>
<text:p text:style-name="P1">ALGERNON.</text:p>
<text:p text:style-name="P1">[Inspects them, takes two, and sits down on the sofa.] Oh! . . . by the</text:p>
<text:p text:style-name="P1">way, Lane, I see from your book that on Thursday night, when Lord</text:p>
<text:p text:style-name="P1">Shoreman and Mr. Worthing were dining with me, eight bottles of</text:p>
<text:p text:style-name="P1">champagne are entered as having been consumed.</text:p>
<text:p text:style-name="P1">LANE.</text:p>
<text:p text:style-name="P1">Yes, sir; eight bottles and a pint.</text:p>
<text:p text:style-name="P1">ALGERNON.</text:p>
<text:p text:style-name="P1">Why is it that at a bachelor’s establishment the servants invariably</text:p>
<text:p text:style-name="P1">drink the champagne? I ask merely for information.</text:p>
<text:p text:style-name="P1">LANE.</text:p>
<text:p text:style-name="P1">I attribute it to the superior quality of the wine, sir. I have often</text:p>
<text:p text:style-name="P1">observed that in married households the champagne is rarely of a</text:p>
<text:p text:style-name="P1">first-rate brand.</text:p>
<text:p text:style-name="P1"><text:soft-page-break/>ALGERNON.</text:p>
<text:p text:style-name="P1">Good heavens! Is marriage so demoralising as that?</text:p>

DOCUMENT.XML

<w:t>The Importance of Being Earnest</w:t>
<w:t>A Trivial Comedy for Serious People</w:t>
<w:t>THE PERSONS IN THE PLAY</w:t>
<w:t>John Worthing, J.P. Algernon Moncrieff Rev. Canon Chasuble, D.D. Merriman, Butler Lane, Manservant Lady Bracknell Hon. Gwendolen Fairfax Cecily Cardew Miss Prism, Governess</w:t>
<w:t>THE SCENES OF THE PLAY</w:t>
<w:t>ACT I. Algernon Moncrieff’s Flat in Half-Moon Street, W.</w:t>
<w:t>ACT II. The Garden at the Manor House, Woolton.</w:t>
<w:t>ACT III. Drawing-Room at the Manor House, Woolton.</w:t>
<w:t>TIME: The Present.</w:t>
<w:t>LONDON: ST. JAMES’S THEATRE</w:t>
<w:t>Lessee and Manager: Mr. George Alexander</w:t>
<w:t>February 14th, 1895</w:t>
<w:t>John Worthing, J.P.: Mr. George Alexander. Algernon Moncrieff: Mr. Allen Aynesworth. Rev. Canon Chasuble, D.D.: Mr. H. H. Vincent. Merriman: Mr. Frank Dyall. Lane: Mr. F. Kinsey Peile. Lady Bracknell: Miss Rose Leclercq. Hon. Gwendolen Fairfax: Miss Irene Vanbrugh. Cecily Cardew: Miss Evelyn Millard. Miss Prism: Mrs. George Canninge.</w:t>
<w:t>FIRST ACT</w:t>
<w:t>SCENE</w:t>
<w:t>Morning-room in Algernon’s flat in Half-Moon Street. The room is luxuriously and artistically furnished. The sound of a piano is heard in the adjoining room.</w:t>
<w:t>[Lane is arranging afternoon tea on the table, and after the music has ceased, Algernon enters.]</w:t>
<w:t>ALGERNON. Did you hear what I was playing, Lane?</w:t>
<w:t>LANE. I didn’t think it polite to listen, sir.</w:t>
<w:t>ALGERNON. I’m sorry for that, for your sake. I don’t play accurately—any one can play accurately—but I play with wonderful expression. As far as the piano is concerned, sentiment is my forte. I keep science for Life.</w:t>
<w:t>LANE. Yes, sir.</w:t>
<w:t>ALGERNON. And, speaking of the science of Life, have you got the cucumber sandwiches cut for Lady Bracknell?</w:t>
<w:t>LANE. Yes, sir. [Hands them on a salver.]</w:t>
<w:t>ALGERNON. [Inspects them, takes two, and sits down on the sofa.] Oh! . . . by the way, Lane, I see from your book that on Thursday night, when Lord Shoreman and Mr. Worthing were dining with me, eight bottles of champagne are entered as having been consumed.</w:t>
<w:t>LANE. Yes, sir; eight bottles and a pint.</w:t>
<w:t>ALGERNON. Why is it that at a bachelor’s establishment the servants invariably drink the champagne? I ask merely for information.</w:t>
<w:t>LANE. I attribute it to the superior quality of the wine, sir. I have often observed that in married households the champagne is rarely of a first-rate brand.</w:t>
<w:t>ALGERNON. Good heavens! Is marriage so demoralising as that?</w:t>

While the content.xml file retains all the line breaks (hard returns) of the text document, the document.xml file “reinterprets” the text, reconstructing all the paragraphs even when this makes no sense, as with lists of characters and the actors who play them. It also adds punctuation that does not exist in the text file, such as commas to replace hard returns. This is why the file is shorter than the “Hamlet” file, but it introduces an arbitrary “simplification” that does not respect the original document.
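The difference can be checked on any pair of files by listing the paragraphs found on each side. What follows is a minimal Python sketch, not the method used for this post: the function names are hypothetical, and it assumes that content.xml and document.xml have already been extracted from the ODT and DOCX packages and read into strings.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the ODF text module and of WordprocessingML.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def odf_paragraphs(content_xml):
    """Return the text of every <text:p> element in an ODF content.xml."""
    root = ET.fromstring(content_xml)
    # itertext() also picks up text that follows a <text:soft-page-break/>.
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


def ooxml_paragraphs(document_xml):
    """Return the text of every <w:p> element in an OOXML document.xml."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for p in root.iter("{%s}p" % W_NS):
        # A paragraph may be split into several runs, each with its own <w:t>.
        runs = [t.text or "" for t in p.iter("{%s}t" % W_NS)]
        paragraphs.append("".join(runs))
    return paragraphs
```

Comparing len(odf_paragraphs(...)) with len(ooxml_paragraphs(...)) for the same source text shows at a glance whether one format has merged lines that the other kept separate.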

Until today, I was convinced that the XML schema of OOXML files was unnecessarily complex for the reasons I have explained at length on several occasions. However, it is not only unnecessarily complex, but also unnecessarily “creative” (always complicating the lives of developers and users).

Conclusions

Unfortunately, the reality is what I have explained several times, without going into technical detail. This has been confirmed by more technical analyses of XLSX and DOCX files, and I believe it will also be confirmed by next week’s PPTX file analysis. Microsoft has created an unnecessarily complex and incomprehensibly creative file format, which complicates the lives of developers and users more than I thought.

Indeed, while artificial complexity is challenging to manage, it is arguably impossible to manage a “creativity” that reinterprets the contents of a document, inventing paragraphs both where this might make sense (albeit at the expense of a faithful format) and where it makes no sense at all, as with lists.

Perhaps, in my personal opinion, “creativity” was introduced to make it difficult for companies based in countries where reverse engineering is not illegal to emulate the OOXML format, as I don’t believe “creative” reverse engineering is possible, even with the help of AI.

Users should protect their rights by choosing an open standard format, such as ODF, which gives them control over their content and everything that this entails, including privacy protection, proper management of sensitive data and the ability to decide what to share and with whom.

This is a format whose development process, characteristics, and version are known; whose description corresponds to what happens on the user’s PC; and which faithfully reproduces the contents of the displayed document. It is a format that enables even less experienced users to identify and, in many cases, solve problems.

In short, it is the only open and standard document format that we would all like to have, but which only a minority use due to a lack of knowledge about the reality of the OOXML format, and the messianic trust that too many users place in Microsoft. This leads them to believe that there cannot be a commercial strategy behind a document format that is hostile to users’ interests.

The artificial complexity of OOXML files (the XLSX case)

The post published on 18 July 2025 explained why an artificially complex XML schema, such as the one used by Microsoft 365 (formerly Microsoft Office) files, is in fact a subtle tool for locking in users: it is invisible and impossible to detect without in-depth study. The post was picked up by various IT media outlets, probably because it explained, in a way that was accessible to everyone, a problem that everyone faces without having the tools to solve it.

Some of these articles sparked a debate between those who supported my thesis and those who defended Microsoft (the true champions of lock-in) by claiming that the complexity of the XML schema was not artificial but rather a reflection of the complexity of the documents themselves.

This complexity relates to various factors, such as size (number of pages), structure (text, tables, graphs and images), content management (data entry by multiple people and systems) and customisation through metadata. These factors influence the management, classification and storage of the document itself.

The different approaches to complexity management between ODF and OOXML

However, the ODF and OOXML formats handle this complexity in completely different ways. In the first case, the XML schema seeks to simplify the work of developers and users by ensuring that both sets of requirements are met. Developers have all the descriptive tools related to document complexity at their disposal, and users can distinguish between descriptive elements and content because the two are almost always separate. The content is also consistent in syntax with the document.

In the second case, the XML schema does nothing to simplify the developer’s task and complicates the user’s task by putting all the elements – description and content – together without any apparent logic. This makes the two difficult or even impossible to distinguish.

The complexity of the OOXML format is linked to its design and was deliberately created to make the format more difficult for non-Microsoft software developers to implement. Compatibility issues are caused by a veritable “maze” of tags used even for the simplest content, which binds users to the Microsoft ecosystem in the first example of standard-based lock-in.

Added to this is the widespread use of convoluted descriptions, such as those relating to dates, which are linked to a bug introduced by Lotus 1-2-3 and still present in Excel 67 years after it was discovered, and the arbitrary separation of content, such as sentences or even words that are broken between two content elements. The format reflects the internal data structures and legacy features of Microsoft Office. It uses non-standard language encodings and units of measurement, as well as inconsistent naming conventions and rules between modules. It also uses abstruse tag names that are difficult to decipher.

The XLSX case

To illustrate the difference in complexity between the ODF and OOXML XML schemas, I created a simple spreadsheet containing dates from my life that are either significant or ironic. These include the date I broke my nose, the date it was repaired, and the date I re-married my wife in Las Vegas to celebrate the 30th anniversary of the marriage with an informal ceremony (a drive-through wedding in a limousine).

This is a screenshot of the spreadsheet:

To perform the analysis, I duplicated and renamed the two files, replacing the original extension with “ZIP”, and then unzipped them to create two folders containing all the files of the respective XML schemas.
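The same inspection can be done without renaming anything, because both ODS and XLSX files are ordinary ZIP archives. The following is a minimal sketch using Python’s standard zipfile module; the function name list_parts and the file names in the usage note are hypothetical.

```python
import zipfile


def list_parts(path):
    """Return the sorted names of all files stored in an ODF or OOXML package."""
    # The extension does not matter: zipfile only looks at the archive structure.
    with zipfile.ZipFile(path) as package:
        return sorted(package.namelist())
```

Calling list_parts("dates.ods") and list_parts("dates.xlsx") on two such files would print the folder layouts described below, side by side.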

The LibreOffice folder contains three subfolders and six files, one of which is called content.xml and immediately catches the eye due to its evocative name. Opening it reveals all the contents, while the other files contain instructions for displaying the spreadsheet correctly.

This is the significant portion of the LibreOffice content.xml file:

<office:body>
<office:spreadsheet>
<table:calculation-settings table:case-sensitive="false" table:automatic-find-labels="false" table:use-regular-expressions="false" table:use-wildcards="true">
<table:iteration table:maximum-difference="0.0001"/>
</table:calculation-settings>
<table:table table:name="Foglio1" table:style-name="ta1">
<office:forms form:automatic-focus="false" form:apply-design-mode="false"/>
<table:table-column table:style-name="co1" table:default-cell-style-name="ce2"/>
<table:table-column table:style-name="co2" table:default-cell-style-name="ce4"/>
<table:table-column table:style-name="co3" table:number-columns-repeated="16382"/>
<table:table-row table:style-name="ro1">
<table:table-cell table:style-name="ce1" office:value-type="string" calcext:value-type="string">
<text:p>Event</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce3" office:value-type="string" calcext:value-type="string">
<text:p>Date</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>Was Born in Umbria</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="1954-08-12" calcext:value-type="date">
<text:p>08/12/1954</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>Broke Nose in Rome</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="1965-01-18" calcext:value-type="date">
<text:p>01/18/1965</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>University Degree in Milan</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="1978-11-19" calcext:value-type="date">
<text:p>11/19/1978</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>First Job at Italian Touring Club</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="1981-01-10" calcext:value-type="date">
<text:p>01/10/1981</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>Hired by Honeywell and Got 1st PC</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="1983-01-09" calcext:value-type="date">
<text:p>01/09/1983</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>1st Wedding in Assisi</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="1984-08-09" calcext:value-type="date">
<text:p>08/09/1984</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>BBC Show Interview in Birmingham</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="1987-02-17" calcext:value-type="date">
<text:p>02/17/1987</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>Installed OpenOffice</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="2003-02-01" calcext:value-type="date">
<text:p>02/01/2003</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>Repaired Nose in Rozzano</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="2008-12-04" calcext:value-type="date">
<text:p>12/04/2008</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>Launched LibreOffice</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="2010-09-28" calcext:value-type="date">
<text:p>09/28/2010</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>2nd Wedding in Las Vegas</text:p>
</table:table-cell>
<table:table-cell table:style-name="ce7" office:value-type="date" office:date-value="2014-08-08" calcext:value-type="date">
<text:p>08/08/2014</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro2" table:number-rows-repeated="1048563">
<table:table-cell table:number-columns-repeated="2"/>
</table:table-row>
<table:table-row table:style-name="ro2">
<table:table-cell table:number-columns-repeated="2"/>
</table:table-row>
</table:table>
<table:named-expressions/>
</office:spreadsheet>
</office:body>

This is an XML file of reasonable complexity. Even someone without technical knowledge can identify the contents of the two columns with a little effort. The file is in an understandable format for dates, text strings and tags (table row, table cell, text and date value).
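That readability carries over to code. As a rough sketch (the function name read_rows is hypothetical, and it assumes the complete content.xml, including the namespace declarations on its root element that the excerpt above leaves out), Python’s standard XML parser is enough to pull the event and the date out of each row:

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the ODF specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
DATE_VALUE = "{%s}date-value" % NS["office"]


def read_rows(content_xml):
    """Return one list per non-empty row: ISO dates for date cells, text otherwise."""
    root = ET.fromstring(content_xml)
    rows = []
    for row in root.iterfind(".//table:table-row", NS):
        cells = []
        for cell in row.iterfind("table:table-cell", NS):
            paragraph = cell.find("text:p", NS)
            if paragraph is None:
                continue  # empty filler cell
            # Date cells carry the value itself in office:date-value.
            cells.append(cell.get(DATE_VALUE) or paragraph.text)
        if cells:
            rows.append(cells)
    return rows
```

The content and its type sit in the same element, so no second file has to be consulted to interpret a cell.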

The Microsoft 365 folder contains three subfolders and the [Content_Types].xml file. Opening this file reveals information about the other files, including those in the subfolders. It also shows that the contents should be found in the sheet1.xml file, which is hidden in the worksheets folder, which is hidden in the xl folder. There is no technical reason for this game of hide-and-seek other than to make the internal structure of the XLSX file more complicated.

The significant part of the Microsoft 365 sheet1.xml file is as follows:

<dimension ref="A1:B12"/>
<sheetViews>
<sheetView tabSelected="1" workbookViewId="0">
<selection activeCell="B1" sqref="B1:B1048576"/>
</sheetView>
</sheetViews>
<sheetFormatPr defaultRowHeight="15"/>
<cols>
<col min="1" max="1" width="34.140625" style="2" bestFit="1" customWidth="1"/>
<col min="2" max="2" width="11.7109375" style="4" bestFit="1" customWidth="1"/>
</cols>
<sheetData>
<row r="1" spans="1:2">
<c r="A1" s="1" t="s">
<v>0</v>
</c>
<c r="B1" s="3" t="s">
<v>1</v>
</c>
</row>
<row r="2" spans="1:2">
<c r="A2" s="2" t="s">
<v>2</v>
</c>
<c r="B2" s="4">
<v>19948</v>
</c>
</row>
<row r="3" spans="1:2">
<c r="A3" s="2" t="s">
<v>3</v>
</c>
<c r="B3" s="4">
<v>23760</v>
</c>
</row>
<row r="4" spans="1:2">
<c r="A4" s="2" t="s">
<v>4</v>
</c>
<c r="B4" s="4">
<v>28813</v>
</c>
</row>
<row r="5" spans="1:2">
<c r="A5" s="2" t="s">
<v>5</v>
</c>
<c r="B5" s="4">
<v>29860</v>
</c>
</row>
<row r="6" spans="1:2">
<c r="A6" s="2" t="s">
<v>6</v>
</c>
<c r="B6" s="4">
<v>30560</v>
</c>
</row>
<row r="7" spans="1:2">
<c r="A7" s="2" t="s">
<v>7</v>
</c>
<c r="B7" s="4">
<v>30933</v>
</c>
</row>
<row r="8" spans="1:2">
<c r="A8" s="2" t="s">
<v>8</v>
</c>
<c r="B8" s="4">
<v>31825</v>
</c>
</row>
<row r="9" spans="1:2">
<c r="A9" s="2" t="s">
<v>9</v>
</c>
<c r="B9" s="4">
<v>37623</v>
</c>
</row>
<row r="10" spans="1:2">
<c r="A10" s="2" t="s">
<v>10</v>
</c>
<c r="B10" s="4">
<v>39550</v>
</c>
</row>
<row r="11" spans="1:2">
<c r="A11" s="2" t="s">
<v>11</v>
</c>
<c r="B11" s="4">
<v>40449</v>
</c>
</row>
<row r="12" spans="1:2">
<c r="A12" s="2" t="s">
<v>12</v>
</c>
<c r="B12" s="4">
<v>41859</v>
</c>
</row>
</sheetData>
<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>

It’s an extremely cryptic XML file. Apart from a few tags (col, row, sheetView and sheetData), the XML schema is completely incomprehensible. Where are the dates? Where are the event descriptions?

They are actually there, but I challenge anyone to find them unless they know that Excel stores each date as a sequential number that starts at 1 for 1 January 1900. The count also includes 29 February 1900, a “phantom” day that does not exist in the Gregorian calendar (a standard recognised even by the Chinese and Muslims) but that the program stubbornly continues to treat as real, even though the “leap year bug” was discovered by Bob Bemer in 1958.

Therefore, in Excel, 19948 corresponds to 12 August 1954, whereas a true count of days starting from 1 January 1900 would make it 13 August 1954. The other dates are obviously 23760, 28813, 29860, 30560, 30933, 31825, 37623, 39550, 40449 and 41859. This is intuitive and easy to calculate, and above all it is imposed by the complexity of the document. Bullshit.
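For anyone who wants to check the arithmetic, here is a minimal Python sketch of the conversion. It assumes whole-number serials in Excel’s 1900 date system, and the function name serial_to_date is hypothetical.

```python
from datetime import date, timedelta


def serial_to_date(serial):
    """Convert a whole-number Excel serial (1900 date system) to a real date."""
    if serial < 1:
        raise ValueError("serials start at 1, which stands for 1 January 1900")
    if serial == 60:
        raise ValueError("serial 60 is the non-existent 29 February 1900")
    # Up to serial 59 the count is exact; from 61 onwards it is one day too
    # high, because the phantom leap day has been counted along the way.
    offset = serial if serial < 60 else serial - 1
    return date(1899, 12, 31) + timedelta(days=offset)
```

With this function, serial_to_date(19948) gives 12 August 1954 and serial_to_date(23760) gives 18 January 1965, matching the first two date-value attributes in the LibreOffice file.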

However, the mystery of the event descriptions remains. According to sheet1.xml, they do not exist: the text appears nowhere in the file, and nothing that a reader would recognise as a reference points to it. It is as if the spreadsheet consisted only of the second column.

So, I return to the [Content_Types].xml file and open the XML files in the order they are listed, searching for them in the subfolders.

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
<Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
<Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
<Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
<Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>
<Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/>
</Types>

Just as I was about to give up hope, I finally found the event descriptions in the sharedStrings.xml file. The file contains the following:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="13" uniqueCount="13">
<si>
<t>Event</t>
</si>
<si>
<t>Date</t>
</si>
<si>
<t>Was Born in Umbria</t>
</si>
<si>
<t>Broke Nose in Rome</t>
</si>
<si>
<t>University Degree in Milan</t>
</si>
<si>
<t>First Job at Italian Touring Club</t>
</si>
<si>
<t>Hired by Honeywell and Got 1st PC</t>
</si>
<si>
<t>1st Wedding in Assisi</t>
</si>
<si>
<t>BBC Show Interview in Birmingham</t>
</si>
<si>
<t>Installed OpenOffice</t>
</si>
<si>
<t>Repaired Nose in Rozzano</t>
</si>
<si>
<t>Launched LibreOffice</t>
</si>
<si>
<t>2nd Wedding in Las Vegas</t>
</si>
</sst>

If sheet1.xml was cryptic, this is downright incomprehensible. I would like to know how a reader is supposed to relate the events to the dates in sheet1.xml when the two files use different tags, <t> and <v>. The only cross-reference is implicit: the t="s" attribute on a cell in sheet1.xml means that the number in <v> is a zero-based position in this list of strings, and nothing in either file says so. I challenge anyone to justify this mysterious way of expressing a reference.
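For readers who want to trace the link by hand: in SpreadsheetML, a cell that carries t="s" stores in <v> a zero-based position in the <si> list of sharedStrings.xml, so <v>2</v> in cell A2 points to “Was Born in Umbria”. The sketch below follows that indirection with Python’s standard XML parser. The function names are hypothetical, it assumes both files have been extracted from the package and read into strings, and it handles only plain strings like the ones in this example.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace, as declared on the <sst> root element.
MAIN = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def load_shared_strings(shared_strings_xml):
    """Return the strings of sharedStrings.xml in file order."""
    root = ET.fromstring(shared_strings_xml)
    strings = []
    for si in root.iterfind("m:si", MAIN):
        # Join every <t> inside the item (a plain item has exactly one).
        strings.append("".join(t.text or "" for t in si.iterfind(".//m:t", MAIN)))
    return strings


def resolve_cells(sheet_xml, strings):
    """Map cell references such as 'A2' to their text or raw stored value."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for c in root.iterfind(".//m:c", MAIN):
        v = c.find("m:v", MAIN)
        if v is None:
            continue
        if c.get("t") == "s":
            # Shared string: <v> is an index into the list, not a value.
            cells[c.get("r")] = strings[int(v.text)]
        else:
            cells[c.get("r")] = v.text
    return cells
```

Two files and one lookup table are needed to recover what content.xml states in a single element.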

Unfortunately, the reality is what I have already tried to explain without going into technical detail, as I have done in this post and will do in the next one dedicated to the DOCX case. Microsoft has developed an unnecessarily complex format – if LibreOffice can handle the information more simply, I don’t understand why Microsoft 365 can’t do the same – with the aim of making it extremely difficult to emulate the OOXML format without reverse engineering, which is widely used in some countries.

Furthermore, office suites that use reverse engineering to adopt OOXML as their native format merely help Microsoft to defend its market share by promoting a proprietary format that goes against users’ interests and their right to ownership and control of the content they have developed, generally referred to as digital sovereignty.

Users should learn to protect their rights by choosing an open, standard format such as ODF. This guarantees control over content and all that this entails, including protection of privacy, proper management of sensitive data and the ability to decide what to share and with whom.

This is a format whose development process, features and version are known, and whose description corresponds to what happens on the user’s PC, so even the least technical user can understand when a problem occurs and, in many cases, solve it.

In short, it is the standard document format that we would all like to have, but which only a minority use due to a lack of knowledge about the reality of the OOXML format, and the messianic trust that many have in Microsoft. This leads them to believe that there cannot be a commercial strategy behind the document format that protects the company’s interests at the expense of users.

How to resolve common compatibility issues with ODF files

Troubleshooting opening, formatting, and data loss issues with Open Document Format files

ODF files are great for sharing documents across multiple platforms, but they don’t always work perfectly, especially when using Microsoft Office or other software based on proprietary formats. If you’ve encountered problems opening, editing, or preserving the formatting of .odt, .ods, or .odp files, you’re not alone.

Here’s an overview of the most common compatibility issues with ODF files, along with their solutions.

1. The ODF file does not open in Microsoft Office

Opening an .odt file with Word or an .ods file with Excel either fails or produces a document with formatting errors. Microsoft Office supports ODF, but not always correctly: support has improved in recent versions, but some features still cause difficulties.

There are two solutions: updating Microsoft Office, as compatibility improves with each new version; and converting with LibreOffice, which natively handles ODF files and, in compatibility mode, .docx and .xlsx files much better than Microsoft Office does with .odt and .ods files.

2. Formatting changes during transfer between suites

A file may appear perfect in LibreOffice, but when opened in Microsoft Office, the layout, fonts or spacing may change. This happens because the two programs interpret elements such as text boxes, tables and styles differently. Line spacing and bullet points may also change.

The solution is to use simple formatting in all cases where the file is shared between multiple office suites, avoiding complex layouts, unusual fonts and embedded elements. If formatting is more important than editability, you can use PDF format for the final version.

3. Images and graphics disappear or become corrupted

Images or graphics embedded in the document disappear, become distorted or can no longer be edited when opened with other software. This is because their formats are specific to the software that created the file – and therefore proprietary – and not standard, as is often the case with Microsoft Office.

The solution is to use standard formats, such as PNG or JPG for bitmap images, and SVG for vector images. In some cases, it is advisable to convert images before embedding them in the document and, if possible, simplify them (without altering them).

4. Macros and scripts do not work

Macros written in one suite do not work (or cause errors) in another. This is a known problem: the scripting languages (Microsoft Office VBA and LibreOffice Basic) are specific to their own suites and therefore incompatible with each other.

The solution is to avoid macros when sharing files, and if it is really impossible to do without them, you need to rewrite the scripts for each platform, using the respective languages. Unfortunately, there are no shortcuts or interoperable solutions.

5. Some data is lost when saving in proprietary format

In rare cases, saving an ODF file in a proprietary format causes data loss. Unfortunately, this is a problem due to the artificial complexity of Microsoft Office proprietary files, which use an XML syntax that is very different from the standard in order to limit file interoperability. The solution is to always keep a copy of the original ODF file, because the format is much more robust and, above all, can be recovered by the user in case of file corruption.

In these cases, LibreOffice is the user’s best friend, because it handles ODF files natively and exports clean .docx, .xlsx and .pptx files with XML syntax that never reaches the level of artificial complexity of Microsoft Office.

Final considerations

ODF is the best open standard format for office documents. It is robust and flexible and was created to protect users’ rights thanks to its features that make it independent, interoperable, neutral and perennial. However, this does not mean that it is perfect and easy for developers to implement when the software has not been developed with the same objectives as LibreOffice, as in the case of Microsoft Office.

If problems arise, the key is to know what each office suite can and cannot handle, bearing in mind that LibreOffice was developed with the aim of protecting the interests of users, while proprietary suites were developed to protect the commercial interests of vendors.

The secret is to keep things simple, focusing on the content rather than the appearance of the document. When in doubt, always use the safest format, which is ODF.

Guide to migrating from proprietary formats to ODF

In the digital world, document formats are essential. Proprietary formats such as Microsoft Word’s DOCX or Excel’s XLSX dominate the workplace, but at the same time they lock users into a specific vendor and its business strategies, which tend to exploit users to the maximum in every way. The Open Document Format (ODF) offers an open, standard alternative that protects users and their privacy, promotes interoperability, long-term access and data ownership.

Migrating documents from proprietary formats to ODF is the solution, and although vendors who rely on proprietary formats – not only Microsoft, but also its freeware clones such as OnlyOffice or WPS Office – do everything they can to prevent it, it is very easy and represents a fundamental step forward for users in terms of privacy and digital sovereignty (i.e., ownership of their own content).

This guide breaks down the migration process to make the transition smooth, efficient and sustainable, both at the individual level (where problems are virtually non-existent) and at the enterprise level, where problems exist due to the lock-in strategies of proprietary formats.

Step 1: Understand ODF and its advantages

  • No dependence on a single vendor: freedom to use any compatible software
  • Better long-term accessibility, robustness and stability of storage
  • Transparency and security, thanks to full compliance with open specifications
  • Better interoperability between platforms and tools

Step 2: Document inventory to define conversion priorities and estimate the effort required for migration

  • Identification of file types (DOCX, XLSX, PPTX) and their number
  • Analysis of documents to distinguish between active (used periodically) documents, those that can be archived and obsolete documents
  • Analysis of documents with complex formatting or embedded multimedia content
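The first item of the inventory, counting files by type, is easy to automate. The following is a minimal Python sketch; the function name inventory and the choice to look only at the three extensions named above are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path

# Extensions named in this guide; extend the set for other formats.
PROPRIETARY = {".docx", ".xlsx", ".pptx"}


def inventory(root):
    """Count proprietary-format files under a folder, grouped by extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()  # treat .DOCX and .docx alike
        if path.is_file() and suffix in PROPRIETARY:
            counts[suffix] += 1
    return counts
```

The resulting counts give a first estimate of the conversion effort. Telling active documents from obsolete ones, or spotting complex formatting, still requires looking at the files themselves.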

Step 3: Plan the migration workflow

  • Convert documents in bulk or gradually as needed?
  • Pilot phase with a small group of users to identify any issues with the documents before the mass conversion
  • User training on the migration and creation of a support service for conversions and backup management

Step 4: Converting documents to ODF format

  • Use the LibreOffice export function (‘Save As’)
  • Use batch conversion tools for large volumes (LibreOffice command line scripts)
  • Validate converted files to ensure formatting and data integrity
  • Back up original files until migration is successfully completed
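The batch conversion and a first sanity check can be scripted around the LibreOffice command line. The following is a minimal Python sketch under stated assumptions: it assumes the soffice executable is on the PATH, the function names are hypothetical, and the check only confirms that the result is a ZIP package declaring an ODF media type. It does not verify formatting or data integrity, which still need a human eye.

```python
import subprocess
import zipfile
from pathlib import Path

# Target ODF format for each source extension.
TARGETS = {".docx": "odt", ".xlsx": "ods", ".pptx": "odp"}


def conversion_command(source, outdir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to ODF."""
    target = TARGETS[Path(source).suffix.lower()]
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", str(outdir), str(source)]


def convert_tree(root, outdir):
    """Convert every supported file under root, leaving the originals untouched."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for source in sorted(Path(root).rglob("*")):
        if source.is_file() and source.suffix.lower() in TARGETS:
            subprocess.run(conversion_command(source, outdir), check=True)


def looks_like_odf(path):
    """Return True if the file is a ZIP package that declares an ODF media type."""
    try:
        with zipfile.ZipFile(path) as package:
            mimetype = package.read("mimetype").decode("ascii")
    except (zipfile.BadZipFile, KeyError, UnicodeDecodeError):
        return False
    return mimetype.startswith("application/vnd.oasis.opendocument.")
```

Note that all converted files land in a single output folder, so files with the same name in different subfolders would overwrite each other; converting one folder at a time avoids this.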

Step 5: Monitoring the migration

  • Updating internal policies to make ODF the default format for document creation and sharing, and to prevent a return to proprietary formats
  • Monitoring user feedback and trends in document creation, and resolving issues in a timely manner
  • Integrating ODF support into enterprise software platforms, and using automatic conversions where possible

Conclusion

Migrating from proprietary formats to ODF is a strategic move, both individually and for businesses, towards openness, content control and document protection for the future. In a business environment, it requires careful planning and user involvement, but the benefits in terms of flexibility, interoperability and cost savings are well worth the effort.

Best practices for creating and editing Open Document Format (ODF) files

Adhering to these guidelines can enhance productivity and guarantee that documents remain consistent, robust and accessible over time, irrespective of the platform.

Firstly, use an editor such as LibreOffice that natively supports the format without conversion. This preserves the nuances of the ODF XML structure, supports all its features and reduces the risk of formatting issues or data loss. It also ensures that documents are fully compatible with the ISO standard.

Secondly, use an up-to-date version of LibreOffice to benefit from continuous improvements in ODF feature management, avoid bugs that could cause file corruption (a rare event thanks to the robustness of the ODF format, but still possible) and enjoy the highest level of security in file management.

Thirdly, use LibreOffice document templates and styles for all elements, such as headings, fonts, paragraphs, and tables, to ensure consistent formatting throughout the document. This allows you to make global changes quickly by changing the style rather than each individual element, and improves accessibility, as screen readers and other assistive technologies rely on a consistent structure. This also results in smaller, more robust ODF files.

Creating and reusing LibreOffice templates is an excellent practice for companies that produce many similar documents (such as invoices or monthly reports). Once all the characteristics of the document have been defined, simply save it in ‘template’ format to obtain a blank copy with all repetitive elements already in place.

The fourth recommendation is to save and back up documents frequently and regularly. ODF files are ZIP archives of XML files, which makes them robust and reliable, but not immune to problems. In a business environment, it is advisable to use a cloud storage solution with a version history, such as Nextcloud, which allows you to revert to an earlier version of a file.
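Because an ODF file is a ZIP archive, a backup routine can run a basic integrity check before trusting a copy. The sketch below is one possible approach: it verifies the checksums of the archive entries and looks for a few entries that a typical LibreOffice document contains. The list of expected entries and the function name are my assumptions, not a conformance test.

```python
import zipfile

# Entries found in a typical LibreOffice document (an assumption, not a full conformance check).
EXPECTED_ENTRIES = ("mimetype", "content.xml", "META-INF/manifest.xml")


def check_odf_package(path_or_file):
    """Return a list of problems found; an empty list means the archive read cleanly."""
    problems = []
    try:
        with zipfile.ZipFile(path_or_file) as package:
            bad_entry = package.testzip()  # first entry with a bad CRC or header, else None
            if bad_entry is not None:
                problems.append(f"corrupt entry: {bad_entry}")
            names = set(package.namelist())
            for expected in EXPECTED_ENTRIES:
                if expected not in names:
                    problems.append(f"missing entry: {expected}")
    except zipfile.BadZipFile as exc:
        problems.append(f"not a readable ZIP archive: {exc}")
    return problems
```

An empty result does not prove that the document opens correctly, but a non-empty one is a good reason to restore an earlier version.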

The fifth recommendation is to avoid overly complex formatting. This ensures maximum compatibility when sharing ODF files with a diverse audience or converting them to other formats, such as Microsoft Office proprietary formats, where complicated layouts, embedded objects or macros may not work or may appear differently.

It is recommended that you use basic styles and either the fonts bundled with LibreOffice (open source and available to all users) or fonts that any user can install regardless of operating system, even if they are subject to an End User Licence Agreement (such as Microsoft Aptos, which can only be downloaded from the Microsoft website). You should also avoid excessive use of tables or nested text boxes.

The sixth recommendation is to integrate multimedia content sensibly, optimising images or videos used in presentations to reduce their size without compromising quality.
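Before optimising anything, it helps to know which embedded files are actually taking up space. The following sketch lists the largest entries in a package by uncompressed size; in LibreOffice documents, images usually sit in the Pictures folder of the archive. The function name and the default of five entries are arbitrary choices of mine.

```python
import zipfile


def largest_entries(path_or_file, top=5):
    """Return (name, uncompressed size in bytes) for the largest entries in the package."""
    with zipfile.ZipFile(path_or_file) as package:
        sizes = [(info.filename, info.file_size) for info in package.infolist()]
    # Largest first, limited to the requested number of entries.
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

If one image accounts for most of the file size, resizing or recompressing that single image is usually enough.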

The seventh and final recommendation is to always save the original file in ODF format, even when sharing with users who insist on using Microsoft Office’s proprietary format, thereby handing over ownership of their files to Microsoft. Once the document is finalised, save a copy in OOXML format and share this with Microsoft users.

Similarly, when receiving an OOXML document from a Microsoft user, immediately save a copy in ODF format for editing until the document is finalised and the OOXML copy can be shared again.
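The two conversion steps described above can also be scripted with LibreOffice in headless mode, which is convenient when many files are involved. This is a minimal sketch: the function names are my own, it assumes that the soffice executable is on the PATH, and the file names in the usage note are placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_command(source, target_format, outdir, soffice="soffice"):
    """Build the headless LibreOffice command that converts `source` to `target_format`."""
    return [soffice, "--headless", "--convert-to", target_format,
            "--outdir", str(outdir), str(source)]


def convert(source, target_format, outdir=".", soffice="soffice"):
    """Run the conversion and return the path where the converted copy is expected."""
    if shutil.which(soffice) is None:
        raise RuntimeError(f"{soffice} not found on PATH")
    subprocess.run(conversion_command(source, target_format, outdir, soffice), check=True)
    # The converted file keeps the original base name and takes the new extension.
    return Path(outdir) / f"{Path(source).stem}.{target_format}"
```

For example, convert("report.odt", "docx", "outbox") would produce the copy to share with Microsoft users, while convert("received.docx", "odt", "working") would create the ODF working copy. The original file is left untouched in both cases.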

When sharing a document within a team, it is advisable to use comments to provide feedback rather than editing the body of the document directly, and to enable change tracking so that any changes can be reviewed before they are accepted or rejected. Where possible, collaborate on a shared ODF platform based on LibreOffice technology and the cloud, such as Collabora Online.

Open standard formats such as ODF allow you to avoid dependence on a single supplier, maintain ownership and control of your documents, and future-proof your work — but only if used wisely. Following best practices will enable you to manage ODF documents more smoothly and conveniently without sacrificing any of the advantages of the ISO standard format.

What’s new in ODF 1.3 and 1.4

ODF ensures that documents remain accessible, portable, and free from restrictions. Now that version 1.3 has been widely adopted and version 1.4 is on the horizon, it’s time to have a look at the new features and upcoming releases.

ODF 1.3: What’s New

ODF 1.3 was finalised in January 2021 by OASIS. It introduced a number of long-awaited improvements, particularly in the areas of security, digital signatures, and document integrity.

1. Digital signatures and document security:

One of the most significant enhancements in ODF 1.3 was the formal specification for digital signatures:

  • It now supports XAdES (XML Advanced Electronic Signatures).
  • You can sign entire documents, individual parts (e.g. only spreadsheets), or even multiple sections.
  • Improved metadata provides information about who signed, when and under what circumstances.

This is a significant development for public administrations and organisations that require reliable document verification.
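In an ODF package, document signatures are stored in the META-INF/documentsignatures.xml entry, so their presence can be detected without opening the document in an editor. The sketch below only counts the XML Signature elements it finds there. It does not verify them cryptographically, and the function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

SIGNATURE_ENTRY = "META-INF/documentsignatures.xml"
DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"


def count_signatures(path_or_file):
    """Count the XML signatures stored in the package (presence only, no validation)."""
    with zipfile.ZipFile(path_or_file) as package:
        if SIGNATURE_ENTRY not in package.namelist():
            return 0
        root = ET.fromstring(package.read(SIGNATURE_ENTRY))
    return sum(1 for _ in root.iter(f"{{{DSIG_NS}}}Signature"))
```

A result greater than zero means that the document carries signatures; whether they are valid still has to be checked in LibreOffice or with a dedicated verification tool.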

2. OpenPGP support for encryption:

  • ODF 1.3 offers optional OpenPGP-based encryption in addition to the traditional password-based methods, such as Blowfish.
  • It raises cryptographic standards and integrates better with tools such as GnuPG.
  • It encourages key-based encryption for personal and business documents.
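Whether a package is encrypted can be read from its manifest: encrypted entries carry a manifest:encryption-data child element in META-INF/manifest.xml. The sketch below lists those entries. It does not distinguish between password-based and OpenPGP-based encryption, and the function name is my own.

```python
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def encrypted_entries(path_or_file):
    """Return the paths of the manifest entries that declare encryption data."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))
    namespaces = {"manifest": MANIFEST_NS}
    paths = []
    for entry in root.iterfind("manifest:file-entry", namespaces):
        if entry.find("manifest:encryption-data", namespaces) is not None:
            paths.append(entry.get(f"{{{MANIFEST_NS}}}full-path"))
    return paths
```

An empty list means that the manifest declares no encrypted entries, which is the normal case for an unprotected document.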

3. Change management:

  • It offers greater granularity for change management.
  • It supports change tracking in tables, which was previously a weak point.
  • It improves compatibility with editing tools that handle collaborative workflows.
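Tracked changes are stored in the document itself, so they can be inspected from outside the editor. In text documents, each change is recorded as a text:changed-region element in content.xml, with a child element that names the kind of change. The following sketch counts changes by kind; it covers text documents only, and the function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def tracked_change_counts(path_or_file):
    """Count tracked changes in a text document, grouped by kind of change."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("content.xml"))
    prefix = f"{{{TEXT_NS}}}"
    counts = Counter()
    for region in root.iter(f"{prefix}changed-region"):
        for child in region:
            if child.tag.startswith(prefix):
                # Strip the namespace, leaving e.g. 'insertion' or 'deletion'.
                counts[child.tag[len(prefix):]] += 1
    return dict(counts)
```

This kind of check is useful before sending a finalised document outside the team, when pending changes should normally have been accepted or rejected.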

4. Metadata:

  • Improved management of custom metadata fields using RDF.
  • Greater richness of semantic descriptions of content (e.g. for archival or academic purposes).
  • Encourages integration with knowledge graphs and linked data systems.
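As a simple illustration of custom metadata, the sketch below reads the user-defined fields that an editor stores in meta.xml as meta:user-defined elements. It deliberately leaves out the RDF-based metadata mentioned in the list above, which is stored separately and needs an RDF parser. The function name is my own.

```python
import zipfile
import xml.etree.ElementTree as ET

META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def custom_metadata(path_or_file):
    """Return the user-defined metadata fields of a document as a name-to-value mapping."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("meta.xml"))
    fields = {}
    for element in root.iter(f"{{{META_NS}}}user-defined"):
        fields[element.get(f"{{{META_NS}}}name")] = element.text or ""
    return fields
```

An archive or document management system could use a function like this to index documents by project, department or retention period, provided that those fields are filled in consistently.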

5. Other changes:

  • New chart types and charting features.
  • Improved text formatting options.
  • Improved compliance with accessibility standards.

ODF 1.3 distinguishes two conformance levels: Strict, for clean documents that comply fully with the specification, and Extended, which allows vendor-specific enhancements for broader feature support.
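The ODF version that a document declares can be read from the office:version attribute on the root element of content.xml. The sketch below returns that value. Note that it only reports the declared version and does not tell you whether the document uses extensions; the function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def declared_odf_version(path_or_file):
    """Return the office:version attribute of content.xml, or None if it is absent."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("content.xml"))
    return root.get(f"{{{OFFICE_NS}}}version")
```

This is a quick way to find documents in an archive that were saved with an older ODF version and might benefit from being resaved.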

What’s new in ODF 1.4

ODF 1.4 is still under active development, with the first drafts already implemented in the latest versions of LibreOffice. Although the specifications are not final, the following is planned:

1. Change tracking:

  • Support for tracking style changes (e.g. switching from bold to italic).
  • Better differentiation between insertions, deletions and formatting changes.
  • Change IDs and support for real-time conflict resolution for collaborative editing.

2. Charts:

  • More flexibility in charts, including custom colours, gradient fills, multiple axes and formatted data labels.
  • Better alignment with modern expectations and improved interoperability with Excel.

3. Accessibility:

  • Clearer semantics for assistive technologies.
  • Improved navigation for screen readers.
  • Structural tags for headings, lists and tables make documents easier to analyse programmatically.

4. Form controls:

  • More robust form field types, such as date pickers, drop-down menus and sliders.
  • Better interaction support for forms within spreadsheets and presentations.
  • Cross-platform consistency.

5. Improved spreadsheet features:

  • Native support for named ranges in the sheet.
  • Improved formula representation for functions in edge cases.
  • More complex conditional formatting rules.

6. Compatibility:

  • Mapping of Microsoft Office formats (DOCX, XLSX and PPTX) to reduce conversion issues.
  • Improved handling of embedded media and OOXML-style layouts.

Final considerations

ODF 1.3 represented a significant advancement in terms of security and interoperability. ODF 1.4 adds usability improvements, more modern features, and better alignment with current office suite trends.

With an increasing number of governments and organisations adopting open standards, the evolution of ODF is crucial. The focus is not on competing with Microsoft; it’s about ensuring that your documents remain yours.