The artificial complexity of OOXML files (the DOCX case)
The complexity of the OOXML format is a product of its design: it was deliberately created to make the format more difficult for non-Microsoft software developers to implement. Compatibility issues are caused by a veritable “maze” of tags used even for the simplest content, which binds users to the Microsoft ecosystem in what is the first example of standard-based lock-in.

The DOCX case

To demonstrate the difference in complexity between the XML schemas of Writer and Word text documents in the ODF and OOXML formats, I used two classic English theatre plays: William Shakespeare’s Hamlet and Oscar Wilde’s The Importance of Being Earnest. I downloaded the text versions of these works from Project Gutenberg (a library of classic texts for which US copyright has expired) and deleted the introductions and conclusions added by Project Gutenberg, without making any other changes.

I followed the same process for both works: I copied all the text and pasted it, without any formatting, into two newly created blank documents, one in Writer and one in Word. For Writer, I used the template that I usually use for unstructured documents; for Word, I did not use a template. This means that, in Writer, the XML schema contains information about the template (margins, paragraph
