Behind the scenes at TDF: Infrastructure

With the beginning of 2015, a new year packed with exciting projects and ideas around LibreOffice and The Document Foundation, we today finish our behind-the-scenes series to share our 2014 achievements with our community and our generous donors, to whom we’d like to express our sincerest gratitude for their wonderful support and invaluable contributions!

I’m Alexander Werner, and I have been responsible for the infrastructure of The Document Foundation on a contracted basis since March 2014. I have been with the project since its foundation in 2012 and have long been a supporter of free and open source software. As a volunteer I helped set up and maintain our first server and optimize it to handle the load of the first days.
The infrastructure is one of the most important things The Document Foundation provides for the community. As long as every part is working as expected, it is basically invisible. It is my job to make sure that this is always the case, mostly by orchestrating the different services on our growing number of virtual machines.

When the LibreOffice fork began, we started with only one server hosting all services – mailing lists, both private and public, the website, mirror management, the wiki and many more. As time went by, this server survived its first slashdotting, but it soon became clear that more power was needed, so our infrastructure started growing organically as more and more servers were added. Our admins specialized in different parts of the infrastructure, while the whole configuration was centrally documented in a single ODT file.

It soon became clear that this was not a viable solution – our quest for infra 2.0, as we call it internally, began. The admin team worked under rapidly escalating load while looking for ways to optimize resource usage, include new volunteers, and improve configuration documentation and management. High availability of services also became increasingly important. In our sparse free time we started creating concepts, tested HA with DRBD, Pacemaker and Heartbeat, evaluated different solutions for centralized documentation and started using tools for centralized configuration management.

We soon realized that we needed more flexibility than the HA setup described above could offer, so as an interim solution we started virtualizing services – first in lightweight LXC containers and then in fully virtualized KVM guests. For the infrastructure documentation I suggested using the documentation generator Sphinx: the source files – human-readable RST text files – live in a git repository, and the online documentation is rebuilt automatically on every push. For configuration management and deployment, I eventually stumbled upon SaltStack.
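To illustrate the rebuild-on-push idea, here is a minimal sketch of a git post-receive hook written in Python. The repository checkout and output paths are invented for the example and do not reflect our actual layout.

```python
#!/usr/bin/env python
# Hypothetical git post-receive hook: rebuild the Sphinx documentation
# whenever new commits are pushed. The paths are placeholders.

import subprocess

CHECKOUT = "/srv/infra-docs/checkout"   # working copy of the docs repository
BUILDDIR = "/srv/www/infra-docs/html"   # directory served by the web server

def main():
    # Update the working copy to the newly pushed state.
    subprocess.check_call(["git", "--git-dir", CHECKOUT + "/.git",
                           "--work-tree", CHECKOUT, "pull", "--ff-only"])
    # Let sphinx-build turn the RST sources into HTML.
    subprocess.check_call(["sphinx-build", "-b", "html", CHECKOUT, BUILDDIR])

if __name__ == "__main__":
    main()
```

A hook like this only needs to be made executable and dropped into the hooks directory of the repository on the server; after that, every push refreshes the published documentation.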

My daily work consists of various small recurring tasks – helping people with mailing list troubles, adding and removing mirrors in MirrorBrain, installing updates and doing the necessary reboots – as well as handling unexpected incidents such as the Heartbleed bug.

In spring I started working on our Salt states, made them more reliable and made sure that all user accounts are now managed by Salt. I also set up a new virtualization host with VMs for Gerrit, Jenkins, Bugzilla and Plone. Apart from that, I worked on improving the documentation of our services and on finding undocumented and unused ones.
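As a rough illustration of what "user accounts managed by Salt" means, here is a minimal sketch of a state using Salt's Python renderer. The account names, UIDs and groups are invented for the example; they are not our real accounts.

```python
#!py
# Hypothetical Salt state (Python renderer) ensuring a set of admin
# accounts exists on every managed machine. All names are examples.

def run():
    """Return state declarations for each admin user."""
    admins = {
        "alice": {"uid": 1001, "groups": ["sudo", "infra"]},
        "bob":   {"uid": 1002, "groups": ["infra"]},
    }
    states = {}
    for name, opts in admins.items():
        states["user_" + name] = {
            "user.present": [
                {"name": name},
                {"uid": opts["uid"]},
                {"groups": opts["groups"]},
                {"shell": "/bin/bash"},
            ],
        }
    return states
```

In practice such states are usually written as declarative YAML SLS files; the Python renderer is shown here only to keep all examples in this post in one language.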

I also worked on our AskBot setup. While I had set up the initial AskLibO instance, it was decided to contract Evgeny Fadeev, the primary developer of AskBot, to develop additional features needed by our community, which will then be made available upstream again. Beyond that, I made some changes myself, such as enabling the newly developed multilanguage support, fixing template bugs and administering the list of moderators.
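For readers unfamiliar with AskBot: it is a Django application, so features like this are typically switched on in the site's settings file. The excerpt below is only a sketch under that assumption – the exact option names can differ between AskBot versions.

```python
# Hypothetical excerpt from an AskBot (Django) settings.py.
# Assumption: the multilingual feature is enabled via a settings flag;
# exact names may differ between AskBot versions.

USE_I18N = True                 # standard Django internationalization switch
ASKBOT_MULTILINGUAL = True      # separate question spaces per language

# Languages offered to askers and answerers on the site (example list).
LANGUAGES = (
    ("en", "English"),
    ("de", "Deutsch"),
    ("fr", "Français"),
)
```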

Besides my ongoing work to improve the Salt states and to bring more not-yet-managed servers into our Salt infrastructure, I also continued to consolidate various documentation sources into our centralized repository.

I also worked on a download counter that will be useful to track all our downloads by language, location, version and operating system.
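To give an idea of how such a counter can work, here is a small Python sketch that aggregates requests from a web server access log. The log format and the filename pattern are assumptions for the example, not our actual setup.

```python
#!/usr/bin/env python
# Hypothetical download counter: aggregate access-log entries by
# version, operating system and language. Filename pattern is assumed,
# e.g. LibreOffice_4.3.5_Win_x86_langpack_de.msi

import collections
import re

PATTERN = re.compile(
    r"LibreOffice_(?P<version>[\d.]+)_(?P<os>Win|Linux|MacOS)\S*?"
    r"(?:_(?P<lang>[a-z]{2}(?:-[A-Z]{2})?))?\.\w+"
)

def count_downloads(logfile):
    counts = collections.Counter()
    with open(logfile) as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                counts[(match.group("version"),
                        match.group("os"),
                        match.group("lang") or "all")] += 1
    return counts

if __name__ == "__main__":
    for key, value in sorted(count_downloads("access.log").items()):
        print(key, value)
```

Location could be derived from the client address in the same log lines, for example with a GeoIP lookup.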

But the most interesting, time-consuming and fascinating part of my work was the planning, testing and setup of our new cluster/cloud infrastructure. As it was decided to virtualize all of our services, I looked for a solution that is easy to manage and maintain yet provides powerful tools for easily creating highly available services.

After quite some time of evaluation I decided to go for oVirt – a KVM-based virtualization solution that provides a nice out-of-the-box experience; the simplicity of its setup was worlds apart from other solutions. It also makes it possible to provide fully highly available services with only two nodes, by having the management engine run as a VM on the platform itself.

During the evaluation I was also in contact with hardware suppliers and hosting providers, and after a good offer from manitu we decided to host our new platform on two large dedicated servers, each with 256 GB of RAM and 64 CPU cores. By the end of the year, over 20 virtual machines had been migrated, and a third node was ordered that will be used primarily for crash testing and to increase the stability of the platform even further.

If you are interested in learning more about our infrastructure or in helping out, consider subscribing to the website mailing list, where infra calls are announced, or write a mail to alex@documentfoundation.org.
