A large free and open source software project like LibreOffice requires a lot of infrastructure, to support our users, developers and worldwide community. Today we speak to Guilhem Moulin, who is in charge of TDF’s infrastructure and services, about new developments and how others can get involved…
To start, please give us a quick overview of TDF’s public infrastructure.
The public infrastructure is powered by about 50 Kernel-based Virtual Machines (KVM) spread across 4 hypervisors plugged into an internal 10 Gbps switch, hosted at Manitu in St. Wendel (Germany), and managed with libvirt and its KVM/QEMU driver. The virtual disk images are typically stored in GlusterFS volumes, distributed across the hypervisors, except for some transient disks (such as caches) where the IOPS demands are higher and redundancy matters less.
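To illustrate how these pieces fit together: libvirt can attach a guest's disk image directly from a GlusterFS volume over the network. The fragment below is a minimal sketch of such a `<disk>` element; the volume name, image name and host are invented for illustration, not TDF's actual configuration.

```xml
<!-- Hypothetical libvirt <disk> element: a qcow2 image served from a
     GlusterFS volume named "vmstore" hosted on one of the hypervisors. -->
<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source protocol='gluster' name='vmstore/guest1.qcow2'>
    <host name='hypervisor1.example.org' port='24007'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```

With this setup the qemu process talks to Gluster directly, so a guest can be live-migrated between hypervisors without moving its disk image.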
All our public VMs run Debian GNU/Linux (currently a mix of Jessie, which are due to be upgraded, and Stretch), each typically hosting a single service for better isolation. The rest of the stack is fairly standard: systemd as PID 1 and service manager, a mix of MySQL and PostgreSQL as RDBMS, and nginx as SSL/TLS endpoint and reverse HTTP proxy. All of this is orchestrated and managed using SaltStack.
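As a sketch of the nginx role just described, a minimal TLS-terminating reverse-proxy server block might look like the following; the hostname, certificate paths and backend address are hypothetical, not taken from TDF's configuration.

```nginx
# Hypothetical nginx front-end: terminate TLS and proxy plain HTTP
# to a service running on a backend VM.
server {
    listen 443 ssl http2;
    server_name wiki.example.org;                     # invented hostname

    ssl_certificate     /etc/ssl/wiki.example.org.pem;
    ssl_certificate_key /etc/ssl/private/wiki.example.org.key;

    location / {
        proxy_pass http://10.0.0.12:8080;             # invented backend address
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Terminating TLS at a single proxy layer keeps certificates and cipher policy in one place, while the per-service VMs behind it only ever speak plain HTTP on the internal network.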
About half of our Virtual Machines host public-facing websites; the other half are used for test instances, various production backends and internal services, as well as for tinderboxes and other hacking VMs. Some of these websites are mostly useful for developers, such as our Bugzilla or Gerrit instances — an overview of the development-focused sites can be found at https://devcentral.libreoffice.org. The remaining sites include the main LibreOffice website, the download page, the Wiki, Askbot, and of course the blog.
Besides these VMs, we also operate a handful of other machines for backups, monitoring, and mail systems, which are hosted offsite for obvious reasons.
What have been the most significant infra developments in the last six months?
Single Sign On (SSO) is probably what's been the most visible to the community. Traditionally, each frontend (Wiki, Bugzilla, Askbot, etc.) had its own private authentication backend, so anyone signing in to multiple services had to remember multiple sets of credentials, which is cumbersome and makes password and email rotation difficult.
We now have a central authentication system (which uses an LDAP DIT as its backend), but we aren't pointing individual services at it directly, as that would 1) expose the shared credentials to every service and thereby increase the attack surface; and 2) still require users to enter their password at each service individually. Instead we're deploying a solution based on the SAML 2.0 protocol: unauthenticated users are redirected to an authentication portal, sign in there, and are then redirected back to the protected page.
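The interview doesn't say which SAML 2.0 Service Provider implementation sits in front of each service, so purely as an illustration of the redirect flow described above, here is roughly what protecting a path with Apache's mod_auth_mellon module looks like; all paths and the endpoint name are invented.

```apache
# Hypothetical mod_auth_mellon setup: requests to /protected are
# redirected to the SAML Identity Provider (the central portal) and
# only served once the IdP has vouched for the user.
<Location />
    MellonEnable "info"
    MellonEndpointPath "/saml"
    MellonSPPrivateKeyFile /etc/apache2/mellon/sp.key
    MellonSPCertFile       /etc/apache2/mellon/sp.cert
    MellonIdPMetadataFile  /etc/apache2/mellon/idp-metadata.xml
</Location>

<Location /protected>
    AuthType Mellon
    MellonEnable "auth"
    Require valid-user
</Location>
```

The key point is that the service itself never sees the user's password: it only receives a signed assertion from the portal, which is exactly what keeps the shared credentials out of the individual frontends.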
Not all services have been migrated to SSO yet. One issue is that we have to unify accounts (people use different usernames on different services); and while we want a "critical mass" of active user accounts in LDAP before migrating a service, it's been rather difficult to reach out to people (even among TDF officials!) and convince them to create an account in the new system. Fortunately, since we migrated our wiki's authentication system, more and more people have started using the new system (including, unfortunately, a lot of dormant accounts that are probably spammers).
While it's only visible to infra team members, we also replaced our Graphite (+ Carbon + Icinga2) based monitoring system with Prometheus (+ data exporters + alert manager). Furthermore, still on the monitoring front but public this time, we just deployed a new service, CachetHQ, which shows a quick overview of TDF's infra status.
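To give a flavour of the Prometheus + Alertmanager combination mentioned above, here is a minimal, hypothetical alerting rule (not TDF's actual rules) that fires when a scrape target has been unreachable for five minutes; Alertmanager then takes care of routing and delivering the notification.

```yaml
# Hypothetical Prometheus rule file: alert when a target's "up" metric
# (set by Prometheus itself at scrape time) has been 0 for 5 minutes.
groups:
  - name: availability
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is unreachable"
```

The `for: 5m` clause is what distinguishes a transient scrape failure from a real outage, so a single dropped scrape doesn't page anyone.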
Last but not least, earlier this spring we were fairly busy with GDPR compliance.
What are you working on at the moment, and what are your plans for the next six months?
Aside from daily maintenance and occasional emergencies (system crashes, hardware hiccups, performance issues, etc.), infra team members still spend quite a lot of time on the above, as it is not completely finished yet. Projects for next year include working on a better backup solution, in particular regarding database snapshots. The data collection system for download metrics also needs some improvement.
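One small building block of any database-snapshot scheme is dump retention. The sketch below is a hypothetical example (the function name, file-naming convention and directory layout are invented, and this is not TDF's actual tooling): it keeps only the newest N dumps in a directory and deletes the rest.

```shell
# Hypothetical dump-rotation helper: keep only the $2 newest files
# matching db-*.dump under directory $1, delete the rest.
rotate_dumps() {
    dir=$1
    keep=$2
    # List matching dumps newest-first, skip the first $keep lines,
    # and remove whatever remains.
    ls -1t "$dir"/db-*.dump 2>/dev/null | tail -n +"$((keep + 1))" | \
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```

A real deployment would pair this with the dump step itself (e.g. `pg_dump -Fc` for PostgreSQL or `mysqldump` for MySQL) and off-site replication of the resulting archives.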
Finally, what cool things can new volunteer admins do to get involved and help the project?
We have a wide variety of systems, ranging from highly sensitive (elections, internal mail, the LDAP DIT, whitebox monitoring) to pretty much fully public apart from the access logs (the Bitergia dashboard, blackbox monitoring). We can't give everyone upfront access to the sensitive end of the spectrum, but there are things to help with at the other end too (developer-focused services are typically less sensitive, since development is open anyway).
Sometimes we also start fresh and replace a service with something equivalent on a brand new box; in that case there is no sensitive data at stake, and it’s a great way for new volunteer admins to gain trust. I mentioned the monitoring migration earlier; we could also imagine replacing our ageing MirrorBrain deployment with a more modern solution like Mirrorbits, for instance.
Thanks to Guilhem for his time and help. If you’re interested in joining our infra community and gaining valuable experience in a large FOSS project, see here to get started!