noble upgrade architecture

Kunal Mehta edited this page Jan 22, 2025 · 2 revisions

These notes accompany https://github.com/freedomofpress/securedrop/pull/7406

Upgrading between Ubuntu LTS releases is a straightforward process, and Ubuntu provides a tool for it. However, that tool requires that you not skip a release, while we want to skip over 22.04 and go from 20.04 (focal) straight to 24.04 (noble).

Ubuntu also unofficially supports upgrading the "Debian" way, which is to use APT to do everything. Some migrations don't get applied this way, but given that SecureDrop instances are homogeneous and rather small, we're taking care of all of those actions ourselves.

The noble migration upgrade script is designed to be run fully automatically. It is run using systemd, which provides logging (journald), process control and daemonization.

Overview

The script is broken down into a series of sequential steps. A systemd timer starts the script every 3 minutes, including after reboots or crashes. The current step is stored in a state file so the upgrade can be safely resumed. Each step is internally idempotent, so if the script crashes midway through a step, it is safe to resume from the beginning of that step.
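A minimal sketch of that resume logic, assuming a plain-text state file that holds only the current step name. The step names, the file format, and every function name here are hypothetical and are not taken from the actual script.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical step names; the real script's steps may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Prepare,
    Backup,
    Upgrade,
    Done,
}

impl Step {
    fn as_str(self) -> &'static str {
        match self {
            Step::Prepare => "prepare",
            Step::Backup => "backup",
            Step::Upgrade => "upgrade",
            Step::Done => "done",
        }
    }

    fn parse(s: &str) -> Option<Step> {
        match s.trim() {
            "prepare" => Some(Step::Prepare),
            "backup" => Some(Step::Backup),
            "upgrade" => Some(Step::Upgrade),
            "done" => Some(Step::Done),
            _ => None,
        }
    }

    fn next(self) -> Step {
        match self {
            Step::Prepare => Step::Backup,
            Step::Backup => Step::Upgrade,
            Step::Upgrade | Step::Done => Step::Done,
        }
    }
}

/// Read the current step; a missing file means we start at the first step.
fn load_step(path: &Path) -> io::Result<Step> {
    match fs::read_to_string(path) {
        Ok(text) => Step::parse(&text)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "unknown step")),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Step::Prepare),
        Err(e) => Err(e),
    }
}

/// Persist the step by writing a temporary file and renaming it into place.
fn save_step(path: &Path, step: Step) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, step.as_str())?;
    fs::rename(&tmp, path)
}

/// Run steps starting from the saved position. The state file only advances
/// after a step succeeds, so a crash mid-step re-runs that step from its start.
fn run<F>(path: &Path, mut do_step: F) -> io::Result<()>
where
    F: FnMut(Step) -> io::Result<()>,
{
    let mut step = load_step(path)?;
    while step != Step::Done {
        do_step(step)?;
        step = step.next();
        save_step(path, step)?;
    }
    Ok(())
}
```

Because the state file is only updated after a step returns successfully, each timer-triggered run either picks up at the step that failed or finds `done` and exits immediately, which is the quiet no-op case.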

The timer fires this frequently so that the script restarts quickly after a reboot or a crash. The no-op case is pretty quiet and light. (It probably could be every minute, but then I didn't have enough time between runs to manually poke at things while debugging.)

Triggers and bucketing

A configuration file (/usr/share/securedrop/noble-upgrade.json), shipped by the securedrop-config package, provides instructions on when to actually begin the upgrade.

By default the script is started by the timer, but doesn't start the upgrade. On the first run, when it generates the state file, it randomly generates a number between 1 and 5 (inclusive) and saves it as that instance's bucket.

When we want to begin the automated migration, we'll increment the bucket in the noble-upgrade.json file and ship a new version of the securedrop-config package. If the bucket includes the instance, it'll begin the upgrade process. Otherwise it'll continue to do nothing. Note that the bucketing is private and not visible to FPF until after an app server has finished upgrading. mon bucketing will always be private.

Once an instance has begun the upgrade, it will ignore the noble-upgrade.json file and continue the upgrade regardless.
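The decision described in this section might look like the following sketch. The `State` struct, the function names, the use of `/dev/urandom`, and the assumption that an instance is "included" when its own bucket is less than or equal to the configured bucket are all illustrative guesses, not the script's actual code.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Buckets run from 1 to 5 inclusive.
const MAX_BUCKET: u8 = 5;

/// Hypothetical in-memory view of the state file.
struct State {
    /// Chosen randomly once, on the first run.
    bucket: u8,
    /// Set once the upgrade has begun.
    started: bool,
}

/// Map a random byte to a bucket. 255 is rejected so that the remaining
/// 255 values (0..=254) split evenly across the 5 buckets.
fn bucket_from_byte(b: u8) -> Option<u8> {
    if b < 255 {
        Some(b % MAX_BUCKET + 1)
    } else {
        None
    }
}

/// Pick a bucket using the kernel's random source (assumed to be available).
fn random_bucket() -> io::Result<u8> {
    let mut source = File::open("/dev/urandom")?;
    let mut buf = [0u8; 1];
    loop {
        source.read_exact(&mut buf)?;
        if let Some(bucket) = bucket_from_byte(buf[0]) {
            return Ok(bucket);
        }
    }
}

/// `enabled_bucket` is the value read from noble-upgrade.json. Treating 0 as
/// "nobody upgrades yet" is an assumption about the shipped default.
fn should_upgrade(state: &State, enabled_bucket: u8) -> bool {
    // Once the upgrade has started, the configuration file is ignored.
    state.started || state.bucket <= enabled_bucket
}
```

Under this reading, setting the configured bucket to 5 includes every instance, which matches how manual upgrades kick things off.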

Manual upgrades will edit this configuration file to set the bucket to 5 to kick off the upgrade.

APT notes

At a super duper high level, this script is just a wrapper around APT. Thankfully APT is generally pretty good at what it does; the main issue is ensuring nothing else gets in our way. As the first preparatory step, we run unattended-upgrades to clear any pending updates, then mask all of the systemd services related to APT and unattended-upgrades, and reboot. This should be sufficient for our script to be the only thing using APT.

But even if there is lock contention, the script will gracefully fail and restart 3 minutes later.
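As a rough illustration of the preparatory step and the fail-and-retry behaviour, the sketch below builds the list of commands first and then runs them, returning an error on the first failure so the process can exit and let the timer try again. The unit list is an assumption (these are standard Ubuntu APT units, but the script may mask a different set), and `prepare_argv` and `run_plan` are hypothetical names.

```rust
use std::io;
use std::process::Command;

/// Assumed list of APT-related units; the real script's list may differ.
const APT_UNITS: &[&str] = &[
    "apt-daily.timer",
    "apt-daily.service",
    "apt-daily-upgrade.timer",
    "apt-daily-upgrade.service",
    "unattended-upgrades.service",
];

/// Build the preparatory commands in order: clear pending updates,
/// mask the APT units, then reboot.
fn prepare_argv() -> Vec<Vec<&'static str>> {
    let mut plan = vec![vec!["unattended-upgrade"]];
    for unit in APT_UNITS {
        plan.push(vec!["systemctl", "mask", *unit]);
    }
    plan.push(vec!["systemctl", "reboot"]);
    plan
}

/// Run each command in turn. Any failure (for example, APT lock contention)
/// is returned as an error instead of being retried in-process; the caller
/// exits and the systemd timer starts a fresh run a few minutes later.
fn run_plan(plan: &[Vec<&str>]) -> io::Result<()> {
    for argv in plan {
        let status = Command::new(argv[0]).args(&argv[1..]).status()?;
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("{} failed: {}", argv.join(" "), status),
            ));
        }
    }
    Ok(())
}
```

Keeping retries out of the process itself leans on the idempotent steps and the timer: there is no backoff loop to get wrong, and every attempt shows up as a separate run in the journal.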

We pass --force-confold to dpkg when we do the upgrade. In theory this ensures that if we've edited or overridden a configuration file (anything in /etc/), APT will preserve our version. This mostly works, but it doesn't for /etc/iptables/rules.v{4,6}, which I think is because the upgrade switches from the iptables-persistent package to netfilter-persistent. In that one case we just restore our customizations manually.
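For reference, one common way to hand --force-confold to dpkg through APT is the `Dpkg::Options::` configuration item. The sketch below only builds the command; the choice of `dist-upgrade`, the `--yes` flag, and the `DEBIAN_FRONTEND` setting are assumptions about a typical non-interactive invocation, not a copy of what the script runs.

```rust
use std::process::Command;

/// Build (but do not run) an apt-get invocation that keeps locally
/// modified configuration files when a package ships a newer version.
fn upgrade_command() -> Command {
    let mut cmd = Command::new("apt-get");
    // Avoid interactive debconf prompts during an unattended run.
    cmd.env("DEBIAN_FRONTEND", "noninteractive")
        // APT forwards this option to dpkg.
        .args(["-o", "Dpkg::Options::=--force-confold"])
        .args(["--yes", "dist-upgrade"]);
    cmd
}
```

Calling `upgrade_command().status()` would execute it; keeping construction separate from execution makes the arguments easy to inspect or log beforehand.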

One difference from the documented Debian way is that we don't do a two-step upgrade; we just do it all in one go. The main issue is that some versioned package dependencies are missing, largely because we are jumping an LTS version. We started setting some of them explicitly (like apparmor) in our own package, but the list became long. Doing one pass means there's more in flight at once, but it seems to work much better.

OSSEC notifications

The upgrade script only runs on a single host, and when it runs on app, it has no straightforward way to communicate with mon. So in the app case, it will trigger a ton of OSSEC notifications. For mon we temporarily raise the OSSEC alert level so most of them will go through.

For the manual upgrade initiated by the admin workstation, we can have it stop notifications via mon before kicking off the app upgrade.

SecureDrop data

After standard system updates but before we do anything else, we take down the apache2 service and begin creating a backup. Taking down the web services ensures that no data will be changing (e.g. new sources or journalist replies) while the upgrade happens. The backup is stored on the app server; if something goes terribly wrong an admin can obtain it and use it to restore as a fresh install.

The various background services like the shredder and rq are not paused, under the assumption that if they are still doing anything, it should finish by the time the package upgrade restarts them, and that if a journalist wanted something deleted, we shouldn't delay that.

The backup is deleted as the final step, after apache2 is turned back on.
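The ordering in this section can be sketched with a small dry-run harness. The `System` trait, the `DryRun` type, and the action labels are hypothetical placeholders rather than real commands; the point is only that the backup is deleted last and survives if anything before it fails.

```rust
/// Abstracts "do something to the system" so the ordering can be dry-run.
trait System {
    fn run(&mut self, action: &str) -> Result<(), String>;
}

/// Action labels are placeholders, not real commands.
fn upgrade_with_backup<S: System>(sys: &mut S) -> Result<(), String> {
    sys.run("stop apache2")?; // no new submissions or replies from here on
    sys.run("create backup")?;
    sys.run("upgrade packages")?;
    sys.run("start apache2")?;
    // Only reached if everything above succeeded.
    sys.run("delete backup")
}

/// Dry-run implementation that records actions and can simulate a failure.
struct DryRun {
    log: Vec<String>,
    fail_on: Option<&'static str>,
}

impl System for DryRun {
    fn run(&mut self, action: &str) -> Result<(), String> {
        self.log.push(action.to_string());
        if self.fail_on.map_or(false, |f| f == action) {
            return Err(format!("{} failed", action));
        }
        Ok(())
    }
}
```

With `fail_on: Some("upgrade packages")`, the recorded log stops before "delete backup", so the backup would still be on the app server for an admin to retrieve.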

Rust

The script was originally written in Python, but was ported to Rust to avoid any complications during the upgrade procedure itself, when Python 3.8 is replaced by Python 3.12. The Rust script is a statically compiled binary that only links to glibc (if glibc ends up broken, the whole system is doomed).