Tech:Server lifecycle

This page describes the stages a Miraheze server goes through. Following this guide ensures transitions between stages follow existing policies and best practices, keep data protection in mind, and keep monitoring noise and the potential for downtime to a minimum.

Requesting
Anyone on the Technical Team managing services, such as MediaWiki Engineers and Site Reliability Engineers, can request a server. The request should be tracked in Phabricator and assigned to the Director of Site Reliability Engineering. In emergencies, a Senior Site Reliability Engineer may approve a purchase without consulting the Director of Site Reliability Engineering. If approved, the approver will purchase the server, or authorise a Site Reliability Engineer to do so.

The need for approval also applies to requesting servers (VMs) on the Proxmox hosts, because additional VMs consume resources that could otherwise be used for other services or projects.

Installing
These steps must be performed in order. This list is not exhaustive, but applies to all servers. Certain servers, such as Proxmox hosts, may require an adjusted procedure.
 * 1) Add an entry for the server to the miraheze.org DNS zone. If possible, also set up reverse DNS for the IPs.
 * 2) Change the hostname of the server. The hostname must be under the miraheze.org domain. If you cannot do this via the Service Provider, run the command   via the console.
 * 3) Log in via the console, KVM, or whatever the Service Provider calls it. In most cases, you will have received the password via email. Never share root passwords with other people.
 * 4) Most servers are accessible via SSH by default. In that case, you may find it easier to work via PuTTY or a similar client. To do that, dump the fingerprint of the SSH host key. For PuTTY,   seems to be appropriate.
 * 5) When connecting, verify the fingerprint matches. If so, you can proceed with the rest of the steps.
 * 6) Add the fingerprint to Tech:SSH fingerprints. Do this early, so you don't forget.
 * 7) Configure the server via Puppet: Tech:Puppet
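Steps 4–6 hinge on comparing fingerprints correctly. As an illustration, this Python sketch computes the OpenSSH-style SHA256 fingerprint of a public host key line (the same value `ssh-keygen -lf` prints); treat it as a sketch, not a replacement for the standard tooling.

```python
import base64
import hashlib

def sha256_fingerprint(pubkey_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a host key line
    (format: '<key-type> <base64-blob> [comment]'), matching the value
    printed by `ssh-keygen -lf`."""
    key_blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH strips the trailing '=' padding from the base64-encoded digest
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
```

Compare the result against the fingerprint shown on first connection; only proceed (and record it on Tech:SSH fingerprints) if they match.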

Decommissioning
Decommissioning a server means the server will be fully removed from the Miraheze infrastructure. The server must be cancelled (via OVH/RamNode control panel) or deleted from Proxmox (in the case of a VM). Its hostname may not be reused.


 * 1) Depool the server from the services it is used for. If the server is a master, fail over to a replica or secondary server.
 * 2) Set downtime in Icinga for the server and all of its services, to avoid unnecessary Icinga alerts for the server.
 * 3) Ensure the server is removed from the Puppet CA and database.
 * 4) Remove all references to the server from manifests/site.pp. If the hostname and/or IP address is defined in other code (hiera variables, mw-config/Database.php, etc), remove those references as well.
 * 5) Manually remove any traces of PII or other confidential information. On most systems,   does most of the job. If the server was used for database hosting (e.g. MariaDB) or file hosting, ensure that data is removed as well.
 * 6) Cancel the service via the OVH or RamNode control panel. If the server is a Proxmox VM, fully remove the server from the Proxmox inventory.
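Step 3 (removing the server from the Puppet CA and PuppetDB) can be sketched as follows. The exact subcommands depend on the installed Puppet version, so verify them against your setup before running.

```python
import subprocess

def puppet_cleanup_commands(fqdn: str) -> list[list[str]]:
    """Commands run on the Puppet server to remove a decommissioned host.
    `puppetserver ca clean` revokes and removes the host's certificate;
    `puppet node deactivate` marks the host as deactivated in PuppetDB."""
    return [
        ["puppetserver", "ca", "clean", "--certname", fqdn],
        ["puppet", "node", "deactivate", fqdn],
    ]

def run_cleanup(fqdn: str) -> None:
    # Run as root on the Puppet server; check=True aborts on any failure.
    for cmd in puppet_cleanup_commands(fqdn):
        subprocess.run(cmd, check=True)
```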

Reimage
Reimaging a server means the server will be kept in use, but a new OS is going to be installed. While this is unusual (except for Proxmox hosts), there may be use cases where decommissioning a server is not needed. The hostname may be reused, as long as the server will serve the same role. If the server is converted to serve another role, the hostname must never be used again.


 * 1) Depool the server from the services it is used for. If the server is a master, fail over to a replica or secondary server.
 * 2) Set downtime in Icinga for the server and all of its services, to avoid unnecessary Icinga alerts for the server.
 * 3) Ensure the server is removed from the Puppet CA and database.
 * 4) If the server will not serve the same role: remove all references to the server from manifests/site.pp. If the hostname and/or IP address is defined in other code (hiera variables, mw-config/Database.php, etc), remove those references as well.
 * 5) Manually remove any traces of PII or other confidential information. On most systems,   does most of the job. If the server was used for database hosting (e.g. MariaDB) or file hosting, ensure that data is removed as well.
 * 6) Reimage the server with a fresh copy of Debian.
 * 7) If the server will not serve the same role: re-add the server to manifests/site.pp and other files, with a fresh hostname.
 * 8) Repool the server where necessary.
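Steps 4 and 7 are easy to get wrong by missing a stray reference. A small sketch of a repository sweep, assuming a local checkout of the Puppet (or mw-config) repository:

```python
import pathlib

def find_references(repo: pathlib.Path, hostname: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every remaining mention of
    `hostname` in the checkout, e.g. in site.pp or hieradata."""
    hits = []
    for path in sorted(repo.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if hostname in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

An empty result from both repositories is a good sanity check before reimaging or repooling.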

Upgrade
An upgrade may be necessary in cases where horizontal scaling (having three smaller servers instead of one big one) is not possible or not needed. Upgrades must be approved by the Director of Site Reliability Engineering. An upgrade is assumed to require a reboot, but not a reimage; if a reimage is needed, follow both these steps and the steps for reimaging. This guide assumes you have an OK from the service experts and that they are aware of the upgrades. If a server must be depooled before an upgrade is performed, the guide assumes you know how to do that.


 * 1) Depool the server from the services it is used for. If the server is a master, fail over to a replica or secondary server.
 * 2) Set downtime in Icinga for the server and all of its services, to avoid unnecessary Icinga alerts for the server.
 * 3) Perform the upgrades.
 * 4) Confirm the upgrade went well, and ensure all production services are restarted and working properly.
 * 5) Remove the downtime in Icinga for the server and all of its services, so that proper monitoring is restored.
 * 6) Repool the server where necessary.
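Step 2's downtime can be scheduled through the Icinga 2 REST API's schedule-downtime action. A sketch of the request body, assuming API access is already configured (hostnames and credentials are placeholders, not real Miraheze values):

```python
import time

def downtime_payload(host: str, hours: float, author: str, comment: str) -> dict:
    """JSON body for POST /v1/actions/schedule-downtime on the Icinga 2 API.
    `all_services` extends the downtime to every service on the host, which
    is what keeps the alerts quiet during the upgrade."""
    start = time.time()
    return {
        "type": "Host",
        "filter": f'host.name=="{host}"',
        "start_time": start,
        "end_time": start + hours * 3600,
        "fixed": True,
        "all_services": True,
        "author": author,
        "comment": comment,
    }
```

POST this body, authenticated as an API user, to `https://<icinga-host>:5665/v1/actions/schedule-downtime`; the equivalent can also be done through the Icinga web interface.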