Tech:Organisation/Site Reliability Engineering

This is a guide for all members of Miraheze Operations. They have access to all Miraheze GitHub repositories, and they are in charge of maintaining all Miraheze servers and making sure they run smoothly.

Draft: This guide is still a draft and has not been confirmed by a current Operations member.

Rules

 * 1) Be respectful to other volunteers and users. You represent the Miraheze project.
 * 2) Don't make sudden changes to major parts of the infrastructure (MediaWiki, Varnish, Bacula, etc.), such as changing the way things are currently done, without first discussing it with the other Operations members (and any sysadmins).
 * 3) Don't use the servers for non-Miraheze purposes.
 * 4) Don't put abnormally high load on the servers if it can be avoided. (Ganglia can be used to look at load in more detail.)
 * 5) Respect privacy. Don't publish access logs, IP addresses, content of private wikis, or other personally identifiable information. If in doubt, ask before publishing.
 * 6) Likewise, don't publish database passwords, private keys, or similar secrets.

Violation of these rules can result in warnings or revocation of access.

Deployment

 * When deploying a change (SSL certificate, database rename, etc.), you are required to closely watch the change going live.
 * After committing a change to the Puppet or DNS repo (and being sure it should work), run 'sudo puppet agent -t' on the server involved. It can take a while before the change is actually deployed.
 * Watch the error logs:

Further specifics to be filled in by Operations
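
As a rough illustration of the Puppet step above, the sketch below runs 'sudo puppet agent -t' and turns its exit code into a readable message. It relies on the detailed exit codes that Puppet documents for the -t (--test) flag. The helper names (describe_puppet_exit, puppet_run_ok, run_puppet_agent) are made up for this example and are not part of any Miraheze tooling.

```python
import subprocess

# Exit codes documented by Puppet for `puppet agent --test`
# (the -t flag enables detailed exit codes).
PUPPET_EXIT_MEANINGS = {
    0: "run succeeded, no changes were needed",
    1: "run failed, or was skipped because another run is in progress",
    2: "run succeeded and changes were applied",
    4: "run succeeded but some resources failed",
    6: "run succeeded with both applied changes and failed resources",
}


def describe_puppet_exit(code: int) -> str:
    """Return a human-readable description of a Puppet agent exit code."""
    return PUPPET_EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def puppet_run_ok(code: int) -> bool:
    """Treat only 'no changes' (0) and 'changes applied' (2) as a clean run."""
    return code in (0, 2)


def run_puppet_agent() -> int:
    """Run the command from this guide and return its exit code.

    Assumes it is executed on the server involved, by a user with sudo rights.
    """
    result = subprocess.run(["sudo", "puppet", "agent", "-t"])
    return result.returncode
```

On the server, something like print(describe_puppet_exit(run_puppet_agent())) would show whether the change was applied. An exit code of 4 or 6 is a hint to look at the error logs before moving on.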

Monitoring errors
''To be filled in for specific servers''

Debugging

 * Look at the error logs
 * Try sending the failing HTTP request again with the header 'X-Miraheze-Debug: 1'. The error may be one that is cached in Varnish.
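
The last step can be tried from any machine with Python installed. The following is a minimal sketch, using only the standard library, that repeats a request with the 'X-Miraheze-Debug: 1' header and reports the HTTP status. The function names are hypothetical, and the URL in the usage note is a placeholder, not a real wiki.

```python
import urllib.error
import urllib.request

DEBUG_HEADER = "X-Miraheze-Debug"


def build_debug_request(url: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for `url` that carries the debug header from this guide."""
    request = urllib.request.Request(url, method=method)
    # urllib normalises the capitalisation of the header name;
    # HTTP header names are case-insensitive, so this is harmless.
    request.add_header(DEBUG_HEADER, "1")
    return request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send the debug request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(build_debug_request(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; the status is still useful.
        return err.code
```

For example, fetch_status("https://example.miraheze.org/wiki/Main_Page") (placeholder URL) returns the status code seen with the debug header set. If the error disappears compared with a normal request, that suggests the failing response was cached in Varnish.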