Tech:Varnish

Varnish is an HTTP caching reverse proxy that can serve almost anything (including articles) without contacting the MediaWiki appservers.

Basics
Unlike webservers such as Apache and NGINX, Varnish *only* caches content that visitors have actually requested, and passes uncacheable traffic (dynamic pages, e.g. when the user is logged in) on to the MediaWiki appservers.

Configuration
Most of the configuration lives in the default.vcl file, but /etc/default/varnish is also critical. Some things can also be changed at runtime with the CLI tools described below.

How traffic goes through

 * 1) The client asks the authoritative DNS servers for the IP of a cache proxy. GeoDNS returns the IP of the nearest cache proxy, and the client then uses that cache proxy server;
 * 2) NGINX listens on port 80 (only to redirect to HTTPS) and on port 443, where it terminates SSL/TLS. All traffic is then passed to Varnish;
 * 3) Varnish checks whether the object is (and can be) cached. If it is, Varnish serves it from its cache; if not, the request is passed to stunnel (127.0.0.1:8080 for mw1, 127.0.0.1:8081 for mw2). stunnel is needed here because the open-source edition of Varnish can't connect to HTTPS backends; that feature is only available in the paid edition.
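The request path above can be modelled as a small Python sketch. It is illustrative only: the `route` function and the hop labels are hypothetical, and only the stunnel addresses (127.0.0.1:8080 for mw1, 127.0.0.1:8081 for mw2) come from this page.

```python
# Hypothetical model of the request path described above; not Miraheze code.
from typing import List

# stunnel listeners per appserver, as listed on this page.
STUNNEL = {"mw1": ("127.0.0.1", 8080), "mw2": ("127.0.0.1", 8081)}


def route(scheme: str, cached: bool, backend: str) -> List[str]:
    """Return the hops a request passes through (illustrative labels only)."""
    if scheme == "http":
        # Port 80 only exists to send clients to HTTPS.
        return ["nginx:80 -> redirect to https"]
    hops = ["nginx:443 (TLS terminated)", "varnish"]
    if cached:
        # Cache hit: Varnish answers without contacting the appservers.
        return hops + ["served from cache"]
    # Miss or uncacheable: forward through stunnel, which speaks HTTPS to the backend.
    host, port = STUNNEL[backend]
    return hops + [f"stunnel {host}:{port}", f"{backend} (HTTPS)"]
```

The design point it makes explicit: only the miss/uncacheable branch ever reaches stunnel and the appservers.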

varnishadm
Root privileges are required for this tool.

varnishadm gives you an interactive shell prompt for managing the Varnish process (non-interactive use is possible too, but personally not preferred). You can ban objects from the cache, list backends, view backend health, change a backend's health status and more.
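For scripted (non-interactive) use, a thin Python wrapper around varnishadm could look like the sketch below. The wrapper, its name and the injected `runner` parameter are assumptions for illustration; `backend.list` is one of the varnishadm commands mentioned on this page. It still needs root privileges, like varnishadm itself.

```python
import subprocess
from typing import Any, Callable


def varnishadm(*args: str, runner: Callable[..., Any] = subprocess.run) -> str:
    """Run one varnishadm command non-interactively and return its stdout.

    Hypothetical helper: `runner` defaults to subprocess.run and can be
    replaced (e.g. in tests) so that no real Varnish instance is needed.
    check=True makes a failing command raise CalledProcessError.
    """
    result = runner(["varnishadm", *args], capture_output=True, text=True, check=True)
    return result.stdout
```

Usage on a cache proxy: `print(varnishadm("backend.list"))`.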

varnishlog
varnishlog is a tool for viewing web requests in real time. Its main advantages over standard access logs are that it doesn't require disk space and that you can view far more information (all client headers, detailed Varnish debug information, all backend headers, etc.). Want to monitor all POST requests?  does the trick. Do you want all cache MISSes?   is all you need.

You can also combine criteria:

Combined with the VSL query language, this tool is great for monitoring web requests.
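As an illustration of the VSL query language, the sketch below builds varnishlog command lines in Python. The helper names are hypothetical and the example queries are my own, not necessarily the ones this page originally showed. `-g request` (group log records per request) and `-q` (query) are standard varnishlog options, and `ReqMethod`, `VCL_call` and `ReqHeader` are standard VSL tags.

```python
import shlex
from typing import List


def vsl_and(*clauses: str) -> str:
    """Combine VSL query clauses so that all of them must match."""
    return " and ".join(f"({c})" for c in clauses)


def varnishlog_cmd(query: str) -> List[str]:
    """Build a varnishlog invocation that groups by request and filters with a VSL query."""
    return ["varnishlog", "-g", "request", "-q", query]


post_requests = 'ReqMethod eq "POST"'
cache_misses = 'VCL_call eq "MISS"'
# Combined criteria: cache misses for a single wiki (host chosen as an example).
meta_misses = vsl_and(cache_misses, 'ReqHeader:Host eq "meta.miraheze.org"')

# Print a shell-safe command line that can be pasted on a cache proxy.
print(shlex.join(varnishlog_cmd(meta_misses)))
```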

varnishncsa
varnishncsa is very similar to varnishlog, but it formats its output in the NCSA log format instead, which is the default format of (almost) all major webservers. Like varnishlog, it also accepts VSL queries via the -q switch.
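Because varnishncsa writes the same format as other webservers, its output can be processed with ordinary log tooling. A minimal Python sketch, assuming varnishncsa's default combined format (`%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i"`); the sample line is made up for illustration:

```python
import re
from typing import Dict, Optional

# Matches one line in the NCSA combined log format.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


sample = ('203.0.113.7 - - [01/Jan/2024:12:00:00 +0000] '
          '"GET /wiki/Miraheze HTTP/1.1" 200 5120 "-" "curl/8.0"')
fields = parse_line(sample)
print(fields["status"], fields["request"])
```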

varnishstat
TODO

One-off purges (bans)
Credit: https://wikitech.wikimedia.org/wiki/Varnish

Normally, MediaWiki purges cache objects with HTTP PURGE requests. However, sometimes you want to purge more than one object (or you want an easier way to purge objects across the fleet without using MediaWiki). This can be done using the banning feature in varnishadm. Despite its scary name, a ban only ensures that cached objects matching the given criteria (e.g. req.http.Host == meta.miraheze.org) won't be used anymore.

For performance reasons, only use this feature when needed: every banned object has to be fetched from the appservers again. Even though Miraheze's appservers can currently handle the traffic (warning: this is a dangerous assumption), we should aim for the highest possible cache hit rate.
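The cache hit rate can be estimated from Varnish's MAIN.cache_hit and MAIN.cache_miss counters, e.g. as printed by `varnishstat -1` (one-shot output: counter name, value, per-second rate, description). A minimal Python sketch; the parsing assumes that column layout, and the sample numbers are invented:

```python
from typing import Dict


def parse_counters(output: str) -> Dict[str, int]:
    """Map counter names to values from `varnishstat -1` style output."""
    counters: Dict[str, int] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def hit_rate(counters: Dict[str, int]) -> float:
    """Fraction of cache lookups that were hits (passed requests are not counted)."""
    hits = counters.get("MAIN.cache_hit", 0)
    misses = counters.get("MAIN.cache_miss", 0)
    total = hits + misses
    return hits / total if total else 0.0


sample = """\
MAIN.cache_hit            900         1.50 Cache hits
MAIN.cache_miss           100         0.17 Cache misses
"""
print(f"{hit_rate(parse_counters(sample)):.1%}")  # 90.0%
```

Note that uncacheable (passed) traffic does not appear in this ratio, so it describes how well cacheable objects are served, not the total load on the appservers.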

While you can supply the ban command and its arguments directly as an argument to varnishadm, I recommend you just open the varnishadm shell prompt:

Bans are issued with the "ban" command, which must be run on all cache proxies. Using the "==" (exact match, no regex) and "~" (regex) operators, you can specify the criteria. For example, to ban all /w/load.php objects for allthetropes.org:

varnish> ban req.http.Host == allthetropes.org && req.url ~ "^/w/load.php"

If you remove the "req.http.Host == allthetropes.org && " part, this would match ALL load.php objects, regardless of the wiki.

Or, to remove ALL 301 redirects for meta.miraheze.org:

varnish> ban obj.status == 301 && req.http.Host == meta.miraheze.org
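Since every ban has to be repeated on each cache proxy, it can help to build the expression once and generate the per-proxy commands from it. The sketch below is hypothetical: the proxy host names are placeholders, and it assumes SSH access with sudo on each proxy. It only builds the command lines, so they can be reviewed before being run.

```python
import shlex
from typing import List, Sequence

# Placeholder host names (hypothetical); replace with the real cache proxies.
CACHE_PROXIES = ["cp-example-1", "cp-example-2"]


def ban_expression(*conditions: str) -> str:
    """Join ban conditions (e.g. 'req.http.Host == example.org') with &&."""
    return "ban " + " && ".join(conditions)


def fleet_commands(command: str, proxies: Sequence[str] = CACHE_PROXIES) -> List[List[str]]:
    """Build one ssh invocation per proxy that runs `command` through varnishadm.

    The whole CLI command is passed as a single, shell-quoted argument so the
    remote shell does not interpret characters such as && or ~.
    """
    remote = "sudo varnishadm " + shlex.quote(command)
    return [["ssh", host, remote] for host in proxies]


cmd = ban_expression("req.http.Host == allthetropes.org", 'req.url ~ "^/w/load.php"')
for argv in fleet_commands(cmd):
    print(shlex.join(argv))
```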

Backend health checks
Varnish uses Miraheze-configured backend health checks (called probes). Every 5 seconds, it checks for each appserver whether https://meta.miraheze.org/wiki/Miraheze loads in under 4 seconds. If an appserver fails to serve that page within 4 seconds (or responds with a status code other than 200) in at least 2 of the last 5 checks, it is marked as sick and automatically depooled until it looks healthy again.
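The sick/healthy decision works on a sliding window of recent probe results. In Varnish probe terms, the behaviour described above corresponds to `.interval = 5s`, `.timeout = 4s`, `.window = 5` and `.threshold = 4`; this mapping is my reading of this page, not copied from Miraheze's VCL. A minimal Python simulation of the window logic, assuming Varnish's default `.initial` of threshold - 1:

```python
from collections import deque
from typing import Iterable, List


def simulate(results: Iterable[bool], window: int = 5, threshold: int = 4,
             initial: int = 3) -> List[bool]:
    """Return the backend's health after each probe result.

    results: True if the probe got HTTP 200 within the timeout.
    initial: probes counted as good at startup (Varnish defaults it to threshold - 1).
    Slots of the window that were never filled count as failures.
    """
    recent = deque([True] * initial, maxlen=window)
    states = []
    for ok in results:
        recent.append(ok)  # the oldest result drops out once the window is full
        states.append(sum(recent) >= threshold)
    return states


# One failed probe is tolerated; a second one within five probes depools the backend.
print(simulate([True, True, False, True, False, True, True, True]))
# [True, True, True, True, False, False, False, True]
```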

Using the debug.health and backend.list commands in varnishadm, you can view the current health of all appservers.

Overriding health checks (e.g. depooling an appserver)
Sometimes, you want to override the health checks, for example, because you want to depool an appserver.

Using the backend.set_health command, you can manage the health status of an appserver. For example, to depool mw1 (again, please ensure that you run this command on ALL cache proxies):

varnish> backend.set_health mw1 sick

This marks the backend as sick (= depooled), and it stays depooled regardless of the health checks until Varnish is restarted or its health status is changed again.

Once the backend's health should depend on the health checks again (i.e. maintenance is completed and the appserver can be repooled), set the health status to auto:

varnish> backend.set_health mw1 auto
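The interaction between a forced health status and the probes can be summarised in a small Python model. It is a simplification for illustration (the `Backend` class is hypothetical, not Varnish code), assuming the three values backend.set_health accepts: `healthy`, `sick` and `auto`.

```python
class Backend:
    """Toy model of a Varnish backend's effective health."""

    VALID = ("auto", "healthy", "sick")

    def __init__(self, name: str) -> None:
        self.name = name
        self.admin = "auto"        # value set via backend.set_health; back to auto after a restart
        self.probe_healthy = True  # latest verdict of the health probe

    def set_health(self, state: str) -> None:
        if state not in self.VALID:
            raise ValueError(f"expected one of {self.VALID}, got {state!r}")
        self.admin = state

    @property
    def healthy(self) -> bool:
        # A forced state wins; only 'auto' lets the probe decide.
        if self.admin == "auto":
            return self.probe_healthy
        return self.admin == "healthy"


mw1 = Backend("mw1")
mw1.set_health("sick")   # depooled, even though the probe still passes
print(mw1.healthy)       # False
mw1.set_health("auto")   # repooled as soon as the probe reports it healthy
print(mw1.healthy)       # True
```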

This will allow Varnish probes to manage the backend health state again.