Tech:Varnish

Varnish is an HTTP cache proxy, which can serve almost anything (including articles) without asking the MediaWiki appservers.

Basics
Unlike web servers such as Apache and NGINX, Varnish does not generate content itself. It *only* caches responses that people have already requested, and passes uncacheable traffic (dynamic pages, e.g. when the user is logged in) to the MediaWiki appservers.

Configuration
Most of Varnish's behaviour is configured in the default.vcl file, but /etc/default/varnish is also critical. Some things can be done at runtime via the CLI tools.

How traffic goes through

 * 1) The client asks the authoritative DNS for the IP of a cache proxy. GeoDNS returns the IP of the nearest cache proxy, and the client will use that cache proxy server;
 * 2) NGINX listens on port 80 (for the HTTPS redirect) and on port 443, where it terminates SSL/TLS. All traffic is passed to Varnish;
 * 3) Varnish checks whether the object is (and can be) cached. If it is, Varnish serves it from the cache; if not, the request is passed to stunnel (127.0.0.1:8080 for mw121, 127.0.0.1:8081 for mw122). stunnel is needed here because the open-source version of Varnish can't pass traffic to HTTPS backends; that feature is only available to paying users.

varnishadm
Root privileges are required for this tool.

varnishadm is a tool that gives you an interactive shell prompt for managing the Varnish process (non-interactive use is possible, but personally not preferred). You can ban objects from the cache, show backends, view backend health, change the backend health status and more.

varnishlog
varnishlog is a tool that can be used to view web requests in real time. Its main advantages over standard access logs are that it doesn't require disk space, and that you can view far more information (all client headers, detailed Varnish debug information, all backend headers, etc.). Want to monitor all POST requests? A single query does the trick. Do you want all cache MISSes? A single query is all you need.

Criteria can also be combined in one query.

Thanks to the VSL query language (passed with the -q switch), this tool is great for monitoring web requests.
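As a sketch of what such queries can look like, the shell functions below wrap three varnishlog invocations. The function names are hypothetical; the queries assume the standard VSL tag names ReqMethod, VCL_call and ReqHeader, and that the commands are run on a cache proxy with sufficient privileges.

```shell
# Show only POST requests.
log_posts() {
    varnishlog -q 'ReqMethod eq "POST"'
}

# Show only requests for which Varnish called vcl_miss (cache misses).
log_misses() {
    varnishlog -q 'VCL_call eq "MISS"'
}

# Combine criteria: POST requests for a single wiki.
# $1: hostname to match against the Host request header
log_posts_for_host() {
    varnishlog -q "ReqMethod eq \"POST\" and ReqHeader:Host eq \"$1\""
}
```

For example, log_posts_for_host meta.miraheze.org would follow POST requests for Meta only. Queries are joined with and, or and not, so further criteria can be appended in the same way.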

varnishncsa
varnishncsa is very similar to varnishlog, but it formats its output in the common NCSA log format, which is the default format of (almost) all major web servers. Like varnishlog, this tool accepts the VSL query language using the -q switch.

varnishstat
TODO

X-Cache
On each web request served by Varnish, we set an X-Cache header. It will look like:

X-Cache: cp20 HIT (5)

cp20 is the cache proxy that served the request, HIT indicates that the request was served from the cache, and (5) indicates the number of times the object was accessed (so in this case the page was served from the cache for the 5th time). If the request was passed to the backend, the header will indicate that instead of HIT. This can be useful to see whether Varnish has accidentally cached an error.
 * Note: due to a bug in Varnish, obj.hits is not reset after a ban has been applied.
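To illustrate how the header value breaks down, here is a minimal shell sketch that splits it into its three fields. The function name parse_x_cache is hypothetical, and the layout is assumed from the single example above (proxy, status, counter in parentheses); other status values may be formatted differently.

```shell
# Split an X-Cache value such as "cp20 HIT (5)" into its fields.
# $1: the header value, without the "X-Cache: " prefix
parse_x_cache() {
    set -f                 # no filename globbing while word-splitting
    set -- $1              # $1=proxy  $2=status  $3="(hits)"
    set +f
    hits=${3#\(}           # strip the leading "("
    hits=${hits%\)}        # strip the trailing ")"
    printf 'proxy=%s status=%s hits=%s\n' "$1" "$2" "$hits"
}

parse_x_cache 'cp20 HIT (5)'
```

The value itself can be read from the response headers of any request, for instance with curl -sI against a wiki URL.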

X-Miraheze-Debug
Using the X-Miraheze-Debug header, you can force the request to be passed to a certain backend. For example, requests with the header set for test131 will always be passed to test131, regardless of whether they could be served from the cache or not.

One-off purges (bans)
Credit: https://wikitech.wikimedia.org/wiki/Varnish

Normally, MediaWiki will purge cache objects with PURGE requests. However, sometimes you want to purge more than one object (or you want an easier way to purge objects across the fleet without using MediaWiki). This can be done using the banning feature in varnishadm. Despite its scary name, banning only ensures that all cached objects matching the given criteria (e.g. req.http.Host == meta.miraheze.org) won't be used anymore.

Every banned object has to be fetched from the appservers again, so for performance reasons this feature should only be used when needed. While Miraheze's appservers can handle the current amount of traffic (warning: this is a dangerous assumption), we should aim for the highest possible cache hit rate.

While you can supply the ban command and its arguments directly as arguments to varnishadm, I recommend you just open the varnishadm shell prompt by running varnishadm without arguments.

Every ban is issued with the "ban" command. This must be done on all cache proxies. Using the == (strict, no regex) and ~ (regex) operators, you can specify the criteria. For example, to ban all load.php objects for allthetropes.org:

varnish> ban req.http.Host == allthetropes.org && req.url ~ "^/w/load.php"

If you remove the req.http.Host == allthetropes.org && part, this would match ALL load.php objects, regardless of the wiki.

Or, to remove ALL 301 redirects for meta.miraheze.org:

varnish> ban obj.status == 301 && req.http.Host == meta.miraheze.org
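For completeness, the same bans can be issued non-interactively. The sketch below wraps this in two shell functions; the function names are hypothetical, and it assumes the script runs as root on a cache proxy, since varnishadm needs root privileges.

```shell
# Ban all cached objects of one wiki whose URL matches a regex.
# $1: hostname (strict match), $2: URL regex
ban_host_url() {
    # The whole ban expression is passed as a single argument so that the
    # inner double quotes reach the Varnish CLI intact.
    varnishadm ban "req.http.Host == $1 && req.url ~ \"$2\""
}

# Show the bans currently known to this Varnish instance.
list_bans() {
    varnishadm ban.list
}
```

For example, ban_host_url allthetropes.org '^/w/load.php' issues the first ban shown above. As with the interactive prompt, it only affects the proxy it is run on, so it has to be repeated on every cache proxy.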

Backend health checks
Varnish uses backend health checks (called probes) configured by Miraheze. Every 5 seconds it checks, for each appserver, whether https://meta.miraheze.org/wiki/Miraheze loads in under 4 seconds. If an appserver fails to serve that page in under 4 seconds (or the response does not have HTTP status code 200) at least 2 out of 5 times, it will be marked as sick and depooled automatically until it looks healthy again.

Using the backend listing commands in varnishadm, you can view the current health of all appservers.

Overriding health checks (e.g. depooling an appserver)
Sometimes, you want to override the health checks, for example, because you want to depool an appserver.

Using the backend.set_health command, you can manage the health status of an appserver. For example, to depool mw121 (again, please ensure that you run this command on ALL cache proxies):

varnish> backend.set_health mw121 sick

This will mark the backend as sick (= depooled), and it will stay depooled regardless of the health checks until Varnish is restarted or the health status is changed again.

If the health status of the backend should depend on the health checks again (i.e., maintenance is completed and the appserver can be repooled), set the health status to auto:

varnish> backend.set_health mw121 auto

This allows Varnish probes to manage the backend health state again.
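The depool and repool steps can also be wrapped in small shell helpers, as in the sketch below. The function names are hypothetical; backend.list is assumed here as the standard varnishadm command for listing backends and their health, and the script is assumed to run as root on a cache proxy.

```shell
# Force a backend to sick so it no longer receives traffic.
# $1: backend name, e.g. mw121
depool() {
    varnishadm backend.set_health "$1" sick
}

# Hand control of the backend's health back to the probes.
# $1: backend name, e.g. mw121
repool() {
    varnishadm backend.set_health "$1" auto
}

# List all backends with their current health state.
show_backends() {
    varnishadm backend.list
}
```

A maintenance run on one proxy would then be depool mw121, the maintenance itself, repool mw121, and finally show_backends to confirm the state. The helpers only act on the local Varnish instance, so they must be run on every cache proxy.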