Summary

  • What services were affected?
    • MediaWiki
  • How long was there a visible outage?
    • 8 mins.
  • Was it caused by human error, supply/demand issues or something unknown currently?
    • Human error: the outage was caused by a syntax error in the committed change.
  • Was the incident aggravated by human contact, users or investigating?
    • No.

Timeline

  • 21:27 paladox committed 6a1381f - Testing CentralNotice opt out on test1wiki
  • 21:33 icinga-miraheze PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
  • 21:36 John committed 19a30d8 - Revert "Testing CentralNotice opt out on test1wiki" This reverts commit 6a1381f99902009f30f8c6211ae014b6d0f4510a.
  • 21:38 icinga-miraheze RECOVERY - cp2 HTTPS on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 23567 bytes in 0.501 second response time

Quick facts

  • Are there any known issues with the service in production?
    • Nope.
  • Was the cause preventable by us?
    • Yes, by looking over the change before merging it or, when stepping away, telling someone to keep an eye out.
  • Have there been any similar incidents?
    • Definitely

Conclusions

  • Was the incident preventable? If so, how?
    • Yes. When stepping away after merging, tell someone so they can monitor the rollout.
  • Is the issue rooted in our infrastructure design?
    • Nope.
  • State any weaknesses and how they can be addressed.
    • None.
  • State any strengths and how they prevented or assisted in investigating the incident.
    • John was quick to revert my patch.
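
The "look over your change before merging" step noted above could be partly automated with a syntax check run before each merge. The following is only a minimal sketch, not a description of any tooling used here: the function name `find_syntax_errors` is hypothetical, and the default linter `php -l` assumes the changed files are PHP (as MediaWiki configuration usually is), which this report does not state.

```python
import subprocess
from typing import Iterable, List, Sequence


def find_syntax_errors(
    paths: Iterable[str],
    lint_cmd: Sequence[str] = ("php", "-l"),  # assumption: changed files are PHP
) -> List[str]:
    """Run the lint command on each path and return the paths that fail.

    A path counts as failed when the lint command exits with a
    non-zero status, which is how `php -l` reports a parse error.
    """
    failed = []
    for path in paths:
        # The path is appended as the last argument of the lint command.
        result = subprocess.run(
            [*lint_cmd, path],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Called with the list of files touched by a change, an empty result would mean every file passed the syntax check. A non-empty result names the files to fix before merging. A check like this only catches parse errors, so it complements rather than replaces having someone watch the rollout.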

Meta

  • Who responded to this incident?
    • John
  • What services were affected?
    • MediaWiki
  • Who, therefore, needs to review this report?
    • John
  • Timestamp.
    • 22:38, 4 November 2018 (UTC)