Tech:Incidents/2018-08-31-TimedMediaHandler

Summary
Provide a summary of the incident: No
 * What services were affected? Wikis using TMH
 * How long was there a visible outage? About 20 mins
 * Was it caused by human error, supply/demand issues or something unknown currently? Yes, human error: an unmaintained extension that is required by TMH was removed.
 * Was the incident aggravated by human contact, users or investigating?

Timeline
Provide a timeline of everything that happened from the first reports to the resolution of the incident. If the time of the very first incident is known (previous incident, the time the service failed, time a patch was applied), include it as well. Time should be in 24-hour standard based on the UTC timezone.
 * 12:39 UTC: MacFan4000 sees that Puppet has been failing on all mw servers. He investigates and notifies staff that the GitHub repo for an extension has been deleted.
 * 12:47 UTC: MacFan4000 begins removing the extension.
 * 13:02 UTC: Reports of broken wikis come in.
 * 13:05 UTC: MacFan4000 reverts his changes and gets everything back up.

Quick facts
Provide any quick facts that may be relevant:
 * Are there any known issues with the service in production? No
 * Was the cause preventable by us? Yes
 * Have there been any similar incidents? No

Conclusions
Provide conclusions that have been drawn from this incident only:
 * Was the incident preventable? If so, how? Yes, the documentation on mediawiki.org could have been more accurate.
 * Is the issue rooted in our infrastructure design? No
 * State any weaknesses and how they can be addressed. None
 * State any strengths and how they prevented or assisted in investigating the incident. None
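
As an illustration of how this kind of removal could be checked beforehand, the following is a minimal sketch, not the procedure used during this incident. It assumes that each installed extension ships an extension.json manifest and declares its dependencies under the "requires" / "extensions" key; extensions that are loaded in other ways, or that do not declare their dependencies, would not be detected. The function name find_dependents and the directory layout are hypothetical.

```python
import json
from pathlib import Path


def find_dependents(extensions_dir, target):
    """Return the names of extensions that declare `target` as a dependency.

    Assumption: dependencies are declared in <extension>/extension.json
    under requires -> extensions. Undeclared dependencies are not found.
    """
    dependents = []
    for manifest in sorted(Path(extensions_dir).glob("*/extension.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip manifests that cannot be read or parsed
        if not isinstance(data, dict):
            continue  # a valid manifest is a JSON object
        requires = data.get("requires")
        required = requires.get("extensions") if isinstance(requires, dict) else None
        if isinstance(required, dict) and target in required:
            # Fall back to the directory name if the manifest has no "name"
            dependents.append(data.get("name", manifest.parent.name))
    return dependents
```

If a call such as find_dependents("/path/to/extensions", "ExampleDependency") returns a non-empty list, the listed extensions would need to be removed or updated first. An empty list does not prove the removal is safe, since it only reflects declared dependencies.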

Meta

 * Who responded to this incident? MacFan4000
 * What services were affected? Wikis with TMH
 * Who, therefore, needs to review this report? Ops
 * Timestamp. 13:52, 31 August 2018 (UTC)