Tech:Incidents/2018-08-31-TimedMediaHandler

All wikis using TimedMediaHandler were down for about an hour, due to an unmaintained extension (that is required by TimedMediaHandler) accidentally being removed.

Summary
Provide a summary of the incident:
 * What services were affected?
 * Wikis using Extension:TimedMediaHandler (TMH)
 * How long was there a visible outage?
 * Wikis visibly down: 59 minutes. Functionality was lost until 5:47 UTC (the extension was slowly re-enabled on wikis until then).
 * Was it caused by human error, supply/demand issues or something unknown currently?
 * Human error: an unmaintained extension that is required by TMH was removed.
 * Was the incident aggravated by human contact, users or investigating?
 * No

Timeline
All times are in UTC.


 * 12:46 MacFan4000 notices that Puppet has been failing on all MediaWiki servers and investigates. He decides to remove MwEmbedSupport, saying that it no longer exists in the GitHub repository and has been archived.
 * 12:47 MacFan4000 begins removing the extension.
 * 12:49 Paladox informs MacFan4000 that the extension should not have been removed, referring to the fact that he previously told him to not remove it.
 * 12:50 Paladox also informs MacFan4000 that the wikis have been broken.
 * 12:51 John also states that the extension is unmaintained, but is a requirement for TimedMediaHandler.
 * 13:02 Reports of broken wikis come in, proving the points made by John and Paladox.
 * 13:05 MacFan4000 reverts his changes and changes the submodule URL, but clones the wrong branch by mistake. John disables the extension, MacFan4000 fixes the branch, and the extension is slowly re-enabled.

Quick facts
Provide any quick facts that may be relevant:
 * Are there any known issues with the service in production?
 * No
 * Was the cause preventable by us?
 * Yes
 * Have there been any similar incidents?
 * No

Conclusions

 * Was the incident preventable? If so, how?
 * Yes, the docs on MediaWiki.org could have been more accurate in showing that TimedMediaHandler depends on another extension.
 * The TimedMediaHandler extension should have been updated prior to the removal of the extension.
 * Sysadmins should be careful when removing any extension, and make sure that nothing else depends on it.
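
The dependency check in the last point could be partly automated. The following is a minimal sketch, not a description of any tooling that exists here: it assumes each extension lives in its own directory with an extension.json manifest, and that dependencies are declared under the manifest's "requires" / "extensions" key. The function name find_dependents and the directory path in the usage note are hypothetical.

```python
import json
from pathlib import Path


def find_dependents(extensions_dir, target):
    """Return the names of extensions that declare `target` as a requirement.

    Assumption: dependencies are declared in each extension.json under
    requires -> extensions. Undeclared dependencies are not detected.
    """
    dependents = []
    for manifest in sorted(Path(extensions_dir).glob("*/extension.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Skip manifests that cannot be read or parsed.
            continue
        requires = data.get("requires", {})
        required = requires.get("extensions", {}) if isinstance(requires, dict) else {}
        if target in required:
            # Fall back to the directory name if the manifest has no "name".
            dependents.append(data.get("name", manifest.parent.name))
    return dependents
```

For example, find_dependents("/path/to/extensions", "MwEmbedSupport") would list every extension whose manifest names MwEmbedSupport as required. An empty result only shows that no dependency is declared; a dependency that is documented in prose, or enforced only in code, would still need a manual check.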

Reporting

 * What services/sites were used to report the downtime?
 * None
 * What other services/sites were available for reporting, but were not used?
 * All.

Meta

 * Who responded to this incident?
 * MacFan4000
 * What services were affected?
 * Wikis with TimedMediaHandler
 * Who, therefore, needs to review this report?
 * Site Reliability Engineering
 * Timestamp. 13:52, 31 August 2018 (UTC)