Tech:Incidents/2018-04-26-DataLoss

Summary
At 23:29:53, @Paladox deleted the database testwiki. A minute later he realised it was the wrong database; he had meant to delete testdeletewiki.

This was caused by human error.

The outage lasted from 23:29:53 until 23:41:39 (11 minutes 46 seconds).
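
The length of the outage follows directly from the two timestamps above. A minimal sketch of that arithmetic, assuming both times fall on the same day and in the same timezone; the helper name `outage_duration` is hypothetical and used only for illustration:

```python
from datetime import datetime, timedelta

# Timestamps in this report use the HH:MM:SS form.
TIME_FORMAT = "%H:%M:%S"


def outage_duration(start: str, end: str) -> timedelta:
    """Return end - start for two HH:MM:SS timestamps.

    Assumption: both timestamps are on the same day and in the same timezone.
    """
    started = datetime.strptime(start, TIME_FORMAT)
    ended = datetime.strptime(end, TIME_FORMAT)
    return ended - started


# Start and end of the outage as stated in the summary.
duration = outage_duration("23:29:53", "23:41:39")
print(duration)  # 0:11:46
```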


 * Note: All times are in BST (British Summer Time)

Timeline


 * April 27
 * [23:28:45] <+paladox>	!log DELETE FROM cw_wikis WHERE wiki_dbname = "testwiki"; on db2
 * [23:29:46] <+paladox>	!log /srv/mediawiki/w/extensions/MirahezeMagic/maintenance/removeDeletedWikis.php --wiki testwiki on mw1
 * [23:29:53] <+paladox>	!log delete db testwiki from db4
 * [23:38:22] 	testwiki should be back on db2 now paladox
 * [23:39:10] <+paladox>	Voidwalker i need to move the db over to db4
 * [23:41:39] <+paladox>	Works now
 * [23:41:41] <+paladox>	Voidwalker ^^

Quick facts
Provide any quick facts that may be relevant:
 * Are there any known issues with the service in production?
 * Was the cause preventable by us?
 * Have there been any similar incidents?

Conclusions
Provide conclusions that have been drawn from this incident only:
 * Was the incident preventable? If so, how?
 * Is the issue rooted in our infrastructure design?
 * State any weaknesses and how they can be addressed.
 * State any strengths and how they prevented or assisted in investigating the incident.

Actionables
List everything we can do immediately (or in our current state) to prevent this from occurring again. Include links to Phabricator issues, which should go into more detail; the entries here should be one-line notes only, e.g. ": Monitor service responses with GDNSD and pool/depool servers based on these."

Meta

 * Who responded to this incident?
 * What services were affected?
 * Who, therefore, needs to review this report?
 * Timestamp.