Tech:Incidents/2018-04-26-DataLoss

Paladox accidentally deleted testwiki and dropped its database.

Summary

  • What services were affected?
    • MediaWiki: a visible, farm-wide outage for a period of time.
  • How long was there a visible outage?
    • ~11 minutes
  • Was it caused by human error, supply/demand issues or something unknown currently?
    • Human error. Paladox deleted the wrong database (testwiki) at 22:29 and realised immediately afterwards.
  • Was the incident aggravated by human contact, users or investigating?
    • No. It could not have been aggravated.

Timeline

All times are in UTC.

April 27
  • [22:28:45] <+paladox> !log DELETE FROM cw_wikis WHERE wiki_dbname = "testwiki"; on db2
  • [22:29:46] <+paladox> !log /srv/mediawiki/w/extensions/MirahezeMagic/maintenance/removeDeletedWikis.php --wiki testwiki on mw1
  • [22:29:53] <+paladox> !log delete db testwiki from db4
  • [22:38:22] <MacFan4000> testwiki should be back on db2 now paladox
  • [22:39:10] <+paladox> Voidwalker i need to move the db over to db4
  • [22:41:39] <+paladox> Works now
  • [22:41:41] <+paladox> Voidwalker ^^

Quick facts

  • Are there any known issues with the service in production?
    • No
  • Was the cause preventable by us?
    • If we had had a backup, we could have restored from it.
  • Have there been any similar incidents?
    • No
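
The backup point above can be illustrated with a pre-drop step that dumps the database before any destructive command is issued. This is a minimal sketch, not the procedure in use on the farm: the helper `build_backup_then_drop` and the dump directory `/srv/backups` are hypothetical, and the function only builds the command lines without running them.

```python
import re
import shlex
from datetime import datetime, timezone


def build_backup_then_drop(dbname, dump_dir="/srv/backups", now=None):
    """Return two shell command lines: dump the database, then drop it.

    Hypothetical helper; dump_dir is an assumed path. Nothing is executed.
    """
    # Refuse anything that is not a plain database name.
    if not re.fullmatch(r"[A-Za-z0-9_]+", dbname):
        raise ValueError(f"suspicious database name: {dbname!r}")

    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    dump_path = f"{dump_dir}/{dbname}-{stamp}.sql"

    # mysqldump writes the full database to a timestamped file first.
    dump_cmd = [
        "mysqldump",
        "--single-transaction",
        f"--result-file={dump_path}",
        "--databases",
        dbname,
    ]
    # The drop is only the second step, to be run after the dump succeeds.
    drop_cmd = ["mysql", "-e", f"DROP DATABASE `{dbname}`;"]
    return [shlex.join(dump_cmd), shlex.join(drop_cmd)]
```

An operator would run the first command, check that the dump file exists and is non-empty, and only then run the second.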

Conclusions

  • Was the incident preventable? If so, how?
    • Yes, if DROP DATABASE had required a confirmation prompt.
  • Is the issue rooted in our infrastructure design?
  • State any weaknesses and how they can be addressed.
    • Once the DROP DATABASE command is sent, there is no way to cancel it, and the wiki is deleted permanently.
  • State any strengths and how they prevented or assisted in investigating the incident.
    • Not applicable.
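
Since DROP DATABASE itself does not ask for confirmation, a prompt would have to live in a wrapper around it. The following is a minimal sketch of such a guard, assuming the operator is asked to retype the database name; the function names are hypothetical and the code only returns the SQL statement rather than sending it to a server.

```python
import re


def drop_statement_if_confirmed(dbname, typed_confirmation):
    """Return the DROP statement only if the operator retyped the exact name.

    Hypothetical helper: returns None when the confirmation does not match.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", dbname):
        raise ValueError(f"suspicious database name: {dbname!r}")
    if typed_confirmation != dbname:
        return None
    return f"DROP DATABASE `{dbname}`;"


def interactive_drop(dbname, ask=input):
    """Prompt for the database name and return the statement, or None."""
    typed = ask(f"Type the database name to confirm dropping {dbname!r}: ")
    return drop_statement_if_confirmed(dbname, typed.strip())
```

Retyping the name, rather than answering y/n, forces the operator to read which database is about to be dropped.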

Actionables

  • No actionable known.

Meta

  • Who responded to this incident?
    • Paladox
  • What services were affected?
    • MediaWiki, visibly affected for a period of time.
    • testwiki for data loss.
  • Who, therefore, needs to review this report?
    • All Operations members.
  • Timestamp: ...