Tech:Incidents/2018-04-26-DataLoss


    Paladox accidentally deleted testwiki and dropped its database.

    Summary

    • What services were affected?
      • MediaWiki: a visible, farm-wide outage for a period of time.
    • How long was there a visible outage?
      • ~11 minutes
    • Was it caused by human error, supply/demand issues or something unknown currently?
      • Human error. At 22:29, Paladox deleted the wrong database (testwiki) and realised immediately afterwards.
    • Was the incident aggravated by human contact, users or investigating?
      • No; it could not have been aggravated.

    Timeline

    All times are in UTC.

    April 27
    • [22:28:45] <+paladox> !log DELETE FROM cw_wikis WHERE wiki_dbname = "testwiki"; on db2
    • [22:29:46] <+paladox> !log /srv/mediawiki/w/extensions/MirahezeMagic/maintenance/removeDeletedWikis.php --wiki testwiki on mw1
    • [22:29:53] <+paladox> !log delete db testwiki from db4
    • [22:38:22] <MacFan4000> testwiki should be back on db2 now paladox
    • [22:39:10] <+paladox> Voidwalker i need to move the db over to db4
    • [22:41:39] <+paladox> Works now
    • [22:41:41] <+paladox> Voidwalker ^^

    Quick facts

    • Are there any known issues with the service in production?
      • No
    • Was the cause preventable by us?
      • If we had had a backup, we could have restored from it.
    • Have there been any similar incidents?
      • No

    Conclusions

    • Was the incident preventable? If so, how?
      • Yes, by adding a confirmation prompt to DROP DATABASE.
    • Is the issue rooted in our infrastructure design?
    • State any weaknesses and how they can be addressed.
      • Once the DROP DATABASE command has been sent, it cannot be cancelled, and the wiki is permanently deleted.
    • State any strengths and how they prevented or assisted in investigating the incident.
      • Not applicable.
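    The report does not say how the proposed confirmation prompt would work. A minimal sketch, assuming a hypothetical Python helper that an operator script calls before issuing the SQL (`confirm_drop` and `drop_statement` are illustrative names, not existing Miraheze tooling): it only produces a `DROP DATABASE` statement after the operator retypes the database name exactly.

```python
import re
from typing import Callable, Optional


def confirm_drop(db_name: str, read_line: Callable[[str], str] = input) -> bool:
    """Hypothetical prompt: the operator must retype the database name.

    Returns True only if the typed text (ignoring surrounding whitespace)
    matches db_name exactly.
    """
    typed = read_line(
        f"About to DROP DATABASE {db_name}. Type the database name to confirm: "
    )
    return typed.strip() == db_name


def drop_statement(
    db_name: str, read_line: Callable[[str], str] = input
) -> Optional[str]:
    """Return the DROP DATABASE SQL only after confirmation, else None.

    This sketch only builds the statement; it does not connect to a server.
    """
    # Accept only plain identifiers, so the name can be backtick-quoted safely.
    if not re.fullmatch(r"[A-Za-z0-9_]+", db_name):
        raise ValueError(f"unexpected database name: {db_name!r}")
    if not confirm_drop(db_name, read_line):
        return None  # confirmation failed: nothing is dropped
    return f"DROP DATABASE `{db_name}`;"
```

    For example, `drop_statement("testwiki", read_line=lambda prompt: "test")` returns `None`, so no statement is produced. Asking for the name, rather than a yes/no answer, makes the operator read the target back, which gives a chance to notice a wrong database before anything is dropped.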

    Actionables

    • No known actionables.

    Meta

    • Who responded to this incident?
      • Paladox
    • What services were affected?
      • MediaWiki: a visible outage for a period of time.
      • testwiki: data loss.
    • Who, therefore, needs to review this report?
      • All Operations members.
    • Timestamp: ...