Tech:Incidents/2018-08-22-Database


Pasting the wrong copy of the database grants into the MySQL prompt caused 17 minutes of downtime.

Timeline

  • 16:46 Reception123: copies and pastes grants from Phabricator [1] into the MySQL prompt
  • 16:46 Reception123: notices a 503 error and asks John and Paladox for assistance
  • 17:01 Paladox: notices that Reception123 copied the grants from Puppet instead of directly from db4, and informs Reception123
  • 17:03 Reception123: copies the file from the correct location on db4, and the wikis come back up

Conclusion

Reception123 copied the wrong file from Puppet. It was only an .erb template, which contained Puppet locations rather than the actual MySQL passwords, so applying it caused a connection error to the database. Much more care must be taken when copying anything into the MySQL prompt, and Reception123 should have made himself aware of the proper procedure before executing this.

  • Before executing anything, sysadmins must be sure of what they are doing and think through the consequences of their actions.

The cause here was clearly human error.
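
The mistake described in this section, pasting an unrendered .erb template rather than the rendered grants, is the kind of thing a quick check can catch before anything reaches the MySQL prompt. The following is a minimal sketch, assuming the template used standard ERB tags such as `<%= ... %>`. The function names and the sample GRANT statements are hypothetical and are not taken from the actual Puppet file or from db4.

```python
import re

# ERB tags look like <%= ... %>, <% ... %>, <%- ... -%> or <%# ... %>.
ERB_TAG = re.compile(r"<%[-=#]?.*?%>", re.DOTALL)


def find_erb_tags(sql_text):
    """Return (line_number, tag) pairs for every ERB tag found in sql_text."""
    hits = []
    for match in ERB_TAG.finditer(sql_text):
        # Line numbers are 1-based: count the newlines before the match.
        line_number = sql_text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(0)))
    return hits


def is_safe_to_paste(sql_text):
    """True only if the text contains no unrendered ERB tags."""
    return not find_erb_tags(sql_text)


# Hypothetical sample data, for illustration only.
template = (
    "GRANT ALL ON `wiki`.* TO 'mediawiki'@'10.0.0.1' "
    "IDENTIFIED BY '<%= @mediawiki_password %>';\n"
)
rendered = (
    "GRANT ALL ON `wiki`.* TO 'mediawiki'@'10.0.0.1' "
    "IDENTIFIED BY 'example-password';\n"
)

if __name__ == "__main__":
    for name, text in (("template", template), ("rendered", rendered)):
        verdict = "safe" if is_safe_to_paste(text) else "contains ERB tags"
        print(f"{name}: {verdict}")
```

A check like this only detects leftover template syntax. It does not verify that the grants themselves are correct, so it complements rather than replaces reading the file before pasting it.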

Actionables

There are no actionables for this incident, since it was caused entirely by human error.

Reporting

  • What services/sites were used to report the downtime?
    • Twitter
  • What other services/sites were available for reporting, but were not used?
    • Facebook, IRC

Meta

  • Online during downtime: Reception123, John, Paladox, NDKilla
  • Who responded to this incident?
    • Reception123, John, Paladox, NDKilla
  • What services were affected?
    • MediaWiki
  • Who, therefore, needs to review this report?
    • All Operations members
  • Timestamp: Reception123 (talk) (C) 16:08, 23 August 2018 (UTC)