Tech:SRE noticeboard

Welcome to the SRE noticeboard. This page is used to post updates from SRE, not for general help. For assistance, see the Help center.

Cloud14 issues
 LATEST UPDATE: 
 * 20:00 (UTC), Monday, 26 December 2022 - We have uploaded the databases to our new database server. We are now working to get the wiki back online. Due to the holidays, this may take a bit longer than usual. Thank you for your understanding.

Originally posted on 19 December 2022:

We have very important news regarding db141.

Yesterday, we were able to access and recover the data on the corrupted drives, including the drives which contained the original db141. We intend to begin restoring affected wikis as soon as possible, and we will release more information once the details are finalised.

During our scheduled maintenance yesterday, we ran into a problem getting new storage drives detected by our cloud14 server. We asked our hosting provider to reseat the disks, but then deemed it unnecessary and cancelled the request. Unfortunately, the request was still being processed, and our hosting provider mistakenly reseated cloud14's drives while the server was on. As a result, the server locked up and we had to run file system repair tools (specifically fsck) to get it back online.

Once cloud14 and all its virtual servers came back up, we discovered that because the new db141 was running and writing data when the drives were reseated, its database had become corrupted. Thankfully, we ran backups for most wikis yesterday, so they should be safe. This has made the task of restoring wikis affected by the original db141 outage a bit easier. Our plan is to restore all original db141 wikis from the recovered disks and then, using our backups, merge new edits made on the recreated wikis back into these original wikis. We do not have an ETA for this, but we are thrilled to have recovered the data.

We apologise for yet another period of downtime on these wikis, but this incident has prompted us to strengthen our backup procedures to prevent catastrophic disasters from occurring. We are now working to restore these wikis and will provide more information once we have it. Thank you for your understanding.

TL;DR: We recovered the data from the broken, original db141 disks but the new db141 was corrupted due to a hosting provider error. We have backups from yesterday so we will now restore the original db141 and merge the edits from recreated wikis back into the old wikis. Miraheze Site Reliability Engineering 00:00, 26 December 2022 (UTC)