Tech:SRE noticeboard

Welcome to the SRE noticeboard. This page is used to post updates from SRE, not for general help. For assistance, see the Help center.

Cloud14 issues
The cloud server (cloud14) that hosts one of our databases, db141, experienced a disk issue. As a result, a small number of wikis hosted on db141 are unavailable. We have reinstalled the affected server on new disks and are working to recover the data from the affected disks. We deeply apologise for the inconvenience; rest assured we are working diligently to fix this issue as soon as possible.


 * LATEST UPDATE
 * 3PM (UTC) Monday, Dec. 13 - Void has received the disks and is analyzing them. Initial evaluations are positive, but we cannot confirm anything yet. We have no further updates at this time, but we promise to update you as soon as we do.

 * FAQ
 * What happened?

A cloud server (cloud14) hosting one of our databases, db141, ran into disk issues. As a result, the database cannot be accessed and some services hosted by the cloud server have been knocked offline. We have reinstalled the affected cloud server on new disks and are working to restore affected services.
 * Who is affected?

Only wikis hosted on db141 are affected. Affected wikis display an error saying "Action required" along with a link to Tech:Wiki recreations. Most wikis on Miraheze are fine.
 * When will this be fixed?

While cloud14 has been reinstalled on new disks, data recovery may take a while, and we do not have an ETA yet.
 * Is data loss involved?

We are unsure. It is possible that the disks are not actually faulty and that the RAID controller is at fault instead, which would mean your data is safe. It is also possible that the disks themselves have gone bad; if so, that would indicate we received a bad batch of SSDs from the manufacturer.
 * What other services are affected?

At this moment, no other services are affected.
 * What is the plan for now?

We have reinstalled the affected cloud server on new disks. We are analyzing the physical disks to see what can be done to restore the data.

Our number one priority at this moment is restoring wikis. About 500 open public wikis are affected, so we understand this has had a real impact on many of Miraheze's users. Rest assured we have not forgotten about those wikis. Every one of our 5,500+ wikis is important, so we are working very hard to restore these wikis' data and bring them back online. We are grateful for the patience our users have shown during this unprecedented issue. We will be posting updates here, so please stay tuned. If you have any questions, please join us on our Discord. Thank you. Miraheze Site Reliability Engineering 00:00, 12 December 2022 (UTC)