Tech:SRE noticeboard

Welcome to the SRE noticeboard. This page is used to post updates from SRE, not for general help. For assistance, see the Help center.

Cloud14 issues
The cloud server (cloud14), which hosts one of our databases, db141, experienced a disk issue. As a result, a small number of wikis hosted on db141 are unavailable. We are working with our providers to resolve this issue, but we have no ETA for when the server will be back up. We deeply apologise for the inconvenience, but rest assured we're working diligently to have this issue fixed ASAP.


 * UPDATES
 * 6PM (UTC), Fri. - We are working on bringing cloud14 back online, albeit without the databases. Once cloud14 is back, MediaWiki speeds should improve and mail should work once again. However, we have no ETA for when wikis will return.

 * FAQ
 * What happened?

A cloud server (cloud14) hosting one of our databases, db141, ran into disk issues. As a result, the database cannot be accessed and some services hosted by the cloud server have been knocked offline.

 * Who is affected?

Only wikis hosted on db141 are affected. Affected wikis display an error saying "Wiki temporarily unavailable." Most wikis on Miraheze are fine.

 * When will this be fixed?

We do not have an ETA but will update you as soon as possible.

 * Is data loss involved?

We cannot say for certain yet. It is possible that the RAID controller, rather than the disks, is at fault, which would mean your data is safe. It is also possible that the disks themselves have gone bad. If it is the latter, that would indicate we received a bad batch of SSDs from the manufacturer.

 * What other services are affected?

Most notably affected is mail, which means emails for password resets and registration do not work. While other addresses such as sre@, cvt@, and stewards@ can still receive incoming mail, we cannot check them due to an outage of our internal login system, so we recommend sending us DMs if you need assistance. Power users may notice that the Recent Changes feed on IRC is down, meaning that some IRC bots may not function correctly and Wiki-Bot's recent changes feature is likely unavailable too. We are working to restore these services.

 * What is the plan for now?

At this moment, we are working to reinstall cloud14 on new disks. Once this is done, we will set up new MediaWiki servers to improve speed and work to restore other ancillary services. We are also temporarily downgrading our test server from MediaWiki 1.39 to MediaWiki 1.38 to put it into production, which means the upgrade to MediaWiki 1.39 is paused indefinitely.

Our number one priority at this moment is restoring wikis. About 500 open public wikis are affected, so we understand this has had an impact on many of Miraheze's users. Rest assured we have not forgotten about those wikis. Every one of our 5,500+ wikis is important, so we are working very hard to restore these wikis' data and bring them back online. We are so grateful for the patience our users have shown during this unprecedented issue. We will be posting updates here, so please stay tuned. If you have any questions, please join us on our Discord. Thank you. Miraheze Site Reliability Engineering 00:00, 18 November 2022 (UTC)