User:NotAracham/Wiki Recovery Guide (db141 outage)

This page was created by me (User:NotAracham) in response to the news that data recovery for wikis impacted by the DB141 outage (November 2022) may not happen or may take weeks. In the near term, local admins and bureaucrats will need to make their own efforts to save their content.

Since many people may be unfamiliar with the options available to them, this guide aims to cover the common recovery options in enough detail for any level of technical expertise.

A note to readers: if you spot anything that appears wrong or would benefit from a rewrite, please reach out on Discord or on the discussion page for this guide with details!

How do I know if I am impacted?
The quickest way to confirm whether your wiki is impacted is to navigate to its main page. If you receive a message such as 'Wiki Temporarily Unavailable' or 'Action Required', your wiki is likely affected.

Requesting Re-Creation of a Wiki
If your wiki is impacted and you are its original requester, you can now request that your domain and wiki be reopened (without article contents) via either Phabricator or Discord.

For up-to-date details on the outage, instructions on filing the request, and your available options, please check out Tech:Wiki_recreations.

 * Phabricator: https://phabricator.miraheze.org/T10015
 * Discord: https://discord.com/channels/407504499280707585/1046960363930910781

If you have not previously signed up for an account on Phabricator, please follow the steps found here first before requesting re-creation through Phabricator.

NOTE: Images are unaffected by this outage and remain available, but all text content on a wiki is lost unless data recovery from the damaged disks succeeds, which is unlikely.

Resurrecting Content
Now that your wiki is re-opened, you might notice that it's as empty as when it was first created.

Recovery options for the missing content mostly fall into three categories, plus a fourth in the form of the official October 2021 public backups:


 * Full XML Backup - If this can be located, it will restore all text pages to your wiki as of the backup date. Extensions will still need to be re-enabled, but this can save quite a lot of work.
 * Archive.org individual page backup - If this can be located, it will allow you to extract the text of pages one at a time and re-create them on your re-opened wiki.
 * Searching caches of individual search engines - This is the last-resort method, but it should allow you to recover the text of some pages.

For private wikis (e.g. those explicitly made private at creation, or with content access limited to a handful of users), the likelihood of finding full XML backups or cached pages is slim to none, but the following steps are still worth trying.

While not all wikis will have clean XML backups (covering text content) readily available, not all hope is lost.

Option 1: Archive.org search for XML backups
This won't necessarily work for every wiki (especially wikis created AFTER June 2022 or private wikis), but it is a good first troubleshooting step.

Some options on how to begin your search:
 * Navigate to https://archive.org/details/wiki-NAMEOFWIKImirahezeorg_w, replacing NAMEOFWIKI with your wiki's subdomain name
 * e.g. if your wiki was at bread.miraheze.org, the URL should read https://archive.org/details/wiki-breadmirahezeorg_w
 * Navigate to the Wiki Collections collection and search for the XML file by your wiki's subdomain, using the 'search this collection' box on the left
 * Navigate to the WikiTeam collection and search for the XML file by your wiki's subdomain, using the 'search this collection' box on the left
 * Lastly, search https://archive.org for any part of your wiki's name. It's possible this might succeed.
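
The URL pattern from the first bullet can be sketched in Python as below. This is a minimal illustration, assuming the item name is always the wiki's hostname with the dots removed, as in the bread example. The function names are hypothetical, and the search query text passed to archive.org's advancedsearch.php endpoint is only a guess at a useful starting point.

```python
from urllib.parse import urlencode


def archive_item_url(subdomain: str) -> str:
    """Build the archive.org item URL for a wiki at <subdomain>.miraheze.org."""
    host = f"{subdomain.strip().lower()}.miraheze.org"
    # Assumption: the item name is the hostname with dots removed,
    # e.g. bread.miraheze.org -> wiki-breadmirahezeorg_w
    return f"https://archive.org/details/wiki-{host.replace('.', '')}_w"


def archive_search_url(subdomain: str, rows: int = 50) -> str:
    """Build a search URL for archive.org's advancedsearch.php endpoint."""
    params = [
        ("q", f"{subdomain} miraheze"),  # assumed query text; adjust freely
        ("fl[]", "identifier"),          # fields to return for each hit
        ("fl[]", "title"),
        ("rows", rows),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


print(archive_item_url("bread"))
print(archive_search_url("bread"))
```

Opening either printed URL in a browser is equivalent to following the manual steps; nothing here contacts archive.org by itself.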

If you are able to locate an XML backup file, great! Download the compressed archive whose name ends in -wikidump.7z and extract the XML file from it.
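
Before importing, it can be worth checking what the extracted dump actually contains. The sketch below lists page titles from a MediaWiki XML export using only the Python standard library. It assumes the usual export layout of page elements with a direct title child; the function name is hypothetical. Extracting the .7z archive itself needs a 7-Zip-capable tool and is not shown.

```python
import io
import xml.etree.ElementTree as ET


def list_page_titles(xml_source):
    """Yield the title of every <page> in a MediaWiki XML export.

    xml_source may be a file path or a binary file object.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        # Export files declare an XML namespace, so tags look like
        # "{http://www.mediawiki.org/xml/export-0.11/}page".
        # Compare the local name only, so any export version works.
        if elem.tag.rsplit("}", 1)[-1] == "page":
            for child in elem:
                if child.tag.rsplit("}", 1)[-1] == "title":
                    yield child.text or ""
                    break
            elem.clear()  # release the page's text to keep memory use low


# Tiny in-memory stand-in for a real dump file.
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">
  <page><title>Main Page</title><revision><text>Hello</text></revision></page>
  <page><title>Bread</title><revision><text>Loaf</text></revision></page>
</mediawiki>"""

titles = list(list_page_titles(io.BytesIO(sample)))
print(len(titles), "pages:", titles)
```

For a real dump, pass the path to the extracted XML file instead of the in-memory sample.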

While this won't restore 100% of your content, it will most likely put you several steps closer to recovery. More details: Moving_a_wiki_to_Miraheze

Once you've extracted the XML file, you have a few options to proceed:
 * Try to import the XML yourself on your wiki's Special:Import page (recommended if the XML file is under 1MB)
 * File a request via Special:RequestImportDump (recommended if the XML file is under 250MB)
 * Open a Phabricator ticket to have the XML file imported (recommended if the XML file is over 250MB)
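
The size thresholds above can be expressed as a small helper. This is a sketch only: it assumes decimal megabytes (1MB = 1,000,000 bytes) and treats a file of exactly 250MB as belonging to the Phabricator route, neither of which the guidance above specifies. The function names are hypothetical.

```python
import os

MB = 1_000_000  # assumption: decimal megabytes


def recommend_import_route(size_bytes: int) -> str:
    """Map an XML dump size to the import route suggested in this guide."""
    if size_bytes < 1 * MB:
        return "Special:Import"
    if size_bytes < 250 * MB:
        return "Special:RequestImportDump"
    # Assumption: exactly 250MB is treated like "over 250MB".
    return "Phabricator ticket"


def recommend_for_file(path: str) -> str:
    """Same recommendation, based on the size of a file on disk."""
    return recommend_import_route(os.path.getsize(path))


for size in (500_000, 5 * MB, 300 * MB):
    print(size, "->", recommend_import_route(size))
```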

Option 2: Searching for individual page backups
If no full XML backup exists, archive.org's Wayback Machine offers some visibility into the prior state of web pages, including the actual page content and the ability to follow links between archived pages.

The process is similar to the steps in Option 1, except that instead of searching for your wiki's name, you search for the full URL of the page of interest.

Once you have located a page you need to copy over, either load its source (if possible) or copy the page text verbatim, and publish that to a page of the same name on your rebuilt wiki.

While this isn't ideal, this is often better than re-writing pages from scratch.
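
For wikis with many pages, the per-page lookup described above can be partly scripted against the Wayback Machine's availability endpoint, which reports the closest archived snapshot for a URL. The sketch below is a minimal example with hypothetical function names, and the sample response is made up purely to show the expected shape. Only find_snapshot touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the availability lookup URL for one wiki page.

    timestamp is optional, in YYYYMMDD form, to prefer snapshots near a date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the snapshot URL from a decoded API response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(page_url, timestamp=None):
    """Query the live API (requires network access)."""
    with urlopen(availability_query(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


# Illustrative response only; real values will differ.
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20221001000000/https://bread.miraheze.org/wiki/Main_Page",
            "timestamp": "20221001000000",
            "status": "200",
        }
    }
}
print(closest_snapshot(sample_response))
```

A result of None means no snapshot was found for that exact URL, in which case the search-engine caches in Option 3 are the next thing to try.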

Option 3A: Use the Web Archives extension
As suggested by other users, the Web Archives browser extension can serve as a means to browse the history of pages you've visited in the past and compare them against archived versions from several different search engines.

This has the benefit of not being reliant on a specific search engine to get a cached version of a page, as different engines may have snapshotted a page at very different times.

The extension is available in both a Google Chrome version and a Firefox version.

DISCLAIMER: Miraheze is not affiliated with this extension and does not directly endorse it. Users install and use this extension at their own risk.

Option 3B: Querying pages directly in different search engines
Individual search engines may have also captured page contents and may serve as a way to recover missing content, albeit in a much slower fashion.

As in the previous option, locate a cached version of the page you want to restore and copy its contents, either through source-editing or by copying the text verbatim, then paste those contents into an identically named page on your re-created wiki.
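
If you save a cached page as an HTML file, the article text can be pulled out without the surrounding navigation. The sketch below uses only the Python standard library and assumes the saved page keeps MediaWiki's usual content container, a div with the id mw-content-text. The class and function names are hypothetical. The result is rendered text rather than wikitext, so links and formatting still need to be reapplied by hand.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class ContentTextExtractor(HTMLParser):
    """Collect visible text inside <div id="mw-content-text">."""

    def __init__(self, content_id="mw-content-text"):
        super().__init__()
        self.content_id = content_id
        self.div_depth = 0   # > 0 while inside the content div
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1  # nested div inside the content
            elif dict(attrs).get("id") == self.content_id:
                self.div_depth = 1   # entering the content div
        elif self.div_depth > 0:
            if tag in ("script", "style"):
                self.skip_depth += 1
            elif tag in BLOCK_TAGS:
                self.chunks.append("\n")  # keep paragraphs on separate lines

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.div_depth > 0 and self.skip_depth == 0:
            self.chunks.append(data)


def extract_content_text(html):
    """Return the article text of a saved page, one block per line."""
    parser = ContentTextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


page = (
    '<div id="nav">Menu</div>'
    '<div id="mw-content-text"><div><h2>Bread</h2><p>Hello <b>world</b></p></div>'
    "<script>x=1</script></div><div>Footer</div>"
)
print(extract_content_text(page))
```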

Google

 * Enter your search term (or URL) for a specific page into the Google search interface, removing the https:// portion if needed
 * Alternatively, enter site:blah.miraheze.org as your search term, replacing blah with your wiki's subdomain, to load all pages that Google has cached
 * Click the vertical ... next to a search result
 * Click the Cached bubble to load the last good cached version of the page in question
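
The search terms from the steps above can be generated with a few lines of Python, which may help when checking many pages. This is only a convenience sketch with hypothetical function names. It builds the query text and a standard Google search URL, and does not fetch or parse any results.

```python
from urllib.parse import quote_plus


def strip_scheme(url):
    """Remove a leading https:// or http:// from a page URL."""
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def site_query(subdomain):
    """Build the site: search term for a wiki's subdomain."""
    return f"site:{subdomain}.miraheze.org"


def google_search_url(query):
    """Build a Google search URL for the given query text."""
    return "https://www.google.com/search?q=" + quote_plus(query)


print(strip_scheme("https://bread.miraheze.org/wiki/Main_Page"))
print(google_search_url(site_query("bread")))
```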

Microsoft Edge

 * Enter your search term (or URL) for a specific page into the Edge search interface, removing the https:// portion if needed
 * Click the drop-down arrow next to a search result, then click Cached

Other Browsers
Most other browsers and search engines follow some variation of the steps above. Enterprising editors, please feel free to add any browser- or search-engine-specific instructions I have missed.

Option 4: Miraheze Legacy Backups
If all other options have failed AND your wiki was public AND it existed before October 2021, you can fall back on the official backups: October 2021 is the last known date of full backups from official sources. If you wish to have your wiki restored from the October 2021 backups, please leave a notice on the Stewards' noticeboard and they will assist when able.

Disclaimer
Miraheze does not officially support any of the methods described above for wiki recovery.

However, given the unprecedented nature of the issue, this list was compiled on short notice to document work done by the community towards providing a resolution for those wikis attempting recovery.

All suggestions should be used at user discretion and success is not guaranteed.

Additional Credits
The following contributors, on Discord or elsewhere, provided helpful suggestions that shaped this list of advice:


 * User:Chickadee
 * Discord User: Angry Cider#8111
 * Discord User: -PM-Polskacafe#8382
 * Discord User: Stephanus Tavilrond#1692
 * Discord User: Quadcross#2768
 * Discord User: pipe#4348
 * The entire MH SRE team for the hard work towards incident/recovery management