User:NotAracham/Wiki Recovery Guide (db141 outage)

This page has been created by myself (User:NotAracham) in response to the news that data recovery for wikis impacted by the DB141 outage (November 2022) may not happen or may take weeks, necessitating some near-term efforts by our local admins and bureaucrats to save their content.

As many folks may be unfamiliar with the options available to them, this guide aims to cover the common recovery options in enough detail for any level of technical expertise.

A note to those reading: if you spot anything that appears wrong or would benefit from a rewrite, please feel free to reach out on Discord or on the discussion page for this guide and provide details!

How do I know if I am impacted?
The quickest way to confirm if you are one of the impacted cohort is to try to navigate to your wiki's main page. If you receive a message to the effect of 'Wiki Temporarily Unavailable' or 'Action Required', this is likely the case.


Requesting Re-Creation of a Wiki
If your wiki is one of those impacted (and you are the original requester for the wiki), you can now request that your domain and wiki be reopened (without article contents) via either Phabricator or Discord.

For up-to-date details on the outage, instructions on filing the request, and your available options, please check out Tech:Wiki_recreations.

 * Phabricator: https://phabricator.miraheze.org/T10015
 * Discord: https://discord.com/channels/407504499280707585/1046960363930910781

If you have not previously signed up for an account on Phabricator, please follow the steps found here first before requesting re-creation through Phabricator.

NOTE: images are unaffected by this outage and will still be available, but all text content on a wiki will remain lost unless data is recovered from the damaged disks, which is considered unlikely.

Resurrecting Content
Now that your wiki is re-opened, you might notice that it's as empty as when it was first created.

Recovery options for missing text content mostly fall into three categories, plus an additional fourth in the form of the official Oct 2021 public backups:


 * Full XML Backup - If this can be located, it will restore all text pages to your wiki as of the backup date. While extensions will still need to be re-enabled, this will recover data up to that date and potentially save you lots of work.
 * Archive.org individual page backup - If this can be located, it'll allow you to extract the text of pages one at a time and re-create them on your re-opened wiki.
 * Searching caches of individual search engines - This is the last-resort method but should allow you to recover the text of some pages.

For those running Private wikis (e.g. wikis explicitly made private at creation, or with access to content limited to a handful of users), the likelihood of finding full XML backups or cached pages is slim to none, but the following steps are worth trying regardless.

Option 1: Archive.org search for XML backups
While this won't necessarily solve for everything (especially for those wikis created AFTER June 2022 or private wikis), this is a good first step for troubleshooting.

Some options on how to begin your search:
 * Navigate to https://archive.org/details/wiki-NAMEOFWIKImirahezeorg_w, replacing NAMEOFWIKI with your wiki's subdomain name (e.g. if your wiki was at bread.miraheze.org, the URL should read https://archive.org/details/wiki-breadmirahezeorg_w)
 * Navigate to the Wiki Collections collection and search for the XML file using your wiki's subdomain, using the 'search this collection' box at left 
 * Navigate to the WikiTeam collection and search for the XML File using your wiki's subdomain, using the 'search this collection' box at left 
 * Lastly, just search https://archive.org for any part of your wiki's name; it's possible this might succeed.
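
The URL pattern from the first bullet can be generated mechanically. Below is a minimal Python sketch, assuming the wiki lived at a standard NAME.miraheze.org address; the function name `archive_item_url` is hypothetical and only illustrates the naming pattern described above. It does not check whether the item actually exists on archive.org.

```python
def archive_item_url(wiki: str) -> str:
    """Build the archive.org details URL for a Miraheze wiki's XML dump.

    Follows the pattern given in this guide:
    https://archive.org/details/wiki-NAMEOFWIKImirahezeorg_w
    """
    # Accept "bread", "bread.miraheze.org" or a full page URL,
    # and keep only the subdomain part.
    name = wiki.strip().lower()  # assumption: subdomains are lowercase
    if "://" in name:
        name = name.split("://", 1)[1]
    name = name.split("/")[0]
    suffix = ".miraheze.org"
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return f"https://archive.org/details/wiki-{name}mirahezeorg_w"


print(archive_item_url("bread.miraheze.org"))
```

If the resulting URL returns a "not found" page, move on to the collection searches listed above.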

If you are able to locate an XML backup file, great! Click 'Show All', then Download the file with the suffix -wikidump.7z and extract the XML file located inside the downloaded file.

''NOTE: .7z is a file archive format. If you don't already have a program that can open this format, you can install 7-Zip, a reputable free tool for this purpose available at https://www.7-zip.org/''

Once you've extracted the XML file, you have a few options to proceed:
 * Try to import the XML yourself on your wiki's Special:Import page (recommended if the XML file is under 1MB)
 * File a request via Special:RequestImportDump (recommended if the XML file is under 250MB)
 * Open a Phabricator ticket to have the XML file imported (recommended if the XML file is over 250MB)

More details: Moving_a_wiki_to_Miraheze

While this won't get 100% of content restored, it will most likely put you several steps closer to recovery.
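
As a rough illustration, the size thresholds from the list above can be written as a small Python helper. This is only a sketch: the function names are hypothetical, "MB" is assumed to mean 1024 * 1024 bytes, and the guide does not say which route applies at exactly 250MB, so that case is sent to Phabricator here.

```python
import os

MB = 1024 * 1024  # assumption: the guide's "MB" treated as 1024 * 1024 bytes


def suggest_import_route(size_bytes: int) -> str:
    """Map an XML dump's size to the import route this guide recommends."""
    if size_bytes < 1 * MB:
        return "Special:Import"
    if size_bytes < 250 * MB:
        return "Special:RequestImportDump"
    # Exactly 250MB is not covered by the guide; this sketch treats it as "over".
    return "Phabricator ticket"


def suggest_import_route_for_file(path: str) -> str:
    """Same as above, but reads the size from a file on disk."""
    return suggest_import_route(os.path.getsize(path))


print(suggest_import_route(10 * MB))
```

These are recommendations rather than hard limits, so treat the result as a starting point.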

If you receive a type mismatch error on upload of the XML backup (e.g. .xml does not match mime type of the file), please see the XML-specific Troubleshooting towards the end of this guide.

Option 2: Searching for individual page backups
In lieu of full XML backups, archive.org's Wayback Machine offers some degree of visibility into the prior state of webpages, including the actual page content and the ability to follow links between archived pages.

The process for pursuing these is similar to the steps in Option 1, except that instead of searching for your wiki's name, you enter the full URL of the page of interest.

Once you locate a page you need to copy over, either load the source (if possible) or copy the page text verbatim, then publish it to a page of the same name on your rebuilt wiki.

While this isn't ideal, this is often better than re-writing pages from scratch.
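
If you have many pages to check, the lookup URLs can be built in bulk. The Python sketch below only constructs URLs and makes no network requests. It assumes the Wayback Machine's capture-listing address (`web.archive.org/web/*/...`) and its availability endpoint (`archive.org/wayback/available`), and uses a made-up example page on bread.miraheze.org; the function names are hypothetical.

```python
from urllib.parse import urlencode


def wayback_calendar_url(page_url: str) -> str:
    """URL of the Wayback Machine page that lists all captures of page_url."""
    return f"https://web.archive.org/web/*/{page_url}"


def wayback_availability_query(page_url: str) -> str:
    """Query URL asking the availability endpoint for the closest capture (JSON reply)."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


# Hypothetical page used purely as an example.
page = "bread.miraheze.org/wiki/Main_Page"
print(wayback_calendar_url(page))
print(wayback_availability_query(page))
```

Opening the first URL in a browser shows a calendar of captures, if any exist for that page.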

Option 3A: Use the Web Archives extension
Per suggestions from other users, the Web Archives browser extension can potentially serve as a means to browse the history of pages you've visited in the past and compare them against archived versions from several different search engines.

This has the benefit of not being reliant on a specific search engine to get a cached version of a page, as different engines may have snapshotted a page at very different times.

Link To Google Chrome version: Link to Firefox version:

DISCLAIMER: Miraheze is not affiliated with this extension and does not directly endorse it; users install and use it at their own risk.

Option 3B: Querying pages directly in different search engines
Individual search engines may have also captured page contents and may serve as a way to recover missing content, albeit in a much slower fashion.

As with the previous option, locate a cached version of the page you want to restore and copy its contents, either through source-editing or by copying the text verbatim, then paste those contents into an identically-named page on your re-created wiki.

Google

 * Enter your search term (or URL) for a specific page into the Google search interface, removing the https:// portion if needed
 * Alternatively, enter site:blah.miraheze.org as your search term, replacing blah with your wiki's subdomain, to load all pages that Google has cached
 * Click the vertical ... next to a search result
 * Click the Cached bubble to load the last good cached version of the page in question
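
For convenience, the site: search from the steps above can be turned into a ready-to-open link. This is a small Python sketch; the function name is hypothetical, and it assumes the usual `https://www.google.com/search?q=...` address format.

```python
from urllib.parse import urlencode


def google_site_search_url(subdomain: str) -> str:
    """Build a Google search URL restricted to one Miraheze wiki."""
    query = f"site:{subdomain}.miraheze.org"
    # urlencode percent-encodes the colon so the query survives in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_site_search_url("bread"))
```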

Microsoft Edge

 * Enter your search term (or URL) for a specific page into the Edge interface, removing the https:// portion if needed
 * Click the Drop-Down arrow, then click Cached

Other Browsers
Most other browsers will be some variation of the above two sets of steps. Enterprising editors, please feel free to add browser/search engine specific instructions I have missed.

Option 4: Miraheze Legacy Backups
If all other options have failed AND your wiki was Public AND existed before October 2021, the official backups may help: October 2021 is the last known date of full backups from official sources. If you wish to have your wiki restored from the October 2021 backups, please leave a notice on the Stewards' noticeboard and they will assist when able.

Frequently Asked Questions (FAQ)

 * If I request my wiki be re-created, will it have all of the data I lost by default?
   * No, while images and files were unaffected, you'll need to use the instructions above to try to recover article data.
 * If I request my wiki be re-created, will this prevent me from getting the missing data if drive recovery is successful?
   * No! More on this later...
 * If I restore my articles from another backup (like August 2022), will that make me ineligible if data is recovered from the damaged drives?
   * No! More on this later...
 * So what happens if you are able to recover data from the damaged drives?
   * Exact next steps are TBD, but here's what you can expect:
     * If a page does not exist on the reopened wiki but does exist in the backup, it will be created from the recovered backup
     * If a page does exist on the reopened wiki...
       * the backup's copy gets inserted as a version in that page's history at the timestamp of the page's last edit from the backup
         * e.g. even if the XML backup was generated in August 2022, if the page was last edited prior to that date, in May 2022, that's the timestamp applied to the inserted version
       * If you've made edits to the page since reopening, the backup version will be marked as an earlier revision and will not override the current state of the page
       * With this handling, existing pages will not be overridden, but users will have the option to see the recovered version and merge in content as they see fit
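
To make the expected handling above more concrete, here is a toy Python model of it. It is purely illustrative and not how Miraheze or MediaWiki actually implements the merge (the exact steps are still TBD): the function names, the tuple layout, and the dates are all made up for the example.

```python
from datetime import datetime


def merge_recovered(history, recovered):
    """Insert a recovered revision into a page's history by timestamp.

    history:   list of (timestamp, text) revisions on the reopened wiki
               (empty if the page does not exist there).
    recovered: (timestamp, text) of the page's last edit in the backup.
    Returns the combined history, oldest revision first.
    """
    return sorted(history + [recovered], key=lambda rev: rev[0])


def current_text(history):
    """The page shows whatever the newest revision contains."""
    return history[-1][1]


# Page rebuilt by hand after reopening; backup copy was last edited in May 2022.
live = [(datetime(2022, 12, 1), "rebuilt text")]
recovered = (datetime(2022, 5, 10), "recovered text")

merged = merge_recovered(live, recovered)
print(current_text(merged))                          # the rebuilt page stays current
print(current_text(merge_recovered([], recovered)))  # a missing page is created from the backup
```

Because the recovered revision carries its original, older timestamp, it sorts beneath any edits made since reopening and stays available in the history for manual merging.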

XML-Specific Troubleshooting
In some cases, despite finding (or already having) an XML backup, the user might get a type mismatch error when trying to upload it, e.g. File extension ".xml" does not match the detected MIME type of the file (text/html). Provided that you correctly extracted the file and the file name ends in .xml, this is most likely due to a syntax error in your XML file.

There are two main flavors of this issue, both caused by the backup tools being unable to fully convert a page into a safe-text version:
 * Missing closing tags (e.g. a <page> needs a corresponding </page> tag after the page details are complete, and a <revision> needs a </revision> tag once you're done with revision-related data)
 * Broken page halfway through saving (e.g. page text cuts off abruptly and a new <page> starts)

To find and fix these errors:
 * Open the XML file in any web browser. An error message should display at the top telling you which line the error occurs on.
 * Next, open the XML file in any text editor that includes a line counter. I personally use Notepad++.
 * Locate the line mentioned in the error message and identify which flavor of error is occurring.
 * Either remove the problem page in full or add the required closing tags to fix the page code.
 * Save your changes to the file.
 * Re-open the XML file in any web browser and see if another error appears. Repeat until you no longer receive errors.
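
If you'd rather not rely on a browser, the same check can be scripted. The Python sketch below uses the standard library's `xml.etree.ElementTree` parser to report the line and column of the first syntax error; the function name `first_xml_error` and the tiny sample document are made up for illustration, and the whole text is loaded into memory, so it may be slow for very large dumps.

```python
import xml.etree.ElementTree as ET


def first_xml_error(xml_text: str):
    """Return (line, column, message) for the first syntax error, or None if well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # position of the error as reported by the parser
        return line, column, str(err)
    return None


# Tiny made-up sample: the <page> element is never closed.
broken = "<mediawiki>\n<page>\n<title>Example</title>\n</mediawiki>"
print(first_xml_error(broken))
```

As with the browser method, only the first error is reported, so fix it and re-run until the function returns None. For a file on disk, `ET.parse(path)` raises the same `ParseError`.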

Once you are no longer receiving errors, you can again try the upload instructions listed earlier in this guide.

As this workaround is more complex, please feel free to reach out to the Discord community for further assistance.

Disclaimer
Miraheze does not officially support any of the methods described above for wiki recovery.

However, given the unprecedented nature of the issue, this list was compiled on short notice to document work done by the community towards providing a resolution for those wikis attempting recovery.

All suggestions should be used at user discretion and success is not guaranteed.

Additional Credits
The following contributors, on Discord or elsewhere, provided helpful suggestions that shaped this list of advice:


 * User:Chickadee
 * Discord User: Angry Cider#8111
 * Discord User: -PM-Polskacafe#8382
 * Discord User: Stephanus Tavilrond#1692
 * Discord User: Quadcross#2768
 * Discord User: pipe#4348
 * The Discord community, for asking good questions and making sure we gave good answers
 * The entire MH SRE team for the hard work towards incident/recovery management