Help:Robots.txt

What is robots.txt?

The robots.txt file determines whether and when search engine crawlers may visit a site's pages and include them in the search engine's index.

How can I modify it?

You can modify your own robots.txt from your wiki, on the page MediaWiki:Robots.txt. Its contents are appended to our global robots.txt. MediaWiki will never allow indexing of any special pages or of api.php.
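
As a minimal sketch, a MediaWiki:Robots.txt page that keeps crawlers out of a draft area might contain the following (the path /wiki/Drafts/ is a hypothetical example):

User-agent: *
Disallow: /wiki/Drafts/

These lines take effect alongside the global rules Miraheze already serves.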

What can I put in it?

Robots.txt supports many indexing-related directives, including:

  • User-agent: [Required, one or more per group] Specifies the name of the automatic client, known as a search engine crawler, that the rule applies to. This is the first line of any rule group. Google's user agent names are listed in the Google list of user agents. Using an asterisk (*) will match all crawlers except the various AdsBot crawlers, which must be named explicitly.[1]
  • Disallow: [At least one Disallow or Allow entry per rule] A directory or page, relative to the root domain, that you don't want the user agent to crawl. If the rule refers to a page, it should be the full page name as shown in the browser; if it refers to a directory, it should end in a / mark.[1]
  • Allow: [At least one Disallow or Allow entry per rule] A directory or page, relative to the root domain, that the user agent just mentioned may crawl. It is used to override a Disallow directive and allow crawling of a subdirectory or page inside a disallowed directory. For a single page, give the full page name as shown in the browser; for a directory, end the rule in a / mark (see the sketch after this list).[1]
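
As a sketch of how these directives combine into a rule group (the paths /archive/ and /archive/summary are hypothetical):

User-agent: *
Disallow: /archive/
Allow: /archive/summary

Here crawlers skip everything under /archive/ except the single page /archive/summary, which the more specific Allow rule permits.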

What is the format I should use?

Common practice is to put each rule on its own line. Some search engines also recognize pattern-matching characters in robots.txt. For example, Disallow: /*example$ uses both: * is a wildcard, meaning that part of the rule can match any part of the URL, and $ indicates that the URL must end that way.
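
As a sketch unpacking such a pattern (the path is hypothetical):

User-agent: *
Disallow: /*.pdf$

This blocks any URL that ends in .pdf: the * lets the rule match any path, and the $ requires the URL to end there, so /files/report.pdf is blocked while /files/report.pdf?page=2 is not.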

What are some examples?

This example disallows crawling of the URL string given in Disallow for the user agent named in User-agent:

User-agent: [user-agent name]
Disallow: [URL string not to be crawled]

This example disallows crawling of the path /example/ for all supported user agents:

User-agent: *
Disallow: /example/

This example disallows crawling of all content for all supported user agents:

User-agent: *
Disallow: /

Where is it located?

Robots.txt is always located at subdomain.miraheze.org/robots.txt or mycustomdomain.tld/robots.txt. If you have recently moved your wiki to a custom domain, it may take a few days before your robots.txt files are available on the new domain.

References