Robots exclusion standard
standard used to tell web crawlers and scrapers which parts of a website they should not visit
The robots exclusion standard (also called the robots exclusion protocol or the robots.txt protocol) is a way of telling Web crawlers and other Web robots which parts of a Web site they may visit.
To give robots instructions about which pages of a Web site they can access, site owners put a text file called robots.txt in the top-level (main) directory of their Web site, e.g. http://www.example.com/robots.txt.[1] However, robots do not have to obey these rules, and malicious (bad) robots often ignore the file.[2] If the robots.txt file does not exist, Web robots assume that they may visit every part of the site.
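A well-behaved robot checks the robots.txt file before it fetches any pages. As a minimal sketch of how this can work, the following Python program uses the standard urllib.robotparser module; the robot name "MyCrawler" and the page URL are only examples:

    import urllib.robotparser

    # Download and parse the site's robots.txt file.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()

    # Ask whether a robot called "MyCrawler" may fetch a given page.
    # can_fetch() returns True when the rules allow access.
    if rp.can_fetch("MyCrawler", "http://www.example.com/private/page.html"):
        print("The rules allow fetching this page.")
    else:
        print("robots.txt asks robots not to fetch this page.")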
Examples of robots.txt files
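This example lets all robots visit the whole site, because nothing is disallowed:

    User-agent: *
    Disallow:

This example tells all robots to stay out of the entire site:

    User-agent: *
    Disallow: /

This example tells all robots to stay out of two directories:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/

This example tells one robot (here a made-up robot called "BadBot") to stay out of the whole site, while all other robots may visit everything:

    User-agent: BadBot
    Disallow: /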
References
1. "Robot Exclusion Standard". HelpForWebBeginners.com. Archived from the original on 2011-12-08. Retrieved 2012-02-13.
2. "About /robots.txt". Robotstxt.org. Retrieved 2012-02-13.