What Is Robots.txt?

Robots.txt is a plain text file that you place on your website to tell search robots which pages you would like them not to visit. Note, however, that obeying robots.txt is not mandatory for search engines: it is a request, not an enforcement mechanism. Robots.txt does not provide any way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection).

Structure of a Robots.txt File

The structure of a robots.txt file is simple and at the same time flexible. It consists of a list of user agents and the files and directories disallowed for them. The basic syntax is as follows:

User-agent:

Disallow:

Here, "User-agent:" names the search engine crawler that the rule applies to, and "Disallow:" lists the files and directories that the crawler should not visit. In addition to "User-agent:" and "Disallow:" entries, you can include comment lines by putting the # sign at the beginning of the line:

# All user agents are disallowed from visiting the /temp/ directory.

User-agent: *

Disallow: /temp/
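
As a rough illustration of how a crawler might interpret these rules, the Python sketch below feeds the example above to the standard library's urllib.robotparser module. The bot name "ExampleBot" and the example.com URLs are placeholders invented for this illustration, not the names of any real crawler or site.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the text, including the comment line.
ROBOTS_TXT = """\
# All user agents are disallowed from visiting the /temp/ directory.
User-agent: *
Disallow: /temp/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a hypothetical user agent; the "*" entry applies to it.
print(parser.can_fetch("ExampleBot", "http://example.com/temp/report.html"))  # False
print(parser.can_fetch("ExampleBot", "http://example.com/index.html"))        # True
```

The check runs inside the crawler itself, which is why the file only works for robots that choose to honor it.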

Where to put the robots.txt file?

The location of robots.txt is crucial. It must be in the main (root) directory of the site, because otherwise user agents (search engines) will not be able to find it. Search engines do not search the whole site for a file named robots.txt. Instead, they look in the main directory (i.e. http://mydomain.com/robots.txt). If the file is not there, they simply assume that the website does not have a robots.txt file and therefore index everything they find along the way.
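
The fallback described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the function name rules_for_response is made up for this example, "ExampleBot" is a hypothetical crawler name, and any status other than 200 is treated as "no robots.txt", whereas real crawlers may handle server errors or access-denied responses differently.

```python
from urllib.robotparser import RobotFileParser


def rules_for_response(status: int, body: str) -> RobotFileParser:
    """Build a rule set from the result of requesting /robots.txt.

    `status` is the HTTP status code and `body` is the response text.
    """
    rules = RobotFileParser()
    if status == 200:
        rules.parse(body.splitlines())
    else:
        # No file at the root: parse nothing, so no rule restricts anything.
        rules.parse([])
    return rules


# File missing: the crawler assumes it may visit everything.
missing = rules_for_response(404, "")
print(missing.can_fetch("ExampleBot", "http://mydomain.com/private/page.html"))  # True

# File present at the root: its rules are applied.
found = rules_for_response(200, "User-agent: *\nDisallow: /private/\n")
print(found.can_fetch("ExampleBot", "http://mydomain.com/private/page.html"))  # False
```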

As a website owner, you need to put the file in the right place on your web server for the resulting URL to work. Usually that is the same location where you put your website's main "index.html" welcome page. How to put the file there depends on your web server software.

The file will be ignored unless it is at the root of your host:

  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
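
To make the "root of your host" rule concrete, here is a small Python sketch that derives the one robots.txt URL a crawler would consult for any page on a site. The helper name robots_url is an invention for this example; it keeps only the scheme and host of the page URL and discards the rest.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL at the root of the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com/site/page.html"))   # http://example.com/robots.txt
print(robots_url("http://example.com/site/robots.txt"))  # http://example.com/robots.txt
```

Whatever the page path is, the result points at the root, which is why a copy under /site/ is never requested.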

What program should I use to create /robots.txt?

To create a robots.txt file you can use anything that produces a plain text file, such as:

  • On Microsoft Windows, use notepad.exe, wordpad.exe (Save as Text Document), or even Microsoft Word (Save as Plain Text)
  • On the Macintosh, use TextEdit (Format->Make Plain Text, then Save as Western)
  • On Linux, vi or emacs
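
Since any tool that writes plain text will do, the file can also be generated by a script. The Python sketch below is one possible way, not a recommended tool: the function name write_robots_txt is made up for this example, and a temporary directory stands in for the web server's root directory.

```python
import tempfile
from pathlib import Path


def write_robots_txt(directory: str, lines: list[str]) -> Path:
    """Write the given rule lines to <directory>/robots.txt as plain text."""
    path = Path(directory) / "robots.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# A temporary directory is used here in place of a real web root.
with tempfile.TemporaryDirectory() as web_root:
    written = write_robots_txt(web_root, ["User-agent: *", "Disallow: /temp/"])
    content = written.read_text(encoding="utf-8")
    print(content)
```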

How to use /robots.txt on a virtual host?

The term "virtual host" is sometimes used to mean different things:

A "virtual host" web server uses the HTTP Host header to distinguish requests to different domain names on the same IP address. In this case, the fact that the domain is on a shared host makes no difference to a visiting robot, and you can put a /robots.txt file in the directory dedicated to your