Robots.txt: Good or Bad?

Posted By Guido on 24 March 2006


This article should help you decide for yourself. It also shows you how to create a robots.txt file.

Basically, robots.txt is a plain text file placed in a server's root directory. It tells search engine robots whether they should index the whole site or only parts of it. The file usually starts with a comment (a line beginning with '#'), followed by 'User-agent' lines.
Usually, the User-agent line is simply a wildcard that addresses all robots, like so:

# robots.txt for http://yoursite.com
User-agent: *

although you can write separate User-agent/Disallow sections for different robots.
Next comes the Disallow section. The robot reads this to determine what is off-limits when it indexes your site.

# robots.txt for http://yoursite.com
User-agent: *
Disallow: /administration/ # nothing under /administration/ should be spidered
Disallow: /temp/ # these are temporary files
Disallow: /active.asp # active content here, no point spidering it
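
To see how a well-behaved robot might interpret these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The robot name "ExampleBot", the helper is_allowed and the sample page paths are made up for illustration; only the three Disallow rules come from the example above.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, one directive per line.
RULES = """\
# robots.txt for http://yoursite.com
User-agent: *
Disallow: /administration/ # nothing under /administration/ should be spidered
Disallow: /temp/ # these are temporary files
Disallow: /active.asp # active content here, no point spidering it
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if a polite robot called `agent` may fetch `path`.

    "ExampleBot" is a hypothetical robot name; the wildcard section
    applies to it because no section names it specifically.
    """
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, "http://yoursite.com" + path)


if __name__ == "__main__":
    for page in ["/index.html", "/administration/users.html",
                 "/temp/cache.tmp", "/active.asp"]:
        print(page, "allowed" if is_allowed(page) else "disallowed")
```

Note that this only models a robot that chooses to obey the file. Nothing in robots.txt itself enforces the rules.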

Disallowing pages deep in your structure can be good for your users: they won't find themselves halfway through the site with no idea how to get out. Then again, the more search engine entries you have, the better, right? It's up to you to decide what should or shouldn't be excluded.

OK, so robots.txt is good for your users and tells the search engines which pages to list. But here is the bad part: not all robots are good. Some will ignore the robots.txt file and index every page they come across, so some of your admin pages could still get displayed somewhere.
Also, now that you know robots.txt has to be in the root directory, what's to stop someone who reads this article from going around and typing http://yoursite.com/robots.txt into a browser? That would display your robots.txt file, including every path you asked robots to stay away from!
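
To make the point concrete, here is a rough sketch of how little effort it takes to turn a robots.txt file into a list of "interesting" paths. The function name disallowed_paths and the SAMPLE text are invented for this illustration; a curious visitor would simply download the real file first.

```python
def disallowed_paths(robots_txt):
    """Collect every path named in a Disallow line of a robots.txt text."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon.
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Sample content modelled on the example earlier in this article.
SAMPLE = """\
# robots.txt for http://yoursite.com
User-agent: *
Disallow: /administration/ # nothing under /administration/ should be spidered
Disallow: /temp/ # these are temporary files
Disallow: /active.asp # active content here, no point spidering it
"""

if __name__ == "__main__":
    print(disallowed_paths(SAMPLE))
    # prints ['/administration/', '/temp/', '/active.asp']
```

In other words, the file that hides pages from polite robots also advertises them to anyone who cares to look.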

That would be like going into a pub, leaving your wallet on the bar and going to the loo. What are the chances of it still being there when you get back? Yeah, slim!
So now that you know a bit about robots.txt, it's up to you to decide.

Good luck.

Guido


