Robots.txt and SEO: Overview & Implementation

News Flash: The Web Is Huge.

By current estimates Google has indexed over 38 billion pages, and that represents just a small slice of what is actually out there. Search engines use automated programs called “robots” to scan the internet; types of robots include spiders, crawlers, wanderers, and more. Robots also show up in your traffic statistics, which is why your hit counter can say you’ve had a million visits while your phone never rings. While that may seem interesting to your average techie, it’s not exactly what business owners care about. SEO is all about getting found for what you do.

I get a lot of questions from clients like “I’ve done everything I can for my site, but I’ve heard of this robotic text thing… How much will that cost?” The answer: about five minutes of your time. The next question is “Why do I need this?” For one thing, a good robots.txt will make your web stats more accurate.

A crawler reads the pages it finds and adds them to the index search engines use to serve results to users. The “good” robots first look in your site’s root directory for a robots.txt file, which gives them instructions on what not to crawl.
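
To make that concrete, here is a minimal sketch of how a well-behaved robot asks permission before fetching a page. It uses Python’s standard urllib.robotparser module, and the example.com URLs are placeholders for your own site:

from urllib import robotparser

# Point the parser at the site's robots.txt and download it
rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# can_fetch() returns True if the named user agent may crawl the URL
if rp.can_fetch("*", "http://www.example.com/some-page.html"):
    print("Allowed to crawl")
else:
    print("Disallowed by robots.txt")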

The Point: A robots.txt tells search engines what not to crawl. Adding a good robots.txt (and an XML sitemap) will help your SEO: it shows crawlers your site follows the conventions they expect, and it keeps them from wasting time on content they don’t need.

The Goods: Now that we have an understanding of what robots do and why it’s important to have a properly configured robots.txt file, let’s make one.

Making your Own robots.txt

Create a blank document in Notepad (or TextEdit on a Mac). The first line should read as follows:

# robots.txt for http://www.example.com/

Obviously, use your own site’s address instead of example.com 🙂 The leading “#” marks this line as a comment; robots simply ignore it, so it’s there for humans reading the file.

The following line specifies which robots these instructions pertain to. Using “*” tells every robot that reads the file to follow the directions in it.

User-agent: *
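
You can also address a single robot by naming it on this line. For example, to write rules that only Google’s crawler follows, you would use its user-agent name, Googlebot:

User-agent: Googlebot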

To allow a robot to look at everything (no exceptions), we leave the Disallow directive blank:

Disallow:
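
If, on the other hand, you want to keep robots out of part of your site, list that path after Disallow (here /private/ is just a placeholder folder name):

Disallow: /private/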

The last command generally found in a robots.txt is a Sitemap link, which points robots to your XML sitemap.

Sitemap: http://www.example.com/sitemap.xml
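
For reference, the file that line points to is a plain XML document following the sitemaps.org protocol. A minimal sitemap.xml (the URL and date are placeholders) looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
</urlset>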

So your final robots.txt file that allows all robots to properly crawl your site looks like this:

# robots.txt for http://www.example.com/

User-agent: *

Disallow:

Sitemap: http://www.example.com/sitemap.xml
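
One word of caution before you upload: a single slash reverses the meaning. The two lines below tell every robot to stay away from your entire site, so double-check that you haven’t added a “/” to your Disallow line by accident:

User-agent: *
Disallow: /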

When you’re done, save your file as “robots.txt” (all lowercase) and put it in the main folder of your website, which is referred to as your root directory. Once it’s uploaded, you can confirm it’s in place by visiting http://www.example.com/robots.txt with your own domain in place of example.com.

That covers the basics of robots.txt and SEO. The best resource I’ve found online is the aptly named robotstxt.org. Check it out for more detailed information, and implement a robots.txt on your site today.

Mike Maurin
[email protected]