Is Your Robots.txt File Hurting Traffic?

Earlier this year, I made the mistake of taking the advice of an SEO guru and modifying my robots.txt file per his instructions. The issue he raised was duplicate content: you don’t want the spiders reaching the same posts through your archives, tag pages, author pages, and so on. To deal with that, he suggested adding a series of “Disallow” statements to the robots.txt file, including one for /feed.
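
For reference, the kind of robots.txt block he had me add looked roughly like this. The exact paths are from memory, so treat them as illustrative rather than a copy of my actual file:

    User-agent: *
    # "Duplicate content" cleanup suggested by the SEO guru
    Disallow: /tag/
    Disallow: /author/
    # This is the line that caused the trouble below
    Disallow: /feed

Note that Disallow works as a prefix match, so that last line blocks /feed, /feed/, and anything else that starts with /feed.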

I never gave this much thought – I use the FeedSmith plugin to redirect RSS readers to the FeedBurner version, and life was good. But did you know that Google relies on the original feed (the one we disallowed) to pull current posts for Google Blog Search and other Google properties?

Yep, I found this out the hard way. A few days ago, I was wandering through Google Blog Search looking for some news and wondering why our own site wasn’t showing up. I did a quick blogurl:domainname.com search and found that the last post indexed from our domain was 7 months old! Horrified, I quickly punched in a bunch of other websites that we run, only to find the same results.

A few of our sites were still getting indexed despite the disallow in the robots.txt file, but those were the exceptions to the rule. A Google engineer confirmed that the Disallow statement was what caused our blog to stop getting indexed, and once that statement was removed, we started seeing new stories picked up by Google Blog Search!
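
If you’re in the same boat, the fix is simply to drop the /feed line so Google can crawl the source feed again. Something like this – again, just a sketch based on the setup described above, not a one-size-fits-all file:

    User-agent: *
    Disallow: /tag/
    Disallow: /author/
    # /feed is no longer disallowed, so Google can fetch the source feed again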

Hope this helps!

