Why might GoogleBot get errors when trying to access my robots.txt file?




A great thing about Google is that it gives webmasters plenty of help getting their websites into its index. One handy tool in Google Webmaster Tools is 'Fetch as GoogleBot'. As we discussed in our SEO Tips for start-ups, it can be a great help in diagnosing errors and getting a website into Google's index faster. A robots.txt file is used to manage crawling efficiency and to keep certain pages from being crawled. Sometimes, though, GoogleBot might have trouble fetching your robots.txt file. Here's what Google has to say about the problem.
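For reference, a minimal robots.txt looks something like this (the path and sitemap URL below are only placeholders, not recommendations for any particular site):

User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml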









The original question asked on the GWT forum had to do with crawling inefficiency. GoogleBot was unable to fetch a robots.txt file about 50% of the time, even though the file could be fetched from other hosts with a 100% success rate. It's worth noting that this was a plain nginx server on a mit.edu host, which should have pretty good uptime. So the problem must be on Google's end, right?
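If you want to get a rough feel for the failure rate yourself, you can fetch the file repeatedly from your own machine and count the errors. Here's a quick Python sketch (example.com is just a placeholder, and keep in mind that a fetch from your own network won't necessarily hit the same host GoogleBot does):

import urllib.request

URL = "https://www.example.com/robots.txt"  # replace with your own robots.txt URL
ATTEMPTS = 20

ok = 0
for i in range(ATTEMPTS):
    try:
        # A successful open with HTTP 200 counts as a good fetch
        with urllib.request.urlopen(URL, timeout=10) as response:
            if response.status == 200:
                ok += 1
    except Exception as error:
        print(f"Attempt {i + 1} failed: {error}")

print(f"{ok}/{ATTEMPTS} fetches succeeded")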










Sometimes, people try cloaking on their websites. Cloaking means serving different content to crawlers than to users, so what a visitor sees on the site can be very different from what crawlers such as GoogleBot see. Not only is this a bad SEO practice, it's against Google's guidelines and can get a site penalized.





When setting up cloaking, people sometimes make a mistake and end up reverse-cloaking: browsers and normal user agents see the website fine, but crawlers get no content at all. Making that mistake is like shooting yourself in the foot, and it could be one cause of the robots.txt errors.
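To make that concrete, here's a hypothetical nginx snippet showing how reverse-cloaking can happen. The rule is meant to treat crawlers specially, but it ends up denying them outright (this is an illustration, not the configuration from the original question):

# Intended to route crawlers to special handling,
# but this rule simply blocks them instead.
if ($http_user_agent ~* "googlebot") {
    return 403;
}

Regular visitors never trigger the rule, so the site looks perfectly healthy in a browser while GoogleBot keeps getting errors.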





As we mentioned at the start, the Fetch as GoogleBot feature in Google Webmaster Tools is a pretty awesome tool. You can use it to fetch your robots.txt file, and it will tell you when there's a problem. Many people might not know this, but some web hosts alternate requests between different systems and hosts. A 50% success rate, then, could simply mean that one of those hosts is improperly configured. You might want to contact your hosting company about this.
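One quick way to check whether your domain is actually served by more than one machine is to look at how it resolves in DNS. A small Python sketch (again, example.com is just a placeholder):

import socket

# Returns (hostname, aliases, list of IPv4 addresses) for the domain
hostname, aliases, addresses = socket.gethostbyname_ex("www.example.com")

print(f"{hostname} resolves to {len(addresses)} address(es):")
for ip in addresses:
    print(" ", ip)

If several addresses come back, it's worth checking each host individually, or asking your hosting company whether they're all configured the same way.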





These two are probably the most common causes of robots.txt crawling errors. Did this help? Please do let us know, and stay tuned for more SEO questions and their answers :)
