A few months ago, I created my first robots.txt, and I even asked someone to help me write it. Now that I’m a bit more confident with WordPress, I’ve revisited my robots.txt and done some more research to optimize my search engine traffic even further.
I found out that I still had a lot of duplicate content that needed to be filtered out to get good SEO (Search Engine Optimization)!
If you don’t know what robots.txt is, it’s the file that search engine bots/crawlers look for first when they visit your site/blog. The file tells them what they may crawl and what they should leave alone.
The robots.txt file has to be placed in the root of your site, even if WordPress is installed in a sub-folder! So since my blog’s URL is http://www.cravingtech.com/ , I have to put robots.txt directly under http://www.cravingtech.com/ (on most hosts, your public_html/ folder).
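To make the "root of your site" rule concrete, here is a small Python sketch that works out where crawlers will look for robots.txt, given any page URL. The function name robots_url and the /blog/ sub-folder in the example are my own assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt location for the site that serves page_url.

    Crawlers only look at the root of the host, so any sub-folder in the
    page URL (for example a WordPress install under /blog/) is dropped.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical post URL inside a sub-folder install.
    print(robots_url("http://www.cravingtech.com/blog/some-post/"))
```

Whatever sub-folder the post lives in, the result always points at the top level of the host.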
Here is my new robots.txt file (feel free to comment on it):
# BEGIN XML-SITEMAP-PLUGIN
# END XML-SITEMAP-PLUGIN
# Google AdSense
# Internet Archiver Wayback Machine
# digg mirror
NOTE: If you want to copy my robots.txt to your WordPress blog, feel free to do so, BUT it only works if your permalink structure is similar to mine (www……./%posttitle%…….). If your permalink structure includes the year or category, your posts will be blocked by this robots.txt configuration (e.g. by the Disallow: /2008/ part)!
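If you are not sure whether your permalinks would be caught, a quick local check with Python’s built-in urllib.robotparser can help. This is only a rough sketch: the rules string contains just the Disallow: /2008/ line mentioned above (not my whole file), and both post URLs are made-up examples of the two permalink styles.

```python
from urllib.robotparser import RobotFileParser

# Only the one rule discussed above; not the complete robots.txt.
RULES = """\
User-agent: *
Disallow: /2008/
"""


def is_allowed(rules, url, agent="*"):
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: title-only permalink vs. date-based permalink.
    for url in (
        "http://www.cravingtech.com/my-post/",
        "http://www.cravingtech.com/2008/05/my-post/",
    ):
        print("allowed" if is_allowed(RULES, url) else "blocked", url)
```

The date-based URL starts with /2008/, so it is treated as blocked, while the title-only one gets through. Keep in mind that Python’s parser does simple prefix matching, so it may not behave exactly like Google’s crawler for fancier rules.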
As always, use Google Webmaster Tools to check whether your posts are still accessible.
Once there, go to Tools > Analyze robots.txt.
You should then see your robots.txt contents there. If you’ve just updated your robots.txt file, you may still see the old one. It will be refreshed on Google’s next crawl, which may take a day or two.
Then, test whether the crawler bots can access your actual content and can’t access the duplicate content:
Next, look at the results to confirm that the bot can access only the actual content.
As you can see, the bot can now access only the actual post content and not the copies of posts in archives, feeds, navigation pages, etc.
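If you’d like to run the same kind of test on your own machine before Google picks up the new file, something like the sketch below might do. Please treat everything in it as an assumption: the three Disallow rules are hypothetical stand-ins (not the contents of my robots.txt), the URLs are invented, and Python’s urllib.robotparser only does prefix matching, so Webmaster Tools remains the final word.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
RULES = """\
User-agent: *
Disallow: /feed/
Disallow: /page/
Disallow: /2008/
"""

# Made-up URLs mapped to whether crawlers SHOULD be allowed in.
EXPECTED = {
    "http://www.cravingtech.com/some-post/": True,   # actual content
    "http://www.cravingtech.com/feed/": False,       # feed copy
    "http://www.cravingtech.com/page/2/": False,     # navigation page
    "http://www.cravingtech.com/2008/05/": False,    # date archive
}


def check(rules, expected, agent="Googlebot"):
    """Print allowed/blocked per URL and return the URLs that disagree
    with what we expected."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    mismatches = []
    for url, want in expected.items():
        got = parser.can_fetch(agent, url)
        print("allowed" if got else "blocked", url)
        if got != want:
            mismatches.append(url)
    return mismatches


if __name__ == "__main__":
    problems = check(RULES, EXPECTED)
    print("All good!" if not problems else "Check these: %s" % problems)
```

An empty list back from check means every URL behaved the way you expected. Anything in the list is worth a second look in your robots.txt.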
Have you re-visited your robots.txt? It’s very important for search engines, especially Google, that you get it right and optimized!