
Magento robots.txt File

Magento is a very popular e-commerce platform, and an increasing number of online retailers are choosing to develop their stores on it. It has some great SEO features available, such as a sitemap.xml generator and canonical URL meta tag generation.

This is great, but referencing your sitemap to tell search engine robots its location is something you need to do manually by creating a robots.txt file. You may also not want particular URLs to be indexed, so disallowing these in your robots.txt file can help with your SEO strategy too.

Disallow URLs in Magento

The first declaration within the robots.txt file is the following line, which targets the visiting user-agent.

User-agent: *

This statement tells all user agents to follow the rules that come after it. If you wish certain user agents to follow different rules, for example to let one crawler index pages that others cannot, create a separate declaration block for each; a crawler will generally obey the most specific group that matches it and ignore the rest.
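
For instance, a minimal sketch of separate declarations might look like this (the page paths here are purely illustrative and are not part of the file built below):

# Example only: rules for all crawlers
User-agent: *
Disallow: /example-private-page.html

# Example only: a separate group with different rules for Googlebot
User-agent: Googlebot
Disallow: /example-private-page.html
Disallow: /example-other-page.html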

The next stage is to list the Magento store directories that we do not want to be indexed, starting each statement with the 'Disallow' declaration.

The first block of declarations stops Magento-specific directories from being indexed, along with the files and subdirectories contained in them.

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
# *Remove or comment the /media/ directive below if you require Google Merchant Centre Feeds to access product images
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

*Please note that if you are generating and submitting a product feed to Google Merchant Centre, you should remove (or comment out) the /media/ rule in this file so that product images can be accessed and used for this purpose.
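
As an alternative, and purely as a sketch of my own rather than part of the file above, you could leave /media/ disallowed for other crawlers and give Google's image crawler its own group. Bear in mind that a crawler which matches a more specific group will ignore the generic * group, so repeat any rules you still want it to obey:

# Sketch only: let Google's image crawler reach product images for Merchant Centre
User-agent: Googlebot-Image
Allow: /media/
# Repeat any Disallow rules from the * group that should still apply to this crawler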

The next set of declarations in the Magento robots.txt file disallows specific clean URLs to pages that you do not want to be indexed; many of these are there to prevent issues with duplicate content. There are also some statements that disallow the checkout and account related URLs. If there are any specific page URLs that you do not want search engines to index, add them here too.

# Paths (clean URLs)
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /checkout/onepage/
Disallow: /checkout/onepage/billing/
Disallow: /checkout/onepage/shipping/
Disallow: /checkout/onepage/shipping_method/
Disallow: /checkout/onepage/payment/
Disallow: /checkout/onepage/review/
Disallow: /checkout/onepage/success/
Disallow: /onestepcheckout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /example-page.html

The next group of disallow statements in our Magento robots.txt file excludes specific Magento files that sit in the root directory. Please note that the licence files for Magento should not really be present on a live site, but many Magento developers (including myself) often forget to remove them when moving from development to live environments.

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

The final stage of our Magento robots.txt file is to add a few statements that firstly disallow our included and structural files by type, such as .js, .css and .php files. The second part of these disallow statements stops the paged URLs, search result URLs and pager limit URLs that Magento generates dynamically when refining results from being indexed.

# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=
Disallow: /*?limit=all
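
To make the wildcard patterns clearer, here are some hypothetical URLs (paths of my own invention, not taken from any store) and the pattern that would block each of them:

# Illustrative matches only
# /*.js$        would block /js/scripts.js (any URL ending in .js)
# /*?p=*&       would block /shoes.html?p=2&limit=12
# /*?SID=       would block /shoes.html?SID=abc123
# /*?limit=all  would block /shoes.html?limit=all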

Magento Sitemap Reference

The last step is to reference your sitemap.xml (or .gz) so that a visiting bot detects your file's location. Simply add the following line at the end of your robots.txt file, changing the URL to your own.

Sitemap: http://www.yourdomain.co.uk/sitemap.xml
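
If your store generates a compressed sitemap instead, the reference works in exactly the same way (the file name below is an assumption; match whatever Magento actually generates):

Sitemap: http://www.yourdomain.co.uk/sitemap.xml.gz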

Complete Magento robots.txt File

Your robots.txt file should now be complete, with exclusions for directories, paths with clean URLs, page URLs, files and paths without clean URLs, plus a referenced sitemap.xml (your own domain needs adding to the sitemap reference). You can download a complete demo version of the robots.txt file here: Full Magento robots.txt file download.

Always be aware that a file like this instructs Googlebot and other search engine robots to exclude the URLs specified in it, so it should be used by a professional SEO with knowledge of the consequences.


5 Responses

  1. Jon
    January 20, 2013

    This is a great post. A little more detail would have been appreciated. Thanks!!!

    • Porter
      January 21, 2013

      The post is more of a resource to be used by SEOs that need a quick robots.txt file for Magento projects. Feel free to ask for assistance if required.

  2. Raleigh Leslie
    February 26, 2013

    Exactly what I was looking for. Will implement on my site and report back.

  3. Camera
    May 1, 2013

    Will these settings remove my duplicate content? I want to have the best robots.txt on my site, so do you think these rules are the best?

    • Porter
      May 1, 2013

      Just implementing this robots.txt file will not help with any current duplicate content issues, unfortunately. Implementing a correct nofollow structure on your navigation, along with canonical meta tags, can help with the issue (to name but a few methods) if, for instance, a product is in multiple categories.

