
Create a List of a Domain's Indexed Pages in Google

When you put a new website or a redesign live in place of an existing site, it is very important to redirect the pages Google has already indexed to their new URLs. This is usually done by adding 301 (permanent) redirects to a .htaccess file in the root directory of your web server, alongside your website files.
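As a rough illustration of what those redirects look like, the Python sketch below prints one Apache `Redirect 301` line per old page. The paths and target URLs in `REDIRECTS` are made-up placeholders, and the helper names are hypothetical; it assumes the old paths contain no spaces, which would need quoting.

```python
# Hypothetical mapping of old paths to new URLs; replace with your own pages.
REDIRECTS = {
    "/old-about.html": "https://www.yourdomainhere.com/about/",
    "/products/widget.php": "https://www.yourdomainhere.com/shop/widget/",
}


def redirect_line(old_path: str, new_url: str) -> str:
    """Return one 'Redirect 301' directive for an .htaccess file."""
    if not old_path.startswith("/"):
        raise ValueError(f"old path must start with '/': {old_path!r}")
    return f"Redirect 301 {old_path} {new_url}"


def build_htaccess(redirects: dict[str, str]) -> str:
    """Join one directive per mapping entry, keeping the mapping's order."""
    lines = [redirect_line(old, new) for old, new in redirects.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the lines so they can be pasted into the .htaccess file.
    print(build_htaccess(REDIRECTS), end="")
```

Each printed line can be pasted into the .htaccess file as it is, one redirect per old page.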

There are two reasons for these redirects. Firstly, visitors arriving from paid ads, natural search results or external links are sent to the relevant page rather than a 404 page. Secondly, and more importantly, there are SEO benefits: fewer links are lost, and search engines such as Google and Bing can identify and re-index the new URLs to replace the current listings in their SERPs.

Identifying Indexed Pages in Search Engines

Now that we know the basic reasoning for redirecting old indexed pages to new ones, we can look at how to identify what is currently indexed, preferably before the new site goes live. This article uses Google, as there are some great tricks for exporting the results, covered later. To display your domain's indexed pages, open your web browser and navigate to Google, then enter the following in the search box and click the search button:

site:www.yourdomainhere.com

The search results will now list the indexed pages for the domain you searched on (replace www.yourdomainhere.com with your actual domain name).

Exporting Indexed Pages

The search may return only a handful of results, from which you can quite easily create your 301 redirects. For larger sites, especially e-commerce sites, you may want to export or save the results to a CSV or Excel file. Luckily, with the help of Google Docs (now known as Google Drive), you can do exactly that.

Navigate to Google Drive, log in to your Google account and create a new spreadsheet. In the first cell of the spreadsheet enter the following formula, replacing www.yourdomainnamehere.com with your website's actual domain name:

=importXml("https://www.google.com/search?q=site:www.yourdomainnamehere.com&num=100&start=1"; "//cite")

This will give you the first 100 results in Google’s index for your domain. To get the next 100 results, change ‘&start=1’ to ‘&start=100’ and paste the formula into cell 101, directly after the last of the returned results. The full formula is below:

=importXml("https://www.google.com/search?q=site:www.yourdomainnamehere.com&num=100&start=100"; "//cite")

A maximum of 1,000 results can be pulled using this method: click the first empty cell after the current results, paste the formula again and increase the ‘&start=’ number by 100 each time.
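If you would rather not edit the formula by hand ten times, a short Python sketch like the one below can print every formula along with the cell to paste it into. It follows the pattern above (start=1, then 100, 200 and so on). The function names are hypothetical, and the cell positions (A1, A101, A201, ...) assume every block returns exactly 100 rows, which may not hold for your site.

```python
def start_offsets(pages: int = 10, page_size: int = 100) -> list[int]:
    """Follow the article's pattern: start=1 first, then 100, 200, ..."""
    return [1] + [page_size * i for i in range(1, pages)]


def import_formula(domain: str, start: int, page_size: int = 100) -> str:
    """Build the importXml formula for one block of results."""
    return (
        f'=importXml("https://www.google.com/search?q=site:{domain}'
        f'&num={page_size}&start={start}"; "//cite")'
    )


def formula_plan(domain: str, pages: int = 10, page_size: int = 100):
    """Pair each formula with the cell it would be pasted into.

    Assumes each block fills exactly page_size rows in column A.
    """
    plan = []
    for i, start in enumerate(start_offsets(pages, page_size)):
        cell = f"A{i * page_size + 1}"
        plan.append((cell, import_formula(domain, start, page_size)))
    return plan


if __name__ == "__main__":
    for cell, formula in formula_plan("www.yourdomainnamehere.com"):
        print(cell, formula)
```

Ten blocks of 100 cover the 1,000-result maximum mentioned above. If a block returns fewer than 100 rows, paste the next formula into the first empty cell instead of the cell the script suggests.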

Once you have all of the results in your Google Drive spreadsheet, export it as a CSV file (or another file type) for a full list of indexed URLs. From this exported file you can then systematically go through and add any required redirects to a .htaccess file.
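Before writing the redirects, it can help to boil the exported list down to unique paths. The Python sketch below is one way to do that. It assumes the CSV has one URL per row in the first column, with or without the http:// or https:// prefix; the sample rows and function names are made up for illustration, and your export may need extra cleaning.

```python
import csv
import io
from urllib.parse import urlsplit


def url_to_path(raw: str) -> str:
    """Reduce an exported URL to its path (plus query string, if any)."""
    raw = raw.strip()
    if "://" not in raw:
        # Assumption: rows without a scheme are plain host/path strings.
        raw = "http://" + raw
    parts = urlsplit(raw)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path


def unique_paths(csv_text: str) -> list[str]:
    """Read the first CSV column and return de-duplicated paths in order."""
    paths = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank rows
        paths.append(url_to_path(row[0]))
    return list(dict.fromkeys(paths))


if __name__ == "__main__":
    # Made-up sample standing in for the exported file's contents.
    sample = (
        "www.yourdomainhere.com/about.html\n"
        "https://www.yourdomainhere.com/about.html\n"
        "www.yourdomainhere.com/shop/item.php?id=3\n"
    )
    for path in unique_paths(sample):
        print(path)
```

Each path in the resulting list is an old page that needs a redirect to its new URL.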

