
It is very important to stay on top of the URLs that Google and other search engines have in their indexes to ensure that removed pages are correctly redirected to appropriate, relevant content.

To preserve the value of external links and social shares that point to removed content, and to reduce 404 errors and the risk of losing visitors, implementing 301 redirects is a must for every website. This is why we need to employ the methods detailed in this post to identify, monitor and maintain the pages that search engines have indexed.

How to Identify Indexed Pages

The simplest method to identify the pages and URLs that a search engine such as Google or Bing has indexed for your domain is to use the following search modifier.


site:yourdomain.com

Please replace ‘yourdomain.com’ with the domain that you wish to return results for.

You may wish to prepend a subdomain to the domain to return results for that subdomain only. The example below would return results for the specified subdomain.


site:subdomain.yourdomain.com

This search modifier queries the search engine’s index and returns the URLs it has indexed for the specified domain in the search results.
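As a rough illustration, the two query forms above can also be assembled programmatically. The sketch below builds the `site:` modifier and a matching search URL. The function names are hypothetical, and the `https://www.google.com/search` base is an assumption borrowed from the spreadsheet queries used later in this post.

```python
from urllib.parse import urlencode


def site_query(domain, subdomain=None):
    """Return a site: search modifier for a domain or one of its subdomains."""
    host = f"{subdomain}.{domain}" if subdomain else domain
    return f"site:{host}"


def search_url(query, base="https://www.google.com/search"):
    """Return a search URL for the query (the base URL is an assumption)."""
    # urlencode percent-encodes the colon in "site:" so the URL stays valid.
    return f"{base}?{urlencode({'q': query})}"


print(site_query("yourdomain.com"))
print(site_query("yourdomain.com", "subdomain"))
print(search_url(site_query("yourdomain.com")))
```

Keeping the modifier and the URL in separate functions means the same modifier can be pasted into any search engine that supports it.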

List & Export Indexed URLs to a Google Drive Spreadsheet

Now that we have an idea of what URLs are indexed for your specified domain, it would be useful to export this data into a document so that the data can be utilised.

The simplest method is to use an XPath query to pull the data into a Google Drive (formerly Google Docs) spreadsheet, which can be done by following the steps below.

Firstly, sign in to Google Drive and create a new spreadsheet document.

Secondly, in the first cell within the document (A1), input the following query, replacing ‘yourdomain.com’ with your specific domain name.


=importXml("https://www.google.com/search?q=site:yourdomain.com&num=100&start=0", "//h3/a/@href")

If the above query returns an error such as #N/A or ‘Google could not retrieve URL’, then adjust the query to use an http:// lookup rather than an https:// lookup, as below.


=importXml("http://www.google.com/search?q=site:yourdomain.com&num=100&start=0", "//h3/a/@href")

This will return the first 100 results within the Google index for your specified domain name. The ‘start’ parameter is a zero-based offset, so ‘start=0’ begins at the first result. If you have more than 100 URLs indexed, then in the first cell on line 101 (A101) input the amended query, replacing ‘start=0’ with ‘start=100’.


=importXml("https://www.google.com/search?q=site:yourdomain.com&num=100&start=100", "//h3/a/@href")

If the above query returns an error such as #N/A or ‘Google could not retrieve URL’, then adjust the query to use an http:// lookup rather than an https:// lookup, as below.


=importXml("http://www.google.com/search?q=site:yourdomain.com&num=100&start=100", "//h3/a/@href")

Repeat this step as many times as required to pull out all of your site’s indexed URLs. Please be aware that there is a limit of 1000 URLs that can be imported.
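Typing a query for every block of 100 results is repetitive, so the list of cells and queries can be generated instead. The sketch below is only an illustration: the function name is hypothetical, and it assumes that Google's `start` parameter is a zero-based offset, so the first block begins at 0 and the next at 100. It stops at the 1000 URL limit mentioned above.

```python
def import_formulas(domain, total, page_size=100, limit=1000):
    """Return (cell, formula) pairs, one per block of page_size results."""
    total = min(total, limit)  # no more than `limit` URLs can be imported
    pairs = []
    for offset in range(0, total, page_size):
        cell = f"A{offset + 1}"  # A1, A101, A201, ...
        url = (
            "https://www.google.com/search"
            # Assumption: start is a zero-based offset into the results.
            f"?q=site:{domain}&num={page_size}&start={offset}"
        )
        pairs.append((cell, f'=importXml("{url}", "//h3/a/@href")'))
    return pairs


# Print the cell and query for a site with roughly 250 indexed URLs.
for cell, formula in import_formulas("yourdomain.com", 250):
    print(cell, formula)
```

Each printed line names the cell to paste into, followed by the query for that cell.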

As you may have noticed, the returned URLs have various query strings appended to them, so they need to be cleaned up to leave a clean list of pages.

The final step is to paste the following formula into the first cell of the second column (B1).


=mid(A1,search("?q=",A1)+3,search("&sa=",A1)-(search("?q=",A1)+3))

As you can see, we are now left with the full URL of the indexed page. To apply this formula to all of the other rows in the spreadsheet, simply click on the small blue square at the corner of the highlighted cell (B1) and drag it down to the last row that contains data.
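The same clean-up can be sketched outside the spreadsheet. The function below is a rough Python counterpart of the formula: it keeps the text between `?q=` and `&sa=`. The function name and the sample value are hypothetical, and unlike the spreadsheet's search function the match here is case-sensitive.

```python
def clean_result_url(raw):
    """Return the text between '?q=' and the following '&sa='."""
    start = raw.find("?q=")
    if start == -1:
        return raw  # no wrapper found, leave the value untouched
    end = raw.find("&sa=", start)
    if end == -1:
        return raw  # no trailing parameters found, leave the value untouched
    return raw[start + 3:end]  # skip the three characters of '?q='


# Hypothetical example of a value as it might appear in column A.
sample = "/url?q=http://yourdomain.com/about/&sa=U&ved=example"
print(clean_result_url(sample))
```

Values that do not contain both markers are returned unchanged, which avoids the error the spreadsheet formula would show for such rows.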

You should now have a spreadsheet containing all of your chosen domain’s URLs that Google has in its index.

With this data in a spreadsheet, it is easier to test the URLs in a browser and discover broken links, which can then be pointed at relevant content with 301 redirects.
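For a long list, testing each URL by hand in a browser becomes slow. As a minimal sketch, assuming the cleaned URLs have been copied out of the spreadsheet into a list, the check could be automated as below. The function names are hypothetical. Note that `urllib` follows redirects automatically, so a page that already has a working 301 redirect reports the status of its destination.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a URL, or None if the request fails."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a removed page
    except (OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, malformed URL


def find_broken(urls, get_status=fetch_status):
    """Return (url, status) pairs for URLs that did not answer with 200."""
    broken = []
    for url in urls:
        status = get_status(url)
        if status != 200:
            broken.append((url, status))
    return broken


# Offline demonstration with made-up statuses instead of real requests.
sample = {"http://yourdomain.com/": 200, "http://yourdomain.com/old-page": 404}
print(find_broken(sample, get_status=sample.get))
```

Calling `find_broken(urls)` without the second argument performs real HEAD requests. The returned pairs are the candidates for 301 redirects.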


Author Biography

Mathew


A 14-year industry veteran who specialises in a wide array of online marketing areas such as PPC, SEO, front-end web development, WordPress and Magento development.

Holds Accredited Google Partner and Bing Ads qualifications and a BA (Hons) in Digital Marketing. One half of the Director duo at Kumo.