Thursday, 14 February 2013

Dealing With Google Webmaster Tools Frustrations

If you don't understand the mechanics behind Google Webmaster Tools (GWT, not to be confused with Google's Web Toolkit framework) and page indexing, trying to obtain valid information about your website can be a very frustrating experience, especially if it is a new website. This has even led me to take counter-productive actions in order to work around some of GWT's flaws. This post shares some experience and tips.

First, you need to know that GWT is a very slow tool. It will take days, if not weeks, to produce correct results and information, unless your website is very popular and already well indexed. Secondly, GWT is obviously aggregating information from multiple Google systems. Each system produces its own information, and when you compare it all, it is not always coherent. Some of it is outdated or plainly out of sync.

Understanding The Indexing Process

  1. Crawling - The first step is having Google's bots crawl your page. It is a required step before indexation. Once a page is crawled, the snapshot is stored in Google's cache. It is analyzed later for indexing by another process.
  2. Indexing - Once a page has been crawled, Google may decide to index it or not. You have no direct influence on this process. The delay can vary according to websites. Once indexed, a page is automatically available in search results (says w3d).
  3. Ranking - An indexed page always has a ranking, unless the corresponding website is penalized. In this case, it can be removed from the index.
  4. Caching - It is a service where Google stores copies of your pages. Google confirms it is the cached version of your page which is used for indexing.
There are several reasons why a page may not be indexed, or will have a very low ranking:
  • The page falls under bad SEO practices, which include keyword stuffing, keyword dilution, duplicate content, and low quality content.
  • The page is made unreachable in your robots.txt.
  • There is no URL link to your page and it does not appear in any sitemap known to Google.
For the sake of simplicity, let's call a clean page "a page which does not fall under bad SEO practices, which is not blocked by your robots.txt and whose URL is known to Google bots via internal or external links or a sitemap (i.e., it is reachable)".
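
The robots.txt part of that definition can be checked locally before waiting on Google. What follows is only a minimal sketch: it uses Python's standard urllib.robotparser, the robots.txt content and the example.com URLs are made up for illustration, and Python's parser does not necessarily match Googlebot's own matching rules in every detail (wildcards, for instance).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_blocked(ROBOTS_TXT, "http://www.example.com/private/draft.html"))  # True
print(is_blocked(ROBOTS_TXT, "http://www.example.com/blog/post.html"))      # False
```

To test a live site instead, the same parser can be pointed at the real file with set_url() and read(), at the cost of a network request.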

Is My Page Indexed?

Here is a little procedure to follow:
  • The site: command 
    1. Start by running the site: command against the URL of your page (with and without the www. prefix). If it returns your page, then it is indexed for sure. If not, it does not mean your page has not been indexed or that it won't be indexed soon. The site: command only provides an estimate of indexed pages.
    2. You can use the site: command against the URL of your website to get an estimate of the pages Google has indexed for your site.
  • The cache: command
    1. If the site: command has returned your page, then the cache: command will tell you which version (i.e. snapshot) Google has used (or will soon use) for indexing (or reindexing). Remember there is a delay between crawling/caching and indexing.
    2. If the site: command returned nothing but the cache: command returned a snapshot of your page, it means Google bots have managed to crawl your page. Indexing may or may not happen soon, depending on Google's decision.
    3. If the cache: command still does not return your page after a couple of days or weeks, then it may indicate that you don't have a clean page.
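
Typing these queries by hand for both host variants gets tedious. As a small convenience sketch (the helper name and the example.com URL are my own, not anything Google provides), the following builds the site: and cache: query strings for a page, with and without the www. prefix, ready to paste into the search box:

```python
from urllib.parse import urlsplit


def search_queries(page_url: str) -> list[str]:
    """Build site: and cache: queries for a page, with and without 'www.'."""
    parts = urlsplit(page_url)
    host = parts.netloc
    # Strip a leading "www." so both host variants can be rebuilt below.
    bare = host[4:] if host.startswith("www.") else host
    queries = []
    for variant in (bare, "www." + bare):
        queries.append(f"site:{variant}{parts.path}")
        queries.append(f"cache:{variant}{parts.path}")
    return queries


for query in search_queries("http://www.example.com/blog/post.html"):
    print(query)
```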

What Can I Do About It?

Here is another procedure:
  • No confirmation that your page has been crawled
    1. The first step is to make sure your page's URL is part of a sitemap submitted to Google (possibly using GWT for submission). Don't assume that Google will naturally and quickly find your page for crawling, even if it is backlinked.
    2. Double-check that your page's URL is not blocked by your robots.txt.
    3. Add a link to your sitemap in your robots.txt.
    4. Avoid using GWT's Fetch As Google feature, as Google will penalize excessive use with less frequent visits to your site. It does not accelerate the indexing process. It just notifies Google it should check for new/updated content. Google can be a pasha taking its time.
    5. Always prefer submitting a complete and updated sitemap over using GWT's Fetch As Google feature. You don't need to resubmit a sitemap if its URL is defined in your robots.txt. Search engines revisit robots.txt from time to time.
    6. Take a look at GWT's crawl stats. They will tell you (with a 2-3 day delay) whether Google bots are processing your site.
    7. Double-check that your page is not suffering from bad SEO practices. Such pages can be excluded from the indexing process.
    8. Be patient: it can take days, and sometimes weeks, before Google reacts to your page.
    9. Check GWT's index status page, but never forget it reacts very slowly to changes. If you are in a hurry, you may obtain faster information by running the site: and cache: commands from time to time.
  • Your page is in the cache, but no confirmation of indexation
    1. Double-check that your page is not suffering from bad SEO practices. Such pages can be excluded from Google's index.
    2. If your site contains thousands of pages, Google will often start by indexing only a subset. Typically, it will be those it thinks have a better chance of matching users' search requests. If your page is not part of them, check whether other pages of your site are indexed using your website URL in the site: command.
    3. If, after being patient, your clean page is still not being indexed, then it probably means Google does not find it interesting enough. You need to improve its content first. Next, try to apply more white hat SEO recommendations. Layout design, readability and navigability are often the culprit when content isn't.
  • Your page is in the index, but does not rank well
    1. Double-check that your page is not suffering from bad SEO practices. Such pages can be included in Google's index with a low ranking.
    2. Make sure you are using proper keywords on your page, title and meta description. Perform the traditional white hat SEO optimization tricks. If you got everything right and still don't get traffic, it means users don't find your content interesting or there is too much competition for what you offer.
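
For the sitemap and robots.txt steps above, here is a minimal sketch of what the two files can look like when generated from a list of page URLs. It follows the sitemaps.org format and the standard Sitemap: line in robots.txt; the function names and the example.com URLs are assumptions made for illustration, and a real sitemap may carry more fields (lastmod, for instance) than shown here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap.xml document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_robots_txt(sitemap_url):
    """Return a robots.txt that blocks nothing and points to the sitemap."""
    # An empty Disallow line means no path is blocked for any crawler.
    return f"User-agent: *\nDisallow:\n\nSitemap: {sitemap_url}\n"


# Hypothetical pages, for illustration only.
pages = ["http://www.example.com/", "http://www.example.com/blog/post.html"]
print(build_sitemap(pages))
print(build_robots_txt("http://www.example.com/sitemap.xml"))
```

Regenerating the sitemap whenever pages are added keeps it complete, which is the point of preferring it over Fetch As Google.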

About New Websites & Under Construction

Because of the slowness of GWT and a lack of understanding of its mechanics, I once tried to accelerate the indexing of new websites by first submitting 'under construction' versions, stuffed with relevant keywords. It did not help at all! Not only did Google not index my sites (or only with a very bad ranking), but once I uploaded the final version a couple of weeks later, Google took weeks to (re)index them properly. Google's cache was soooo out of sync...

I have noticed that Google gives extra early exposure to new websites to test their success, before letting them float naturally. It also tries to find out how often your pages are updated. With a new website under construction, not only will you fail this early exposure because there is no valuable content for users, but if weeks pass before you put the first final version of your site online, Google may decide not to come back to your site for weeks too, even if new content is uploaded in the meantime (pretty frustrating). Of course, you can use GWT's Fetch as Google feature, but there is no guarantee it will accelerate the process (at least this is what I observed).

Nowadays, I don't register my websites in GWT prematurely. I wait until a first final version is ready for production. Next, I apply all the relevant white hat SEO tricks. Then, I create a proper sitemap and robots.txt. Finally, after having uploaded everything to production, I register and submit everything to GWT and monitor the indexing process with GWT's crawl stats, together with the site: and cache: commands, until GWT starts to display coherent data. It has eliminated a lot of frustration and teeth grinding!

3 comments:

  1. Jerome, it seems like you are confusing some acronyms here: GWT is a Java-based framework to create RIAs (Rich Internet Applications). It is totally distinct from the service provided at Google.com/webmasters, though both come from Google.

  2. I have used the GWT acronym to avoid having to write "Google Webmaster Tools" over and over. There is no confusion here.

  3. @technotes but in a recent discussion on the webmaster forum, John Mueller said there is no penalty for it.
