Google Analytics Sampled Data


Chapter 14

Data sampling can be a major issue. Google Analytics shows sampled data in its reports when certain criteria are met.

If a Google Analytics property collects more data than a single property's data size limit allows, the property starts showing sampled data in its reports.

How does data sampling happen?

Data sampling happens when:

  • More than 50,000 unique rows daily from pre-aggregated data show up in one of your reports.
  • More than 500,000 sessions from non-aggregated data are used to compile a report.

When data sampling happens, your reports start losing accuracy in detailed data and Google Analytics may display a message telling you the report is based on sampled data, such as:

“This report is based on 100,000 sessions (10.00% of sessions).”

How can data sampling cause issues?

For example, suppose there were 1,000,000 sessions in your selected date range. Google Analytics takes 100,000 sessions (10.00% of sessions) to calculate your report metrics and then multiplies the results by 10 to estimate the totals.

Now assume Google Analytics has recorded 10,000 sessions for a particular landing page URL out of the 1,000,000 total sessions, which is 1% of all sessions. With 10% sampling, Google Analytics randomly selects 100,000 of the 1,000,000 sessions. If only 800 of the selected sessions happen to belong to this landing page, Google Analytics multiplies 800 by 10 and reports 8,000 sessions for the landing page, even though it actually recorded 10,000.
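The sample-and-scale arithmetic above can be sketched as a small simulation. This is only an illustration: the function name, the session IDs, and the use of uniform random sampling are assumptions made for the sketch, and the actual sampling algorithm used by Google Analytics is not described in this chapter.

```python
import random

TOTAL_SESSIONS = 1_000_000        # sessions in the selected date range
LANDING_PAGE_SESSIONS = 10_000    # sessions that started on the landing page (1%)
SAMPLE_SIZE = 100_000             # 10% sample used to build the report


def estimate_landing_page_sessions(seed):
    """Return the landing page session count a sampled report would show."""
    rng = random.Random(seed)
    # Assumption: session IDs 0..9,999 stand for the landing page's sessions.
    sample = rng.sample(range(TOTAL_SESSIONS), SAMPLE_SIZE)
    hits = sum(1 for session_id in sample if session_id < LANDING_PAGE_SESSIONS)
    # Scale the sampled count back up to the full population (here, times 10).
    scale = TOTAL_SESSIONS // SAMPLE_SIZE
    return hits * scale


# Different random samples give different estimates of the same true value.
for seed in range(3):
    print(seed, estimate_landing_page_sessions(seed))
```

Each run draws a different sample, so the reported figure moves around the true value of 10,000 instead of matching it exactly. The smaller a page's share of all sessions, the larger this relative error tends to be.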

How can data accuracy be improved on sampled data?

In the Google Analytics reports, you have the option to either increase the sample size for improved accuracy, or decrease the sample size for improved report processing speed. All you have to do is to toggle a slider switch.

If you increase the sample size, your report will be calculated from a larger sample of sessions. For example:

“This report is based on 200,000 sessions (20.00% of sessions).”

Is the data sampling issue finally resolved?

With the free version of Google Analytics, sampled data cannot be fully avoided. It can only be minimized.

By toggling the slider switch to increase the sample size, you may improve the accuracy of your reports. However, Google Analytics limits how large the sample can be, so you cannot completely get rid of sampled data in your reports.

What can be done to reduce sampled data?

You want to limit the amount of sampled data showing in your Google Analytics reports.

One way is to reduce the number of unique URLs by:

  • Consolidating URLs by converting them to lowercase.
  • Consolidating URLs by using “Exclude URL Query Parameters”.
  • Using only one URL version for a particular web page.

Consolidate URLs by converting them into all lowercase

Consider the two URLs below.

m.example.com/Hotel/List/Shanghai-Hotels/
m.example.com/hotel/list/shanghai-hotels/

One of the URLs contains capital letters, while the other is entirely in lowercase. The web browser displays an identical page whether you enter the first URL or the second URL.

However, Google Analytics thinks they are two separate URLs. In Google Analytics, they will show up as two separate rows in reports.

This takes up an additional, unnecessary row in your reports. When you have many unnecessary rows in your reports, your Google Analytics property will quickly reach the daily upper limit of 50,000 unique rows of pre-aggregated data, and sampled data will soon appear in your reports.

You can merge them into a single row in Google Analytics reports by using a filter.

View -> Filters -> Add Filter -> Create New Filter
  • Enter “Lowercase URLs” into the Filter Name field.
  • Select the Custom tab, choose Lowercase as the Filter Type, and choose Request URL as the Filter Field.
  • Click Save.

The filter converts any capital letters in all URLs into lowercase.

m.example.com/hotel/list/shanghai-hotels/

Going forward in your Google Analytics reports, you will only see one version of the above URL and it will all be in lowercase.
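The effect of the lowercase filter on report rows can be sketched outside Google Analytics. The snippet below is a rough illustration, not the filter's implementation: the function name and the use of a counter to stand in for report rows are assumptions.

```python
from collections import Counter


def lowercase_request_url(url):
    """Mimic the Lowercase filter: fold every letter in the URL to lowercase."""
    return url.lower()


# Two hits on the same page, recorded with different letter casing.
hits = [
    "m.example.com/Hotel/List/Shanghai-Hotels/",
    "m.example.com/hotel/list/shanghai-hotels/",
]

# Each distinct key stands for one row in a report.
rows_before = Counter(hits)
rows_after = Counter(lowercase_request_url(url) for url in hits)

print(len(rows_before), "rows before the filter")
print(len(rows_after), "row after the filter")
```

Without the filter the two hits occupy two rows. With it they collapse into a single row that carries both hits.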

Consolidate URLs that actually are very similar pages

Consider a hotel booking website that has a page displaying a list of 15 hotels in Shanghai, at the following URL:

m.example.com/hotel/list/shanghai

A hotel booking business typically displays the hotels that are available for a particular check-in date and check-out date. To achieve this, many hotel booking websites append parameters and values to the URLs.

m.example.com/hotel/list/shanghai?check-in-date=2015-11-01&check-out-date=2015-11-03
m.example.com/hotel/list/shanghai?check-in-date=2015-11-05&check-out-date=2015-11-06

The two pages with different check-in and check-out dates may list slightly different hotels, but they are essentially the same page. With dates as values in your URLs, you can easily end up with a practically unlimited number of URLs. In many cases, it is easier to treat the base URL and both dated URLs as the same page and have all three reported as one single URL:

m.example.com/hotel/list/shanghai

Under your Google Analytics property, go to:

View -> View Settings
  • In the Exclude URL Query Parameters field, enter the name of the parameter to be excluded. If more than one parameter needs to be excluded, enter all the parameter names separated by commas.
  • Note that you should not enter question marks (?), ampersands (&), equals signs ( = ), or any other symbols or delimiters into the Exclude URL Query Parameters field.
  • Now click Save.

In the above case, merging the URLs reduces the number of unique URLs that appear in your Google Analytics reports. This reduces the data size and gives your data more room before it runs into the data sampling issue.
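What excluding query parameters does to a URL can be approximated in a few lines. This is a minimal sketch under stated assumptions: the function name and the simple string splitting are illustrative and are not how Google Analytics implements the setting.

```python
def exclude_query_parameters(url, excluded):
    """Drop the named query parameters and keep the rest of the URL as-is."""
    path, _, query = url.partition("?")
    kept = [
        pair
        for pair in query.split("&")
        if pair and pair.split("=", 1)[0] not in excluded
    ]
    # Only re-attach the "?" when some parameters remain.
    return path + ("?" + "&".join(kept) if kept else "")


# Parameter names are listed without "?", "&" or "=", as in the settings field.
excluded = {"check-in-date", "check-out-date"}

urls = [
    "m.example.com/hotel/list/shanghai?check-in-date=2015-11-01&check-out-date=2015-11-03",
    "m.example.com/hotel/list/shanghai?check-in-date=2015-11-05&check-out-date=2015-11-06",
]

for url in urls:
    print(exclude_query_parameters(url, excluded))
```

Both dated URLs reduce to m.example.com/hotel/list/shanghai, while any parameter that is not in the excluded set stays in the URL.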

Consolidate URLs that actually mean the same page

Consider the case where your website uses multiple URLs for its home page.

m.example.com/
m.example.com/index.aspx
m.example.com/default.html

In practice, you should not use multiple URLs for a single home page.

  • The different URL versions can be confusing to your users.
  • In your Google Analytics reports, you always end up with three rows reporting your home page's metrics. These unnecessary rows push your reports toward the row limit more quickly than they otherwise would.
  • You have to add up your home page numbers across rows, which is unnecessarily clumsy.

To resolve this, under your Google Analytics property, go to:

View -> Filters -> Add Filter -> Create New Filter
  • Enter “Remove Index and Default” into the Filter Name field.
  • Select the Custom tab, choose Search and Replace as the Filter Type, and choose Request URL as the Filter Field.
  • For the search string, enter (index|default)\.(aspx|html)
  • For the replace string, leave it blank.
  • Click Save.

Going forward in your Google Analytics reports, you will end up seeing only one version of URL:

m.example.com/
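The search-and-replace step can be tried out with the same regular expression before creating the filter. The sketch below uses Python's re module as a stand-in for the filter's regex engine, which is an assumption; the function name is also illustrative.

```python
import re

# Same search string as in the filter above.
INDEX_OR_DEFAULT = re.compile(r"(index|default)\.(aspx|html)")


def remove_index_and_default(url):
    """Replace the matched file name with an empty string."""
    return INDEX_OR_DEFAULT.sub("", url)


home_page_urls = [
    "m.example.com/",
    "m.example.com/index.aspx",
    "m.example.com/default.html",
]

for url in home_page_urls:
    print(remove_index_and_default(url))
```

All three inputs come out as m.example.com/. Note that the pattern is not anchored, so it also strips these file names from URLs in subdirectories, which may or may not be what you want.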

Another case with a hotel booking website involves using multiple URL versions of the same page. Consider the list page with 15 hotels in Shanghai. There are two typical ways to represent the same page.

Static URL: http://m.example.com/hotel/list/shanghai
Dynamic URL: http://m.example.com/hotel/list?city=shanghai

In your Google Analytics reports, you only need one of them to appear, and the better option is the first URL (i.e. the Static URL).

Under your Google Analytics property, go to:

View -> Filters -> Add Filter -> Create New Filter
  • Enter a descriptive name such as “Rewrite City Parameter” into the Filter Name field.
  • Select the Custom tab, choose Search and Replace as the Filter Type, and choose Request URL as the Filter Field.
  • For the search string, enter \?city\=
  • For the replace string, enter \/
  • Click Save.

Going forward in your Google Analytics reports, you will end up seeing only one version of URL:

m.example.com/hotel/list/shanghai
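The rewrite from the dynamic URL to the static URL can be checked with the same search string. As before, Python's re module stands in for the filter's regex engine, and the function name is an assumption made for this sketch.

```python
import re


def rewrite_city_parameter(url):
    """Turn '?city=' into a path separator so the dynamic URL matches the static one."""
    # Same search string as in the filter; the replacement is a plain slash.
    return re.sub(r"\?city\=", "/", url)


static_url = "http://m.example.com/hotel/list/shanghai"
dynamic_url = "http://m.example.com/hotel/list?city=shanghai"

print(rewrite_city_parameter(dynamic_url))
print(rewrite_city_parameter(static_url))
```

The dynamic URL is rewritten to the static form, and the static URL passes through unchanged, so both end up in the same report row.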

If data sampling is a long-term problem for your website's data collection, reporting and analysis, then consider upgrading to the paid version, Google Analytics Premium.

Bad URL consolidation examples

If you consolidate URLs that should not have been consolidated, you will lose data granularity. Examples of bad choices:

m.example.com/hotel/list/shanghai
m.example.com/hotel/list/shanghai?district=xuhui&brand=hanting
m.example.com/hotel/list/shanghai?district=changning&brand=jinjiang
m.example.com/hotel/list/shanghai?district=baoshan&brand=hanting

These URLs represent different districts and hotel brands, so it makes no sense to consolidate them into one single URL.



Gordon Choi's Analytics Book has been available since August 2016.







Content on Gordon Choi's Analytics Book is licensed under the CC Attribution-Noncommercial 4.0 International license.
