Chapter 19 – Web Server Log Analytics
Web server log files are not normally the main data source for analytics reports. They can, however, fill gaps that web analytics tools leave.
What’s in a Typical Web Server Log File?
The advantage of web server log file data is that it requires no tracking code to be installed beforehand. Once your website’s web server goes live, it automatically starts recording data.
- When a user visits a page on your website, your web server logs one record (one line).
- If the page they visit contains an image, the request for that image is logged as another line.
In short, every file that a user’s visit causes to load is recorded in the log file as its own line.
Below is a typical log file record in which a user (with IP address 192.168.22.10) successfully requested your website’s homepage ( / ), as shown by HTTP status 200. The referrer (the traffic source) is www.google.com, and the user was browsing with Firefox.
192.168.22.10 - - [21/Nov/2003:11:17:55 -0400] "GET / HTTP/1.1" 200 10801 "http://www.google.com/search?q=china+seo&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"
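To make the fields in such a record concrete, here is a minimal sketch that splits one line into named parts. It assumes the server writes the common "combined" log format (the layout of the record above); the names `LOG_PATTERN` and `parse_line` are illustrative, and the sample line is an abbreviated version of the record above.

```python
import re

# Assumption: the server uses the "combined" log format, i.e.
# ip ident user [time] "request" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # The request field looks like 'GET / HTTP/1.1'; pull out method and path.
    parts = record["request"].split()
    if len(parts) >= 2:
        record["method"], record["path"] = parts[0], parts[1]
    else:
        record["method"], record["path"] = None, None
    return record

# Abbreviated version of the record shown above (referrer and user agent shortened).
sample = ('192.168.22.10 - - [21/Nov/2003:11:17:55 -0400] "GET / HTTP/1.1" 200 10801 '
          '"http://www.google.com/search?q=china+seo" "Mozilla/5.0 (Windows; U) Firefox"')
record = parse_line(sample)
print(record["ip"], record["path"], record["status"], record["referrer"])
```

Lines that do not follow the assumed format return `None`, so a report can count and skip them rather than fail.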
Issues of Web Server Log Analytics
Log file data also has disadvantages, chiefly its size.
A website with 100,000 daily sessions may generate more than 30 gigabytes of unprocessed log data per day. That easily adds up to almost 1 terabyte of raw data per month (or about 12 terabytes per year). Processing that much raw data into human-readable reports every day can be difficult and time-consuming. Storing the raw data (and the processed data) also takes up a large amount of disk space.
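One common way to cope with files of this size is to read them one line at a time instead of loading a whole file into memory. The sketch below is only an illustration: the function name `iter_log_lines` is hypothetical, and it assumes logs are plain text or gzip-compressed files whose names end in `.gz`.

```python
import gzip

def iter_log_lines(path):
    """Yield log lines one by one so memory use stays flat for huge files."""
    # Assumption: rotated logs are either plain text or gzip files ending in ".gz".
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")

def count_lines(path):
    """Count records in a log file without holding the file in memory."""
    return sum(1 for _ in iter_log_lines(path))
```

Because `iter_log_lines` is a generator, any report built on top of it (counts per page, per status code, per spider) processes a record and then discards it.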
Search Engine Spider Data in Web Server Log Files
One major advantage of web server log analytics is that log files record visits from search engine spiders. Typical web analytics tools aren’t able to collect this data.
Below is a typical log file record written when a search engine spider (e.g. Googlebot) visits a page on your website (/a.html).
220.127.116.11 - - [21/Nov/2003:04:54:20 -0400] "GET /a.html HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
This part of the line reveals the visit was from Googlebot:
compatible; Googlebot/2.1; +http://www.google.com/bot.html
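As a rough sketch, spotting a spider comes down to looking for a known marker in the user agent string. The marker list and the function name `spider_name` below are illustrative assumptions, not a complete catalogue of spiders.

```python
# Hypothetical list of user agent markers; extend it for the spiders you care about.
SPIDER_MARKERS = ("googlebot", "bingbot", "baiduspider")

def spider_name(user_agent):
    """Return the first known spider marker found in the user agent, else None."""
    lowered = user_agent.lower()
    for marker in SPIDER_MARKERS:
        if marker in lowered:
            return marker
    return None

googlebot_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(spider_name(googlebot_agent))
```

Any client can send this user agent string, so a match is a strong hint rather than proof that the visit came from the real spider.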
What We Can Do with Search Engine Spider Data
When dealing with organic search, the traffic funnel is:
Crawl -> Index -> Ranking -> Traffic
Before a search engine can index and rank your web pages, its spider must first crawl them. Log files show which pages the spider actually requested.
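One way to use this data is to count how often a spider requested each page and to list pages it never requested. The following is a minimal sketch under stated assumptions: the log lines use the combined format, the sample lines and IP addresses are made up for illustration, and `known_pages` stands in for a page list you would take from your own sitemap.

```python
import re
from collections import Counter

# Captures the requested path and the last quoted field (the user agent).
REQUEST_AND_AGENT = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*".*"(?P<agent>[^"]*)"$'
)

def crawl_counts(lines, marker="Googlebot"):
    """Count requests per path made by user agents containing `marker`."""
    counts = Counter()
    for line in lines:
        match = REQUEST_AND_AGENT.search(line.strip())
        if match and marker in match.group("agent"):
            counts[match.group("path")] += 1
    return counts

BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Made-up sample records for illustration only.
sample_lines = [
    f'10.0.0.1 - - [21/Nov/2003:04:54:20 -0400] "GET /a.html HTTP/1.1" 200 11179 "-" "{BOT}"',
    f'10.0.0.1 - - [21/Nov/2003:05:10:02 -0400] "GET /a.html HTTP/1.1" 200 11179 "-" "{BOT}"',
    f'10.0.0.1 - - [21/Nov/2003:05:12:40 -0400] "GET /b.html HTTP/1.1" 200 8342 "-" "{BOT}"',
    '10.0.0.2 - - [21/Nov/2003:06:00:00 -0400] "GET /a.html HTTP/1.1" 200 11179 "-" "Mozilla/5.0 Firefox"',
]

counts = crawl_counts(sample_lines)
known_pages = {"/a.html", "/b.html", "/c.html"}  # hypothetical sitemap contents
never_crawled = known_pages - set(counts)
print(counts.most_common(), never_crawled)
```

Pages that end up in `never_crawled` cannot move further down the funnel, so they are the first candidates to investigate.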
Log File Data Reveals Website’s Issues
Every record in the log files, whether it comes from a user’s visit or from a search engine spider’s visit, shows an HTTP status code. Below are some of the most frequently seen HTTP status codes.
- 200 – OK
- 301 – Moved Permanently
- 302 – Found (moved temporarily)
- 404 – Not found
- 500 – Internal server error
- 503 – Service Unavailable
In the log file, records with HTTP status 200, 301 or 302 generally indicate no issue. Records with status 404, 500 or 503 point to potential issues that require attention.
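A simple report along these lines tallies records per status code and measures how many fall into the problem group. This is a sketch only: the function names are illustrative, the sample lines are made up, and the regular expression assumes the status code directly follows the quoted request, as in the records shown earlier.

```python
import re
from collections import Counter

# Assumption: the status code is the three digits right after the quoted request.
STATUS_AFTER_REQUEST = re.compile(r'" (?P<status>\d{3}) ')
PROBLEM_STATUSES = {404, 500, 503}

def status_summary(lines):
    """Count log records per HTTP status code."""
    counts = Counter()
    for line in lines:
        match = STATUS_AFTER_REQUEST.search(line)
        if match:
            counts[int(match.group("status"))] += 1
    return counts

def problem_share(counts):
    """Fraction of records whose status is 404, 500 or 503."""
    total = sum(counts.values())
    problems = sum(n for status, n in counts.items() if status in PROBLEM_STATUSES)
    return problems / total if total else 0.0

# Made-up sample records for illustration only.
sample_lines = [
    '10.0.0.2 - - [21/Nov/2003:06:00:00 -0400] "GET / HTTP/1.1" 200 10801 "-" "Firefox"',
    '10.0.0.2 - - [21/Nov/2003:06:00:05 -0400] "GET /old.html HTTP/1.1" 404 512 "-" "Firefox"',
    '10.0.0.3 - - [21/Nov/2003:06:01:00 -0400] "GET /a.html HTTP/1.1" 200 11179 "-" "Firefox"',
    '10.0.0.4 - - [21/Nov/2003:06:02:00 -0400] "GET /b.html HTTP/1.1" 503 0 "-" "Firefox"',
]
summary = status_summary(sample_lines)
print(dict(summary), problem_share(summary))
```

Grouping the problem records by requested path afterwards shows which pages need fixing first.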
Gordon Choi’s Analytics Book has been available since August 2016.
Content on Gordon Choi’s Analytics Book is licensed under the CC Attribution-Noncommercial 4.0 International license.