Normally we won’t use web server log file data as the main data source for building analytics reports. But web server log file data can complement what web analytics tools lack.
The advantage of web server log file data is that it requires no tracking installation beforehand. Once your website’s web server goes live and is running, it automatically starts recording data.
Essentially, every file loaded as a result of a user’s visit to your website is recorded as a line in the log file.
Below is a typical log file record in which a user (with IP address 192.168.22.10) successfully visited your website’s homepage ( / ), as shown by http status 200. The traffic source (the referrer) is www.google.com, and the user was on Firefox when visiting the page.
192.168.22.10 - - [21/Nov/2003:11:17:55 -0400] "GET / HTTP/1.1" 200 10801 "http://www.google.com/search?q=china+seo&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"
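A record like this can be split into its fields with a regular expression. Below is a minimal sketch in Python, assuming the common Apache/Nginx combined log format (the sample line is an abridged version of the record above; the field names are my own labels, not part of any standard):

```python
import re

# Regex for the combined log format; field order follows the sample
# record: IP, identd, user, timestamp, request, status, bytes,
# referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('192.168.22.10 - - [21/Nov/2003:11:17:55 -0400] '
        '"GET / HTTP/1.1" 200 10801 '
        '"http://www.google.com/search?q=china+seo" '
        '"Mozilla/5.0 (Windows; U; Windows NT 5.2) Firefox"')

match = LOG_PATTERN.match(line)
if match:
    record = match.groupdict()
    print(record['ip'])        # 192.168.22.10
    print(record['status'])    # 200
    print(record['referrer'])  # the Google search URL
```

Once parsed into named fields, each line can be aggregated by IP, page, status code, or referrer to build reports.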
Log file data also has disadvantages.
A website with 100,000 daily sessions can easily generate more than 30 gigabytes of unprocessed raw log data per day. That adds up to almost 1 terabyte of raw data per month (or roughly 12 terabytes per year). Processing raw data of this size into human-readable reports every day can be a difficult and time-consuming task. Storing the raw data (and the processed data) also takes up a large amount of storage resources (i.e. hard disks).
One major advantage of web server log analytics is that search engine spider visits are recorded in the log files. Typical JavaScript-based web analytics tools cannot collect this data, because spiders generally do not execute the JavaScript tracking code.
Below is a typical log file record of a search engine spider (in this case, Googlebot) visiting a page on your website (/a.html).
22.214.171.124 - - [21/Nov/2003:04:54:20 -0400] "GET /a.html HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
This part of the line reveals the visit was from Googlebot:
compatible; Googlebot/2.1; +http://www.google.com/bot.html
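Separating spider visits from user visits can therefore be done by checking the user-agent field. A small sketch (the function name is my own; note that the user-agent string can be spoofed, so a reverse-DNS lookup on the IP is the stricter verification, which is omitted here):

```python
def is_googlebot(user_agent: str) -> bool:
    """Heuristic check: Googlebot declares itself in the
    user-agent string of each request it makes."""
    return 'Googlebot' in user_agent

ua = ('Mozilla/5.0 (compatible; Googlebot/2.1; '
      '+http://www.google.com/bot.html)')
print(is_googlebot(ua))  # True
```

The same pattern extends to other spiders (e.g. checking for 'bingbot'), giving a per-spider crawl report out of the raw log.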
When dealing with organic search, the traffic funnel is:
Crawl -> Index -> Ranking -> Traffic
Before a search engine can index and rank your web pages, the very first task is to get the search engine’s spiders to crawl them.
In the log files, every record, whether of a user’s visit or of a search engine spider’s visit, includes an http status code. Below are some of the most frequently seen http status codes:

200: OK (the request succeeded)
301: Moved Permanently (permanent redirect)
302: Found (temporary redirect)
404: Not Found
500: Internal Server Error
503: Service Unavailable
In the log file, records that return http status codes in the 200 (success) or 300 (redirect) ranges generally show no issues. Records that return 404, 500, or 503 may indicate issues that require attention.
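This rule of thumb can be expressed directly in code. A sketch that flags the records needing attention, treating all 4xx client errors and 5xx server errors as problems (the sample records and function name are illustrative, not from the original text):

```python
# Status codes starting with 4 or 5 signal client and server errors.
PROBLEM_PREFIXES = ('4', '5')

def needs_attention(status_code: str) -> bool:
    """Flag 4xx and 5xx responses; 2xx and 3xx pass through."""
    return status_code.startswith(PROBLEM_PREFIXES)

# Hypothetical (path, status) pairs extracted from a log file:
records = [
    ('/a.html', '200'),
    ('/old-page', '404'),
    ('/api', '503'),
]
problems = [(path, code) for path, code in records
            if needs_attention(code)]
print(problems)  # [('/old-page', '404'), ('/api', '503')]
```

Running such a filter over each day’s log surfaces broken pages (404) and server failures (500/503) that would otherwise block spiders from crawling your content.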
Content on Gordon Choi’s Analytics Book is licensed under the CC Attribution-Noncommercial 4.0 International license.