Wednesday, March 17, 2010

Creating a Variable Definitions Document for Web Analytics

Implementing an analytics program can eventually involve many dimensions of data that need to be tracked, and many groups of people who work with that data. This article outlines how to create a Data Definitions document. The document helps everyone understand what information is available, what it means, and which variable holds it. It also provides a common set of terms and names so everyone shares the same understanding. The document will save hours of time and effort and prevent a great deal of misunderstanding.

The information collected from a Web site for business analysis is more than a random set of values here and there. The data lives within a defined information architecture and is highly structured. The Data Definitions document lists all the metrics collected about your site, along with information about that data. That information is called metadata: information about the data. This document may be owned by Marketing, IT, the Analytics group, the Information Architect, or some other group within the organization. It is particularly important for large organizations with many people and for Web sites that collect large sets of data about their pages.

There are several levels of information within the document that I will discuss. At the highest level, we list the data dimensions that are collected, what they are called (there may be multiple names if multiple systems use them), and what they mean. Then, for each dimension (also known as a variable), you will want to describe what type of data is and is not contained in the variable, how it is intended to be used, what the values are expected to look like, how the information is created, and who decides the values that will be used.


The first step in creating your document is to list all the variables you collect about your site. A variable is simply a term for a container (the dimension) into which data values are passed. Below are a few examples for a Media site. These are the “Business Names” for the variables, and they should be short and plainly descriptive. (In other words, no marketing buzzwords.)

Business Name
Page Name
Site Name
Content Hierarchy
Page Type
Module + Link
Sponsor ID

Collecting this list should be a relatively straightforward task. The list of variables will be in your tracking system, for example Omniture, Unica, or your own proprietary application. If you do not already know what they are, your Analytics Product Manager or your IT department should be able to provide them, even if they don’t yet have sensible Business Names.

Variable System Names

The variable names that your systems know and that your Development staff knows may not be the Business names. To make sure everyone knows what is being talked about, these system names should be included for each variable. This provides an essential mapping across various systems and individuals (think of it as a kind of Rosetta Stone). This information will come from your System Managers and Developers.

In the example below, the first column is Omniture based, the second is for internal scripting, and the third is for the CMS (Content Management System). The systems you use will likely be different, but you get the concept.

Omniture Field Name | De Field | CMS | Business Name
Account | s_cid | | Account
pageName | s_pn | Documentum: Page Name (de_w_nm) | Page Name
pageType | s_er | | Error Page Indicator
channel | s_chn | | Undefined
Hier1 | s_hier1 | | Content Hierarchy
Prop1 | s_site | | Site Name
Prop2 | s_subject (s_cn is a deprecated value.) | Documentum: Primary Subject Code (de_r_prm_id) | Subject
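To make the Rosetta Stone idea concrete, here is a minimal Python sketch that encodes the mapping rows as a lookup table, so a name from any one system can be translated to its Business Name. The dictionary layout and the helper name business_name_for are hypothetical illustrations; only the field names themselves come from the table.

```python
from typing import Optional

# Hypothetical encoding of the mapping table: one row per variable, one key
# per system. None marks a system that has no name for that variable.
VARIABLE_MAP = [
    {"omniture": "Account", "script": "s_cid", "cms": None, "business": "Account"},
    {"omniture": "pageName", "script": "s_pn", "cms": "de_w_nm", "business": "Page Name"},
    {"omniture": "pageType", "script": "s_er", "cms": None, "business": "Error Page Indicator"},
    {"omniture": "Hier1", "script": "s_hier1", "cms": None, "business": "Content Hierarchy"},
    {"omniture": "Prop1", "script": "s_site", "cms": None, "business": "Site Name"},
    {"omniture": "Prop2", "script": "s_subject", "cms": "de_r_prm_id", "business": "Subject"},
]


def business_name_for(system_name: str) -> Optional[str]:
    """Translate a name from any system column into its Business Name."""
    if not system_name:
        return None
    for row in VARIABLE_MAP:
        # Check the name against every system column in this row.
        if system_name in (row["omniture"], row["script"], row["cms"]):
            return row["business"]
    return None  # not in the document yet


print(business_name_for("s_pn"))         # Page Name
print(business_name_for("de_r_prm_id"))  # Subject
```

A lookup like this is only as good as the document behind it: a None result for a name someone is actually using is a sign the mapping needs a new row.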


The heart of this document is information about what the data means. This goes a long way toward making sure everyone within the organization is on the same page about what is being tracked and why. More specifically, each variable should have the following information:

  1. What is it?

    Provide a brief description of the type of information the variable is supposed to contain. You should be clear and concise. Make sure you are not sacrificing clarity for trivia that adds no real value. Direct and succinct declarative sentences are almost always better.

  2. How is it used?

    This is a high level description of how the Business will use the information, the reason for all the effort involved to collect it. Again, you should be clear and concise. If you can’t identify specific action items that will be taken as a result of knowing this information, it may only be data noise and not worth collecting.

  3. The format of the values.

    Sometimes the values follow a specific format. Provide the expected format. This helps Developers understand what is expected and helps Data Consumers understand what to expect.

  4. Example values.

    Provide some example values. This goes a long way toward helping everyone understand the data. Don’t underestimate the clarity a good example can provide.

  5. Data population rules.

    Values are often set when certain conditions exist or there are rules for how the value is determined. Provide a brief description of these rules so everyone has a clear understanding of what is being tracked.

  6. How is the value collected?

    Values can be set by different systems and in different ways or even manually. Indicate where the value comes from and what person or group owns the system.

  7. Who decides the value?

    Identify who or what group decides what a given value will be. It may be a system that determines the specific value or an individual. This will provide the go-to people for questions about the value.

  8. Related documentation.

    Often there is more information in other documentation: information that is too big or tangential for the definitions document but that staff should be aware of. Add a link to that documentation.
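If you keep the definitions in a spreadsheet or a small internal tool rather than a word-processing file, the eight items above map naturally onto one record per variable. The Python sketch below is a hypothetical layout (the class and field names are illustrative, not from any particular tool), filled in with the Publication Source example described later in this article. The missing_items helper flags any required item left blank, which makes gaps easy to find during reviews.

```python
from dataclasses import dataclass, field, fields, replace
from typing import List


@dataclass
class VariableDefinition:
    """One entry in a Data Definitions document (hypothetical field names)."""
    business_name: str
    what_it_is: str            # 1. What is it?
    how_used: str              # 2. How is it used?
    value_format: str          # 3. The format of the values
    example_values: List[str]  # 4. Example values
    population_rules: str      # 5. Data population rules
    collected_by: str          # 6. How is the value collected?
    decided_by: str            # 7. Who decides the value?
    related_docs: List[str] = field(default_factory=list)  # 8. Optional links

    def missing_items(self) -> List[str]:
        """Name every required item that is empty (related_docs is optional)."""
        return [f.name for f in fields(self)
                if f.name != "related_docs" and not getattr(self, f.name)]


pub_source = VariableDefinition(
    business_name="Publication Source",
    what_it_is="Publication company and product line for third-party content.",
    how_used="Editorial tracks content effectiveness and contractual obligations.",
    value_format="[companyName]-[publicationSource]",
    example_values=["timewarner-cookinglight", "mrthstwrt-everydayfood"],
    population_rules="Two Documentum values joined by '-'; 'ntc' if none; lowercased.",
    collected_by="Beacon, from values set in the CMS (Documentum).",
    decided_by="Affiliate group",
)

print(pub_source.missing_items())                             # []
print(replace(pub_source, decided_by="").missing_items())     # ['decided_by']
```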

The various items in this list will likely come from multiple sources within your organization. For example, the Business Managers, Marketing, your Analytics Department, or your Developers.

Here are two examples of descriptions:

Simple Example 1:


Publication Source identifies the publication company and the product line for third-party content. It is used by Editorial to track the effectiveness of content and manage the third-party contractual obligations.

Format: [companyName]-[publicationSource]

Example values: “timewarner-cookinglight”, “mrthstwrt-everydayfood”

It is a concatenation of two values passed from Documentum: Company Name and Publication Source. The values are separated by a hyphen (“-”). If no value is passed, the beacon will pass a value of “ntc”. The beacon converts all values to lowercase.

The values are determined by the Affiliate group and selected in the CMS to be set on the page.
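The population rule for Publication Source is precise enough to express in a few lines. Here is a minimal Python sketch of that rule; it is not the beacon's actual code, and the function name is hypothetical. The article does not say what happens when only one of the two values is missing, so this sketch assumes that either part being empty counts as “no value.”

```python
def publication_source(company_name, pub_source_name):
    """Sketch of the Publication Source rule: [companyName]-[publicationSource].

    Assumption: if either Documentum value is missing or blank, the
    variable gets the default "ntc" instead of a partial value.
    """
    company = (company_name or "").strip()
    source = (pub_source_name or "").strip()
    if not company or not source:
        return "ntc"
    # Join the two values with a hyphen, then lowercase, as the beacon does.
    return f"{company}-{source}".lower()


print(publication_source("TimeWarner", "CookingLight"))  # timewarner-cookinglight
print(publication_source("", "EverydayFood"))            # ntc
```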

Complex Example 2:


Depending on the site, this variable can contain several different values related to Boards, Blogs, or similar Community applications.

Main Site:

Boards: This is the alphanumeric code identifying the message board action taken by the visitor.

Format: command=[command id]

Example Values: “command=view_thread_summary”, “command=view_category_folder”, “command=read_thread&threadid=8964bb9c”

Examples of board actions include Post, Edit, and View. Individual thread views (read_thread) are identified by their thread ID. The other commands are roll-ups across all threads.

The values are passed from the WebCrossing system and the values are system values.
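The rule that only read_thread carries a thread ID is the kind of detail that is easy to misread, so it can help to spell it out. The Python sketch below illustrates the Main Site Boards format under that rule; it is not WebCrossing code, and the function name is hypothetical.

```python
def board_value(command_id, thread_id=None):
    """Build a Main Site Boards value of the form command=[command id].

    Only read_thread is tracked per thread; every other command is a
    roll-up across all threads, so any thread ID passed with it is ignored.
    """
    value = f"command={command_id}"
    if command_id == "read_thread" and thread_id:
        value += f"&threadid={thread_id}"
    return value


print(board_value("read_thread", "8964bb9c"))  # command=read_thread&threadid=8964bb9c
print(board_value("view_category_folder"))     # command=view_category_folder
```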

Blog: The variable contains the name of the blog and is a roll-up of several blog pages. (Note: the individual pages are passed in Page Name.)

Format: blog:[blog-name]

Example values: “blog:all-rabbits”, “blog:life-with-dogs”, “blog:sexual-health-of-livestock”

The values are passed from the WebCrossing system and the values are determined by Editorial staff, specifically the blog moderator.

Pet Health Community: The variable contains the identifier of an individual thread.

Format: blog: ph-[threadID]

Example value: ph-dis-147/3

The values are passed from the WebCrossing system and the thread ID is system determined.

Lifestyles Site:

Boards: This is the code identifying the message board action taken by the visitor.

Format: command=[command id]

Example Values: “command=view_discussion”, “command=view_folder”, “command=reply_to_message”

The command values are unique to this implementation and will not collide with Main Site values. The values are passed from the WebCrossing system and are system determined.

Blog: The variable contains the name of the blog and is a roll-up of several blog pages. (Note: the individual pages are passed in Page Name.)

Format: blog:[Vanity-uri]

Example values: “blog:thedifferential”, “blog:runningbackwards”, “blog:rblakeley”, “blog:worsthomerecipies”

Each time a blog page name is passed into the Page Name variable, pass the Vanity URI into this variable. The vanity URI is the vanity URL without the domain.

The values are passed from the WebCrossing system. The vanity URI is derived by the system from the user-entered blog name.
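“The vanity URL without the domain” is easy to state but leaves small questions open, such as what happens to slashes. Here is a minimal Python sketch using the standard urllib.parse module. It assumes the leading and trailing slashes are trimmed, which matches example values like “blog:thedifferential”; the function name and the example.com domain are hypothetical.

```python
from urllib.parse import urlparse


def blog_value(vanity_url):
    """Build a Lifestyles blog value, blog:[Vanity-uri], from a vanity URL.

    Assumption: the vanity URI is the URL path with the domain removed and
    surrounding slashes trimmed.
    """
    path = urlparse(vanity_url).path  # drops scheme and domain
    return "blog:" + path.strip("/")


print(blog_value("http://lifestyles.example.com/thedifferential"))  # blog:thedifferential
print(blog_value("http://lifestyles.example.com/rblakeley/"))       # blog:rblakeley
```

Writing the edge cases down like this (trailing slashes, query strings, uppercase letters) is exactly the kind of detail worth capturing in the Data population rules entry.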

Putting It Together

Here is how all of the example columns lay out together:

Omniture Field Name | De Field | CMS | Business Name | Definition
Value | Value | Value | Value | Value

This can become a big document. I have one on legal-sized paper that runs 29 pages, printed on both sides. It takes some effort to create, and to maintain as the Business and its tracking needs evolve. But you can see how useful it is to have a Data Definitions reference document, and it is well worth the effort. If your organization does not already have one, you should create one or have one created.

Lastly, once the document is created, be sure to put it somewhere on your network where everyone can find and access it.