
Presenting data in the proper context to your organization 

When aggregating data for an organization, it is important to present that data in its proper context.  To explain this concept further, imagine that we are asked to aggregate some customer buying habits for a marketing group within our organization.

The buying pattern of a single customer of our business is unlikely to be useful for marketing purposes.  The information becomes useful only when it is aggregated to a higher level, which removes the context associated with the individual customers.  The problem is that the removed context may be necessary to understand the customer data.  If enough of the aggregated customer data suffers from this issue, the aggregated information may lead to false conclusions.

To counteract the loss of context that necessarily comes with aggregation, it is important to fully analyze a subset of the raw customer data and determine all reasonable aggregations that may be useful.  With a sufficient amount of raw data, the number of possible aggregations is almost limitless.  The set of possible questions can be limited by focusing on questions that will be useful for the organization.  Business context is maintained because the data is then generated with the appropriate context.

Another way to maintain context is to make sure that the questions asked stay within a reasonable subset of that context.  One useful technique is to look for questions that counter, oppose, or generalize the information that you are trying to generate.  The following table lists a sample initial question and other possible related questions.


Question Type | Sample Question
------------- | ---------------
Initial question | How many customers bought X after buying Y?
Direct counter question | How many customers did not buy X after buying Y?
Opposing trend question | How many customers bought Y after buying X?
Generalizing the initial question | How many customers who are similar to the customers who bought X or Y did not buy some subset of X and Y?
Counter to generalizing the initial question | How many customers are not like the customers who bought X, Y, or both?

Instead of presenting the marketing group with a single set of data describing a single set of customers (customers who bought X after buying Y), the extended questions can also be answered from the processed data, which preserves significantly more context from the original data and gives a broader view of the situation.  Maybe product X drove people to buy product Y instead of the assumed Y driving X sales, maybe a large set of customers buying X did not know about Y, or maybe almost everyone who bought X also bought Y, so advertising X to customers may drive Y sales.  The raw data used to answer the initial question can also be used to answer the other questions, but this contextual information is lost if only a single question is asked.
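As a rough illustration of answering several related questions from the same raw data, here is a minimal Python sketch.  The purchase log, the customer names, and the function names are all hypothetical, and "bought X after buying Y" is simplified to mean that the customer's first purchase of X came later than their first purchase of Y.  A real data set would likely need a more careful definition.

```python
from collections import defaultdict

# Hypothetical raw purchase log: (customer, product, day of purchase).
purchases = [
    ("alice", "Y", 1), ("alice", "X", 5),
    ("bob", "Y", 2),
    ("carol", "X", 3), ("carol", "Y", 9),
    ("dave", "X", 4),
    ("erin", "Z", 6),
]


def first_purchase_days(records):
    """Map each customer to the earliest day they bought each product."""
    first = defaultdict(dict)
    for customer, product, day in records:
        earlier = first[customer].get(product)
        if earlier is None or day < earlier:
            first[customer][product] = day
    return first


def answer_related_questions(records, x="X", y="Y"):
    """Answer the initial question and some related questions together."""
    counts = {
        "bought_x_after_y": 0,       # initial question
        "did_not_buy_x_after_y": 0,  # direct counter question
        "bought_y_after_x": 0,       # opposing trend question
        "bought_neither": 0,         # customers unlike the X or Y buyers
    }
    for products in first_purchase_days(records).values():
        day_x, day_y = products.get(x), products.get(y)
        if day_x is None and day_y is None:
            counts["bought_neither"] += 1
        if day_y is not None:
            # Every Y buyer lands in exactly one of these two buckets.
            if day_x is not None and day_x > day_y:
                counts["bought_x_after_y"] += 1
            else:
                counts["did_not_buy_x_after_y"] += 1
        if day_x is not None and day_y is not None and day_y > day_x:
            counts["bought_y_after_x"] += 1
    return counts


if __name__ == "__main__":
    for question, count in answer_related_questions(purchases).items():
        print(f"{question}: {count}")
```

With this made-up log, the initial question alone reports one customer.  The related counts show that two of the three customers who bought Y did not go on to buy X, and that one customer bought the products in the opposite order, which is the kind of surrounding context a single number hides.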

Every reasonable effort should be made when processing data to maintain the context.  IT groups often strive to give only the exact answer to the question asked, which can lead the organization to incorrect conclusions.  Giving the exact answer to the question is still valuable, but make sure to also give the surrounding context.
