Sunday, 21 February 2010

Why Analytics Don't Always Work for Companies - Applied vs. Theoretical Statistics

There are many controversial topics actively discussed among business analysts who follow divergent schools of thought.  The most common schools of thought can be categorized into two groups:  the first group being the theoretical statisticians, and the second being represented by those individuals who embrace "applied" statistics.

Generally, the theoretical statisticians apply what they've learned in an academic setting, and follow the "laws" set forth by their institutions.  On the other end of the spectrum, the applied statisticians rely heavily on market testing and key performance indicators (e.g., financial impact) to determine their own set of experientially-based statistical methods and axioms.

Neither school is inherently good or bad.  All seasoned analytic managers have met new analysts who come straight out of school with misconceptions about the value and place of various mathematical procedures and rules.  We've also all faced analysts with significant career experience who carry their academic statistical theory with them as an unchanging edict, despite the limited (or detrimental) applicability of some of these doctrines in the marketplace.  Similarly, we've all encountered business-focused "applied statisticians" whose lack of adherence to theory has resulted in unstable strategic analytic products that look great on paper but fail in practice.

Of all the points of conflict between theoretical and applied statisticians, one of the most heated relates to the utility of measuring colinearity in predictive modeling.  In predictive modeling, colinearity is the degree to which two or more independent variables are correlated with one another, so that they carry overlapping information about the same dependent variable.
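As a rough illustration of how colinearity between two candidate predictors is often measured, the sketch below computes their pairwise correlation and the corresponding variance inflation factor (VIF).  The variable names (income, spend, tenure) and the simulated data are assumptions made up for this example, and the shortcut VIF = 1 / (1 - r^2) only holds when there are exactly two predictors.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for the two-predictor case only: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical, simulated predictors (not taken from the article).
rng = random.Random(0)
income = [rng.gauss(50, 10) for _ in range(500)]
spend = [0.9 * v + rng.gauss(0, 3) for v in income]  # built to track income closely
tenure = [rng.gauss(5, 2) for _ in range(500)]       # generated independently of income

vif_collinear = vif_two_predictors(income, spend)
vif_independent = vif_two_predictors(income, tenure)
print(f"VIF income vs spend:  {vif_collinear:.1f}")
print(f"VIF income vs tenure: {vif_independent:.2f}")
```

A VIF near 1 suggests the two predictors carry largely separate information, while a large VIF suggests they overlap heavily.  Any cutoff for "too large" is a judgment call rather than a fixed rule.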

The theoretical statistician will argue that intensively managing colinearity is of great importance in building predictive models.  The arguments cited to support this position include that, if colinearity isn't removed:

1. We cannot clearly explain the value of each independent variable in the model's predictive algorithm

2. We are endorsing a final product that may not conform to the standard mathematical preference for a parsimonious solution

3. Parameter estimates might be unstable from sample to sample (or from validation to marketplace execution)
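The third argument, instability from sample to sample, can be seen in a small simulation.  This is only a sketch under stated assumptions: a linear model with two standard-normal predictors fit by ordinary least squares, a true coefficient of 1 on each, and a correlation rho that we set ourselves.  None of these choices come from the article, and the helper names are hypothetical.

```python
import random
from statistics import pstdev


def fit_two_predictor_ols(x1, x2, y):
    """Return (b1, b2) from least squares of y on x1 and x2 with an intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def coefficient_spread(rho, n_rows, n_samples, rng):
    """Standard deviation of the x1 coefficient across freshly drawn samples."""
    estimates = []
    for _ in range(n_samples):
        x1 = [rng.gauss(0, 1) for _ in range(n_rows)]
        # x2 is constructed to have correlation of about rho with x1.
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        b1, _ = fit_two_predictor_ols(x1, x2, y)
        estimates.append(b1)
    return pstdev(estimates)


rng = random.Random(42)
spread_independent = coefficient_spread(rho=0.0, n_rows=100, n_samples=200, rng=rng)
spread_collinear = coefficient_spread(rho=0.95, n_rows=100, n_samples=200, rng=rng)
print(f"spread of b1, uncorrelated predictors: {spread_independent:.3f}")
print(f"spread of b1, rho = 0.95:              {spread_collinear:.3f}")
```

With the same number of rows per sample, the estimate of the same coefficient should wander noticeably more from sample to sample when the two predictors are highly correlated.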

The applied statistician will argue that colinearity is not relevant as:

1. We are seeking lift, not explanation.  If the new model makes more money in the marketplace, the ability to explain "why" becomes academic

2. Parameter estimate stability can be enhanced through various exercises during the model build phase
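For readers less familiar with the term, one common way to express "lift" is the response rate among the top-scored customers divided by the overall response rate.  The sketch below assumes a top-decile definition and uses a made-up set of 20 scored customers; the article itself speaks of financial lift and does not prescribe a formula, so treat the function and data as illustrative only.

```python
def top_decile_lift(scores, responses):
    """Response rate in the top 10% of scores divided by the overall response rate."""
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, len(ranked) // 10)  # number of customers in the top decile
    top_rate = sum(resp for _, resp in ranked[:cutoff]) / cutoff
    overall_rate = sum(responses) / len(responses)
    return top_rate / overall_rate


# Hypothetical example: 20 customers, higher score = predicted more likely to respond.
scores = list(range(20, 0, -1))
responses = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

lift = top_decile_lift(scores, responses)
print(f"top-decile lift: {lift:.1f}")  # both top-decile customers responded vs. 5 of 20 overall
```

A lift of 1.0 would mean the model's top decile responds no better than the customer file as a whole.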

The reality is that both sides may be correct, at specific application points, and in specific situations.  We just need to moderate academic rigor with real-world findings in order to uncover when to implement a rule, when to bend it, and when to discard it.  To address each of the five points (above):

Explaining an individual variable's contribution to a multivariate prediction may or may not have relevance.

*If you are in a market research company, this is a key concern.  You will need to let your clients know not only "what will be," but "why."

*If you are in a direct marketing company, explanation may not be relevant.  As an example, if you work for a catalog company, maximum incremental financial lift is far more important than explaining the "percent of predictive value" driven by individual model components.

Ideally, we want a parsimonious solution, as parsimonious models tend to be more stable.  But what if you find that your less parsimonious option (having been tested on multiple out-of-time validation samples) is almost identical in stability?  What if, during those same tests, you find that it produces a far more robust prediction?  In short:

1. Generally, you will want to favor a more parsimonious solution

2. But, if you have a model that is relatively less parsimonious, but already proven stable and robust, there may not be any additional value in reworking the solution for the sake of a mathematical preference

If you are following a model-building strategy that does not manage colinearity but is laser-focused on lift, and you find that your parameter estimates are not stable, a likely cause is inadequate sample size in the build data set.  As a result:

*You can increase your sample size substantially (which will typically eliminate this issue)
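The link between sample size and stability can be made explicit with the textbook expression for the sampling variance of a least squares coefficient.  This assumes a linear regression fit by ordinary least squares, which the article does not specify.  The symbols are introduced here for illustration: n is the build sample size, sigma^2 the error variance, s_j^2 the sample variance of predictor j, and R_j^2 the R-squared from regressing predictor j on the other predictors.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{(n-1)\, s_j^2} \cdot \frac{1}{1 - R_j^2}
```

The second factor is the variance inflation factor.  With R_j^2 = 0.9 it equals 10, so, other things being equal, the build sample would need to be roughly ten times larger to give the same coefficient variance as a model with uncorrelated predictors.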

For most predictive model applications in industry, lift is the goal.  But you need to be aware of the perspective of senior management and clients.  Until they are comfortable with your track record, they may require you to explain the nature, source and quantified relevance of each individual variable in your model...and you'll need to provide this explanation in business terms they can understand.

Managing parameter estimate instability can't always be achieved:

1. The most common way to reduce model instability (caused by collinear variables) is to increase the build and validation sample sizes.  But, for many organizations, there simply isn't enough data to do this effectively (especially for smaller organizations that are not engaged in direct marketing).

2. Another potential cure for parameter estimate instability is to examine each variable and bin it appropriately relative to the dependent variable in question.  Keep in mind, though, that the more coarsely you bin, the more variable information value you give up...and this may end up reducing the overall predictive power of the model.
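The information cost of binning can be sketched with simulated data.  The example below assumes a single continuous predictor with a linear relationship to the outcome, and it swaps in simple equal-frequency bins (each value replaced by its bin mean) in place of bins chosen relative to the dependent variable, purely to keep the illustration short.  The function names and data are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bin_equal_frequency(values, n_bins):
    """Replace each value with the mean of its equal-frequency bin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bin_size = len(values) / n_bins
    members = [[] for _ in range(n_bins)]
    for rank, idx in enumerate(order):
        members[min(int(rank / bin_size), n_bins - 1)].append(idx)
    binned = [0.0] * len(values)
    for group in members:
        if group:
            group_mean = sum(values[i] for i in group) / len(group)
            for i in group:
                binned[i] = group_mean
    return binned


# Simulated predictor and outcome (assumed linear relationship plus noise).
rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [v + rng.gauss(0, 1) for v in x]

r_raw = pearson_r(x, y)
r_ten_bins = pearson_r(bin_equal_frequency(x, 10), y)
r_two_bins = pearson_r(bin_equal_frequency(x, 2), y)
print(f"raw: {r_raw:.3f}  10 bins: {r_ten_bins:.3f}  2 bins: {r_two_bins:.3f}")
```

In this setup, the coarser the bins, the weaker the binned variable's relationship with the outcome tends to be, which is the trade-off described above.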

Overall, the positions held by the "pure" theoretical statistician and the "pure" applied statistician both have strengths and weaknesses that can be demonstrated in actual market testing.  To improve effectiveness, each group needs to move beyond a mastery of one philosophy, and become a pragmatist of both.

Alan Gorenstein can be contacted via
