“Add Analytics” is a Terrible Requirement

I’ve seen a number of projects over the years where a company wanted to add analytics software to their bag of tricks for dealing with their carefully collected data. These days I’m seeing “add analytics” as part of more and more marketing pitches from companies selling some form of data analysis software. When projects get funded or products get bought to do something as mushy and open-ended as “adding analytics,” it’s a good sign that something is wrong before the project has even started.

The phrase “add analytics” usually serves as a placeholder for an unknown concrete requirement. It’s a symptom of a larger problem: the customer feels the need for better data analysis tools but doesn’t understand what’s feasible deeply enough to know what to ask for. They know what’s possible with the spreadsheets they currently use, but the more powerful tools available are so terrible you (almost literally) need a PhD in Computer Science specializing in artificial intelligence to get anywhere with them.

People believe there is something worthwhile in their data but don’t know what it is or how to look for it. They’re also scared that the next time they hunt for the root cause of a problem, they’ll discover the issue had been visible in the data for years but no one knew to look for it. The algorithms needed to solve these problems exist, but they aren’t available in a form remotely usable by a typical analyst.

Once a project does commit to a specific algorithm for a specific type of analysis, a bigger but less obvious problem emerges: the absurd amount of risk. First, the project carries the same likelihood of cost overruns, if not outright failure, as a more straightforward software project. Then there is the very real risk that you’ll look for patterns in data where there are none. The only way to know whether interesting patterns are present in your data is to hook everything up, run the analysis, and look at the results. After all, the whole point of the exercise is to find patterns you can’t see without the aid of a machine, so you can’t start with a little exploratory data analysis and decide whether things look promising enough to proceed.
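
To make that last point concrete, here’s a minimal sketch (assuming numpy and scikit-learn; the setup is my invention, not from any real project) of a label driven by an interaction between two features. Per-feature exploratory checks see nothing in either data set; only running the full analysis separates signal from noise:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((2000, 2))

# Hidden interaction: the label depends on BOTH features jointly (XOR-style).
y_pattern = ((X[:, 0] > 0.5) ^ (X[:, 1] > 0.5)).astype(int)
# Control: a label that is pure noise.
y_noise = rng.integers(0, 2, size=2000)

for name, y in [("interaction", y_pattern), ("noise", y_noise)]:
    # "Exploratory" check: each feature's correlation with the label
    # is near zero in BOTH cases, so a quick look can't tell them apart.
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(2)]
    # Full analysis: only now does the difference show up.
    acc = cross_val_score(
        RandomForestClassifier(n_estimators=50, random_state=0), X, y, cv=5
    ).mean()
    print(f"{name}: max per-feature |corr| ~ {max(corrs):.2f}, "
          f"cross-validated accuracy ~ {acc:.2f}")
```

On the interaction data the per-feature correlations sit near zero while the model scores near 100%; on the noise data both stay flat. Nothing short of running the model tells the two apart.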

Many organizations are also under such pressure to justify the cost of the project that people look for ways to spin the results into something positive no matter what. In academia this phenomenon is called “publication bias”: the pressure for an encouraging result comes from the fact that only positive results are worth publishing. Similarly, let’s call it “delivery bias” when a corporate project must reach a positive conclusion or the participants risk a career setback.

In the earlier days of my career, I actually ran into this several times: I was told that if a predictive model didn’t hit a certain accuracy metric, follow-on funding wouldn’t be available. Guess what? The higher-ups saw the charts that showed we were consistently hitting our target, and the charts that showed there might be a problem didn’t make it into the final report. At the time it seemed like we were setting objective goals and then achieving them, because that’s what professionals do. In hindsight, these incentives were really just encouraging us to torture the data until an arbitrary number popped up on the screen. We actually had a concrete requirement set at the beginning of the project, but it was the wrong requirement, and the project was still doomed from the start.
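
The mechanics of that torture are easy to demonstrate. Here’s a hypothetical sketch (the target number and the choice of train/test splits as the knob to turn are my inventions): on data with no signal at all, quietly re-running an evaluation and keeping the best number will eventually “hit” an arbitrary target:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))      # features: pure noise
y = rng.integers(0, 2, size=200)    # labels: also pure noise
TARGET = 0.60                       # the arbitrary number management wants

best = 0.0
for attempt in range(500):          # almost always succeeds long before 500
    # Each "attempt" is just a different train/test split, the kind of
    # knob that gets quietly re-turned until the chart looks good.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=attempt
    )
    acc = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
    best = max(best, acc)
    if best >= TARGET:
        print(f"'hit' accuracy {best:.2f} on attempt {attempt + 1}, "
              "on purely random labels")
        break
```

Each split’s accuracy is just a coin-flip distribution around 50%, so a 60% target typically falls within a few dozen tries. Swap “random split” for feature selection, hyperparameters, or which chart makes the final report, and you have delivery bias in the wild.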

This doesn’t bode well for the widespread adoption of advanced analytics technology. In the short term, the only way to secure adequate funding is to let poor results be spun into success stories. In the long run, of course, any company that does this will be crushed in the marketplace as the negative effects of misguided decisions accumulate.

There are alternatives to these kinds of projects, of course, but they are unappealing. Customers can buy a Business Intelligence tool that delivers analytics results across the organization through a web interface but only handles simple queries and graphs. They could buy a statistics package and have their analysts use it in much the same way they currently use spreadsheets: load a data set, create some visualizations, and write a report (a workflow sketched below). They could even hire developers and build custom analytics software in-house, but you have to fight hedge funds and Google for the developers who know how to do that, so it’s currently rather expensive.
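
For what it’s worth, that statistics-package workflow really is just the spreadsheet routine in script form. A minimal sketch, assuming pandas and matplotlib; the file name and column names are placeholders, not from any real project:

```python
import pandas as pd
import matplotlib.pyplot as plt

# "sales.csv" and its columns ("date", "revenue") are hypothetical.
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Quick numeric summary for the written report.
print(df.describe())

# Two standard visualizations: a monthly trend and a distribution.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
df.set_index("date")["revenue"].resample("MS").sum().plot(
    ax=ax1, title="Monthly revenue"
)
df["revenue"].plot.hist(bins=30, ax=ax2, title="Revenue distribution")
fig.tight_layout()
fig.savefig("report_figures.png")
```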

In any event, I’m hesitant to make a career of project-based data analysis, either inside a company or as an outside consultant. There’s plenty of demand from companies to improve their data analysis process, but project-based consulting is often miserable given the problems above. Most companies that actually fund data mining projects do so because they want to see what it looks like to try. They want to see how much effort it takes to get the data out of their databases, what the steps are to clean it up, how you select an algorithm, and what concrete results look like (a bare-bones version of that walkthrough is sketched below). For companies that can afford something so speculative, I can certainly walk them through the process. The problem is I can’t claim I’m getting them up to speed on a runway that gets them to a worthwhile destination.
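
For the curious, here’s what the skeleton of such a walkthrough might look like. Every name in it (the database file, the table, the columns) is a hypothetical placeholder, assuming sqlite3, pandas, and scikit-learn:

```python
import sqlite3
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Step 1: get the data out of the database.
conn = sqlite3.connect("warehouse.db")
df = pd.read_sql_query("SELECT * FROM customer_orders", conn)

# Step 2: clean it up (typically most of the effort).
df = df.drop_duplicates()
df = df.dropna(subset=["churned"])   # rows missing the label are useless
features = ["order_count", "avg_order_value", "days_since_last_order"]
X = df[features].fillna(0)           # crude imputation, good enough to start
y = df["churned"].astype(int)

# Step 3: pick a reasonable first algorithm and get one concrete number.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```

None of this is hard to demo. The hard part is knowing whether the destination at the end of that runway is worth anything.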
