
Another Convert To Bigdataism

Tom Davenport, research fellow at the MIT Center for Digital Business, makes a confession on page two of his new book Big Data At Work. He was a big data skeptic. A doubter. In fact, he says, he considered taking his previous books and just doing a global replace of analytics with big data and publishing the result. Davenport says he’s kidding about that last bit, but one of those books was actually called Analytics At Work, so it can’t be too far off the mark.

But this raises a larger question: why? Why would a man who preaches the gospel of analytics for a living doubt big data? Davenport blames technology vendors and their puffery. He says that big data evangelists (full disclosure: that’s what I do for a living) have oversold the idea’s potential.

This may be true. Any industry with winner-take-all economics is constantly tempted to promise too much. Let me pause for a moment of soul-searching.

Ok, done. I am squarely in the big-data-is-a-big-deal camp. But let me make my argument plain so you can judge it for yourself:

1. The scientific method is a good way of dealing with a complicated world. Systematically observing the world, analyzing what you saw, and applying what you learned helps you make decisions and take actions that leave you better off.
2. The cheaper something is, the more of it you get. As the cost to observe the world, analyze the observations, and use what you learned falls, more people will do this more often.
3. Digital technology makes the scientific method orders of magnitude cheaper than it used to be, encouraging its use in orders of magnitude more situations.
4. Therefore, the world will make a digital copy of itself and infuse data-driven decisions and actions into countless daily activities in commercial, civic, and private life. This is a change of such degree that it’s a change in kind, and we call it big data.

There you have it: the big data creed. It’s not about data formats, technologies, or statistics. It’s about a disruptive drop in the time, cost, and effort to know your world empirically.

And Davenport uses another 200 pages to show his orthodoxy.

Because Data Science

In a famous lecture, the great physicist Richard Feynman defined the key to science.

“If it disagrees with experiment, it’s wrong.”

He added that someone’s fame or credentials can’t make a wrong idea right. If it disagrees with experiment, it’s wrong. Because science.

From the earliest days of science, when it was still called natural philosophy, empirical experimentation was the strongest weapon against the attacks of superstition. But there was always the risk of substituting one dogma for another.

Scientists knew that one experiment was almost never enough. Over time they adopted the reproducibility of results and peer review as hedges against bias or mistakes that might produce false results. But even these methods have their shortcomings.

This matters for the reputation of data science. In a data science world, correlations found through experimentation with data are rumored to be as useful as causality. Take the correlation uncovered by Kaggle that in some used car markets orange cars are more reliable than cars of other colors. Why? Because data science. The true cause behind this is unknown, and assumed to be immaterial.

Let’s grant for a minute that the cause really is immaterial, that buying orange used cars leaves you better off so often that knowing why isn’t worth it. The risk is that these successes lead to a blind faith in correlation that fails when sneaky used car dealers take advantage by painting clunkers to offload them. Sure, this will drag down the correlation, indicating you shouldn’t act on it anymore. But you’ll only find out after it’s too late.
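To put rough numbers on that lag, here is a toy sketch in Python. Everything in it is made up for illustration: the market sizes, the reliability shares, and the assumptions that every repainted clunker is unreliable and that the reliability of other colors stays flat. It is not based on the Kaggle data or any real market. It simply counts how many painted clunkers get sold before the observed orange advantage disappears.

```python
def share(reliable, total):
    """Fraction of cars in a group that turned out to be reliable."""
    return reliable / total


def months_until_signal_flips(orange_reliable, orange_total,
                              other_share, clunkers_per_month):
    """Count months (and clunkers sold) before orange stops looking better.

    Hypothetical model: each month dealers sell `clunkers_per_month`
    repainted, unreliable cars, and buyers keep trusting the correlation
    as long as the observed orange share is still above `other_share`.
    """
    months = 0
    clunkers_sold = 0
    while share(orange_reliable, orange_total + clunkers_sold) > other_share:
        months += 1
        clunkers_sold += clunkers_per_month
    return months, clunkers_sold


# Made-up market: 90 of 100 orange cars reliable vs. 70% for other colors,
# with dealers offloading 10 painted clunkers a month.
months, clunkers = months_until_signal_flips(90, 100, 0.70, 10)
print(f"Signal held for {months} months; {clunkers} clunkers sold meanwhile.")
```

In this invented market the correlation keeps saying "buy orange" for three months, and thirty buyers drive home a clunker before the data catches up.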

This is nothing more than a restatement of timeless good advice — use the right tools for the job and don’t believe everything you read. But it bears repeating. Because human frailty.