Because Data Science

In a famous lecture, the great physicist Richard Feynman defined the key to science.

“If it disagrees with experiment, it’s wrong.”

He added that someone’s fame or credentials can’t make a wrong idea right. If it disagrees with experiment, it’s wrong. Because science.

From the earliest days of science, when it was still called natural philosophy, empirical experimentation was the strongest weapon against the attacks of superstition. But there was always the risk of substituting one dogma for another.

Scientists knew that one experiment was almost never enough. Over time they adopted the reproducibility of results and peer review as hedges against bias or mistakes that might produce false results. But even these methods have their shortcomings.

This matters for the reputation of data science. In a data science world, correlations found through experimentation with data are rumored to be as useful as causality. Take the correlation uncovered by Kaggle that in some used car markets orange cars are more reliable than cars of other colors. Why? Because data science. The true cause behind this is unknown, and assumed to be immaterial.

Let’s grant for a minute that the cause really is immaterial, that buying orange used cars leaves you better off so often that knowing why isn’t worth it. The risk is that these successes lead to a blind faith in correlation that fails when sneaky used car dealers take advantage by painting clunkers to offload them. Sure, this will drag down the correlation, indicating you shouldn’t act on it anymore. But you’ll only find out after it’s too late.
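To make the painted-clunker scenario concrete, here is a small back-of-the-envelope sketch in Python. Every number in it is invented for illustration, not taken from the Kaggle data, and the function names are hypothetical. It only assumes that dealers repaint some unreliable non-orange cars orange, and compares the orange "advantage" before and after.

```python
def reliability_rate(reliable, total):
    """Share of cars in a group that turned out to be reliable."""
    return reliable / total


def orange_advantage(orange_reliable, orange_total, other_reliable, other_total):
    """How much more reliable orange cars look than everything else."""
    return (reliability_rate(orange_reliable, orange_total)
            - reliability_rate(other_reliable, other_total))


# Hypothetical market, numbers made up for illustration only.
orange_reliable, orange_total = 90, 100
other_reliable, other_total = 800, 1000

before = orange_advantage(orange_reliable, orange_total,
                          other_reliable, other_total)

# Assumption: dealers repaint 100 unreliable non-orange cars orange.
# The clunkers leave the "other" group and join the orange group;
# no car becomes any more or less reliable.
painted_clunkers = 100
after = orange_advantage(orange_reliable, orange_total + painted_clunkers,
                         other_reliable, other_total - painted_clunkers)

print(f"orange advantage before repainting: {before:+.2f}")
print(f"orange advantage after repainting:  {after:+.2f}")
```

In this made-up market the advantage flips from positive to negative without a single car changing. The catch is that the second number only shows up in the data once buyers have already bought the repainted cars and reported the breakdowns.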

This is nothing more than a restatement of timeless good advice — use the right tools for the job and don’t believe everything you read. But it bears repeating. Because human frailty.