In a recent post, Tim O’Reilly declared data is not the new oil. It’s the new sand. Or the new Oxycontin. Possibly both. This is a clear sign we’ve reached peak data metaphor.
It’s not so much that O’Reilly’s metaphors are wrong or that a different one would be better. It’s that our attempts to understand data in terms of things we already know obscure more than they reveal.
It’s time to recognize that data is unlike anything else in our economic, public, or private lives. To understand data’s pros and cons, we have to think about it on its own merits.
To be fair, sometimes data is like sand: a huge pile of it is highly valuable, but any single piece isn’t worth much. Like browsing history, for example. But sometimes data is like diamonds, each entry a priceless gem. Like your daughter’s browsing history when she was supposed to be in Zoom math class, for instance.
Data can be harmful or helpful like Oxycontin, says O’Reilly, depending on how it’s used. But in the words of the great historian Melvin Kranzberg, “Technology is neither good nor bad; nor is it neutral.” The data we agree can be collected and the uses to which we agree it can be put determine the balance of harm and help it delivers.
But who gets to say what data can be collected and how it can be used? How do we determine the value of a company’s data when value is wholly dependent on use? How do we create overarching rules governing different uses when keeping a dataset proprietary creates a competitive advantage in one case, but sharing it saves lives in others?
These are weighty questions that will take years of debate among activists, businesses, and governments to answer. But to argue well in these crucial discussions, we must see data clearly for what it is.
Data are observations. Each piece of data crystallizes an aspect of a person, place, thing, activity, or event. The accumulation of these observations can produce an extraordinarily detailed mosaic of whatever an observer is watching.
How those observations are used is separate from the observations themselves. Any piece of data can be put to multiple uses, each of which can create different kinds of value.
For example, a diagnostic lab running COVID tests captures test results, allowing each individual to know his or her infection status. The lab may also aggregate anonymized results as part of an overall positivity metric for the county in which those individuals live. In addition, the lab might use those anonymized results to help pharmaceutical companies target their recruitment efforts for vaccine clinical trials at the county level. These three uses require the same core observations.
The very fact that we’re talking about observations implies that there’s an observer and an observed. The two are in a collaboration of sorts. But it’s usually an unequal one. The observer generally has both a greater incentive and more power to decide which observations get captured and how they are used.
Sometimes, the observed don’t care about this power imbalance. Think of sensors on a gas pipeline or sea lions tagged in the wild.
But sometimes the observed care a lot. Think of warehouse workers whose every action is tracked and analyzed to improve productivity. Or of the growing worldwide debate over consumers’ demands for greater visibility into the data that digital services capture about them.
Clarifying these concepts (observation and use, observer and observed) is important because they affect everyone and everything that exists, and every subsequent action that occurs. As a result, digital data is not only the most valuable asset in the world but also the most powerful. How we describe it and think about it matters a great deal if we are to maximize its benefits and minimize its risks. This is why we shouldn’t reduce the discussion to the level of metaphor. We need to discuss data on its own terms.