Category: Big Data

The Services Conundrum

Thomas Piketty’s doorstop, Capital in the Twenty-First Century, is so massive it gave a new name to a classic index of how little of a book people actually read. But it’s really good.

And it includes the observation that, historically, the services sector has seen lower productivity gains than the industrialized goods sector because services tend to be less sensitive to technological advances.

This is a big deal because, according to the U.S. Bureau of Labor Statistics, as cited in Mary Meeker’s latest Internet Trends presentation, service jobs represent 86% of all U.S. jobs, up from 56% a little more than 70 years ago.

What if we’re at the beginning of a long boom in services productivity the way we were at the beginning of a long boom in industrialized goods productivity in the early 1800s?

Happy Bloomsday, Big Data!

Around the world today the literati celebrate Bloomsday by drinking deep of James Joyce’s intoxicating prose. And beer. Lots of literary beer. I remember this day every year because James Joyce taught me to love big data.

Bloomsday commemorates Joyce’s life and his masterwork Ulysses, a massive creation that captures the universal in the specifics of one day: June 16, 1904. We follow the misadventures of Leopold Bloom in a decidedly unheroic odyssey that reveals the frailties of language, the desperation of love, the doggedness of uncertainty, and the sobering realization that you have to go through all this sound and fury no matter how smart you think you are. In Ulysses, Joyce single-handedly invents post-modernism. Take that, Internet.

But the most remarkable thing about the book is the way it combines rigorous schema with riotous mess. This is the very heart of big data.

For each of the 18 chapters, Joyce assigned a title drawn from Homer’s Odyssey (like Calypso), a scene (The House), a time, an organ of the body, an art (economics), a color, a symbol (nymph), and a technique (narrative). Then, in each chapter he follows a relatively simple plot while free-associating across the history of the world.

For example, in chapter three, Proteus, Stephen Dedalus, the main character until we meet Bloom, walks along the seashore after teaching a history class at the school job he loathes. With time on his hands, he daydreams.

He sees two midwives walking, one carrying a bag. He imagines a misbirth, a miscarried baby, in the bag, which makes him think of umbilical cords, which makes him think of a cable stretching back through all generations, which makes him think of making a telephone call to Adam and Eve, using his nickname, Kinch.

“The cords of all link back, strandentwining cable of all flesh… Hello. Kinch here. Put me on to Edenville. Aleph, alpha: nought, nought, one.”

Stephen is playing fast and loose with what’s connected to what in what context. He draws out the attributes and values of the things he sees, finding links in his over-educated mind to other things that possess those same attribute-value pairs or something equivalent. But with each link comes a new surrounding context. Stephen flashes from one otherwise isolated idea to another through unexpected connections.

This is a lot like memory, a lot like the Internet, and a lot like what happens when you pour a bunch of disparate data into Hadoop and start crunching through correlations.

And this is what I see in big data: a riot of observations that can be linked in new ways to show us connections we couldn’t see before. Some will be priceless, some worthless, some spurious and misleading. But the effort required to figure out which is which is worth it. Because we’re going to have to go through all this sound and fury anyway. Might as well try to understand it better.

Playing the Numbers

E3, the biggest video game conference in the world, takes place this week in Los Angeles. In addition to raising questions about why violence and mayhem sell so well, it also offers insight into the datafication of play.

Take Destiny, one of the most expensive games ever produced, if not the most expensive. If you’re not familiar with this kind of thing, Destiny is a sprawling shoot-’em-up that manages to combine the otherwise distinct genres of first-person shooter, multiplayer competition, and collaborative role-play. It’s immersive, visually stunning, and highly addictive.

Created by the powerhouse game studio Bungie, Destiny is a peek into a future where products tell their makers how customers use (and abuse) them.

For example, just days after the launch of an expansion module that pits three-player fireteams against each other, Bungie reported that 3,798,561 of these matches had been played. The players racked up 118,627,301 kills (you can get killed more than once per match). And 299,001 of these folks had achieved perfect scorecards, winning nine matches in a row without a single loss and, by implication, utterly humiliating the opposition.

But these raw tallies are just bragging rights. What’s more interesting is what Bungie observed about players’ behavior, such as cheating. The game makers could see some players bailing out as soon as they saw tough opponents on the other team. Bungie spread the word through its weekly newsletter that this kind of quitting can get you banned from matches if you keep it up. A similar warning went out earlier when Bungie saw that some players were hanging back in certain sections of the game, letting their teammates do all the work but reaping the rewards of victory anyway.

This may sound juvenile and irrelevant. However, this freeloading is a first-person-shooter version of an economic concept called, well, freeloading. Freeloading happens when you get the fruits of someone else’s investment without making the outlay yourself. When shoppers research a product at a retailer’s meticulously well-designed site but then buy from the lowest-priced discounter, the discounter is freeloading on the other retailer’s investment in design, photos, and information.

In Destiny’s case, freeloading is a particular problem because in many cases teamwork is essential to the value of the game. No teamwork, no fun. No fun, no play. No play, no return on one of the most expensive games ever made.

And this is why digital strategists should play video games. The best ones are complex digital worlds that provide a preview of the real world fully digitized. At least, that’s my excuse.

Writer, Interrupted

My achievements as a procrastinauteur seem to know no bounds. On June 1st, I said I would write something on big data every day for the entire month of June. I managed to keep that up for five days and then fell off the wagon for nearly twice that, going on a non-writing bender.

As penance, let me offer the fascinating work of Matthew Jockers, who is using big data to study writers who actually manage to write something. Jockers, a professor of English at the University of Nebraska-Lincoln, created an R package, syuzhet, to analyze the connections between plot and sentiment in 50,000 works of fiction. In the process, he’s creating a new kind of big reading that illuminates individual reading.

Data Is Not A Commodity

A common refrain among the digerati is that data is a commodity but insight is not. Or wisdom, or judgment — insert your favorite word for good thinking.

It’s true that good thinking is in short supply. That’s why there are escape handles in the trunks of rental cars.

And why Napoleon got his hat handed to him at Waterloo.

But it’s not true that data is a commodity. Yes, there’s a lot of it, which often correlates with commoditization. But data lacks a key attribute of actual commodities — fungibility. You can’t substitute one piece of data for another because they carry different information.

For example, suppose I ask you what time Mad Max: Fury Road is playing tonight at your favorite movie theater. Only showtime data will answer that question. The 98% approval rating at Rotten Tomatoes or the 89% at Metacritic won’t do it. In fact, those two can’t even be substituted for each other.

The non-fungibility of data raises the competitive stakes as companies race to digitize and datafy the key value-creating activities in their industries. Because, while there may be a lot of data out there, the vast majority of it is unique. And if you’re the one to capture it, it’s yours, and your rivals would have to turn back time to get it.

There Is No Such Thing As Metadata

The USA Freedom Act, just voted into law, ended the government’s bulk collection of phone call metadata. While people of good conscience can disagree on whether this is a good or bad thing, here’s an uncontestable fact: There is no such thing as metadata.

David Weinberger, Internet polemicist, former joke writer for Woody Allen, and all around fun guy, pointed this out in his great book, Everything Is Miscellaneous. He gave the example of someone looking for plays by Shakespeare.

Imagine you type Shakespeare into Google and get a list back that includes King Lear. You click that, read a bit of the play, and come across the famous quote, “How sharper than a serpent’s tooth it is to have a thankless child!” Now imagine you hear that quote one day but forget where it comes from. You Google “How sharper than a serpent’s tooth” and the search results show you it comes from King Lear by Shakespeare. In the first case, you used the author’s name as metadata for his plays. In the second, you used a line from the play as metadata for the author. Weinberger sums it up this way:

“In the miscellaneous order, the only distinction between metadata and data is that metadata is what you already know and data is what you’re trying to find out.”

To be fair, Weinberger drew the opposite conclusion from mine. Rather than saying there’s no such thing as metadata because the stuff is, itself, data, he said that everything is metadata because every thing points to something else.

The metafact that both of these facts are true makes the case more effectively than either alone.
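Weinberger’s Shakespeare example can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy two-record catalog; the `CATALOG` records and the `find` helper are hypothetical, made up for this sketch, and real search engines work very differently at scale. The point is that the lookup treats every field the same way: whichever field you already know plays the role of “metadata,” and whichever field you want back plays the role of “data.”

```python
# A hypothetical toy catalog: each record is just a flat set of facts about one work.
CATALOG = [
    {"author": "Shakespeare", "title": "King Lear",
     "lines": ["How sharper than a serpent's tooth it is", "To have a thankless child!"]},
    {"author": "Shakespeare", "title": "Hamlet",
     "lines": ["To be, or not to be: that is the question"]},
]


def find(known_field, known_value, wanted_field):
    """Return wanted_field from every record whose known_field matches known_value.

    No field is special: the "metadata" is whatever you pass as known_field,
    and the "data" is whatever you ask for as wanted_field.
    """
    results = []
    for record in CATALOG:
        value = record[known_field]
        if isinstance(value, list):
            # Lines of verse: match a remembered fragment, like a search-box query.
            hit = any(known_value in line for line in value)
        else:
            hit = value == known_value
        if hit:
            results.append(record[wanted_field])
    return results


# Case 1: you know the author and want the plays.
print(find("author", "Shakespeare", "title"))
# Case 2: you know a line and want the author and the play.
print(find("lines", "How sharper than a serpent's tooth", "author"))
print(find("lines", "How sharper than a serpent's tooth", "title"))
```

The same function answers both of Weinberger’s questions; only the choice of which field is “known” changes.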

A Fireside Chat On Data Capital

Jason Pontin, publisher of MIT Technology Review, is a gentleman and a scholar. His Cambridge, MA office is decorated with meticulous stacks of books worth reading from the past few decades. He’s a man of letters in a world of bits. Totally my kind of guy. 

Yesterday at MIT EmTech Digital we sat down for a brief chat about data as a new kind of capital and what that means for competitive strategy. 

This kind of firesiding can be much more engaging than a straight-up presentation. But it requires a gracious and astute host. Jason is most certainly both.