Around the world today the literati celebrate Bloomsday by drinking deep of James Joyce’s intoxicating prose. And beer. Lots of literary beer. I remember this day every year because James Joyce taught me to love big data.
Bloomsday commemorates Joyce’s life and his masterwork Ulysses, a massive creation that captures the universal in the specifics of one day: June 16, 1904. We follow the misadventures of Leopold Bloom in a decidedly unheroic odyssey that reveals the frailties of language, the desperation of love, the doggedness of uncertainty, and the sobering realization that you have to go through all this sound and fury no matter how smart you think you are. In Ulysses, Joyce single-handedly invents post-modernism. Take that, Internet.
But the most remarkable thing about the book is the way it combines rigorous schema with riotous mess. This is the very heart of big data.
For each of the 18 chapters, Joyce assigned a title drawn from Homer’s Odyssey (like Calypso), a scene (The House), a time, an organ of the body, an art (economics), a color, a symbol (nymph), and a technique (narrative). Then, in each chapter he follows a relatively simple plot while free-associating across the history of the world.
For example, in chapter three, Proteus, Stephen Dedalus, the main character until we meet Bloom, walks along the seashore after teaching a history class at the job he loathes. So, he daydreams.
He sees two midwives walking, one carrying a bag. He imagines there’s afterbirth in the bag, which makes him think of umbilical cords, which makes him think of a cable stretching back through all generations, which makes him think of making a telephone call to Adam and Eve, using his nickname, Kinch.
“The cords of all link back, strandentwining cable of all flesh… Hello. Kinch here. Put me on to Edenville. Aleph, alpha: nought, nought, one.”
Stephen is playing fast and loose with what’s connected to what in what context. He draws out the attributes and values of the things he sees, finding links in his over-educated mind to other things that possess those same attribute-value pairs or something equivalent. But with each link comes a new surrounding context. Stephen flashes from one otherwise isolated idea to another through unexpected connections.
This is a lot like memory, a lot like the Internet, and a lot like what happens when you pour a bunch of disparate data into Hadoop and start crunching through correlations.
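Stephen's leap from bag to cord to cable to telephone call can be sketched as a toy program. This is only an illustration, not anything from Joyce or from Hadoop: the observations, the attribute names, and the helper functions build_links and chain are all invented for the example. It links any two observations that share an attribute-value pair, then finds the shortest chain of such links between two otherwise unrelated ideas.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical observations, loosely modeled on Stephen's daydream.
# Every attribute name and value here is made up for illustration.
OBSERVATIONS = {
    "midwives' bag": {"topic": "birth", "place": "seashore"},
    "umbilical cord": {"topic": "birth", "shape": "cable"},
    "telephone line": {"shape": "cable", "use": "long-distance call"},
    "call to Edenville": {"use": "long-distance call", "era": "the beginning"},
}


def build_links(observations):
    """Link every pair of observations that share an attribute-value pair."""
    by_pair = defaultdict(list)
    for name, attrs in observations.items():
        for pair in attrs.items():  # pair is (attribute, value)
            by_pair[pair].append(name)

    links = defaultdict(set)
    for names in by_pair.values():
        for a, b in combinations(names, 2):
            links[a].add(b)
            links[b].add(a)
    return links


def chain(observations, start, goal):
    """Breadth-first search for the shortest chain of links from start to goal."""
    links = build_links(observations)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(links.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of shared attributes connects the two
```

Calling chain(OBSERVATIONS, "midwives' bag", "call to Edenville") walks through the umbilical cord and the telephone line, each hop justified by a single shared pair. Nothing in the code knows whether a link is meaningful or spurious, which is exactly the sorting problem that real data leaves to us.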
And this is what I see in big data: a riot of observations that can be linked in new ways to show us connections we couldn’t see before. Some will be priceless, some worthless, some spurious and misleading. But the effort required to figure out which is which is worth it. Because we’re going to have to go through all this sound and fury anyway. Might as well try to understand it better.