Category: Graphs

Flatworms and Erdös with Bacon

Six_degrees_of_separation_02On June 4th this year we’ll celebrate the 15th anniversary of a famous letter to the journal Nature. Start planning your Small World Day festivities now.

In 1998, Duncan Watts and Steven Strogatz wrote about a nifty little effect in networks. Most networks, including the neurons of a flatworm, mathematical paper citations, and Hollywood are neither regularly nor randomly connected. Instead, they have clusters of tightly-connected nodes, linked together by a few cross-cluster nodes, like Paul Erdös or Kevin Bacon. Duncan and Strogatz called these small world networks.

They weren’t the first to think of this idea. That prize goes to a Hungarian novelist from the 1920s, Frigyes Karinthy, who figured we’re all probably connected somehow. They weren’t the first to test the idea. Stanley Milgram, a Harvard psychologist, did that in 1967 with a chain-letter experiment. And they weren’t the first to give this phenomenon a catchy name. John Guare did that in his 1990 play Six Degrees of Separation.

So what, if anything, did Duncan and Strogatz do?

They demonstrated that small world networks are a reliable feature of just about any network, including the electrical power grid in the western US. Since then this effect has been demonstrated in Twitter and Facebook, where people are connected by an average of 4.67 and 4.74 hops, respectively. This  is closer than the 5.2 hops Milgram found among 296 volunteers but, curiously, not by much.

This seems encouraging at first. The world is more connected. People are .39 to .46 hops closer. Peace is at hand. But Guare pointed out the problem with this kind of connectedness: “Six degrees of separation between me and everyone else on this planet. But to find the right six people.”

It’s not really a small world after all. It’s many small worlds, loosely joined (apologies to David Weinberger).

Herb Simon, a pioneer of what we now call behavioral economics, saw this problem at work in human decision making. For Simon, the factors we weigh when making choices are connected. But not every factor is connected to every other factor. He used buying a car as an example. When you shop for a car, you consider factors like how much you make and how you like to live. But you might not consider others, like whether you might move to a different city where you won’t need a car or the relative merits of spending your money on entertaining friends at dinner versus getting the sport package.

Simon summed it up: “We live in what might be called a nearly empty world– one in which there are millions of variables that in principle could affect each other but that most of the time don’t.” Everything’s connected, but we’re constantly looking for the right set of connections to focus on at the moment. Six degrees of separation. But to find the right six.

Why does this matter? Because most of the analytical technology we use to help us make better decisions assumes we know the right factors ahead of time. We build models, predetermining the factors. We fill the models with conforming data. For things we already understand really well, this works. For things we don’t like derivatives trading, not so much.

We need a new set of analytical tools to help us find the right six degrees of separation between possible choices and their potential outcomes. This is not a question of building better models. It’s a question of better exploration of unmodelled connections.

All The World’s A Graph

walrus network visualization copy 3Last week saw two big announcements about searching graphs. There was Facebook’s graph search which helps you discover, say, married people who like prostitutes. And there was the revelation that people in public DNA projects can be individually identified. These two events are in fact the same thing.

Welcome to the miracle of graphs. Graph is a mathematical term for a network, which is a collection of things (or nodes) with connections (or edges) between them. When we picture a network, like a social network, we tend to think of the nodes being people and the edges being the relationship between them, like friend, relative or colleague.

But that’s not what’s really happening. In graph search the nodes are properties of things and the edges indicate which things have those properties. Say you want to find out which of your friends who grew up in Boston like the NRA. If you had to go to each of their pages, look up their hometown and then, if they’re from Boston, scan their likes, it would take a while. This is what a search would do if it had to look up the people first and then look at their properties. But there’s another way. Start with the properties and let those lead you to the people.

This is what happend when a couple of genetics researchers wondered if they could tie anonymous DNA sequences to the individuals who contributed them. In a paper published in Science, they showed how using free public Internet tools could do just that. The paper describes one of the key steps like this:

“… year of birth and state alone are weak identifiers and searches based on their combination would match at least 60,000 U.S. males in 50% of cases. However, when surname information is added to the search, the median list size shrinks to only 12 males, which are few enough matches to investigate individually.”

This phenomenon is a standard feature of searching a graph of properties. When you start with the properties you care about, you narrow down your target group to only the things that have those properties. You’ve probably used this at an online store. When you use the left-hand navigation at Nike’s online store to get to the men’s running shoes with a barefoot-like ride (15 shoes out of 671 total), that’s graph search.

So, does this matter? Damn straight it does. Our world is full of networks, like financial and supply systems, whose interconnections are not well understood. We need better ways to find “all the non-financial institutions with counterparty risk if Spain defaults on its debts” or “all the semi-conductor manufacturing facilities within 100km of earthquakes greater than 4.0 on the Richter scale in the last ten years.” It’s time for our ability to investigate these networks to match our ability to create them.