Last week saw two big announcements about searching graphs. There was Facebook’s graph search which helps you discover, say, married people who like prostitutes. And there was the revelation that people in public DNA projects can be individually identified. These two events are in fact the same thing.
Welcome to the miracle of graphs. Graph is a mathematical term for a network, which is a collection of things (or nodes) with connections (or edges) between them. When we picture a network, like a social network, we tend to think of the nodes being people and the edges being the relationship between them, like friend, relative or colleague.
But that’s not what’s really happening. In graph search the nodes are properties of things and the edges indicate which things have those properties. Say you want to find out which of your friends who grew up in Boston like the NRA. If you had to go to each of their pages, look up their hometown and then, if they’re from Boston, scan their likes, it would take a while. This is what a search would do if it had to look up the people first and then look at their properties. But there’s another way. Start with the properties and let those lead you to the people.
This is what happend when a couple of genetics researchers wondered if they could tie anonymous DNA sequences to the individuals who contributed them. In a paper published in Science, they showed how using free public Internet tools could do just that. The paper describes one of the key steps like this:
“… year of birth and state alone are weak identifiers and searches based on their combination would match at least 60,000 U.S. males in 50% of cases. However, when surname information is added to the search, the median list size shrinks to only 12 males, which are few enough matches to investigate individually.”
This phenomenon is a standard feature of searching a graph of properties. When you start with the properties you care about, you narrow down your target group to only the things that have those properties. You’ve probably used this at an online store. When you use the left-hand navigation at Nike’s online store to get to the men’s running shoes with a barefoot-like ride (15 shoes out of 671 total), that’s graph search.
So, does this matter? Damn straight it does. Our world is full of networks, like financial and supply systems, whose interconnections are not well understood. We need better ways to find “all the non-financial institutions with counterparty risk if Spain defaults on its debts” or “all the semi-conductor manufacturing facilities within 100km of earthquakes greater than 4.0 on the Richter scale in the last ten years.” It’s time for our ability to investigate these networks to match our ability to create them.