Friday, January 26, 2007

Six Degrees of Wikipedia

First of all, I am stealing this idea from Gazebo.

One of the things I study is real-world networks, and one of the major issues is how they are connected, which includes the even harder problem of how to navigate them when you don't know their structure. This field started with a brilliant sociology experiment by Stanley Milgram. Duncan Watts and Steve Strogatz were inspired by this experiment in their pioneering small world networks paper that appeared in Nature in 1998 (and was the subject of Duncan's Ph.D. thesis). This area of research has absolutely exploded since then, and I am one of the participants (though not one of the centrality players---though I do know some of the main people pretty well and have co-authored a couple papers with one of them and had another of them on my thesis committee), so I was obviously very excited to see Gazebo's blog entry. More recently, there was an e-mail version of Milgram's experiment (by Duncan Watts, Dodds, and I think one other person) and, of course, Jon Kleinberg recently won the Nevanlinna prize partly due to his research on network navigation.

As Gazebo writes, here are the rules:

1. Go to Wikipedia.
2. Click the random article link in the sidebar.
3. Open a second random article in another tab.
4. Try to find a chain of links (as short as possible) starting from the first article that leads to the second.


This is exactly the problem of network navigation, with the wikipedia network as the graph one is trying to navigate between one randomly chosen node to another. As with the Milgram and Watts experiments, we are using local information, though with one advantage---we can work both forwards and backwards. (We can't necessarily use backward links, but we can use the information obtained on the end page, which is much more extensive information than the information about the target that was given in the Milgram and Watts experiments.)

Thus, assuming a path exists (which a priori need not be the case, though it clearly usually will be) we want to find one of the paths of minimal path length. This is a hard problem. :)

My starting point was (1) Yvonne_John_Lewis and my target page was (N) Smethwick (UK Parliament constituency). N refers to the fact that this is the N'th page, so I will need N - 1 hops to get there.

I first found a path for which N = 8, but if I were a little more careful, I could have found N = 7 (I'll explain below) and I bet a couple other shortcuts could be found.

Here's what I did (with reasoning included):


2. England (because Lewis is a UK singer and I need a UK parliament constituency)
3. Parliament of England (because I need a UK parliament constituency)

note: parliament of UK didn't seem to be there; that's the one I really wanted

4. Parliament of the UK (that's what I ideally would have done with 3 if the link were there)
5. British House of Commons (perhaps I could have gotten here right from England?)
6. List of Parliamentary Constituencies in the United Kingdom (ok, so this is presumably on this website)

the one I want is in the West Midlands; why in bloody Hell is this not shown on the list of the constituencies in the West Midlands that are tabulated on this website; it should be here!!!! (ok crap, I just checked on the wikipedia entry I want that this was abolished in the 1974 elections... fuck! I should have been working backwards!!!!!)

7. List of former United Kingdom Parliamentary constituencies
8. Smethwick, my goal, so N = 8


I subsequently found a link directly from (4) to (6), reducing things to N = 7. (I checked and there is currently a link from England to British House of Commons, which have gotten me to N = 6.)

Anyway, thanks Gazebo for a very cool time-waster!

2 comments:

ChaosBook said...

I don't know where to stick a comment, so here it goes:
a “perfect” work like the Mona Lisa may be a matter of circular reasoning
http://cornellalumnimagazine.com/index.php?option=com_content&task=view&id=1724

Mason said...

Thanks for the link.

You stuck it on a very old blog entry, so likely I will be the only one to notice it here. :)