Thursday, June 29, 2006

Crawling and navigating the Web using Matlab

Go here and download Cleve Moler's surfer.m file, which takes a URL as one of its arguments and builds a network by navigating the web starting from that page (up through whatever number of unique pages you want). This is very cool for network studies, and one of my collaborators has already adapted the program for a specific data retrieval purpose. [In fact, what he has done has given me an awesome idea for getting a particular set of baseball data from a particular group of baseball pages in order to study Hall-of-Fame selection using network theory. (I can already see how this will work, so now we just need to do it.) That is going to be so fucking awesome! (Now we just need to find a student to work on this project. Maybe one will come from the students my collaborator is currently recruiting.)]
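For readers who want the general idea without opening Matlab, here is a minimal sketch in Python of the kind of crawl described above: start from one page, follow links breadth-first, and stop adding pages once a cap on unique pages is reached. This is not Moler's surfer.m and makes no claim about how that file works internally; the names `crawl`, `get_links`, and `toy_web` are made up for illustration, and the link lookup is a toy dictionary rather than real HTTP fetching.

```python
from collections import deque


def crawl(start_url, n, get_links):
    """Breadth-first crawl from start_url, keeping at most n unique pages.

    get_links is a caller-supplied (hypothetical) function mapping a URL to
    the list of URLs that page links to.
    Returns (urls, edges): urls in discovery order, and a set of (i, j)
    index pairs meaning page i links to page j.
    """
    urls = [start_url]
    index = {start_url: 0}
    edges = set()
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in get_links(page):
            if target not in index:
                if len(urls) >= n:
                    continue  # page cap reached: ignore pages we have not seen
                index[target] = len(urls)
                urls.append(target)
                queue.append(target)
            edges.add((index[page], index[target]))
    return urls, edges


# Toy link structure standing in for real web pages (an assumption).
toy_web = {"a": ["b", "c"], "b": ["a", "c"], "c": ["d"], "d": []}
urls, edges = crawl("a", 3, lambda u: toy_web.get(u, []))
```

With a cap of 3, page "d" is never added, so the resulting network contains only links among "a", "b", and "c".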

The script got stuck when I ran it with my own web page as the starting point, so it may require some finagling with the list of URLs to ignore. (I'm going to try running it from .../cover.html instead of .../index.html to see if that fixes it, but I won't be spending any serious time on the finagling, especially when I ought to be doing work instead of writing this blog entry.) I'll try running it from Travis's blog too, unless he wants to start the computation first.
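I don't actually know what the script choked on, but the usual kind of finagling is to drop links that match some pattern before following them. A rough Python sketch of that idea follows; the function names, the patterns in `skip`, and the example.org URLs are all guesses for illustration, not anything taken from surfer.m.

```python
def should_skip(url, skip_substrings):
    """Return True if the URL contains any of the substrings to ignore."""
    return any(s in url for s in skip_substrings)


def filter_links(links, skip_substrings):
    """Keep only the links a crawler should go on to visit."""
    return [u for u in links if not should_skip(u, skip_substrings)]


# Hypothetical patterns: binary files, mail links, and query strings.
skip = [".pdf", ".jpg", "mailto:", "?"]
links = [
    "http://example.org/cover.html",
    "http://example.org/paper.pdf",
    "mailto:someone@example.org",
    "http://example.org/search?q=1",
]
kept = filter_links(links, skip)
```

Here only the cover.html link survives the filter. Which patterns actually matter depends on the site being crawled.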
