Programming Collective Intelligence with Ruby

There's a lot of data out there, and paying attention to it in order to make decisions is a good idea. But where do you begin?

I began by browsing through some of the preview chapters (don't think they're up anymore) of Speech and Language Processing. I didn't get far. I also found some of Norvig's reviews on Amazon, one of which pertained to this gem (the sell: "But if someone told me I had to make a million bucks in one year, and I could only refer to one book to do it, I'd grab a copy of this book and start a web text-processing company."). After previewing it online, I knew that sans a whip at my back, it wouldn't help me get any better.

What I really wanted was something that was written like this, but with some examples.

I ordered a copy of Toby Segaran's Programming Collective Intelligence after browsing the table of contents and reading some reviews. In 330 pages, the author covers building a link recommendation engine, building a search engine, stochastic optimization (wheeee), spam filters, and genetic programming (list not exhaustive).

What makes the above fun is that in each case you're working with data from sites you're probably very familiar with already (del.icio.us, Kayak, eBay, Facebook).

Still, much of the book is code, so to get the most out of it, you should really try the examples. You can, of course, download the source code to the book. The examples are written in Python, which is concise and readable.

I decided to comprehend the book by writing the examples in Ruby as I went along. While I've flipped through a few of the chapters already, I've only actually worked through Chapter 2 (Chapter 1 is intro stuff).

It took me an embarrassing amount of time to work through Chapter 2, which surprised me. Ruby and Python are similar syntactically, but a few weird bugs in my translation tied me up.

One thing Segaran makes much use of is Python's list comprehensions, which I once attempted to dazzle (nay, distract!) my Google interviewers with, to no avail.

Not having list comprehensions in Ruby made things a little rough, but there are good methods in Enumerable that come to the rescue.
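To give a rough idea of what that translation looks like, here is a minimal sketch. The ratings hash and the names in it are made up for illustration (they are not from the book), and the Python lines in the comments are just the kind of comprehension being replaced. The filter part of a comprehension maps onto select, and the expression part maps onto map.

```ruby
# Hypothetical ratings data, invented for this example.
prefs = {
  'alice' => { 'Film A' => 4.0, 'Film B' => 2.5, 'Film C' => 3.0 },
  'bob'   => { 'Film A' => 3.0, 'Film B' => 3.5 }
}

# Python: shared = [item for item in prefs['alice'] if item in prefs['bob']]
# Ruby:   select plays the role of the "if" filter.
shared = prefs['alice'].keys.select { |item| prefs['bob'].key?(item) }

# Python: sum([pow(prefs['alice'][i] - prefs['bob'][i], 2) for i in shared])
# Ruby:   map builds the list of squared differences, reduce adds them up.
sum_of_squares = shared
  .map { |item| (prefs['alice'][item] - prefs['bob'][item])**2 }
  .reduce(0.0, :+)

puts shared.inspect
puts sum_of_squares
```

It reads back to front compared with the Python version (collection first, then filter, then expression), which is probably part of why a line-by-line port is easy to get subtly wrong.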

I also didn't have the pydelicious library for writing the del.icio.us recommendation engine. It wasn't very hard to implement the necessary functions though. I didn't spend any time looking for a Ruby version, if there is one.

The link to the MovieLens dataset didn't work; it seems to have moved.

I also had some trouble with passing a function as a parameter. I know there are blocks, procs, and beats, but the solution wasn't apparent. I settled on passing a symbol and just picking the right scoring method based on its value.
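For what it's worth, here is one way it could be done without the symbol lookup, as a sketch rather than what my code actually does. The method names, the scoring formula, and the data are all assumptions for illustration. The idea is that Ruby's method(:name) turns a method into an object that responds to call, and a lambda responds to call as well, so either can be handed in as an ordinary argument.

```ruby
# Hypothetical scoring method: a distance-based similarity where
# identical ratings score 1.0 and larger differences score lower.
def sim_distance(a, b)
  shared = a.keys & b.keys
  return 0.0 if shared.empty?
  sum_sq = shared.map { |k| (a[k] - b[k])**2 }.reduce(0.0, :+)
  1.0 / (1.0 + sum_sq)
end

# Hypothetical ranking method. `similarity` is anything that responds
# to call: a Method object, a lambda, or a proc.
def top_matches(prefs, person, similarity)
  others = prefs.keys.reject { |other| other == person }
  others
    .map { |other| [similarity.call(prefs[person], prefs[other]), other] }
    .sort_by { |score, _name| -score }
end

# Made-up ratings data.
prefs = {
  'alice' => { 'Film A' => 4.0, 'Film B' => 2.5 },
  'bob'   => { 'Film A' => 3.0, 'Film B' => 3.5 },
  'carol' => { 'Film A' => 4.0, 'Film B' => 2.5 }
}

# Pass an existing method by wrapping it in a Method object.
by_method = top_matches(prefs, 'alice', method(:sim_distance))

# Or pass a lambda defined on the spot (here: count of shared items).
by_lambda = top_matches(prefs, 'alice', ->(a, b) { (a.keys & b.keys).size.to_f })

puts by_method.inspect
puts by_lambda.inspect
```

The symbol approach I settled on isn't far off from this: a symbol can be dispatched with send, which avoids a hand-written case statement, but passing a callable keeps the ranking method from needing to know the scoring methods by name.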

Interested brethren, you can grab the Ruby code for Chapter 2 <-- there.