Twitter, Google, Amazon, del.icio.us etc. all give you a lot of data to play with, all for free. There’s also a lot of textual data available through initiatives like Project Gutenberg. And that, it seems, is just the tip of the iceberg.
I have been wondering how you could use this data for fun. I’m a first-year IT student, so I have no knowledge of statistics, machine learning, collaborative filtering, etc. My interest in this area was piqued by the book Programming Collective Intelligence by Toby Segaran, and now I want to take a deeper look at what you can do with data. I don’t know where to start. Any ideas?
I have also been pondering whether I should go and buy something like Paradigms of Artificial Intelligence Programming. Is it worth the trip across the city?
7 Answers
Try feeding books in different styles from Project Gutenberg through a Markov chain generator. There’s one in Perl here to get you started.
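If Perl isn't your thing, the same idea fits in a few lines of Python. This is only a rough sketch of a word-level Markov chain, not the linked Perl script: the function names `build_chain` and `generate`, the `order` parameter, and the sample sentence are all made up for illustration. For real use you would read in a plain-text book you downloaded from Project Gutenberg instead of the sample string.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=30, seed=None):
    """Random-walk the chain to produce up to `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # pick a random starting state
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Hypothetical stand-in for the text of a real book.
sample = "the cat sat on the mat and the dog sat on the cat"
print(generate(build_chain(sample, order=1), length=12, seed=42))
```

A higher `order` sticks closer to the source text; a lower one produces more nonsense. Mixing two books with very different styles into one chain is where it gets fun.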
Visualizations, do them, share them.
You can make puzzles like hangman games. Or build a mashup, or try Yahoo Pipes to join information from different sources.
You can use some of that data to make money (if you’re really good!)
http://www.netflixprize.com/ Netflix has made an anonymized dataset available and is asking for better algorithms to predict customer choices.
Predict future stock market trends from the data. Profit!
If you’re familiar with Python, try playing around with NLTK. It has tons of tools for text mining and even machine learning in general. Try working your way through the NLTK book.
If you want to start off with an easy AI problem, you might try clustering.
http://en.wikipedia.org/wiki/Data_clustering
You could use it to group Flickr images together by tag or something cool like that.
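To give a feel for it, here is a minimal sketch of one way to cluster by tag. It assumes you already have each image's tags as a Python set (the photo names and tags below are invented, and nothing here talks to the Flickr API). It measures how different two tag sets are with Jaccard distance and then repeatedly merges the two closest groups, which is a simple form of hierarchical clustering. The `max_distance` cutoff is an arbitrary choice you would tune.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """0.0 for identical tag sets, 1.0 for sets with no tags in common."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_tags(items, max_distance=0.7):
    """Merge the two closest clusters until no pair is within max_distance."""
    clusters = [[name] for name in items]

    def linkage(c1, c2):
        # Average distance between every member of c1 and every member of c2.
        dists = [jaccard_distance(items[x], items[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_distance:
            break  # the closest pair is still too far apart
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical photos and tags, standing in for real Flickr data.
photos = {
    "beach.jpg": {"sea", "sand", "sunset"},
    "surf.jpg": {"sea", "sand", "waves"},
    "kitten.jpg": {"cat", "pet", "cute"},
    "tabby.jpg": {"cat", "pet", "sleepy"},
}
for group in cluster_by_tags(photos):
    print(group)
```

With this toy data the two seaside photos should end up in one group and the two cat photos in another. It compares every pair of clusters on every pass, so it is fine for a few hundred images but too slow for much more.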