
Twitter, Google, Amazon, del.icio.us etc. all give you a lot of data to play with, all for free. There’s also a lot of textual data available through initiatives like Project Gutenberg. And that, it seems, is just the tip of the iceberg.

I have been wondering how you could use this data for fun. I’m a first-year IT student, so I have no knowledge of statistics, machine learning, collaborative filtering, etc. My interest in this area was piqued by the book Programming Collective Intelligence by Toby Segaran, and now I want to take a deeper look at what you can do with data, but I don’t know where to start. Any ideas?

I have also been pondering whether I should go and buy something like Paradigms of Artificial Intelligence Programming. Is it worth the trip across the city?

7

Answers


  1. Try firing books in different styles from Gutenberg through a Markov Chain generator – there’s one in Perl here to get you started.

  2. Visualizations, do them, share them.

  3. You can make puzzles like hangman games, build a mashup, or try Yahoo Pipes to join information from different sources.

  4. You can use some of that data to make money (if you’re really good!)
    http://www.netflixprize.com/ Netflix has made an anonymized dataset available and is asking for better algorithms to predict customer choices.

  5. Predict future stock market trends from the data. Profit!

  6. If you’re familiar with Python, try playing around with NLTK. It has tons of modules for text mining and even machine learning in general. Try working your way through the NLTK book.

  7. If you want to start off with an easy AI problem, you might try clustering.

    http://en.wikipedia.org/wiki/Data_clustering

    You could use it to group Flickr images together by tag or something cool like that.
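To make the clustering suggestion in answer 7 a bit more concrete, here is a minimal sketch of k-means over tag sets. K-means is just one clustering method among many, chosen here for brevity. The photo names and tags are invented for illustration and are not real Flickr data, and the `kmeans` helper is a hypothetical function written for this example, not part of any library.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns a cluster index for every point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical photos and tags, invented for illustration (not real Flickr data).
photos = {
    "img1": {"beach", "sunset", "sea"},
    "img2": {"beach", "sea", "sand"},
    "img3": {"city", "night", "street"},
    "img4": {"city", "street", "traffic"},
}
vocabulary = sorted(set().union(*photos.values()))
# Encode each photo as a 0/1 vector: 1 where the photo carries that tag.
vectors = [[1 if tag in tags else 0 for tag in vocabulary] for tags in photos.values()]

labels = kmeans(vectors, k=2)
for name, label in zip(photos, labels):
    print(name, "-> cluster", label)
```

With real data you would fetch tags from whatever source you have access to and probably need more than two clusters; picking `k` is part of the fun.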

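The Markov chain idea from answer 1 can also be sketched in a few lines of Python instead of Perl. This is a rough word-level version, not the Perl generator that answer links to. The `build_chain` and `generate` names are made up for this example, and the sample sentence is a placeholder standing in for a plain-text book you would download from Project Gutenberg.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=50, seed=None):
    """Random-walk the chain to emit up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # start from a random known state
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state only appeared at the end of the text
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Placeholder text; in practice, read a plain-text book from Project Gutenberg.
sample = "the cat sat on the mat and the cat ate the fish and the dog sat on the log"
chain = build_chain(sample, order=1)
print(generate(chain, length=12, seed=1))
```

To mix styles as the answer suggests, you could concatenate the text of two or more books before calling `build_chain`. A higher `order` tends to give output that stays closer to the source text.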