Machine learning libraries to use for Project Miracle
For the machine learning portion of Project Miracle, I may end up trying several different languages and libraries to see which works best. This will also give me a chance to learn more about the various algorithms and approaches. Here are a few I’m considering:
- R and related libraries: I learned R through Coursera’s Computing for Data Analysis and Data Analysis courses. I also have one or more books on R lying around somewhere, in paper or ebook form, but I can’t seem to find them at the moment.
- Mahout: I’m familiar with this one because of its use within the Java and Hadoop communities.
- Octave/Matlab: I know this language/environment from Andrew Ng’s machine learning course, which is available on both Coursera and iTunes U.
- Breeze from scalanlp.org: My friend Devon is using this library for his Master’s thesis. I’m interested in using Scala, so I may have to learn more about this library.
- Weka: Another general-purpose machine learning library written in Java. It also appears to be used by the folks at Pentaho.
- MLC++: A machine learning library written in C++ at Stanford; now distributed by SGI.
- Python and related machine learning libraries: I need to research the best options available in this environment.
Any other suggestions?
Looks like quite the list! For Python, I would take a look at http://scikit-learn.org/
Weka is very nice if you are working on the JVM.