What is a good first-implementation for learning machine learning? [closed]
I find that learning new topics works best with a simple implementation to code up in order to get the idea. That's how I learned genetic algorithms and genetic programming. What would be some good introductory programs to write to get started with machine learning?
Preferably, any referenced resources should be accessible online so the community can benefit.
What language(s) will you develop in? If you are flexible, I recommend MATLAB, Python, and R as good candidates. These are some of the more common languages used to develop and evaluate machine learning algorithms: they facilitate rapid algorithm development and evaluation, data manipulation, and visualization. Most of the popular ML algorithms are also available as libraries (with source).
I'd start by focusing on basic classification and/or clustering exercises in two dimensions (R²). It's easier to visualize, and it's usually sufficient for exploring issues in ML, like risk, class imbalance, noisy labels, online vs. offline training, etc. Create a data set from everyday life, or from a problem you are interested in. Or use a classic, like the Iris data set, so you can compare your progress to the published literature. You can find the Iris data set at:
- http://en.wikipedia.org/wiki/Iris_flower_data_set , or
- http://archive.ics.uci.edu/ml/datasets/Iris
One of its nice features is that it has one class, 'setosa', that is easily linearly separable from the others.
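To get a quick feel for the data, here's a minimal sketch in Python (assuming scikit-learn and matplotlib are installed; scikit-learn ships the Iris data) that scatter-plots two of the four features, where the 'setosa' separation is easy to see:

```python
# Load the Iris data and scatter-plot two features; 'setosa' separates
# linearly from the other two classes on the petal measurements.
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

iris = load_iris()
X, y = iris.data, iris.target

# Petal length and petal width are columns 2 and 3.
for label, name in enumerate(iris.target_names):
    mask = y == label
    plt.scatter(X[mask, 2], X[mask, 3], label=name)

plt.xlabel(iris.feature_names[2])
plt.ylabel(iris.feature_names[3])
plt.legend()
plt.show()
```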
Once you pick a couple of interesting data sets, begin by implementing some standard classifiers and examining their performance. This is a good short list of classifiers to learn:
- k-nearest neighbors
- linear discriminant analysis
- decision trees (e.g., C4.5)
- support vector machines (e.g., via LibSVM)
- boosting (with stumps)
- naive Bayes classifier
With the Iris data set and one of the languages I mention, you can easily do a mini-study using any of the classifiers quickly (minutes to hours, depending on your speed).
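As a concrete example, a mini-study with k-nearest neighbors might look like the following sketch in Python (assuming scikit-learn is installed; the split ratio and k=5 are arbitrary starting points):

```python
# Train a k-nearest-neighbors classifier on Iris and report test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = KNeighborsClassifier(n_neighbors=5)  # k=5 chosen arbitrarily
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Swapping KNeighborsClassifier for another classifier from the list above (e.g., a decision tree or naive Bayes) lets you compare them with no other changes.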
Edit: You can google "Iris data classification" to find lots of examples. Here is a classification demo document by MathWorks using the Iris data set:
http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/classdemo.html
I think you could write a naive Bayes classifier for junk-email (spam) filtering. You can get a lot of information from this book:
http://nlp.stanford.edu/IR-book/information-retrieval-book.html
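Here is a rough from-scratch sketch of the idea in Python, using toy training messages I made up (not from the book) and add-one (Laplace) smoothing:

```python
# A multinomial naive Bayes spam filter on toy data with add-one smoothing.
import math
from collections import Counter, defaultdict

train = [  # made-up example messages
    ("win cash prize now", "spam"),
    ("limited offer win money", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
]

# Count words per class and messages per class.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the class with the highest log-posterior for `text`."""
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            # Log likelihood with add-one smoothing
            score += math.log((word_counts[label][word] + 1) /
                              (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("win a cash prize"))        # expected: spam
print(predict("monday project meeting"))  # expected: ham
```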
Decision trees. They are frequently used in classification tasks and have many variants. Tom Mitchell's Machine Learning book is a good reference for implementing one.
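If you implement one from scratch, the key piece is choosing splits by information gain. A small sketch of that computation in Python (the toy labels and attribute values below are illustrative, not from Mitchell's book):

```python
# Entropy and information gain, the core of ID3-style decision tree splits.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, attribute_values):
    """Reduction in entropy from splitting `labels` by `attribute_values`."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [l for l, v in zip(labels, attribute_values) if v == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Toy example: does the "outlook" attribute help predict the yes/no label?
labels  = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "overcast", "sunny", "rain", "overcast", "rain"]
print(information_gain(labels, outlook))  # 0.667 of the 1.0-bit entropy
```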
Neural nets may be the easiest thing to implement first, and they're fairly thoroughly covered throughout the literature.
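As a sketch of how small a first attempt can be, here is a one-hidden-layer network trained with backpropagation to learn XOR (Python + NumPy; the layer sizes, learning rate, and iteration count are my own toy choices, and a net this small can occasionally get stuck, so a different random seed may be needed):

```python
# Tiny feed-forward net (2 -> 4 -> 1, sigmoid units) trained on XOR
# with plain gradient descent on squared error.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))
lr = 0.5

for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass (derivative of squared error through the sigmoids)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

# Should approach [[0], [1], [1], [0]]
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))
```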
There is something called books; are you familiar with those? When I was exploring AI two decades ago, there were many books. I guess now that the internet exists, books are archaic, but you can probably find some in an ancient library.