Project 3

Due 5/2

Machine Learning

For this project you will implement:

  1. A Decision Tree
  2. Naive Bayes (a simple application of Bayes' rule)
  3. K-Nearest Neighbor

I'll post a few datasets a bit later. Run each of these algorithms on the datasets, and write a report on how each one does (include accuracy, precision, recall, etc.). This report should be written in LaTeX (http://www.latex-project.org/)

For each of these, you will create a Python object: one called DecisionTree, one called NaiveBayes, and one called NearestNbr. Each object should have a train() method that takes as an argument a list of training vectors (the class will be the first element in each training vector). The only other method these objects are required to have is classify(), which takes a feature vector as an argument and returns the class.
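A minimal sketch of that interface (the class name MajorityBaseline and its always-predict-the-majority behavior are purely illustrative, not one of the three required classifiers):

```python
class MajorityBaseline:
    """Illustrates the required shape: train() takes [class, f1, f2, ...]
    vectors, classify() takes a feature vector and returns a class."""

    def train(self, vectors):
        # The class label is the first element of each training vector.
        classes = [v[0] for v in vectors]
        self.default = max(set(classes), key=classes.count)

    def classify(self, features):
        # A real classifier would use the features; this baseline ignores them.
        return self.default
```

Your three classes should expose exactly these two methods, whatever they do internally.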

In all of these cases, your features will be binary only (ones and zeroes), making the branching of your decision tree easy. An example call to train():

myObj.train([ [ 1, 0, 0, 1, 0, 1], [0, 1, 1, 1, 1, 1]])

The first index here is the class; the rest are features. A call to classify() would look like this:

myObj.classify([0,0,1,0,1])

Which should return 1, since those features exactly match the class-1 training vector above.
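As a concrete sanity check, a minimal nearest-neighbor sketch (Hamming distance over the binary features; fixing k=1 is my simplification, not part of the spec) reproduces that result:

```python
class NearestNbr:
    """1-nearest-neighbor over binary feature vectors (k=1 for brevity)."""

    def train(self, vectors):
        # Store (class, features) pairs; the class is the first element.
        self.data = [(v[0], v[1:]) for v in vectors]

    def classify(self, features):
        # Hamming distance: number of positions where the features differ.
        def dist(stored):
            return sum(a != b for a, b in zip(stored, features))
        return min(self.data, key=lambda cf: dist(cf[1]))[0]

myObj = NearestNbr()
myObj.train([[1, 0, 0, 1, 0, 1], [0, 1, 1, 1, 1, 1]])
myObj.classify([0, 0, 1, 0, 1])  # → 1 (distance 0 to the class-1 vector)
```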

For reference, Naive Bayes is a simple application of Bayes' rule in order to do your classification: pick the class that maximizes the prior times the product of the per-feature likelihoods.
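A sketch of that idea for binary features follows; the add-one (Laplace) smoothing is my assumption, since the assignment doesn't say how to handle feature values never seen with a given class:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Naive Bayes over binary features, with add-one (Laplace) smoothing."""

    def train(self, vectors):
        self.class_counts = Counter(v[0] for v in vectors)
        self.total = len(vectors)
        # ones[c][i] = number of class-c examples whose feature i is 1
        self.ones = defaultdict(Counter)
        for v in vectors:
            for i, f in enumerate(v[1:]):
                if f == 1:
                    self.ones[v[0]][i] += 1

    def classify(self, features):
        # Bayes' rule: argmax over c of P(c) * prod_i P(f_i | c),
        # computed in log space to avoid underflow.
        def log_posterior(c):
            n = self.class_counts[c]
            score = math.log(n / self.total)          # log prior P(c)
            for i, f in enumerate(features):
                p1 = (self.ones[c][i] + 1) / (n + 2)  # smoothed P(f_i = 1 | c)
                score += math.log(p1 if f == 1 else 1.0 - p1)
            return score
        return max(self.class_counts, key=log_posterior)
```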

Some example files (the first element in the vector is the class):

  1. Easy File
  2. A random data file. Your programs should get only about 50% accuracy on this one, since the labels carry no signal
  3. Based on a clearly defined rule
  4. Another well-defined rule; your decision tree might not be able to get this one.
  5. Continuous data. Don't use this one with the decision tree, but k-nearest neighbor should, I think, be able to get it.