Machine Learning/moa

From Noisebridge

Setup Instructions

OR

Training MOA models

  • Your data will need to be in ARFF format
  • To compare models, run prequential evaluation (each instance is first used to test the model, then to train it) with different learners and compare their performance; for example:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePrequential -l NaiveBayes -s (ArffFileStream -f atrain.arff -c -1) -O amodel_bayes.moa"
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePrequential -l HoeffdingTree -s (ArffFileStream -f atrain.arff -c -1) -O amodel_hoeffding.moa"
  • To actually generate the final model, you can run a command line like the following:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "LearnModel -l NaiveBayes -s (ArffFileStream -f atrain.arff -c -1) -O amodel_bayes.moa"
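
If your data starts out as CSV, the ARFF requirement above can be met with a small conversion script. The following is a minimal sketch in Python, not part of MOA. It assumes a header row, numeric features, unquoted column names without spaces, and a two-valued class in the last column (which matches the -c -1 option used above). The function name csv_to_arff and the relation name are hypothetical.

```python
import csv
import io


def csv_to_arff(csv_text, relation="atrain", class_values=("0", "1")):
    """Convert CSV text to ARFF text.

    Assumptions (illustrative only): first row is a header, every column
    except the last is numeric, and the last column is a nominal class.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    lines = ["@relation " + relation, ""]
    # All columns except the last are declared as numeric attributes.
    for name in header[:-1]:
        lines.append("@attribute " + name + " numeric")
    # The last column is the class, declared as a nominal attribute.
    class_spec = "{" + ",".join(class_values) + "}"
    lines.append("@attribute " + header[-1] + " " + class_spec)

    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "x1,x2,cfa\n1.0,2.5,1\n0.3,4.0,0\n"
    print(csv_to_arff(sample), end="")
```

Redirecting the output to atrain.arff gives a file that the commands above can read. Real data with missing values, string columns, or more than two classes would need more care than this sketch takes.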

Generating MOA model predictions

To generate predictions for a test set, the test set must be in ARFF format with the same columns as the training data, including the output class column. The class values in the test set are only placeholders; I just set them all to 0.
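
One way to check the "same columns" requirement before running MOA is to compare the attribute declarations of the two files. This is a rough sketch in Python, assuming simple ARFF headers with unquoted attribute names; the helper names arff_attributes and same_columns are hypothetical and not part of MOA or Weka.

```python
def arff_attributes(arff_text):
    """Return the (name, type) pairs declared by @attribute lines, in order.

    Assumes attribute names are unquoted and contain no whitespace.
    """
    attrs = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("@data"):
            break  # the header ends where the data section begins
        if lowered.startswith("@attribute"):
            parts = stripped.split(None, 2)
            # Drop spaces so "{0, 1}" and "{0,1}" compare as equal.
            attrs.append((parts[1], parts[2].replace(" ", "")))
    return attrs


def same_columns(train_arff, test_arff):
    """True if both ARFF texts declare the same attributes in the same order."""
    return arff_attributes(train_arff) == arff_attributes(test_arff)


if __name__ == "__main__":
    with open("atrain.arff") as f_train, open("atest.arff") as f_test:
        print(same_columns(f_train.read(), f_test.read()))
```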

To do this, you will also need the moa_personal.jar file in the same directory as your other jar files; you can get all the jar files needed from http://thomaslotze.com/kdd/jarfiles.tgz

You can then run the following (after generating a model using the above steps):

java -cp .:moa_personal.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluateModel -e BasicLoggingClassificationPerformanceEvaluator -m file:amodel_bayes.moa -s (ArffFileStream -f atest.arff -c -1)" > a_bayes_predicted.txt

This generates a comma-separated file with the item number in the first column and the probability of class 1 (in our case, cfa=1) in the second column.
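
To turn those probabilities into class labels, the file can be read back with a few lines of Python. This is a sketch under stated assumptions: the two-column layout described above, a cutoff of 0.5 (an arbitrary choice, not something the evaluator prescribes), and any line whose second field is not a number being skipped. The function name read_predictions is hypothetical.

```python
import csv
import io


def read_predictions(text, threshold=0.5):
    """Parse 'item,probability' lines into (item, probability, label) tuples.

    The label is 1 when the probability of class 1 reaches the threshold,
    otherwise 0. Lines that do not fit the two-column layout are skipped.
    """
    predictions = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue
        try:
            prob = float(row[1])
        except ValueError:
            continue  # not a prediction line (for example, a header)
        label = 1 if prob >= threshold else 0
        predictions.append((row[0].strip(), prob, label))
    return predictions


if __name__ == "__main__":
    with open("a_bayes_predicted.txt") as f:
        for item, prob, label in read_predictions(f.read()):
            print(item, prob, label)
```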

Thomas is going to develop the evaluator to be more general and robust, and hopefully submit it back for inclusion in the main MOA trunk. Right now, it will only work for examples with two classes.

Other Resources