Machine Learning/moa

From Noisebridge
Latest revision as of 20:14, 16 June 2010

Setup Instructions

  • Download and unzip http://thomaslotze.com/kdd/moa_prep.tgz

OR

  • Create a directory to run your moa programs from; we'll assume it is ~/moa
  • Download the moa release .tar.gz file from http://sourceforge.net/projects/moa-datastream/ and extract it

Training MOA models

  • Your data will need to be in ARFF format.
  • To compare different models, run a prequential evaluation (each example is first used to test the model and then to train it) with each learner and compare the results; for example:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePrequential -l NaiveBayes -s (ArffFileStream -f atrain.arff -c -1) -O amodel_bayes.moa"
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePrequential -l HoeffdingTree -s (ArffFileStream -f atrain.arff -c -1) -O amodel_hoeffding.moa"
  • To generate the final model, run a command like the following:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "LearnModel -l NaiveBayes -s (ArffFileStream -f atrain.arff -c -1) -O amodel_bayes.moa"
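As a rough illustration of the ARFF requirement above, here is a minimal Python sketch that turns a small CSV table into ARFF text. It assumes every non-class column is numeric and that the class is the last column, which is what the -c -1 option in the commands above selects. The function name csv_to_arff and the columns hints and attempts are made up for this example; real data may need other attribute types.

```python
import csv
import io


def csv_to_arff(csv_text, relation, class_values):
    """Convert CSV text (header row first, class in the last column) to ARFF text.

    Assumption: every non-class column is numeric and the class is nominal.
    """
    # Drop blank lines so they do not become empty data rows.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    lines = ["@relation " + relation, ""]
    for name in header[:-1]:
        lines.append("@attribute " + name + " numeric")
    # The class attribute is declared last, matching "-c -1".
    lines.append("@attribute " + header[-1] + " {" + ",".join(class_values) + "}")
    lines.append("")
    lines.append("@data")
    for row in data:
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-row training sample with a binary class column.
    sample = "hints,attempts,cfa\n0,1,1\n2,3,0\n"
    print(csv_to_arff(sample, "atrain", ["0", "1"]), end="")
```

The output can be saved as atrain.arff and passed to ArffFileStream with -f.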

Generating MOA model predictions

To generate predictions for a test set, the test set must be in ARFF format with the same columns as the training data, including the output class column. The class values in the test set are only placeholders; I just set them all to 0.
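One way to check the "same columns" requirement before running MOA is to compare the attribute declarations of the two ARFF files. The Python sketch below is only an illustration: the helper names are invented here, the parsing assumes unquoted attribute names without spaces, and the hints column is a made-up example.

```python
def arff_attributes(arff_text):
    """Return the (name, type) pairs declared in an ARFF header, in order."""
    attributes = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("@data"):
            break  # the header ends where the data section begins
        if stripped.lower().startswith("@attribute"):
            # "@attribute <name> <type>"; assumes the name has no spaces or quotes
            _, name, attr_type = stripped.split(None, 2)
            attributes.append((name, attr_type))
    return attributes


def same_columns(train_arff, test_arff):
    """True if both ARFF texts declare the same attributes in the same order."""
    return arff_attributes(train_arff) == arff_attributes(test_arff)


def add_placeholder_class(data_rows, placeholder="0"):
    """Append a dummy class value to each unlabeled comma-separated data row."""
    return [row + "," + placeholder for row in data_rows]


if __name__ == "__main__":
    header = "@attribute hints numeric\n@attribute cfa {0,1}\n@data\n"
    train = "@relation atrain\n" + header + "2,1\n"
    # Unlabeled test rows get an all-0 class column, as described above.
    test_rows = add_placeholder_class(["3", "0"])
    test = "@relation atest\n" + header + "\n".join(test_rows) + "\n"
    print(same_columns(train, test))
```

The comparison is deliberately strict: a difference in attribute order, name, or type string counts as a mismatch.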

To do this, you will also need the moa_personal.jar file in the same directory as your other jar files; you can get all the jar files needed from http://thomaslotze.com/kdd/jarfiles.tgz

After generating a model using the steps above, you can then run the following:

java -cp .:moa_personal.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluateModel -e BasicLoggingClassificationPerformanceEvaluator -m file:amodel_bayes.moa -s (ArffFileStream -f atest.arff -c -1)" > a_bayes_predicted.txt

This generates a comma-separated file with the item number in the first column and the probability of class 1 (in our case, cfa=1) in the second column.
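For working with that output, a small Python sketch like the following could read the file and turn the probabilities into 0/1 labels. It assumes the item numbers are plain integers and skips any line that does not parse as "integer,float"; the function names, the sample values, and the 0.5 threshold are choices made for this example, not part of MOA.

```python
import csv
import io


def read_predictions(text):
    """Parse 'item number,probability' lines into a list of (int, float) pairs."""
    predictions = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        try:
            predictions.append((int(row[0]), float(row[1])))
        except ValueError:
            continue  # skip lines that are not numeric, such as stray messages
    return predictions


def to_labels(predictions, threshold=0.5):
    """Map each probability of class 1 to a hard 0/1 label."""
    return [(item, 1 if prob >= threshold else 0) for item, prob in predictions]


if __name__ == "__main__":
    # Hypothetical contents of a_bayes_predicted.txt.
    sample = "0,0.91\n1,0.12\n"
    for item, label in to_labels(read_predictions(sample)):
        print(item, label)
```

To use it on real output, read a_bayes_predicted.txt into a string and pass it to read_predictions.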

Thomas plans to make the evaluator more general and robust, and hopefully submit it for inclusion in the main MOA trunk. Right now, it only works for problems with two classes.

Other Resources