Editing KDD Competition 2010
Latest revision | Your text | ||
Line 2: | Line 2: | ||
==Resources== | ==Resources== | ||
* [https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp KDD Rules and Data Format] | * [https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp KDD Rules and Data Format] | ||
* [http://cran.r-project.org/ R language] | * [http://cran.r-project.org/ R language] | ||
Line 18: | Line 17: | ||
* [https://www.noisebridge.net/mailman/listinfo/ml Our mailing list] | * [https://www.noisebridge.net/mailman/listinfo/ml Our mailing list] | ||
* [http://www.s3fox.net/ S3Fox] | * [http://www.s3fox.net/ S3Fox] | ||
* [[Machine_Learning/ | * [https://www.noisebridge.net/wiki/Machine_Learning/SVM Thomas' great libSVM writeup] | ||
* [[Machine_Learning/ | |||
* | ==TODOs== | ||
* Vikram -- will create a guide for Mahout setup | |||
* Thomas -- will get libsvm working on the data and put together a "how to" guide for doing so | |||
** put together a [[Machine_Learning/kdd_sample | perl script]] which will take random samples from the data, for working on smaller instances | |||
** put together a [[Machine_Learning/kdd_r | simple R script]] for loading the data | |||
* Andy -- | |||
* Erin -- Will put meeting notes of 5/19 on https://www.noisebridge.net/wiki/Machine_Learning; will work on data transformations and ways to create better representations of the data; will provide the orthogonalized data sets | |||
== Notes == | == Notes == | ||
Line 39: | Line 47: | ||
== How to run Weka (quick 'n | == How to run Weka (quick 'n dirty tutorial) == | ||
* Download and install Weka | * Download and install Weka | ||
* Get your KDD data | * Get your KDD data | ||
This command takes the first 1000 lines of the given training data set and converts them into a .csv file. | * Preprocess your data: this command takes the first 1000 lines of the given training data set and converts them into a .csv file. | ||
Attention: in the last sed command you need to replace the long whitespace with a tab. In the OS X terminal, you do that by pressing CONTROL+V and then Tab. (Copying and pasting the command below won't work, since the whitespace is pasted as spaces.) | * Attention: in the last sed command you need to replace the long whitespace with a tab. In the OS X terminal, you do that by pressing CONTROL+V and then Tab. (Copying and pasting the command below won't work, since the whitespace is pasted as spaces.) | ||
head -n 1000 algebra_2006_2007_train.txt | sed -e 's/[",]/ /g' | sed 's/ /,/g' > algebra_2006_2007_train_1kFormatted.csv | head -n 1000 algebra_2006_2007_train.txt | sed -e 's/[",]/ /g' | sed 's/ /,/g' > algebra_2006_2007_train_1kFormatted.csv | ||
* The following screencast shows you how to do these steps: | * The following screencast shows you how to do these steps: | ||
Line 51: | Line 59: | ||
* [http://swarmfinancial.com/screencasts/nb/kddWekaUsage2.swf Screencast2] | * [http://swarmfinancial.com/screencasts/nb/kddWekaUsage2.swf Screencast2] | ||
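The preprocessing command in the Weka section depends on a literal tab inside the last sed expression, which is easy to lose when copying. Here is a minimal sketch of one way around that, assuming the training file is tab-separated (as the note about the tab implies): let `tr` handle the tab-to-comma step, since `tr` understands the `\t` escape. The function name `to_csv` and its argument order are made up for illustration and are not part of the original page.

```shell
# Sketch: convert the first N lines of a tab-separated KDD file to CSV.
# to_csv is a hypothetical helper name, not something the wiki page defines.
to_csv() {
    # $1 = input file, $2 = output file, $3 = number of lines (default 1000)
    head -n "${3:-1000}" "$1" |
        sed -e 's/[",]/ /g' |   # blank out quotes and commas already in the data
        tr '\t' ',' > "$2"      # turn each tab separator into a comma
}
```

It could be called as `to_csv algebra_2006_2007_train.txt algebra_2006_2007_train_1kFormatted.csv 1000`. Because no literal tab appears in the command, it should survive copy and paste.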
== How to run SVM == | |||
== How to run | |||
* See the notes at [[Machine Learning/SVM]] | * See the notes at [[Machine Learning/SVM]] | ||