KDD Competition 2010: Difference between revisions

From Noisebridge

Revision as of 23:24, 19 May 2010

We're interested in working on the KDD Competition as a way to focus our machine learning exploration, and maybe even to find some interesting aspects of the data! If you're interested, drop us a note or show up at a weekly Machine Learning meeting. We'll use this space to keep track of our ideas.

Resources

TODOs

Vikram -- will help set up Hadoop for the rest of us and create a guide for Mahout setup
Thomas -- will get libsvm working on the data and put together a "how to" guide for doing so
- put together a Perl script that takes random samples from the data, for working on smaller instances
- put together a simple R script for loading the data
Andy -- will get Weka working on the data and put together a "how to" guide for doing so
Erin -- will put the 5/19 meeting notes on https://www.noisebridge.net/wiki/Machine_Learning; will work on data transformations and ways to create better representations of the data; will provide the orthogonalized data sets

We will need to make sure we don't get disqualified for people belonging to multiple teams! Do not sign up anybody else for the competition without asking first.
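
The random-sampling TODO above could be approached with reservoir sampling, which picks a fixed number of rows uniformly at random in a single pass without loading the whole file. The following is only a minimal sketch, written in Python rather than Perl, and it assumes the data file has one header line followed by one row per line. The function name `sample_rows` and the `seed` parameter are illustrative, not part of any existing script.

```python
import random


def sample_rows(lines, k, seed=0):
    """Return (header, rows): the first line plus k rows sampled uniformly.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumption: the first line is a header and every later line is one row.
    """
    rng = random.Random(seed)
    it = iter(lines)
    header = next(it)
    reservoir = []
    for i, line in enumerate(it):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(line)
        else:
            # Keep row i with probability k / (i + 1) by replacing a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return header, reservoir
```

Passing an open file handle as `lines` keeps memory use proportional to `k` rather than to the file size. If the file has fewer than `k` rows, all of them are returned.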

Notes

For KDD submission: on OS X, zip the submission file from the command line; otherwise you will get a complaint about a __MACOSX file. E.g.: zip asdf.zip algebra_2008_2009_submission.txt
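
As an alternative to the command line, the archive can also be built from a script, which likewise avoids the extra __MACOSX entry because only the named file is written. This is a minimal sketch using Python's standard `zipfile` module; the helper name `zip_submission` is made up for illustration, and the file names are the ones from the note above.

```python
import zipfile
from pathlib import Path


def zip_submission(submission_path, zip_path):
    """Write a zip archive containing only the submission file."""
    submission = Path(submission_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname stores the bare file name, without any directory prefix.
        archive.write(submission, arcname=submission.name)
    return zip_path


if __name__ == "__main__":
    zip_submission("algebra_2008_2009_submission.txt", "asdf.zip")
```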

Ideas

Add new features by computing their values from existing columns -- e.g. correlation between skills based on their co-occurrence within problems. Could use a decision tree to define boundaries for a new feature, e.g. "good student, medium student, bad student"
Dimensionality reduction -- transform into numerical values appropriate for consumption by SVM
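
The skill co-occurrence idea above could start from simple pair counts: for each problem, count every pair of distinct skills that appear together. This is a rough sketch only. It assumes the skills for each problem have already been parsed into a list of strings, and the function name `skill_cooccurrence` is hypothetical. Raw counts are not yet a correlation, but they are the input a correlation or similarity measure would need.

```python
from collections import Counter
from itertools import combinations


def skill_cooccurrence(problem_skills):
    """Count how often each pair of skills appears in the same problem.

    `problem_skills` is an iterable of skill lists, one list per problem.
    Pairs are stored as sorted tuples so (a, b) and (b, a) share one key.
    """
    counts = Counter()
    for skills in problem_skills:
        # Deduplicate within a problem so a repeated skill is counted once.
        for pair in combinations(sorted(set(skills)), 2):
            counts[pair] += 1
    return counts
```

The resulting counts could be turned into numeric features, for example by normalizing each pair count by how often the individual skills occur.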

Who we are

Andy
Thomas
Erin
Vikram

(insert your name/contact info here)

Retrieved from "https://www.noisebridge.net/index.php?title=KDD_Competition_2010&oldid=11258"

@@ Line 3: / Line 3: @@
 ==Resources==
 * [https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp KDD Rules and Data Format]
-* [http://cran.r-project.org/ R]
+* [http://cran.r-project.org/ R language]
 * [http://www.csie.ntu.edu.tw/~cjlin/libsvm/ libsvm]
 * [http://www.cs.waikato.ac.nz/ml/weka/ Weka]
@@ Line 9: / Line 9: @@
 * [[Machine Learning/Hadoop | Hadoop]]
 * [http://lucene.apache.org/mahout/ Mahout -- machine learning libraries for Hadoop]
+* [http://hadoop.apache.org/pig/ Pig language]
+* [http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html Pig Latin Manual]
+* [http://www.cloudera.com/ Cloudera -- see videos for Hadoop intro]
+* [http://github.com/voberoi/hadoop-mrutils Vikram's awesome Hadoop/EC2 scripts]
+* [https://www.noisebridge.net/mailman/listinfo/ml Our mailing list]
 ==TODOs==
@@ Line 17: / Line 22: @@
 ** put together a [[Machine_Learning/kdd_r | simple R script]] for loading the data
 * Andy -- will get Weka working on the data and put together a "how to" guide for doing so
-* Erin -- will work on data transformations and ways to create better representations of the data; will provide the orthogonalized data sets
+* Erin -- Will put meeting notes of 5/19 on https://www.noisebridge.net/wiki/Machine_Learning; will work on data transformations and ways to create better representations of the data; will provide the orthogonalized data sets
 * We will need to make sure we don't get disqualified for people belonging to multiple teams! Do not sign up anybody else for the competition without asking first.
 == Notes ==
-* to zip the file on OSX: use command line, otherwise will complain about __MACOSX file: e.g.:  zip asdf.zip algebra_2008_2009_submission.txt
+* For KDD submission: to zip the submission file on OSX: use command line, otherwise will complain about __MACOSX file: e.g.:  zip asdf.zip algebra_2008_2009_submission.txt
@@ Line 28: / Line 33: @@
 * Add new features by computing their values from existing columns -- e.g. correlation between skills based on their co-occurence within problems. Could use Decision tree to define boundaries between e.g. new "good student, medium student, bad student" feature
 * Dimensionality reduction -- transform into numerical values appropriate for consumption by SVM
+== Who we are ==
+* Andy
+* Thomas
+* Erin
+* Vikram
+(insert your name/contact info here)
