KDD Competition 2010

From Noisebridge
Revision as of 21:51, 12 May 2010

I'm interested in working on the KDD Competition as a way to focus our machine learning exploration, and maybe even to find some interesting aspects of the data! If you're interested, drop me a note or show up at a weekly Machine Learning meeting, and we'll use this space to keep track of our ideas.

Resources

  • KDD Rules and Data Format: https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp
  • R: http://cran.r-project.org/
  • libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
  • Weka: http://www.cs.waikato.ac.nz/ml/weka/

Plan for Next Week

Next week, after the Hadoop presentation, we'll show each other how to get the tools working on the data (what to download, which data transformations are needed, and how to produce the submission output) and share any insights we've gleaned from the data so far.
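The submission-output step might look like the following Python sketch. It rests on assumptions you should check against the KDD Rules and Data Format page: a submission is a tab-separated file with a "Row" column and a "Correct First Attempt" column, and the second column holds a predicted probability. `write_submission` is a hypothetical helper and is not part of any official tooling.

```python
import csv
import io


def write_submission(predictions, out):
    """Write {row_id: probability} as a tab-separated submission file.

    Assumption: the header is "Row" and "Correct First Attempt"; verify it
    against the official rules before submitting anything.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Row", "Correct First Attempt"])
    for row_id, prob in sorted(predictions.items()):
        # Clamp to [0, 1] in case a model emits slightly out-of-range scores.
        writer.writerow([row_id, f"{min(max(prob, 0.0), 1.0):.4f}"])


# Usage with made-up predictions for two hypothetical test rows.
buf = io.StringIO()
write_submission({2: 0.25, 1: 1.3}, buf)
print(buf.getvalue(), end="")
```

Sorting by row id keeps the output in the same order as the test file, assuming that file is numbered sequentially.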

  • Vikarem -- will present on Hadoop next week!
  • Thomas -- will get libsvm working on the data and put together a "how to" guide for doing so
    • put together a Perl script that takes random samples of the data, so we can work on smaller instances
    • put together a simple R script for loading the data
  • Andy -- will get Weka working on the data and put together a "how to" guide for doing so
  • Erin -- will work on data transformations and ways to create better representations of the data
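Here is a Python counterpart to the "simple R script for loading the data". It is a sketch and not the R script itself. It assumes the files are tab-separated with a header row. The column names ("Row", "Correct First Attempt", "KC(Default)") and the "~~" separator between multiple skills in one cell are also assumptions to confirm on the KDD Rules and Data Format page. `load_rows` is a hypothetical name.

```python
import csv
import io


def load_rows(f):
    """Read a tab-separated KDD Cup file (header row first) into dicts.

    Assumed columns: "Row", "Correct First Attempt" and "KC(Default)", with
    multiple skills in one cell joined by "~~". Other columns stay strings.
    """
    # QUOTE_NONE: treat quote characters inside step names as ordinary text.
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for rec in reader:
        rec["Row"] = int(rec["Row"])
        cfa = rec.get("Correct First Attempt")
        # Labels may be blank (for example in a test file); keep None then.
        rec["Correct First Attempt"] = int(cfa) if cfa else None
        kcs = rec.get("KC(Default)")
        rec["KC(Default)"] = kcs.split("~~") if kcs else []
        rows.append(rec)
    return rows


sample = (
    "Row\tAnon Student Id\tCorrect First Attempt\tKC(Default)\n"
    "1\tstu_a\t1\tSkillA~~SkillB\n"
    "2\tstu_b\t\t\n"
)
rows = load_rows(io.StringIO(sample))
print(rows[0]["KC(Default)"], rows[1]["Correct First Attempt"])
```

For a real file, pass `open(path)` instead of the in-memory sample. Note that this reads everything into memory, which is one more reason to work on sampled subsets first.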
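The random-sampling script could work like the following Python sketch. It stands in for the planned Perl script and is not that script. It uses reservoir sampling (Algorithm R), which picks k lines uniformly at random in a single pass, so the full training file never has to fit in memory. `sample_lines` is a hypothetical name, and the sketch assumes the first line of the file is a header.

```python
import io
import random


def sample_lines(lines, k, seed=None):
    """Pick k data lines uniformly at random in one pass, keeping the header.

    Reservoir sampling (Algorithm R): memory use is O(k) regardless of how
    many lines the input has.
    """
    rng = random.Random(seed)
    it = iter(lines)
    header = next(it)
    reservoir = []
    for i, line in enumerate(it):
        if i < k:
            reservoir.append(line)
        else:
            # Keep line i with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return header, reservoir


# Usage on a small in-memory "file" with 100 data lines.
data = "Row\tStep Name\n" + "".join(f"{n}\tstep{n}\n" for n in range(1, 101))
header, picked = sample_lines(io.StringIO(data), 10, seed=0)
print(header, end="")
print("".join(picked), end="")
```

Passing a fixed `seed` makes a sample reproducible, so several people can compare results on the same subset.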
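To get libsvm working on the data, the core transformation is writing each example in libsvm's sparse text format: one line per example, with a label followed by `index:value` pairs in ascending index order. The Python sketch below one-hot encodes categorical tokens into that format. The token scheme (a student id and a skill name) and the `to_libsvm` helper are illustrative assumptions and are not part of libsvm.

```python
def to_libsvm(examples, index=None, grow=True):
    """Turn (label, [categorical tokens]) pairs into libsvm input lines.

    Each distinct token becomes one binary feature. Feature indices start
    at 1 and are written in ascending order, as libsvm's format expects.
    Reuse the returned `index` with grow=False for test data, so tokens map
    to the same feature numbers and unseen tokens are dropped.
    """
    if index is None:
        index = {}
    lines = []
    for label, tokens in examples:
        ids = set()
        for tok in tokens:
            if tok not in index:
                if not grow:
                    continue  # unseen at training time: drop it
                index[tok] = len(index) + 1
            ids.add(index[tok])
        feats = " ".join(f"{i}:1" for i in sorted(ids))
        lines.append(f"{label} {feats}".rstrip())
    return lines, index


# Hypothetical tokens built from a student id and a skill name.
examples = [
    (1, ["student=stu_a", "kc=SkillA"]),
    (0, ["student=stu_b", "kc=SkillA"]),
]
lines, index = to_libsvm(examples)
print("\n".join(lines))
```

The printed lines can be saved to a file and passed to libsvm's `svm-train`.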
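Weka's native input format is ARFF: a header of `@RELATION` and `@ATTRIBUTE` declarations, then an `@DATA` section with one comma-separated row per instance. The Python sketch below converts categorical columns to ARFF. It declares every column as nominal, using the values observed in the data. `to_arff` is a hypothetical helper, and the quoting follows the usual ARFF single-quote convention. Numeric columns would need `NUMERIC` attributes instead.

```python
def to_arff(relation, columns, rows):
    """Render rows of categorical values as a Weka ARFF document.

    Every column is declared nominal, with the values observed in `rows`.
    Names and values are single-quoted so spaces and commas are safe.
    """

    def q(s):
        # Escape backslashes first, then single quotes.
        return "'" + str(s).replace("\\", "\\\\").replace("'", "\\'") + "'"

    out = [f"@RELATION {q(relation)}", ""]
    for c, name in enumerate(columns):
        values = sorted({str(r[c]) for r in rows})
        out.append(f"@ATTRIBUTE {q(name)} {{{','.join(q(v) for v in values)}}}")
    out += ["", "@DATA"]
    for r in rows:
        out.append(",".join(q(v) for v in r))
    return "\n".join(out) + "\n"


arff = to_arff(
    "kdd_sample",
    ["Anon Student Id", "Correct First Attempt"],
    [("stu_a", 1), ("stu_b", 0)],
)
print(arff, end="")
```

Declaring the label as nominal (`{'0','1'}`) lets Weka treat the problem as classification rather than regression.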
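As one example of a data transformation, consider replacing the raw student id with a numeric feature: each student's smoothed correct-first-attempt rate on earlier rows. This is an illustration only, not a plan recorded on this page. The sketch assumes rows arrive in chronological order, so the feature only uses past labels. The `prior` and `strength` defaults are arbitrary assumptions, and `prior_success_rate` is a hypothetical name.

```python
from collections import defaultdict


def prior_success_rate(rows, prior=0.5, strength=1.0):
    """Smoothed correct-first-attempt rate per student, using earlier rows only.

    `rows` is a chronological sequence of (student_id, label) pairs with
    label 0 or 1. The current row's label is added only after its feature
    is computed, so the feature never sees its own answer.
    """
    correct = defaultdict(float)
    seen = defaultdict(float)
    feats = []
    for student, label in rows:
        # Smoothing: `strength` pseudo-observations at rate `prior`.
        feats.append((correct[student] + prior * strength) / (seen[student] + strength))
        correct[student] += label
        seen[student] += 1
    return feats


feats = prior_success_rate([("a", 1), ("a", 0), ("b", 1), ("a", 1)])
print(feats)  # [0.5, 0.75, 0.5, 0.5]
```

A student with no history gets the prior (0.5 here). As more rows for that student accumulate, the feature moves toward their observed rate.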