KDD Competition 2010
==Resources==
* [https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp KDD Rules and Data Format]
* [http://cran.r-project.org/ R language]
* [[Machine_Learning/SqliteImport | Importing data into SQLite]] for querying the data with SQL
* [[Machine_Learning/OmniscopeVisualization | Visualizing SQLite data in Omniscope]] for understanding the data
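The linked import page isn't reproduced here; as a rough sketch of the same idea, Python's standard `sqlite3` and `csv` modules can load a tab-separated KDD file (header on the first line) into a table. The table name `algebra_train` and the sample rows are made up for illustration; the column names are assumed from the KDD data format page.

```python
import csv
import io
import sqlite3


def import_tsv(conn, table, lines):
    """Create `table` from tab-separated lines whose first line is the header."""
    # QUOTE_NONE: treat quote characters inside step names as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Quote identifiers, since KDD headers contain spaces and parentheses.
    cols = ", ".join('"' + h.replace('"', '""') + '"' for h in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Tiny in-memory example; a real run would pass an open file instead.
sample = io.StringIO(
    "Row\tAnon Student Id\tStep Name\tCorrect First Attempt\n"
    "1\tstu_a\tx+1=2\t1\n"
    "2\tstu_b\tx+1=2\t0\n"
)
conn = sqlite3.connect(":memory:")
import_tsv(conn, "algebra_train", sample)
n_rows = conn.execute('SELECT COUNT(*) FROM "algebra_train"').fetchone()[0]
```

For the real files, pass a file opened with `open(path, newline="")` and a file-backed database from `sqlite3.connect(...)`. Columns are created without declared types, so SQLite stores the values as text; cast in queries where numbers are needed.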
==TODOs==
Following [https://www.noisebridge.net/wiki/Machine_Learning_Meetup_Notes:_2010-05-23 our decisions], for the orthogonalized bridge and algebra sets we will: replace the step name with a unique step name; remove the given features; and add new features: step success chance, student IQ, complexity, and perhaps skill frequency (least important). These should be fairly straightforward computations, but on large datasets. We will call the resulting datasets "master raw 1 bridge/algebra".
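A minimal sketch of these features, under several assumptions not settled in the notes: a "unique step" is the problem name plus the step name; "step success chance" and "student IQ" are the share of correct first attempts per unique step and per student; "skill frequency" is the count of rows tagged with a skill (taking the max over a row's skills); and multiple skills in one cell are separated by `~~`. Column names are assumed from the KDD data format page, the input must be training rows with `Correct First Attempt` filled in, and "complexity" is left out because its definition is open.

```python
import csv
import io
from collections import Counter, defaultdict

# Column names assumed from the KDD Cup 2010 data format page; verify them.
STUDENT, PROBLEM, STEP = "Anon Student Id", "Problem Name", "Step Name"
CFA, KC = "Correct First Attempt", "KC(SubSkills)"


def unique_step(row):
    # Hypothetical definition: step names repeat across problems,
    # so qualify each step with its problem name.
    return row[PROBLEM] + "::" + row[STEP]


def compute_features(lines):
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    step_stats = defaultdict(lambda: [0, 0])     # unique step -> [correct, total]
    student_stats = defaultdict(lambda: [0, 0])  # student -> [correct, total]
    skill_freq = Counter()                       # skill -> number of rows using it
    rows = []
    for row in reader:
        step, correct = unique_step(row), int(row[CFA])
        for stats in (step_stats[step], student_stats[row[STUDENT]]):
            stats[0] += correct
            stats[1] += 1
        # Assumption: several skills in one cell are separated by "~~".
        skills = [s for s in (row[KC] or "").split("~~") if s]
        skill_freq.update(skills)
        rows.append((row[STUDENT], step, skills, correct))

    def rate(counts):
        return {key: c / n for key, (c, n) in counts.items()}

    step_success, student_iq = rate(step_stats), rate(student_stats)
    return [
        {
            "student": student,
            "unique_step": step,
            "step_success": step_success[step],  # correct-first-attempt rate of the step
            "student_iq": student_iq[student],   # correct-first-attempt rate of the student
            "skill_freq": max((skill_freq[s] for s in skills), default=0),
            "label": correct,
        }
        for student, step, skills, correct in rows
    ]


sample = io.StringIO(
    "Anon Student Id\tProblem Name\tStep Name\tCorrect First Attempt\tKC(SubSkills)\n"
    "s1\tP1\tx=1\t1\tadd~~sub\n"
    "s1\tP2\tx=1\t0\tadd\n"
    "s2\tP1\tx=1\t0\t\n"
)
feats = compute_features(sample)
```

Note that this computes the rates from the same rows they describe, so each row's own label leaks into its features; for training a model we would probably compute them on a separate slice of the data, or leave the current row out of its own counts.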
In parallel, we can start clustering superskills: given the normalized skills, cluster groups of skills (superskills) to replace the overly detailed skills. The resulting datasets will be called "master clustered 1 bridge/algebra". These will be our base datasets for the machine learning algorithms.
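The notes don't fix a clustering method, so this is just one possible sketch: single-linkage clustering where two skills are linked if the sets of unique steps they are tagged on overlap enough (Jaccard similarity at or above a threshold), and each connected component becomes a superskill. The input mapping, the skill names, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_superskills(skill_steps, threshold=0.5):
    """Single-linkage clustering of skills by the steps they occur in.

    Two skills are linked when the Jaccard overlap of their step sets is at
    least `threshold`; each connected component becomes one superskill.
    """
    parent = {skill: skill for skill in skill_steps}

    def find(skill):
        # Union-find root lookup with path halving.
        while parent[skill] != skill:
            parent[skill] = parent[parent[skill]]
            skill = parent[skill]
        return skill

    # O(n^2) pairwise comparisons over skills.
    for a, b in combinations(skill_steps, 2):
        if jaccard(skill_steps[a], skill_steps[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for skill in skill_steps:
        clusters.setdefault(find(skill), set()).add(skill)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input: normalized skill -> set of unique steps it is tagged on.
skill_steps = {
    "add": {"P1::s1", "P1::s2", "P2::s1"},
    "add-neg": {"P1::s1", "P1::s2"},
    "divide": {"P3::s1"},
}
superskills = cluster_superskills(skill_steps, threshold=0.5)
```

Single linkage can chain loosely related skills into one large cluster; if that happens, raising the threshold or switching to average linkage or k-means on skill vectors would be the next things to try.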
* Vikram -- will write a guide to setting up Mahout
* Thomas --
** put together a [[Machine_Learning/kdd_sample | Perl script]] that takes random samples from the data, for working on smaller instances
** put together a [[Machine_Learning/kdd_r | simple R script]] for loading the data
* Andy -- think about clustering superskills; define features for sub-problems (student IQ, step difficulty)
* Erin -- will provide the orthogonalized data sets at the next meetup
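The linked Perl sampling script isn't reproduced here and may work differently; as an illustration of the idea, this Python sketch uses reservoir sampling (Algorithm R) to keep the header plus a uniform random sample of k rows while reading a large file only once. The synthetic data and the seed are placeholders.

```python
import random


def reservoir_sample(lines, k, seed=None):
    """Return the header line plus a uniform random sample of k data lines.

    Uses reservoir sampling (Algorithm R), so the input is read once and only
    k lines are held in memory, regardless of file size.
    """
    rng = random.Random(seed)
    it = iter(lines)
    header = next(it)
    reservoir = []
    for i, line in enumerate(it):
        if i < k:
            reservoir.append(line)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = line
    return [header] + reservoir


# Synthetic stand-in for a KDD training file: a header plus 1000 rows.
data = ["Row\tStep Name\n"] + [f"{n}\tstep{n}\n" for n in range(1, 1001)]
sample = reservoir_sample(data, k=10, seed=42)
```

With a real file, pass the open file object instead of a list. Sampling individual rows breaks up each student's sequence of attempts; if per-student features like "student IQ" matter, sampling whole students (all rows for a random subset of `Anon Student Id` values) may give more realistic small instances.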
== Notes ==
* [http://swarmfinancial.com/screencasts/nb/kddWekaUsage1.swf Screencast1]
* [http://swarmfinancial.com/screencasts/nb/kddWekaUsage2.swf Screencast2]
== How to run libSVM ==
* See the notes at [[Machine Learning/SVM]]
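libSVM's `svm-train` reads training data as one example per line in a sparse text format: the label, then `index:value` pairs with 1-based, ascending indices, where zero-valued features may be omitted. A small sketch of converting feature rows into that format; the feature order and example rows are hypothetical.

```python
def to_libsvm_line(label, values):
    """Format one example in libSVM's sparse text format.

    Output is "<label> <index>:<value> ...", with 1-based, ascending feature
    indices; zero-valued features are left out.
    """
    parts = [str(label)]
    for index, value in enumerate(values, start=1):
        if value != 0:
            # .10g keeps more precision than the default 6 significant digits.
            parts.append(f"{index}:{value:.10g}")
    return " ".join(parts)


# Hypothetical rows: (correct first attempt, [step_success, student_iq, skill_freq]).
rows = [(1, [0.5, 0.0, 2]), (0, [0.0, 0.25, 0])]
lines = [to_libsvm_line(label, values) for label, values in rows]
```

Writing `"\n".join(lines)` to a file gives input that `svm-train` should accept; features with very different ranges (rates in [0, 1] versus raw counts) are usually scaled first, for example with libSVM's `svm-scale` tool.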