KDD Competition 2010

We're interested in working on the KDD Competition as a way to focus our machine learning exploration, and maybe even to find some interesting aspects of the data! If you're interested, drop us a note or show up at a weekly Machine Learning meeting. We'll use this space to keep track of our ideas.




== Ideas ==  
* Add new features by computing their values from existing columns -- e.g. correlation between skills based on their co-occurrence within problems. We could use a decision tree to define the boundaries for, e.g., a new "good student, medium student, bad student" feature
* Dimensionality reduction -- transform into numerical values appropriate for consumption by SVM
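
As a rough illustration of the first idea, here is a minimal Python sketch that counts how often two skills appear in the same problem and buckets students by their fraction of correct answers. Everything in it is an assumption made for the example: the toy rows, the field names (student, problem, skills, correct), the "~~" separator between skills, and the fixed thresholds, which simply stand in for boundaries a decision tree could learn.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy rows standing in for the training data; the field names are assumptions.
rows = [
    {"student": "s1", "problem": "p1", "skills": "add~~multiply", "correct": 1},
    {"student": "s1", "problem": "p2", "skills": "add", "correct": 1},
    {"student": "s2", "problem": "p1", "skills": "add~~multiply", "correct": 0},
    {"student": "s2", "problem": "p3", "skills": "multiply~~factor", "correct": 1},
    {"student": "s3", "problem": "p3", "skills": "multiply~~factor", "correct": 0},
    {"student": "s3", "problem": "p2", "skills": "add", "correct": 0},
]


def skill_cooccurrence(rows, sep="~~"):
    """Count in how many distinct problems each unordered pair of skills appears."""
    skills_by_problem = defaultdict(set)
    for row in rows:
        skills = (s for s in row["skills"].split(sep) if s)
        skills_by_problem[row["problem"]].update(skills)
    pair_counts = Counter()
    for skills in skills_by_problem.values():
        # Sorting makes (a, b) and (b, a) land on the same key.
        for a, b in combinations(sorted(skills), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def student_tiers(rows, low=0.4, high=0.7):
    """Bucket students by fraction correct; the thresholds are placeholders."""
    totals = defaultdict(lambda: [0, 0])  # student -> [correct, attempts]
    for row in rows:
        totals[row["student"]][0] += row["correct"]
        totals[row["student"]][1] += 1
    tiers = {}
    for student, (right, seen) in totals.items():
        rate = right / seen
        if rate >= high:
            tiers[student] = "good"
        elif rate >= low:
            tiers[student] = "medium"
        else:
            tiers[student] = "bad"
    return tiers


cooccurrence = skill_cooccurrence(rows)
tiers = student_tiers(rows)
```

The pair counts could be normalized into a correlation-like score, and the tier label could be appended to each row as a new column before loading the data into Weka.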


== How to run Weka (quick 'n very dirty tutorial) ==  
* Download and install Weka
* Get your KDD data & preprocess your data:  
This command takes the first 1000 lines of the given training data set and converts them into a .csv file.
Attention: the last step has to turn each tab into a comma. With sed you would need to type a literal tab (in the OS X terminal, press CONTROL+V and then TAB), and copying and pasting such a command from this page won't work, since the tab gets pasted as spaces. The version below uses tr for that step instead, so it can be pasted as-is.
  head -n 1000 algebra_2006_2007_train.txt | sed -e 's/[",]/ /g' | tr '\t' ',' > algebra_2006_2007_train_1kFormatted.csv
* The following screencast shows you how to do these steps:  
* In Weka's Explorer, remove some unwanted attributes (I leave this up to your judgment), inspect the dataset.  
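
If you would rather not depend on shell quoting and tab handling, the preprocessing step above can also be sketched in Python. This is only an illustration of the same idea (replace quotes and commas inside fields with spaces, then turn tabs into commas); the function name tsv_to_csv and the sample row are made up for the example.

```python
import csv
import io
from itertools import islice


def tsv_to_csv(src, dst, max_lines=1000):
    """Copy the first max_lines tab-separated lines from src to dst as CSV.

    Like the shell pipeline, double quotes and commas inside fields are
    replaced with spaces so every output line keeps the same column count.
    Returns the number of lines written.
    """
    writer = csv.writer(dst, lineterminator="\n")
    written = 0
    for line in islice(src, max_lines):
        fields = line.rstrip("\r\n").split("\t")
        cleaned = [f.replace('"', " ").replace(",", " ") for f in fields]
        writer.writerow(cleaned)
        written += 1
    return written


# Small in-memory demonstration with a made-up row.
sample = io.StringIO(
    'Row\tStep Name\tCorrect First Attempt\n'
    '1\t3(x+2), "simplify"\t1\n'
)
out = io.StringIO()
count = tsv_to_csv(sample, out)
lines = out.getvalue().splitlines()
```

For the real data, pass open file objects instead of the in-memory buffers, for example the training file opened for reading as src and algebra_2006_2007_train_1kFormatted.csv opened for writing as dst.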