Machine Learning/SVM
== Downloading and Installing LibSVM ==
* To run the LibSVM helper scripts used below, you will need Python. If you don't have it installed, you can download and install it from [http://www.python.org/download/]. Either current stable version should work.
* You will also need to download and install LibSVM itself, which you can do by downloading the zip file from [http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/libsvm.cgi?+http://www.csie.ntu.edu.tw/~cjlin/libsvm+zip]. Windows users should copy the libsvm.dll file from the /windows directory into their C:\Windows\System32 directory. Mac/Linux users should be able to simply cd into the libsvm directory, run make, and then add the libsvm directory to their PATH.
* If you can, install [http://www.gnuplot.info/ gnuplot]. If you have trouble with this (as I did), you can use the modified files with the gnuplot dependency stripped out. Simply replace tools/easy.py with [[Machine Learning/easy.py]] and tools/grid.py with [[Machine Learning/grid.py]].

== Converting the Data ==
As with most (if not all) data problems, choosing and formatting the data is the most time-consuming step but also one of the most important.

One approach for reducing the data is to take a subset. Thomas' perl script [[Machine_Learning/kdd_sample | sample_training.pl]] samples the training set and test set by choosing a random subset of the students and keeping only the lines that include them. Assuming your data is located in ~/kdd, run:
 perl sample_training.pl -numitems 100 ~/kdd/algebra_2008_2009_train.txt

For SVM, ultimately we need to format the data in two files: a training file and a test file. Each line of these files holds a numeric class followed by several numeric predictors. The general format is as follows:
 <class> 1:<value> 2:<value> 3:<value> ...
with an entry (1:, 2:, 3:,...) for each numeric predictor.
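The line format above can be produced with a few lines of Python. This is only a minimal sketch, not the perl script used on this page: the function name to_libsvm_line and its sparse flag are made up for illustration. The sparse variant relies on LibSVM's input format allowing zero-valued features to be left out, which keeps files much smaller when most predictors are 0.

```python
def to_libsvm_line(label, values, sparse=False):
    """Format one example as '<class> 1:<value> 2:<value> ...'.

    Feature indices start at 1. With sparse=True, zero-valued features
    are skipped (hypothetical helper, not part of LibSVM itself).
    """
    parts = [str(label)]
    for index, value in enumerate(values, start=1):
        if sparse and value == 0:
            continue
        parts.append(f"{index}:{value}")
    return " ".join(parts)


row = [0, 0, 0, 0, 0, 1, 0, 0]
print(to_libsvm_line(0, row))               # dense: every predictor written out
print(to_libsvm_line(0, row, sparse=True))  # sparse: only the non-zero predictor
```

The dense call writes all eight predictors, while the sparse call writes only the class and the single non-zero entry.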
For example,
 0 1:0 2:0 3:0 4:0 5:0 6:1 7:0 8:0 9:0 10:0 11:0 12:0 13:0 14:0 15:0 16:0 17:0 18:0

Thomas created a [[Machine Learning/convert_features.pl | perl script]] to take a training set and convert it (and the corresponding test set) into the correct format. It uses "correct on first attempt" as the output class and converts student and problem id into a series of binary flag variables (one for each student and one for each problem, indicating whether the row involves that student or that problem). However, this results in a very large number of predictor variables, even on a stripped-down dataset, so there is almost certainly a better way. But if you don't have one, you can download this script and run
 perl convert_features.pl ~/kdd/algebra_2008_2009_train.txt_sample_100_random_students.csv
Assuming your data files are in ~/kdd, this will generate output files ~/kdd/algebra_2008_2009_train.txt_sample_100_random_students.csv_converted.txt and ~/kdd/algebra_2008_2009_train.txt_sample_100_random_students.csv_converted.t in the appropriate format.

== Running SVM on the Data ==
* cd into your libsvm installation's tools directory and run the following command (assuming your training and test files are in ~/kdd and named appropriately):
 python easy.py ~/kdd/algebra_2008_2009_train.txt_sample_100_random_students.csv_converted.txt ~/kdd/algebra_2008_2009_train.txt_sample_100_random_students.csv_converted.t | tee output.txt
* If you have many predictor variables, this will take a long time, probably prohibitively long.
* This will automatically scale your training and test data and iteratively search over the parameter space for the penalty parameter c and the kernel parameters, using cross-validation to find the best fit for the training data.
* It will generate an output for each item in the test set with a 0/1 classification. LibSVM can also output probability estimates between 0 and 1 instead of hard labels (the -b 1 option to both svm-train and svm-predict), but we haven't yet investigated doing this.
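For the real-valued output mentioned in the last bullet, one possible route is to run svm-train and svm-predict by hand with probability estimates switched on (-b 1), instead of going through easy.py. The sketch below only builds and prints the two command lines. It assumes that svm-train and svm-predict are on your PATH and that the input files are already scaled. The file names and the c and gamma values are placeholders to be replaced with the best values reported by the parameter search, and the helper name probability_commands is made up for illustration. We have not run this on the KDD data.

```python
import shlex


def probability_commands(train_file, test_file, c, gamma):
    """Build svm-train / svm-predict argument lists with '-b 1' enabled.

    Hypothetical helper: the model and output file names are derived from
    the input names by appending '.model' and '.predict'.
    """
    model_file = train_file + ".model"
    output_file = test_file + ".predict"
    train_cmd = ["svm-train", "-b", "1", "-c", str(c), "-g", str(gamma),
                 train_file, model_file]
    predict_cmd = ["svm-predict", "-b", "1", test_file, model_file, output_file]
    return train_cmd, predict_cmd


# Placeholder file names and parameter values, for illustration only.
train_cmd, predict_cmd = probability_commands(
    "train.scale", "test.scale", c=2.0, gamma=0.125)
for cmd in (train_cmd, predict_cmd):
    # Print a shell-safe command line; pass the list to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

With -b 1, the prediction file should contain a probability for each class next to the predicted label, rather than only the 0/1 label.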

== Other references ==
* [http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf The basic guide] for LibSVM
* [http://www.csie.ntu.edu.tw/~cjlin/libsvm/ LibSVM site]
* [[Machine_Learning_Meetup_Notes:_2010-04-28 | Noisebridge ML talk on SVMs]]