Editing Machine Learning/Datasets

Machine learning is a vast field and there are many different types of problems to be solved. If you find a dataset interesting, try to categorize it (or add a new category) and add it to the links below.
This page describes in detail the datasets used for the [[NBML Course]].
 
===Classification===
*[http://yann.lecun.com/exdb/mnist/ MNIST Handwritten Digits]
**Classify handwritten digits using this dataset, a very popular one with lots of training examples.
*[http://archive.ics.uci.edu/ml/datasets/Heart+Disease Heart Disease]
**Predict whether a person will have heart disease based on a subset of 76 factors.
*[http://archive.ics.uci.edu/ml/datasets/Census-Income+%28KDD%29 Census Income]
**Predict whether a person's income is above or below $50k.
 
===Regression===
*[http://www.sci.usq.edu.au/staff/dunn/Datasets/Books/Hand/Hand-R/alps-R.html Boiling point in the Alps]
**The boiling point of water at different barometric pressures.
*[http://www.sci.usq.edu.au/staff/dunn/Datasets/Books/Hand/Hand-R/shocking-R.html Shocking Rats]
**How does shocking a rat affect its ability to complete a maze?
*[http://www.sci.usq.edu.au/staff/dunn/Datasets/Books/Hand/Hand-R/icecream-R.html Ice Cream Sales]
**Predict the quantity of ice cream consumed based on some other variables.
*[http://www.sci.usq.edu.au/staff/dunn/Datasets/applications/health/fev.html Smoking and Respiratory Function]
**How does smoking affect lung capacity?
 
===Time Series===
*[http://robjhyndman.com/tsdldata/data/ausgundeaths.dat Gun-related Deaths in Australia]
**"Deaths from gun-related homicides and suicides and non-gun-related homicides and suicides. Australia: 1915-2004. Source: Neill and Leigh (2007)."
*[http://robjhyndman.com/tsdldata/data/immig.dat Immigration Rates]
**"Annual immigration into the United States: thousands. 1820 – 1962. From Kendall & Ord (1990), p.13."
*[http://robjhyndman.com/tsdldata/roberts/beards.dat Percent of Men with Beards 1866-1911]
**"Percent of Men with full beards, 1866 – 1911. Source: Hipel and Mcleod (1994)."
*[http://robjhyndman.com/tsdldata/roberts/velmon.dat Velocity of Money in America 1869-1960]
**The [http://en.wikipedia.org/wiki/Velocity_of_money velocity of money] is the number of times a single unit of money changes hands over a period of time. The theory says MV = PY, so Velocity = Prices * Economic Output / Quantity of Money.
*[http://robjhyndman.com/tsdldata/annual/globtp.dat Changes in Global Air Temperature 1880-1985]
**"Surface air temperature change for the globe, 1880-1985, Temperature change actually means temperature against an arbitrary zero point. From James Hansen and Sergej Lebedeff, "Global Trends of Measured Surface Air Temperature", `Journal of Geophysical Research`, Vol. 92, No. D11, pages 13,345-13,372, November 20, 1987."
*[http://robjhyndman.com/tsdldata/data/earthq.dat Number of Earthquakes per Year 1900-1988 (>= 7.0)]
**"Source: National Earthquake Information Center. Different lists will give different numbers depending on the formula used for calculating the magnitude."
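
The velocity-of-money relation quoted above (MV = PY) can be rearranged to V = PY / M. The following is a minimal sketch of that arithmetic in Python. The function name, parameter names, and the numbers are illustrative assumptions and are not taken from the dataset.

```python
def velocity_of_money(price_level, real_output, money_supply):
    """Solve MV = PY for V, i.e. V = P * Y / M.

    All three arguments are hypothetical inputs for illustration;
    the velmon.dat file is not read here.
    """
    if money_supply <= 0:
        raise ValueError("money_supply must be positive")
    return price_level * real_output / money_supply


# Made-up values chosen only to show the calculation.
v = velocity_of_money(price_level=2.0, real_output=500.0, money_supply=250.0)
print(v)
```

With these made-up inputs, each unit of money would change hands four times over the period.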
 
===Clustering===
*[http://archive.ics.uci.edu/ml/datasets/Plants USDA Plants Data]
**Automatically cluster plants based on 70 attributes.
*[http://www.uni-koeln.de/themen/statistik/data/cluster/ Nutrients in Meat, Fish and Fowl]
**Can you cluster the foods by animal type given the data?
 
===Text Data===
*[http://www.cs.cmu.edu/~enron/ Enron Emails]
**Search through Enron's publicly accessible emails.
*[http://archive.ics.uci.edu/ml/datasets/Bag+of+Words Bag of Words]
**Collection of word counts for various types of documents, including Enron emails, scientific papers, and New York Times articles.
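
For readers new to the term, a "bag of words" reduces a document to a table of word counts and ignores word order. The sketch below builds one with Python's standard library. The tokenizer, the function name, and the sample sentence are assumptions made for illustration; the dataset's own file layout is not reproduced here.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each lowercase token to its frequency.

    The regular expression is a simple illustrative tokenizer, not the
    one used to build the linked dataset.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, not drawn from any of the linked corpora.
doc = "The meeting is moved. The new meeting time is 3pm."
counts = bag_of_words(doc)
print(counts["meeting"], counts["moved"])
```

Two documents can then be compared by their count tables alone, which is what makes this representation convenient for clustering and classification.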
 
 
===Reinforcement Learning===