Machine Learning/Datasets: Difference between revisions

From Noisebridge
*[http://robjhyndman.com/tsdldata/data/earthq.dat Number of Earthquakes per Year 1900-1988 (>= 7.0)]
**"Source: National Earthquake Information Center. Different lists will give different numbers depending on the formula used for calculating the magnitude."

===Clustering===
*[http://archive.ics.uci.edu/ml/datasets/Plants USDA Plants Data]
**Automatically cluster plants based on 70 attributes.

===Text Data===
*[http://www.cs.cmu.edu/~enron/ Enron Emails]
**Search through Enron's publicly accessible emails.

Revision as of 00:08, 15 March 2011

Machine learning is a vast field and there are many different types of problems to be solved. If you find a dataset interesting, try to categorize it (or add a new category) and add it to the links below.

Classification

  • MNIST Handwritten Digits
    • Classify handwritten digits using this widely used dataset of 60,000 training images and 10,000 test images.
  • Heart Disease
    • Predict whether a patient has heart disease based on a subset of the dataset's 76 recorded attributes.
  • Census Income
    • Predict whether a person's annual income is above or below $50K.
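
For anyone new to classification, here is a minimal sketch of the general idea behind the tasks above: learn from labeled examples, then predict the label of an unseen one. It uses a 1-nearest-neighbor rule, which is just one simple choice and not a method recommended by any of the datasets. The feature values, labels, and function names are hypothetical toy placeholders, not taken from the linked datasets.

```python
import math


def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query`.

    Closeness is measured with Euclidean distance (math.dist, Python 3.8+).
    """
    best_index = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    return train_labels[best_index]


def accuracy(train_points, train_labels, test_points, test_labels):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(
        nearest_neighbor_predict(train_points, train_labels, point) == label
        for point, label in zip(test_points, test_labels)
    )
    return correct / len(test_points)


# Hypothetical toy data: two numeric features per example, two classes.
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_labels = ["low", "low", "high", "high"]

# Held-out examples that the classifier has not seen.
test_points = [(0.5, 1.5), (5.8, 6.0)]
test_labels = ["low", "high"]

print(nearest_neighbor_predict(train_points, train_labels, (1.2, 1.1)))
print(accuracy(train_points, train_labels, test_points, test_labels))
```

With a real dataset, the toy tuples would be replaced by feature vectors parsed from the downloaded files, and accuracy should be measured on examples held out from training.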

Regression

Time Series

Clustering

Text Data

  • Enron Emails
    • Search through Enron's publicly accessible emails.