Machine Learning/Datasets

From Noisebridge
< Machine Learning (Difference between revisions)
Line 36:
*[http://archive.ics.uci.edu/ml/datasets/Plants USDA Plants Data]
**Automatically cluster plants based on 70 attributes.
*[http://www.uni-koeln.de/themen/statistik/data/cluster/ Nutrients in Meat, Fish and Fowl]
**Can you cluster the entries by animal type from the data?

===Text Data===

Revision as of 00:57, 15 March 2011

Machine learning is a vast field and there are many different types of problems to be solved. If you find a dataset interesting, try to categorize it (or add a new category) and add it to the links below.


Classification

  • MNIST Handwritten Digits
    • Classify handwritten digits using this popular dataset, which provides 60,000 training examples and 10,000 test examples.
  • Heart Disease
    • Predict whether a person will have heart disease based on a subset of 76 factors.
  • Census Income
    • Predict whether a person's annual income exceeds $50K.

Regression

Time Series

Clustering

Text Data

  • Enron Emails
    • Search through Enron's publicly accessible emails.
  • Bag of Words
    • Collection of word counts for various types of documents, including Enron emails, scientific papers, and New York Times articles.