# Machine Learning

From Noisebridge


## Latest revision as of 19:13, 20 October 2014


### Join the Mailing List

https://www.noisebridge.net/mailman/listinfo/ml

### Next Meeting

- When: Thursday, August 14, 2014 @ 6:00pm
- Where: 2169 Mission St. (Church classroom)
- Topic:
- Details:
- Who: Andy McMurry

### Learn about Data Science and Machine Learning

#### Classes

- Coursera Machine Learning Course with Andrew Ng
- Coursera Computational Neuroscience Course with Adrienne Fairhall
- MIT Machine Learning Class with Tommi Jaakkola
- Stanford CS229
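- [Carnegie Mellon Machine Learning Course with Tom Mitchell](http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml)
- [Linear Algebra with Gilbert Strang](http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/)
- [Neural Networks Class with Hugo Larochelle](https://www.youtube.com/playlist?list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH)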

#### Books

- Elements of Statistical Learning
- Pattern Recognition and Machine Learning
- Information Theory, Inference, and Learning Algorithms
- Interactive Data Visualization for the Web (D3)
- Introduction to R
- Introduction to Probability (Grinstead and Snell)
- Modern Introduction to Probability and Statistics (Kraaikamp and Meester)
- Bayesian Reasoning and Machine Learning

#### Tutorials

#### Noisebridge ML Class Slides

- Intro to Machine Learning
- A Brief Tour of Statistics
- Generalized Linear Models
- Neural Nets Workshop
- Support Vector Machines
- Random Forests
- Independent Components Analysis
- Deep Nets

### Code and SourceForge Site

- We have a SourceForge project
- We have a git repository on the project page, accessible as:

git clone git://ml-noisebridge.git.sourceforge.net/gitroot/ml-noisebridge/ml-noisebridge

- Send an email to the list if you want to become an administrator on the site to get write access to the git repo!

### Future Talks and Topics, Ideas

- Random Forests in R
- Restricted Boltzmann Machines (Mike S, some day)
- Analyzing brain cells (Mike S)
- Deep Nets w/ Stacked Autoencoders (Mike S, some day)
- Generalized Linear Models (Mike S, Erin L? some day)
- Graphical Models
- Working with the Kinect
- Computer Vision with OpenCV

### Projects

- Small Group Subproblems
- Fundraising
- Noisebridge Machine Learning Course
- Kaggle Social Network Contest
- KDD Competition 2010
- HIV

### Datasets and Websites

- UCI Machine Learning Repository
- DataSF.org
- Infochimps
- Face Recognition Databases
- Time Series Data Library
- Data Q&A Forum
- Metaoptimize
- Quora ML Page
- A ton of Weather Data
- MLcomp
  - Upload your algorithm and objectively compare its performance to other algorithms
- Social Security Death Master File!
- SIPRI Social Databases
  - Wealth of information on international arms transfers and peace missions.
- Amazon AWS Public Datasets
- [UCDP/PRIO Armed Conflict Datasets](http://www.prio.no/Data/Armed-Conflict/)
- [Socrata Government Datasets](https://opendata.socrata.com/browse)
- [US City Census Data](http://us-city.census.okfn.org/)
- [Yahoo Labs Datasets](http://webscope.sandbox.yahoo.com/catalog.php)

### Software Tools

#### Generic ML Libraries

- Weka
  - A collection of data mining tools and machine learning algorithms.
- scikits.learn
  - Machine learning Python package
- scikits.statsmodels
  - Statistical models to go with SciPy
- PyBrain
  - Does feedforward, recurrent, SOM, deep belief nets.
- LIBSVM
  - C-based SVM package
- PyML
- MDP
  - Modular framework, has lots of stuff!
- VirtualBox
  - VirtualBox image with the libraries listed here pre-installed
- sympy
  - Does symbolic math
- Waffles
  - Open source C++ set of machine learning command line tools.
- RapidMiner
- Mobile Robot Programming Toolkit
- nitime
  - Neuroimaging in Python, has some good time series analysis stuff and multivariate response fitting.
- Pandas
  - Data analysis workflow in Python
- PyTables
  - Adds querying capabilities to HDF5 files
- statsmodels
  - Regression, time series analysis, statistics stuff for Python
- Vowpal Wabbit
  - "Intrinsically Fast" implementation of gradient descent for large datasets
- Shogun
  - Fast implementations of SVMs
- MLPACK
  - High performance scalable ML library
- Torch
  - MATLAB-like environment for state-of-the-art ML libraries written in Lua

#### Deep Nets

- Theano
  - Symbolic Expressions and Transparent GPU Integration
- Caffe
  - Convolutional Neural Networks on GPU
- Neurolab
  - Has support for recurrent neural nets

#### Online ML

- MOA (Massive Online Analysis)
  - Offshoot of Weka, has all online algorithms
- Jubatus
  - Distributed Online ML
- DOGMA
  - MATLAB-based online learning stuff
- libol
- oll
- scw-learning

#### Graphical Models

- BUGS
  - MCMC for Bayesian Models
- JAGS
  - Hierarchical Bayesian Models
- Stan
  - A graphical model compiler
- Jayes
  - Bayesian networks in Java
- ToPS
  - Probabilistic models of sequences
- PyMC
  - Bayesian Models in Python

#### Text Stuff

- Beautiful Soup
  - Screen-scraping tools
- SALLY
  - Tool for embedding strings into vector spaces
- Gensim
  - Topic modeling

#### Collaborative Filtering

- PREA
  - Personalized Recommendation Algorithms Toolkit
- SVDFeature
  - Collaborative Filtering and Ranking Toolkit

#### Computer Vision

- OpenCV
  - Computer Vision Library
  - Has ML component (SVM, trees, etc.)
  - Online tutorials here
- DARWIN
  - Generic C++ ML and Computer Vision Library
- PetaVision
  - Developing a real-time, full-scale model of the primate visual cortex.

#### Audio Processing

- Friture
  - Real-time spectrogram generation
- pyo
  - Real-time audio signal processing
- PYMir
  - A library for reading MP3s into Python and analyzing them
- PRAAT
  - Speech analysis toolkit
- Sound Analysis Pro
  - Tool for analyzing animal sounds
- [Luscinia](http://luscinia.sourceforge.net/)
  - Software for archiving, measuring, and analyzing bioacoustic data
- [List of Sound Tools for Python](http://wiki.python.org/moin/PythonInMusic)
- [Jasper](http://jasperproject.github.io/)
  - Voice-control anything!

#### Data Visualization

- Orange
  - Strong data visualization component
- Gephi
  - Graph Visualization
- ggplot
  - Nice plotting package for R
- MayaVi2
  - 3D Scientific Data Visualization
- Cytoscape
  - A JavaScript graph library for analysis and visualisation
- plot.ly
  - Web-based plotting
- D3 Ebook
  - Has a good list of HTML/CSS/JavaScript data visualization tools.
- plotly
  - Python plotting tool

#### Cluster Computing

- Mahout
  - Hadoop cluster based ML package.
- StarCluster
  - Easily build your own Python computing cluster on Amazon EC2

#### Database Stuff

#### Neural Simulation

#### Other

### Presentations and other Materials

- Awesome Machine Learning Applications -- A list of cool applications of ML
- Hands-on Machine Learning, a presentation jbm gave on 2009-01-07.
- [Stanford Machine Learning online course videos](http://www.youtube.com/user/StanfordUniversity#g/c/A89DCFA6ADACE599)
- Media:Brief_statistics_slides.pdf, a presentation given on statistics for the machine learning group
- LinkedIn discussion on good resources for data mining and predictive analytics
- Face Recognition Algorithms
- Max Welling's ML classnotes

### Topics to Learn and Teach

NBML Course - Noisebridge Machine Learning Curriculum (work-in-progress)

CS229 - The Stanford Machine Learning Course @ Noisebridge

- Supervised Learning
  - Linear Regression
  - Linear Discriminants
  - Neural Nets/Radial Basis Functions
  - Support Vector Machines
  - Classifier Combination [1]
  - A basic decision tree builder, recursive and using entropy metrics
- Unsupervised Learning
  - Hidden Markov Models
  - Clustering: PCA, k-Means, Expectation-Maximization
  - Graphical Modeling
  - Generative Models: Gaussian distribution, multinomial distributions, HMMs, Naive Bayes
  - Deep Belief Networks & Restricted Boltzmann Machines
- Reinforcement Learning
  - Temporal Difference Learning
- Math, Probability & Statistics
  - Metric spaces and what they mean
  - Fundamentals of probabilities
  - Decision Theory (Bayesian)
  - Maximum Likelihood
  - Bias/Variance Tradeoff, VC Dimension
  - Bagging, Bootstrap, Jackknife [2]
  - Information Theory: Entropy, Mutual Information, Gaussian Channels
  - Estimation of Misclassification [3]
  - No-Free Lunch Theorem [4]
- Machine Learning SDKs
- Applications
  - Collective Intelligence & Recommendation Engines
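
The Supervised Learning topics above include "a basic decision tree builder, recursive and using entropy metrics". The following is a minimal sketch of what such a builder could look like, not the group's own code. The function names (`entropy`, `best_split`, `build_tree`, `predict`), the toy data, the `max_depth` cutoff, and the choice of binary threshold splits on numeric features are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def best_split(rows, labels):
    """Return (gain, feature, threshold) with the highest information gain, or None."""
    base = entropy(labels)
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            # Weighted average entropy of the two children.
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, feature, threshold)
    return best


def build_tree(rows, labels, max_depth=5):
    """Recursively build a tree. Leaves are class labels, inner nodes are dicts."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None or split[0] <= 0:
        # No useful split left: predict the majority class.
        return Counter(labels).most_common(1)[0][0]
    _, feature, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data: two numeric features, two classes.
rows = [[2.0, 1.0], [3.0, 1.5], [6.0, 0.5], [7.0, 2.0]]
labels = ["a", "a", "b", "b"]
tree = build_tree(rows, labels)
print(tree)
print(predict(tree, [2.5, 0.0]), predict(tree, [6.5, 1.0]))
```

This greedy sketch stops as soon as no split improves entropy, so it cannot learn XOR-style patterns where every single split has zero gain. The libraries listed under Software Tools are a better starting point for real data.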