Machine Learning: Difference between revisions

From Noisebridge
=== Join the Mailing List ===
https://www.noisebridge.net/mailman/listinfo/ml
=== Next Meeting===


*When: Thursday, February 13, 2014 @ 6:30pm
*Where: 2169 Mission St. (Church classroom)
*Topic: Bayesian Inference for everyone
*Details:
*Who: Sam Tepper
 
=== Learn about Data Science and Machine Learning ===
 
==== Classes ====
*[https://www.coursera.org/course/ml Coursera Machine Learning Course with Andrew Ng]
*[https://www.coursera.org/course/compneuro Coursera Computational Neuroscience Course with Adrienne Fairhall]
*[http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-867-machine-learning-fall-2006/ MIT Machine Learning Class with Tommi Jaakkola]
*[http://cs229.stanford.edu/materials.html Stanford CS229]
*[http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml Carnegie Mellon Machine Learning Course with Tom Mitchell]
*[http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/ Linear Algebra with Gilbert Strang]
*[https://www.youtube.com/playlist?list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH Neural Networks Class with Hugo Larochelle]
 
==== Books ====
*[http://statweb.stanford.edu/~tibs/ElemStatLearn/ Elements of Statistical Learning]
*[https://www.google.com/search?client=ubuntu&channel=fs&q=pattern+recognition+and+machine+learning&ie=utf-8&oe=utf-8#channel=fs&q=pattern+recognition+and+machine+learning+pdf Pattern Recognition and Machine Learning]
*[https://www.google.com/search?&channel=fs&q=+Information+Theory%2C+Inference%2C+and+Learning+Algorithms.&ie=utf-8&oe=utf-8#channel=fs&q=Information+Theory%2C+Inference%2C+and+Learning+Algorithms+pdf Information Theory, Inference, and Learning Algorithms]
*[http://chimera.labs.oreilly.com/books/1230000000345 Interactive Data Visualization for the Web (D3)]
*[http://cran.r-project.org/doc/manuals/R-intro.pdf Introduction to R]
*[http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf Introduction to Probability (Grinstead and Snell)]
*[http://www.cis.temple.edu/~latecki/Courses/CIS2033-Spring12/A_modern_intro_probability_statistics_Dekking05.pdf A Modern Introduction to Probability and Statistics (Dekking, Kraaikamp, Lopuhaä and Meester)]
*[http://web4.cs.ucl.ac.uk/staff/D.Barber/textbook/090310.pdf Bayesian Reasoning and Machine Learning]


==== Tutorials ====
*[http://nbviewer.ipython.org/github/unpingco/Python-for-Signal-Processing/tree/master/ Signal Processing IPython Notebooks]
*[http://scikit-learn.org/stable/tutorial/basic/tutorial.html Introduction to ML with scikits.learn]
*[http://www.sagemath.org/doc/tutorial/ Learn how to use SAGE]


==== Noisebridge ML Class Slides ====
*[[NBML/Workshops/Intro to Machine Learning|Intro to Machine Learning]]
*[[NBML/Workshops/Brief Tour of Statistics|A Brief Tour of Statistics]]
*[[NBML/Workshops/Generalized Linear Models|Generalized Linear Models]]
*[[NBML/Workshops/Neural Nets|Neural Nets Workshop]]
*[[NBML/Workshops/Support Vector Machines|Support Vector Machines]]
*[[NBML/Workshops/Random Forests|Random Forests]]
*[[NBML/Workshops/Independent Components Analysis|Independent Components Analysis]]
*[[NBML/Workshops/Deep Nets|Deep Nets]]


=== Code and SourceForge Site ===
*We have a [http://sourceforge.net/projects/ml-noisebridge Sourceforge Project]
*We have a git repository on the project page, accessible as:
    git clone git://ml-noisebridge.git.sourceforge.net/gitroot/ml-noisebridge/ml-noisebridge
*Send an email to the list if you want to become an administrator on the site to get write access to the git repo!
 
=== Future Talks and Topics, Ideas ===
*Random Forests in R
*Restricted Boltzmann Machines (Mike S, some day)
*Analyzing brain cells (Mike S)
*Deep Nets w/ Stacked Autoencoders (Mike S, some day)
*Generalized Linear Models (Mike S, Erin L? some day)
*Graphical Models
*Working with the Kinect
*Computer Vision with OpenCV


=== Projects ===
*[[Small Group Subproblems]]
*[[Machine Learning/Fundraising | Fundraising]]
*[[NBML_Course|Noisebridge Machine Learning Course]]
*[[Machine Learning/Kaggle HIV | HIV]]


=== [[Machine_Learning/Datasets|Datasets and Websites]] ===
*[http://archive.ics.uci.edu/ml/ UCI Machine Learning Repository]
*[[DataSF.org]]
*[http://infochimps.com/ Infochimps]
*[http://www.face-rec.org/databases/ Face Recognition Databases]
*[http://robjhyndman.com/TSDL/ Time Series Data Library]
*[http://getthedata.org/ Data Q&A Forum]
*[http://metaoptimize.com/qa/ Metaoptimize]
*[http://www.quora.com/Machine-Learning Quora ML Page]
*[http://www.metoffice.gov.uk/research/climate/climate-monitoring/land-and-atmosphere/surface-station-records A ton of Weather Data]
*[http://mlcomp.org/ MLcomp]
**Upload your algorithm and objectively compare its performance to other algorithms
*[http://www.ntis.gov/products/ssa-dmf.aspx Social Security Death Master File!]
*[http://www.sipri.org/databases SIPRI Social Databases]
**Wealth of information on international arms transfers and peace missions.
*[http://aws.amazon.com/publicdatasets/ Amazon AWS Public Datasets]
*[http://www.prio.no/Data/Armed-Conflict/ UCDP/PRIO Armed Conflict Datasets]
*[https://opendata.socrata.com/browse Socrata Government Datasets]
*[http://us-city.census.okfn.org/ US City Census Data]
*[http://webscope.sandbox.yahoo.com/catalog.php Yahoo Labs Datasets]


=== Software Tools ===
==== Generic ML Libraries ====
*[http://www.cs.waikato.ac.nz/ml/weka/ Weka]
**a collection of data mining tools and machine learning algorithms.
*[http://scikit-learn.sourceforge.net/ scikits.learn]
**Machine learning Python package
*[http://pypi.python.org/pypi/scikits.statsmodels scikits.statsmodels]
**Statistical models to go with scipy
*[http://pybrain.org PyBrain]
**Does feedforward, recurrent, SOM, deep belief nets.
*[http://www.csie.ntu.edu.tw/~cjlin/libsvm/ LIBSVM]
**C++ SVM package
*[http://pyml.sourceforge.net PyML]
*[http://mdp-toolkit.sourceforge.net/ MDP]
**Modular toolkit for Data Processing; includes PCA, ICA, slow feature analysis and many other algorithms.
*[[Machine Learning/VirtualBox|VirtualBox]] image with the libraries listed here pre-installed
*[http://sympy.org sympy] Does symbolic math
*[http://waffles.sourceforge.net/ Waffles]
**Open source C++ set of machine learning command line tools.
*[http://rapid-i.com/content/view/181/196/ RapidMiner]
*[http://www.mrpt.org/ Mobile Robotic Programming Toolkit]
*[http://nipy.sourceforge.net/nitime/ nitime]
**Part of NIPY (Neuroimaging in Python); has good time-series analysis tools and multivariate response fitting.
*[http://pandas.pydata.org/ Pandas]
**Data analysis workflow in Python
*[http://www.pytables.org/moin PyTables]
**Adds querying capabilities to HDF5 files
*[http://statsmodels.sourceforge.net/ statsmodels]
**Regression, time series analysis, and statistics tools for Python
*[https://github.com/JohnLangford/vowpal_wabbit/wiki Vowpal Wabbit]
**"Intrinsically Fast" implementation of gradient descent for large datasets
*[http://www.shogun-toolbox.org/ Shogun]
**Fast implementations of SVMs
*[http://www.mlpack.org/ MLPACK]
**High performance scalable ML Library
*[http://www.torch.ch/ Torch]
**MATLAB-like environment for state-of-the-art ML libraries, written in Lua
==== Deep Nets ====
*[http://deeplearning.net/software/theano/ Theano]
**Symbolic Expressions and Transparent GPU Integration
*[http://caffe.berkeleyvision.org/ Caffe]
**Convolutional Neural Networks on GPU
*[https://code.google.com/p/neurolab/ Neurolab]
**Has support for recurrent neural nets
==== Online ML ====
*[http://moa.cs.waikato.ac.nz/ MOA (Massive Online Analysis)]
**Offshoot of Weka focused on online (data stream) algorithms
*[http://jubat.us/en/ Jubatus]
**Distributed Online ML
*[http://dogma.sourceforge.net/ DOGMA]
**MATLAB-based online learning stuff
*[http://code.google.com/p/libol/ libol]
*[http://code.google.com/p/oll/ oll]
*[http://code.google.com/p/scw-learning/ scw-learning]
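The tools in this section learn online: the model is updated one example at a time as data streams in, instead of being refit on the whole dataset. A minimal sketch of that idea in Python, assuming a plain linear model trained by stochastic gradient descent on squared error. `online_sgd`, `synthetic_stream` and the learning rate are hypothetical choices for illustration, not the algorithm or API of any package listed here.

```python
import random
from typing import Iterable, List, Sequence, Tuple


def online_sgd(stream: Iterable[Tuple[Sequence[float], float]],
               n_features: int, lr: float = 0.05) -> Tuple[List[float], float]:
    """Fit y ~ w.x + b, taking one gradient step per example (squared error)."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        # Predict with the current model, then nudge it toward the target.
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err
    return w, b


def synthetic_stream(n: int, seed: int = 0):
    """Hypothetical noiseless stream generated from y = 2*x0 - x1 + 0.5."""
    rng = random.Random(seed)
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        yield x, 2 * x[0] - x[1] + 0.5


if __name__ == "__main__":
    w, b = online_sgd(synthetic_stream(5000), n_features=2)
    print(w, b)  # should approach [2, -1] and 0.5
```

Because each example is seen once and then discarded, memory use does not grow with the length of the stream; the learning rate trades speed of adaptation against noise in the weights.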
==== Graphical Models ====
*[http://www.mrc-bsu.cam.ac.uk/bugs/ BUGS]
**MCMC for Bayesian Models
*[http://mcmc-jags.sourceforge.net/ JAGS]
**Hierarchical Bayesian Models
*[http://mc-stan.org/ Stan]
**A graphical model compiler
*[https://github.com/kutschkem/Jayes Jayes]
**Bayesian networks in Java
*[http://tops.sourceforge.net/ ToPS]
**Probabilistic models of sequences
*[http://pymc-devs.github.io/pymc/ PyMC]
**Bayesian Models in Python
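BUGS, JAGS, Stan and PyMC all fit Bayesian models by drawing samples from the posterior with Markov chain Monte Carlo. A minimal sketch of the simplest such method, random-walk Metropolis, assuming a coin with unknown bias, a uniform prior, and 7 heads in 10 flips; none of the packages above is used, and `metropolis`, the step size and the burn-in length are hypothetical choices. The exact posterior here is Beta(8, 4), whose mean is 8/12, so the sample mean gives a quick sanity check.

```python
import math
import random
from typing import List


def log_posterior(theta: float, heads: int, flips: int) -> float:
    """Unnormalized log posterior of a coin's bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(heads: int, flips: int, n_samples: int = 20000,
               step: float = 0.1, seed: int = 0) -> List[float]:
    """Propose theta' ~ N(theta, step^2); accept with prob. min(1, p(theta')/p(theta))."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, flips)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        proposed = log_posterior(proposal, heads, flips)
        # Work with log densities to avoid underflow; exp(-inf) == 0 rejects.
        if rng.random() < math.exp(min(0.0, proposed - current)):
            theta, current = proposal, proposed
        samples.append(theta)  # a rejected move repeats the current value
    return samples


if __name__ == "__main__":
    draws = metropolis(heads=7, flips=10)[2000:]  # drop burn-in
    print(sum(draws) / len(draws))  # should be close to 8/12, about 0.667
```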
==== Text Stuff ====
*[http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup]
**Screen-scraping tools
*[http://www.mlsec.org/sally/ SALLY]
**Tool for embedding strings into vector spaces
*[http://radimrehurek.com/gensim/ Gensim]
**Topic modeling
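Embedding strings into a vector space, as SALLY does, usually means mapping each string to a sparse vector of token or n-gram counts so that ordinary ML algorithms can work on text. A minimal sketch of that general idea, assuming character 3-grams and cosine similarity; `ngram_vector` and `cosine` are hypothetical helpers, not SALLY's interface or its exact feature map.

```python
import math
from collections import Counter


def ngram_vector(text: str, n: int = 3) -> Counter:
    """Map a string to a sparse vector of character n-gram counts."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    # Counter returns 0 for missing keys, so absent n-grams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    v1 = ngram_vector("machine learning")
    v2 = ngram_vector("machine learner")
    v3 = ngram_vector("bayesian inference")
    print(cosine(v1, v2), cosine(v1, v3))  # the first pair shares many 3-grams
```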
==== Collaborative Filtering ====
*[http://prea.gatech.edu/ PREA]
**Personalized Recommendation Algorithms Toolkit
*[http://svdfeature.apexlab.org/wiki/Main_Page SVDFeature]
**Collaborative Filtering and Ranking Toolkit
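Collaborative filtering toolkits like these are commonly built around matrix factorization: each user and each item gets a short latent vector, and a rating is predicted by their dot product. A minimal sketch, assuming a tiny hypothetical ratings list, rank-2 factors, and plain SGD with L2 regularization; `factorize`, `rmse` and all hyperparameters are illustrative, not the training procedure of PREA or SVDFeature.

```python
import math
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)


def factorize(ratings: List[Rating], n_users: int, n_items: int,
              rank: int = 2, lr: float = 0.02, reg: float = 0.01,
              epochs: int = 500, seed: int = 0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error plus an L2 penalty.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings: List[Rating], P, Q) -> float:
    """Root-mean-square error of the factor model on the given ratings."""
    se = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
             for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


if __name__ == "__main__":
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]
    P, Q = factorize(ratings, n_users=3, n_items=3)
    print(rmse(ratings, P, Q))
    # Predicted rating for a pair not in the data: user 0, item 2.
    print(sum(p * q for p, q in zip(P[0], Q[2])))
```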
==== Computer Vision ====
*[http://opencv.willowgarage.com/documentation/index.html OpenCV]
**Computer Vision Library
**Has ML component (SVM, trees, etc)
**Online tutorials [http://www.pages.drexel.edu/~nk752/tutorials.html here]
*[http://drwn.anu.edu.au/ DARWIN]
**Generic C++ ML and Computer Vision Library
*[http://sourceforge.net/projects/petavision/ PetaVision]
**Developing a real-time, full-scale model of the primate visual cortex.
==== Audio Processing ====
*[http://tlecomte.github.com/friture/ Friture]
**Real-time spectrogram generation
*[http://code.google.com/p/pyo/ pyo]
**Real-time audio signal processing
*[https://github.com/jsawruk/pymir PYMir]
**A library for reading MP3s into Python and analyzing them
*[http://www.fon.hum.uva.nl/praat/ PRAAT]
**Speech analysis toolkit
*[http://ofer.sci.ccny.cuny.edu/sound_analysis_pro Sound Analysis Pro]
**Tool for analyzing animal sounds
*[http://luscinia.sourceforge.net/ Luscinia]
**Software for archiving, measuring, and analyzing bioacoustic data
*[http://wiki.python.org/moin/PythonInMusic List of Sound Tools for Python]
*[http://jasperproject.github.io/ Jasper]
**Voice-control anything!
==== Data Visualization ====
*[http://www.ailab.si/orange/ Orange]
**Strong data visualization component
*[http://gephi.org/ Gephi]
**Graph Visualization
*[http://had.co.nz/ggplot2/ ggplot]
**Nice plotting package for R
*[http://code.enthought.com/projects/mayavi/ MayaVi2]
**3D Scientific Data Visualization
*[http://cytoscape.github.io/cytoscape.js/ Cytoscape]
**A JavaScript graph library for analysis and visualisation
*[https://plot.ly/ plot.ly]
**Web-based plotting, also usable as a Python plotting tool
*[http://chimera.labs.oreilly.com/books/1230000000345/ch02.html D3 Ebook]
**Has a good list of HTML/CSS/JavaScript data visualization tools.
==== Cluster Computing ====
*[http://lucene.apache.org/mahout/ Mahout]
**Hadoop cluster based ML package.
*[http://web.mit.edu/star/cluster/ STAR: Cluster]
**Easily build your own Python computing cluster on Amazon EC2
==== Database Stuff ====
*[http://madlib.net/ MADlib]
**Machine learning algorithms for in-database data
*[http://www.joyent.com/products/manta Manta]
**Distributed object storage
==== Neural Simulation ====
*[http://nengo.ca/ Nengo]
==== Other ====
*[http://jmlr.csail.mit.edu/mloss/ Journal of Machine Learning Software List]


* [http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&discussionID=20096092&gid=77616&trk=EML_anet_qa_ttle-0Nt79xs2RVr6JBpnsJt7dBpSBA LinkedIn] discussion on good resources for data mining and predictive analytics
* [http://www.face-rec.org/algorithms/ Face Recognition Algorithms]
* [http://www.ics.uci.edu/~welling/classnotes/classnotes.html Max Welling's ML classnotes]


=== Topics to Learn and Teach ===
[[NBML Course]] - Noisebridge Machine Learning Curriculum (work-in-progress)
[[CS229]] - The Stanford Machine Learning Course at Noisebridge
*Supervised Learning
**Linear Regression


=== [[Machine Learning/Meeting Notes|Meeting Notes]]===
[[Category:Events]]
[[Category:Projects]]

Revision as of 10:49, 9 April 2014

Join the Mailing List

https://www.noisebridge.net/mailman/listinfo/ml

Next Meeting

  • When: Thursday, February 13, 2014 @ 6:30pm
  • Where: 2169 Mission St. (Church classroom)
  • Topic: Bayesian Inference for everyone
  • Details:
  • Who: Sam Tepper

Learn about Data Science and Machine Learning

Classes

Books

Tutorials

Noisebridge ML Class Slides

Code and SourceForge Site

    git clone git://ml-noisebridge.git.sourceforge.net/gitroot/ml-noisebridge/ml-noisebridge
  • Send an email to the list if you want to become an administrator on the site to get write access to the git repo!

Future Talks and Topics, Ideas

  • Random Forests in R
  • Restricted Boltzmann Machines (Mike S, some day)
  • Analyzing brain cells (Mike S)
  • Deep Nets w/ Stacked Autoencoders (Mike S, some day)
  • Generalized Linear Models (Mike S, Erin L? some day)
  • Graphical Models
  • Working with the Kinect
  • Computer Vision with OpenCV

Projects

Datasets and Websites

Software Tools

Generic ML Libraries

  • Weka
    • a collection of data mining tools and machine learning algorithms.
  • scikits.learn
    • Machine learning Python package
  • scikits.statsmodels
    • Statistical models to go with scipy
  • PyBrain
    • Does feedforward, recurrent, SOM, deep belief nets.
  • LIBSVM
    • C++ SVM package
  • PyML
  • MDP
    • Modular toolkit for Data Processing; includes PCA, ICA, slow feature analysis and many other algorithms.
  • VirtualBox image with the libraries listed here pre-installed
  • sympy Does symbolic math
  • Waffles
    • Open source C++ set of machine learning command line tools.
  • RapidMiner
  • Mobile Robotic Programming Toolkit
  • nitime
    • Part of NIPY (Neuroimaging in Python); has good time-series analysis tools and multivariate response fitting.
  • Pandas
    • Data analysis workflow in Python
  • PyTables
    • Adds querying capabilities to HDF5 files
  • statsmodels
    • Regression, time series analysis, and statistics tools for Python
  • Vowpal Wabbit
    • "Intrinsically Fast" implementation of gradient descent for large datasets
  • Shogun
    • Fast implementations of SVMs
  • MLPACK
    • High performance scalable ML Library
  • Torch
    • MATLAB-like environment for state-of-the-art ML libraries, written in Lua

Deep Nets

  • Theano
    • Symbolic Expressions and Transparent GPU Integration
  • Caffe
    • Convolutional Neural Networks on GPU
  • Neurolab
    • Has support for recurrent neural nets

Online ML

Graphical Models

  • BUGS
    • MCMC for Bayesian Models
  • JAGS
    • Hierarchical Bayesian Models
  • Stan
    • A graphical model compiler
  • Jayes
    • Bayesian networks in Java
  • ToPS
    • Probabilistic models of sequences
  • PyMC
    • Bayesian Models in Python

Text Stuff

Collaborative Filtering

  • PREA
    • Personalized Recommendation Algorithms Toolkit
  • SVDFeature
    • Collaborative Filtering and Ranking Toolkit

Computer Vision

  • OpenCV
    • Computer Vision Library
    • Has ML component (SVM, trees, etc)
    • Online tutorials here
  • DARWIN
    • Generic C++ ML and Computer Vision Library
  • PetaVision
    • Developing a real-time, full-scale model of the primate visual cortex.

Audio Processing

Data Visualization

  • Orange
    • Strong data visualization component
  • Gephi
    • Graph Visualization
  • ggplot
    • Nice plotting package for R
  • MayaVi2
    • 3D Scientific Data Visualization
  • Cytoscape
    • A JavaScript graph library for analysis and visualisation
  • plot.ly
    • Web-based plotting, also usable as a Python plotting tool
  • D3 Ebook
    • Has a good list of HTML/CSS/JavaScript data visualization tools.

Cluster Computing

  • Mahout
    • Hadoop cluster based ML package.
  • STAR: Cluster
    • Easily build your own Python computing cluster on Amazon EC2

Database Stuff

  • MADlib
    • Machine learning algorithms for in-database data
  • Manta
    • Distributed object storage

Neural Simulation

Other

Presentations and other Materials

Topics to Learn and Teach

NBML Course - Noisebridge Machine Learning Curriculum (work-in-progress)

CS229 - The Stanford Machine Learning Course at Noisebridge

  • Supervised Learning
    • Linear Regression
    • Linear Discriminants
    • Neural Nets/Radial Basis Functions
    • Support Vector Machines
    • Classifier Combination [1]
    • A basic decision tree builder, recursive and using entropy metrics
  • Reinforcement Learning
    • Temporal Difference Learning
  • Math, Probability & Statistics
    • Metric spaces and what they mean
    • Fundamentals of probabilities
    • Decision Theory (Bayesian)
    • Maximum Likelihood
    • Bias/Variance Tradeoff, VC Dimension
    • Bagging, Bootstrap, Jackknife [2]
    • Information Theory: Entropy, Mutual Information, Gaussian Channels
    • Estimation of Misclassification [3]
    • No-Free Lunch Theorem [4]
  • Machine Learning SDKs
    • OpenCV: ML component (SVM, trees, etc.)
    • Mahout: a Hadoop-based ML package for clusters.
    • Weka: a collection of data mining tools and machine learning algorithms.
  • Applications
    • Collective Intelligence & Recommendation Engines
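The "basic decision tree builder, recursive and using entropy metrics" listed under Supervised Learning can be sketched in a few dozen lines of Python. This is a minimal ID3-style version, assuming categorical features stored in dicts and information gain (entropy reduction) as the split metric; `build_tree`, `predict` and the toy weather-style data are hypothetical, for illustration only.

```python
import math
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def entropy(labels: List[Any]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows: List[Row], labels: List[Any], feature: str) -> float:
    """Entropy reduction obtained by splitting on one categorical feature."""
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows: List[Row], labels: List[Any], features: List[str]):
    """Recursively split on the highest-gain feature; leaves are class labels."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node
    if not features:
        return Counter(labels).most_common(1)[0][0]  # majority vote
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [k for k, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[k] for k in idx],
                                     [labels[k] for k in idx], remaining)
    return {"feature": best, "branches": branches}


def predict(tree, row: Row):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["feature"]]]
    return tree


if __name__ == "__main__":
    rows = [
        {"outlook": "sunny", "windy": False},
        {"outlook": "sunny", "windy": True},
        {"outlook": "rainy", "windy": False},
        {"outlook": "rainy", "windy": True},
        {"outlook": "overcast", "windy": False},
    ]
    labels = ["no", "no", "yes", "no", "yes"]
    tree = build_tree(rows, labels, ["outlook", "windy"])
    print(predict(tree, {"outlook": "rainy", "windy": False}))  # yes
```

`predict` raises a `KeyError` for a feature value never seen during training; a fuller version would fall back to the majority label at that node.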

Meeting Notes