Machine Learning Meetup Notes: 2010-07-07

From Noisebridge
Revision as of 22:18, 21 July 2010

  • Col-1: patient ID
  • Col-2: responder status ("1" for patients who improved and "0" otherwise)
  • Col-3: Protease nucleotide sequence (if available)
  • Col-4: Reverse Transcriptase nucleotide sequence (if available)
  • Col-5: viral load at the beginning of therapy (log-10 units)
  • Col-6: CD4 count at the beginning of therapy

Features: the molecular weight and length of the "PR Sequence" and "RT Sequence" from the training data.
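The notes don't say how the molecular weight was computed. A minimal sketch of one plausible way to get both features, assuming the PR and RT sequences are plain DNA strings and using the commonly quoted anhydrous residue masses for single-stranded DNA with the usual end-group correction of 61.96 g/mol. The function name sequence_features and the handling of ambiguity codes are assumptions, not the meetup's method.

```python
# Approximate anhydrous residue masses (g/mol) for single-stranded DNA.
# Assumption: these common textbook values, not necessarily what the meetup used.
RESIDUE_WEIGHT = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
END_GROUP_CORRECTION = 61.96  # subtracted in the usual ssDNA oligo formula


def sequence_features(seq):
    """Return (length, approximate molecular weight) for a nucleotide sequence.

    Missing or empty sequences ("if available") give (0, 0.0).
    Non-ACGT characters (e.g. IUPAC ambiguity codes such as R or Y) count
    toward the length but add nothing to the weight, a simplification.
    """
    seq = (seq or "").upper()
    if not seq:
        return 0, 0.0
    weight = sum(RESIDUE_WEIGHT.get(base, 0.0) for base in seq)
    return len(seq), weight - END_GROUP_CORRECTION


# Usage: features for one (made-up) sequence
print(sequence_features("ACGT"))
```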

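  1. Start Weka.
  2. Open mweight.csv.
  3. Remove the patient attribute.
  4. Select the resp attribute.
  5. Choose Filter -> unsupervised -> attribute -> NumericToNominal.
  6. Click the filter name and change attributeIndices to "first" so that only the first attribute (resp) is converted.
  7. Apply.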
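For anyone following along outside Weka, a minimal standard-library sketch of the same preprocessing: drop the patient column and keep resp as a nominal (string) label instead of a number. The column names patient and resp come from the steps above; the function name load_rows, the treatment of "?" and empty cells as missing, and any other column names are assumptions about mweight.csv.

```python
import csv

# Weka writes missing values as "?"; empty cells are treated the same way here.
MISSING = {"", "?"}


def load_rows(lines, drop="patient", label="resp"):
    """Drop the ID column and keep the label nominal, like the Weka steps.

    `lines` is any iterable of CSV lines with a header row (e.g. an open file).
    Returns (features, label) pairs: features maps column name to a float,
    or None when missing; the label stays a string such as "0" or "1".
    """
    rows = []
    for record in csv.DictReader(lines):
        record.pop(drop, None)       # like "remove patient"
        target = record.pop(label)   # kept as text, like NumericToNominal on resp
        features = {
            name: None if value.strip() in MISSING else float(value)
            for name, value in record.items()
        }
        rows.append((features, target))
    return rows


# Usage (file layout is an assumption):
# with open("mweight.csv", newline="") as f:
#     rows = load_rows(f)
```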

Neural network: Classify -> functions -> MultilayerPerceptron

  1. Select resp as the class attribute.
  2. Start.
  • 738 correct predictions of a = 0 (no improvement)
  • 66 correct predictions of b = 1 (improvement)
  • 56 no-improvement cases classified as improvement
  • 140 improvement cases classified as no improvement

How well did it do? 80.4% accuracy: (738 + 66) correct out of 1000 instances.

  • Rows tell you what really happened (the actual class).
  • Columns tell you what was predicted.
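To make the reading of the confusion matrix concrete, a short sketch that recomputes the accuracy from the four counts in these notes, with rows as the actual class and columns as the predicted class. It also shows the recall for the improvement class, since only 66 of the 206 patients who actually improved were caught. The helper names accuracy and recall are illustrative.

```python
# Confusion matrix from the MultilayerPerceptron run in these notes.
# Rows = actual class, columns = predicted class.
# Index 0 = a (no improvement), index 1 = b (improvement).
confusion = [
    [738, 56],   # actually no improvement: 738 right, 56 predicted as improvement
    [140, 66],   # actually improved: 140 predicted as no improvement, 66 right
]


def accuracy(matrix):
    """Fraction of all instances that lie on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix, cls):
    """Fraction of instances actually in class `cls` that were predicted as `cls`."""
    return matrix[cls][cls] / sum(matrix[cls])


print(f"accuracy = {accuracy(confusion):.1%}")                # accuracy = 80.4%
print(f"recall (improvement) = {recall(confusion, 1):.1%}")   # recall (improvement) = 32.0%
```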

Clustering: Cluster -> SimpleKMeans

  1. Set numClusters to 5.
  2. Click OK, then Start.
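SimpleKMeans runs k-means. A minimal pure-Python sketch of Lloyd's algorithm, assuming Euclidean distance and, for reproducibility, taking the first k points as the initial centroids. Weka instead picks its starting centroids randomly (controlled by a seed), so cluster assignments will not necessarily match Weka's output. The function name kmeans is illustrative.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; initial centroids are the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, assignment


# Usage: with the GUI setting above this would be kmeans(data, 5).
centroids, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)
print(centroids, labels)  # [[0.0, 0.5], [10.0, 10.5]] [0, 0, 1, 1]
```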

scipy.cluster.hierarchy: the main function is called linkage. ldist takes the Levenshtein distance between each pair of elements in the set; the result is a distance matrix, which is then used for hierarchical clustering.
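The notes don't show ldist, so the helpers below are hypothetical stand-ins. levenshtein computes the edit distance between two sequences with the standard dynamic-programming recurrence, and condensed_distances lists the pairwise distances in the condensed order (upper triangle, row by row) that scipy.cluster.hierarchy.linkage accepts as its first argument.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def condensed_distances(seqs):
    """Pairwise distances in the order (0,1), (0,2), ..., (1,2), ... used by scipy."""
    n = len(seqs)
    return [levenshtein(seqs[i], seqs[j]) for i in range(n) for j in range(i + 1, n)]


print(levenshtein("kitten", "sitting"))                 # 3
print(condensed_distances(["ACGT", "ACGA", "TTTT"]))    # [1, 3, 4]
```

With SciPy installed, `linkage(condensed_distances(seqs), method="single")` from scipy.cluster.hierarchy builds the hierarchy; `method="complete"` and `method="average"` select the other two linkage rules in these notes.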

Single-linkage clustering: start with n clusters (one per point), merge the two clusters with the shortest distance between them, and keep going until you have one cluster.

  • When you join two points, check the distances from both members of the new cluster to every other point, and keep whichever is smaller.

Complete linkage: take the largest distance instead.

  • There is also one that takes the average (average linkage).
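To make the three linkage rules concrete, a naive sketch that clusters from a full distance matrix and records each merge. Instead of updating distances incrementally ("check both distances and take the smaller"), it recomputes the linkage from all member pairs each round; that gives the same merges for single, complete and average linkage but is much slower than scipy's implementation. The function name agglomerate is hypothetical.

```python
from itertools import combinations


def agglomerate(dist, method="single"):
    """Naive agglomerative clustering on a full symmetric distance matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are frozensets of the original point indices.
    """
    combine = {
        "single": min,                            # shortest member-to-member distance
        "complete": max,                          # largest member-to-member distance
        "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
    }[method]
    clusters = [frozenset([i]) for i in range(len(dist))]  # start with n clusters
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a, b in combinations(clusters, 2):
            d = combine([dist[i][j] for i in a for j in b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
        merges.append((a, b, d))
    return merges


# Usage: four points on a line at 0, 1, 4 and 10.
xs = [0, 1, 4, 10]
dist = [[abs(p - q) for q in xs] for p in xs]
for method in ("single", "complete", "average"):
    print(method, [round(d, 2) for _, _, d in agglomerate(dist, method)])
# single [1, 3, 6]
# complete [1, 4, 10]
# average [1.0, 3.5, 8.33]
```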