Machine Learning Meetup Notes: 2010-07-07
Revision as of 22:18, 21 July 2010
- Col-1: patient ID
- Col-2: responder status ("1" for patients who improved and "0" otherwise)
- Col-3: Protease nucleotide sequence (if available)
- Col-4: Reverse Transcriptase nucleotide sequence (if available)
- Col-5: viral load at the beginning of therapy (log-10 units)
- Col-6: CD4 count at the beginning of therapy
features: molecular weight and length of "PR Sequence" and "RT Sequence", computed from the training data
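The notes do not say how the molecular weight was computed, so the following is only a rough sketch of how such features could be derived from a nucleotide string. The function name `sequence_features` and the per-base masses in `RESIDUE_MASS` are assumptions for illustration (rounded approximate values for DNA residues inside a strand), not the values used at the meetup.

```python
# Approximate average masses (g/mol) of DNA nucleotide residues within a strand.
# Rounded values, assumed for this sketch; the notes do not specify them.
RESIDUE_MASS = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}


def sequence_features(seq):
    """Return (length, approximate molecular weight) for a nucleotide string.

    A missing sequence (None or empty string) gives (0, 0.0). Characters
    other than A/C/G/T, such as ambiguity codes, count toward the length
    but contribute no mass in this simplified version.
    """
    if not seq:
        return 0, 0.0
    seq = seq.upper()
    weight = sum(RESIDUE_MASS.get(base, 0.0) for base in seq)
    return len(seq), round(weight, 1)


# Made-up example sequence, not taken from the data set.
print(sequence_features("CCTCAAATCA"))
```

Applying this to Col-3 and Col-4 would give four numeric columns per patient (PR length, PR weight, RT length, RT weight), which is presumably the kind of table stored in mweight.csv.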
- start weka
- open mweight.csv
- remove patient
- select resp
- filter->unsupervised->attribute->numerictonominal
- click to change to first only
- apply
neural network: classify->functions->multilayerperceptron
- resp
- start
- 738 correct predictions a=0 no improvement
- 66 correct predictions b=1 improvement
- 56 no improvement classified as improvement
- 140 improvement classified as no improvement
how well did it do? 80.4% accuracy ((738 + 66) correct out of 1000)
- in the confusion matrix, rows tell you what really happened
- columns tell you what was predicted
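As a check on the numbers above, here is a small sketch that rebuilds the confusion matrix from the four counts in the notes and derives the accuracy from it. The baseline, recall, and precision lines are additions for context, not something reported at the meetup: always predicting "no improvement" would already score 79.4%, and the network finds only about 32% of the patients who actually improved (with about 54% precision on that class).

```python
# Confusion matrix from the notes: rows = actual class, columns = predicted class.
# Class order: a = 0 (no improvement), b = 1 (improvement).
confusion = [
    [738, 56],   # actual 0: 738 predicted as 0, 56 predicted as 1
    [140, 66],   # actual 1: 140 predicted as 0, 66 predicted as 1
]

total = sum(sum(row) for row in confusion)
correct = confusion[0][0] + confusion[1][1]   # the diagonal
accuracy = correct / total

# Baseline: always predict the most common actual class.
baseline = max(sum(row) for row in confusion) / total

# Recall and precision for the "improvement" class (class 1).
recall_1 = confusion[1][1] / sum(confusion[1])
precision_1 = confusion[1][1] / (confusion[0][1] + confusion[1][1])

print(f"accuracy    = {accuracy:.3f}")
print(f"baseline    = {baseline:.3f}")
print(f"recall_1    = {recall_1:.3f}")
print(f"precision_1 = {precision_1:.3f}")
```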
cluster simplekmeans
- change num clusters to 5
- ok->start
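For readers who want to see what a k-means run with 5 clusters does under the hood, here is a minimal pure-Python sketch of the standard Lloyd's algorithm. It is not Weka's SimpleKMeans implementation; the function name `kmeans`, the random initialization, and the toy data are all assumptions for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm using squared Euclidean distance.

    points is a list of equal-length numeric tuples. Returns a list with
    one cluster index per point, plus the list of final centroids.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # pick k rows as starting centroids
    assignment = None
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break                       # nothing moved, so we have converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                 # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Made-up 2-D data, standing in for the weight/length features.
points = [(float(i % 5) * 10 + (i % 3), float(i % 5) * 10 - (i % 2)) for i in range(50)]
labels, centers = kmeans(points, k=5)
print(sorted(set(labels)))
```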
scipy cluster.hierarchy: the main function is called linkage. ldist takes the Levenshtein distance between each pair of items in the set; the result is a distance matrix, which is what hierarchical clustering works from.
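A minimal sketch of that pipeline, assuming `ldist` is a plain Levenshtein function (the notes do not show its definition, so the one below is a stand-in) and using made-up sequences. `linkage` accepts the pairwise distances in condensed form, meaning one entry per pair (i, j) with i < j.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ldist(a, b):
    """Levenshtein (edit) distance between two strings, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


# Made-up sequences, not taken from the data set.
seqs = ["ACGT", "ACGA", "TTGT", "TTGA", "ACGTT"]
n = len(seqs)

# Condensed distance matrix: pairs in the order (0,1), (0,2), ..., (n-2,n-1).
condensed = np.array(
    [ldist(seqs[i], seqs[j]) for i in range(n) for j in range(i + 1, n)],
    dtype=float,
)

Z = linkage(condensed, method="single")           # (n-1) x 4 merge table
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into at most 2 clusters
print(Z)
print(labels)
```

Each row of `Z` records one merge: the two cluster indices joined, the distance at which they were joined, and the size of the new cluster.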
single linkage clustering: start with n clusters (one per point), merge the two clusters that have the shortest distance between them, then keep going until you have 1 cluster.
- when you join two points, you always check both of their distances against the other points, and then take whichever is smaller
complete linkage: you take the largest distance instead
- there is also one that takes the average (average linkage)
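The three rules differ only in how the distance between two clusters is computed from the pairwise point distances. A small sketch with made-up 1-D points (the helper name `cluster_distance` is hypothetical): for clusters [0, 1] and [3, 6] the pairwise distances are 3, 6, 2 and 5, so single linkage gives 2, complete linkage gives 6, and average linkage gives 4.0.

```python
def cluster_distance(a, b, method="single"):
    """Distance between two clusters of 1-D points under a linkage rule."""
    pair_dists = [abs(x - y) for x in a for y in b]
    if method == "single":      # closest pair of points
        return min(pair_dists)
    if method == "complete":    # farthest pair of points
        return max(pair_dists)
    if method == "average":     # mean over all pairs
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown method: {method}")


a, b = [0, 1], [3, 6]
for method in ("single", "complete", "average"):
    print(method, cluster_distance(a, b, method))
```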