Machine Learning Meetup Notes: 2010-06-30

From Noisebridge

Latest revision as of 22:21, 30 June 2010

Mike's bio overview:

Amino acids build proteins.

There are 20 standard amino acids.

  • a protein has an amino acid sequence (each group of three bases codes for one amino acid)
  • DNA is composed of 4 bases: A, T, C, G
  • RNA is composed of 4 bases: A, U, C, G
  • A pairs with T (with U in RNA)
  • C pairs with G
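The pairing rules above are enough to derive the opposite strand of a DNA sequence. A minimal sketch (the function names are illustrative, not from the meetup); it only handles the four plain DNA bases and raises `KeyError` on anything else:

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(dna: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(DNA_PAIRS[base] for base in dna.upper())


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in the conventional 5' to 3' direction."""
    return complement(dna)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```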

Every three bases form a codon. Dave wrote a script that takes the codons and maps them to their amino acids.
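A codon-to-amino-acid mapping can be sketched with the standard genetic code. This is not Dave's script; `CODON_TABLE` and `translate` are hypothetical names. The 64-entry table is built by enumerating codons in T, C, A, G order at each position and pairing them with the one-letter codes ("*" is a stop codon). Codons that are not in the table, such as ones containing ambiguous bases, are mapped to "X" as an assumed convention.

```python
from itertools import product

# Standard genetic code: codons enumerated in T, C, A, G order for each of the
# three positions; AMINO gives the matching one-letter amino acid ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate a DNA (or RNA) sequence codon by codon into amino acids.

    Trailing bases that do not fill a whole codon are ignored; codons not in
    the table (e.g. containing an ambiguous base like N) become "X".
    """
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("ATGGCCTAA"))  # MA*
```

Since each codon is three bases, a 297-base protease sequence translates to 297 / 3 = 99 amino acids.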

  • protease: an enzyme (a type of protein) that cleaves other proteins
  • reverse transcriptase: takes viral RNA and transcribes it into DNA
  • the viral mRNA (bad) is sent to the ribosomes
  • the viruses replicate very fast in your immune cells, and that's how they kill them

Protease is 99 amino acids long (297 DNA bases).

Reverse transcriptase is not as predictable: each sequence is a different length.


Possible Features:

  • for the OR acids (sites where more than one amino acid is possible), perhaps create all possible combinations and weight each by 1/(number of combinations); normal rows get weight = 1
  • find most probable sequences (T)
  • correlating permutations (T)
  • molecular weight/length (E/Th)
  • acidity/charge
  • edit distance (differences between the sequences), used for clustering (A)
  • list of known resistant mvt sites (M)
  • find out which sites are most variable
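The first feature idea, expanding OR sites into every combination and weighting each by 1/(number of combinations), can be sketched as follows. The input format is an assumption: a sequence is given as a list of strings, one per site, where each string lists the amino acids possible there (e.g. "KR" means K or R). `expand_ambiguous` is a hypothetical name.

```python
from itertools import product


def expand_ambiguous(sites: list[str]) -> list[tuple[str, float]]:
    """Expand a sequence with ambiguous ("OR") sites into weighted sequences.

    Each element of `sites` lists the amino acids possible at that position.
    Every combination gets weight 1 / (number of combinations), so an
    unambiguous row expands to itself with weight 1.
    """
    combos = ["".join(choice) for choice in product(*sites)]
    weight = 1.0 / len(combos)
    return [(seq, weight) for seq in combos]


print(expand_ambiguous(["M", "KR", "A", "ST"]))
# [('MKAS', 0.25), ('MKAT', 0.25), ('MRAS', 0.25), ('MRAT', 0.25)]
print(expand_ambiguous(["M", "K", "A"]))  # [('MKA', 1.0)]
```

The weights of one original row always sum to 1, so a heavily ambiguous row does not count more than a clean one. The number of combinations grows multiplicatively with the number of OR sites, so rows with many ambiguous sites may need a cap.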

Levenshtein?
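The edit distance in the feature list is commonly computed as the Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions that turn one sequence into another. Because it handles different lengths, it also fits the variable-length reverse transcriptase sequences. A minimal dynamic-programming sketch, plus a pairwise distance matrix that a clustering step could consume (`distance_matrix` is a hypothetical helper):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `a` into `b`."""
    # prev[j] = distance from the previous prefix of `a` to the first j chars of `b`
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def distance_matrix(seqs: list[str]) -> list[list[int]]:
    """All pairwise edit distances, e.g. as input to a clustering algorithm."""
    return [[levenshtein(x, y) for y in seqs] for x in seqs]


print(levenshtein("kitten", "sitting"))          # 3
print(distance_matrix(["MKA", "MRA", "MKAT"]))  # [[0, 1, 1], [1, 0, 2], [1, 2, 0]]
```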

For each site, look at the frequency of each amino acid.
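Per-site amino acid frequencies can be sketched as below, assuming the sequences are already aligned to the same length (as with the 99-residue protease). To rank the most variable sites, this sketch uses Shannon entropy of each site's distribution; that choice of variability measure is an assumption, not something decided at the meetup.

```python
import math
from collections import Counter


def site_frequencies(seqs: list[str]) -> list[dict[str, float]]:
    """For each site of equal-length aligned sequences, return the fraction
    of sequences carrying each amino acid at that site."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        freqs.append({aa: n / len(seqs) for aa, n in counts.items()})
    return freqs


def site_entropy(freq: dict[str, float]) -> float:
    """Shannon entropy (bits) of one site; 0 means the site is fully conserved."""
    return -sum(p * math.log2(p) for p in freq.values() if p > 0)


seqs = ["MKA", "MRA", "MKA", "MKT"]
freqs = site_frequencies(seqs)
print(freqs[1])  # {'K': 0.75, 'R': 0.25}
most_variable = sorted(range(len(freqs)), key=lambda i: site_entropy(freqs[i]), reverse=True)
print(most_variable)  # [1, 2, 0]
```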

These could be fed into a tree classifier.
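A tree classifier needs a numeric feature matrix. One common way to turn aligned sequences into one is one-hot encoding: 20 indicator columns per site, one per standard amino acid. A minimal sketch (`one_hot_sites` is a hypothetical name); residues outside the 20 letters, such as "X" or "*", get an all-zero block:

```python
# The 20 standard amino acids, by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_sites(seqs: list[str]) -> list[list[int]]:
    """Encode each aligned sequence as 20 indicator columns per site:
    column 20*i + k is 1 if site i holds AMINO_ACIDS[k], else 0."""
    rows = []
    for seq in seqs:
        row = []
        for residue in seq:
            row.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
        rows.append(row)
    return rows


X = one_hot_sites(["MK", "MR"])
print(len(X[0]))  # 40 (2 sites x 20 amino acids)
```

With scikit-learn, for example, such a matrix and a list of labels for whatever outcome is being predicted could be passed to `DecisionTreeClassifier().fit(X, y)`.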