Machine Learning Meetup Notes: 2010-06-30
Latest revision as of 22:21, 30 June 2010
Mike's bio overview:
amino acids build proteins
20 amino acids
- a protein has an amino acid sequence (three bases encode one amino acid)
- DNA is made up of 4 bases: A, T, C, G
- RNA is made up of 4 bases: A, U, C, G
- A pairs with T (with U in RNA)
- C pairs with G
every three bases form a codon; Dave wrote a script that takes the codons and maps them to their amino acids
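To make the codon mapping concrete, here is a minimal Python sketch. It is not Dave's script (which is not reproduced in these notes); the names `CODON_TABLE` and `translate` are made up for illustration. It assumes the standard genetic code, reads the sequence in frame from the first base, and marks any codon it cannot resolve as `X`.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with bases in the order
# T, C, A, G (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA (or RNA) string into one-letter amino acid codes.

    Trailing bases that do not fill a whole codon are ignored, and codons
    containing anything other than A/T/C/G become 'X' (an assumption made
    here for illustration).
    """
    seq = sequence.upper().replace("U", "T")  # treat RNA like DNA
    usable = len(seq) - len(seq) % 3          # drop an incomplete last codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    print(translate("ATGGCC"))
```

Since each codon is three bases, a 297-base protease sequence translates to 297 / 3 = 99 amino acids, which matches the count in these notes.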
- protease - a type of protein (an enzyme) that cleaves other proteins
- reverse transcriptase - takes viral RNA and transcribes it into DNA
- the resulting (bad) viral mRNA gets sent to the ribosomes
- the viruses replicate very fast in your immune cells, and that's how they kill them
99 amino acids in protease (297 DNA bases)
reverse transcriptase is less predictable - the sequences are different lengths
Possible Features:
- for the OR acids (sites listing more than one possible amino acid), perhaps create all possible combinations and weight each one by 1/(number of combinations); normal rows get weight = 1
- find most probable sequences (T)
- correlating permutations (T)
- molecular weight/length (E/Th)
- acidity/charge
- edit distance (differences between the sequences), use to cluster (A)
- list of known resistant mvt sites (M)
- find out which sites are most variable
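The weighting idea for the OR acids in the list above could be sketched roughly as follows. The input format is an assumption: each site is given as a string of its possible amino acids (for example `"QK"` for "Q or K"), which may not match how the actual data marks ambiguity. The function name `expand_ambiguous` is hypothetical.

```python
from itertools import product


def expand_ambiguous(sites):
    """Expand a sequence with ambiguous sites into weighted plain sequences.

    `sites` is a list with one string per site, holding every amino acid
    allowed there (assumed format). Returns (sequence, weight) pairs where
    each weight is 1 / (number of combinations), so the weights of one
    original row always sum to 1.
    """
    combos = ["".join(choice) for choice in product(*sites)]
    weight = 1.0 / len(combos)
    return [(seq, weight) for seq in combos]


if __name__ == "__main__":
    # One ambiguous site with two options gives two rows of weight 0.5.
    for seq, weight in expand_ambiguous(["P", "QK", "I"]):
        print(seq, weight)
```

A row with no ambiguous sites comes back as a single sequence with weight 1, matching the "normal rows weight = 1" rule. Note that the number of combinations multiplies with every ambiguous site, so rows with many OR acids could blow up quickly.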
Levenshtein distance?
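The name being reached for here is most likely Levenshtein distance, the usual edit distance between two strings. A minimal sketch of the standard dynamic-programming version, which could feed the clustering idea from the features list (the function name `edit_distance` is just illustrative):

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if char_a == char_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("PQITLW", "PKITLW"))
```

For the protease sequences, which all have 99 amino acids, simply counting mismatched positions may be enough; the full edit distance matters more for the reverse transcriptase sequences, whose lengths differ.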
for each site, look at the frequency of each amino acid
could put into a tree classifier
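A rough sketch of the per-site frequency idea, assuming the sequences are already aligned and all the same length (true for the 99-residue protease, not for reverse transcriptase). The variability score used here, 1 minus the share of the most common amino acid at a site, is one simple choice among several and is not something decided at the meetup. Function names are hypothetical.

```python
from collections import Counter


def site_frequencies(sequences):
    """Return one Counter per site with the count of each amino acid."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) walks the alignment column by column.
    return [Counter(column) for column in zip(*sequences)]


def most_variable_sites(sequences, top=5):
    """Rank sites by variability, most variable first.

    Returns (site_number, score) pairs with 1-based site numbers, where
    score = 1 - (count of the most common amino acid / number of sequences).
    """
    total = len(sequences)
    scores = [
        1 - counts.most_common(1)[0][1] / total
        for counts in site_frequencies(sequences)
    ]
    ranked = sorted(enumerate(scores, start=1),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    example = ["PQI", "PKI", "PRI"]  # toy data, not real sequences
    print(site_frequencies(example)[1])
    print(most_variable_sites(example, top=1))
```

The per-site counts (or just the amino acid at each of the most variable sites) could then serve as input columns for the tree classifier mentioned above.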