Talk:Machine Learning
== March 6, 2014 ==
<pre>
Group compared notes on list-scraping code and then delved into details of algorithm via tracing simple example.

user presented w message
user labels spam/not spam or scale
user presented with rating: 50% naive
after user labels, algorithm readjusts

very beginning
first step: assign prior (50% of all messages are spam)
also have a likelihood that each word is spam or not spam - combines to total 50%
(also consider log-likelihood)
next step: algorithm decides per message
this is end of phase zero (PERFORMANCE PHASE)
algorithm performing without human intervention

next: TRAINING PHASE
first step of training phase
examine message, assign a rating (spamicity ie 7/7 or drama-tag or spam/not spam)
then save rating associated with message
ie write to vector (each index corresponds to a message)

spamicity vector:
msg0  msg1  msg2  msg3
1     0     0     3

then update dictionary
what dictionary?
temporary dictionary in the first step
every dictionary item has a frequency count
now ...
after (say) 1000 messages, algorithm guesses mostly correctly 'spam or not-spam'
(is this TESTING PHASE ?)
training continues via occasional instances of (human) correction
update dictionary with each word in (human?) rated message

one possibly viable dictionary structure:
{'word':[counted_in_spam, counted_in_not_spam]}

so, algorithm might operate as per this trace:
msg[0]: 'foo bar' SPAM
msg[1]: 'foo foo bar foo bar' HAM
msg[2]: 'bar bar bar foo' ... WHAT IS THIS ????

Can consult algorithm because now we have SPAM and HAM
so can get bayes-informed result

dictionary
after msg[0]
{'foo': [1, 0], 'bar': [1, 0]}
after msg[1]
{'foo': [1, 3], 'bar': [1, 2]}

NOW WE ARE AT msg[2]
WHAT IS THIS ???
THIS LOOKS EASIER TO SOLVE THIS TIME !!!!!!!
WE HAVE A VECTOR OF SPAM/HAM
IT LOOKS LIKE THIS:
['s', 'h']
OR IF YOU LIKE BINARY
[True, False] OR [0, 1]
OK I GET IT !!!!
WE HAVE A VECTOR AND A DICTIONARY and a message
NOW WHAT ???

{'foo': [1, 3], 'bar': [1, 2]}
[0, 1]
msg[2]: 'bar bar bar foo' ... WHAT IS THIS ????

A: probability of 'foo' | spam = 1
B: probability of spam = 0.5
C: probability of 'foo' | ham = 3 ... wtf ??? this gets normalized later ?
   maybe sam hopes this cancels out without being painful
D: probability of ham = 0.5

= 0.25 likelihood given 'foo'
(1 * 0.5) / ((1 * 0.5) + (3 * 0.5))
A * B / ((A * B) + (C * D))
= 0.25 likelihood given 'foo'

1(.5)
1   = prob of foo in message | spam
.5  = prob of any word | spam
.5  = prob of 'foo' in (first) word | spam
.5  = A (normed)

3   = prob of foo in message(?) | not spam
1/5 = prob of word | ham
.6  = prob of foo in (first) word | ham
.6  = C (normed)

likelihood given 'bar'
(1 * 0.5) / ((1 * 0.5) + (2 * 0.5))
= 0.3333... (1/3)

(1/3.0)**3 * (1/4.0) / (((1/3.0)**3 * (1/4.0)) + ((2/3.0)**3 * (3/4.0)))
= 0.04

The way this is not fully bayesian...
p(foo) & p(bar) are interacting...

Also, are we normalizing correctly?
If we normalized, we take into account the following:
avg freq of words in spam
avg freq of words in ham

but this is not fully bayesian because ... so far ...
we have been assuming independence between words (at the full-message level)
</pre>
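The trace above can be replayed in a few lines of Python. This is only a sketch of what the notes describe, not the group's actual code: the function names (<code>train</code>, <code>word_spamicity</code>, <code>message_spamicity</code>) are made up for illustration, and it deliberately uses raw counts for A and C (unnormalized, exactly as in the trace), which is the open question the notes raise. The combination step multiplies per-word values as if words were independent, matching the final 0.04 calculation.

```python
from collections import defaultdict


def train(messages):
    """Build {'word': [counted_in_spam, counted_in_not_spam]} from (text, label) pairs."""
    counts = defaultdict(lambda: [0, 0])
    for text, label in messages:
        column = 0 if label == "spam" else 1
        for word in text.split():
            counts[word][column] += 1
    return dict(counts)


def word_spamicity(counts, word, p_spam=0.5):
    """A * B / ((A * B) + (C * D)) with raw counts for A and C, as in the trace.

    Raises KeyError for a word that was never seen in training; the notes
    do not say how unseen words should be handled.
    """
    in_spam, in_ham = counts[word]
    a_b = in_spam * p_spam
    c_d = in_ham * (1 - p_spam)
    return a_b / (a_b + c_d)


def message_spamicity(counts, text, p_spam=0.5):
    """Combine per-word values, assuming independence between words."""
    spam_side = 1.0
    ham_side = 1.0
    for word in text.split():
        p = word_spamicity(counts, word, p_spam)
        spam_side *= p
        ham_side *= 1 - p
    return spam_side / (spam_side + ham_side)


# Replay the trace: msg[0] is SPAM, msg[1] is HAM, msg[2] is unlabeled.
counts = train([
    ("foo bar", "spam"),
    ("foo foo bar foo bar", "ham"),
])
print(counts)                                        # {'foo': [1, 3], 'bar': [1, 2]}
print(word_spamicity(counts, "foo"))                 # 0.25
print(word_spamicity(counts, "bar"))                 # 1/3
print(message_spamicity(counts, "bar bar bar foo"))  # approximately 0.04
```

Swapping the raw counts for normalized frequencies (the .5 and .6 values marked "normed" in the notes) would change the per-word results, so this sketch should be read as a check on the arithmetic of the trace rather than as a settled design.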