Talk:Machine Learning: Difference between revisions
Revision as of 00:06, 28 February 2014
Feb. 27, 2014
Folks met and hacked on the Noisebridge discuss mailing list archive. We created a 93MB text dump and a Python script to parse it, File:Py-piper-parser.txt. We wrote pseudocode for a naive Bayes filter to protect the world from trolls. Will implement soon.
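The filter is not implemented yet. As a starting point, here is a minimal sketch of a naive Bayes classifier over word lists. It is not the pseudocode from the meeting: the class name, the "troll"/"ok" labels, the add-one smoothing, and the toy training messages are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Multinomial naive Bayes over word lists, with add-one smoothing.

    Hypothetical sketch; names and labels are not from the meeting notes.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of messages
        self.vocab = set()                       # every word seen in training

    def train(self, words, label):
        """Record one labeled message, given as a list of words."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def log_score(self, words, label):
        """Return log P(label) + sum of log P(word | label)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + len(self.vocab)  # add-one smoothing
        for w in words:
            score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def classify(self, words):
        """Return the label with the highest log score."""
        return max(self.doc_counts, key=lambda label: self.log_score(words, label))


# Toy training data (made up); real labels would have to be assigned by hand.
nb = NaiveBayesFilter()
nb.train("you are all idiots".split(), "troll")
nb.train("idiots everywhere on this list".split(), "troll")
nb.train("meeting tonight at the space".split(), "ok")
nb.train("the laser cutter is fixed".split(), "ok")

print(nb.classify("idiots on this list".split()))
print(nb.classify("laser cutter meeting".split()))
```

Working in log space avoids underflow when many small word probabilities are multiplied, and the add-one smoothing keeps unseen words from zeroing out a class.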
Word parsing Python script
Function 'get_words' takes a list of dictionaries, one per email, and yields the list of words in each message:
 def get_words(lst):
     for d in lst:
         m = d['messageline']
         yield m.split()
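A quick usage sketch for 'get_words'. The sample records are made up, and it is an assumption that each parsed email is a dictionary with at least a 'messageline' key; the real records from the parser script may carry more fields.

```python
def get_words(lst):
    """Yield one list of whitespace-separated words per email dict."""
    for d in lst:
        m = d['messageline']
        yield m.split()


# Hypothetical sample records, not taken from the real mailing list dump.
emails = [
    {'messageline': 'Who left the soldering iron on?'},
    {'messageline': 'Meeting moved to Thursday'},
]

word_lists = list(get_words(emails))
print(word_lists)
# [['Who', 'left', 'the', 'soldering', 'iron', 'on?'],
#  ['Meeting', 'moved', 'to', 'Thursday']]
```

Note that plain 'split()' leaves punctuation attached to words ('on?') and keeps case, so 'on?' and 'on' count as different words.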
We plan to improve the word splitting by using NLTK (http://nltk.org).