Machine Learning Meetup Notes: 2010-05-19

From Noisebridge
Revision as of 22:04, 23 May 2010 by SpammerHellDontDelete (talk | contribs)
  • Erin provided a list of unique SubSkills and TracedSkills with their frequencies, as well as a Python script to normalize the skill values in the challenge sets.
  • Vikram gave a presentation and demo on Hadoop, EC2, and MapReduce. He wrote a set of scripts for running MapReduce on EC2; those tools can be found on GitHub.

Here are some MapReduce notes:

Word count (let the line number be the key):

1 hello how are you
2 how is it going
3 are you happy

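def map(key, value):
	# key is the line number, value is the text of that line
	words = value.split()
	# e.g. ["hello", "how", "are", "you"]
	for word in words:
		emit(word, 1)

def reduce(key, values):
	# values holds one 1 per occurrence of the word, so its length is the count
	emit(key, len(values))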
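
The map and reduce functions above rely on an emit() call and a grouping step that the framework supplies. As a rough local sketch (not Vikram's EC2 scripts, and not how Hadoop itself is invoked), the same word count can be run in a single Python process. The names map_fn, reduce_fn and run_mapreduce are made up for this illustration, and yield stands in for emit:

```python
from collections import defaultdict

# Sample input from the notes: line number -> line text.
LINES = {
    1: "hello how are you",
    2: "how is it going",
    3: "are you happy",
}


def map_fn(key, value):
    """Yield (word, 1) for every word on the line; the line number is unused."""
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    """Yield (word, count); every value is a 1, so the count is len(values)."""
    yield key, len(values)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    # Map phase plus the grouping ("shuffle") a real framework does for us.
    grouped = defaultdict(list)
    for key, value in records.items():
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)

    # Reduce phase: one call per distinct key.
    results = {}
    for key, values in grouped.items():
        for out_key, out_value in reducer(key, values):
            results[out_key] = out_value
    return grouped, results


grouped, counts = run_mapreduce(LINES, map_fn, reduce_fn)
print(dict(grouped))  # values collected per word before reduce
print(counts)         # final word counts
```

Here grouped["how"] is [1, 1] and counts["how"] is 2. On a real cluster the grouping step runs across machines, but the map and reduce functions keep the same shape.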

results (the values grouped under each key, which reduce then counts):

hello [1]

how [1,1]

are [1,1]