Machine Learning Meetup Notes: 2010-05-19

From Noisebridge
  • Erin provided a list of unique SubSkills and TracedSkills with their frequencies, as well as a Python script to normalize the skill values in the challenge sets.
  • Vikram gave a presentation and demo on Hadoop, EC2 and MapReduce. He wrote a set of scripts for running MapReduce on EC2; those tools can be found on GitHub.

Here are some MapReduce notes:

Word count (let the line number be the key):

1 hello how are you
2 how is it going
3 are you happy

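# emit() is supplied by the MapReduce framework.
def map(key, value):
    # key: line number, value: text of that line
    words = value.split()
    # e.g. ["hello", "how", "are", "you"]
    for word in words:
        emit(word, 1)

def reduce(key, values):
    # key: a word, values: the list of 1s emitted for that word
    emit(key, len(values))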

Intermediate results (values grouped by key, as passed to reduce):

hello [1]
how [1,1]
are [1,1]
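
The word count above can be tried locally in plain Python, without Hadoop. This is only a minimal sketch: the names map_fn, reduce_fn and run_mapreduce are made up for this illustration, generators stand in for emit(), and an in-memory dictionary stands in for the shuffle step that a real MapReduce framework performs between map and reduce.

```python
from collections import defaultdict


def map_fn(key, value):
    """Yield (word, 1) for each word in the line; key is the line number."""
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    """Yield (word, count), where values is the list of 1s for that word."""
    yield key, len(values)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    # Shuffle step: collect every emitted value under its key.
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)

    # Reduce step: call the reducer once per distinct key.
    results = {}
    for key, values in grouped.items():
        for out_key, out_value in reducer(key, values):
            results[out_key] = out_value
    return dict(grouped), results


lines = [
    (1, "hello how are you"),
    (2, "how is it going"),
    (3, "are you happy"),
]
grouped, counts = run_mapreduce(lines, map_fn, reduce_fn)
print(grouped)  # values grouped by word, as passed to the reducer
print(counts)   # final word counts
```

Here grouped holds the per-word lists shown in the notes (for example "how" maps to [1, 1]), and counts holds the final totals after the reduce step.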