mapreduce - How to get the average number of words in a text in mrjob?


I'm stuck on a simple problem in the mrjob MapReduce framework: I want the average number of words per line in a given paragraph, and so far I have this:

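    from mrjob.job import MRJob

    class LineAverage(MRJob):

        def mapper(self, _, line):
            # Emit the word count of this line, plus a 1 so lines can be counted.
            numwords = len(line.split())
            yield "words", numwords
            yield "lines", 1

        def reducer(self, key, values):
            # Called once per key: yields the total words and the total lines.
            yield key, sum(values)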

With this code, after the reduce step I get the total number of lines and the total number of words in the text, but I don't know how to compute the average from them:

words/totaloflines 

I'm a newbie in this model of programming, so if you can illustrate it with an example it would be appreciated.

In the meantime, thank you for your attention and participation.

After all, the answer was simple: I was already sending the reducer a number of values equal to the number of lines. So, in the reducer, I only had to count the number of values for the key.

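    from mrjob.job import MRJob

    class LineAverage(MRJob):

        def mapper(self, _, line):
            # One ("words", count) pair per input line.
            numwords = len(line.split())
            yield "words", numwords

        def reducer(self, key, values):
            # The number of values equals the number of lines.
            totall, totalw = 0, 0
            for i in values:
                totall += 1
                totalw += i
            yield "avg", totalw / float(totall)

    if __name__ == '__main__':
        LineAverage.run()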

So the mapper sends one pair ("words", x) for each line, and the shuffle process groups them into a single tuple, ("words": x1, x2, x3, ..., xnumberoflines), which is the input to the reducer. I just have to count the number of values for the key and that's it: I've got the number of lines.
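To make the shuffle step concrete, here is a minimal sketch that imitates the map, shuffle and reduce phases in plain Python, without mrjob. The function run_job and the sample text are made up for illustration; the real framework does the grouping for you.

```python
from collections import defaultdict

def mapper(line):
    # Same idea as the mrjob mapper: one ("words", count) pair per line.
    yield "words", len(line.split())

def reducer(key, values):
    # Count the values (= number of lines) while summing them (= number of words).
    total_lines, total_words = 0, 0
    for n in values:
        total_lines += 1
        total_words += n
    yield "avg", total_words / float(total_lines)

def run_job(lines):
    # Hypothetical stand-in for the shuffle: group every mapped value by key.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    results = []
    for key, values in grouped.items():
        results.extend(reducer(key, iter(values)))
    return results

text = ["the quick brown fox", "jumps over", "the lazy dog"]
print(run_job(text))  # 9 words over 3 lines -> [('avg', 3.0)]
```

Note that an empty line still counts as a line here, with 0 words, so it lowers the average.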

I hope this is helpful to someone.
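One possible refinement, offered only as a sketch: the reducer above cannot be reused as a combiner, because an average of partial averages is not the overall average. If the mapper emits (words, 1) pairs instead, partial sums can be merged safely at any stage. mrjob does let a job define a combiner method, but the code below is plain Python with made-up names (combine, split_a, split_b) so the idea can be tried without the framework.

```python
def mapper(line):
    # Emit (word_count, line_count) so that partial results can be added up.
    yield "words", (len(line.split()), 1)

def combine(key, pairs):
    # Merge partial (words, lines) pairs; safe to apply any number of times.
    total_words, total_lines = 0, 0
    for words, lines in pairs:
        total_words += words
        total_lines += lines
    yield key, (total_words, total_lines)

def reducer(key, pairs):
    # Final step: merge whatever is left, then divide once.
    for _, (words, lines) in combine(key, pairs):
        yield "avg", words / float(lines)

# Two hypothetical input splits, each combined on its own before the reduce.
split_a = ["a b c d", "e f"]   # 6 words, 2 lines
split_b = ["g"]                # 1 word, 1 line

partials = []
for split in (split_a, split_b):
    mapped = [value for line in split for _, value in mapper(line)]
    partials.extend(value for _, value in combine("words", mapped))

result = list(reducer("words", partials))
print(result)  # 7 words over 3 lines
```

Averaging the two per-split averages (3.0 and 1.0) would give 2.0, while the true average is 7/3, which is why the division is left to the very end.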

