mapreduce - How to get the average number of words in a text in mrjob?
I'm stuck on a simple problem in the mrjob MapReduce framework: I want the average number of words per line in a given paragraph, and I've got this:
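from mrjob.job import MRJob

class lineaverage(MRJob):
    def mapper(self, _, line):
        # Emit the word count of this line, plus a 1 to count the line itself.
        numwords = len(line.split())
        yield "words", numwords
        yield "lines", 1

    def reducer(self, key, values):
        # Called once per key, so the two totals never meet in one call.
        yield key, sum(values)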
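With this code, after the reduce step I get the total number of lines and the total number of words in the text, but I don't know how to compute the average:
words / total of lines
I'm a newbie in this model of programming, so if you can illustrate it with an example it'll be appreciated.
In the meantime, thank you for your attention and participation.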
After all, the answer was simple: the mapper sends the reducer a number of values equal to the number of lines. So, in the reducer, I just had to count the number of values for the key.
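from mrjob.job import MRJob

class lineaverage(MRJob):
    def mapper(self, _, line):
        # One ("words", count) pair per input line.
        numwords = len(line.split())
        yield "words", numwords

    def reducer(self, key, values):
        i, totall, totalw = 0, 0, 0
        for i in values:
            totall += 1   # one value arrives per line, so this counts lines
            totalw += i   # running total of words
        yield "avg", totalw / float(totall)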
So the mapper sends one pair ("words", x) for each line, and the shuffle process groups them into a single tuple, ("words": x1, x2, x3, ..., xnumberoflines), which is the input to the reducer. I just have to count the number of values for the key, and that's it: that gives me the number of lines.
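To make the map, shuffle and reduce steps described above easier to follow, here is a minimal sketch in plain Python that imitates them without mrjob. The helper names `shuffle` and `run` and the sample lines are assumptions made up for illustration; they are not part of mrjob, and the real framework does the grouping for you.

```python
from collections import defaultdict

def mapper(line):
    # Same idea as the mrjob mapper: one ("words", count) pair per line.
    yield "words", len(line.split())

def shuffle(pairs):
    # Hypothetical stand-in for the framework's shuffle step:
    # collect every value emitted under the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    # One value per line, so counting the values gives the number of lines.
    totall, totalw = 0, 0
    for count in values:
        totall += 1
        totalw += count
    yield "avg", totalw / float(totall)

def run(lines):
    # Hypothetical driver: map every line, group by key, reduce each group.
    pairs = [pair for line in lines for pair in mapper(line)]
    results = []
    for key, values in shuffle(pairs).items():
        results.extend(reducer(key, values))
    return results

sample = ["one two three", "four five", "six"]
print(run(sample))  # [('avg', 2.0)]
```

Here the three sample lines have 3, 2 and 1 words, so the reducer sees ("words", [3, 2, 1]) and yields 6 / 3 = 2.0.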
Hope this is helpful to someone.