Hadoop - map/reduce functionality
I started looking into Hadoop and, after some struggles, got the WordCount example working on a cluster (two datanodes).
But I have a question about the map/reduce functionality. I read that during the map step, the input files/data are transformed into a form that can be processed efficiently during the reduce step.
Let's say I have 4 input files (input1.txt, input2.txt, input3.txt, input4.txt) and I want to read those input files and transform the data into a form for reduce.
So here is my question: if I run the application (WordCount) in a cluster environment (two datanodes), are all 4 input files read on each datanode, or are 2 input files read on each datanode? And how can I check which file is read on which datanode?
Or does map (on each datanode) read the files as some kind of blocks instead of reading individual files?
You see, Hadoop works on the basis of blocks rather than whole files. The chunk of input handed to a single mapper is known as an InputSplit, and with the default FileInputFormat a split never spans more than one file. So if each of your 4 files is smaller than the block size (128 MB, or 64 MB depending on the Hadoop version), each file becomes a single InputSplit and is read by one mapper; the framework tries to schedule each mapper on a node that holds that split's block, so your 4 mappers can land on either of the two datanodes. To check which file a mapper is reading, you can cast the split inside the mapper: ((FileSplit) context.getInputSplit()).getPath(). Hope that answers the question.
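To make the split-per-file behavior concrete, here is a minimal sketch of the default FileInputFormat split arithmetic (a simplified model, not Hadoop's actual code; the class and method names are made up for illustration): splits never cross file boundaries, and each file is chopped into block-sized pieces.

```java
public class SplitCalc {
    // Simplified model of default FileInputFormat splitting:
    // a file smaller than one block yields exactly one split,
    // a larger file yields ceil(fileLen / blockSize) splits.
    static int splitsForFile(long fileLen, long blockSize) {
        if (fileLen == 0) return 1; // an empty file still gets one (empty) split
        return (int) Math.ceil((double) fileLen / blockSize);
    }

    static int totalSplits(long[] fileLens, long blockSize) {
        int total = 0;
        for (long len : fileLens) {
            total += splitsForFile(len, blockSize); // splits never span files
        }
        return total;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MB default block size
        // four small files, like input1.txt .. input4.txt in the question
        long[] files = {10_000, 20_000, 30_000, 40_000};
        System.out.println(totalSplits(files, blockSize));          // 4 splits -> 4 mappers
        System.out.println(splitsForFile(300L * 1024 * 1024, blockSize)); // one 300 MB file -> 3 splits
    }
}
```

So with 4 small files you get 4 mappers (one per file), not 1 mapper reading all 4; only within a single large file does the block size determine how many mappers run.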