Hadoop - Is there any way to control input splits in MapReduce?
I have lots of small (150-300 KB) text files, about 9000 per hour, and I need to process them with MapReduce. I wrote a simple MR job that processes the files and creates a single output file. When I ran the job on one hour of data, it took 45 minutes. I started digging into the reason for the poor performance and found that the job creates as many input splits as there are files. I am guessing this is one reason for the poor performance.
Is there a way to control the input splits so that, say, 1000 files are handled by one input split (one map task)?
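Hadoop is designed for a small number of huge files, not the other way around. There are ways around this, such as preprocessing the data (for example, merging the small files before the job runs) or using CombineFileInputFormat.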
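To illustrate why combining helps, here is a rough sketch in plain Java (no Hadoop dependency) of packing many small files into a few splits under a size cap. It is a simplified greedy model, not Hadoop's actual algorithm: the real CombineFileInputFormat also groups blocks by node and rack locality. The class name, the 200 KB file size, and the cap are made up for the example. In a real job on Hadoop 2.x or later, the usual change is to set the input format to CombineTextInputFormat (in org.apache.hadoop.mapreduce.lib.input) and cap the split size, e.g. with FileInputFormat.setMaxInputSplitSize(job, bytes); check this against the Hadoop version you run.

```java
import java.util.ArrayList;
import java.util.List;

class SplitPackingSketch {

    /**
     * Greedily packs file sizes (in bytes) into groups whose total stays at or
     * below maxSplitBytes. Each group stands for one combined input split,
     * i.e. one map task. A file larger than the cap gets a group of its own.
     */
    public static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current group if this file would push it over the cap.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical workload modeled on the question: 9000 files of 200 KB each.
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 9000; i++) {
            sizes.add(200L * 1024);
        }
        // Assumed cap: room for exactly 1000 such files per split.
        long cap = 200L * 1024 * 1000;
        List<List<Long>> splits = pack(sizes, cap);
        System.out.println("splits with one file per split: " + sizes.size());
        System.out.println("splits after packing: " + splits.size());
    }
}
```

With these assumed numbers, the one-split-per-file layout needs 9000 map tasks, while the packed layout needs 9, each covering 1000 files. Most of the saving comes from no longer paying task startup cost once per tiny file.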