hadoop - Is there any way to control InputSplit in MapReduce?


I have lots of small (150-300 KB) text files, about 9,000 arriving per hour, and I need to process them with MapReduce. I created a simple MR job that processes the files and produces a single output file. When I ran the job over one hour of data, it took 45 minutes. When I started digging into the reason for the poor performance, I found that the job creates as many input splits as there are files. I'm guessing this is one reason for the poor performance.

Is there a way to control the input splits so that, say, 1,000 files are handled by a single input split/mapper?

Hadoop is designed for a small number of huge files, not the other way around. There are ways around this, such as preprocessing the data (e.g. concatenating the small files into larger ones before the job runs) or using CombineFileInputFormat, which packs many small files into each input split.
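As a rough illustration of the CombineFileInputFormat approach, below is a minimal driver sketch assuming Hadoop 2.x. It uses CombineTextInputFormat (the concrete text subclass of CombineFileInputFormat) and caps the combined split size so that many small files share one mapper. The class name SmallFilesJob and the mapper/reducer classes MyMapper/MyReducer are hypothetical placeholders for your own job.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFilesJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "small-files-job");
        job.setJarByClass(SmallFilesJob.class);

        // Pack many small files into each split instead of one split per file.
        job.setInputFormatClass(CombineTextInputFormat.class);

        // Cap each combined split at 256 MB; with 150-300 KB files this packs
        // on the order of a thousand files per split / per map task.
        CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

        job.setMapperClass(MyMapper.class);    // hypothetical mapper class
        job.setReducerClass(MyReducer.class);  // hypothetical reducer class

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the old mapred API there is an equivalent CombineTextInputFormat under org.apache.hadoop.mapred.lib. Note that each combined split still reads each small file separately, so this fixes the per-task startup overhead but not the per-file NameNode and seek costs; for that, consolidating the files (e.g. into SequenceFiles or HAR archives) is the usual complement.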

