hadoop - Checking filesize and its distribution in HDFS


Is it possible to find out a file's size in blocks, and how those blocks are distributed across the datanodes, in Hadoop?

I am currently using:

    frolo@a11:~/hadoop> $HADOOP_HOME/bin/hadoop dfs -stat "%b %o %r %n" /user/frolo/input/rmat-*
    318339 67108864 1 rmat-10.0
    392835957 67108864 1 rmat-20.0

which does not show the actual number of blocks created after uploading a file to HDFS. I also don't know of any way to find out how they are distributed.

Thanks, Alex

The %r in the stat command shows the replication factor of the queried file. If it is 1, that means there is only a single replica across the cluster of the blocks belonging to the file. The hadoop fs -ls output also shows this value for listed files as one of its numeric columns, since the replication factor is a per-file FS attribute.

If you are looking to find where the blocks reside instead, you are looking for hdfs fsck (or hadoop fsck if you are using a dated release). The command below, for example, will let you see the list of block IDs and their respective sets of resident locations for a file:

hdfs fsck /user/frolo/input/rmat-10.0 -files -blocks -locations
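As a cross-check on what fsck reports, the block count can also be estimated from the %b (size in bytes) and %o (block size) values that -stat already prints. The sketch below is only an illustration: the helper names block_count and parse_stat_line are made up for this example, and it assumes the usual layout in which every block except the last is completely full. Treat fsck as the authoritative answer.

```python
# Estimate HDFS block counts from `hadoop dfs -stat "%b %o %r %n"` output.
# Assumption: each file is split into full blocks plus one final partial block.

def block_count(size_bytes: int, block_size: int) -> int:
    """Return the number of blocks needed for size_bytes (ceiling division)."""
    if size_bytes == 0:
        return 0  # an empty file has no blocks
    return (size_bytes + block_size - 1) // block_size


def parse_stat_line(line: str) -> dict:
    """Parse one '%b %o %r %n' line into a small summary dict (hypothetical helper)."""
    size, block_size, replication, name = line.split(maxsplit=3)
    size, block_size, replication = int(size), int(block_size), int(replication)
    blocks = block_count(size, block_size)
    return {
        "name": name,
        "blocks": blocks,
        # Each block is stored `replication` times across the cluster.
        "block_replicas": blocks * replication,
    }


# The two lines of stat output quoted in the question.
stat_output = """318339 67108864 1 rmat-10.0
392835957 67108864 1 rmat-20.0"""

for line in stat_output.splitlines():
    info = parse_stat_line(line)
    print(f"{info['name']}: {info['blocks']} block(s), "
          f"{info['block_replicas']} block replica(s) in total")
```

With the 64 MB block size shown in the question, this gives 1 block for rmat-10.0 and 6 blocks for rmat-20.0. With a replication factor of 1, each of those blocks lives on exactly one datanode.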

