Friday 27 December 2013

Gunzip the *.gz files on hadoop HDFS

If you have some gzipped files (*.gz) on your HDFS and you don't want to bring them on local disk for unzipping you can do it as follows:

hadoop dfs -ls /data/7days/netflow/2013/11/15/*/* | grep -i gz | awk '{print "hadoop dfs -cat "$8"  | gunzip | hadoop dfs -put - "substr($8,0,length($8)-3)}'

7 comments:

  1. Pretty blog, so many ideas in a single site, thanks for the informative article, keep updating more article.

    ReplyDelete