Friday 27 December 2013

Gunzip the *.gz files on hadoop HDFS

If you have some gzipped files (*.gz) on your HDFS and you don't want to bring them on local disk for unzipping you can do it as follows:

hadoop dfs -ls /data/7days/netflow/2013/11/15/*/* | grep -i gz | awk '{print "hadoop dfs -cat "$8"  | gunzip | hadoop dfs -put - "substr($8,0,length($8)-3)}'