Stuff I do: awk

Showing posts with label awk. Show all posts

Thursday, 31 March 2016

Finding the exact binary library containing the symbols in a directory of binaries

Sometimes while compiling a program you run into an issue of missing symbols; e.g.

Undefined symbols for architecture x86_64:

"cv::MSER::create(int, int, int, double, double, int, double, double, int)", referenced from:

getBlackAndWhiteImage(cv::Mat, int, double, double, double, std::__1::vector<std::__1::vector<cv::Point_<int>, std::__1::allocator<cv::Point_<int> > >, std::__1::allocator<std::__1::vector<cv::Point_<int>, std::__1::allocator<cv::Point_<int> > > > >&, std::__1::vector<cv::Rect_<int>, std::__1::allocator<cv::Rect_<int> > >&) in showimg-53afc0.o

ld: symbol(s) not found for architecture x86_64

Now you need to find the exact binary file in lets say a directory, this command can help you with it:

find /usr/local/lib/libopencv_* | awk '{ print "echo "$1"; nm "$1" | grep -i MSER" }' | sh

Friday, 27 December 2013

Gunzip the *.gz files on hadoop HDFS

If you have some gzipped files (*.gz) on your HDFS and you don't want to bring them on local disk for unzipping you can do it as follows:

hadoop dfs -ls /data/7days/netflow/2013/11/15/*/* | grep -i gz | awk '{print "hadoop dfs -cat "$8" | gunzip | hadoop dfs -put - "substr($8,0,length($8)-3)}'

Thursday, 21 March 2013

Print to dynamic files from AWK cmd

To print the output of AWK to a dynamic file:

head 236_236_raw | awk -F';' '{ print $7,"\t",$6 > ("_"$2"_"$3".txt")}'

If the number of files open are large, you will get the following error:

awk: * makes too many open files

input record number 203, file

source line number 1

To overcome the above problem, you can close the files like this:

head 236_236_raw | awk -F';' '{ print $7,"\t",$6 >> ("_"$2"_"$3".txt"); close("_"$2"_"$3".txt")}'

The important point to note is that now we changed the redirection operator to ">>" from ">". Makes sense ? :)

Monday, 4 February 2013

Using multi-dimensional associative arrays in AWK

Examples:

cat /tmp/33 | awk -F'[\\^\\[\\]]' '{c[$5,$6]+=1}END{for( i in c) {split(i,sep,SUBSEP); print sep[1],sep[2], c[i] ;}}'

Reading:

http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_12.html

Tuesday, 18 December 2012

Using awk to find the sum of a CSV file after group by on a particular column

http://www.theunixschool.com/2012/06/awk-10-examples-to-group-data-in-csv-or.html

I have a CSV file like :

65523 , 100
65522 , 900
65522 , 1800
65522 , 100
65522 , 100
65521 , 500
65521 , 200
65521 , 200

I need to find the sum of the 2nd column based on the grouping by the 1st column, so that the output looks something like:

65523 , 100
65522 , 2900
65521 , 900

SOLUTION:

This can be easily achieved using a single line awk script:

awk -F"," '{a[$1]+=$2;}END{for (i in a)print i, a[i];}' file

Awesome isn't it !! :)

Thursday, 10 May 2012

Awk - with loops and regex

Awk built-in variables:

FNR         The input record number in the current input file.
NF          The number of fields in the current input record.
NR          The total number of input records seen so far.

// an awk command to match the regex "UST" in the line
cat matrace_20120430234915.dat | awk -F'[,]' '{print FNR; i=1;for(i=1;i<(NF);i++) { print i,$i, /UST/;} }'

// an awk command which finds the column matching the regular expression of date in format "yyyy-mm-dd" and splits the date and adds 1 to the date and then prints the same.
awk -F'[,]' '{print FNR; i=1;for(i=1;i<(NF);i++) { print i,$i; if(match($i,/(....)\-(..)\-(..)/)) { split($i,a,"-"); printf("%d-%d-%d",a[1],a[2],(a[3]+1));};} }'

// awk using sprintf
awk -F'[,]' '{x=sprintf("%s",$1); i=1; for(i=2;i<(NF);i++) { if(match($i,/(....)\-(..)\-(..)/)) { split($i,a,"-"); x=sprintf("%s,%s-%s-%d",x,a[1],a[2],(a[3]+1));} else { x=sprintf("%s,%s",x,$i); } } x=sprintf("%s,%s",x,$(NF)); print x;}'