bash - Awk sum rows in csv file based on value of three columns -


iam using awk process csv files:

awk 'begin {fs=ofs=";"} (nr==1) {$9="tpmc"; print $0} (nr>1 && nf) {a=$2$5; sum6[a]+=$6; sum7[a]+=$7; sum8[a]+=$8; other[a]=$0} end {for(i in  sum7) {$0=other[i]; $6=sum6[i]; $7=sum7[i]; $8=sum8[i];  $9=(sum8[i]?sum8[i]/sum6[i]:"nan"); print}}' input.csv > output.csv  

it doing sum of rows in columns 6,7,8 , division of sum8/sum6 rows same value in column 2 , 5.

i have 2 questions it
1) need same functionality calculations must done rows same value in columns 2,3 , 5. have tried replace

a=$2$5; 

with

b=$2$3; a=$b$5; 

but giving me wrong numbers.

2) how can delete rows value:

date;dbms;mode;test type;w;time;totaltpcc;neworder tpm 

except first row?

here example of csv.input:

date;dbms;mode;test type;w;time;totaltpcc;neworder tpm tue jun 16 21:08:33 cest 2015;sqlite;in-memory;tpc-c test;1;10;83970;35975 tue jun 16 21:18:43 cest 2015;sqlite;in-memory;tpc-c test;1;10;83470;35790 date;dbms;mode;test type;w;time;totaltpcc;neworder tpm tue jun 16 23:35:35 cest 2015;hsql;in-memory;tpc-c test;1;10;337120;144526 tue jun 16 23:45:44 cest 2015;hsql;in-memory;tpc-c test;1;10;310230;133271 thu jun 18 00:10:45 cest 2015;derby;on-disk;tpc-c test;5;120;64720;27964 thu jun 18 02:41:27 cest 2015;sqlite;on-disk;tpc-c test;1;120;60030;25705 thu jun 18 04:42:14 cest 2015;hsql;on-disk;tpc-c test;1;120;360900;154828    

output.csv should

date;dbms;mode;test type;w;time;totaltpcc;neworder tpm;tpmc tue jun 16 21:08:33 cest 2015;sqlite;in-memory;tpc-c test;1;20;167440;71765;3588.25 tue jun 16 23:35:35 cest 2015;hsql;in-memory;tpc-c test;1;20;647350;277797;13889.85 thu jun 18 00:10:45 cest 2015;derby;on-disk;tpc-c test;5;120;64720;27964;233.03 thu jun 18 02:41:27 cest 2015;sqlite;on-disk;tpc-c test;1;120;60030;25705;214.20 thu jun 18 04:42:14 cest 2015;hsql;on-disk;tpc-c test;1;120;360900;154828;1290.23 

to group columns 2,3, , 5 use a=$2$3$5. delete header rows, add match statement ($1 !~ /^date/)

so whole awk script becomes:

begin {   fs=ofs=";" } (nr==1) {$9="tpmc"; print $0} (nr>1 && nf && ($1 !~ /^date/)) {   a=$2$3$5; sum6[a]+=$6; sum7[a]+=$7; sum8[a]+=$8; other[a]=$0 } end {   for(i in sum7) {     $0=other[i]; $6=sum6[i]; $7=sum7[i]; $8=sum8[i]; $9=(sum8[i]?sum8[i]/sum6[i]:"nan"); print   } } 

Comments

Popular posts from this blog

powershell Start-Process exit code -1073741502 when used with Credential from a windows service environment -

twig - Using Twigbridge in a Laravel 5.1 Package -

c# - LINQ join Entities from HashSet's, Join vs Dictionary vs HashSet performance -