bash - Awk sum rows in csv file based on value of three columns -
iam using awk process csv files:
awk 'begin {fs=ofs=";"} (nr==1) {$9="tpmc"; print $0} (nr>1 && nf) {a=$2$5; sum6[a]+=$6; sum7[a]+=$7; sum8[a]+=$8; other[a]=$0} end {for(i in sum7) {$0=other[i]; $6=sum6[i]; $7=sum7[i]; $8=sum8[i]; $9=(sum8[i]?sum8[i]/sum6[i]:"nan"); print}}' input.csv > output.csv
it doing sum of rows in columns 6,7,8 , division of sum8/sum6 rows same value in column 2 , 5.
i have 2 questions it
1) need same functionality calculations must done rows same value in columns 2,3 , 5. have tried replace
a=$2$5;
with
b=$2$3; a=$b$5;
but giving me wrong numbers.
2) how can delete rows value:
date;dbms;mode;test type;w;time;totaltpcc;neworder tpm
except first row?
here example of csv.input:
date;dbms;mode;test type;w;time;totaltpcc;neworder tpm tue jun 16 21:08:33 cest 2015;sqlite;in-memory;tpc-c test;1;10;83970;35975 tue jun 16 21:18:43 cest 2015;sqlite;in-memory;tpc-c test;1;10;83470;35790 date;dbms;mode;test type;w;time;totaltpcc;neworder tpm tue jun 16 23:35:35 cest 2015;hsql;in-memory;tpc-c test;1;10;337120;144526 tue jun 16 23:45:44 cest 2015;hsql;in-memory;tpc-c test;1;10;310230;133271 thu jun 18 00:10:45 cest 2015;derby;on-disk;tpc-c test;5;120;64720;27964 thu jun 18 02:41:27 cest 2015;sqlite;on-disk;tpc-c test;1;120;60030;25705 thu jun 18 04:42:14 cest 2015;hsql;on-disk;tpc-c test;1;120;360900;154828
output.csv should
date;dbms;mode;test type;w;time;totaltpcc;neworder tpm;tpmc tue jun 16 21:08:33 cest 2015;sqlite;in-memory;tpc-c test;1;20;167440;71765;3588.25 tue jun 16 23:35:35 cest 2015;hsql;in-memory;tpc-c test;1;20;647350;277797;13889.85 thu jun 18 00:10:45 cest 2015;derby;on-disk;tpc-c test;5;120;64720;27964;233.03 thu jun 18 02:41:27 cest 2015;sqlite;on-disk;tpc-c test;1;120;60030;25705;214.20 thu jun 18 04:42:14 cest 2015;hsql;on-disk;tpc-c test;1;120;360900;154828;1290.23
to group columns 2,3, , 5 use a=$2$3$5
. delete header rows, add match statement ($1 !~ /^date/)
so whole awk script becomes:
begin { fs=ofs=";" } (nr==1) {$9="tpmc"; print $0} (nr>1 && nf && ($1 !~ /^date/)) { a=$2$3$5; sum6[a]+=$6; sum7[a]+=$7; sum8[a]+=$8; other[a]=$0 } end { for(i in sum7) { $0=other[i]; $6=sum6[i]; $7=sum7[i]; $8=sum8[i]; $9=(sum8[i]?sum8[i]/sum6[i]:"nan"); print } }
Comments
Post a Comment