python - Pandas compare each row to reference row - certain columns only
I have the following pandas DataFrame**** in Python.
    temp_fact  oscillops_read         a         b         c         d         e         f         g         h         i         j
 0  a          today           0.710213  0.222015  0.814710  0.597732  0.634099  0.338913  0.452534  0.698082  0.706486  0.433162
 1  b          today           0.653489  0.452543  0.618755  0.555629  0.490342  0.280299  0.026055  0.138876  0.053148  0.899734
 2  a          aactl           0.129211  0.579690  0.641324  0.615772  0.927384  0.199651  0.652395  0.249467  0.262301  0.049795
 3  a          dfe             0.743794  0.355085  0.637794  0.633634  0.810033  0.509244  0.470418  0.972145  0.647222  0.610636
 4  c          real_mt_olv     0.724282  0.332965  0.063078  0.004550  0.585398  0.869376  0.232148  0.630162  0.102206  0.232981
 5  e          q_mont          0.221685  0.224834  0.110734  0.397999  0.814153  0.552924  0.981098  0.536750  0.251941  0.383994
 6  d          dfe             0.655386  0.561297  0.305310  0.140998  0.433054  0.118187  0.479206  0.556546  0.556017  0.025070
 7  f          bryo            0.257884  0.228650  0.413149  0.285651  0.814095  0.275627  0.775620  0.392448  0.827725  0.935581
 8  c          aactl           0.017388  0.133848  0.939049  0.159416  0.923788  0.375638  0.331078  0.939089  0.098718  0.785569
 9  c          today           0.197419  0.595253  0.574718  0.373899  0.363200  0.289378  0.698455  0.252657  0.357485  0.020484
10  c          pars            0.037771  0.683799  0.184114  0.545062  0.857000  0.295918  0.733196  0.613165  0.180642  0.254839
11  b          pars            0.637346  0.090000  0.848710  0.596883  0.027026  0.792180  0.843743  0.461608  0.552165  0.215250
12  b          pars            0.768422  0.017828  0.090141  0.108061  0.456734  0.803175  0.454479  0.501713  0.687016  0.625260
13  e          tomorrow        0.860112  0.532859  0.091641  0.768896  0.635966  0.007211  0.656367  0.053136  0.482367  0.680557
14  d          dfe             0.801734  0.365921  0.243407  0.826373  0.904416  0.062448  0.801726  0.049983  0.433135  0.351150
15  f          q_mont          0.360710  0.330745  0.598830  0.582379  0.828019  0.467044  0.287276  0.470980  0.355386  0.404299
16  d          last_week       0.867126  0.600093  0.813257  0.005423  0.617543  0.657219  0.635255  0.314910  0.016516  0.689257
17  e          last_week       0.551499  0.724981  0.821087  0.175279  0.301397  0.304105  0.379553  0.971244  0.558719  0.154240
18  f          bryo            0.511370  0.208831  0.260223  0.089106  0.121442  0.120513  0.099722  0.750769  0.860541  0.838855
19  e          bryo            0.323441  0.663328  0.951847  0.782042  0.909736  0.512978  0.999549  0.225423  0.789240  0.155898
20  c          tomorrow        0.267086  0.357918  0.562190  0.700404  0.961047  0.513091  0.779268  0.030190  0.460805  0.315814
21  b          tomorrow        0.951356  0.570077  0.867533  0.365708  0.791373  0.232377  0.478656  0.003857  0.805882  0.989754
22  f          today           0.963750  0.118826  0.264858  0.571066  0.761669  0.967419  0.565773  0.468971  0.466120  0.174815
23  b          last_week       0.291186  0.126748  0.154725  0.527029  0.021485  0.224272  0.259218  0.052286  0.205569  0.617701
24  f          aactl           0.269308  0.655920  0.595518  0.404817  0.290342  0.447246  0.627082  0.306856  0.868357  0.979879
I also have a series with one value for each column:

    df_base = df[df['oscillops_read'] == 'last_week']
    # numeric_only=True skips the text columns, which cannot be averaged
    df_base_val = df_base.mean(axis=0, numeric_only=True)

As you can see, this is a pandas Series holding the average of each column over the rows where oscillops_read == 'last_week'. Here is the series:

    [0.56993702256121603, 0.48394061768804786, 0.59635616273775061, 0.23591030688019868, 0.31347492150330231, 0.39519847430740507, 0.42467546792253791, 0.4461465888887961, 0.26026797943899194, 0.48706569569369912]
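As a quick sanity check, the first three baseline values can be reproduced from the three 'last_week' rows alone. This is a minimal sketch: it copies only columns a, b and c of those rows (plus one 'today' row to show the filter working) and assumes a pandas version in which `mean` accepts `numeric_only`.

```python
import pandas as pd

# Rows 16, 17 and 23 (the 'last_week' rows) plus row 0, columns a-c only.
df = pd.DataFrame(
    {
        "oscillops_read": ["last_week", "last_week", "last_week", "today"],
        "a": [0.867126, 0.551499, 0.291186, 0.710213],
        "b": [0.600093, 0.724981, 0.126748, 0.222015],
        "c": [0.813257, 0.821087, 0.154725, 0.814710],
    },
    index=[16, 17, 23, 0],
)

# Keep only the reference rows, then average each numeric column.
df_base = df[df["oscillops_read"] == "last_week"]
df_base_val = df_base.mean(axis=0, numeric_only=True)

print(df_base_val.round(6))
```

The rounded result agrees with the first three entries of the series quoted above (0.569937, 0.483941, 0.596356).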
I have 2 sets of lists:

1. A list of range names:

    range_name_list = ['base', 'curnt', 'prediction', 'graph', 'swg', 'barometer_output', 'test_cntr']

This list gives the values that must be added to the DataFrame df under certain conditions (described below).

2. Seven lists of column names:

    col_1 = list('dfa')
    col_2 = list('acef')
    col_3 = list('cef')
    col_4 = list('abdf')
    col_5 = list('def')
    col_6 = list('ac')
    col_7 = list('abcde')
These are lists of column names. These columns of df must be compared with the average series above. For example, for the 6th list, col_6, columns a and c of each row of the DataFrame df must be compared with columns a and c of the series.
Problem: As mentioned above, I need to compare specific columns of the DataFrame df with the base series df_base_val. The columns to be compared are listed in col_1, col_2, col_3, ..., col_7. Here is what I need to do:
- If, for a row of the DataFrame, the columns listed in col_6 (i.e. columns a and c) are both greater than the base series df_base_val in those 2 columns, then in a new column range, enter the 6th value from the list range_name_list.
Example: use col_6, the 6th list, which has column names a and c.
- Step 1: For row 1 of df, columns a and c are greater than df_base_val[a] and df_base_val[c] respectively.
- Step 2: Thus, for row 1, in the new column range, enter the 6th element from the list range_name_list. The 6th element is barometer_output.
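The check in the two steps above can be sketched for the first row alone. The baseline values are rounded from the series quoted earlier, and the names `row`, `exceeds` and `label` are only illustrative.

```python
import pandas as pd

range_name_list = ['base', 'curnt', 'prediction', 'graph', 'swg',
                   'barometer_output', 'test_cntr']
col_6 = list('ac')

# Baseline averages for columns a and c (rounded to six places).
df_base_val = pd.Series({"a": 0.569937, "c": 0.596356})
# First row of df (index 0), columns a and c only.
row = pd.Series({"a": 0.710213, "c": 0.814710})

# Step 1: the row must exceed the baseline in every column of col_6.
exceeds = (row[col_6] > df_base_val[col_6]).all()
# Step 2: col_6 is the 6th list, so it maps to the 6th name (index 5).
label = range_name_list[5] if exceeds else 'not_in_range'

print(label)  # barometer_output
```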
Example output: After doing this, the 1st row becomes:

    0 a today 0.710213 0.222015 0.814710 0.597732 0.634099 0.338913 0.452534 0.698082 0.706486 0.433162 'barometer_output'
Now, if the row is not greater than the series in columns a and c, and is also not greater than the series in the columns of col_1, col_2, etc., then the range column must be assigned the value 'not_in_range'. In that case, the row will become:

    0 a today 0.710213 0.222015 0.814710 0.597732 0.634099 0.338913 0.452534 0.698082 0.706486 0.433162 'not_in_range'
Simplification and question: In my example:
- I compared only the 1st row with the base series. I need to compare all rows of df individually with the base series and add the appropriate value.
- I used only the 6th list of columns, col_6. Similarly, I need to go through each list of column names: col_1, col_2, ..., col_7.
- If the row being compared is not greater than the series for any of the lists col_1 to col_7, in the specified columns, then the column range must be assigned the value 'not_in_range'.
Is there a way to do this? Maybe using loops?
**** To create the above DataFrame, select it above and copy it. Then use the following code:

    import pandas as pd
    df = pd.read_clipboard()
    print(df)
EDIT: If multiple conditions are met, I need them all listed, i.e. if a row belongs to 'swg' and 'curnt', I need to list both of these in the range column, or create separate range columns, or just Python lists, one for each matching result. Column range1 would list 'swg', column range2 would list 'curnt', etc.
For starters, I would create a dictionary of the condition sets, where the keys can be used as indices into the range_name_list list:

    conditions = {0: list('dfa'),
                  1: list('acef'),
                  2: list('cef'),
                  3: list('abdf'),
                  4: list('def'),
                  5: list('ac'),
                  6: list('abcde')}
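If the col_1 to col_7 lists from the question are already defined, the same dictionary could be built from them rather than retyped. The sketch below also shows a variant keyed directly by range name; `column_lists` and `conditions_by_name` are hypothetical names introduced here, not part of the question.

```python
range_name_list = ['base', 'curnt', 'prediction', 'graph', 'swg',
                   'barometer_output', 'test_cntr']

col_1 = list('dfa')
col_2 = list('acef')
col_3 = list('cef')
col_4 = list('abdf')
col_5 = list('def')
col_6 = list('ac')
col_7 = list('abcde')

# Hypothetical helper list holding the column lists in order.
column_lists = [col_1, col_2, col_3, col_4, col_5, col_6, col_7]

# Same shape as the hand-written dictionary: position -> column names.
conditions = dict(enumerate(column_lists))

# Alternative: key by range name, so no index lookup is needed later.
conditions_by_name = dict(zip(range_name_list, column_lists))

print(conditions[5], conditions_by_name['barometer_output'])
```

Keying by position keeps the link to range_name_list explicit; keying by name removes one lookup at the cost of relying on the two lists having the same length and order.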
The following code will then accomplish what I understand to be the task:

    # Create the range column, to be filled in later.
    df['range'] = '|'
    for index, row in df.iterrows():
        for ix, columns in conditions.items():
            # Create a list of the outcomes of checking whether the
            # value in each condition column is greater than the
            # df_base_val average.
            truths = [row[column] > df_base_val[column] for column in columns]
            # See if all of the checks evaluated to True.
            if all(truths):
                # If so, append the appropriate range name to the
                # 'range' column's value for the current row.
                df.loc[index, 'range'] += range_name_list[ix] + '|'
    # Fill in rows where no conditions were met with 'not_in_range'.
    df.loc[df['range'] == '|', 'range'] = 'not_in_range'
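For larger frames, the row-by-row loop could be swapped for column-wise comparisons. The following is a sketch of that alternative, not the method above: it builds one boolean column per range name and then collects the matching names into a Python list per row, as the question's edit asks for. It uses only rows 0 and 23 and columns a to f of the question's table (the condition lists never touch g to j), with baseline values rounded from the quoted series; `matches` is an illustrative name.

```python
import pandas as pd

range_name_list = ['base', 'curnt', 'prediction', 'graph', 'swg',
                   'barometer_output', 'test_cntr']
conditions = {0: list('dfa'), 1: list('acef'), 2: list('cef'), 3: list('abdf'),
              4: list('def'), 5: list('ac'), 6: list('abcde')}

# Rows 0 and 23 of the question's table, numeric columns a-f only.
df = pd.DataFrame(
    [[0.710213, 0.222015, 0.814710, 0.597732, 0.634099, 0.338913],
     [0.291186, 0.126748, 0.154725, 0.527029, 0.021485, 0.224272]],
    columns=list('abcdef'), index=[0, 23])
# Baseline averages quoted in the question, rounded to six places.
df_base_val = pd.Series(
    [0.569937, 0.483941, 0.596356, 0.235910, 0.313475, 0.395198],
    index=list('abcdef'))

# One boolean column per range name: True where the row beats the
# baseline in every column of that condition.
matches = pd.DataFrame({
    range_name_list[ix]: df[cols].gt(df_base_val[cols], axis=1).all(axis=1)
    for ix, cols in conditions.items()
})

# Collect the matching names per row, falling back to 'not_in_range'.
df['range'] = pd.Series(
    [[name for name, hit in zip(matches.columns, flags) if hit] or ['not_in_range']
     for flags in matches.to_numpy()],
    index=df.index)

print(df['range'])
```

The `matches` frame on its own already gives the "separate columns" layout; the list column is just one way of summarising it.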