python - Pandas: compare each row to a reference row (certain columns only)


I have the following pandas DataFrame**** in Python.

   temp_fact oscillops_read         a         b         c         d         e         f         g         h         i         j
0                     today  0.710213  0.222015  0.814710  0.597732  0.634099  0.338913  0.452534  0.698082  0.706486  0.433162
1          b          today  0.653489  0.452543  0.618755  0.555629  0.490342  0.280299  0.026055  0.138876  0.053148  0.899734
2                     aactl  0.129211  0.579690  0.641324  0.615772  0.927384  0.199651  0.652395  0.249467  0.262301  0.049795
3                       dfe  0.743794  0.355085  0.637794  0.633634  0.810033  0.509244  0.470418  0.972145  0.647222  0.610636
4          c    real_mt_olv  0.724282  0.332965  0.063078  0.004550  0.585398  0.869376  0.232148  0.630162  0.102206  0.232981
5          e         q_mont  0.221685  0.224834  0.110734  0.397999  0.814153  0.552924  0.981098  0.536750  0.251941  0.383994
6          d            dfe  0.655386  0.561297  0.305310  0.140998  0.433054  0.118187  0.479206  0.556546  0.556017  0.025070
7          f           bryo  0.257884  0.228650  0.413149  0.285651  0.814095  0.275627  0.775620  0.392448  0.827725  0.935581
8          c          aactl  0.017388  0.133848  0.939049  0.159416  0.923788  0.375638  0.331078  0.939089  0.098718  0.785569
9          c          today  0.197419  0.595253  0.574718  0.373899  0.363200  0.289378  0.698455  0.252657  0.357485  0.020484
10         c           pars  0.037771  0.683799  0.184114  0.545062  0.857000  0.295918  0.733196  0.613165  0.180642  0.254839
11         b           pars  0.637346  0.090000  0.848710  0.596883  0.027026  0.792180  0.843743  0.461608  0.552165  0.215250
12         b           pars  0.768422  0.017828  0.090141  0.108061  0.456734  0.803175  0.454479  0.501713  0.687016  0.625260
13         e       tomorrow  0.860112  0.532859  0.091641  0.768896  0.635966  0.007211  0.656367  0.053136  0.482367  0.680557
14         d            dfe  0.801734  0.365921  0.243407  0.826373  0.904416  0.062448  0.801726  0.049983  0.433135  0.351150
15         f         q_mont  0.360710  0.330745  0.598830  0.582379  0.828019  0.467044  0.287276  0.470980  0.355386  0.404299
16         d      last_week  0.867126  0.600093  0.813257  0.005423  0.617543  0.657219  0.635255  0.314910  0.016516  0.689257
17         e      last_week  0.551499  0.724981  0.821087  0.175279  0.301397  0.304105  0.379553  0.971244  0.558719  0.154240
18         f           bryo  0.511370  0.208831  0.260223  0.089106  0.121442  0.120513  0.099722  0.750769  0.860541  0.838855
19         e           bryo  0.323441  0.663328  0.951847  0.782042  0.909736  0.512978  0.999549  0.225423  0.789240  0.155898
20         c       tomorrow  0.267086  0.357918  0.562190  0.700404  0.961047  0.513091  0.779268  0.030190  0.460805  0.315814
21         b       tomorrow  0.951356  0.570077  0.867533  0.365708  0.791373  0.232377  0.478656  0.003857  0.805882  0.989754
22         f          today  0.963750  0.118826  0.264858  0.571066  0.761669  0.967419  0.565773  0.468971  0.466120  0.174815
23         b      last_week  0.291186  0.126748  0.154725  0.527029  0.021485  0.224272  0.259218  0.052286  0.205569  0.617701
24         f          aactl  0.269308  0.655920  0.595518  0.404817  0.290342  0.447246  0.627082  0.306856  0.868357  0.979879

I have a series of values, one for each column:

df_base = df[df['oscillops_read'] == 'last_week']
df_base_val = df_base.mean(axis=0, numeric_only=True)

As you can see, this is a pandas Series holding the average of each column (a through j) over the rows where oscillops_read == 'last_week'. Here is the series:

[0.56993702256121603, 0.48394061768804786, 0.59635616273775061, 0.23591030688019868, 0.31347492150330231, 0.39519847430740507, 0.42467546792253791, 0.4461465888887961, 0.26026797943899194, 0.48706569569369912] 
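As a check, the base series can be reproduced from just the three last_week rows of the table (index 16, 17 and 23). This minimal sketch hard-codes those rows. It passes numeric_only=True because the two text columns cannot be averaged; in pandas 2.x, mean() raises a TypeError on them instead of silently dropping them as older versions did.

```python
import pandas as pd

# The three 'last_week' rows of the question's table (index 16, 17 and 23).
value_cols = list('abcdefghij')
rows = {
    16: ['d', 'last_week', 0.867126, 0.600093, 0.813257, 0.005423, 0.617543,
         0.657219, 0.635255, 0.314910, 0.016516, 0.689257],
    17: ['e', 'last_week', 0.551499, 0.724981, 0.821087, 0.175279, 0.301397,
         0.304105, 0.379553, 0.971244, 0.558719, 0.154240],
    23: ['b', 'last_week', 0.291186, 0.126748, 0.154725, 0.527029, 0.021485,
         0.224272, 0.259218, 0.052286, 0.205569, 0.617701],
}
df = pd.DataFrame.from_dict(
    rows, orient='index', columns=['temp_fact', 'oscillops_read'] + value_cols
)

df_base = df[df['oscillops_read'] == 'last_week']
# numeric_only=True skips temp_fact and oscillops_read, which cannot be averaged.
df_base_val = df_base.mean(numeric_only=True)
print(df_base_val.round(6))
```

Because the table values are rounded to 6 decimals, the result agrees with the series above to within about 1e-6.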

I also have 2 lists:

1.

range_name_list = ['base','curnt','prediction','graph','swg','barometer_output','test_cntr'] 

This list gives the values that must be added to the DataFrame df under certain conditions (described below).

2.

col_1 = list('dfa')
col_2 = list('acef')
col_3 = list('cef')
col_4 = list('abdf')
col_5 = list('def')
col_6 = list('ac')
col_7 = list('abcde')

These are lists of column names. These columns of df must be compared to the average series above. For example, for the 6th list, col_6, columns a and c of each row of df must be compared to columns a and c of the series.
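Since the n-th column list goes with the n-th name in range_name_list, one way to avoid juggling seven separate variables is to zip them into a dictionary. A small sketch; col_lists and conditions are illustrative names, not part of the original code.

```python
# The two lists from the question; col_lists[n - 1] corresponds to col_n.
range_name_list = ['base', 'curnt', 'prediction', 'graph', 'swg',
                   'barometer_output', 'test_cntr']
col_lists = [list('dfa'), list('acef'), list('cef'), list('abdf'),
             list('def'), list('ac'), list('abcde')]

# Pair the n-th name with the n-th column list, e.g. col_6 -> 'barometer_output'.
conditions = dict(zip(range_name_list, col_lists))

for name, cols in conditions.items():
    print(f'{name:>16}: {cols}')
```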

Problem: as mentioned above, I need to compare specific columns of the DataFrame df to the base series df_base_val. The columns to be compared are listed in col_1, col_2, col_3, ..., col_7. Here is what I need to do:

  • If a row of the DataFrame is greater than the base series df_base_val in every column listed in one of these lists (e.g. for col_6, in both columns a and c), then enter the matching value of range_name_list in a new column, range (for col_6, that is the 6th value).

Example: take col_6, the 6th list, which has the column names a and c.

  1. Step 1: in the 1st row of df (index 0), columns a and c are greater than df_base_val['a'] and df_base_val['c'] respectively.
  2. Step 2: thus, for this row, enter the 6th element of range_name_list in the new column range. The 6th element is 'barometer_output'.
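The col_6 check in these two steps can be done for every row at once instead of row by row. A minimal sketch, assuming only columns a to f (none of the lists refers to g to j) and using rows 0, 1 and 8 of the table: comparing a DataFrame with a Series aligns the Series index with the DataFrame's columns, and .all(axis=1) requires the row to be greater in both a and c.

```python
import pandas as pd

# last_week averages from the question, columns a-f only
# (none of the column lists refers to g-j).
df_base_val = pd.Series(
    [0.56993702256121603, 0.48394061768804786, 0.59635616273775061,
     0.23591030688019868, 0.31347492150330231, 0.39519847430740507],
    index=list('abcdef'),
)

# Rows 0, 1 and 8 of the question's table, columns a-f only.
df = pd.DataFrame(
    [[0.710213, 0.222015, 0.814710, 0.597732, 0.634099, 0.338913],
     [0.653489, 0.452543, 0.618755, 0.555629, 0.490342, 0.280299],
     [0.017388, 0.133848, 0.939049, 0.159416, 0.923788, 0.375638]],
    index=[0, 1, 8],
    columns=list('abcdef'),
)

col_6 = list('ac')
# DataFrame > Series aligns the Series index (a, c) with the DataFrame columns,
# so each cell is compared with the base value of its own column.
# .all(axis=1) is True only if the row is greater in every listed column.
mask = (df[col_6] > df_base_val[col_6]).all(axis=1)
df['range'] = mask.map({True: 'barometer_output', False: 'not_in_range'})
print(df[['a', 'c', 'range']])
```

Row 8 fails because its a value (0.017388) is below the base value for a.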

Example output: after doing this, the 1st row becomes:

0                   today  0.710213  0.222015  0.814710  0.597732  0.634099  0.338913  0.452534  0.698082  0.706486  0.433162  'barometer_output' 

Now, if the row is not greater than the series in columns a and c, and is also not greater than the series in the columns of col_1, col_2, etc., then the range column must be assigned the value 'not_in_range'. In that case, the row would become:

0                   today  0.710213  0.222015  0.814710  0.597732  0.634099  0.338913  0.452534  0.698082  0.706486  0.433162  'not_in_range' 

Simplification and question: in my example:

  1. I compared only the 1st row to the base series. I need to compare every row of df individually to the base series and add the appropriate value.

  2. I used only the 6th list of columns, col_6. Similarly, I need to go through each list of column names: col_1, col_2, ..., col_7.

  3. If the row being compared is not greater than the series in the specified columns for any of the lists col_1 to col_7, then the range column must be assigned the value 'not_in_range'.
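Extending this to all seven lists, one option (a sketch, and only one reading of the requirement) is to build one boolean mask per list and let numpy.select pick a label. np.select keeps only the first matching list in col_1 to col_7 order, so a row that matches several lists gets a single label; the question's later edit asks for all matches instead. The sketch assumes columns a to f only and uses rows 3, 8, 19 and 21 of the table, because they match different numbers of lists.

```python
import numpy as np
import pandas as pd

range_name_list = ['base', 'curnt', 'prediction', 'graph', 'swg',
                   'barometer_output', 'test_cntr']
col_lists = [list('dfa'), list('acef'), list('cef'), list('abdf'),
             list('def'), list('ac'), list('abcde')]

# last_week averages from the question, columns a-f only.
df_base_val = pd.Series(
    [0.56993702256121603, 0.48394061768804786, 0.59635616273775061,
     0.23591030688019868, 0.31347492150330231, 0.39519847430740507],
    index=list('abcdef'),
)

# Rows 3, 8, 19 and 21 of the question's table, columns a-f only.
df = pd.DataFrame(
    [[0.743794, 0.355085, 0.637794, 0.633634, 0.810033, 0.509244],
     [0.017388, 0.133848, 0.939049, 0.159416, 0.923788, 0.375638],
     [0.323441, 0.663328, 0.951847, 0.782042, 0.909736, 0.512978],
     [0.951356, 0.570077, 0.867533, 0.365708, 0.791373, 0.232377]],
    index=[3, 8, 19, 21],
    columns=list('abcdef'),
)

# One boolean mask per list: True where the row beats the base in all its columns.
condlist = [(df[cols] > df_base_val[cols]).all(axis=1) for cols in col_lists]

# np.select returns the label of the first True mask (col_1 first)
# and 'not_in_range' where no mask is True.
df['range'] = np.select(condlist, range_name_list, default='not_in_range')
print(df['range'])
```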

Is there a way to do this? Maybe using loops?

**** To create the above DataFrame, select the table above and copy it, then use the following code:

import pandas as pd
df = pd.read_clipboard()
print(df)

Edit: if multiple conditions are met, all of them need to be listed. I.e. if a row belongs to both 'swg' and 'curnt', I need to list both of these in the range column, or create a separate range column (or a Python list) for each matching result, e.g. column range1 lists 'swg', column range2 lists 'curnt', etc.
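To list every matching name, the per-list masks can be collected into a boolean DataFrame with one column per range name. This sketch (assuming columns a to f only and rows 3, 8, 19 and 21 of the table) stores a Python list per row, then spreads those lists into range1, range2, ... columns, with shorter lists padded with missing values.

```python
import pandas as pd

range_name_list = ['base', 'curnt', 'prediction', 'graph', 'swg',
                   'barometer_output', 'test_cntr']
col_lists = [list('dfa'), list('acef'), list('cef'), list('abdf'),
             list('def'), list('ac'), list('abcde')]

# last_week averages from the question, columns a-f only.
df_base_val = pd.Series(
    [0.56993702256121603, 0.48394061768804786, 0.59635616273775061,
     0.23591030688019868, 0.31347492150330231, 0.39519847430740507],
    index=list('abcdef'),
)

# Rows 3, 8, 19 and 21 of the question's table, columns a-f only.
df = pd.DataFrame(
    [[0.743794, 0.355085, 0.637794, 0.633634, 0.810033, 0.509244],
     [0.017388, 0.133848, 0.939049, 0.159416, 0.923788, 0.375638],
     [0.323441, 0.663328, 0.951847, 0.782042, 0.909736, 0.512978],
     [0.951356, 0.570077, 0.867533, 0.365708, 0.791373, 0.232377]],
    index=[3, 8, 19, 21],
    columns=list('abcdef'),
)

# Boolean table: one column per range name, one row per row of df.
hits = pd.DataFrame(
    {name: (df[cols] > df_base_val[cols]).all(axis=1)
     for name, cols in zip(range_name_list, col_lists)}
)

# A Python list of every matching name per row; ['not_in_range'] when none match.
matches = [
    [name for name, ok in zip(hits.columns, row) if ok] or ['not_in_range']
    for row in hits.to_numpy()
]
df['range'] = pd.Series(matches, index=df.index)

# Alternatively, one column per match: range1, range2, ...
wide = pd.DataFrame(matches, index=df.index)
wide.columns = [f'range{k + 1}' for k in range(wide.shape[1])]
print(df.join(wide))
```

Row 3 matches five lists, so five rangeN columns are created; rows with fewer matches leave the extra columns empty.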

For starters, create a dictionary of condition sets whose keys can be used as indices into the range_name_list list:

conditions = {0: list('dfa'),
              1: list('acef'),
              2: list('cef'),
              3: list('abdf'),
              4: list('def'),
              5: list('ac'),
              6: list('abcde')}

The following code accomplishes the task as I understand it:

# Create the range column, to be filled in later.
df['range'] = '|'
for index, row in df.iterrows():
    for ix, cols in conditions.items():
        # Create a list of outcomes of checking whether the value
        # in each condition column is greater than the
        # df_base_val average.
        truths = [row[column] > df_base_val[column] for column in cols]
        # See if all checks evaluated to True.
        if all(truths):
            # If so, append the appropriate range name to the
            # 'range' column's value for the current row.
            df.loc[index, 'range'] = df.loc[index, 'range'] + range_name_list[ix] + '|'
# Fill in rows where no conditions were met with 'not_in_range'.
df.loc[df['range'] == '|', 'range'] = 'not_in_range'
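The loop above calls iterrows() and checks every condition cell by cell, which gets slow on large frames. A sketch that does all the comparisons column-wise and should produce the same '|'-delimited strings, assuming only columns a to f and rows 3, 8, 19 and 21 of the table; only the final string building still loops in Python.

```python
import pandas as pd

range_name_list = ['base', 'curnt', 'prediction', 'graph', 'swg',
                   'barometer_output', 'test_cntr']
col_lists = [list('dfa'), list('acef'), list('cef'), list('abdf'),
             list('def'), list('ac'), list('abcde')]

# last_week averages from the question, columns a-f only.
df_base_val = pd.Series(
    [0.56993702256121603, 0.48394061768804786, 0.59635616273775061,
     0.23591030688019868, 0.31347492150330231, 0.39519847430740507],
    index=list('abcdef'),
)

# Rows 3, 8, 19 and 21 of the question's table, columns a-f only.
df = pd.DataFrame(
    [[0.743794, 0.355085, 0.637794, 0.633634, 0.810033, 0.509244],
     [0.017388, 0.133848, 0.939049, 0.159416, 0.923788, 0.375638],
     [0.323441, 0.663328, 0.951847, 0.782042, 0.909736, 0.512978],
     [0.951356, 0.570077, 0.867533, 0.365708, 0.791373, 0.232377]],
    index=[3, 8, 19, 21],
    columns=list('abcdef'),
)

# All per-list comparisons at once: one boolean column per range name.
hits = pd.DataFrame(
    {name: (df[cols] > df_base_val[cols]).all(axis=1)
     for name, cols in zip(range_name_list, col_lists)}
)

# Same format as the loop: a leading '|' followed by 'name|' for each match.
df['range'] = [
    '|' + ''.join(name + '|' for name, ok in zip(hits.columns, row) if ok)
    for row in hits.to_numpy()
]
# Rows where nothing matched still hold just '|'.
df.loc[df['range'] == '|', 'range'] = 'not_in_range'
print(df['range'])
```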
