Python Pandas filter rows based on the string value of an entry -
i have excel sheet (bloomberg data license output) read in with
import pandas pd raw_data = pd.read_excel('my-file.xlsx')
there 1 column (start-of-file
) , varying number rows, depending on amount of data returned.
i interested in data between 2 rows, start-of-data
, end-of-data
. rows in column
19 start-of-data 20 wts equity|0|6|wts|50545|54.440000|54.000000|5... 21 xom equity|0|6|xom|6555175|84.950000|85.300000... 22 sup equity|0|6|sup|27405|19.250000|19.200000|1... 23 end-of-data
with varying number of rows, (not 20 22). how can filter rows in column data between cells, ie raw_data['start-of-file']['start-of-data' : 'end-of-data']
. , use str.split('|')
seperate pipe delimited data seperate columns new dataframe?
>>> import pandas pd >>> df = pd.dataframe(['abcdef', 'start-of-data', 'g|h|i', 'j|k|l', 'm|n|o', 'end-of-data', 'pqrstu', columns=['a']]) >>> df 0 abcdef 1 start-of-data 2 g|h|i 3 j|k|l 4 m|n|o 5 end-of-data 6 pqrstu >>> start, end = df[df['a'].str.contains('(start|end)-of-data')].index.tolist() >>> pd.dataframe(df[start+1:end]['a'].str.split('|').tolist(), columns=['a', 'b', 'c']) b c 2 g h 3 j k l 4 m n o
Comments
Post a Comment