python - Reading a CSV file into Pandas Dataframe with invalid characters (accents) -

July 15, 2015

I am trying to read a CSV file into a pandas DataFrame. However, the CSV contains accents. I am using Python 2.7.

I've run into a UnicodeDecodeError because there is an accent in the first column. I've read up on a bunch of sites: this question on UTF-8 in CSV files, this blog post on CSV errors related to newlines, and this blog post on UTF-8 issues in Python 2.7.

I used the answers I found there to try to modify my code. Originally I had:

    import pandas as pd

    # create a dataframe of the data I am interested in
    df = pd.DataFrame.from_csv('mydata.csv')
    mode = lambda ts: ts.value_counts(sort=True).index[0]
    cols = df['companyname'].value_counts().index
    df['calls'] = df.groupby('companyname')['companyname'].transform(pd.Series.value_counts)

Et cetera. That worked, but passing in "nÍ" and "nê" as the customer name gives this error:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 7: invalid continuation byte

I tried changing the line to df = pd.read_csv('mydata.csv', encoding='utf-8'), but that gives the same error.
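The byte in the traceback is a useful clue. 0xEA followed by an ordinary ASCII character is not a valid UTF-8 sequence, but 0xEA on its own is "ê" in Latin-1 and Windows-1252. The sketch below reproduces the failure on a small hand-made byte string and decodes it as Latin-1 instead. The sample bytes are an assumption for illustration, not data taken from mydata.csv.

```python
# Hypothetical sample: "N", then the byte 0xEA, then a comma.
raw = b"N\xea,"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xEA announces a multi-byte UTF-8 sequence, but "," cannot continue it.
    utf8_ok = False

# Assumption: the file was written as Latin-1 (ISO-8859-1) or Windows-1252.
as_latin1 = raw.decode("latin-1")

print(utf8_ok)
print(as_latin1)
```

If the decoded text shows the expected accented name, the file is probably not UTF-8 at all, which would explain why encoding='utf-8' gives the same error.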

So I tried this, based on suggestions I found while researching, but it is not working either, and I am getting the same error.

    import pandas as pd
    import csv

    def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
        csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
        for row in csv_reader:
            yield [unicode(cell, 'utf-8') for cell in row]

    reader = unicode_csv_reader(open('mydata.csv', 'rU'), dialect = csv.reader)
    # create a dataframe of the data I am interested in
    df = pd.DataFrame(reader)

I feel that it should not be this difficult to read CSV data into a pandas DataFrame. Does anyone know of an easier way?

Edit: What is really strange is that if I delete the row with the accented characters, I still get an error:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 960: invalid continuation byte

This is strange, as my test CSV has 19 rows and 27 columns. But I hope that if I decode the entire CSV as UTF-8, it will fix the problem.
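Since the error moved to a different byte after one row was deleted, it may help to list every line that is not valid UTF-8 instead of fixing them one at a time. This is a minimal sketch using only the standard library. The helper name find_non_utf8_lines is hypothetical, and it assumes the file uses ordinary newline-terminated rows.

```python
import io


def find_non_utf8_lines(path):
    """Return (line_number, error_message) pairs for lines that fail UTF-8 decoding."""
    bad = []
    # Read raw bytes so nothing is decoded implicitly before the check.
    with io.open(path, "rb") as handle:
        for number, line in enumerate(handle, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError as exc:
                bad.append((number, str(exc)))
    return bad
```

Calling find_non_utf8_lines('mydata.csv') would show whether the problem is confined to a few cells or spread across the file, which hints at whether the whole file is in another encoding.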

Try adding this to the top of your script:

    import sys
    reload(sys)  # Python 2 only: reload is a builtin there
    sys.setdefaultencoding('utf8')
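Another option, if the file turns out not to be UTF-8, is to tell pandas which encoding to use. The sketch below tries a few candidates in order. The helper read_csv_with_fallback and its list of encodings are assumptions for illustration, not part of pandas. It is written in Python 3 syntax, and the same encoding argument of pd.read_csv is what the question already used.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in turn; return (DataFrame, encoding_that_worked)."""
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding), encoding
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes in the file; try the next.
            continue
    raise ValueError("none of the encodings could decode %s" % path)
```

Usage would look like df, used = read_csv_with_fallback('mydata.csv'). Latin-1 accepts every byte value, so the last fallback never raises, but it can silently produce wrong characters if the real encoding is something else. Checking a few accented names after loading is worthwhile.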
