python - Reading a CSV file into a pandas DataFrame with invalid characters (accents)


I am trying to read a CSV file into a pandas DataFrame. However, the CSV contains accents. I am using Python 2.7.

I've run into a UnicodeDecodeError because there is an accent in the first column. I've read up on a bunch of sites, such as this question on UTF-8 in CSV files, this blog post on CSV errors related to newlines, and this blog post on UTF-8 issues in Python 2.7.

I used the answers I found there to try to modify my code. Originally I had:

    import pandas as pd

    # Create a DataFrame from the data I am interested in
    df = pd.DataFrame.from_csv('mydata.csv')
    mode = lambda ts: ts.value_counts(sort=True).index[0]
    cols = df['companyname'].value_counts().index
    df['calls'] = df.groupby('companyname')['companyname'].transform(pd.Series.value_counts)

Et cetera. This worked, but now passing in "nÍ" and "nê" as a customer name is giving this error:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 7: invalid continuation byte

I tried changing the line to df = pd.read_csv('mydata.csv', encoding='utf-8'), but it gives the same error.

So I tried the following, based on suggestions I found while researching, but it is not working either, and I am getting the same error.

    import pandas as pd
    import csv

    def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
        csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
        for row in csv_reader:
            yield [unicode(cell, 'utf-8') for cell in row]

    reader = unicode_csv_reader(open('mydata.csv', 'rU'), dialect=csv.reader)
    # Create a DataFrame from the data I am interested in
    df = pd.DataFrame(reader)

I feel that it should not be this difficult to read CSV data into a pandas DataFrame. Does anyone know of an easier way?

Edit: What is really strange is that if I delete the row with the accented characters, I still get an error:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 960: invalid continuation byte

This is strange because my test CSV has only 19 rows and 27 columns. I hope that if I decode the entire CSV as UTF-8 it will fix the problem.
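For context, both bytes reported in the errors (0xea and 0xd0) are valid single characters in Latin-1 and Windows-1252 (ê and Ð respectively), whereas in UTF-8 each can only begin a multi-byte sequence. That would be consistent with a file that was never saved as UTF-8 in the first place, although this is only a guess, since the file itself is not shown. A minimal sketch using made-up sample bytes (not taken from mydata.csv):

```python
# Hypothetical sample bytes; the real file contents are not shown in the question.
samples = {
    "0xea": b"N\xea,",  # 0xEA followed by a comma
    "0xd0": b"\xd0,",   # 0xD0 followed by a comma
}

results = {}
for label, raw in samples.items():
    try:
        raw.decode("utf-8")
        utf8_ok = True
    except UnicodeDecodeError:
        # UTF-8 expects a continuation byte after 0xEA / 0xD0, not a comma.
        utf8_ok = False
    # Latin-1 maps every byte to exactly one character, so this cannot fail.
    results[label] = (utf8_ok, raw.decode("latin-1"))

for label in sorted(results):
    print(label, results[label])
```

If the same bytes decode cleanly as Latin-1 but not as UTF-8, forcing UTF-8 on the whole file is unlikely to help.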

Try adding this to the top of your script:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')
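Note that reload(sys) and sys.setdefaultencoding exist only in Python 2. As an alternative that does not change interpreter-wide defaults, one could check which encoding actually decodes the file and pass that name to pandas. The sketch below is an assumption-laden illustration: the helper name first_working_encoding and the candidate list are hypothetical, and Latin-1 accepts every byte, so it acts as a last-resort fallback rather than as real detection.

```python
import io


def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first encoding in `candidates` that decodes the whole file.

    Hypothetical helper for illustration; returns None if nothing works.
    """
    for name in candidates:
        try:
            # io.open behaves the same way on Python 2.7 and Python 3.
            with io.open(path, "r", encoding=name) as handle:
                handle.read()
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the file, try the next one
        return name
    return None
```

The returned name could then be passed along, for example pd.read_csv('mydata.csv', encoding=first_working_encoding('mydata.csv')). A successful decode does not prove the encoding is the right one, so it is worth checking that the accented names look correct afterwards.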
