python - ValueError when converting string to integer in DataFrame
I am trying to replace the strings in the years column of the dataframe below with the numbers contained in each string. For example, I want to change zc025yr to 025. My code is as follows:
    import urllib, urllib2
    import csv
    from StringIO import StringIO
    import pandas as pd
    import os
    from zipfile import ZipFile
    from pprint import pprint, pformat

    # Request the zipped CSV of yield-curve data (Python 2 urllib2)
    my_url = 'http://www.bankofcanada.ca/stats/results/csv'
    data = urllib.urlencode({"lookuppage": "lookup_yield_curve.php",
                             "startrange": "1986-01-01",
                             "searchrange": "all"})
    request = urllib2.Request(my_url, data)
    result = urllib2.urlopen(request)
    zipdata = result.read()

    # Read the first file in the archive and reshape it to long format
    zipfile = ZipFile(StringIO(zipdata))
    df = pd.read_csv(zipfile.open(zipfile.namelist()[0]))
    df = pd.melt(df, id_vars=['date'])
    df.rename(columns={'variable': 'years'}, inplace=True)
The dataframe I have looks like this:
             date    years         value
    0  1986-01-01  zc025yr            na
    1  1986-01-02  zc025yr  0.0948511020
    2  1986-01-03  zc025yr  0.0972953210
    3  1986-01-06  zc025yr  0.0965403640
    .....
However, if I add the code below in order to restructure the dataframe, I get the error ValueError: cannot convert float NaN to integer
on the line df['years'] = df['years'].str.extract('(\d+)').astype(int)
This is strange, because when I look at the years data in the CSV file I don't see any 'nan' associated with it.
    # converting the strings in the column to a number of years
    df['years'] = df['years'].str.extract('(\d+)').astype(int)
    df['years'] = df.years/100
Thank you.
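One way to see where the NaN comes from: str.extract returns NaN for any row where the pattern finds no match (or where the value is already missing), and NaN cannot be cast to int. The sketch below is only an illustration on a made-up frame. The label "unnamed" is a hypothetical stand-in for whatever digit-free value the real data might contain, and it is not taken from the Bank of Canada file.

```python
import pandas as pd

# Hypothetical sample data: the last label has no digits in it.
df = pd.DataFrame({"years": ["zc025yr", "zc050yr", "unnamed"]})

# expand=False returns a Series; rows without a match become NaN.
extracted = df["years"].str.extract(r"(\d+)", expand=False)

# Labels that produced NaN and would make .astype(int) fail.
unmatched = df.loc[extracted.isna(), "years"].unique()
print(unmatched)
```

Running the same isna() check on the real years column should show which labels are responsible, if this is indeed the cause.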
Try creating a new function to convert the strings to integers, and call it with the Series.apply method, as follows -
Edit: adding logic to default empty strings to 0. Use a different value if you want to handle empty strings in the years column differently.
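    import re

    def getyear(s):
        # Find the first run of digits in the string
        x = re.search('(\d+)', s)
        # Fall back to 0 when there is no match, or to whatever value you want
        return int(x.groups()[0]) if x is not None else 0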
Then use the function as -
df['years'] = df['years'].apply(getyear)
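If you would rather stay with the vectorized string methods, a possible alternative (not part of the answer above, just a sketch on made-up data) is to fill the missing matches before casting. The sample frame and the default of 0 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: the last label has no digits in it.
df = pd.DataFrame({"years": ["zc025yr", "zc050yr", "unnamed"]})

# Extract the digits; rows without a match come back as NaN.
digits = df["years"].str.extract(r"(\d+)", expand=False)

# Convert to numbers, replace NaN with a default, then cast to int.
df["years"] = pd.to_numeric(digits).fillna(0).astype(int)
print(df["years"].tolist())
```

The division by 100 from the question can then be applied to the column as before.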