python - Unicode on Scrapy JSON output
I'm having a problem with the JSON output of Scrapy. The crawler works fine and the CLI output works without problems. The XML item exporter also works without problems: the output is saved with the correct encoding and the text is not escaped.
I have tried:
- using pipelines and saving the items directly there,
- using feed exporters and the JSONEncoder from the json library,
but these won't work, because the data includes sub-branches (nested structures).
The Unicode text in the JSON output file is escaped like this: "\u00d6\u011fretmen s\u00fcleyman yurtta\u015f cad."
But in the XML output file it is written correctly: "Öğretmen süleyman yurttaş cad."
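Just to illustrate where the escaping comes from (this is the json library's default ensure_ascii=True behaviour, not something Scrapy-specific), a standalone check shows the difference:

# -*- coding: utf-8 -*-
import json

data = {u'address': u'Öğretmen süleyman yurttaş cad.'}
print json.dumps(data)                      # escaped: {"address": "\u00d6\u011fretmen s\u00fcleyman yurtta\u015f cad."}
print json.dumps(data, ensure_ascii=False)  # unescaped unicode: {"address": "Öğretmen süleyman yurttaş cad."}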
I even changed the Scrapy source code to pass ensure_ascii=False to ScrapyJSONEncoder, to no avail.
So, is there a way to force ScrapyJSONEncoder not to escape non-ASCII text while writing the file?
Edit 1: By the way, I'm using Python 2.7.6, as Scrapy does not support Python 3.x.
This is a standard Scrapy crawler with a spider file, a settings file and an items file. First a list of pages is crawled starting from the base URL, then content is scraped from those pages. The data pulled from each page is assigned to the variables defined in items.py of the Scrapy project and is encoded in UTF-8. There's no problem with that; it works for the XML output. The JSON output is generated with this command:
scrapy crawl --nolog --output=output.json -t json spidername
The XML output works without any problem with this command:
scrapy crawl --nolog --output=output.xml -t xml spidername
I have tried editing scrapy/contrib/exporter/__init__.py and scrapy/utils/serialize.py to insert the ensure_ascii=False parameter into json.JSONEncoder.
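A less invasive alternative to patching Scrapy itself would be a custom exporter registered through the FEED_EXPORTERS setting. This is only a sketch (untested, and it assumes your Scrapy version forwards the exporter's keyword arguments on to ScrapyJSONEncoder, as the contrib JsonItemExporter does; the module path myproject.exporters is made up):

# myproject/exporters.py
from scrapy.contrib.exporter import JsonItemExporter

class Utf8JsonItemExporter(JsonItemExporter):
    def __init__(self, file, **kwargs):
        # Force the underlying ScrapyJSONEncoder not to escape non-ASCII text.
        kwargs['ensure_ascii'] = False
        super(Utf8JsonItemExporter, self).__init__(file, **kwargs)

# settings.py
FEED_EXPORTERS = {
    'json': 'myproject.exporters.Utf8JsonItemExporter',
}

With that in place, the same "scrapy crawl ... -t json" command should pick up the custom exporter for the json format.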
Edit 2: I tried debugging again. There's no problem in the python2.7/json/encoder.py code; the data is intact and not escaped at that point. After that it gets hard to debug, because Scrapy works asynchronously and there are lots of callbacks.
Edit 3: A bit of a dirty hack, but after editing python2.7.6/lib/json/encoder.py and changing the ensure_ascii parameter to False, the problem seems to be solved.
As I don't have your code to test with, you can try to use codecs:
import codecs
f = codecs.open('yourfilename', 'your_mode', 'utf-8')
f.write(u'whatever you want to write')
f.close()
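If the feed exporter route doesn't work out, the same codecs idea can be wrapped in an item pipeline that writes the JSON itself with ensure_ascii=False. This is only a minimal sketch (the class name and output file name are made up, it writes one JSON object per line, and it would still need to be enabled in ITEM_PIPELINES):

# pipelines.py (illustrative)
import codecs
import json

class Utf8JsonWriterPipeline(object):
    def open_spider(self, spider):
        # codecs takes care of encoding the unescaped text as UTF-8.
        self.file = codecs.open('output.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + u'\n')
        return item

    def close_spider(self, spider):
        self.file.close()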