python - Unicode in Scrapy JSON output


I'm having a problem with the JSON output of Scrapy. The crawler works well and the CLI output works without problems. The XML item exporter also works without problems: the output is saved in the correct encoding and the text is not escaped.

  • I tried using pipelines and saving the items directly there.
  • I tried using feed exporters and the JSONEncoder from the json library.

These don't work, because my data includes sub-branches.

The Unicode text in the JSON output file is escaped like this: "\u00d6\u011fretmen s\u00fcleyman yurtta\u015f cad."

But in the XML output file it is written correctly: "Öğretmen süleyman yurttaş cad."
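The escaped form is what Python's json module produces by default, because json.dumps uses ensure_ascii=True unless told otherwise. A small sketch of the difference, using the address from above (the variable names are only for illustration):

```python
# -*- coding: utf-8 -*-
import json

address = u"Öğretmen süleyman yurttaş cad."

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(address)

# With ensure_ascii=False the characters are kept as they are.
readable = json.dumps(address, ensure_ascii=False)

# Both forms are valid JSON and decode back to the same text.
assert json.loads(escaped) == json.loads(readable) == address
```

So the escaped file is not corrupt, since any JSON parser reads it back correctly. It is only harder for a human to read.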

I even changed the Scrapy source code to include ensure_ascii=False in ScrapyJSONEncoder, but to no avail.

So, is there any way to force ScrapyJSONEncoder not to escape while writing to the file?

edit1: By the way, I'm using Python 2.7.6, as Scrapy does not support Python 3.x.

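This is a standard Scrapy crawler: a spider file, a settings file and an items file. First the list of pages is crawled, starting from the base URL, and then the content is scraped from those pages. The data pulled from each page is assigned to the variables defined in items.py of the Scrapy project, encoded in UTF-8. There's no problem with that, since it works with the XML output.

The JSON output is generated with this command: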

scrapy crawl --nolog --output=output.json -t json spidername

The XML output works without problems with this command:

scrapy crawl --nolog --output=output.xml -t xml spidername

I have tried editing scrapy/contrib/exporter/__init__.py and scrapy/utils/serialize.py to insert the ensure_ascii=False parameter into json.JSONEncoder.

edit2:

I tried debugging again. There's no problem up to the python2.7/json/encoder.py code: the data is intact and not escaped. After that it gets hard to debug, since Scrapy works asynchronously and there are lots of callbacks.

edit3:

It's a bit of a dirty hack, but after editing python2.7.6/lib/json/encoder.py and changing the ensure_ascii parameter to False, the problem seems to be solved.
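For anyone reading this on a newer Scrapy: releases from 1.2 onwards have a FEED_EXPORT_ENCODING setting, which makes patching the standard library unnecessary. A minimal settings.py fragment, assuming such a version is installed (this would not have helped with the older release used in this question):

```python
# settings.py of the Scrapy project (assumes Scrapy 1.2 or later)

# Write feed exports, including JSON, as real UTF-8 text
# instead of \uXXXX escape sequences.
FEED_EXPORT_ENCODING = "utf-8"
```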

As I don't have your code to test, can you try using codecs? Try:

    import codecs
    f = codecs.open('yourfilename', 'your_mode', 'utf-8')
    f.write('whatever you want to write')
    f.close()
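Along the same lines as the codecs suggestion, here is a rough sketch of an item pipeline that writes one JSON object per line without escaping, nested data included. The class name Utf8JsonLinesPipeline and the default file name items.jl are made up for this example. It assumes the item fields hold text (unicode) values rather than UTF-8 encoded byte strings, and it uses io.open, which behaves like codecs.open for this purpose:

```python
# -*- coding: utf-8 -*-
import io
import json


class Utf8JsonLinesPipeline(object):
    """Hypothetical pipeline: one unescaped JSON object per line."""

    def __init__(self, path="items.jl"):
        # The output path is an assumption for this sketch.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Open the file as UTF-8 text so non-ASCII characters are written as-is.
        self.file = io.open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # ensure_ascii=False keeps characters such as "ğ" unescaped,
        # also inside nested dicts and lists.
        line = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(line + u"\n")
        return item

    def close_spider(self, spider):
        self.file.close()
```

To try it, the class would have to be listed under ITEM_PIPELINES in the project's settings file. It can also be exercised by hand by calling open_spider, process_item and close_spider with a plain dict as the item.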

