ocr - Tesseract Assert failed trainingsampleset.cpp line 622 with mftraining
When mftraining is executed on my training files, I get the following error message:

PS > mftraining -F font_properties -U unicharset -O lang.unicharset .\eng.ds-digital.exp0.box.tr .\eng.ds-digitalb.exp0.box.tr .\eng.ds-digitali.exp0.box.tr
warning: no shape table file present: shapetable
reading .\eng.ds-digital.exp0.box.tr ...
reading .\eng.ds-digitalb.exp0.box.tr ...
reading .\eng.ds-digitali.exp0.box.tr ...
font id = -1/0, class id = 1/12 on sample 0
font_id >= 0 && font_id < font_id_map_.sparsesize():error:assert failed:in file ..\..\classify\trainingsampleset.cpp, line 622

A dialog window then appears stating "Feature training for Tesseract has stopped working". There are several posts around the net addressing this issue, but none of them (that I have tried so far) seems to have a solution that makes my data set go through.
The folder in which the mftraining command is executed contains the following files:

eng.ds-digital.exp0.box
eng.ds-digital.exp0.box.tr
eng.ds-digital.exp0.box.txt
eng.ds-digital.exp0.tif
eng.ds-digitalb.exp0.box
eng.ds-digitalb.exp0.box.tr
eng.ds-digitalb.exp0.box.txt
eng.ds-digitalb.exp0.tif
eng.ds-digitali.exp0.box
eng.ds-digitali.exp0.box.tr
eng.ds-digitali.exp0.box.txt
eng.ds-digitali.exp0.tif
font_properties
unicharset

font_properties has the following content (it ends with a newline, as the documentation states):

ds-digital 0 0 0 0 0
ds-digitalb 0 1 0 0 0
ds-digitali 1 0 0 0 0

I've tried different naming conventions for the font name in font_properties (although the documentation is quite clear that it is the font name of the file, not the file name, people around the net seem to claim otherwise), as well as renaming the .tr files so that they follow the pattern eng.ds-digital*.exp0.tr, to no avail.
Edit: I'm running Tesseract 3.02.
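As a sanity check on the font_properties file itself, the format can be verified mechanically before mftraining is run. The following is only a minimal sketch in Python, assuming the documented layout of one font per line as a font name followed by five 0/1 flags (italic, bold, fixed, serif, fraktur) and a trailing newline. The function name parse_font_properties and the SAMPLE string are made up for illustration; they are not part of Tesseract.

```python
def parse_font_properties(text):
    """Parse font_properties content into {fontname: (italic, bold, fixed, serif, fraktur)}.

    Raises ValueError on the first problem found. Hypothetical helper, not a Tesseract API.
    """
    if not text.endswith("\n"):
        raise ValueError("font_properties should end with a newline")
    fonts = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # choice made in this sketch: blank lines are skipped
        parts = line.split()
        if len(parts) != 6:
            raise ValueError(
                f"line {lineno}: expected a font name and 5 flags, got {len(parts)} fields"
            )
        name, flags = parts[0], parts[1:]
        if any(flag not in ("0", "1") for flag in flags):
            raise ValueError(f"line {lineno}: every flag must be 0 or 1")
        if name in fonts:
            raise ValueError(f"line {lineno}: duplicate font name {name!r}")
        fonts[name] = tuple(int(flag) for flag in flags)
    return fonts


# The content quoted in the question, with the trailing newline.
SAMPLE = (
    "ds-digital 0 0 0 0 0\n"
    "ds-digitalb 0 1 0 0 0\n"
    "ds-digitali 1 0 0 0 0\n"
)

if __name__ == "__main__":
    for font, flags in parse_font_properties(SAMPLE).items():
        print(font, flags)
```

A file that passes this check can still trigger the assert if the names in it differ from the font names recorded in the .tr files, so this only rules out formatting mistakes.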
I was getting the same issue and resolved it by checking that the font name in eng.ds-digital.exp0.box.tr is the same as the one given in the font_properties file.
For example: echo "ds-digital 0 0 0 0 0" > font_properties
Then eng.ds-digital.exp0.box.tr should have ds-digital as its font name.
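The comparison described above can be roughly automated. This Python sketch assumes that the font name appears as the first whitespace-separated token on some lines of the .tr file; that is an assumption about the file layout, so open one of your own .tr files to confirm it first. The helper name fonts_seen_in_tr and the sample lines are hypothetical and are not real tesseract output.

```python
def fonts_seen_in_tr(tr_lines, known_fonts):
    """Return the subset of known_fonts that appears as the first token of a line.

    tr_lines: iterable of text lines from a .tr file.
    known_fonts: font names taken from font_properties.
    Hypothetical helper; relies on the assumed .tr layout described above.
    """
    seen = set()
    for line in tr_lines:
        tokens = line.split()
        if tokens and tokens[0] in known_fonts:
            seen.add(tokens[0])
    return seen


# Illustrative lines only: a made-up header line for "ds-digital" followed by
# made-up feature lines. Real .tr content will look different in detail.
SAMPLE_TR = [
    "ds-digital 5 0 0 10 10 0\n",
    "mf 3\n",
    "\n",
]

if __name__ == "__main__":
    known = {"ds-digital", "ds-digitalb", "ds-digitali"}
    # For a real file, pass open(r".\eng.ds-digital.exp0.box.tr", encoding="utf-8")
    # instead of SAMPLE_TR.
    print(fonts_seen_in_tr(SAMPLE_TR, known))
```

If the result is empty for a given .tr file, none of the names in font_properties occurs where the sketch looks for it, which points to the mismatch described in this answer.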
Another easy way to train Tesseract is described at this link.