Automating Tesseract Training: Batch-Training a Font Library
程序员文章站
2022-07-06 09:42:32
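Input parameters:
src: the training image file name, extension included, e.g. nomal.num.exp0.tif (the script passes it to tesseract as-is and, when desc is omitted, derives desc by dropping the last four characters)
font_name: the font name, e.g. num
desc: the base name of the training files, e.g. nomal.num.exp0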
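The fallback for desc relies on cmd's substring expansion. A minimal sketch, assuming src carries a three-letter extension such as .tif (the file name is just the example used in this post):

```bat
@echo off
rem Sketch only: shows how desc is derived when the third argument is omitted.
set "src=nomal.num.exp0.tif"
rem %src:~0,-4% keeps everything except the last four characters (".tif").
set "desc=%src:~0,-4%"
echo %desc%
```

This prints nomal.num.exp0. A longer extension such as .tiff would leave a trailing dot, so passing desc explicitly is the safer choice in that case.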
Directory layout when the script is run:
training
├── jTessBoxEditor
│   └── jTessBoxEditor.jar
├── nomal.num.exp0.tif
└── training.bat
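Note: first use jTessBoxEditor to merge the sample images into a .tif file, then run the script. The script opens jTessBoxEditor again partway through so that you can adjust the character boxes and their contents. Make sure jTessBoxEditor and the Java runtime both work.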
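A pre-flight check along these lines can catch a broken environment before training starts. It is a sketch rather than part of the original script, and it assumes java and tesseract are expected on PATH and that the jar sits at the relative path shown in the directory layout:

```bat
@echo off
rem Sketch only: verify the tools the training script depends on.
rem "where" exits with a non-zero code when the program is not on PATH.
where java >nul 2>nul || (echo java not found on PATH & exit /b 1)
where tesseract >nul 2>nul || (echo tesseract not found on PATH & exit /b 1)
rem The jar path is taken from the directory layout in this post.
if not exist "jTessBoxEditor\jTessBoxEditor.jar" (echo jTessBoxEditor\jTessBoxEditor.jar is missing & exit /b 1)
echo environment looks OK
```

Run it from the training directory; any missing piece ends the check with exit code 1.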
@echo off
rem %~1 expands the argument with any surrounding quotes removed
set "src=%~1"
set "font_name=%~2"
set "desc=%~3"
if not defined src set /p src=" please pass your filename : "
if not defined font_name set /p font_name=" please pass your font_name : "
if not defined desc set /p desc=" please pass your desc : "
rem Validate the arguments
if not defined src echo IllegalArgumentException arg1 must not be null & pause>nul & exit /b 1
if not defined font_name echo IllegalArgumentException arg2 must not be null & pause>nul & exit /b 1
rem Default desc to src without its 4-character extension (e.g. .tif)
if not defined desc set "desc=%src:~0,-4%"
echo desc %desc%
rem If font_properties does not exist in this directory, create it with one entry
if exist font_properties (
    echo font_properties exist
) else (
    rem Entry format: fontname italic bold fixed serif fraktur
    >"font_properties" ECHO %font_name% 0 0 0 0 0
)
rem Delete the outputs of any previous run
if exist %font_name%.unicharset ECHO DEL %font_name%.unicharset & DEL /Q %font_name%.unicharset
if exist %font_name%.inttemp ECHO DEL %font_name%.inttemp & DEL /Q %font_name%.inttemp
if exist %font_name%.pffmtable ECHO DEL %font_name%.pffmtable & DEL /Q %font_name%.pffmtable
if exist %font_name%.shapetable ECHO DEL %font_name%.shapetable & DEL /Q %font_name%.shapetable
if exist %font_name%.normproto ECHO DEL %font_name%.normproto & DEL /Q %font_name%.normproto
if exist %font_name%.font_properties ECHO DEL %font_name%.font_properties & DEL /Q %font_name%.font_properties
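rem makebox: generate an initial .box file from the merged .tif
tesseract %desc%.tif %desc% -l eng --psm 6 batch.nochop makebox
ECHO Create files: %desc%.box
rem Open jTessBoxEditor so the boxes can be corrected by hand
ECHO Please correct the boxes in jTessBoxEditor, save, close it, then press any key to continue
java -Xms128m -Xmx512m -jar jTessBoxEditor/jTessBoxEditor.jar
pause>nul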
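rem box.train: produce %desc%.tr from the image and the corrected .box file
tesseract %src% %desc% -l eng --psm 6 nobatch box.train
rem Build the character set from the .box file (writes "unicharset")
unicharset_extractor %desc%.box
rem Cluster shapes, then train: mftraining writes inttemp, pffmtable and shapetable
shapeclustering -F font_properties -U unicharset %desc%.tr
mftraining -F font_properties -U unicharset -O unicharset %desc%.tr
rem cntraining writes normproto
cntraining %desc%.tr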
rem Prefix the outputs with font_name so that combine_tessdata can find them
if exist unicharset ECHO rename unicharset %font_name%.unicharset & rename unicharset %font_name%.unicharset
if exist inttemp ECHO rename inttemp %font_name%.inttemp & rename inttemp %font_name%.inttemp
if exist pffmtable ECHO rename pffmtable %font_name%.pffmtable & rename pffmtable %font_name%.pffmtable
if exist shapetable ECHO rename shapetable %font_name%.shapetable & rename shapetable %font_name%.shapetable
if exist normproto ECHO rename normproto %font_name%.normproto & rename normproto %font_name%.normproto
combine_tessdata %font_name%.
if exist font_properties ECHO rename font_properties %font_name%.font_properties & rename font_properties %font_name%.font_properties
ECHO press any key to continue
pause>nul
Save the code as XXX.bat, open cmd, cd to the training directory, and run it.
The script's output can be appended to logs.txt:
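For illustration, a full run might look like the following. This assumes the script was saved as training.bat, as in the directory layout above; sample.png is a hypothetical test image, not something from the original post:

```bat
rem Run from the training directory; the arguments are src, font_name, desc.
training.bat nomal.num.exp0.tif num nomal.num.exp0

rem combine_tessdata should leave num.traineddata in the current directory.
rem Hypothetical check of the result: sample.png is a placeholder image name.
tesseract sample.png stdout --tessdata-dir . -l num --psm 6
```

Since this is legacy-engine training data, newer Tesseract versions may also need --oem 0 for the recognition step.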
training>>logs.txt
I intend to build this into the script itself later.
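Until then, a slightly fuller capture can be sketched as below. It assumes all three arguments are given on the command line, so no input prompt is hidden by the redirection, and that the script is named training.bat:

```bat
rem Append both stdout and stderr of one run to logs.txt.
rem 2>&1 is included because the Tesseract tools appear to write many messages to stderr.
training.bat nomal.num.exp0.tif num nomal.num.exp0 >> logs.txt 2>&1
```

With this redirection the script's "press any key" messages also land in the log, so the console stays silent while the script waits for a key press.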