Abstract
In optical character recognition (OCR), text near the inner binding margin of a book lies on a curved surface, so the scanned image is deformed and is hard for software to recognize. An algorithm is proposed that flattens text on such curved surfaces. The text is cut into a grid, and the dimensions of the grid cells are used to fit a function describing the deformation of the material. This yields the mapping between the deformed material and the flattened material, from which the deformed surface is restored. A fast algorithm is also proposed that requires neither a fitted function nor integration and is therefore suited to rapid computation and implementation; tests verified its range of applicability. The results show that the fast algorithm is much more efficient than the fitting-based algorithm, and that the recognition rate rises substantially compared with the original deformed material. The algorithm does not require complex acquisition equipment and can correct the great majority of existing scanned data.
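The abstract does not give the details of the fast variant. As a rough illustration only, one way to avoid both a fitted function and an integral is to accumulate the measured grid-cell widths into a piecewise-linear mapping. The sketch below assumes that each grid cell would have the same width once flattened; the function names, the `ref_width` parameter, and the nearest-neighbour resampling are all hypothetical choices, not the authors' method.

```python
from bisect import bisect_right


def build_mapping(cell_widths, ref_width):
    """Return cell boundaries in the deformed image (src) and in the
    flattened image (dst). Assumption: every cell has width ref_width
    once flattened, so a running sum replaces any integral."""
    src, dst = [0.0], [0.0]
    for w in cell_widths:
        if w <= 0:
            raise ValueError("cell widths must be positive")
        src.append(src[-1] + w)
        dst.append(dst[-1] + ref_width)
    return src, dst


def flat_to_src(x_flat, src, dst):
    """Map a flattened x-coordinate back to the deformed image by
    linear interpolation inside the cell that contains it."""
    if not dst[0] <= x_flat <= dst[-1]:
        raise ValueError("coordinate outside the flattened image")
    i = min(bisect_right(dst, x_flat) - 1, len(dst) - 2)
    t = (x_flat - dst[i]) / (dst[i + 1] - dst[i])
    return src[i] + t * (src[i + 1] - src[i])


def flatten_row(row, cell_widths, ref_width):
    """Resample one pixel row (nearest neighbour) so that compressed
    cells near the binding are stretched back to ref_width."""
    src, dst = build_mapping(cell_widths, ref_width)
    out = []
    for x in range(int(round(dst[-1]))):
        xs = flat_to_src(x + 0.5, src, dst)  # sample at the pixel centre
        out.append(row[min(int(xs), len(row) - 1)])
    return out


if __name__ == "__main__":
    # A 6-pixel row: the first cell is 4 px wide, the second is
    # squeezed to 2 px by the page curvature.
    print(flatten_row(list(range(6)), [4, 2], ref_width=4))
```

In this toy input the squeezed second cell is stretched back to four pixels, so each of its two source pixels appears twice in the output. A real implementation would work in two dimensions and would need a reliable way to measure the cell widths, neither of which is specified in the abstract.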
Source
《浙江大学学报(工学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2012, No. 1, pp. 130-135 (6 pages)
Journal of Zhejiang University:Engineering Science
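Funding
Supported by the Technology Innovation Fund for Technology-Based Small and Medium-Sized Enterprises, Ministry of Science and Technology of the People's Republic of China (07C26213301377)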