
Uyghur, Chinese and English Multilingual Document Recognition (cited by: 3)
Abstract: Uyghur, Chinese and English are scripts with entirely different characteristics. Following a multilingual OCR system design principle that combines "multi-layer character language estimation" with "suitable adjustment", this paper implements, for the first time, a recognition system for mixed Uyghur, Chinese and English text. The system first makes a preliminary estimate of the language of each text block from the characteristics of the three scripts, then applies a character segmentation algorithm designed for each script. Character recognition confidence is used to judge whether the estimated language and the segmentation result of a text block are correct. Experimental results show that the recognition rate on a variety of mixed Uyghur, Chinese and English documents is 96.4% or higher.
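Source: Journal of Electronics & Information Technology (《电子与信息学报》; indexed in EI, CSCD and the Peking University Core Journals list), 2006, No. 7, pp. 1188-1191 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60241005) and the China Postdoctoral Science Foundation (2004035331)
Keywords: Multilingual document recognition; Character segmentation; Character recognition; Uyghur script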
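The control flow described in the abstract (estimate a block's language, run that script's segmentation and recognition, and use recognition confidence to check the estimate) can be sketched roughly as follows. This is only an illustration under stated assumptions: `recognize_block`, `BlockResult`, the per-language pipeline callables, and the 0.8 threshold are hypothetical names and values, not the paper's actual features, segmentation algorithms, or confidence measure, none of which are given in this record.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

LANGUAGES = ("uyghur", "chinese", "english")

# Hypothetical interface: a pipeline segments and recognizes one text block
# for a single script and returns (recognized_text, confidence in [0, 1]).
Pipeline = Callable[[object], Tuple[str, float]]


@dataclass
class BlockResult:
    language: str
    text: str
    confidence: float


def recognize_block(
    block: object,
    initial_guess: str,
    pipelines: Dict[str, Pipeline],
    threshold: float = 0.8,  # assumed value, chosen only for illustration
) -> BlockResult:
    """Try the preliminary language estimate first; if recognition
    confidence is low, fall back to the other scripts and keep the best."""
    order = [initial_guess] + [lang for lang in LANGUAGES if lang != initial_guess]
    best: Optional[BlockResult] = None
    for lang in order:
        text, confidence = pipelines[lang](block)
        if best is None or confidence > best.confidence:
            best = BlockResult(lang, text, confidence)
        if confidence >= threshold:
            break  # confident enough: accept this language and segmentation
    assert best is not None
    return best


def make_stub(text: str, confidence: float) -> Pipeline:
    """Build a stand-in pipeline that always returns a fixed result."""
    return lambda block: (text, confidence)


# Stub pipelines stand in for real segmentation + recognition engines.
demo_pipelines = {
    "uyghur": make_stub("u-text", 0.35),
    "chinese": make_stub("c-text", 0.93),
    "english": make_stub("e-text", 0.40),
}
demo = recognize_block(None, "uyghur", demo_pipelines)
print(demo.language, demo.confidence)
```

In this sketch a wrong preliminary estimate ("uyghur") is corrected because the Uyghur pipeline reports low confidence while the Chinese pipeline clears the threshold. A real system would presumably also use confidence to revise segmentation within one language, which this sketch does not model.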

