
Research on Chinese/English Mixed Document Recognition (中英文混合文章识别问题)
Cited by: 18
Abstract: Most existing OCR (optical character recognition) classifiers are designed for a single character set (or language). At the same time, multilingual documents are becoming increasingly common as a result of globalization, so a document processing system with multilingual capability is needed. This paper presents a general scheme: two OCR techniques, one system, and language identification. To make the scheme concrete, a Chinese/English mixed document processing system is implemented. Three key problems are addressed: system flow control, separation of Chinese and English regions, and segmentation of English characters. Compared with previously reported systems, this system adds a Chinese/English region separation module, to which a novel equidistance-based approach is applied. To verify the system's effectiveness, a second system was implemented by combining previously published methods. Experimental results show that the new system clearly outperforms the second system: the recognition rate rises from 98.48% to 99.13% on magazine samples and from 98.68% to 99.25% on book samples.
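Authors: Wang Kai (王恺), Wang Qingren (王庆人)
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 5, pp. 786-798 (13 pages)
Funding: Tianyuan Fund of the National Natural Science Foundation of China
Keywords: system design; language identification; character segmentation; multilingual optical character recognition system; document image processing; algorithms; electronic document identification systems; feature extraction; flowcharting; image processing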
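
The abstract names an equidistance-based method for separating Chinese and English regions and a flow that routes each region to a language-specific recognizer, but it gives no algorithmic detail. The following is only an illustrative sketch of that general idea, not the authors' method. It assumes that printed Chinese characters in a line sit at a roughly constant center-to-center pitch while English glyphs do not. The function names (`label_blobs`, `group_runs`, `recognize_line`), the `pitch` and `tol` parameters, and the recognizer callables are all hypothetical.

```python
from itertools import groupby


def label_blobs(boxes, pitch, tol=0.1):
    """Label each character blob as "zh" or "en".

    boxes: (left, right) x-extents of blobs in one text line, sorted left to right.
    pitch: assumed center-to-center distance between adjacent Chinese characters.
    A blob is labeled "zh" if the distance to at least one neighbor is within
    tol * pitch of the pitch (the assumed equidistance property), else "en".
    """
    centers = [(left + right) / 2 for left, right in boxes]
    labels = []
    for i in range(len(boxes)):
        gaps = []
        if i > 0:
            gaps.append(centers[i] - centers[i - 1])
        if i + 1 < len(boxes):
            gaps.append(centers[i + 1] - centers[i])
        equidistant = any(abs(gap - pitch) <= tol * pitch for gap in gaps)
        labels.append("zh" if equidistant else "en")
    return labels


def group_runs(labels):
    """Merge consecutive equal labels into (label, first_index, last_index) runs."""
    runs = []
    start = 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, start, start + length - 1))
        start += length
    return runs


def recognize_line(boxes, pitch, recognizers, tol=0.1):
    """Send each language run to its recognizer (hypothetical callables)."""
    results = []
    for label, first, last in group_runs(label_blobs(boxes, pitch, tol)):
        results.append(recognizers[label](boxes[first:last + 1]))
    return results


# Toy line: three evenly spaced wide blobs followed by three narrow, uneven ones.
line = [(0, 36), (40, 76), (80, 116), (122, 134), (136, 142), (144, 158)]
stub_recognizers = {
    "zh": lambda run: f"zh:{len(run)}",  # stand-in for a Chinese OCR engine
    "en": lambda run: f"en:{len(run)}",  # stand-in for an English OCR engine
}
print(group_runs(label_blobs(line, pitch=40)))
print(recognize_line(line, pitch=40, recognizers=stub_recognizers))
```

This sketch has clear limits: an isolated Chinese character between English words has no equidistant neighbor and would be labeled "en", and it takes the pitch as given rather than estimating it from the line. A practical system would need additional cues such as blob width and height.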

