
Research on the Missegmented Character Rejection in Document Recognition (文档识别中误切分字符拒识问题的研究) Cited by: 6
Abstract: In automatic document recognition, a character segmentation algorithm that relies only on geometric measurements such as size and position can easily produce missegmented image blocks; accurate segmentation requires feedback from the character classifier. This paper therefore proposes a new rejection algorithm whose goal is to reject invalid characters as accurately as possible. It analyzes the confidence and generalized confidence of distance-based classifiers, modifies the commonly used generalized confidence mapping function on that basis, and designs a rejection rule learned from samples, which makes the rejection algorithm more adaptive. Experiments on Chinese, Japanese, and Korean document samples show that the algorithm clearly improves system performance, and the approach is of general relevance to recognizing lower-quality printed text.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2002, No. 17, pp. 69-72 (4 pages)
Funding: National 863 High-Tech Research and Development Program of China (No. 2001AA114081); National Natural Science Foundation of China (No. 69972024)
Keywords: document recognition; missegmented character rejection; character recognition; OCR; confidence; rejection rule

References (8)

  • 1 C K Chow. On optimum recognition error and reject tradeoff[J]. IEEE Trans Information Theory, 1970, IT-16(1): 41-46.
  • 2 B Dubuisson, M Masson. A statistical decision rule with incomplete knowledge about classes[J]. Pattern Recognition, 1993, 26(1): 155-165.
  • 3 T M Ha. The optimum class-selective rejection rule[J]. IEEE Trans Pattern Analysis and Machine Intelligence, 1997, 19(6): 608-615.
  • 4 C L Liu, M Nakagawa. Precise candidate selection for large character set recognition by confidence evaluation[J]. IEEE Trans Pattern Analysis and Machine Intelligence, 2000, 22(6): 636-642.
  • 5 Ma Shaoping, Xia Ying, Zhu Xiaoyan, Jiang Zhe. Misrecognition model of Chinese character recognition systems[J]. Journal of Tsinghua University (Science and Technology), 1998, 38(S1): 111-114. Cited by: 5
  • 6 X Lin et al. Adaptive confidence transform based classifier combination for Chinese character recognition[J]. Pattern Recognition Letters, 1998, 19(10): 975-988.
  • 7 R G Casey, E Lecolinet. A survey of methods and strategies in character segmentation[J]. IEEE Trans Pattern Analysis and Machine Intelligence, 1996, 18(7): 690-706.
  • 8 Lin Xiaofan, Ding Xiaoqing, Wu Youshou. Theoretical analysis of confidence estimation for nearest-neighbor classifiers[J]. Chinese Science Bulletin (科学通报), 1998, 43(3): 322-325. Cited by: 10

Second-level references (1)

  • 1 Lin Xiaofan. Proceedings of ICDAR’97, Los Alamitos, 1997: 471.

Co-citing documents (13)

Co-cited documents (34)

  • 1 NAGY G. Twenty years of document image analysis in PAMI[J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 2000, 22(1): 38-62.
  • 2 LU Yi. Machine printed character segmentation: an overview[J]. Pattern Recognition, 1995, 28(1): 67-80.
  • 3 NOMURA A, MICHISHITA K, UCHIDA S, et al. Detection and segmentation of touching characters in mathematical expressions[C]//Proc of the 7th International Conference on Document Analysis and Recognition. Washington DC: IEEE Computer Society, 2003: 126-130.
  • 4 LU Yi, HAIST B, HARMON L, et al. An accurate and efficient system for segmenting machine-printed text[C]//Proc of the 5th Advanced Technology Conference. Washington DC: IEEE Press, 1992: 93-105.
  • 5 WANG J, JEAN J. Segmentation of merged characters by neural networks and shortest path[J]. Pattern Recognition, 1994, 27(5): 649-658.
  • 6 TSUJIMOTO S, ASADA H. Resolving ambiguity in segmenting touching characters[C]//Proc of the 1st International Conference on Document Analysis and Recognition. 1991: 701-709.
  • 7 CASEY R G, LECOLINET E. A survey of methods and strategies in character segmentation[J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 1996, 18(7): 690-706.
  • 8 LU Yi, SHRIDHAR M. Character segmentation in handwritten words - an overview[J]. Pattern Recognition, 1996, 29(1): 77-96.
  • 9 FENG Zhi-Dan, HUO Qiang. Confidence guided progressive search and fast match techniques for high performance Chinese/English OCR[C]. In: ICDAR, 2002: 89-92.
  • 10 LEE Seong-Whan, KIM Jong-Soo. Multi-lingual, multi-font and multi-size large-set character recognition using self-organizing neural network[C]. In: ICDAR, 1995: 28-33.

Citing documents (6)

Second-level citing documents (10)
