A Novel Method Based on Integrating Characters with Words for Contextual Processing of Chinese Character Recognition

Original title (Chinese): 一种基于字词结合的汉字识别上下文处理新方法
Abstract: Exploiting the complementarity between character-level and word-level information, this paper proposes a contextual processing method that integrates a character-based language model with a word-based language model. Starting from isolated character recognition, character-based bigram post-processing using a forward-backward search algorithm is first applied to large candidate sets, which improves both the recognition rate of the document (RRD) and the efficiency of the candidate sets. Word-based bigram post-processing is then applied to smaller candidate sets to improve the RRD further. The method raises the RRD effectively while keeping processing speed in check. Experiments on off-line handwritten Chinese documents (about 66,000 characters) show that character-based bigram post-processing raises the RRD from 81.58% to 94.50% and the cumulative accuracy of the top ten candidates from 94.33% to 98.25%; subsequent word-based bigram post-processing raises the RRD further to 95.75%.
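The first stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a max-product forward-backward pass in log space, a toy bigram table, and a fixed fallback probability for unseen character pairs. All names (`forward_backward_rerank`, `bigram_logp`, `BIGRAM`, `top_k`) are hypothetical. Each candidate is scored by the best full path through it, and each candidate set is then pruned to its `top_k` entries, which is the kind of smaller set a later word-based stage could work on.

```python
import math

def forward_backward_rerank(candidates, bigram_logp, top_k):
    """Re-rank per-position candidates with a character bigram model.

    candidates: list over text positions; each entry is a list of
                (char, recognizer_log_prob) pairs.
    bigram_logp: function (prev_char, cur_char) -> log P(cur | prev).
    Returns, per position, the top_k (char, score) pairs, where score is
    the log score of the best full path passing through that candidate.
    """
    n = len(candidates)
    # forward[t][i]: best log score of a path ending at candidate i of position t
    forward = [[lp for _, lp in candidates[0]]]
    for t in range(1, n):
        row = []
        for ch, lp in candidates[t]:
            best_prev = max(
                forward[t - 1][j] + bigram_logp(prev, ch)
                for j, (prev, _) in enumerate(candidates[t - 1])
            )
            row.append(best_prev + lp)
        forward.append(row)
    # backward[t][i]: best log score of the continuation after candidate i
    backward = [[0.0] * len(c) for c in candidates]
    for t in range(n - 2, -1, -1):
        for i, (ch, _) in enumerate(candidates[t]):
            backward[t][i] = max(
                bigram_logp(ch, nxt) + lp + backward[t + 1][j]
                for j, (nxt, lp) in enumerate(candidates[t + 1])
            )
    reranked = []
    for t in range(n):
        scored = [
            (ch, forward[t][i] + backward[t][i])
            for i, (ch, _) in enumerate(candidates[t])
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        reranked.append(scored[:top_k])  # pruned candidate set
    return reranked

# Toy bigram table (assumed values, for illustration only).
BIGRAM = {("中", "国"): 0.5}

def bigram_logp(prev, cur):
    # Unseen pairs fall back to a small fixed probability (an assumption).
    return math.log(BIGRAM.get((prev, cur), 0.01))

# The recognizer alone ranks the wrong character first at position 0.
candidates = [
    [("申", math.log(0.5)), ("中", math.log(0.4))],
    [("国", math.log(0.6)), ("围", math.log(0.4))],
]
reranked = forward_backward_rerank(candidates, bigram_logp, top_k=1)
print([cands[0][0] for cands in reranked])
```

In this toy input the context promotes the second-ranked candidate at the first position, mirroring how character-level context can lift both the top-1 rate and the cumulative top-N rate. A word-based bigram stage, which would additionally need a lexicon and word segmentation over the pruned sets, is not shown.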
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2002, No. 7, pp. 838-842 (5 pages)
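Funding: Supported by the National Natural Science Foundation of China (69972024) and the National High Technology Research and Development Program of China (863 Program) (863-306-ZT03-03-1)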
Keywords: Chinese character recognition; language model; contextual processing; forward-backward search algorithm; efficiency of candidate set
