Abstract
To improve the recognition rate for Chinese text, this paper presents a post-processing method based on corpus-derived statistical probabilities. The method exploits contextual information beyond the lexical level. For Chinese text recognition, a sequence of Chinese characters with definite boundaries (in most cases, a sentence) is processed as a single unit. Character-to-character co-occurrence probabilities obtained from corpus statistics are combined with a dynamic programming strategy, yielding satisfactory recognition results.
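The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a first-order (bigram) Markov model over characters, add-one smoothing, and a recognizer that returns a few scored candidates per character position. The names `train_bigram`, `log_transition`, and `postprocess`, the toy corpus, and the candidate scores are all hypothetical.

```python
import math
from collections import defaultdict


def train_bigram(corpus):
    """Count single characters and adjacent character pairs in a corpus."""
    unigram = defaultdict(int)
    bigram = defaultdict(int)
    for sentence in corpus:
        for ch in sentence:
            unigram[ch] += 1
        for a, b in zip(sentence, sentence[1:]):
            bigram[(a, b)] += 1
    return unigram, bigram


def log_transition(prev, cur, unigram, bigram, vocab_size):
    """Log P(cur | prev) with add-one smoothing (an assumption of this sketch)."""
    count_pair = bigram.get((prev, cur), 0)
    count_prev = unigram.get(prev, 0)
    return math.log((count_pair + 1) / (count_prev + vocab_size))


def postprocess(candidates, unigram, bigram):
    """Pick the best character sequence by dynamic programming.

    candidates: one list per position of (character, recognizer_score)
    pairs, with scores in (0, 1].
    """
    if not candidates:
        return ""
    vocab_size = max(len(unigram), 1)
    # Each layer maps a character to (best log score so far, backpointer).
    layers = [{ch: (math.log(s), None) for ch, s in candidates[0]}]
    for position in candidates[1:]:
        prev_layer = layers[-1]
        layer = {}
        for ch, score in position:
            best_prev = max(
                prev_layer,
                key=lambda p: prev_layer[p][0]
                + log_transition(p, ch, unigram, bigram, vocab_size),
            )
            total = (
                prev_layer[best_prev][0]
                + log_transition(best_prev, ch, unigram, bigram, vocab_size)
                + math.log(score)
            )
            layer[ch] = (total, best_prev)
        layers.append(layer)
    # Backtrack from the best final character to the start of the unit.
    ch = max(layers[-1], key=lambda c: layers[-1][c][0])
    path = [ch]
    for layer in reversed(layers[1:]):
        ch = layer[ch][1]
        path.append(ch)
    return "".join(reversed(path))


# Toy data: the recognizer ranks the wrong first character highest.
corpus = ["大学生", "大学", "学生", "太阳"]
unigram, bigram = train_bigram(corpus)
candidates = [
    [("太", 0.5), ("大", 0.4)],
    [("学", 0.9)],
    [("生", 0.6), ("主", 0.4)],
]
print(postprocess(candidates, unigram, bigram))  # prints 大学生
```

In this toy case, taking the top-scored candidate at each position would give 太学生, while the co-occurrence statistics favor 大学生. Working in log space avoids numeric underflow on long sentences.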
Source
《中文信息学报》
CSCD
1996, Issue 1, pp. 23-30 (8 pages)
Journal of Chinese Information Processing
Keywords
Chinese character recognition
Corpus linguistics
Chinese text recognition
Markov model
Post-processing