Abstract
To improve the recognition rate of Chinese text, this paper proposes an automatic post-processing method for recognized Chinese text, based on probabilities obtained from corpus statistics. The method exploits contextual information that goes beyond lexical-level knowledge. The paper describes the statistical method used to estimate character-to-character co-occurrence probabilities from a very large amount of data, together with the resulting statistics. A sequence of Chinese characters with well-defined boundaries (in most cases a sentence) is treated as one processing unit, and the recognized text is post-processed automatically by dynamic programming over the estimated co-occurrence probabilities. Satisfactory recognition results were obtained.
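The abstract states the technique but gives no formulas, so the following is only a minimal Python sketch of that general idea, not the authors' implementation. It estimates co-occurrence probabilities from a corpus and then uses dynamic programming (a Viterbi search over a first-order Markov model) to pick one character per position of a bounded sequence. Assumptions not taken from the paper: co-occurrence is counted only between adjacent characters, add-alpha smoothing is used, the recognizer supplies a list of candidate characters for each position, and recognizer confidence scores are ignored. The names `train_cooccurrence`, `viterbi_postprocess`, `BOS` and `EOS` are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_cooccurrence(corpus_sentences, alpha=1.0):
    """Return a function giving log P(b | a) for adjacent characters a, b.

    Add-alpha smoothing (an assumption) keeps unseen pairs and unseen
    characters at a small non-zero probability.
    """
    left_counts = Counter()  # occurrences of each character as the left member of a pair
    pair_counts = Counter()  # occurrences of each adjacent pair (a, b)
    vocab = set()
    for sentence in corpus_sentences:
        chars = [BOS] + list(sentence) + [EOS]
        vocab.update(chars)
        for a, b in zip(chars, chars[1:]):
            left_counts[a] += 1
            pair_counts[(a, b)] += 1
    v = len(vocab)

    def log_prob(a, b):
        return math.log((pair_counts[(a, b)] + alpha) / (left_counts[a] + alpha * v))

    return log_prob


def viterbi_postprocess(candidates, log_prob):
    """Choose one character per position, maximizing the summed log co-occurrence score.

    candidates: one non-empty list of candidate characters per position of a
    bounded sequence (for example, one sentence).
    """
    # best maps each candidate at the current position to
    # (score of the best path ending in it, that path).
    best = {BOS: (0.0, [])}
    for cands in candidates:
        new_best = {}
        for c in cands:
            # Dynamic-programming step: extend the best-scoring predecessor path.
            new_best[c] = max(
                ((score + log_prob(prev, c), path + [c])
                 for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
        best = new_best
    # Close the sequence with the end marker and keep the best full path.
    _, path = max(
        ((score + log_prob(prev, EOS), path) for prev, (score, path) in best.items()),
        key=lambda item: item[0],
    )
    return "".join(path)


if __name__ == "__main__":
    log_prob = train_cooccurrence(["汉字识别", "文字识别", "汉字文本"])
    lattice = [["汉", "汊"], ["字", "宇"], ["识", "织"], ["别", "引"]]
    print(viterbi_postprocess(lattice, log_prob))  # prints 汉字识别
```

Because the search is restricted to one sentence-like unit, the cost grows only linearly with its length (times the square of the candidate-list size), which is what makes treating each bounded sequence as a processing unit practical.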
Source
Pattern Recognition and Artificial Intelligence
Indexed in: EI, CSCD, Peking University Core Journals list
1996, No. 2, pp. 172-178 (7 pages)
Keywords
Chinese character recognition
Markov model
Text processing
Corpus linguistics
Post-processing