Abstract
Printed Chinese character recognition is a central problem in Chinese information processing. Because character images are subject to various kinds of interference, they must be preprocessed before recognition. We present a preprocessing method that comprises binarization, smoothing, line segmentation, character segmentation, and normalization. For the key step, binarization, we propose a threshold-selection method: first compute the gray-level histogram of the character image, then smooth the histogram iteratively by weighted averaging over neighboring points until only two global peaks remain, and take the lowest point between the two peaks as the threshold. We applied this preprocessing method in the authors' ANS printed Chinese character recognition method, which uses neural network technology, and obtained satisfactory results.
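The threshold-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the smoothing weights, so the (1, 2, 1) kernel, the edge handling, the tie-breaking on flat valleys, the iteration cap, and all function names here are assumptions made for the example.

```python
def smooth_once(hist, weights=(1, 2, 1)):
    """One pass of weighted neighbor averaging (weights are an assumption).
    Edge bins reuse their own value for the missing neighbor."""
    n = len(hist)
    total = sum(weights)
    out = []
    for i in range(n):
        left = hist[max(i - 1, 0)]
        right = hist[min(i + 1, n - 1)]
        out.append((weights[0] * left + weights[1] * hist[i]
                    + weights[2] * right) / total)
    return out


def find_peaks(hist):
    """Indices of local maxima; a flat run of equal values counts as one peak."""
    peaks = []
    n = len(hist)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and hist[j + 1] == hist[i]:
            j += 1
        higher_than_left = i == 0 or hist[i - 1] < hist[i]
        higher_than_right = j == n - 1 or hist[j + 1] < hist[i]
        if higher_than_left and higher_than_right and hist[i] > 0:
            peaks.append((i + j) // 2)
        i = j + 1
    return peaks


def bimodal_threshold(pixels, levels=256, max_iter=10000):
    """Smooth the gray-level histogram until two peaks remain, then
    return the gray level of the lowest point between them."""
    hist = [0.0] * levels
    for p in pixels:
        hist[p] += 1

    peaks = find_peaks(hist)
    iterations = 0
    while len(peaks) > 2 and iterations < max_iter:
        hist = smooth_once(hist)
        peaks = find_peaks(hist)
        iterations += 1

    if len(peaks) != 2:
        raise ValueError("histogram did not reduce to exactly two peaks")

    lo, hi = peaks
    lowest = min(hist[lo:hi + 1])
    # Assumption: if the valley is flat, take the middle of the lowest bins.
    candidates = [i for i in range(lo, hi + 1) if hist[i] == lowest]
    return candidates[len(candidates) // 2]


def binarize(pixels, threshold):
    """Map dark pixels (assumed to be character strokes) to 1, others to 0."""
    return [1 if p <= threshold else 0 for p in pixels]
```

For example, `binarize(pixels, bimodal_threshold(pixels))` applied to a flat list of 8-bit gray values separates a dark stroke cluster from a bright background cluster, provided the histogram really is bimodal; otherwise the sketch raises `ValueError` rather than guessing a threshold.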
Source
《大庆石油学院学报》
CAS
PKU Core Journal
1996, No. 2, pp. 59-62 (4 pages)
Journal of Daqing Petroleum Institute
Keywords
printed Chinese character recognition, preprocessing, binarization, line segmentation, character segmentation, normalization