Abstract
For Chinese text analysis, a new text feature, the Blank Count Feature (BCF), is proposed for constructing the projection profile of a text image. Before the profile is analyzed, it is preprocessed with the BCF Vector Smoothing Algorithm (BVSA). The smoothed profile reveals an important property of Chinese text: the BCF vector converges, clustering toward the middle. Statistical experiments confirm that this phenomenon is stable: it persists across different fonts and font sizes, and across both printed and handwritten text. A text line extraction algorithm based on this phenomenon achieved good results.
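The abstract does not define BCF or BVSA in detail, so the following is only a minimal sketch of the general idea of line separation from a projection profile, not the authors' method. It assumes that the profile counts blank (background) pixels in each row of a binarized image, and it uses a centered moving average as a stand-in for BVSA. The function names `blank_count_profile`, `smooth`, and `split_lines`, and the `blank_ratio` threshold, are hypothetical.

```python
from typing import List, Sequence, Tuple


def blank_count_profile(image: Sequence[Sequence[int]]) -> List[int]:
    """Count blank pixels (0) in each row of a binary image (1 = ink).

    Assumption: this per-row blank count stands in for the BCF vector.
    """
    return [sum(1 for px in row if px == 0) for row in image]


def smooth(profile: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average, truncated at the borders.

    Assumption: a placeholder for BVSA, whose details the abstract omits.
    """
    half = window // 2
    out = []
    for i in range(len(profile)):
        lo = max(0, i - half)
        hi = min(len(profile), i + half + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out


def split_lines(
    profile: Sequence[float], width: int, blank_ratio: float = 0.95
) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, of candidate text lines.

    A row counts as text when its blank count is below blank_ratio * width.
    """
    threshold = blank_ratio * width
    lines = []
    start = None
    for i, value in enumerate(profile):
        is_text = value < threshold
        if is_text and start is None:
            start = i            # a text line begins
        elif not is_text and start is not None:
            lines.append((start, i))  # the text line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


if __name__ == "__main__":
    blank = [0] * 10
    text = [1] * 5 + [0] * 5
    page = [blank, blank, text, text, text, blank, blank, blank, text, text, blank]
    profile = blank_count_profile(page)
    print(split_lines(profile, width=10))
    print(split_lines(smooth(profile), width=10))
```

In this toy setup, smoothing widens each detected line slightly because the average blurs the boundary rows. A real implementation would tune the window and threshold to the scan resolution.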
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2005, Issue 5, pp. 1039-1041 (3 pages)
Journal of Computer Applications