Abstract
To automate the identification of computer-printed documents, a print document identification algorithm is proposed that uses statistical texture features computed from the gray-level co-occurrence matrix (GLCM) of microscopic images of Chinese characters. First, the influence of the laser printer's transmission system on the latent image of a printed character is analyzed with a theoretical model. Next, 22 GLCM statistical texture features are computed for each character image, and the ReliefF algorithm is used for feature selection. Finally, GLCM texture features are extracted along the laser scanning direction and the paper feed direction of the microscopic character image and fused, and two classifiers, nearest neighbor and support vector machine (SVM), are used for classification. Experimental results on two sample sets show that feature fusion improves identification performance. The SVM outperforms the nearest neighbor classifier, with a classification accuracy and average recall of 96.5% and 96.64%, respectively, on the same-character sample set without repeated samples, and of 98% and 98.18%, respectively, on the same-character sample set with repeated samples. The classification accuracy for laser printer brand is 98%. These results indicate that the method performs well for classifying and identifying printed documents.
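The feature pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names are hypothetical, the row offset (0, 1) stands in for the laser scanning direction and the column offset (1, 0) for the paper feed direction, only four common GLCM statistics are computed rather than the paper's 22, ReliefF selection is omitted, and a plain 1-nearest-neighbor rule stands in for the classifiers.

```python
import numpy as np

def quantize(image, levels=8):
    """Map 8-bit gray values (0..255) onto `levels` integer bins."""
    q = (np.asarray(image, dtype=np.float64) / 256.0 * levels).astype(np.int64)
    return np.clip(q, 0, levels - 1)

def glcm(q, dr, dc, levels=8):
    """Normalized, symmetric co-occurrence matrix for a non-negative offset (dr, dc)."""
    rows, cols = q.shape
    ref = q[: rows - dr, : cols - dc]   # reference pixels
    nbr = q[dr:, dc:]                   # neighbors at the given offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts += counts.T                  # count each pair in both orders
    return counts / counts.sum()

def texture_features(p):
    """Four illustrative GLCM statistics (a subset, not the paper's 22)."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([contrast, energy, homogeneity, entropy])

def fused_features(image, levels=8):
    """Concatenate the features of the two assumed directions (simple fusion)."""
    q = quantize(image, levels)
    scan = texture_features(glcm(q, 0, 1, levels))  # assumed laser scanning direction
    feed = texture_features(glcm(q, 1, 0, levels))  # assumed paper feed direction
    return np.concatenate([scan, feed])

def nearest_neighbor(train_x, train_y, query):
    """Return the label of the training vector closest to `query` (Euclidean)."""
    dists = np.linalg.norm(np.asarray(train_x) - query, axis=1)
    return train_y[int(np.argmin(dists))]

# Toy usage: striped synthetic images stand in for character micrographs.
stripes = np.zeros((8, 8))
stripes[1::2, :] = 255
train_x = [fused_features(stripes), fused_features(stripes.T)]
train_y = ["printer_A", "printer_B"]    # hypothetical labels
predicted = nearest_neighbor(train_x, train_y, fused_features(stripes))
```

Concatenation is the simplest form of fusion. In practice the fused vectors would usually be normalized per feature before distance-based classification, because contrast and entropy are on different scales.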
Source
《武汉大学学报(工学版)》
CAS
CSCD
Peking University Core Journals
2016, No. 1, pp. 154-160 (7 pages)
Engineering Journal of Wuhan University
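Funding
Major Project of the Ministry of Public Security (No. 2014JSYJA017)
Science and Technology Research Project of the Hubei Provincial Department of Education (No. B2015033)
Scientific Research Project of Hubei Engineering University (No. 201511)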