Abstract
To enable keyword-based retrieval of Uyghur document images, a keyword document image retrieval method based on coarse-to-fine hierarchical matching is proposed. Preprocessed document images are segmented into a library of word images with an improved projection-based segmentation method, and template matching is used to match the keyword coarsely against this library. Building on the coarse matches, histogram of oriented gradients (HOG) feature vectors are extracted from the word images, and a support vector machine (SVM) classifier learns these vectors to carry out keyword image retrieval. In experiments on a database of 108 document images, the method achieves an average retrieval accuracy of 91.14% and an average recall of 79.31%, showing that it can effectively support keyword-based retrieval of Uyghur document images.
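The word-segmentation step relies on projection profiles. The abstract does not describe what makes the projection method "improved", so the following is only a minimal sketch of plain vertical-projection segmentation of one binarized text line. It assumes ink pixels are 1 and background is 0, and that a run of at least `min_gap` empty columns (a hypothetical parameter) separates two words, while shorter runs fall between letter groups of the same word.

```python
import numpy as np


def segment_words(line_img: np.ndarray, min_gap: int = 3) -> list[tuple[int, int]]:
    """Split a binarized text line (1 = ink, 0 = background) into word column spans.

    Hypothetical helper: empty-column runs shorter than `min_gap` are treated as
    intra-word gaps, longer ones as word boundaries. Spans are half-open
    [start, end) and returned right-to-left, the reading order of Uyghur script.
    """
    profile = line_img.sum(axis=0)      # vertical projection: ink count per column
    spans = []
    start, end, gap = None, 0, 0
    for x, ink_count in enumerate(profile):
        if ink_count > 0:
            if start is None:           # first ink column of a new word
                start = x
            end = x + 1
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:          # gap wide enough: close the current word
                spans.append((start, end))
                start = None
    if start is not None:               # word touching the right border
        spans.append((start, end))
    return spans[::-1]
```

For example, a line whose ink occupies columns 2-4, 6-7 and 12-14 yields `[(12, 15), (2, 8)]` with the default `min_gap=3`: the one-column gap at column 5 stays inside a word, while the four-column gap starting at column 8 separates two words.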
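The coarse stage compares the keyword image with every word image by template matching. The abstract does not name the similarity measure, so this sketch assumes zero-mean normalized cross-correlation (NCC) computed after nearest-neighbour resizing to a common size. The helper names, `size`, and `threshold` are hypothetical; for equal-size inputs, OpenCV's `cv2.matchTemplate` with `TM_CCOEFF_NORMED` computes the same score.

```python
import numpy as np


def resize_nn(img: np.ndarray, shape: tuple[int, int]) -> np.ndarray:
    """Nearest-neighbour resize so that word images of different sizes can be compared."""
    h, w = img.shape
    rows = np.arange(shape[0]) * h // shape[0]
    cols = np.arange(shape[1]) * w // shape[1]
    return img[rows[:, None], cols[None, :]]


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equal-size images, in [-1, 1]."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def coarse_match(keyword, words, size=(32, 96), threshold=0.5):
    """Return indices of word images whose NCC with the keyword reaches `threshold`."""
    k = resize_nn(keyword, size)
    return [i for i, w in enumerate(words) if ncc(k, resize_nn(w, size)) >= threshold]
```

Because the fine stage can still reject candidates, a fairly low threshold is a natural choice here: it trades precision for recall at the cheap stage.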
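For the fine stage, HOG feature vectors are extracted from the candidate word images. The sketch below is a simplified HOG, assuming 8x8-pixel cells and 9 unsigned orientation bins (common defaults, not values taken from the paper). It replaces the overlapping block normalization of the standard descriptor with a single L2 normalization of the whole vector. Word images should first be resized to one fixed size so that all vectors have the same length; `skimage.feature.hog` provides the full descriptor.

```python
import numpy as np


def hog_features(img: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Simplified HOG: per-cell histograms of unsigned gradient orientation
    (0-180 degrees) weighted by gradient magnitude, concatenated and L2-normalized.
    Block-level normalization of the standard descriptor is omitted."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                       # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    # Clamp guards against a rounding result of exactly 180.0 after the modulo.
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=mag[sl].ravel(), minlength=bins)
    vec = hist.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

A 32x96 word image, for instance, gives 4 x 12 cells and therefore a 432-dimensional vector.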
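The abstract states that an SVM classifier learns the HOG vectors, but it does not give the kernel or the training procedure. As a stand-in, this sketch trains a linear SVM with Pegasos (stochastic sub-gradient descent on the L2-regularized hinge loss, with the bias folded into the weights through a constant feature). It labels keyword samples +1 and other words -1. `lam`, `epochs`, and all helper names are hypothetical. `fine_match` then keeps only the coarse-match candidates that receive a positive SVM score. In practice a library classifier such as scikit-learn's `SVC` would be the usual choice.

```python
import numpy as np


def _augment(X: np.ndarray) -> np.ndarray:
    """Append a constant 1 so the bias is learned as the last weight."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0) -> np.ndarray:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    X: (n, d) features (e.g. HOG vectors); y: labels in {-1, +1}."""
    Xa = _augment(np.asarray(X, dtype=float))
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                  # decreasing step size
            violated = y[i] * (Xa[i] @ w) < 1      # inside the margin or misclassified
            w *= 1.0 - eta * lam                   # shrink from the L2 penalty
            if violated:
                w += eta * y[i] * Xa[i]            # hinge-loss sub-gradient step
    return w


def fine_match(w: np.ndarray, X_candidates, candidate_ids) -> list:
    """Keep the coarse-match candidates that the SVM scores as the keyword (+1) class."""
    scores = _augment(np.asarray(X_candidates, dtype=float)) @ w
    return [cid for cid, s in zip(candidate_ids, scores) if s > 0]
```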
Authors
LI Jing-jing (李静静)
Mutelep·Mamut (木特力甫·马木提)
Hornisa·Mamat (吾尔尼沙·买买提)
Alim·Aysa (阿力木江·艾沙)
Kurban·Ubul (库尔班·吾布力)
Affiliations: College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China; Library, Xinjiang University, Urumqi 830046, China; Teachers' Department, Xinjiang University, Urumqi 830046, China
Source
Computer Engineering and Design (计算机工程与设计)
Listed as a Peking University core journal (北大核心)
2020, No. 4, pp. 1062-1069 (8 pages)
Funding
National Natural Science Foundation of China (61563052, 61862021, 61363064)
Xinjiang University 2018 Doctoral Research Start-up Fund (62008040)