Abstract
Because some scripts used in Central Asia are highly similar to one another, a single type of texture feature cannot adequately describe their texture characteristics. To address this, a script identification method based on fusing the texture features of nonsubsampled Contourlet transform (NSCT) sub-bands is proposed. First, the preprocessed document image is decomposed with the NSCT. Local binary pattern (LBP) and gray-level co-occurrence matrix (GLCM) features are then extracted from each sub-band produced by the transform and fused into a high-dimensional feature vector. Principal component analysis (PCA) then reduces this vector to a low-dimensional feature vector. Experiments on Arabic, Russian, Tibetan, Chinese, Uyghur, English, Mongolian, Kyrgyz, Kazakh, and Turkish verify that the proposed method extracts the multi-scale, multi-directional texture features of document images more accurately and effectively improves the recognition rate.
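The feature-fusion step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The following are assumptions: the NSCT sub-bands are taken as already computed by some external NSCT implementation and passed in as 2-D lists of coefficients. The basic 8-neighbour, radius-1 LBP, the 8 gray levels, the four GLCM displacements, and the four GLCM statistics (contrast, energy, homogeneity, entropy) are illustrative choices, not parameters taken from the paper. The PCA step is omitted.

```python
import math

# Assumed (illustrative) GLCM displacements (dy, dx) at distance 1: 0°, 45°, 90°, 135°.
GLCM_OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def lbp_histogram(band):
    """Basic 8-neighbour, radius-1 LBP; returns a normalized 256-bin histogram.

    Border pixels are skipped. Real-valued (even negative) coefficients are
    fine, because LBP only compares each neighbour with the centre value.
    """
    h, w = len(band), len(band[0])
    # Neighbour order: clockwise from the top-left; neighbour k sets bit k.
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = band[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(nbrs):
                if band[y + dy][x + dx] >= c:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else [0.0] * 256


def quantize(band, levels):
    """Linearly map real-valued sub-band coefficients to integers 0..levels-1."""
    lo = min(min(row) for row in band)
    hi = max(max(row) for row in band)
    span = hi - lo
    if span == 0:
        return [[0] * len(row) for row in band]
    return [[min(levels - 1, int((v - lo) / span * levels)) for v in row] for row in band]


def glcm_stats(q, levels, dy, dx):
    """Symmetric, normalized GLCM for one displacement.

    Returns [contrast, energy, homogeneity, entropy].
    """
    m = [[0] * levels for _ in range(levels)]
    h, w = len(q), len(q[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = q[y][x], q[y2][x2]
                m[i][j] += 1  # count the pair in both directions
                m[j][i] += 1  # so the matrix is symmetric
    total = sum(map(sum, m))
    if total == 0:
        return [0.0, 0.0, 0.0, 0.0]
    contrast = energy = homogeneity = entropy = 0.0
    for i in range(levels):
        for j in range(levels):
            p = m[i][j] / total
            if p == 0:
                continue
            contrast += (i - j) ** 2 * p
            energy += p * p
            homogeneity += p / (1 + abs(i - j))
            entropy -= p * math.log2(p)
    return [contrast, energy, homogeneity, entropy]


def fused_features(subbands, levels=8):
    """Concatenate each sub-band's LBP histogram and GLCM statistics into one vector."""
    vec = []
    for band in subbands:
        vec.extend(lbp_histogram(band))
        q = quantize(band, levels)
        for dy, dx in GLCM_OFFSETS:
            vec.extend(glcm_stats(q, levels, dy, dx))
    return vec
```

With these choices each sub-band contributes 256 + 4 × 4 = 272 values, so the fused vector grows linearly with the number of NSCT sub-bands. This growth is why a dimensionality-reduction step such as PCA follows. One option for that step, not necessarily the authors' choice, is scikit-learn's `sklearn.decomposition.PCA` with `fit_transform`, applied to the fused vectors of all training images.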
Authors
HAN Xing-kun (韩兴坤)
Alimjan Aysa (阿力木江·艾沙)
Nurbiya Yadikar (努尔毕亚·亚地卡尔)
ZHU Ya-li (朱亚俐)
Kurban Ubul (库尔班·吾布力)
School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Network and Information Center, Xinjiang University, Urumqi 830046, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University core journal
2018, No. 9, pp. 2848-2855 (8 pages)
Funding
National Natural Science Foundation of China (61363064, 61563052, 61163028)
Doctoral Research Start-up Foundation of Xinjiang University (BS150262)