Abstract
To correctly classify the minority scripts that appear in documents, a script identification method for Chinese minority scripts based on wavelet analysis and the Modified Quadratic Discriminant Function (MQDF) is proposed. The method applies multi-resolution wavelet decomposition to obtain wavelet energy and wavelet energy proportion distribution features, and uses an MQDF classifier to identify the script. A sample set was built covering eight scripts: six commonly used Chinese minority scripts (Tibetan, Tai Lue (Xishuangbanna Dai), Naxi pictographs, Uighur, Tai Le (Dehong Dai) and Yi) together with Chinese and English. Part of the samples was used for training and the rest for testing, with the proportion of training samples varied. The experimental results show that, with multi-level wavelet decomposition, the method achieves a higher recognition rate than the traditional Bayes and K-Nearest Neighbor (KNN) classifiers.
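The feature extraction step described in the abstract (sub-band energies and their proportions from a multi-level wavelet decomposition) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the wavelet, so an orthonormal Haar wavelet is assumed, the image is assumed to be a grayscale list of lists whose sides are divisible by 2 at every level, and the function names `haar_step` and `wavelet_energy_features` are hypothetical. The MQDF classifier itself is not shown.

```python
# Sketch only: Haar wavelet, feature order, and function names are assumptions.

def haar_step(image):
    """One level of an orthonormal 2-D Haar transform.

    Returns the approximation band and three detail bands,
    each half the height and width of the input.
    """
    approx, row_diff, col_diff, diag_diff = [], [], [], []
    for r in range(0, len(image), 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2.0)  # local average
            r_row.append((a + b - d - e) / 2.0)  # difference between the two rows
            c_row.append((a - b + d - e) / 2.0)  # difference between the two columns
            d_row.append((a - b - d + e) / 2.0)  # diagonal difference
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag_diff.append(d_row)
    return approx, row_diff, col_diff, diag_diff


def band_energy(band):
    """Energy of a sub-band: the sum of its squared coefficients."""
    return sum(v * v for row in band for v in row)


def wavelet_energy_features(image, levels):
    """Return (energies, proportions) over all sub-bands.

    Order: three detail bands per level, finest level first,
    followed by the final approximation band.
    """
    energies = []
    current = image
    for _ in range(levels):
        current, row_diff, col_diff, diag_diff = haar_step(current)
        energies.extend(band_energy(b) for b in (row_diff, col_diff, diag_diff))
    energies.append(band_energy(current))
    total = sum(energies)
    proportions = [e / total if total else 0.0 for e in energies]
    return energies, proportions


# Usage: a 4x4 toy "image" decomposed over two levels gives 7 energies and 7 proportions.
toy = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
toy_energies, toy_proportions = wavelet_energy_features(toy, levels=2)
```

Because the assumed transform is orthonormal, the sub-band energies sum to the energy of the input image, so the proportions sum to 1 and describe how the energy is spread across scales and orientations. A feature vector like this would then be passed to the classifier.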
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2009, No. 12, pp. 3360-3362, 3365 (4 pages)
Funding
National Natural Science Foundation of China (60803096)
State Ethnic Affairs Commission Project (07DL07)
Keywords
Chinese minority scripts
script identification
wavelet analysis
Modified Quadratic Discriminant Function (MQDF)