Abstract
A multi-class text classification algorithm that combines the multiconlitron with Hadamard error-correcting output codes (ECOC) is proposed. The Hadamard ECOC converts the multi-class classification problem into a series of binary classification problems. For each binary problem, a binary classifier is constructed with the multiconlitron, and the category of a text to be classified is determined by Hamming distance. Text classification experiments were conducted on the standard Reuters-21578 dataset. The results show that, compared with the support vector machine (SVM) multi-class algorithms 1-a-r (one-against-rest), 1-a-1 (one-against-one) and DAGSVM, the proposed algorithm not only improves classification precision but also substantially increases classification speed.
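The ECOC pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The code matrix is assumed to be the first rows of a Sylvester-type Hadamard matrix with its constant first column dropped, so that every pair of codewords differs in exactly n/2 positions. Each remaining column defines one binary problem. Decoding picks the class whose codeword is nearest in Hamming distance, with ties broken by the lowest class index. The paper trains each binary classifier with the multiconlitron. Here a simple nearest-centroid learner is plugged in instead, and it is labelled as a stand-in. All function names (`sylvester_hadamard`, `hadamard_code_matrix`, `hamming_decode`, `train_nearest_centroid`, `fit_ecoc`) are hypothetical. The sketch also assumes that every class appears in the training data.

```python
from typing import Callable, List, Sequence

Vector = List[float]
BinaryClassifier = Callable[[Vector], int]  # returns +1 or -1


def sylvester_hadamard(n: int) -> List[List[int]]:
    """n x n Hadamard matrix (+1/-1) by Sylvester doubling; n must be a power of 2."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_2k = [[H_k, H_k], [H_k, -H_k]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def hadamard_code_matrix(num_classes: int) -> List[List[int]]:
    """Assumed construction: first num_classes rows of the smallest Hadamard matrix
    with >= num_classes rows, constant first column dropped. Any two codewords
    differ in exactly n/2 positions, and since num_classes > n/2 no retained
    column is constant, so every binary problem has both labels."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    n = 1
    while n < num_classes:
        n *= 2
    return [row[1:] for row in sylvester_hadamard(n)[:num_classes]]


def hamming_decode(outputs: Sequence[int], codes: Sequence[Sequence[int]]) -> int:
    """Class whose codeword is nearest in Hamming distance (ties -> lowest index)."""
    dists = [sum(o != c for o, c in zip(outputs, code)) for code in codes]
    return min(range(len(codes)), key=lambda k: dists[k])


def train_nearest_centroid(X: Sequence[Vector], labels: Sequence[int]) -> BinaryClassifier:
    """Hypothetical stand-in binary learner (NOT a multiconlitron): nearest class mean."""
    def mean(rows: List[Vector]) -> Vector:
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = mean([x for x, l in zip(X, labels) if l == 1])
    neg = mean([x for x, l in zip(X, labels) if l == -1])

    def predict(x: Vector) -> int:
        dp = sum((a - b) ** 2 for a, b in zip(x, pos))
        dn = sum((a - b) ** 2 for a, b in zip(x, neg))
        return 1 if dp <= dn else -1

    return predict


def fit_ecoc(
    X: Sequence[Vector],
    y: Sequence[int],
    num_classes: int,
    train_binary: Callable[[Sequence[Vector], Sequence[int]], BinaryClassifier] = train_nearest_centroid,
) -> Callable[[Vector], int]:
    codes = hadamard_code_matrix(num_classes)
    # Column j defines binary problem j: a sample of class c gets label codes[c][j].
    classifiers = [
        train_binary(X, [codes[c][j] for c in y]) for j in range(len(codes[0]))
    ]

    def predict(x: Vector) -> int:
        outputs = [clf(x) for clf in classifiers]  # one +1/-1 bit per binary problem
        return hamming_decode(outputs, codes)

    return predict


X = [[0, 0], [1, 0], [5, 0], [6, 0], [0, 5], [0, 6]]
y = [0, 0, 1, 1, 2, 2]
model = fit_ecoc(X, y, num_classes=3)
print([model(p) for p in ([0.5, 0.5], [5.5, 0.5], [0.5, 5.5])])  # [0, 1, 2]
```

With 8 classes the codewords have length 7 and minimum pairwise distance 4. A single wrong binary output is therefore still decoded to the correct class. This error tolerance is what motivates using Hadamard codes instead of plain one-per-class codes.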
Source
Journal of Bohai University: Natural Science Edition (《渤海大学学报(自然科学版)》)
Indexed in: CAS
2017, No. 1, pp. 71-75 (5 pages)
Funding
National Natural Science Foundation of China (No. 61602056)
Project of the Education Department of Liaoning Province (No. L2014444)