Abstract
Automatic text classification is an effective tool for organizing and managing information. Traditional classification methods generally cannot achieve both good classification quality and good runtime efficiency. By combining the strengths of the Rocchio and KNN classifiers, this paper designs a text classification method based on multiple representative points. The method first mines several effective representative points for each category, which may be real documents or virtual points, and then classifies with Rocchio and KNN methods built on those points. Experiments show that the method achieves satisfactory classification results with less training time, handles the class imbalance problem well, and reaches classification performance comparable to SVM.
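The idea of classifying against several representative points per category can be illustrated with a minimal sketch. The abstract does not say how the representative points are mined, so this sketch assumes a small spherical k-means run inside each class (producing virtual points) and a nearest-representative decision rule under cosine similarity. The function names `representatives`, `train`, and `classify`, the parameter `k`, and the toy term-frequency vectors are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def representatives(docs, k, iters=10):
    """Assumed mining step: k-means within one class, yielding virtual points."""
    docs = [normalize(d) for d in docs]
    centers = docs[:k]  # deterministic start: the first k documents
    for _ in range(iters):
        groups = defaultdict(list)
        for d in docs:
            best = max(range(len(centers)), key=lambda i: cosine(d, centers[i]))
            groups[best].append(d)
        # Keep the old center when no document was assigned to it.
        centers = [normalize(mean(groups[i])) if groups[i] else centers[i]
                   for i in range(len(centers))]
    return centers


def train(labeled_docs, k=2):
    """Return a list of (representative point, label) pairs for all classes."""
    by_class = defaultdict(list)
    for vec, label in labeled_docs:
        by_class[label].append(vec)
    return [(rep, label)
            for label, docs in by_class.items()
            for rep in representatives(docs, k)]


def classify(model, vec):
    """Assign the label of the most similar representative point."""
    vec = normalize(vec)
    return max(model, key=lambda pair: cosine(vec, pair[0]))[1]


# Toy term-frequency vectors: "sports" covers two distinct sub-topics.
training = [
    ([5, 0, 0], "sports"), ([0, 5, 0], "sports"),
    ([4, 1, 0], "sports"), ([0, 4, 1], "sports"),
    ([0, 0, 5], "finance"), ([1, 0, 4], "finance"),
]
model = train(training, k=2)
```

With `k=1` each class collapses to a single centroid, which corresponds to a plain Rocchio-style classifier. Larger `k` moves the model toward KNN over a reduced set of points, which is one plausible reading of how the two methods are combined.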
Source
《郑州大学学报(工学版)》
CAS
Peking University Core Journal
2010, No. 6, pp. 116-118, 125 (4 pages in total)
Journal of Zhengzhou University (Engineering Science)