Abstract
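Rocchio's classification performance is unsatisfactory and limited to spaces that can be partitioned linearly, while KNN depends too heavily on the choice of K. To address these shortcomings, a method is proposed that builds more than one representative for each class instead of a single one, with the number of representatives per class determined by the actual data distribution of the data set. This resolves the unreasonable data partitioning caused by Rocchio's linear division of the instance space. The classification model, built from the constructed representatives and the instances generalized from each class, improves classification efficiency, removes the dependence of accuracy on a manually specified K, and raises classification accuracy. Experiments on the 20-newsgroup and Reuters-21578 data sets show that the new algorithm performs far better than Rocchio and KNN and slightly better than SVM, which was chosen as the baseline for comparison.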
The results of the Rocchio classification algorithm are not very accurate. KNN produces more accurate results, but its accuracy depends entirely on the selected K value and it requires a large amount of computation. Combining the advantages of these two algorithms, a new classification algorithm is proposed. It creates several local representatives for each category, and the number of representatives per category depends on the distribution of the training set. This solves the problem of unreasonable data partitioning caused by linearly dividing the instance space, and the classification model, consisting of the constructed representatives and the instances generalized from each category, effectively improves classification efficiency. Experiments were conducted on two common document corpora, 20-newsgroup and Reuters-21578. The results show that the improved classification approach outperforms the KNN and Rocchio classifiers and is comparable to SVM, which is used as the benchmark in our experiments.
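The abstract does not specify how the number of representatives per class is chosen, so the Python sketch below is a hypothetical illustration, not the authors' procedure. It assumes documents are already dense term-weight vectors, uses cosine similarity, and swaps in a simple leader-style clustering controlled by a hypothetical similarity parameter `threshold`, so that each class gets as many centroids as its data spread requires. A new document then takes the label of its single most similar representative, so no K has to be tuned.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def build_representatives(docs, labels, threshold=0.8):
    """Build several local representatives (centroids) per class.

    Assumption (not from the paper): a leader-style clustering inside each
    class. A vector joins the most similar existing group of its own class
    if the similarity to that group's centroid is >= threshold; otherwise
    it starts a new group. The number of representatives per class
    therefore follows the data distribution.
    """
    groups = defaultdict(list)  # label -> list of [sum_vector, count]
    for vec, label in zip(docs, labels):
        best, best_sim = None, -1.0
        for g in groups[label]:
            centroid = [s / g[1] for s in g[0]]
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = g, sim
        if best is not None and best_sim >= threshold:
            # Merge into the closest group by updating its running sum.
            best[0] = [s + x for s, x in zip(best[0], vec)]
            best[1] += 1
        else:
            groups[label].append([list(vec), 1])
    # Each group's mean vector becomes one representative of its class.
    reps = []
    for label, gs in groups.items():
        for total, count in gs:
            reps.append(([s / count for s in total], label))
    return reps


def classify(vec, reps):
    """Return the label of the most similar representative (no K needed)."""
    _, label = max(reps, key=lambda r: cosine(vec, r[0]))
    return label
```

For example, with training vectors `[[1,0,0],[0.9,0.1,0],[0,0,1],[0,1,0],[0,0.9,0.1]]` labelled `a, a, a, b, b` and `threshold=0.8`, class `a` receives two representatives (one near `[1,0,0]`, one at `[0,0,1]`) and class `b` receives one. A single Rocchio centroid for `a` would instead fall between its two separated clusters, which illustrates the partitioning problem the abstract describes.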
Source
Automation & Instrumentation (《自动化与仪器仪表》)
2017, No. 8, pp. 107-110 (4 pages)