Abstract
To address the limitation that support vector machine (SVM) classification is practical only for relatively small sample sets, a classification algorithm that combines density clustering with SVM is proposed. In density clustering, a sample point that has no approximately density-reachable sample points is characterized as an edge point of its cluster. Such points are added to a reduced set until all edge objects in the sample set have been selected, and the reduced set is then used to find the support vectors. Experimental results show that, compared with an SVM algorithm based on unsupervised clustering, the proposed algorithm raises classification accuracy from 86.81% to 95.43% and lowers the kernel computation from the order of 10^9 to below 10^6. Limiting the percentage of counterexamples in the ε-neighborhood of the core points in density clustering increases the number of reduced samples and improves classification accuracy by about 5% to 8%.
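The sample-reduction step described in the abstract can be illustrated with a rough sketch. The abstract does not define "approximately density-reachable", so the rule below is an assumption made for illustration only: a point is kept as a candidate edge object if it is not a core point of its own class (fewer than `min_pts` same-class points within `eps`), or if the share of opposite-class points in its ε-neighborhood exceeds `max_counter_frac`. The function names, parameters, and synthetic data are all hypothetical and are not taken from the paper. The sketch stops at the reduced set and compares the number of pairwise kernel evaluations; training an SVM on the kept samples would be the next step.

```python
import math
import random


def eps_neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i], excluding i."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]


def reduce_samples(points, labels, eps, min_pts, max_counter_frac):
    """Return indices of points kept as candidate edge objects.

    Assumed rule (NOT the paper's exact definition): keep a point if it is
    not a core point of its own class, or if the fraction of opposite-class
    points in its eps-neighborhood exceeds max_counter_frac.
    """
    kept = []
    for i in range(len(points)):
        nbrs = eps_neighbors(points, i, eps)
        same = sum(1 for j in nbrs if labels[j] == labels[i])
        counter = len(nbrs) - same
        is_core = same >= min_pts
        counter_frac = counter / len(nbrs) if nbrs else 0.0
        if not is_core or counter_frac > max_counter_frac:
            kept.append(i)
    return kept


def make_blobs(n_per_class, seed=0):
    """Two synthetic Gaussian classes (hypothetical data for illustration)."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in ((-1, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            points.append((rng.gauss(center[0], 1.0),
                           rng.gauss(center[1], 1.0)))
            labels.append(label)
    return points, labels


points, labels = make_blobs(100)
kept = reduce_samples(points, labels, eps=0.8, min_pts=8,
                      max_counter_frac=0.1)

# A full kernel (Gram) matrix needs one evaluation per ordered pair of samples.
n, m = len(points), len(kept)
print(f"full set:    {n} samples, {n * n} kernel evaluations")
print(f"reduced set: {m} samples, {m * m} kernel evaluations")
```

Because the kernel matrix grows quadratically with the number of training samples, any reduction in the sample count shrinks the kernel computation by the square of that ratio, which is the effect the abstract reports at a much larger scale. In this sketch, lowering `max_counter_frac` keeps more points near the class boundary, loosely mirroring the abstract's remark that limiting the counterexample percentage increases the number of reduced samples.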
Source
《西安交通大学学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2005, No. 12, pp. 1319-1322, 1348 (5 pages in total)
Journal of Xi'an Jiaotong University
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60173066)
Keywords
support vector machine
density clustering
ε-neighborhood