Abstract
Most current multi-class learning methods decompose a multi-class problem into several two-class problems. Besides being time-consuming, this approach leaves regions of the input space in which samples cannot be discriminated. This paper proposes multi-SVDD, an algorithm that learns multiple classes directly. Large data sets and multi-class data sets often exhibit within-class imbalance. To account for this, the algorithm first clusters the training samples of each class. It then uses Support Vector Data Description (SVDD) to build one minimum bounding hypersphere for each cluster. A test sample is assigned to a hypersphere if its distance to that sphere's center is no greater than the radius. This determines the sample's cluster and hence its class. Compared with the minimum enclosing hypersphere algorithm, multi-SVDD shows no significant increase in time or space cost, and it achieves better experimental results.
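The pipeline described in the abstract (cluster each class, fit one hypersphere per cluster, label a test sample by the sphere that contains it) can be sketched as below. This is a minimal illustration under several assumptions, not the authors' implementation. Clustering is plain k-means with a fixed, hypothetical `k_per_class`. SVDD is replaced by the Badoiu-Clarkson approximate minimum enclosing ball, which corresponds to hard-margin SVDD with a linear kernel rather than the kernelized, soft-margin form. The abstract does not specify what happens when several spheres (or none) contain a sample. To handle those cases, the decision rule picks the sphere with the smallest relative distance d/R. This is an assumed tie-breaking rule, and it reduces to the "distance ≤ radius" test when exactly one sphere contains the sample.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Ball = Tuple[str, Vector, float]  # (class label, center, radius)


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[List[Vector]]:
    """Plain k-means with deterministic farthest-point initialisation.

    Assumed stand-in for the per-class clustering step; returns non-empty clusters.
    """
    k = max(1, min(k, len(points)))
    centers: List[Vector] = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(list(max(points, key=lambda p: min(dist(p, c) for c in centers))))
    clusters: List[List[Vector]] = [list(points)]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilised
            break
        centers = new_centers
    return [cl for cl in clusters if cl]


def enclosing_ball(points: List[Vector], iters: int = 200) -> Tuple[Vector, float]:
    """Approximate minimum enclosing ball via the Badoiu-Clarkson iteration.

    Assumption: used in place of SVDD (equivalent to hard-margin SVDD with a
    linear kernel, not the kernelised, soft-margin form).
    """
    c = [float(v) for v in points[0]]
    for t in range(1, iters + 1):
        # Step toward the current farthest point with a shrinking step size.
        far = max(points, key=lambda p: dist(p, c))
        c = [ci + (fi - ci) / (t + 1) for ci, fi in zip(c, far)]
    # Radius is the largest distance, so the ball encloses every point; it is
    # floored so a one-point cluster cannot cause division by zero in predict().
    r = max(max(dist(p, c) for p in points), 1e-9)
    return c, r


def fit_multi_svdd(data: Dict[str, List[Vector]], k_per_class: int = 2) -> List[Ball]:
    """Fit one ball per cluster of each class (k_per_class is hypothetical)."""
    balls: List[Ball] = []
    for label, pts in data.items():
        for cluster in kmeans(pts, k_per_class):
            center, radius = enclosing_ball(cluster)
            balls.append((label, center, radius))
    return balls


def predict(balls: List[Ball], x: Sequence[float]) -> str:
    """Return the label of the ball with the smallest relative distance d/R.

    d/R <= 1 means x lies inside that ball; choosing the minimum is an assumed
    rule for samples covered by several balls or by none.
    """
    label, _, _ = min(balls, key=lambda b: dist(x, b[1]) / b[2])
    return label


train = {
    "A": [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]],  # two sub-clusters
    "B": [[5, 0], [6, 0], [5, 1], [6, 1]],
}
balls = fit_multi_svdd(train, k_per_class=2)
print(predict(balls, [0.5, 0.5]))    # A
print(predict(balls, [10.5, 10.5]))  # A
print(predict(balls, [5.5, 0.5]))    # B
```

In the toy data, class "A" consists of two well-separated groups, which is the kind of within-class imbalance the abstract refers to. A single sphere around all of "A" would also cover the region where "B" lies. Splitting "A" into two clusters keeps each sphere tight, so the sample at (5.5, 0.5) is assigned to "B".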
Source
Computer Science (计算机科学)
CSCD
Peking University Core Journal (北大核心)
2009, No. 3, pp. 65-68 (4 pages)
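Funding
Supported by the National Natural Science Foundation of China project "Research on One-Class Classifiers and Imbalanced Data Classification" (No. 60603029)
Supported by the Jiangsu Natural Science Foundation project "Research on Anomaly Detection in Security Auditing Based on One-Class Classifiers" (No. BK2005009)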