Abstract
Existing Support Vector Data Description (SVDD) methods typically suffer from blindness and bias when applied to two-class classification problems. Building on information entropy and SVDD classification theory, an improved algorithm for two-class problems, E-SVDD, is proposed. First, the entropy of each of the two sample classes is computed; next, the entropy values determine which class is placed inside the hypersphere; finally, the penalty parameter C of the SVDD algorithm is redefined using the distribution information provided by the two classes' sample sizes and their entropy values. Experiments on artificial sample sets and UCI data sets verify the feasibility and effectiveness of the algorithm.
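The abstract does not give the exact formulas, but the two entropy-driven steps it describes (choosing the target class and reweighting the penalty C) can be sketched as follows. This is a minimal illustration, not the paper's E-SVDD: the histogram-based entropy estimator, the choice of the lower-entropy class as the hypersphere target, and the specific penalty weighting are all assumptions made here for illustration.

```python
import numpy as np

def shannon_entropy(samples, bins=10):
    """Estimate the Shannon entropy of a 1-D sample via a histogram
    (an assumed estimator; the paper does not specify one)."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                      # drop empty bins to avoid log(0)
    return -np.sum(p * np.log2(p))

def choose_target_and_penalties(class_a, class_b, C=1.0):
    """Pick which class to describe with the SVDD hypersphere and
    assign per-class penalties from sample sizes and entropies.

    Assumption: the lower-entropy (more compactly distributed) class
    is placed inside the ball, and each class's penalty shrinks as its
    own size and entropy grow. This weighting is hypothetical.
    """
    h_a, h_b = shannon_entropy(class_a), shannon_entropy(class_b)
    n_a, n_b = len(class_a), len(class_b)
    target = 'A' if h_a <= h_b else 'B'
    # Hypothetical redefinition of C: each class's penalty is scaled
    # by the *other* class's entropy and size share.
    c_a = C * (h_b / (h_a + h_b)) * (n_b / (n_a + n_b))
    c_b = C * (h_a / (h_a + h_b)) * (n_a / (n_a + n_b))
    return target, c_a, c_b

# Usage: a tight Gaussian cluster vs. widely spread uniform noise.
rng = np.random.default_rng(0)
tight = rng.normal(0.0, 1.0, 500)
spread = rng.uniform(-5.0, 5.0, 500)
target, c_a, c_b = choose_target_and_penalties(tight, spread)
```

With these assumptions, the compact Gaussian cluster ends up as the target class, and the larger or more scattered class contributes a smaller penalty weight, mirroring the abstract's idea of letting the distribution information steer both choices.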
Source
Journal of Computer Applications (《计算机应用》)
Indexed in CSCD and the Peking University Core Journal list
2011, Issue 4, pp. 1114-1116 (3 pages)
Funding
National Natural Science Foundation of China (60874074)
Key Project of the Zhejiang Provincial Science and Technology Program (2009C14032)
Keywords
information entropy
distribution characteristics
Support Vector Data Description (SVDD)
classification