Abstract
An improved AdaBoost algorithm based on group-degree/membership-degree noise detection and dynamic feature selection was proposed to improve the classification performance of the AdaBoost ensemble learning algorithm. First, the similarity between a sample and its neighbor samples and the membership relationship between the sample and the sample sets of different categories were considered together. The concepts of group degree and membership degree were introduced, and a new noise detection method was proposed. On this basis, to select the features that can effectively distinguish misclassified samples, a general dynamic feature selection method that incorporates sample weights was proposed on top of traditional filter-based feature selection, improving the ability of AdaBoost to classify misclassified samples. Using a support vector machine (SVM) as the weak classifier, experiments were carried out on eight typical datasets from three aspects: noise detection, feature selection, and comparison with existing methods. Experimental results show that the proposed algorithm fully accounts for the influence of noisy samples and sample weights on the classification results of AdaBoost, and achieves a significant improvement in classification performance over traditional algorithms.
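The abstract names its components but does not give their formulas, so the Python sketch below only illustrates the overall pipeline under loudly stated assumptions. `group_degree` (share of same-label samples among the k nearest neighbours) and `membership_degree` (distance to the nearest other-class centroid relative to the distance to the own-class centroid) are hypothetical stand-ins for the paper's definitions; `weighted_fisher` is an assumed sample-weighted Fisher score standing in for "a traditional filter"; the 0.5 thresholds are illustrative; and a weighted decision stump replaces the SVM weak classifier so the code needs only the standard library. It is not the authors' algorithm.

```python
import math

EPS = 1e-12  # guards divisions and log(0)


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def group_degree(X, y, i, k=3):
    """Hypothetical stand-in: share of sample i's k nearest neighbours with its label."""
    ranked = sorted((sq_dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return sum(1 for _, j in ranked[:k] if y[j] == y[i]) / k


def membership_degree(X, y, i):
    """Hypothetical stand-in: closeness to own-class centroid vs nearest other-class centroid.

    Centroids exclude sample i; assumes every class has at least two samples.
    """
    def centroid(label):
        rows = [X[j] for j in range(len(X)) if y[j] == label and j != i]
        return [sum(col) / len(rows) for col in zip(*rows)]

    d_own = math.sqrt(sq_dist(X[i], centroid(y[i])))
    d_other = min(math.sqrt(sq_dist(X[i], centroid(c))) for c in set(y) if c != y[i])
    return d_other / (d_own + d_other + EPS)  # above 0.5: closer to its own class


def flag_noise(X, y, k=3, g_thr=0.5, m_thr=0.5):
    """Flag a sample as likely noise when both scores are low (illustrative thresholds)."""
    return [group_degree(X, y, i, k) < g_thr and membership_degree(X, y, i) < m_thr
            for i in range(len(X))]


def weighted_fisher(X, y, w, f):
    """Assumed filter: Fisher-style score of feature f under sample weights w (labels -1/+1)."""
    stats = []
    for c in (-1, 1):
        idx = [i for i in range(len(X)) if y[i] == c]
        tw = sum(w[i] for i in idx) + EPS
        mu = sum(w[i] * X[i][f] for i in idx) / tw
        var = sum(w[i] * (X[i][f] - mu) ** 2 for i in idx) / tw
        stats.append((mu, var))
    (mu_neg, var_neg), (mu_pos, var_pos) = stats
    return (mu_pos - mu_neg) ** 2 / (var_neg + var_pos + EPS)


def best_stump(X, y, w, feats):
    """Weighted decision stump over the selected features (stand-in for the SVM)."""
    best = None
    for f in feats:
        for t in sorted({x[f] for x in X}):
            for s in (1, -1):
                pred = [s if x[f] > t else -s for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, s)
    return best


def boost(X, y, rounds=5, top_m=1):
    """AdaBoost sketch: drop flagged samples, then re-rank features by weighted score each round."""
    keep = [i for i, noisy in enumerate(flag_noise(X, y)) if not noisy]
    X, y = [X[i] for i in keep], [y[i] for i in keep]
    w = [1 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        ranked = sorted(range(len(X[0])), key=lambda f: -weighted_fisher(X, y, w, f))
        err, f, t, s = best_stump(X, y, w, ranked[:top_m])
        err = min(max(err, EPS), 1 - EPS)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, s))
        # Standard AdaBoost update: misclassified samples gain weight.
        w = [wi * math.exp(-alpha * yi * (s if x[f] > t else -s))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (s if x[f] > t else -s) for a, f, t, s in model)
    return 1 if score >= 0 else -1
```

In this sketch, `weighted_fisher` is recomputed from the current weights `w` in every round, so features that separate the up-weighted (previously misclassified) samples move up the ranking; that is one plausible reading of what the abstract calls dynamic, weight-aware feature selection.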
Authors
WANG You-wei (王友卫); FENG Li-zhou (凤丽洲)
School of Information, Central University of Finance and Economics, Beijing 100081, China; School of Statistics, Tianjin University of Finance and Economics, Tianjin 300222, China
Source
Journal of Zhejiang University: Engineering Science (《浙江大学学报(工学版)》)
2021, No. 2, pp. 367-376 (10 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
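Funding
National Natural Science Foundation of China (61906220)
Humanities and Social Sciences Research Project of the Ministry of Education of China (19YJCZH178)
National Social Science Fund of China (18CTJ008)
Natural Science Foundation of Tianjin (18JCQNJC69600)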
Keywords
ensemble learning
data classification
noise detection
feature selection
sample weight