Abstract
Feature selection is an essential step in the data preprocessing stage of machine learning, natural language processing, and data mining. In some information-theoretic feature selection algorithms, choosing different parameter values effectively amounts to choosing a different feature selection algorithm, so determining dynamic, non-prior weights while avoiding preset prior parameters becomes an urgent problem. This paper proposes a dynamically Weighted Maximum Relevance and maximum Independence (WMRI) feature selection algorithm. First, the algorithm computes the average values of the new classification information and the retained classification information, respectively. Second, the standard deviation is used to dynamically adjust the parameter weights of these two kinds of classification information. Finally, WMRI is compared with five other feature selection algorithms on three classifiers over ten different data sets, validated with the fmi classification accuracy metric. The experimental results show that WMRI improves the quality of the selected feature subsets and increases classification accuracy.
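The weighting scheme the abstract describes (averages of the two classification-information terms, with weights adjusted dynamically from their standard deviations) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function names are hypothetical, and the exact rule mapping standard deviations to weights is not specified in the abstract, so a simple normalization is used here.

```python
from statistics import mean, stdev

def dynamic_weights(new_info, retained_info):
    """Hypothetical std-based weighting in the spirit of WMRI.

    new_info / retained_info: per-feature values of the "new
    classification information" and "retained classification
    information" terms. The weight of each term is taken
    proportional to its standard deviation (assumption), so the
    term with more spread across candidate features gets more say.
    """
    s_new, s_ret = stdev(new_info), stdev(retained_info)
    total = s_new + s_ret
    if total == 0.0:
        # no spread in either term: fall back to equal weights
        return 0.5, 0.5
    return s_new / total, s_ret / total

def wmri_score(new_term, retained_term, w_new, w_ret):
    """Weighted combination of the two terms for one candidate feature."""
    return w_new * new_term + w_ret * retained_term
```

Because the weights are recomputed from the current candidate pool, they adapt at each selection step instead of being fixed a priori, which is the point the abstract makes about avoiding preset parameters.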
Authors
ZHANG Li; CHEN Xiaobo (College of Computer Engineering, Jiangsu University of Technology, Changzhou 213001, China; Key Laboratory of Trustworthy Distributed Computing and Service (Ministry of Education), Beijing University of Posts and Telecommunications, Beijing 100876, China; The People's Bank of China, Changzhou Branch, Changzhou 213001, China)
Source
Journal of Electronics &amp; Information Technology (《电子与信息学报》)
Indexed in: EI, CSCD, Peking University Core Journals
2021, No. 10, pp. 3028-3034 (7 pages)
Funding
National Science and Technology Basic Work Special Program (2015FY111700-6)
Doctoral Research Foundation of Jiangsu University of Technology (KYY19042)
Keywords
Feature selection
Classification information
Average value
Standard deviation
Dynamic weighting