Abstract
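To obtain a better feature subset from text and eliminate interfering and redundant features, a hybrid feature optimization algorithm combining a filter method with a swarm intelligence algorithm is proposed. First, the information gain of each feature word is computed, and the higher-scoring features are kept as a preselected feature set. The sine cosine algorithm (SCA) then searches over the preselected features to obtain the final selected feature set. An adaptive inertia weight is added to better balance global exploration and local exploitation in the SCA. To evaluate feature subsets more accurately, a fitness function weighted by the number of selected features and the classification accuracy is introduced, and a new position update mechanism is proposed. Experiments with KNN and Bayesian classifiers show that the proposed algorithm improves classification accuracy to some extent. The comparison is against other feature selection algorithms and against the algorithm before improvement.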
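The information-gain preselection step can be illustrated with a minimal sketch. It assumes each document is a set of terms and that a term is a binary present/absent feature, so IG(t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t). The function names, the toy corpus and the top-k cut-off `k` are illustrative assumptions. The abstract does not state how many features the authors keep.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """IG of one term, treating it as present/absent in each document."""
    classes = sorted(set(labels))
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    all_counts = Counter(labels)
    n = len(docs)
    n_t = sum(with_t.values())
    h_c = entropy([all_counts[c] for c in classes])
    h_t = entropy([with_t[c] for c in classes])
    h_not_t = entropy([without_t[c] for c in classes])
    return h_c - (n_t / n) * h_t - ((n - n_t) / n) * h_not_t


def preselect(docs, labels, k):
    """Rank the vocabulary by IG (ties broken alphabetically) and keep the top k."""
    vocab = sorted(set().union(*docs))
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k], scores


# Toy corpus, invented for illustration only.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "price"}]
labels = ["sport", "sport", "finance", "finance"]
top, scores = preselect(docs, labels, 2)
```

Here "goal" and "stock" each split the two classes perfectly, so they score the full 1 bit and form the preselected set. Terms that occur in only one document score about 0.311.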
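The abstract does not give the exact adaptive inertia weight formula, the new position update rule, or the weights of the fitness function. The following is therefore only a sketch of a standard binary SCA with clearly assumed stand-ins:

- a linearly decreasing inertia weight `w` from 0.9 to 0.4;
- a sigmoid transfer function mapping positions to feature masks;
- the fitness `alpha * (1 - acc) + (1 - alpha) * |S| / N` with `alpha = 0.99`.

The base update follows the usual SCA form, `x = w*x + r1*sin(r2)*|r3*P - x|` (or `cos` when `r4 >= 0.5`), where `P` is the best position found so far. `evaluate` is a hypothetical callback that would, for example, train a KNN or Bayesian classifier on the selected columns and return validation accuracy. `toy_accuracy` is made up purely so the sketch runs.

```python
import math
import random


def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Assumed schedule (not the paper's): linear decrease from w_max to w_min."""
    return w_max - (w_max - w_min) * t / t_max


def fitness(mask, evaluate, alpha=0.99):
    """Minimised fitness weighting error rate and subset size (alpha is assumed)."""
    n_sel = sum(mask)
    if n_sel == 0:
        return 1.0  # an empty subset gets the worst possible value
    acc = evaluate(mask)
    return alpha * (1 - acc) + (1 - alpha) * n_sel / len(mask)


def to_mask(position, rng):
    """Sigmoid transfer: feature j is selected with probability sigmoid(x_j)."""
    return [1 if rng.random() < 1 / (1 + math.exp(-x)) else 0 for x in position]


def binary_sca(n_features, evaluate, n_agents=10, t_max=30, a=2.0, seed=0):
    rng = random.Random(seed)
    positions = [[rng.uniform(-1, 1) for _ in range(n_features)]
                 for _ in range(n_agents)]
    best_mask, best_fit, best_pos = None, float("inf"), None
    for t in range(t_max):
        # Evaluate every agent and update the destination point P (best_pos).
        for pos in positions:
            mask = to_mask(pos, rng)
            f = fitness(mask, evaluate)
            if f < best_fit:
                best_fit, best_mask, best_pos = f, mask, pos[:]
        r1 = a - t * a / t_max  # standard SCA amplitude, shrinks over time
        w = inertia_weight(t, t_max)
        for pos in positions:
            for j in range(n_features):
                r2 = rng.uniform(0, 2 * math.pi)
                r3 = rng.uniform(0, 2)
                r4 = rng.random()
                trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                new_x = w * pos[j] + r1 * trig * abs(r3 * best_pos[j] - pos[j])
                pos[j] = max(-6.0, min(6.0, new_x))  # clamp to keep sigmoid stable
    return best_mask, best_fit


def toy_accuracy(mask):
    """Toy stand-in for a classifier: features 0 and 1 help, the rest add noise."""
    return 0.5 + 0.2 * (mask[0] + mask[1]) - 0.02 * sum(mask[2:])


best_mask, best_fit = binary_sca(6, toy_accuracy, seed=1)
```

The small `(1 - alpha)` term means subset size only breaks ties between subsets of similar accuracy. In a real run, `evaluate` would dominate the cost, since it retrains a classifier for every candidate mask.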
Authors
WEN Wu (文武)
WAN Yu-hui (万玉辉)
WEN Zhi-yun (文志云)
Affiliations: School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065; Research Center of New Telecommunication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065; Chongqing Information Technology Designing Co., Ltd., Chongqing 401121, China
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journal (北大核心)
2022, No. 8, pp. 1467-1473 (7 pages)
Keywords
feature selection
sine and cosine
inertia weight
classification accuracy