摘要
现有的过滤式特征选择算法忽略了特征之间的关联性。鉴于此,提出了一种新的过滤式特征选择算法——基于持续同调的特征选择算法(Rel-Betti算法),该算法能够识别特征之间的关联性以及组合效果。通过提出相关贝蒂数概念,筛选出数据集中重要的拓扑特征信息。该算法对数据集进行预处理后,根据类标签将数据集分类,计算不同类中的相关贝蒂数,获得数据信息的特征均值,按特征均值差值大小对特征进行重要性排序。利用UCI数据集中的8个数据,将该算法与其他常见算法在决策树、随机森林、K近邻和支持向量机这4种学习模型下进行比较实验。结果表明,该算法是一种有效的特征选择算法,其能够提高分类的准确率和F1值,并且不依赖于特定的机器学习模型。
The existing filtering feature selection algorithm ignores the correlation between features.This paper proposes a new filtering feature selection algorithm—the feature selection algorithm based on persistent homology(Rel-Betti algorithm),which can consider the correlation between features and the combined effect.This paper gives a new definition named by relevant Betti numbers,which can filter out the important topological features in the dataset.The algorithm first preprocesses the data set,classifies the data set according to the class labels,calculates the relevant Betti numbers in different classes,obtains the feature mean of the data information,and uses the feature mean difference to sort the importance of the features.Using eight data in UCI,the algorithm is compared with other common algorithms under four learning models:decision tree,random forest,K-nearest neighbor and support vector machine.Experimental results show that the Rel-Betti algorithm is an effective method that can improve classification accuracy and F1 value,and does not depend on a specific machine learning model.
作者
殷杏子
彭宁宁
詹学燕
YIN Xingzi;PENG Ningning;ZHAN Xueyan(School of Science,Wuhan University of Technology,Wuhan 430070,China)
出处
《计算机科学》
CSCD
北大核心
2023年第6期159-166,共8页
Computer Science
基金
国家自然科学基金(11701438)。
关键词
特征选择
持续同调
条形码
贝蒂数
机器学习
Feature selection
Persistent homology
Barcode
Betti number
Machine learning