Abstract
Software defect prediction is an important means of reducing software testing costs, and feature selection is a key step in it. However, traditional feature selection algorithms consider only bilateral relationships and pairwise correlations between features, and therefore cannot effectively handle more complex multilateral relationships and multidirectional interactions. To address this issue, this paper proposes a graph-based feature selection method for software defect prediction built on TMFG (Triangulated Maximally Filtered Graph). The method first introduces a topological graph into the feature selection algorithm: features are represented as graph nodes and symmetric uncertainty serves as the measure of feature correlation, yielding a fully connected feature graph. The TMFG edge-removal algorithm then removes some of the edges from the fully connected graph, after which graph clustering is performed. Next, the features within each cluster are ranked, and a specified number of features is taken from each cluster and combined to form the final feature subset. Finally, comparative experiments on datasets from the PROMISE repository show that the proposed method further improves the quality of the selected feature subset, with a more pronounced advantage on larger datasets.
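The first two steps of the pipeline described in the abstract (a symmetric-uncertainty graph followed by TMFG edge filtering) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the features are assumed to be already discretized, the seed clique is chosen by a simplified strongest-nodes rule, and the clustering and per-cluster ranking steps are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both features are constant: treat them as unrelated
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def su_matrix(features):
    """Weight matrix of the fully connected feature graph (hypothetical helper)."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = symmetric_uncertainty(features[i], features[j])
    return w


def tmfg_edges(w):
    """Greedy TMFG construction; returns the retained edges as (i, j) with i < j.

    Assumption: the seed tetrahedron is the four nodes with the largest
    total weight, a simplification of the seeding rule used in practice.
    """
    n = len(w)
    if n < 4:
        raise ValueError("TMFG needs at least 4 nodes")
    strength = [sum(row) for row in w]
    seed = sorted(range(n), key=lambda i: strength[i], reverse=True)[:4]
    edges = {tuple(sorted(pair)) for pair in combinations(seed, 2)}
    faces = list(combinations(seed, 3))  # triangular faces of the tetrahedron
    remaining = set(range(n)) - set(seed)
    while remaining:
        # Choose the (vertex, face) pair with the largest total weight
        # between the new vertex and the three corners of the face.
        v, face = max(
            ((v, f) for v in remaining for f in faces),
            key=lambda vf: sum(w[vf[0]][u] for u in vf[1]),
        )
        remaining.remove(v)
        faces.remove(face)
        a, b, c = face
        faces += [(a, b, v), (a, c, v), (b, c, v)]  # split the face in three
        edges |= {tuple(sorted((v, u))) for u in face}
    return edges
```

Calling `tmfg_edges(su_matrix(features))` on a list of discretized feature columns returns a planar graph with 3n - 6 edges for n features, which a community detection algorithm could then partition into feature clusters.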
Authors
崔梦天 (CUI Meng-tian)
陈建英 (CHEN Jian-ying)
徐智慧 (XU Zhi-hui)
School of Computer Science and Engineering, Southwest Minzu University, Chengdu 610041, China
Source
Journal of Southwest Minzu University (Natural Science Edition) (《西南民族大学学报(自然科学版)》)
CAS
2024, No. 4, pp. 418-427 (10 pages)
Funding
Sichuan Science and Technology Program (2023YFH0057, 2023YFN0026).
Keywords
software defect prediction
feature selection
topological graph
community detection algorithm
TMFG (Triangulated Maximally Filtered Graph)