Heterogeneous Defect Prediction Based on Simultaneous Semantic Alignment

Cited by: 2

Abstract: Heterogeneous defect prediction (HDP) performs defect prediction across projects with heterogeneous features, and so addresses the case where the source project and the target project use different features. It uses heterogeneous feature data from the source project to predict the defect proneness of software modules in the target project. HDP has made progress, but its overall performance remains unsatisfactory. Most existing HDP methods handle heterogeneous features by learning a domain-invariant feature subspace that reduces the discrepancy between domains. However, the source and target domains usually exhibit large heterogeneity, so domain alignment works poorly. The reason is that these methods ignore a piece of latent knowledge, namely that a classifier should produce similar classification probability distributions for the same class in both domains, and therefore fail to mine the intrinsic semantic information contained in the data. In addition, because collecting training data for newly launched or legacy projects relies on expert knowledge and is time-consuming, laborious, and error-prone, this work explores the feasibility of heterogeneous defect prediction based on a small number of labeled modules in the target project. To this end, it proposes a heterogeneous defect prediction method based on simultaneous semantic alignment (SHSSAN). On the one hand, SHSSAN exploits the implicit knowledge learned from the labeled source project to transfer correlations between classes, achieving implicit semantic information transfer. On the other hand, to learn semantic representations of unlabeled target data, it performs centroid matching using target pseudo-labels, achieving explicit semantic alignment. SHSSAN also addresses the class imbalance and linear inseparability problems common in heterogeneous defect datasets, and makes full use of the label information available in the target project. Experiments on public heterogeneous datasets covering 30 projects show that, compared with the strong baselines CTKCCA, CLSUP, MSMDA, KSETE, and CDAA, SHSSAN improves F-measure by 6.96%, 19.68%, 19.43%, 13.55%, and 9.32%, and AUC by 2.02%, 3.62%, 2.96%, 3.48%, and 2.47%, respectively.
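The abstract describes explicit semantic alignment as centroid matching driven by target pseudo-labels, but gives no formulas. The following is a minimal sketch of that general idea, not the authors' SHSSAN implementation. It assumes that source and target features have already been mapped into a shared latent space, that pseudo-labels are the argmax of the classifier's predicted probabilities, and that the alignment penalty is a squared Euclidean distance between same-class centroids. The function names `class_centroids` and `centroid_alignment_loss` are hypothetical.

```python
import numpy as np

def class_centroids(features, labels, num_classes):
    """Mean feature vector per class; a NaN row marks a class with no samples."""
    centroids = np.full((num_classes, features.shape[1]), np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centroids[c] = features[mask].mean(axis=0)
    return centroids

def centroid_alignment_loss(src_feat, src_labels, tgt_feat, tgt_probs, num_classes=2):
    """Sum of squared distances between same-class source and target centroids.

    Assumption: src_feat and tgt_feat live in the same latent space.
    """
    # Pseudo-labels for unlabeled target modules come from the classifier output.
    tgt_pseudo = tgt_probs.argmax(axis=1)
    src_c = class_centroids(src_feat, src_labels, num_classes)
    tgt_c = class_centroids(tgt_feat, tgt_pseudo, num_classes)
    sq_dist = ((src_c - tgt_c) ** 2).sum(axis=1)
    # A class missing in either domain yields NaN and contributes nothing.
    return float(np.nansum(sq_dist))

# Toy data: class 0 = non-defective, class 1 = defective.
src_feat = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0]])
src_labels = np.array([0, 0, 1])
tgt_feat = np.array([[1.0, 1.0], [4.0, 2.0]])
tgt_probs = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = centroid_alignment_loss(src_feat, src_labels, tgt_feat, tgt_probs)
```

In a trained model a term of this kind would typically be minimized together with the classification loss, so that target modules pseudo-labeled as defective are pulled toward the source defective centroid. How SHSSAN weights or schedules such a term is not stated in the abstract.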

Authors: LI Wei-Wei; CHEN Xiang; ZHANG Heng-Wei; HUANG Zhi-Qiu; JIA Xiu-Yi

Affiliations: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; School of Information Science and Technology, Nantong University, Nantong 226019, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

Source: Journal of Software (indexed in EI, CSCD, and the Peking University Core Journals list), 2023, Issue 6, pp. 2669-2689 (21 pages)

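Funding: National Key Research and Development Program of China (2018YFB1003900); National Natural Science Foundation of China (61906090, 62176123); Fundamental Research Funds for the Central Universities (30920021131).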

Keywords: heterogeneous defect prediction (HDP); semantic alignment; few sample data; class imbalance; linearly inseparable

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

