
Software Defect Prediction Method Based on Cost-Sensitive Active Learning

Cited by: 4
Abstract: Building software defect prediction datasets is expensive because the collected samples must be labeled. Active learning can help by selecting the most valuable samples, so that a dataset can be built quickly. However, it usually selects only the samples with the highest uncertainty for manual labeling and does not consider samples with low uncertainty. To further reduce labeling cost, this paper combines information entropy with relative entropy in a cost-sensitive hybrid active learning strategy. The strategy first applies entropy-based active learning and sends the samples with the highest information entropy to domain experts for manual labeling. The samples with the lowest information entropy are then analyzed a second time by a query committee and are pseudo-labeled if they satisfy a threshold. Empirical studies show that, for the same number of labeled samples, the strategy achieves a higher AUC than three other classical active learning strategies. The cost-sensitive active learning query strategy can therefore improve labeling efficiency and reduce labeling cost in software defect prediction.
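The abstract describes the hybrid query step only in prose. The following is a minimal Python sketch of one query round under assumptions that are ours, not the paper's. The names `hybrid_query`, `k_high`, `k_low` and `kl_threshold` are hypothetical. Committee disagreement is measured as the mean KL divergence of each member's class distribution from the committee's average distribution, and a low-entropy sample is assumed to be pseudo-labeled when that disagreement is at or below the threshold. Class index 0 is assumed to mean non-defective and 1 defective.

```python
import math
from typing import Dict, List, Sequence, Tuple

Probs = Sequence[float]  # class-probability vector for one sample (assumed [p_clean, p_defective])


def entropy(p: Probs) -> float:
    """Shannon entropy (natural log) of a class-probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p: Probs, q: Probs, eps: float = 1e-12) -> float:
    """Relative entropy D_KL(p || q); eps guards against log(0) when q has zeros."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def committee_disagreement(votes: List[Probs]) -> Tuple[float, List[float]]:
    """Mean KL divergence of each member from the consensus (average) distribution."""
    n_classes = len(votes[0])
    consensus = [sum(v[c] for v in votes) / len(votes) for c in range(n_classes)]
    score = sum(kl_divergence(v, consensus) for v in votes) / len(votes)
    return score, consensus


def hybrid_query(
    model_probs: Dict[str, Probs],
    committee_probs: Dict[str, List[Probs]],
    k_high: int,
    k_low: int,
    kl_threshold: float,
) -> Tuple[List[str], Dict[str, int]]:
    """One hypothetical query round: (ids sent to the expert, {id: pseudo-label})."""
    # Rank unlabeled samples by the current model's entropy, most uncertain first.
    ranked = sorted(model_probs, key=lambda i: entropy(model_probs[i]), reverse=True)
    to_expert = ranked[:k_high]
    # Lowest-entropy candidates, skipping any already sent to the expert.
    low_candidates = [i for i in ranked[::-1][:k_low] if i not in to_expert]
    pseudo: Dict[str, int] = {}
    for i in low_candidates:
        score, consensus = committee_disagreement(committee_probs[i])
        if score <= kl_threshold:  # assumed rule: committee agrees closely enough
            pseudo[i] = max(range(len(consensus)), key=consensus.__getitem__)
    return to_expert, pseudo


# Toy example with made-up probabilities (not data from the paper).
model_probs = {
    "m1": [0.5, 0.5],    # maximally uncertain
    "m2": [0.6, 0.4],
    "m3": [0.97, 0.03],  # confident
    "m4": [0.02, 0.98],  # confident
}
committee_probs = {
    "m3": [[0.95, 0.05], [0.97, 0.03], [0.96, 0.04]],  # members agree
    "m4": [[0.10, 0.90], [0.90, 0.10], [0.05, 0.95]],  # members disagree
}
expert, pseudo = hybrid_query(model_probs, committee_probs, k_high=1, k_low=2, kl_threshold=0.05)
print(expert, pseudo)  # ['m1'] {'m3': 0}
```

In this toy run the most uncertain sample goes to the expert. Of the two most confident samples, only the one whose committee members agree is pseudo-labeled. The other is left unlabeled rather than trusted.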
Authors: QU Yubin (曲豫宾); CHEN Xiang (陈翔) (College of Information Engineering, Jiangsu College of Engineering and Technology, Nantong 226007, China; School of Information Science and Technology, Nantong University, Nantong 226019, China)
Source: Journal of Nantong University (Natural Science Edition), CAS-indexed, 2019, No. 1, pp. 9-15 (7 pages)
Keywords: cost-sensitive; active learning; software defect prediction; information entropy; Kullback-Leibler divergence