Abstract
Static analysis tools can help developers locate potentially defective code early in development. However, studies show that such tools often report a large number of alerts, most of which are false positives. To improve the usability of static analysis tools, researchers typically use statistical and machine learning techniques to classify alerts as actionable or unactionable. Existing alert classification techniques, however, consider neither the class imbalance caused by the large number of false positives nor the unequal costs of different misclassifications. To address these problems, we apply BP neural networks and cost-sensitive neural networks based on over-sampling, threshold moving, and under-sampling to the classification of actionable alerts. Experimental comparison shows that, relative to BP neural networks, the cost-sensitive neural network techniques increase actionable-alert recall by 44.07% on average, and that when the cost of misclassifying an actionable alert exceeds a certain value, the cost-sensitive techniques achieve a lower classification cost.
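The threshold-moving idea named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy probabilities, and the cost values (4 for a missed actionable alert, 1 for a false alarm) are all assumptions chosen for illustration. The sketch assumes a trained classifier already outputs, for each alert, a probability of being actionable, and it moves the decision threshold from 0.5 to the point where the expected costs of the two decisions are equal.

```python
from typing import List


def cost_threshold(cost_fn: float, cost_fp: float) -> float:
    """Probability at or above which predicting 'actionable' has the lower expected cost.

    Predict actionable when (1 - p) * cost_fp <= p * cost_fn,
    i.e. when p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(probs: List[float], threshold: float) -> List[int]:
    """Label an alert 1 (actionable) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def recall(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of truly actionable alerts that were predicted actionable."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return true_pos / positives if positives else 0.0


def total_cost(y_true: List[int], y_pred: List[int],
               cost_fn: float, cost_fp: float) -> float:
    """Sum of misclassification costs over all alerts."""
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return false_neg * cost_fn + false_pos * cost_fp


# Hypothetical, imbalanced toy data: 3 actionable alerts among 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical network outputs (probability that each alert is actionable).
probs = [0.7, 0.4, 0.25, 0.45, 0.3, 0.15, 0.1, 0.1, 0.05, 0.05]

COST_FN, COST_FP = 4.0, 1.0  # assumed costs, not taken from the paper

baseline = classify(probs, 0.5)
moved = classify(probs, cost_threshold(COST_FN, COST_FP))

print("baseline:", recall(y_true, baseline), total_cost(y_true, baseline, COST_FN, COST_FP))
print("moved:   ", recall(y_true, moved), total_cost(y_true, moved, COST_FN, COST_FP))
```

On this toy data the lowered threshold recovers all actionable alerts at the price of two extra false alarms, and the total cost falls because a miss is assumed to be four times as expensive as a false alarm. With equal costs the threshold stays at 0.5 and the two classifiers coincide, which mirrors the abstract's remark that the cost-sensitive variants pay off only above a certain misclassification cost.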
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
PKU Core Journal (北大核心)
2017, No. 6, pp. 1097-1103 (7 pages)
Funding
National Natural Science Foundation of China (91118005)
Keywords
actionable alert
unactionable alert
cost sensitive
class imbalance
neural networks