Abstract
As computer technology has developed, software products have brought considerable convenience to individuals and businesses, yet much software still contains defects. To find and fix these defects, researchers have applied machine learning and related methods to software defect prediction, but the data preprocessing in these methods leaves considerable room for improvement. Earlier studies used Multi-Dimensional Scaling (MDS) to reduce the dimensionality of data samples, but little work addresses how to apply and improve MDS. This paper proposes Multi-Dimensional Scaling based on threshold correlation (TC_MDS): building on MDS, it uses Symmetrical Uncertainty (SU) to extract highly discriminative features and applies a correlation threshold to remove redundant ones. The resulting data are highly discriminative and free of redundant features, which improves prediction efficiency. Experiments on the NASA software engineering database show that the proposed method predicts defects well.
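The feature-selection idea in the abstract can be illustrated with a minimal sketch. It assumes discrete feature values, uses SU both for relevance to the defect label and as the pairwise correlation measure between features, and omits the MDS step entirely. The function names and both threshold values are hypothetical choices for illustration; the abstract does not specify the correlation measure or the thresholds used in TC_MDS.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(columns, labels, relevance_min=0.1, redundancy_max=0.8):
    """Return the names of the kept features, most relevant first.

    columns: dict mapping feature name -> list of discrete values.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    relevance = {
        name: symmetrical_uncertainty(col, labels)
        for name, col in columns.items()
    }
    # Keep only features that are sufficiently related to the label.
    ranked = sorted(
        (name for name in relevance if relevance[name] >= relevance_min),
        key=lambda name: relevance[name],
        reverse=True,
    )
    kept = []
    for name in ranked:
        # Drop a feature if it is too strongly correlated with one already kept.
        if all(
            symmetrical_uncertainty(columns[name], columns[k]) <= redundancy_max
            for k in kept
        ):
            kept.append(name)
    return kept
```

In this sketch a feature that duplicates (or exactly inverts) an already selected feature is discarded as redundant, and a feature that is independent of the label is discarded as irrelevant. Continuous software metrics would need to be discretized before SU can be computed this way.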
Source
《计算机技术与发展》
2017, No. 12, pp. 20-22, 27 (4 pages in total)
Computer Technology and Development
Funding
National Natural Science Foundation of China (61272273)