Integrated Classification Method of Imbalanced Data Based on Comprehensive Weight in Spark (Spark环境下基于综合权重的不平衡数据集成分类方法)

Cited by: 7
Abstract: Imbalanced data classification often suffers from severe class imbalance and low classification accuracy on minority-class samples, and as data size grows, classification efficiency also becomes a bottleneck. To address these problems, this paper draws on Spark's efficient data processing and proposes an ensemble classification method for imbalanced data based on comprehensive weight in the Spark environment. First, the method samples the majority-class data according to a comprehensive weight obtained from the weight of each class within the majority-class samples and the number of minority-class samples, and combines the sampled data with the minority-class samples to form a balanced training set. Second, it uses correlation-based feature selection to choose the optimal feature subset, improves and optimizes the random forest algorithm, and uses it to obtain the sub-classifiers. Finally, the method is validated experimentally on UCI data sets in the Spark environment. The experimental results show that the proposed method improves both overall classification accuracy and classification efficiency.
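The sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the comprehensive weight, so the code below assumes a simple stand-in: each majority class receives a quota proportional to its share of all majority samples, scaled so that the majority total roughly matches the minority-class size. The function name `balanced_undersample` and its parameters are hypothetical, and the sketch uses plain Python rather than Spark; it is not the authors' implementation.

```python
import random
from collections import defaultdict


def balanced_undersample(samples, labels, minority_label, seed=0):
    """Undersample majority classes to roughly match the minority class size.

    Assumption (not from the paper): the weight of a majority class is its
    share of all majority samples, and its quota is that weight times the
    number of minority samples.
    """
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    minority = by_class[minority_label]
    majority_total = sum(
        len(items) for label, items in by_class.items() if label != minority_label
    )

    # Start from all minority samples, then add a sampled quota per majority class.
    out_x = list(minority)
    out_y = [minority_label] * len(minority)
    for label, items in by_class.items():
        if label == minority_label:
            continue
        weight = len(items) / majority_total
        quota = min(len(items), max(1, round(weight * len(minority))))
        out_x.extend(rng.sample(items, quota))
        out_y.extend([label] * quota)
    return out_x, out_y
```

In a Spark setting, the per-class quotas computed this way could be turned into per-class sampling fractions for a stratified sampling call; the resulting balanced set would then feed the feature selection and random forest stages the abstract mentions.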
Authors: DING Jia-man (丁家满); WANG Si-chen (王思晨); JIA Lian-yin (贾连印); YOU Jin-guo (游进国); JIANG Ying (姜瑛) (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 2, pp. 255-259 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (51467007, 61562054, 61462050)
Keywords: imbalanced data classification; sample sampling; comprehensive weight; random forest; Spark