Abstract
Because used-car recommendation data sets are imbalanced, used-car recommendation can be treated as an imbalanced classification problem and addressed with methods developed for such problems. This paper studies data set reconstruction for imbalanced data classification. Based on an analysis of the characteristics and shortcomings of the Synthetic Minority Over-sampling Technique (SMOTE), we propose the Synthetic Minority Over-sampling Technique Filter (SmoteFilter), which filters the samples synthesized by SMOTE to reduce the noise among them and improve the quality of the training samples. Support vector machines trained on data synthesized by SMOTE and by SmoteFilter are compared experimentally. The results show that, relative to conventional SMOTE over-sampling, SmoteFilter improves prediction accuracy for the minority class and the overall prediction performance of used-car recommendation.
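The abstract names the two stages (SMOTE synthesis, then filtering of the synthetic samples) but does not state the filtering rule. The sketch below illustrates the general idea under explicit assumptions: `smote` follows the standard SMOTE interpolation between a minority sample and one of its k nearest minority neighbours, while `filter_synthetic` uses a hypothetical nearest-neighbour criterion (drop a synthetic sample if its closest original sample belongs to the majority class). That criterion, the function names, and the parameters are illustrative assumptions, not the rule used in the paper.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by standard SMOTE interpolation."""
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(rng)
    k = min(k, n - 1)
    # Pairwise distances among minority samples; a sample is not its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Pick a base sample, one of its k neighbours, and a random gap in [0, 1).
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


def filter_synthetic(X_syn, X_orig, y_orig, minority_label=1):
    """Hypothetical filter (an assumption, not the paper's rule): keep a
    synthetic sample only if its nearest original sample is minority class."""
    X_syn = np.asarray(X_syn, dtype=float)
    X_orig = np.asarray(X_orig, dtype=float)
    y_orig = np.asarray(y_orig)
    dist = np.linalg.norm(X_syn[:, None, :] - X_orig[None, :, :], axis=2)
    nearest = np.argmin(dist, axis=1)
    return X_syn[y_orig[nearest] == minority_label]
```

In a comparison like the one the abstract describes, the filtered samples would be appended to the original training set before fitting the classifier, and the same classifier would be fitted on the unfiltered SMOTE output as the baseline.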
Source
Computer and Modernization (《计算机与现代化》)
2016, No. 7, pp. 118-123 (6 pages)
Funding
National Natural Science Foundation of China (61403195)
Natural Science Foundation of Jiangsu Province (SBK2014042586)
Keywords
used-car recommendation
classification
imbalanced dataset
over-sampling
support vector machine