Missing value imputation with active learning based on collaborative filtering weighted prediction

Cited by: 2
Abstract In machine learning applications, missing value imputation is a preprocessing technique that improves data usability. However, when many values are missing or when the missing values are imbalanced across attributes, existing techniques often give unsatisfactory results. The active learning scenario allows the machine to interact with a user (also known as an oracle) to obtain a small amount of critical data and thereby improve classification accuracy. Most existing methods focus on obtaining class labels and rarely discuss obtaining missing values. This paper studies the active learning problem in which the number of missing values that can be actively obtained is pre-specified. We propose a missing value imputation algorithm called Collaborative Filtering weighted prediction based Active Learning (CFAL). First, user-based (sample-based) and item-based (attribute-based) collaborative filtering approaches are each used to predict the missing values. Second, the missing values are sorted by the difference between the two predictions. A small number of missing values with large differences are actively obtained, while a small number with small differences are filled with the average of the two predictions. This process repeats until the number of active acquisitions reaches the pre-specified limit; the remaining missing values are then filled with the average prediction. We compare CFAL with popular missing value imputation algorithms, including EBN (an imputation algorithm based on EM and Bayesian networks), Mean, NB (Naïve Bayes), and kNN (k Nearest Neighbors), on seven commonly used UCI (University of California, Irvine) datasets. Results show that, when coupled with classifiers such as C4.5 and kNN, CFAL yields higher classification accuracy than its counterparts.
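The loop described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the similarity measure, the weighting scheme, or the batch sizes, so the inverse-distance similarity in `cf_predict`, the `batch` parameter, the `oracle` callback, and the function names are all assumptions made for the example. It also assumes numeric attributes on a comparable scale (for instance, normalized to [0, 1]).

```python
import numpy as np


def cf_predict(X):
    """Predict each NaN cell from other rows, weighted by row similarity.

    Similarity is an assumed choice: 1 / (1 + mean absolute difference)
    over the columns both rows have observed.
    """
    n, m = X.shape
    obs = ~np.isnan(X)
    pred = np.full_like(X, np.nan)
    # Fallback when no usable neighbour exists: observed column mean (or 0).
    col_mean = np.array(
        [X[obs[:, j], j].mean() if obs[:, j].any() else 0.0 for j in range(m)]
    )
    for i, j in zip(*np.where(~obs)):
        num = den = 0.0
        for k in range(n):
            if k == i or not obs[k, j]:
                continue
            shared = obs[i] & obs[k]
            if not shared.any():
                continue
            w = 1.0 / (1.0 + np.abs(X[i, shared] - X[k, shared]).mean())
            num += w * X[k, j]
            den += w
        pred[i, j] = num / den if den > 0 else col_mean[j]
    return pred


def cfal_impute(X, oracle, budget, batch=1):
    """Fill NaNs in a copy of X, querying oracle(i, j) at most `budget` times."""
    X = np.array(X, dtype=float)  # work on a copy
    queried = 0
    while True:
        missing = [tuple(c) for c in np.argwhere(np.isnan(X))]
        if not missing:
            break
        p_sample = cf_predict(X)        # sample-based (row neighbours)
        p_attr = cf_predict(X.T).T      # attribute-based (column neighbours)
        avg = (p_sample + p_attr) / 2.0
        gap = np.abs(p_sample - p_attr)
        if queried >= budget:
            # Budget exhausted: fill everything left with the average prediction.
            mask = np.isnan(X)
            X[mask] = avg[mask]
            break
        order = sorted(missing, key=lambda c: gap[c])  # ascending disagreement
        k = min(batch, budget - queried, len(order))
        for c in order[len(order) - k:]:
            X[c] = oracle(*c)           # largest disagreement: ask the oracle
        queried += k
        for c in order[:min(batch, len(order) - k)]:
            X[c] = avg[c]               # smallest disagreement: trust the average
    return X, queried
```

In an experiment, `oracle` would typically look up the held-out true value, for example `lambda i, j: truth[i, j]`, and the completed matrix would then be passed to a classifier such as C4.5 or kNN.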
Source Journal of Nanjing University (Natural Science), 2018, No. 4, pp. 758-765 (8 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding National Natural Science Foundation of China (61379089)
Keywords missing data; collaborative filtering; predictive imputation; active learning; classification