Clustering-based Missing Value Filling Method for Continuous Data (cited by: 12)
Abstract In big data applications, most modeling methods assume a complete data set, but values are often lost during data acquisition or storage, which makes modeling impossible. To address this, a clustering-based recursive filling method is proposed. Incomplete records are first pre-filled with the mean of their cluster to form an initial complete data set. The resulting complete data are then clustered, and the initial fill values are corrected using the mean of the cluster each record belongs to. Filling stability is judged from the filling error, and the fill values are corrected through repeated recursive clustering until two consecutive fills are sufficiently stable or the number of iterations exceeds a threshold. Experimental results show that, compared with mean filling, K-nearest-neighbor filling, cluster filling, and rough-set-based incomplete data analysis, the method fills more accurately, so the final filled values are closer to the real data.
Authors LI Guohe; YANG Shaowei; WU Weijiang; ZHENG Yifeng (Beijing Key Lab of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing 102249, China; College of Geophysics and Information Engineering, China University of Petroleum (Beijing), Beijing 102249, China; Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou, Fujian 363000, China; School of Computer Sciences, Minnan Normal University, Zhangzhou, Fujian 363000, China)
Source Computer Engineering (CAS, CSCD, PKU Core), 2019, No. 9, pp. 32-39, 8 pages
Funding National Natural Science Foundation of China (61701213); National Oil and Gas Major Special Project, sub-topic (G-5800-08-ZS-WX); Research Start-up Fund of China University of Petroleum (Beijing), Karamay Campus (RCYJ2016B-03-001); Young and Middle-aged Teachers Fund of the Fujian Provincial Department of Education (JA15300)
Keywords missing value; prefilling; clustering; recursive filling; square error
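
The procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, the number of clusters, or the stopping threshold, so the code below assumes plain k-means, a squared-error change criterion, and at least `k` fully observed rows for the pre-fill step. The names `kmeans`, `recursive_cluster_fill`, `k`, `tol`, and `max_iter` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster is empty
                new[j] = X[labels == j].mean(axis=0)
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    # Recompute labels so they match the returned centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids


def recursive_cluster_fill(X, k=2, tol=1e-6, max_iter=20, seed=0):
    """Fill NaN entries of X by repeated clustering and cluster-mean correction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete_rows = ~missing.any(axis=1)

    # Pre-fill (assumption): cluster the fully observed rows, then give each
    # incomplete row the centroid values of its nearest cluster, where
    # "nearest" is measured on the observed columns only.
    _, centroids = kmeans(X[complete_rows], k, seed=seed)
    filled = X.copy()
    for i in np.where(~complete_rows)[0]:
        obs = ~missing[i]
        d = ((centroids[:, obs] - X[i, obs]) ** 2).sum(axis=1)
        filled[i, missing[i]] = centroids[d.argmin(), missing[i]]

    # Recursive correction: re-cluster the completed data and overwrite only
    # the originally missing entries with the mean of the row's cluster.
    for _ in range(max_iter):
        labels, centroids = kmeans(filled, k, seed=seed)
        corrected = filled.copy()
        corrected[missing] = centroids[labels][missing]
        change = ((corrected[missing] - filled[missing]) ** 2).sum()
        filled = corrected
        if change < tol:  # two consecutive fills are stable
            break
    return filled
```

Calling `recursive_cluster_fill(X, k=2)` on an array whose missing entries are `np.nan` returns a copy with those entries filled and all observed entries left untouched. Because each cluster mean is computed over the filled values as well as the observed ones, the loop behaves like a fixed-point iteration, which is one plausible reading of "recursive" in the abstract.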