Research on Block Imputation Algorithm for High Dimensional Correlation Missing Data

Cited by: 6
Abstract: This paper studies imputation for high-dimensional missing data with correlated variables and proposes a block imputation algorithm. The core idea is to take the relationships among variables into account during imputation: each missing value is filled using only the data correlated with it, which reduces the influence of uncorrelated data and improves imputation accuracy. The algorithm can also process missing data in parallel, which improves imputation efficiency and makes it well suited to high-dimensional missing data. For missing data whose block structure is unknown, a blocking algorithm based on k-means clustering is proposed. Extensive simulations and experiments on real data sets show that, for correlated data, the block imputation algorithm makes effective use of the correlation information and thereby improves imputation accuracy.
Authors: YANG Jie; YANG Hu; WANG Lubin; JIN Xin; GUO Hua; YU Liangliang (School of Information, Central University of Finance and Economics, Beijing 100081, China; Jingzhou Power Supply Company ICT Branch of State Grid Corporation, Jingzhou, Hubei 434000, China; Liaoning Power Supply Company ICT Branch of State Grid Corporation, Shenyang 110000, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2017, No. 10, pp. 1557-1569 (13 pages)
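Funding: Young Teachers Development Fund of Central University of Finance and Economics, No. QJJ1510; State Grid Science and Technology Department Project, No. SGTYHT/14-JS-188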
Keywords: high dimensional correlation data; missing data; block imputation algorithm
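
The idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how variables are represented for k-means or which imputation rule is applied inside a block, so two assumptions are made here: each variable is described by its vector of absolute pairwise correlations with all variables, and a missing cell is filled by averaging its nearest neighbours measured only on the other variables of the same block. All function names (`pearson`, `kmeans`, `block_variables`, `impute_block`, `block_impute`) and the toy data are hypothetical and are not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation over positions where both values are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels

def block_variables(rows, k):
    """Assumption: cluster variables by their absolute-correlation profiles."""
    cols = list(zip(*rows))
    profiles = [[abs(pearson(ci, cj)) for cj in cols] for ci in cols]
    return kmeans(profiles, k)

def impute_block(rows, block, n_neighbors=2):
    """Return {(row, col): value} for missing cells, using only columns in `block`."""
    fills = {}
    for i, row in enumerate(rows):
        for j in block:
            if row[j] is not None:
                continue
            candidates = []
            for other in rows:
                if other[j] is None:
                    continue
                shared = [c for c in block if c != j
                          and row[c] is not None and other[c] is not None]
                if shared:
                    d = sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                    candidates.append((d, other[j]))
            if candidates:
                candidates.sort(key=lambda t: t[0])
                nearest = [v for _, v in candidates[:n_neighbors]]
            else:
                # Fallback: no comparable row inside the block, use the column mean.
                nearest = [r[j] for r in rows if r[j] is not None]
            if nearest:
                fills[(i, j)] = sum(nearest) / len(nearest)
    return fills

def block_impute(rows, k, n_neighbors=2):
    """Block the variables, then impute each block independently."""
    labels = block_variables(rows, k)
    result = [list(r) for r in rows]
    for b in range(k):  # blocks do not interact, so this loop could run in parallel
        block = [j for j, lab in enumerate(labels) if lab == b]
        for (i, j), value in impute_block(rows, block, n_neighbors).items():
            result[i][j] = value
    return result

# Toy data: columns 0 and 1 are linearly related, as are columns 2 and 3.
rows = [
    [1, 2, 5, 5],
    [2, 4, 1, 9],
    [3, None, 4, 6],
    [4, 8, 2, None],
    [5, 10, 6, 4],
    [6, 12, 3, 7],
]
labels = block_variables(rows, 2)
filled = block_impute(rows, 2)
```

Because each block is imputed from the original data without reading the results of other blocks, the per-block loop in `block_impute` could be handed to a process pool, which is one way to read the abstract's remark about parallel processing.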
