
大数据分析下不完备数据多重准确填补仿真 (cited by: 3)

Incomplete Data Multiple Precision Filling Simulation under Big Data Analysis
Abstract: Imputing incomplete data in big data analysis can effectively improve data utilization. Multiple accurate imputation of incomplete data requires computing the mean and standard deviation of the vector attributes of all data and applying them repeatedly in the imputation simulation. Traditional methods take the relationships among the imputation variables into account and fill missing values according to their correlation with the data to be imputed, but they neglect to compute the standard deviation of all data, which results in low imputation efficiency. This paper proposes a logistic-based method for multiple accurate imputation of incomplete data in big data analysis. First, the mean and standard deviation of all data vector attribute values are calculated, and the mean vector and covariance function of the data are obtained by estimation. The missing values of each observed object are then filled independently by simulation, and a logistic regression model is used to select the imputed values required for each variable with missing values, yielding a complete data set. Next, the mean vector and covariance function are re-estimated and applied again in the imputation simulation. This process is iterated until the iteration condition is met, and the multiple imputation results for the incomplete data are output. Experiments show that the proposed method achieves high imputation efficiency and can lay a foundation for further research in this field.
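The estimate, fill, re-estimate loop described in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm: it uses only per-attribute means and standard deviations, draws each missing value from a normal distribution, and stops after a fixed number of iterations. The covariance estimate and the logistic regression selection step are omitted because the abstract does not specify them. All function names, parameters (`m`, `n_iter`, `seed`), and the use of `None` to mark missing values are assumptions made for illustration, and every attribute is assumed to have at least one observed value.

```python
import random
import statistics


def estimate(columns):
    """Return a (mean, std) pair for each attribute, skipping None entries."""
    stats = []
    for col in columns:
        observed = [v for v in col if v is not None]
        stats.append((statistics.fmean(observed), statistics.pstdev(observed)))
    return stats


def impute_once(rows, n_iter=10, seed=0):
    """Produce one completed copy of `rows` by iterated stochastic filling."""
    rng = random.Random(seed)
    n_attr = len(rows[0])
    # Positions of the missing cells; only these are ever overwritten.
    missing = [(i, j) for i, row in enumerate(rows)
               for j, v in enumerate(row) if v is None]
    # Initial estimates come from the observed values only.
    stats = estimate([[row[j] for row in rows] for j in range(n_attr)])
    filled = [list(row) for row in rows]
    for _ in range(n_iter):  # fixed count stands in for the stopping condition
        # Fill each missing cell independently from its attribute's distribution.
        for i, j in missing:
            mean, std = stats[j]
            filled[i][j] = rng.gauss(mean, std)
        # Re-estimate mean and std from the completed data, then repeat.
        stats = estimate([[row[j] for row in filled] for j in range(n_attr)])
    return filled


def multiple_impute(rows, m=5, n_iter=10, seed=0):
    """Return m independently completed copies of the data set."""
    return [impute_once(rows, n_iter, seed + k) for k in range(m)]


if __name__ == "__main__":
    data = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]
    for k, completed in enumerate(multiple_impute(data, m=3)):
        print(k, completed)
```

Each of the `m` completed copies would normally be analysed separately and the results pooled; a fuller version would condition each draw on the other attributes through the estimated covariance rather than treating attributes independently.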
Authors: WANG Li-wen (王丽雯); HUANG Xu (黄旭) (Xi'an University of Science and Technology, Xi'an, Shaanxi 710054, China)
Affiliation: Xi'an University of Science and Technology (西安科技大学)
Source: Computer Simulation (《计算机仿真》), Peking University Core Journal, 2019, No. 7, pp. 367-370 (4 pages)
Keywords: big data analysis; incomplete data; multiple imputation