Abstract
Zeros or near-zero (rounded) values in compositional data produce missing values once the log-ratio transformation is applied. To estimate these missing values, a modified EM algorithm based on mean imputation is proposed. The method first draws repeated Bootstrap samples from each column that contains missing data, then uses the mean of the resampled values as the initial value of the EM algorithm to estimate the missing data, and finally substitutes the estimates into the inverse of the log-ratio transformation to obtain approximate estimates of the zeros in the original compositional data. The experimental results show that, within the allowable error, the modified EM algorithm based on mean imputation requires little computation and is simple to carry out, so it is a good imputation method for large data sets or for data with a high missing rate.
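The three steps named in the abstract can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation. The abstract does not say which log-ratio transformation is used, what the "modification" to EM consists of, or how many Bootstrap replicates are drawn. The code therefore assumes an additive log-ratio (alr) transform with a strictly positive last part, a standard multivariate-normal EM, and 200 replicates. All function names (`alr`, `alr_inv`, `bootstrap_mean`, `em_impute`) are hypothetical.

```python
import numpy as np

def alr(X):
    """Additive log-ratio transform (assumed here); the last part is the divisor.
    Zeros in the other parts give log(0) = -inf, which is marked as missing (NaN)."""
    with np.errstate(divide="ignore"):
        Y = np.log(X[:, :-1]) - np.log(X[:, -1:])
    Y[~np.isfinite(Y)] = np.nan
    return Y

def alr_inv(Y):
    """Inverse alr: append a zero column, exponentiate, and close rows to sum 1."""
    E = np.exp(np.hstack([Y, np.zeros((Y.shape[0], 1))]))
    return E / E.sum(axis=1, keepdims=True)

def bootstrap_mean(col, n_boot, rng):
    """Mean over n_boot Bootstrap resamples of the observed entries of one column."""
    obs = col[~np.isnan(col)]
    draws = rng.choice(obs, size=(n_boot, obs.size), replace=True)
    return draws.mean()

def em_impute(Y, n_boot=200, max_iter=100, tol=1e-8, seed=0):
    """Standard multivariate-normal EM imputation, started from Bootstrap means.
    n_boot, max_iter, tol and seed are illustrative choices, not from the paper."""
    rng = np.random.default_rng(seed)
    Y = Y.copy()
    miss = np.isnan(Y)
    n, p = Y.shape
    # Step 1: initialise each missing entry with the Bootstrap mean of its column.
    for j in range(p):
        if miss[:, j].any():
            Y[miss[:, j], j] = bootstrap_mean(Y[:, j], n_boot, rng)
    mu = Y.mean(axis=0)
    sigma = np.cov(Y, rowvar=False, bias=True).reshape(p, p)
    # Step 2: EM iterations.
    for _ in range(max_iter):
        Y_old = Y.copy()
        C = np.zeros((p, p))  # accumulated conditional covariance of missing parts
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # whole row missing: fall back to the marginal moments
                Y[i] = mu
                C += sigma
                continue
            # E-step: conditional mean of the missing entries given the observed ones.
            B = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
            Y[i, m] = mu[m] + B @ (Y[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
        # M-step: update the mean and covariance from the completed data.
        mu = Y.mean(axis=0)
        D = Y - mu
        sigma = (D.T @ D + C) / n
        if np.max(np.abs(Y - Y_old)) < tol:
            break
    return Y
```

Step 3 is then `alr_inv(em_impute(alr(X)))`, where `X` is an array whose rows are compositions. The result replaces each zero with a small positive estimate and rescales the other parts of that row so the row still sums to 1. Rows without zeros are returned unchanged.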
Source
《中北大学学报(自然科学版)》
CAS
Peking University Core Journals (北大核心)
2013, No. 5, pp. 485-487, 499 (4 pages)
Journal of North University of China (Natural Science Edition)
Funding
Key Program of the National Natural Science Foundation of China (71031006, 41101440)
Shanxi Province Youth Science and Technology Fund (201202105-1)