Abstract
Datasets with missing values are common in real applications, and handling missing values has become an active research topic in classification. This paper analyzes and compares several widely used missing-value imputation algorithms and proposes a new imputation algorithm based on EM (Expectation Maximization) and Bayesian networks. The algorithm uses naive Bayes to estimate the initial values for EM, then combines EM with Bayesian network learning in an iterative procedure that determines the final updater and, at the same time, yields the completed dataset. Experimental results show that, compared with classical imputation algorithms, the proposed algorithm achieves higher classification accuracy and substantially reduces overhead.
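The abstract gives only the outline of the method, so the following is a rough sketch of the general idea and not the paper's algorithm. It assumes categorical attributes, a known class label for every row, and a fixed naive Bayes structure (class to each attribute) standing in for a learned Bayesian network. It also uses hard assignments (the most frequent value) where a full EM would use expected counts. The names `MISSING`, `fit_tables`, `best_value`, and `impute` are hypothetical.

```python
from collections import Counter, defaultdict

MISSING = None  # hypothetical marker for a missing attribute value


def fit_tables(rows, labels):
    """Count attribute values per (column, class) and per column, skipping holes."""
    by_class = defaultdict(Counter)  # (column, label) -> value counts
    overall = defaultdict(Counter)   # column -> value counts
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not MISSING:
                by_class[(j, label)][value] += 1
                overall[j][value] += 1
    return by_class, overall


def best_value(by_class, overall, column, label):
    """Most frequent value for this column given the class, else for the column."""
    counts = by_class.get((column, label)) or overall.get(column)
    if not counts:
        return MISSING  # nothing observed in this column at all
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(counts), key=counts.get)


def impute(rows, labels, max_iter=10):
    """Fill missing cells, then re-estimate and re-fill until nothing changes."""
    holes = [(i, j) for i, row in enumerate(rows)
             for j, v in enumerate(row) if v is MISSING]
    filled = [list(row) for row in rows]  # leave the input untouched
    for _ in range(max_iter):
        # First pass: tables come from observed cells only (the initial estimate).
        # Later passes: tables come from the completed data.
        by_class, overall = fit_tables(filled, labels)
        changed = False
        for i, j in holes:
            guess = best_value(by_class, overall, j, labels[i])
            if guess != filled[i][j]:
                filled[i][j] = guess
                changed = True
        if not changed:
            break
    return filled
```

With a fixed naive Bayes structure this loop settles after very few passes. The procedure described in the abstract also updates the Bayesian network during the iteration, which this sketch does not attempt.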
Source
《计算机工程与应用》
CSCD
Peking University Core Journals
2010, Issue 5, pp. 123-125 (3 pages)
Computer Engineering and Applications
Funding
National Science Fund for Distinguished Young Scholars, No. 60425310