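An Accelerated EM Algorithm for Missing Data Filling Based on Classification (基于分类的加速EM缺失数据填充算法) Cited by: 1

Published English title: Accelerating EM Missing Data Filling Algorithm Based on the Clustering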
Abstract: Throughout the data mining process, the EM algorithm is widely used to handle incomplete data because of its numerical stability, simplicity of implementation, and reliable global convergence. Its main drawbacks are slow convergence and a strong dependence on the choice of initial values. To address both, this paper uses the classification results of the KNN algorithm as the initial scope for the EM algorithm, with KNN selecting different features according to the mining objective. An incremental EM (IEM) algorithm then alternates E-steps and M-steps to refine the estimates iteratively, obtaining the optimal fill values for the missing data quickly and efficiently. Experimental results show that the algorithm greatly speeds up convergence, improves the stability of the clustering, and fills missing data effectively.
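The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea (a KNN-based starting point followed by incremental EM refinement), not the authors' implementation. It assumes a two-column data set in which `x` is always observed and `y` may be missing, models the pair as a bivariate Gaussian, and substitutes a simple nearest-neighbour average for the paper's KNN classification step. The function names `knn_initial_fill` and `incremental_em_impute` and all parameters are hypothetical.

```python
def knn_initial_fill(rows, k=2):
    """Fill each missing y with the mean y of the k observed rows nearest in x.

    Assumption: this neighbour average stands in for the paper's KNN
    classification step, whose details the abstract does not give.
    """
    observed = [(x, y) for x, y in rows if y is not None]
    filled = []
    for x, y in rows:
        if y is None:
            nearest = sorted(observed, key=lambda r: abs(r[0] - x))[:k]
            y = sum(r[1] for r in nearest) / len(nearest)
        filled.append(y)
    return filled


def incremental_em_impute(rows, k=2, max_sweeps=500, tol=1e-10):
    """Refine the KNN fill with incremental EM under a bivariate Gaussian model.

    Each missing row is visited in turn: parameters are re-estimated from the
    running sufficient statistics (M-step), then only that row's expected
    statistics are refreshed (partial E-step).
    Assumption: x takes at least two distinct values, so its variance is > 0.
    """
    n = len(rows)
    xs = [x for x, _ in rows]
    missing = [i for i, (_, y) in enumerate(rows) if y is None]

    ey = knn_initial_fill(rows, k)   # E[y_i]; exact for observed rows
    ey2 = [v * v for v in ey]        # E[y_i^2]; initial conditional variance taken as 0

    # Running sufficient statistics.
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ey)
    sxy = sum(x * v for x, v in zip(xs, ey))
    syy = sum(ey2)

    for _ in range(max_sweeps):
        biggest_change = 0.0
        for i in missing:
            # M-step: Gaussian parameters from the current totals.
            mx, my = sx / n, sy / n
            vxx = sxx / n - mx * mx
            vxy = sxy / n - mx * my
            vyy = syy / n - my * my
            beta = vxy / vxx
            cond_var = max(0.0, vyy - beta * vxy)

            # Partial E-step: expected statistics of row i given its x.
            new_ey = my + beta * (xs[i] - mx)
            new_ey2 = new_ey * new_ey + cond_var

            # Swap the row's old contribution for the new one.
            sy += new_ey - ey[i]
            sxy += xs[i] * (new_ey - ey[i])
            syy += new_ey2 - ey2[i]
            biggest_change = max(biggest_change, abs(new_ey - ey[i]))
            ey[i], ey2[i] = new_ey, new_ey2
        if biggest_change < tol:
            break
    return ey


# Toy data: observed rows follow y = 2x + 1; two y values are missing.
rows = [(1, 3), (2, 5), (3, None), (4, 9), (5, 11), (6, None)]
print(knn_initial_fill(rows))
print(incremental_em_impute(rows))
```

On this toy data the KNN start is already close for the interior point but underestimates the value at `x = 6`, and the EM sweeps pull it toward the line implied by the complete rows. Starting from neighbour-based values rather than an arbitrary guess reflects the abstract's motivation of reducing EM's sensitivity to initialization.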
Authors: SUN Huayan (孙华艳); LI Yeli (李业丽); ZI Yunfei (字云飞); HAN Xu (韩旭); GUAN Xinxin (管欣鑫); ZHOU Chufeng (周楚风) (School of Information Engineering, Beijing Institute of Graphic Communication, Beijing 102600, China)
Source: Journal of Beijing Institute of Graphic Communication (《北京印刷学院学报》), 2018, No. 9, pp. 98-102 (5 pages)
Funding: Beijing Municipal Science and Technology Innovation Service Capacity Coordinated Innovation Project (PXM2016_014223_000025)
Keywords: KNN classification; EM algorithm; incremental EM algorithm; convergence speed; stable clustering; missing data filling