
Mahalanobis-Based Algorithm for Imputing Missing Data (基于马氏距离的缺失数据填充算法) Cited by: 6
Abstract: The nearest neighbor (NN) algorithm is widely used in both scientific research and practical applications because it is simple to apply and performs well. This paper first explains the shortcomings of the Euclidean-based NN algorithm in computing the distance between two records, then proposes a Mahalanobis-based NN algorithm in which the Mahalanobis distance replaces the Euclidean distance. Experimental results on real datasets show that the improved method outperforms the original one.
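Affiliation: Qinzhou University (钦州学院)
Source: 《微计算机信息》 (Control & Automation), 2010, No. 9, pp. 225-226, 215 (3 pages in total)
Funding: Applicant: Liu Xingyi (刘星毅); project: Research on imputing missing data in industrial datasets; funder: Guangxi Department of Science and Technology (桂科自0899018). Applicant: Liu Xingyi (刘星毅); project: Research on missing data in social surveys; funder: Guangxi Department of Education (200808MS062).
Keywords: nearest neighbor algorithm; missing data imputation; Mahalanobis distance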

References (6)

  • 1. Athitsos, V., et al. (2008). Nearest neighbor retrieval using distance-based hashing [C]. ICDE, pp. 327-336.
  • 2. Cover, T.M. and Hart, P.E. (1967). Nearest neighbor pattern classification [J]. IEEE Transactions on Information Theory, Vol. 13, No. 1, pp. 21-27.
  • 3. Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm [J]. Journal of the Royal Statistical Society, Series B, Vol. 39, pp. 1-38.
  • 4. Han, J. and Kamber, M. (2006). Data Mining: Concepts and Techniques (2nd edition) [M]. Morgan Kaufmann.
  • 5. Little, R. and Rubin, D. (2002). Statistical Analysis with Missing Data [M]. Wiley.
  • 6. Liu Xingyi (刘星毅). GBNN: an algorithm for imputing missing attribute values [J]. 《微计算机信息》, 2007(05X): 246-248. Cited by: 6

Second-level references (2)

Co-citing documents (5)

Co-cited documents (35)

  • 1. Jin Lian (金连), Wang Hongzhi (王宏志), Huang Shenbin (黄沈滨), Gao Hong (高宏). A Map-Reduce-based algorithm for imputing missing values in big data [J]. 《计算机研究与发展》, 2013, 50(S1): 312-321. Cited by: 18
  • 2. Yang Tao (杨涛), Luo Jiawei (骆嘉伟), Wang Yan (王艳), Wu Junhao (吴君浩). A missing-value imputation algorithm based on Mahalanobis distance [J]. 《计算机应用》, 2005, 25(12): 2868-2871. Cited by: 24
  • 3. Liu Xingyi (刘星毅), Nong Guocai (农国才). A comparison of several missing-value imputation methods [J]. 《南宁师范高等专科学校学报》, 2007, 24(3): 148-150. Cited by: 8
  • 4. Lamis Hawarah, Ana Simonet, Michel Simonet (TIMC-IMAG). Dealing with missing values in a probabilistic decision tree during classification [C]// Proceedings of the 6th IEEE International Conference on Data Mining Workshops. Washington, DC, USA: IEEE, 2006: 325-329.
  • 5. Zhang S C. "Missing is useful": missing values in cost-sensitive decision trees [J]. IEEE Transactions on Knowledge and Data Engineering, 2008, 9(1): 32-38.
  • 6. Jinlong Shi, Zhigang Luo. Missing value estimation for DNA microarray gene expression data with principal curves [C]// 2010 International Conference on Bioinformatics and Biomedical Technology. Haesun Park: IEEE, 2010: 262-265.
  • 7. Xiaofeng Zhu, Shichao Zhang. Missing value estimation for mixed-attribute data sets [J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(1): 110-121.
  • 8. Little R, Rubin D. Statistical Analysis with Missing Data [M]. 2nd ed. New York: John Wiley and Sons, 2002.
  • 9. Huang C C, Lee H M. A grey-based nearest neighbor approach for missing attribute value prediction [J]. Applied Intelligence, 2004, 20(3): 239-252.
  • 10. Lakshminarayan K, Harp S A, Samad T. Imputation of missing data in industrial databases [J]. Applied Intelligence, 1999, 11(3): 259-275.

Citing documents (6)

Second-level citing documents (42)
