Missing-value imputation algorithm based on Mahalanobis distance and gray analysis (基于马氏距离和灰色分析的缺失值填充算法). Cited by: 6

Improved kNN algorithm based on Mahalanobis distance and gray analysis
Abstract: The Euclidean distance used in the k-nearest-neighbor (kNN) algorithm is sensitive to density correlation in the data, so Euclidean-based kNN works well only on datasets without such correlation. This paper proposes a new algorithm that replaces the Euclidean distance in kNN with a combination of Mahalanobis distance and gray analysis, and applies it to missing-data imputation. Mahalanobis distance handles datasets in which density correlation is pronounced, and gray analysis handles datasets in which it is not. The author therefore argues that the algorithm can handle any kind of dataset, and reports experimental results in which its imputation results are clearly better than those of other existing algorithms.
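The two building blocks named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not say how the two measures are combined or how neighbors are weighted, so the functions below are kept separate. The function names, the use of complete rows as the donor pool, the unweighted mean of the k neighbors, and the distinguishing coefficient `rho = 0.5` are all assumptions made for this example.

```python
import numpy as np


def mahalanobis_knn_impute(X, k=3):
    """Fill NaN entries using the k nearest complete rows (Mahalanobis distance)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Donor pool: rows with no missing values (assumed to contain at least two rows).
    donors = X[~np.isnan(X).any(axis=1)]
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        if not observed.any():
            # Nothing to compare on: fall back to the donor column means.
            filled[i] = donors.mean(axis=0)
            continue
        sub = donors[:, observed]
        # Covariance of the observed attributes; pinv tolerates a singular matrix.
        cov = np.atleast_2d(np.cov(sub, rowvar=False))
        cov_inv = np.linalg.pinv(cov)
        diff = sub - row[observed]
        # Squared Mahalanobis distance from the incomplete row to every donor.
        d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
        nearest = np.argsort(d2)[:k]
        # Assumption: impute with the unweighted mean of the k nearest donors.
        filled[i, missing] = donors[nearest][:, missing].mean(axis=0)
    return filled


def grey_relational_grade(ref, candidates, rho=0.5):
    """Grey relational grade of each candidate row with respect to ref (higher = more similar)."""
    candidates = np.asarray(candidates, dtype=float)
    delta = np.abs(candidates - np.asarray(ref, dtype=float))
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        # Every candidate is identical to the reference.
        return np.ones(len(candidates))
    # Grey relational coefficient per attribute, averaged over attributes.
    coef = (d_min + rho * d_max) / (delta + rho * d_max)
    return coef.mean(axis=1)


# Illustrative data only: the last row has a missing third attribute.
X = np.array([
    [1.0, 2.1, 10.0],
    [2.0, 3.9, 20.0],
    [3.0, 6.2, 30.0],
    [10.0, 19.8, 100.0],
    [1.2, 2.3, np.nan],
])
completed = mahalanobis_knn_impute(X, k=2)
grades = grey_relational_grade(X[4, :2], X[:4, :2])
```

A kNN imputer built on gray analysis would rank donors by descending grade instead of ascending distance. Which measure to use for a given dataset, or how to blend them, is the part the abstract leaves unspecified.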
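Author: Liu Xingyi (刘星毅)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list; 2009, No. 9, pp. 2502-2504, 2536 (4 pages)
Funding: Guangxi Natural Science Foundation project (桂科自0899018); Guangxi Department of Education research project (200808MS062)
Keywords: data preprocessing; missing data; Nearest Neighbor (NN) algorithm; gray analysis; Mahalanobis distance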

References (11)

  • [1] COVER T M, HART P E. Nearest neighbor pattern classification [J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
  • [2] HAN J, KAMBER M. Data mining: concepts and techniques [M]. 2nd ed. San Francisco: Morgan Kaufmann Publishers, 2006.
  • [3] SCHAFER J, GRAHAM J. Missing data: our view of the state of the art [J]. Psychological Methods, 2002, 7(2): 147-177.
  • [4] LAKSHMINARAYAN K, HARP S A, SAMAD T. Imputation of missing data in industrial databases [J]. Applied Intelligence, 1999, 11(3): 259-275.
  • [5] LITTLE R, RUBIN D. Statistical analysis with missing data [M]. 2nd ed. New York: John Wiley and Sons, 2002.
  • [6] HUANG C C, LEE H M. A grey-based nearest neighbor approach for missing attribute value prediction [J]. Applied Intelligence, 2004, 20(3): 239-252.
  • [7] DENG J L (邓聚龙). Grey system theory [M]. Wuhan: Huazhong Institute of Technology Press, 1984: 1-30.
  • [8] YANG T, LUO J W, WANG Y, WU J H (杨涛, 骆嘉伟, 王艳, 吴君浩). Missing-value imputation algorithm based on Mahalanobis distance [J]. Journal of Computer Applications, 2005, 25(12): 2868-2871. Cited by: 24
  • [9] SPELLMAN P T, SHERLOCK G, ZHANG M Q, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization [J]. Molecular Biology of the Cell, 1998, 9(12): 3273-3297.
  • [10] DERISI J L, IYER V R, BROWN P O. Exploring the metabolic and genetic control of gene expression on a genomic scale [J]. Science, 1997, 278(5338): 680-686.
