Abstract
A missing-value imputation algorithm based on the Mahalanobis distance is proposed for estimating missing data in gene expression datasets. The algorithm selects the nearest-neighbor genes by the Mahalanobis distance between genes and reuses values that have already been estimated in subsequent estimation steps. It then computes weighting coefficients for the nearest neighbors using the information-theoretic concept of entropy and combines them to obtain the imputed value. Experimental results show that the algorithm is effective and that it outperforms other nearest-neighbor-based methods for handling missing values in gene expression data.
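The abstract names three ingredients: neighbor selection by Mahalanobis distance, reuse of already imputed values, and weighted averaging over the neighbors. The sketch below illustrates the first two under stated assumptions and is not the authors' implementation. The function names (`covariance`, `inverse`, `mahalanobis`, `impute`), the parameter `k`, and the toy matrix are hypothetical. The abstract does not specify the entropy-based weighting, so this sketch substitutes plain inverse-distance weights. It also assumes that the pool of complete genes is larger than the number of observed columns, so that the covariance matrix is invertible.

```python
import math

def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(m)] for a in range(m)]

def inverse(mat):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(mat)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(m)]
           for i, row in enumerate(mat)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def mahalanobis(x, y, inv_cov):
    """sqrt((x - y)^T S^{-1} (x - y)) for an inverse covariance matrix S^{-1}."""
    d = [a - b for a, b in zip(x, y)]
    q = sum(d[i] * inv_cov[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(q, 0.0))

def impute(data, k=2):
    """Fill None entries gene by gene; each completed gene joins the neighbor pool."""
    data = [list(row) for row in data]           # work on a copy of the input
    pool = [row for row in data if None not in row]
    for row in data:
        if None not in row:
            continue
        obs = [j for j, v in enumerate(row) if v is not None]
        inv_cov = inverse(covariance([[g[j] for j in obs] for g in pool]))
        target = [row[j] for j in obs]
        scored = sorted(((mahalanobis(target, [g[j] for j in obs], inv_cov), g)
                         for g in pool), key=lambda t: t[0])
        nearest = scored[:k]
        # Assumption: inverse-distance weights stand in for the entropy weights.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
        total = sum(weights)
        for j, v in enumerate(row):
            if v is None:
                row[j] = sum(w * g[j] for w, (_, g) in zip(weights, nearest)) / total
        pool.append(row)                         # reuse the estimate for later genes
    return data

# Toy matrix (hypothetical values): rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 0.5, 3.0],
    [2.0, 1.0, 1.5, 2.0],
    [0.5, 3.0, 2.5, 1.0],
    [3.0, 0.5, 1.0, 2.5],
    [1.5, 2.5, 3.0, 0.5],
    [1.0, 2.0, None, 3.0],
]
filled = impute(expr, k=2)
```

In this toy matrix the last gene matches the first gene on every observed sample, so the first gene dominates the weighted average. Appending each completed gene to `pool` is what lets earlier estimates influence later ones, which is the reuse step the abstract describes.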
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2005, No. 12, pp. 2868-2871 (4 pages)
Journal of Computer Applications
Funding
Natural Science Foundation of Hunan Province (03JJY3095)
Keywords
microarray
missing value estimation
Mahalanobis distance
entropy