Abstract
Gene expression data frequently contain missing values, which hinders downstream analysis. This paper proposes a new similarity metric, the reduced relational grade, and, based on it, an iterative imputation algorithm for missing gene expression data (RKNNimpute). The reduced relational grade is an improvement on the gray relational grade: it achieves the same effect while significantly reducing the time complexity. RKNNimpute uses the reduced relational grade as its similarity metric, adds each imputed gene to the candidate set from which nearest neighbors are drawn, and fills the remaining missing values iteratively, which improves both the imputation quality and the performance of the algorithm. Extensive experiments on different kinds of gene expression data sets (time-series, non-time-series, and mixed) were carried out to evaluate RKNNimpute. The results show that the reduced relational grade is an efficient distance metric and that RKNNimpute outperforms common imputation algorithms.
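The abstract describes the imputation loop but does not give the formula for the reduced relational grade. The sketch below therefore substitutes Deng's standard gray relational grade (distinguishing coefficient `rho = 0.5`), the measure the paper says it improves on, purely to illustrate the idea of growing the neighbor candidate set with already-imputed genes. The function names, the "fewest missing values first" ordering, the neighbor count `k`, and the grade-weighted average are all assumptions for illustration, not the authors' RKNNimpute.

```python
import numpy as np


def gray_relational_grades(target, candidates, rho=0.5):
    """Deng's gray relational grade of each candidate row w.r.t. target.

    Stand-in for the paper's reduced relational grade (formula not given).
    target: shape (m,), candidates: shape (c, m); neither may contain NaN.
    """
    delta = np.abs(candidates - target)        # (c, m) absolute differences
    d_min, d_max = delta.min(), delta.max()    # extremes over all candidates
    if d_max == 0.0:                           # every candidate equals target
        return np.ones(len(candidates))
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)                  # grade = mean coefficient


def iterative_knn_impute(data, k=3, rho=0.5):
    """Hypothetical sketch: rows are genes, columns are samples, NaN = missing."""
    X = np.array(data, dtype=float)            # work on a copy
    missing = np.isnan(X)
    complete = [i for i in range(X.shape[0]) if not missing[i].any()]
    if not complete:
        raise ValueError("need at least one gene without missing values")
    # Assumed ordering: impute genes with the fewest missing values first.
    incomplete = sorted((i for i in range(X.shape[0]) if missing[i].any()),
                        key=lambda i: missing[i].sum())
    for g in incomplete:
        cand = np.array(complete)
        obs, miss = ~missing[g], missing[g]
        if not obs.any():                      # nothing observed: use candidate mean
            X[g] = X[cand].mean(axis=0)
        else:
            grades = gray_relational_grades(X[g, obs], X[cand][:, obs], rho)
            top = np.argsort(grades)[::-1][:k]     # k most similar candidates
            nbrs, w = cand[top], grades[top]
            # Grade-weighted average of the neighbors at the missing columns.
            X[g, miss] = w @ X[nbrs][:, miss] / w.sum()
        complete.append(g)                     # imputed gene becomes a candidate
    return X
```

For example, with complete genes `[1, 2, 3, 4]` and `[2, 4, 6, 8]`, the gene `[1, 2, 3, NaN]` is closest to the first one, so with `k=1` its missing entry is filled with `4.0`; that imputed gene can then serve as a neighbor for genes processed later.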
Source
Computer Science (计算机科学)
CSCD
Peking University Core Journal
2015, No. 11, pp. 251-255, 283 (6 pages)
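Funding
National Natural Science Foundation of China (U1433116)
Jiangsu Province "333" High-Level Talent Training Project
Aeronautical Science Foundation of China (20145752033)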
Keywords
Gene expression data
Reduced relational grade
Imputation
Iteration
Missing value