Abstract
Large-scale gene expression profiles provide more reliable and detailed biological data for tumor diagnosis, but selecting the relevant genes is the key to analyzing these data. This paper studies the selection of tumor-related genes from the viewpoint of Kullback-Leibler discrimination information. Based on the characteristics of the distributions of expression levels for tumor-related and unrelated genes, we propose a gene selection method based on an information criterion. We then apply the method to tumor diagnosis, training a support vector machine (SVM) on the expression profiles of the selected genes to build a diagnosis model. Experiments show that the method is effective: the resulting model reaches diagnostic (prediction) accuracies of 94.4% on the colon cancer dataset and 100% on the leukemia dataset.
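The abstract does not spell out the information criterion itself, so the following is only an illustrative sketch of the general idea of ranking genes by Kullback-Leibler discrimination information. It assumes, purely for illustration, that each gene's expression values in each class are roughly Gaussian, and it scores each gene by the symmetric Kullback-Leibler divergence between the two class-conditional fits. The function names `gaussian_kl_scores` and `select_genes` are hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_kl_scores(X, y, eps=1e-8):
    """Score each gene by the symmetric KL divergence between two Gaussians.

    Assumption (not from the paper): expression values of a gene within each
    class are modeled as a univariate Gaussian fitted by sample mean/variance.
    X has shape (n_samples, n_genes); y holds binary class labels 0 and 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    m0, m1 = a.mean(axis=0), b.mean(axis=0)
    # eps keeps the variances strictly positive for constant genes
    v0 = a.var(axis=0) + eps
    v1 = b.var(axis=0) + eps
    d2 = (m0 - m1) ** 2
    # KL(p0 || p1) + KL(p1 || p0) for univariate Gaussians (log terms cancel)
    return (v0 + d2) / (2.0 * v1) + (v1 + d2) / (2.0 * v0) - 1.0

def select_genes(X, y, k):
    """Return the indices of the k genes with the largest scores."""
    scores = gaussian_kl_scores(X, y)
    return np.argsort(scores)[::-1][:k]
```

Under these assumptions, the columns returned by `select_genes` would then be used as the input features for an SVM classifier; that training step is not shown here.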
Source
Journal of Signal Processing (《信号处理》)
CSCD
Peking University Core Journal
2005, No. 3, pp. 312-315 (4 pages)
Journal of Signal Processing
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60071004)