Abstract
Aim To identify colon cancer patients accurately using fewer genetic tags. Methods Based on the correlations among genes, fuzzy cluster analysis is used to cluster the large number of genes, and the distance between each gene and the gene center vector is introduced to build an optimization model. Next, according to their distribution across samples, genes are divided into mutant genes and irrelevant genes, and these two gene features are combined to establish an optimization model. To improve the recognition rate, noise in the genes is taken into account with the Monte Carlo method. Finally, the optimization model is re-established to incorporate the features of the known genetic tags. Results Without considering noise, 8 genetic tags are obtained, with a correct recognition rate of 72.6%. With noise considered, the rate rises to 85.00%. Adding the known genetic tags raises it to 87.1%. Adding all genes that share the characteristics of the known genetic tags yields 25 genetic tags and raises the recognition rate to 96.7%. Conclusion The more gene features are considered, the higher the correct recognition rate.
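The clustering step in the Methods can be illustrated with a minimal sketch of classical fuzzy cluster analysis. The sketch builds a fuzzy similarity matrix, computes its max-min transitive closure, and applies a λ-cut. The abstract does not state the authors' similarity measure, threshold, or distance, so the following choices are assumptions for illustration only: r_ij = |Pearson correlation|, λ = 0.9, and "nearest to the cluster mean vector (Euclidean)" as a hypothetical reading of the gene-to-center-vector distance. All function names and the toy data are hypothetical. The sketch is not the paper's optimization model.

```python
import math

# Hypothetical sketch of classical fuzzy cluster analysis of genes.
# Assumptions (not from the paper): similarity = |Pearson r|, lambda = 0.9,
# representative = gene nearest (Euclidean) to its cluster's mean vector.


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(genes):
    """Fuzzy similarity matrix: r_ii = 1, r_ij = |corr(gene i, gene j)|."""
    m = len(genes)
    return [[1.0 if i == j else abs(pearson(genes[i], genes[j]))
             for j in range(m)] for i in range(m)]


def max_min_compose(a, b):
    """Max-min composition: (a o b)_ij = max_k min(a_ik, b_kj)."""
    m = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(m))
             for j in range(m)] for i in range(m)]


def transitive_closure(r):
    """Square R (R, R^2, R^4, ...) until it stops changing; the result is t(R)."""
    while True:
        r2 = max_min_compose(r, r)
        if r2 == r:
            return r
        r = r2


def lambda_cut_clusters(t, lam):
    """Put genes i and j in one cluster when t_ij >= lam (t is a fuzzy equivalence)."""
    labels = [-1] * len(t)
    cluster = 0
    for i in range(len(t)):
        if labels[i] == -1:
            for j in range(len(t)):
                if t[i][j] >= lam:
                    labels[j] = cluster
            cluster += 1
    return labels


def representatives(genes, labels):
    """For each cluster, pick the gene closest to the cluster's mean (center) vector."""
    reps = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        center = [sum(genes[i][s] for i in members) / len(members)
                  for s in range(len(genes[0]))]
        reps.append(min(members, key=lambda i: math.dist(genes[i], center)))
    return reps


# Toy expression matrix (rows = genes, columns = samples); made-up numbers.
genes = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10.5],
    [3, 1, 4, 1, 5],
    [2, 1, 5, 0, 6],
]
labels = lambda_cut_clusters(transitive_closure(similarity_matrix(genes)), 0.9)
reps = representatives(genes, labels)
print(labels, reps)  # [0, 0, 1, 1] [0, 2]
```

Because the closure is a fuzzy equivalence relation, every λ-cut is an ordinary equivalence relation. Raising λ therefore splits the genes into more, tighter clusters. In this toy example each cluster has two members, so both are equidistant from the center and `min` keeps the lower index.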
Source
Journal of Northwest University (Natural Science Edition)
CAS
CSCD
Peking University Core Journal
2012, No. 5, pp. 713-718 (6 pages)
Funding
Scientific Research Program Funded by Shaanxi Provincial Education Department (11JK0511)
Keywords
genetic characteristic
fuzzy clustering analysis
correlation
genetic tags
Monte Carlo method
recognition rate