Abstract
Gene selection is among the most important methods for supporting biological analysis, but such statistical algorithms suffer from having too few samples relative to the number of genes. This paper proposes a gene selection algorithm that incorporates Gene Ontology (GO) annotations: each gene's variance is replaced by a weighted mean of the variances of genes with similar GO annotations, which strengthens the variance estimate when samples are few and in turn helps identify differentially expressed genes. The algorithm was compared with five other common gene selection algorithms by building classifiers on the selected genes and using the reliability of each classifier as the measure of the algorithm. Results from three sets of microarray experiments indicate that the proposed algorithm has some advantage when the sample size is small. Literature indexed in PubMed also supports that the algorithm identified disease-related genes that the other algorithms did not find. The framework established by this method is a useful attempt at bringing biological annotation into the improvement of statistical algorithms.
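The variance correction described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the weighting scheme, so the `similarity` weights, the function names, and the use of a pooled two-sample variance are all assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance(a, b):
    """Standard two-sample pooled variance for a single gene."""
    na, nb = len(a), len(b)
    return ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)


def go_smoothed_t(group_a, group_b, similarity):
    """Return a t-like statistic per gene using a similarity-weighted variance.

    group_a, group_b: dict mapping gene -> list of expression values.
    similarity: dict mapping gene -> {neighbour gene: weight}. The weights
    stand in for a GO-based similarity and are hypothetical; the gene
    itself is given weight 1.0 unless the dict says otherwise.
    Assumes the smoothed variance is non-zero for every gene.
    """
    s2 = {g: pooled_variance(group_a[g], group_b[g]) for g in group_a}
    stats = {}
    for g in group_a:
        weights = {g: 1.0, **similarity.get(g, {})}
        # Weighted mean of the variances of the gene and its GO neighbours.
        smoothed = sum(w * s2[h] for h, w in weights.items()) / sum(weights.values())
        na, nb = len(group_a[g]), len(group_b[g])
        diff = mean(group_a[g]) - mean(group_b[g])
        stats[g] = diff / sqrt(smoothed * (1 / na + 1 / nb))
    return stats


# Toy data: two genes, three samples per condition (made-up values).
group_a = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0]}
group_b = {"g1": [4.0, 5.0, 6.0], "g2": [2.0, 4.0, 6.0]}
similarity = {"g1": {"g2": 0.5}}  # hypothetical GO similarity weight
stats = go_smoothed_t(group_a, group_b, similarity)
```

Genes would then be ranked by the absolute value of the statistic. Borrowing variance information from annotated neighbours is what is meant to stabilize the estimate when each gene has only a few samples.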
Source
Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》)
CAS
CSCD
PKU Core (北大核心)
2009, Issue 5, pp. 696-700, 706 (6 pages)
Keywords
microarray
gene selection
t-test
permutation test
Gene Ontology (GO)