Objective: Identification of colorectal cancer (CRC) metastasis genes is one of the most important issues in CRC research. To mine CRC metastasis-associated genes, we present an integrated analysis of microarray data combined with evidence from comparative genomic hybridization (CGH) data. Methods: Gene expression profile data of CRC samples were obtained from the Gene Expression Omnibus (GEO) website. The 15 important chromosomal aberration sites detected by CGH technology were used for integrated genomic and transcriptomic analysis. Significance Analysis of Microarrays (SAM) was used to detect significantly differentially expressed genes across the whole genome. The overlapping genes located in the corresponding chromosomal aberration regions were selected and analyzed using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Finally, the SVM-T-RFE gene selection algorithm was applied to identify metastasis-associated genes in CRC. Results: A minimal gene set was obtained that combined the smallest number of genes (14) with the highest classification accuracy (100%) in both the PRI and META datasets. A fraction of the selected genes are associated with CRC or its metastasis. Conclusions: Our results demonstrate that integrated analysis is an effective strategy for mining cancer-associated genes.
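The overlap step described in the Methods (keeping only differentially expressed genes that lie inside a chromosomal aberration region) can be illustrated with a minimal Python sketch. This is not the authors' code: the gene symbols, coordinates, region boundaries, and the names `Gene`, `Region`, `overlaps`, and `genes_in_aberrations` are all hypothetical placeholders, and a simple closed-interval overlap test is assumed.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int
    end: int


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def overlaps(gene: Gene, region: Region) -> bool:
    """True if the gene and the region share at least one position
    on the same chromosome (closed intervals)."""
    return (
        gene.chrom == region.chrom
        and gene.start <= region.end
        and region.start <= gene.end
    )


def genes_in_aberrations(genes: List[Gene], regions: List[Region]) -> List[Gene]:
    """Keep only the genes that fall inside any aberration region."""
    return [g for g in genes if any(overlaps(g, r) for r in regions)]


# Hypothetical inputs: genes flagged as differentially expressed,
# and aberration regions of the kind reported by CGH.
de_genes = [
    Gene("GENE_A", "chr8", 100, 200),
    Gene("GENE_B", "chr8", 5000, 5100),
    Gene("GENE_C", "chr20", 50, 150),
]
cgh_regions = [
    Region("chr8", 150, 1000),
    Region("chr20", 0, 100),
]

selected = genes_in_aberrations(de_genes, cgh_regions)
print([g.symbol for g in selected])
```

In a real analysis the coordinates would come from a genome annotation and the CGH results; the filter itself stays the same.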
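Objective: To modify the significance analysis of microarray (SAM) method using a Gaussian kernel function and a Euclidean distance function, yielding two methods, modified significance analysis of microarray-1 (MSAM1) and modified significance analysis of microarray-2 (MSAM2), and to evaluate their gene selection and classification prediction ability on gene expression data by comparison with SAM, Relief, and support vector machine recursive feature elimination (SVM-RFE). Methods: The leukemia dataset was obtained from the golubEsets package in Bioconductor (Golub et al. reported the 50 differentially expressed genes contained in this dataset). The five algorithms were implemented in R. Accuracy was used to evaluate gene selection ability, and the area under the ROC curve (AUC) was used to evaluate classification prediction ability. The Kruskal-Wallis H test was used to compare accuracy and AUC among the five methods, followed by pairwise comparisons with the SNK-q test. Results: For both accuracy and AUC, MSAM1 and MSAM2 performed best, followed by SAM and SVM-RFE, with Relief last. The differences among the five methods were statistically significant (H=150.333, P<0.0001 and H=293.2579, P<0.0001). Pairwise comparisons showed no statistically significant difference between MSAM1 and MSAM2 (P>0.05), but both methods differed significantly from the other three (P<0.05). Conclusion: The weighted SAM method modified with the Gaussian kernel function and the Euclidean distance function improved the gene selection and classification prediction ability of SAM and can yield more stable results when applied to real gene expression data.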
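For orientation, the baseline that MSAM1 and MSAM2 modify is the standard SAM relative-difference statistic, d = (mean_a - mean_b) / (s + s0), where s is the pooled standard error and s0 is a small positive constant that damps genes with very low variance. The Python sketch below computes only this standard statistic; the abstract does not specify the Gaussian-kernel and Euclidean-distance weighting, so it is not reproduced here. The function name `sam_statistic`, the fixed `s0` value, and the toy expression values are assumptions for illustration (SAM itself chooses s0 from the data).

```python
import math
from typing import Sequence


def sam_statistic(group_a: Sequence[float], group_b: Sequence[float],
                  s0: float = 0.0) -> float:
    """Standard SAM relative difference for one gene across two groups.

    Note: with s0 = 0 and zero within-group variance the denominator
    is zero, which is the instability s0 is meant to avoid.
    """
    n1, n2 = len(group_a), len(group_b)
    mean_a = sum(group_a) / n1
    mean_b = sum(group_b) / n2
    # Pooled sum of squared deviations within the two groups.
    ss = (sum((x - mean_a) ** 2 for x in group_a)
          + sum((x - mean_b) ** 2 for x in group_b))
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (mean_a - mean_b) / (s + s0)


# Toy, hypothetical expression values: gene -> (class 1, class 2).
expression = {
    "gene_1": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "gene_2": ([2.0, 2.1, 1.9], [2.0, 2.2, 1.8]),
}

# Rank genes by absolute statistic, largest first (s0 fixed at 1.0 here).
scores = {g: sam_statistic(a, b, s0=1.0) for g, (a, b) in expression.items()}
ranking = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
print(ranking)
```

Genes at the top of such a ranking are the candidates a SAM-style procedure would then assess for significance by permutation.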
Funding: Supported by a grant from the National Natural Science Foundation of China (Grant No. 61373057) and a grant from the Zhejiang Provincial Natural Science Foundation of China (Grant No. Y1110763).