Abstract
Multivariate statistical methods in SAS were applied to mine a lung cancer microarray dataset from GEO. Expression data for fourteen nuclear receptors in the experiment were analyzed by nonparametric tests, discriminant analysis, and regression analysis. At a significance level of 0.05, four genes (ER1, VDR, RARα, and RORα) were differentially expressed between adenocarcinoma and squamous cell carcinoma, and RARβ was differentially expressed between the recurrent and non-recurrent groups. Discriminant analysis showed that VDR and RORα expression together can predict pathological type, although the total misclassification rate was high (0.2389); the total misclassification rate for RARβ and PPARα in discriminating recurrence was higher still (0.3457). A logistic regression model built to predict pathological type also selected VDR and RORα, with odds ratios of 0.126 and 4.452, respectively. Multivariate statistics based on SAS is therefore a potential approach to microarray data mining; once microarray experiments are standardized, conclusions from SAS-based integrated analysis of data from different microarray experiments may help in forming hypotheses.
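To make the reported odds ratios easier to read, the following minimal Python sketch converts them to logistic-regression coefficients and shows how each one would shift a predicted probability. Only the two odds ratios (0.126 and 4.452) come from the abstract. The function names, the 0.5 baseline probability, and the one-unit change in expression are assumptions for illustration; the abstract does not state which pathological type was modeled as the event or how expression values were scaled.

```python
import math

# Odds ratios reported in the abstract for the two variables in the model.
REPORTED_OR = {"VDR": 0.126, "RORalpha": 4.452}


def or_to_coefficient(odds_ratio):
    """Return the log-odds change per unit of the predictor for an odds ratio."""
    return math.log(odds_ratio)


def shift_probability(baseline_prob, odds_ratio, delta=1.0):
    """Probability after the predictor rises by `delta` units, others held fixed."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio ** delta
    return new_odds / (1.0 + new_odds)


coefficients = {gene: or_to_coefficient(v) for gene, v in REPORTED_OR.items()}

# Hypothetical baseline of 0.5 (even odds), chosen only for illustration.
for gene, odds_ratio in REPORTED_OR.items():
    p = shift_probability(0.5, odds_ratio)
    print(f"{gene}: coefficient={coefficients[gene]:+.3f}, probability 0.5 -> {p:.3f}")
```

An odds ratio below 1 gives a negative coefficient and one above 1 gives a positive coefficient, so under these assumptions higher VDR expression lowers the odds of the modeled pathological type while higher RORα expression raises them.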
Source
《生物信息学》
2010, Issue 2, pp. 147-149 (3 pages)
Chinese Journal of Bioinformatics
Funding
Supported by the Scientific Research Fund of the Beijing Municipal Education Commission (2005)