Statistical Theory of Gene-set Analysis Methods


Abstract: Objective To examine gene set analysis methods from the standpoint of statistical theory and to establish a preliminary theoretical framework for the gene-set statistical analysis of microarray data. Methods Computer simulation was used to compare the statistical properties of gene set analysis under different null hypotheses and different methods of generating the theoretical distribution. Results The area under the ROC curve (AUC) was 0.858 for the self-contained null hypothesis and 0.512 for the competitive null hypothesis. Under the same settings, the false discovery rate (FDR) of the bootstrap method (at most 0.015) was lower than that of the permutation test (at most 0.075), whereas the power of the permutation test (0.89) was higher than that of the bootstrap method (0.67). Conclusion An effective gene set analysis method should, on the basis of correctly used biological gene annotation, adopt a self-contained null hypothesis, build the gene-set statistic from standardized gene expression values, and, as the application requires, use a valid randomization algorithm to construct the theoretical distribution of the statistic for inference.
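The procedure recommended in the conclusion (a self-contained null hypothesis, a gene-set statistic built from standardized expression differences, and a randomization-based reference distribution) can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the choice of statistic (the sum of squared Welch t-statistics over the genes in the set), the function names, and the parameter defaults are all assumptions made for the example.

```python
import numpy as np


def gene_set_statistic(expr, labels):
    """Sum of squared Welch t-statistics over the genes in one set.

    expr   : array of shape (n_genes, n_samples), expression values
    labels : boolean array of length n_samples (True = group A)
    The statistic itself is an illustrative assumption.
    """
    a, b = expr[:, labels], expr[:, ~labels]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    t = (a.mean(axis=1) - b.mean(axis=1)) / se  # standardized difference per gene
    return float(np.sum(t ** 2))


def permutation_p_value(expr, labels, n_perm=1000, seed=0):
    """Self-contained test: permute sample labels, keep the gene set fixed."""
    rng = np.random.default_rng(seed)
    observed = gene_set_statistic(expr, labels)
    null = np.array([
        gene_set_statistic(expr, rng.permutation(labels))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Permuting sample labels, rather than resampling genes, is what makes the test self-contained: the null hypothesis is that no gene in the set is associated with the phenotype. A bootstrap variant would instead resample samples with replacement to build the reference distribution.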

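Authors: Cao Wenjun (曹文君), Hou Guoqiang (侯国强), Li Yunming (李运明), Zhang Wei (张威), Zhang Yang (张扬), Chen Changsheng (陈长生)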

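Affiliations: Department of Basic Sciences, Changzhi Medical College, Shanxi Province; Changzhi Maternal and Child Health Hospital, Shanxi Province; General Hospital of Chengdu Military Region, Sichuan Province; Department of Health Statistics, School of Military Preventive Medicine, Fourth Military Medical University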

Source: Chinese Journal of Health Statistics (《中国卫生统计》), indexed in CSCD and the Peking University Core Journals list, 2013, Issue 4, pp. 484-486 (3 pages)

Funding: National Natural Science Foundation of China (81172770)

Keywords: Microarray data; Gene set analysis method; Statistical theory; Monte Carlo simulation

Classification: R195 [Medicine and Health: Health Statistics]




