Abstract
Objective To examine gene set analysis methods from the standpoint of statistical theory and to establish a preliminary theoretical framework for the statistical analysis of gene sets in microarray data. Methods Computer simulation was used to compare the statistical properties of gene set analysis under different null hypotheses and different methods of generating the theoretical distribution. Results The area under the ROC curve (AUC) was 0.858 for the self-contained null hypothesis and 0.512 for the competitive null hypothesis. Under the same settings, the false discovery rate (FDR) of the bootstrap method (up to 0.015) was lower than that of the permutation test (up to 0.075), whereas the power of the permutation test (0.89) was higher than that of the bootstrap method (0.67). Conclusion An effective gene set analysis method should, on the basis of correct use of biologically annotated genes, adopt a self-contained null hypothesis, build the gene set statistic from standardized gene expression values, and, as required, use a valid randomization algorithm to construct the theoretical distribution of the statistic for inference.
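The comparison of randomization schemes described above can be illustrated with a minimal sketch. It is not the authors' simulation code: the gene set statistic (mean squared difference of standardized expression between groups), the pooled-resampling bootstrap, and every function name and simulation setting below are assumptions chosen only for illustration. Both tests address a self-contained null hypothesis, namely that no gene in the set is associated with the phenotype.

```python
import numpy as np

def set_statistic(expr, labels):
    """Gene set statistic (an illustrative choice, not taken from the paper).

    expr:   (genes, samples) expression matrix for the genes in one set
    labels: boolean array of length samples, True = case, False = control
    """
    # Standardize each gene across all samples (z-scores).
    centered = expr - expr.mean(axis=1, keepdims=True)
    z = centered / expr.std(axis=1, ddof=1, keepdims=True)
    # Per-gene difference of group means, summarized over the set.
    diff = z[:, labels].mean(axis=1) - z[:, ~labels].mean(axis=1)
    return float(np.mean(diff ** 2))

def permutation_pvalue(expr, labels, n_iter=1000, seed=0):
    """Null distribution obtained by shuffling the phenotype labels."""
    rng = np.random.default_rng(seed)
    observed = set_statistic(expr, labels)
    null = np.array([set_statistic(expr, rng.permutation(labels))
                     for _ in range(n_iter)])
    return (1 + np.sum(null >= observed)) / (1 + n_iter)

def bootstrap_pvalue(expr, labels, n_iter=1000, seed=0):
    """Null distribution obtained by resampling samples with replacement
    from the pooled data while keeping the label vector fixed
    (one possible null bootstrap, assumed here)."""
    rng = np.random.default_rng(seed)
    observed = set_statistic(expr, labels)
    n_samples = expr.shape[1]
    null = np.empty(n_iter)
    for i in range(n_iter):
        cols = rng.integers(0, n_samples, size=n_samples)
        null[i] = set_statistic(expr[:, cols], labels)
    return (1 + np.sum(null >= observed)) / (1 + n_iter)

# Hypothetical simulated data: 20 genes, 10 cases and 10 controls,
# with every gene in the set shifted by 1.5 SD in the cases.
rng = np.random.default_rng(42)
labels = np.array([True] * 10 + [False] * 10)
expr = rng.normal(size=(20, 20))
expr[:, labels] += 1.5

p_perm = permutation_pvalue(expr, labels)
p_boot = bootstrap_pvalue(expr, labels)
```

Repeating this over many simulated gene sets, some with and some without a true shift, would give empirical FDR and power for each scheme, which is the kind of comparison the abstract reports.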
Source
《中国卫生统计》
CSCD
PKU Core Journal
2013, No. 4, pp. 484-486 (3 pages)
Chinese Journal of Health Statistics
Funding
Supported by the National Natural Science Foundation of China (81172770)