Abstract
Differential expression analysis helps identify important disease-related proteins and biomarkers in proteomics studies. Most widely used differential expression methods operate at the level of single proteins, yet many complex diseases and clinical phenotypes arise from the accumulation of subtle expression changes across multiple proteins in key modules or pathways. In this study, we systematically compared five module-based statistical methods for differential expression analysis; compared with single-protein analysis, these methods are expected to identify more cancer-related functional modules. On simulated datasets, the Mean-based method achieved good statistical power across all simulation settings, L2Norm, GM and FM performed about equally, and WKS had poor statistical power. We further applied the five methods to protein expression profiles from colorectal cancer patient samples. Both the single-protein analysis and the module-based methods identified a telomerase-related functional module, while the combined ranking across the five methods identified more pathways closely associated with cancer, including the module for the intrinsic apoptotic signaling pathway by p53 class mediator.
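The abstract does not define the five module statistics. As a hedged illustration of the general idea that a module score can pool small per-protein shifts, the Python sketch below makes three assumptions that are not taken from the source: per-protein Welch t-statistics are the input, "Mean" is the absolute average of those t-statistics and "L2Norm" is their Euclidean norm, and significance comes from a sample-label permutation test. GM, FM and WKS are not sketched. The function names (`protein_t_stats`, `module_score`, `permutation_pvalue`) and the toy data are hypothetical.

```python
import random
from math import sqrt


def _mean(xs):
    return sum(xs) / len(xs)


def _var(xs):
    # Unbiased sample variance (n - 1 denominator).
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def welch_t(a, b):
    """Welch t-statistic for group a versus group b."""
    return (_mean(a) - _mean(b)) / sqrt(_var(a) / len(a) + _var(b) / len(b))


def protein_t_stats(expr, labels, module):
    """Per-protein t-statistics (case = label 1, control = label 0)."""
    ts = []
    for protein in module:
        values = expr[protein]
        case = [v for v, g in zip(values, labels) if g == 1]
        ctrl = [v for v, g in zip(values, labels) if g == 0]
        ts.append(welch_t(case, ctrl))
    return ts


def module_score(expr, labels, module, kind="mean"):
    """Collapse per-protein t-statistics into one module score (assumed definitions).

    "mean":   |average t|, rewards small shifts in a consistent direction.
    "l2norm": Euclidean norm of the t vector, also responds to mixed directions.
    """
    ts = protein_t_stats(expr, labels, module)
    if kind == "mean":
        return abs(_mean(ts))
    if kind == "l2norm":
        return sqrt(sum(t * t for t in ts))
    raise ValueError(f"unknown score: {kind}")


def permutation_pvalue(expr, labels, module, kind="mean", n_perm=500, seed=0):
    """Sample-label permutation p-value; shuffling labels keeps protein-protein correlation."""
    rng = random.Random(seed)
    observed = module_score(expr, labels, module, kind)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if module_score(expr, shuffled, module, kind) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 10 proteins, 10 cases + 10 controls,
    # every protein shifted upward by only 0.5 SD in the cases.
    rng = random.Random(42)
    labels = [1] * 10 + [0] * 10
    module = [f"P{i}" for i in range(10)]
    expr = {
        p: [rng.gauss(0.5 if g == 1 else 0.0, 1.0) for g in labels]
        for p in module
    }
    print("per-protein t:", [round(t, 2) for t in protein_t_stats(expr, labels, module)])
    for kind in ("mean", "l2norm"):
        print(kind, "p =", permutation_pvalue(expr, labels, module, kind))
```

With a 0.5 SD shift and 10 samples per group, individual t-statistics are often too small to be significant on their own, while the pooled module score can be; the exact values depend on the random seed. Permuting sample labels rather than protein memberships is one common choice because it preserves the correlation structure among proteins in the module.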
Source
Genomics and Applied Biology (基因组学与应用生物学)
CAS
CSCD
Peking University Core Journal (北大核心)
2017, No. 10, pp. 4134-4140 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (31271416)
Keywords
Tandem mass spectrometry
High-throughput proteomics
Differential expression analysis
Protein functional module