SHEsis PCA: A GPU-Based Software to Correct for Population Stratification that Efficiently Accelerates the Process for Handling Genome-Wide Datasets

Abstract: Population stratification is a problem in genetic association studies because it tends to highlight loci that underlie the population structure rather than disease-related loci. Principal component analysis (PCA) has proven to be an effective way to correct for population stratification. However, the conventional PCA algorithm is time-consuming when dealing with large datasets. We developed a graphics processing unit (GPU)-based PCA software named SHEsisPCA (http://analysis.bio-x.cn/SHEsisMain.htm) that is highly parallel, with a peak speedup of more than 100-fold over its CPU version. A clustering algorithm based on X-means was also implemented to detect population subgroups and to obtain matched cases and controls, in order to reduce genomic inflation and increase power. A study of both simulated and real datasets showed that SHEsisPCA ran at an extremely high speed with almost no loss of accuracy. Therefore, SHEsisPCA can help correct for population stratification much more efficiently than conventional CPU-based algorithms.
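To make the PCA step in the abstract concrete, the following is a minimal CPU sketch in Python with NumPy. It is not the SHEsisPCA implementation: the function names `simulate_two_populations` and `genotype_pca`, the simulated allele frequencies, and the per-SNP normalization by sqrt(p(1-p)) (in the style commonly used for genotype PCA) are all assumptions made for illustration. The sample-by-sample matrix product is the kind of dense operation that a GPU version would presumably parallelize.

```python
import numpy as np


def simulate_two_populations(n_per_pop=20, n_snps=500, seed=0):
    """Hypothetical toy data: two subpopulations with shifted allele frequencies."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, size=n_snps)
    # Population B drifts away from A; clip so frequencies stay valid.
    freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, size=n_snps), 0.05, 0.95)
    geno_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
    geno_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
    return np.vstack([geno_a, geno_b])  # rows: samples, columns: SNPs


def genotype_pca(genotypes, n_components=2):
    """PCA on an (n_samples, n_snps) matrix of 0/1/2 allele counts.

    Returns the top eigenvalues and the matching per-sample eigenvectors
    (principal components) of the sample-by-sample covariance matrix.
    """
    g = np.asarray(genotypes, dtype=float)
    mean = g.mean(axis=0)
    p = mean / 2.0                      # estimated allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)        # drop monomorphic SNPs (zero variance)
    x = (g[:, keep] - mean[keep]) / np.sqrt(p[keep] * (1.0 - p[keep]))
    cov = x @ x.T / x.shape[1]          # n_samples x n_samples matrix
    evals, evecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]
    return evals[order], evecs[:, order]


if __name__ == "__main__":
    genotypes = simulate_two_populations()
    evals, pcs = genotype_pca(genotypes)
    print("top eigenvalues:", evals)
    print("mean PC1, population A:", pcs[:20, 0].mean())
    print("mean PC1, population B:", pcs[20:, 0].mean())
```

On this toy data the two simulated subpopulations fall on opposite sides of the first principal component. In an association study, the leading components would typically be used as covariates or, as the abstract describes, as input to a clustering step that defines matched cases and controls.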

Authors: Jiawei Shen, Zhiqiang Li, Yongyong Shi

Affiliations: Bio-X Institutes; Institute of Social Cognitive and Behavioral Sciences; School of Bio-medical Engineering; Shanghai Changning Mental Health Center

Source: Journal of Genetics and Genomics (遗传学报（英文版）), 2015, Issue 8, pp. 445-453 (9 pages). Indexed in SCIE, CAS, CSCD.

Funding: Supported by the National Key Basic Research Program of China (973 Program) (No. 2015CB559100); the National High Technology Research and Development Program of China (863 Program) (Nos. 2012AA02A515 and 2012AA021802); the Natural Science Foundation of China (Nos. 31325014, 81130022, 81272302 and 81421061); the National Program for Support of Top-Notch Young Professionals; the Program of Shanghai Subject Chief Scientist (No. 15XD1502200); and the "Shu Guang" project supported by Shanghai Municipal Education Commission and Shanghai Education Development Foundation (No. 12SG17).

Keywords: Population stratification; Principal component analysis; Graphics processing unit; Cluster; Matched cases and controls; Genetic studies

Classification code: R394 [Medicine and Health: Medical Genetics]
