Funding: Supported by the National Natural Science Foundation of China to Guo-Bo Chen (31771392) and the Zhejiang Provincial People’s Hospital Research Startup to Guo-Bo Chen (ZRY2018A004).
Abstract: Computational efficiency has become a key issue in genomic prediction (GP) owing to the massive historical datasets that have accumulated. Here we developed a new super-fast GP approach (SHEAPY) that combines randomized Haseman-Elston regression (RHE-reg) with a modified Algorithm for Proven and Young (APY) in an additive-effect model: the former estimates heritability, and the latter inverts a large genomic relationship matrix for best linear prediction. In simulations with training populations of varying size, where heritability was held fixed and the core population was half the size of a large training population, GBLUP, HEAPY|A and SHEAPY showed similar predictive performance, and SHEAPY ran faster than both GBLUP and HEAPY|A. In simulations with varying heritability, where the training population was small and of fixed size and the core population was 4/5 of its size, SHEAPY showed better predictive ability than GBLUP in all cases and better than HEAPY|A in most cases. As a proof of concept, we applied SHEAPY to two real datasets. In an Arabidopsis thaliana F2 population, with a small training population (n = 300) and a core population two-thirds its size (n = 200), SHEAPY performed similarly to or better than GBLUP and HEAPY|A in most cases. In a sorghum multiparental population, SHEAPY showed higher predictive accuracy than HEAPY|A for all three traits and higher accuracy than GBLUP for two of them. SHEAPY may become the GP method of choice for large-scale genomic data.
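The two-step idea in the abstract (estimate heritability cheaply, then use it with an approximate inverse of the genomic relationship matrix) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' SHEAPY code. The heritability step is a generic randomized Haseman-Elston (method-of-moments) estimator in which tr(K²) is approximated with random probe vectors. The inversion uses the standard APY formula rather than the paper's modified APY, whose details are not given here. The function names (`standardize`, `rhe_heritability`, `apy_inverse`, `gblup_mme`), the 0.05 blending weight added to G, the choice of core individuals, and the simulated data are all hypothetical.

```python
import numpy as np

# Illustrative sketch only: helper names are hypothetical, the heritability step is a
# generic randomized Haseman-Elston (method-of-moments) estimator, and the inverse uses
# the standard APY formula, NOT the modified APY used by SHEAPY.


def standardize(genotypes):
    """Center/scale each SNP column to mean 0, variance 1; drop monomorphic SNPs."""
    X = np.asarray(genotypes, dtype=float)
    sd = X.std(axis=0)
    keep = sd > 0
    return (X[:, keep] - X[:, keep].mean(axis=0)) / sd[keep]


def rhe_heritability(Z, y, n_vectors=50, seed=0):
    """Estimate h2 by moments without forming K = Z Z^T / m.

    tr(K^2) uses Hutchinson's estimator: mean over random u of ||K u||^2,
    where K u = Z (Z^T u) / m costs O(n m) per vector.
    """
    n, m = Z.shape
    y = y - y.mean()                          # project out the intercept
    U = np.random.default_rng(seed).standard_normal((n, n_vectors))
    KU = Z @ (Z.T @ U) / m
    tr_K2 = np.sum(KU ** 2) / n_vectors       # stochastic estimate of tr(K^2)
    tr_K = np.sum(Z ** 2) / m                 # exact; equals n for standardized Z
    Zty = Z.T @ y
    yKy = Zty @ Zty / m                       # y^T K y, again without forming K
    # Moment equations for E[y'Ky] and E[y'y] after centering y (K 1 = 0).
    A = np.array([[tr_K2, tr_K], [tr_K, n - 1.0]])
    b = np.array([yKy, y @ y])
    sigma_g2, sigma_e2 = np.linalg.solve(A, b)
    return sigma_g2 / (sigma_g2 + sigma_e2)


def apy_inverse(G, core):
    """Standard APY approximation of G^{-1}: only the c x c core block is inverted."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)
    Gcn = G[np.ix_(core, noncore)]
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    P = Gcc_inv @ Gcn                         # regression of noncore on core individuals
    # Diagonal M: m_ii = g_ii - g_ic Gcc^{-1} g_ci for each noncore individual i
    m_diag = np.diag(G)[noncore] - np.sum(Gcn * P, axis=0)
    PM = P / m_diag                           # P M^{-1}
    Ginv = np.zeros_like(G, dtype=float)
    Ginv[np.ix_(core, core)] = Gcc_inv + PM @ P.T
    Ginv[np.ix_(core, noncore)] = -PM
    Ginv[np.ix_(noncore, core)] = -PM.T
    Ginv[np.ix_(noncore, noncore)] = np.diag(1.0 / m_diag)
    return Ginv


def gblup_mme(y_train, train_idx, Ginv, h2):
    """Solve Henderson's MME for y = 1 mu + W g + e, lambda = (1 - h2) / h2."""
    n, n_t = Ginv.shape[0], len(train_idx)
    W = np.zeros((n_t, n))
    W[np.arange(n_t), train_idx] = 1.0        # maps records to individuals
    lhs = np.empty((n + 1, n + 1))
    lhs[0, 0] = n_t
    lhs[0, 1:] = lhs[1:, 0] = W.sum(axis=0)
    lhs[1:, 1:] = W.T @ W + (1.0 - h2) / h2 * Ginv
    rhs = np.concatenate(([y_train.sum()], W.T @ y_train))
    return np.linalg.solve(lhs, rhs)[1:]      # GEBVs for all n individuals


if __name__ == "__main__":
    # Simulated toy data (illustrative only): 2000 individuals x 500 SNPs, true h2 = 0.5
    rng = np.random.default_rng(1)
    Z = standardize(rng.binomial(2, rng.uniform(0.05, 0.5, size=500), size=(2000, 500)))
    g = Z @ rng.normal(0.0, np.sqrt(0.5 / Z.shape[1]), size=Z.shape[1])
    y = g + rng.normal(0.0, np.sqrt(0.5), size=2000)
    h2 = rhe_heritability(Z, y)
    print(f"RHE-type h2 estimate: {h2:.2f}")
    # Blended G for 600 individuals (500 training + 100 held out), 300 core individuals
    G = 0.95 * Z[:600] @ Z[:600].T / Z.shape[1] + 0.05 * np.eye(600)
    gebv = gblup_mme(y[:500], np.arange(500), apy_inverse(G, np.arange(300)), h2)
    print(f"accuracy on held-out: {np.corrcoef(gebv[500:], g[500:600])[0, 1]:.2f}")
```

In this sketch the APY cost is dominated by inverting the c × c core block (plus O(n·c²) work for the noncore rows) instead of the O(n³) needed to invert the full n × n matrix, which is why the core size is a key tuning parameter. For brevity the mixed-model equations are solved here with a dense solver; a practical implementation would exploit the sparse structure of the APY inverse, for example with an iterative solver.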