Funding: Project supported by the National Natural Science Foundation of China (No. 81402762) and the National Institute on Drug Abuse (Nos. K01DA033346 and R01DA043501), USA
Abstract: Objective: As one of the most popular designs in genetic research, the family-based design is well recognized for its advantages, such as robustness against population stratification and admixture. With vast amounts of genetic data collected from family-based studies, there is great interest in studying the role of genetic markers in risk prediction. This study aims to develop a new statistical approach for family-based risk prediction with improved prediction accuracy compared with existing methods based on family history. Methods: We propose an ensemble-based likelihood ratio (ELR) approach, Fam-ELR, for family-based genomic risk prediction. Fam-ELR incorporates a clustered receiver operating characteristic (ROC) curve method to account for correlations among family samples, and uses a computationally efficient tree-assembling procedure for variable selection and model building. Results: In simulations, Fam-ELR is robust across various underlying disease models and pedigree structures, and performs better than two existing family-based risk prediction methods. In a real-data application to a family-based genome-wide dataset of conduct disorder, Fam-ELR demonstrates its ability to integrate potential risk predictors and interactions into the model for improved accuracy, especially at the genome-wide level. Conclusions: Compared with existing approaches, such as the genetic risk-score approach, Fam-ELR can incorporate genetic variants with small or moderate marginal effects, together with their interactions, into an improved risk prediction model. It is therefore a robust and useful approach for high-dimensional family-based risk prediction, especially for complex diseases whose etiology is unknown or poorly understood.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 10971122, 11101274 and 11322109), the Scientific and Technological Projects of Shandong Province (Grant No. 2009GG10001012), and the Excellent Young Scientist Foundation of Shandong Province (Grant No. BS2012SF025)
Abstract: The covariance matrix plays an important role in risk management, asset pricing, and portfolio allocation. Covariance matrix estimation becomes challenging when the dimensionality is comparable to or much larger than the sample size. A widely used approach for reducing dimensionality is based on multi-factor models. Although this approach has been well studied and is quite successful in many applications, the quality of the estimated covariance matrix is often degraded by a nontrivial amount of missing data in the factor matrix, which arises for both technical and cost reasons. Since the factor matrix is only approximately low rank, or even has full rank, existing matrix completion algorithms are not applicable. We consider a new matrix completion paradigm that uses the factor models directly, and apply the alternating direction method of multipliers for the recovery. Numerical experiments show that nuclear-norm matrix completion approaches are not suitable, whereas the proposed models and algorithms are promising.
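To illustrate how a multi-factor model reduces dimensionality, here is a minimal Python sketch. It assumes a standard linear factor model with a diagonal residual covariance and a fully observed factor matrix. It is not the paper's method: it does not handle missing factor data or implement the proposed matrix completion. The function name `factor_model_covariance` and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def factor_model_covariance(returns, factors):
    """Covariance estimate B @ Sigma_f @ B.T + D under an assumed linear factor model.

    returns: (T, N) array of observations; factors: (T, K) array of observed factors.
    Assumption (not from the source): r_t = a + B f_t + e_t with diagonal Cov(e_t).
    """
    T, K = factors.shape
    X = np.column_stack([np.ones(T), factors])            # design matrix with intercept
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)    # least-squares fit, shape (K+1, N)
    B = coef[1:].T                                        # factor loadings, shape (N, K)
    resid = returns - X @ coef                            # idiosyncratic residuals
    D = np.diag(resid.var(axis=0, ddof=K + 1))            # diagonal residual covariance
    sigma_f = np.atleast_2d(np.cov(factors, rowvar=False))  # (K, K) factor covariance
    return B @ sigma_f @ B.T + D

# Simulated example with more variables (N) than observations (T).
rng = np.random.default_rng(0)
T, N, K = 60, 200, 3
f = rng.normal(size=(T, K))
B_true = rng.normal(size=(N, K))
r = f @ B_true.T + 0.1 * rng.normal(size=(T, N))
sigma = factor_model_covariance(r, f)
```

With N larger than T, the plain sample covariance of `r` is singular, whereas this structured estimate needs only the N×K loadings, a K×K factor covariance, and N residual variances.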