Traditional approaches to false discovery rate (FDR) control in multiple hypothesis testing are usually based on the null distribution of a test statistic. However, all types of null distributions, including theoretical, permutation-based and empirical ones, have inherent drawbacks. For example, the theoretical null might fail because of improper assumptions about the sample distribution. Here, we propose a null-distribution-free approach to FDR control for multiple hypothesis testing in the case-control study. This approach, named the target-decoy procedure, builds simply on the ordering of tests by some statistic or score, the null distribution of which is not required to be known. Competitive decoy tests are constructed from permutations of the original samples and are used to estimate the false target discoveries. We prove that this approach controls the FDR when the score function is symmetric and the scores are independent between different tests. Simulation demonstrates that it is more stable and powerful than two popular traditional approaches, even in the presence of dependency. Evaluation is also made on two real datasets: an Arabidopsis genomics dataset and a COVID-19 proteomics dataset.
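The competition and counting step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' exact procedure: the function name `target_decoy_select`, the conservative tie-breaking in favor of decoys, and the `(decoys + 1) / targets` estimate are all assumptions made for the example. The decoy scores are assumed to have been computed beforehand by applying the same score function to a permuted copy of the samples for each test.

```python
from typing import List, Sequence


def target_decoy_select(target_scores: Sequence[float],
                        decoy_scores: Sequence[float],
                        alpha: float) -> List[int]:
    """Return indices of tests reported as discoveries (hypothetical helper).

    target_scores[i]: score of test i on the original samples.
    decoy_scores[i]:  score of test i on permuted samples (assumed precomputed).
    alpha:            desired FDR level.
    """
    if len(target_scores) != len(decoy_scores):
        raise ValueError("target and decoy score lists must have equal length")

    # Competition: each test keeps its larger score and is labelled by the winner.
    # Assumption: ties are labelled as decoys, which is the conservative choice.
    competed = []
    for i, (t, d) in enumerate(zip(target_scores, decoy_scores)):
        is_target = t > d
        competed.append((max(t, d), is_target, i))

    # Order tests by the winning score, best first. No null distribution is used.
    competed.sort(key=lambda item: item[0], reverse=True)

    # Find the longest prefix whose estimated FDR stays at or below alpha.
    # Assumption: false targets are estimated by (number of decoys + 1).
    best_k = 0
    targets = 0
    decoys = 0
    for k, (_, is_target, _) in enumerate(competed, start=1):
        if is_target:
            targets += 1
        else:
            decoys += 1
        if targets > 0 and (decoys + 1) / targets <= alpha:
            best_k = k

    # Report only the target winners inside the accepted prefix.
    return sorted(i for _, is_target, i in competed[:best_k] if is_target)
```

Calling `target_decoy_select(target_scores, decoy_scores, alpha=0.05)` returns the indices of the reported tests. Only the ordering of the scores and the target/decoy labels enter the decision, which is why the null distribution of the score never has to be specified.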
Analysis directly linking genomics, neuroimaging phenotypes and clinical measurements is crucial for understanding psychiatric disorders, but remains rare. Here, we describe a multi-scale analysis using genome-wide SNPs, gene expression, grey matter volume (GMV), and Positive and Negative Syndrome Scale (PANSS) scores to explore the etiology of schizophrenia. With 72 drug-naive first-episode schizophrenia patients (FEPs) and 73 matched healthy controls, we identified 108 genes, from among schizophrenia risk genes, that correlated significantly with GMV and are highly co-expressed in the brain during development. Among these 108 candidates, 19 distinct genes were found to be associated with 16 brain regions, referred to as hot clusters (HCs), primarily in the frontal cortex, sensory-motor regions, and temporal and parietal regions. The patients were subtyped into three groups with distinguishable PANSS scores by the GMV of the identified HCs. Furthermore, we found that HCs with common GMV among patient groups are related to genes that mostly map to pathways relevant to neural signaling, which are associated with the risk for schizophrenia. Our results provide an integrated view of how genetic variants may affect brain structures that lead to distinct disease phenotypes. The multi-scale analysis method described in this research may help to advance the understanding of the etiology of schizophrenia.
Funding: supported by the National Key R&D Program of China (No. 2018YFB0704304), the National Natural Science Foundation of China (Nos. 32070668, 62002231, 61832003, 61433014), and the K.C. Wong Education Foundation.