The conditional kernel correlation is proposed to measure the relationship between two random variables conditional on covariates for multivariate data. Relying on the framework of reproducing kernel Hilbert spaces, we define the conditional kernel covariance and the conditional kernel correlation. We also provide their sample estimators and establish their asymptotic properties, which allow us to construct a conditional independence test. According to the numerical results, the proposed test is more effective than the existing one under the considered scenarios. A real data set is further analyzed to illustrate the efficacy of the proposed method.
Genetic models are of great importance in the analysis of genetic epidemiologic studies, and many such studies are conducted using the trend test under the additive model. However, for many complex diseases and traits, the underlying genetic model at a genetic locus is usually uncertain, so a robust test that does not depend on the genetic model is appropriate. In this paper, the authors propose a model-embedded trend test that incorporates Hardy-Weinberg equilibrium information and obtain an explicit formula for calculating its statistical significance. Extensive simulation studies show that the proposed test is more robust than the existing procedures. Finally, a real application is analyzed to show the performance of the proposed test.
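For orientation, the baseline mentioned in the abstract (the trend test under the additive model) can be sketched as the standard Cochran-Armitage trend test with genotype scores 0, 1, 2. This is not the authors' model-embedded test, whose formula is not given here; the function name `trend_test` and the genotype counts are hypothetical.

```python
import math


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2 x 3 genotype table.

    cases / controls: counts per genotype (0, 1 or 2 copies of the risk allele).
    scores: (0, 1, 2) encodes the additive model.
    Returns (chi-square statistic with 1 df, two-sided p-value).
    Assumes both groups are non-empty and at least two genotypes are observed.
    """
    n = [r + s for r, s in zip(cases, controls)]  # genotype column totals
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * k for x, k in zip(scores, n))
    sum_x2n = sum(x * x * k for x, k in zip(scores, n))
    t = N * sum_xr - R * sum_xn                   # trend numerator
    denom = R * S * (N * sum_x2n - sum_xn ** 2)   # R*S times the score spread
    chi2 = N * t * t / denom
    # Upper tail of a chi-square with 1 df: P(Z^2 > c) = erfc(sqrt(c / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: risk genotype enriched among cases.
chi2, p = trend_test(cases=[10, 20, 30], controls=[30, 20, 10])
```

Replacing `scores` with (0, 0, 1) or (0, 1, 1) gives the recessive and dominant versions of the same test, which is why the choice of genetic model matters for power.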
The distance-based regression model, a nonparametric multivariate method, has been widely used to detect the association between variation in a distance or dissimilarity matrix for outcomes and predictor variables of interest in genetic association studies, genomic analyses, and many other research areas. Based on this model, a pseudo-F statistic that partitions the variation in the distance matrix is often constructed for this purpose. To the best of our knowledge, the statistical properties of the pseudo-F statistic have not yet been well established in the literature. To fill this gap, the authors study the asymptotic null distribution of the pseudo-F statistic and show that it is asymptotically equivalent to a mixture of chi-squared random variables. Given that the pseudo-F test statistic has unsatisfactory power when the correlations among the response variables are large, the authors propose a square-root F-type test statistic, which replaces the similarity matrix with its square root. The asymptotic null distribution of the new test statistic and the power of both tests are also investigated. Simulation studies are conducted to validate the asymptotic distributions of the tests and demonstrate that the proposed test has more robust power than the pseudo-F test. Both test statistics are illustrated with a gene expression dataset for a prostate cancer pathway.
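To make the idea of partitioning variation in a distance matrix concrete, here is a minimal sketch of the pseudo-F statistic in the special case of a single categorical predictor (the one-way, PERMANOVA-style form). The general regression form and the authors' square-root variant are not reproduced; the function name `pseudo_f` and the toy data are hypothetical.

```python
def pseudo_f(dist, labels):
    """One-way pseudo-F statistic computed from pairwise distances.

    dist: symmetric n x n matrix (list of lists) of pairwise distances.
    labels: group label for each of the n observations.
    """
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for p, i in enumerate(idx) for j in idx[p + 1:]) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


# Toy example: one outcome with Euclidean distance, two groups of three.
x = [1, 2, 3, 4, 5, 6]
dist = [[abs(u - v) for v in x] for u in x]
f_stat = pseudo_f(dist, [0, 0, 0, 1, 1, 1])
```

With Euclidean distance on a single outcome, this reduces to the classical one-way ANOVA F statistic. The statistic only needs pairwise distances, so any dissimilarity measure can be substituted.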
Base substitution is one of the raw fuels that produce genetic variation and drive evolution. Recent studies have shown that genome composition affects mutation patterns to some extent. To infer the correlation between the transition/transversion ratio (Ts/Tv) and the number of immediately adjacent A&T nucleotides, we investigated 3,611,007 Oryza sativa SNPs (including 45,462 coding SNPs and 242,811 intronic SNPs) and 32,019 Arabidopsis SNPs. The results show that Ts/Tv is negatively correlated with the number of immediately adjacent A&T nucleotides in O. sativa and Arabidopsis. We further calculated AT2 (the number of SNPs whose immediately adjacent nucleotides are either A or T) and AT0 (the number of SNPs whose immediately adjacent nucleotides are either C or G) for all 6 types of SNPs. C/G SNPs of O. sativa and Arabidopsis have the highest AT2/AT0 ratio, which suggests that C/G SNPs may be the type most influenced by adjacent A&T nucleotides. For SNPs in O. sativa, the neighboring effect of A&T nucleotides is limited to 2 nucleotides on both sides; for SNPs in Arabidopsis, the effect extends no more than 4 nucleotides on both sides.
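The quantity being correlated can be sketched as follows: each SNP is classified as a transition (A/G or C/T) or a transversion, and Ts/Tv is then computed separately for SNPs with 0, 1, or 2 immediately adjacent A or T nucleotides. The input layout (left flank, two alleles, right flank) and the toy SNPs are assumptions for illustration, not the authors' data format.

```python
from collections import defaultdict

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(allele1, allele2):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    if allele1 == allele2:
        raise ValueError("a SNP needs two distinct alleles")
    pair = {allele1, allele2}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_by_flanking_at(snps):
    """Ts/Tv ratio grouped by the number (0, 1, 2) of adjacent A/T bases.

    snps: iterable of (left_base, allele1, allele2, right_base) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # at_count -> [transitions, transversions]
    for left, a1, a2, right in snps:
        at_count = sum(base in ("A", "T") for base in (left, right))
        counts[at_count][0 if is_transition(a1, a2) else 1] += 1
    return {k: (ts / tv if tv else float("inf"))
            for k, (ts, tv) in sorted(counts.items())}


# Hypothetical toy SNPs, not taken from the study.
toy = [("A", "A", "G", "T"), ("A", "C", "T", "A"), ("A", "A", "C", "T"),
       ("G", "A", "G", "C"), ("G", "C", "G", "C")]
ratios = ts_tv_by_flanking_at(toy)
```

The groups with 2 and 0 adjacent A/T bases correspond to the flanking contexts counted by AT2 and AT0 in the abstract.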
In biomedical research, to evaluate the effect of a drug, investigators often need to compare two treatment groups on multiple outcomes. Rank-sum tests can handle the case where the outcome differences between the two groups are in the same direction. When they are not, the MAX test can be used, and it is very useful when one or a few of the differences are relatively larger than the others. For the case where the individual outcome differences between the two groups are moderate, this work proposes a new method: the sum of the absolute values of the rank-based test statistics for each outcome. Power comparisons with the existing methods, based on simulation studies and a real example, show that the proposed test is robust and works well when the difference for each outcome is moderate. The authors also derive some theoretical results comparing the power of MAX and the proposed method.
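The contrast between the three ways of combining per-outcome rank statistics can be sketched as below. The sketch assumes the per-outcome statistic is the standardized Wilcoxon rank-sum statistic, with no tie correction in the variance; the authors' exact standardization and null calibration are not given here, and all function names and data are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(x, y):
    """Standardized Wilcoxon rank-sum statistic for one outcome."""
    n1, n2 = len(x), len(y)
    total = n1 + n2
    w = sum(midranks(list(x) + list(y))[:n1])  # rank sum of the first group
    mean = n1 * (total + 1) / 2
    var = n1 * n2 * (total + 1) / 12           # no tie correction (assumption)
    return (w - mean) / math.sqrt(var)


def combine(group1, group2):
    """group1 / group2: one list of observations per outcome."""
    z = [rank_sum_z(x, y) for x, y in zip(group1, group2)]
    return {"sum": sum(z),
            "max_abs": max(abs(v) for v in z),
            "sum_abs": sum(abs(v) for v in z)}


# Two outcomes whose differences point in opposite directions.
stats = combine(group1=[[1, 2, 3], [4, 5, 6]], group2=[[4, 5, 6], [1, 2, 3]])
```

In this toy case the signed sum cancels to zero, while the maximum and the sum of absolute values both retain the signal. The sum of absolute values also accumulates evidence across outcomes, which matches the abstract's point about moderate differences on every outcome.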
Funding (conditional kernel correlation paper): partially supported by the Knowledge Innovation Program of Hubei Province (No. 2019CFB810), the NSFC (No. 12325110), and the CAS Project for Young Scientists in Basic Research (No. YSBR-034).
Funding (model-embedded trend test paper): partially supported by the Special National Key Research and Development Plan under Grant No. 2016YFD0400206, the Breakthrough Project of Strategic Priority Program of Chinese Academy of Sciences under Grant No. XDB13040600, the Youth Innovation Promotion Association of Chinese Academy of Sciences, the National Science Foundation of China under Grant Nos. 11371353, 11661080, and 61134013, and the Special Fund of the University of Chinese Academy of Sciences for Scientific Research Cooperation.
Funding (distance-based regression paper): partially supported by the Beijing Natural Science Foundation under Grant No. Z180006.
Funding (Ts/Tv and adjacent A&T nucleotides paper): supported partly by the National 863 Project (Grant No. 2003AA231050), the National Natural Science Foundation of China (Grant No. 10371126), and the Mega-projects of Science Research for the 10th Five-year Plan (Grant No. 2002BA711A09).
Funding (multiple-outcome rank test paper): partially supported by the National Young Science Foundation of China under No. 10901155 and the National Social Science Foundation of China under No. 10CTJ004.