Funding: Supported by the National Natural Science Foundation of China (Grant No. 72091212).
Abstract: Group testing is a method for estimating the prevalence of rare infectious diseases that can save time and reduce costs compared with random sampling. However, previous literature demonstrated the optimality of the group testing strategy for estimating prevalence only under some strong assumptions. This article weakens the assumption on the misclassification rate made in the previous literature, treats the misclassification rate of infected samples as a differentiable function of the pool size, and explores some optimal properties of group testing for estimating prevalence in the presence of differential misclassification conforming to this assumption. It demonstrates theoretically that the group testing strategy performs better than the sample-by-sample procedure in estimating disease prevalence when either the total number of sample pools or the size of the test population is fixed. Numerical simulation experiments were conducted to evaluate the performance of group tests in estimating prevalence in the presence of a dilution effect.
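As an illustration of how prevalence can be recovered from pooled results, the following is a minimal sketch, not the estimator studied in the article. It assumes equal-sized pools and a pool-level sensitivity and specificity supplied by the caller; the function name and parameters are hypothetical. A dilution effect would correspond to passing a sensitivity that decreases with the pool size.

```python
def pooled_prevalence_estimate(positive_pools, n_pools, pool_size,
                               sensitivity=1.0, specificity=1.0):
    """Estimate individual-level prevalence from pooled test results.

    Assumed model (an illustration, not taken from the article):
        P(pool positive) = Se * (1 - q) + (1 - Sp) * q,  q = (1 - p) ** pool_size
    Solving for q gives q = (Se - rate) / (Se + Sp - 1).
    """
    if n_pools <= 0 or pool_size <= 0:
        raise ValueError("n_pools and pool_size must be positive")
    if sensitivity + specificity <= 1.0:
        raise ValueError("sensitivity + specificity must exceed 1")
    positive_rate = positive_pools / n_pools
    # Estimated probability that a pool contains no infected sample.
    all_negative = (sensitivity - positive_rate) / (sensitivity + specificity - 1.0)
    # Clip to [0, 1] so sampling noise cannot yield an invalid probability.
    all_negative = min(1.0, max(0.0, all_negative))
    return 1.0 - all_negative ** (1.0 / pool_size)
```

For example, calling pooled_prevalence_estimate(19, 100, 2) with a perfect test inverts 1 - (1 - p)^2 = 0.19 and returns approximately 0.1.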
Funding: Supported by the National Natural Science Foundation of China (No. 11971433), the First Class Discipline of Zhejiang-A (Zhejiang Gongshang University-Statistics), and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development.
Abstract: In this paper, we consider testing a hypothesis concerning the means of two independent semicontinuous distributions whose observations are zero-inflated, that is, characterized by a sizable number of zeros together with positive observations from a continuous distribution. The continuous parts of the two semicontinuous distributions are assumed to follow a density ratio model. A new two-part test is developed for this kind of data. The proposed test statistic is the sum of one test for equality of the proportions of zero values and one conditional test for the continuous distribution. The test statistic is proved to follow a χ² distribution with two degrees of freedom. Simulation studies show that the proposed test controls the type I error rate at the desired level, and is competitive with, and most of the time more powerful than, two popular tests. A real data example from a dietary intervention study is used to illustrate the usefulness of the proposed test.
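The two-part construction can be sketched with a classical stand-in. The code below is not the density-ratio-based test proposed in the paper: it assumes large-sample z statistics for both parts (a pooled two-proportion statistic for the zeros and a Welch-type statistic for the positive values), and the function name is hypothetical. It only shows how two component statistics are summed and referred to a chi-squared distribution with two degrees of freedom, whose upper tail probability is exp(-x/2).

```python
import math
from statistics import mean, variance


def two_part_test(x, y):
    """Classical two-part statistic for two zero-inflated samples.

    Returns (statistic, p_value), with the p-value taken from a chi-squared
    distribution with 2 degrees of freedom.
    """
    n1, n2 = len(x), len(y)
    pos_x = [v for v in x if v > 0]
    pos_y = [v for v in y if v > 0]
    if len(pos_x) < 2 or len(pos_y) < 2:
        raise ValueError("each sample needs at least two positive values")

    # Part 1: compare the proportions of zeros with a pooled z statistic.
    zeros_x, zeros_y = n1 - len(pos_x), n2 - len(pos_y)
    pooled = (zeros_x + zeros_y) / (n1 + n2)
    se_prop = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z_prop = (zeros_x / n1 - zeros_y / n2) / se_prop if se_prop > 0 else 0.0

    # Part 2: compare the means of the positive values (Welch-type z).
    se_mean = math.sqrt(variance(pos_x) / len(pos_x) + variance(pos_y) / len(pos_y))
    z_cont = (mean(pos_x) - mean(pos_y)) / se_mean if se_mean > 0 else 0.0

    statistic = z_prop ** 2 + z_cont ** 2
    # Survival function of chi-squared with 2 df is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value
```

Because the second statistic is computed conditionally on the positive observations, the two parts are treated as approximately independent under the null hypothesis, which is what justifies adding their squares.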
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11722113).
Abstract: The distance-based regression model has many applications in the analysis of multivariate response regression in various fields, such as ecology, genomics, genetics, human microbiomics, and neuroimaging. It yields a pseudo F test statistic that assesses the relation between the distance (dissimilarity) of the subjects and the predictors of interest. Despite its popularity in recent decades, the statistical properties of the pseudo F test statistic have not, to our knowledge, been established. This study derives the asymptotic properties of the pseudo F test statistic using spectral decomposition under the matrix normal assumption, when the dissimilarity measure is the Euclidean or Mahalanobis distance. The pseudo F test statistic with the Euclidean distance has the same distribution as the quotient of two chi-squared-type mixtures. The denominator and numerator of the quotient are approximated using a random variable of the form ξχ_d^2 + η, and a bound on the approximation error is given. The pseudo F test statistic with the Mahalanobis distance follows an F distribution. In simulation studies, the approximated distribution closely matched the "exact" distribution obtained by the permutation procedure. The obtained distribution was further validated on H1N1 influenza data, aging human brain data, and embryonic imprint data.
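For orientation, the pseudo F statistic and its permutation reference distribution can be sketched as follows. This is a minimal illustration using the commonly used Gower-centering formulation of distance-based regression, assuming NumPy is available and that the design matrix includes an intercept column; the function names are hypothetical, and the code does not implement the asymptotic approximation derived in the study.

```python
import numpy as np


def pseudo_f(dist, design):
    """Pseudo F statistic from an (n, n) distance matrix and an (n, m) design
    matrix whose first column is assumed to be the intercept."""
    dist = np.asarray(dist, dtype=float)
    design = np.asarray(design, dtype=float)
    n, m = design.shape
    centering = np.eye(n) - np.ones((n, n)) / n
    # Gower-centered matrix built from the squared distances.
    gower = -0.5 * centering @ (dist ** 2) @ centering
    # Orthogonal projection onto the column space of the design matrix.
    hat = design @ np.linalg.pinv(design)
    resid = np.eye(n) - hat
    explained = np.trace(hat @ gower @ hat)
    unexplained = np.trace(resid @ gower @ resid)
    return (explained / (m - 1)) / (unexplained / (n - m))


def permutation_p_value(dist, design, n_perm=999, seed=0):
    """Permutation p-value: shuffle the rows of the design matrix."""
    rng = np.random.default_rng(seed)
    design = np.asarray(design, dtype=float)
    observed = pseudo_f(dist, design)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(design.shape[0])
        if pseudo_f(dist, design[perm]) >= observed:
            exceed += 1
    # Count the observed arrangement itself among the permutations.
    return (exceed + 1) / (n_perm + 1)
```

With a univariate response and the Euclidean distance, this statistic reduces to the ordinary regression F statistic, which gives a convenient sanity check.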
Funding: Partially supported by the Beijing Natural Science Foundation under Grant No. Z180006.
Abstract: The distance-based regression model, a nonparametric multivariate method, has been widely used to detect the association between variations in a distance or dissimilarity matrix for outcomes and predictor variables of interest in genetic association studies, genomic analyses, and many other research areas. Based on it, a pseudo-F statistic, which partitions the variation in distance matrices, is often constructed for this purpose. To the best of our knowledge, the statistical properties of the pseudo-F statistic have not yet been well established in the literature. To fill this gap, the authors study the asymptotic null distribution of the pseudo-F statistic and show that it is asymptotically equivalent to a mixture of chi-squared random variables. Given that the pseudo-F test statistic has unsatisfactory power when the correlations of the response variables are large, the authors propose a square-root F-type test statistic, which replaces the similarity matrix with its square root. The asymptotic null distribution of the new test statistic and the power of both tests are also investigated. Simulation studies are conducted to validate the asymptotic distributions of the tests and demonstrate that the proposed test has more robust power than the pseudo-F test. Both test statistics are exemplified with a gene expression dataset for a prostate cancer pathway.
Funding: The authors are grateful to the referees for their valuable suggestions, which considerably improved the paper. This work was supported by the National Natural Science Foundation of China (Grant Nos. 11171011, 11471036), the Natural Science Foundation of Beijing (Grant No. 1132007), and the Beijing Municipal Science and Technology Project (Grant No. km201410005011). Research of A. Liu was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and the National Institutes of Health (NIH).
Abstract: Linear mixed effects models with general skew normal-symmetric (SNS) errors are considered, and several properties of the SNS distributions are obtained. Under the SNS setting, the ANOVA-type estimates of the variance components in the model are unbiased, the ANOVA-type F-tests are exact F-tests, and exact confidence intervals for the fixed effects are constructed. In addition, the power of the ANOVA-type F-tests for the components is free of the skewing function if the random effects are normally distributed. To illustrate the main results, simulation studies on the robustness of the models are given, comparing multivariate skew-normal, multivariate skew normal-Laplace, multivariate skew normal-uniform, multivariate skew normal-symmetric, and multivariate normal distributed errors. A real example is provided to illustrate the proposed method.