Funding: Supported by the National Natural Science Foundation of China (Grant No. 72091212).
Abstract: Group testing is a method for estimating the prevalence of rare infectious diseases that can save time and reduce costs compared with random sampling. However, previous literature demonstrated the optimality of the group testing strategy for prevalence estimation only under some strong assumptions. This article weakens the assumption on the misclassification rate made in previous literature: it treats the misclassification rate of infected samples as a differentiable function of the pool size and explores several optimality properties of group testing for prevalence estimation in the presence of differential misclassification satisfying this assumption. It shows theoretically that the group testing strategy estimates disease prevalence better than the sample-by-sample procedure when either the total number of pools or the size of the tested population is fixed. Numerical simulation experiments were conducted to evaluate the performance of group testing for prevalence estimation in the presence of the dilution effect.
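As a rough illustration of how prevalence is recovered from pooled results, the sketch below inverts the pool-level positive probability under a misclassification model. It assumes independent specimens with a common prevalence p, a constant specificity, and a sensitivity that depends on pool size through a hypothetical dilution function `dilution_sensitivity` (its geometric-decay form and parameters are illustrative assumptions, not the model analyzed in this article).

```python
def dilution_sensitivity(k: int, se1: float = 0.95, decay: float = 0.01) -> float:
    """Hypothetical dilution model: sensitivity decays geometrically with pool size k."""
    return se1 * (1.0 - decay) ** (k - 1)


def pool_positive_prob(p: float, k: int, se: float, sp: float) -> float:
    """P(pool of size k tests positive) given prevalence p, sensitivity se, specificity sp."""
    q = (1.0 - p) ** k  # probability that the pool contains no infected specimen
    return se * (1.0 - q) + (1.0 - sp) * q


def estimate_prevalence(n_pos: float, n_pools: int, k: int, se: float, sp: float) -> float:
    """Maximum likelihood estimate of p from n_pos positive pools out of n_pools.

    The pool-positive probability is strictly increasing in p when se + sp > 1,
    so the MLE inverts it at the observed positive fraction, truncated to [0, 1].
    """
    denom = se + sp - 1.0
    if denom <= 0:
        raise ValueError("the test must satisfy se + sp > 1")
    theta = n_pos / n_pools
    ratio = min(max((se - theta) / denom, 0.0), 1.0)
    return 1.0 - ratio ** (1.0 / k)


if __name__ == "__main__":
    k = 10
    se_k = dilution_sensitivity(k)  # sensitivity for pools of size 10 under the hypothetical model
    print(estimate_prevalence(n_pos=38, n_pools=200, k=k, se=se_k, sp=0.99))
```

Ignoring the dilution effect (using the single-specimen sensitivity for every pool size) would overstate sensitivity for large pools and bias this estimate downward, which is the kind of misspecification the weaker assumption above is meant to address.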
Abstract: The hedgehog (Hh) signaling pathway plays an essential role in embryonic development and in the homeostasis of diverse adult tissues, and its deregulation has been implicated in the tumorigenesis and metastasis of various malignancies, including breast cancer. Aberrant activation of the Hh pathway involves the following mechanisms: (I) Hh ligand-independent mechanism: loss-of-function mutations in the Hh receptor Patched 1 (PTCH1) or gain-of-function mutations in Smoothened (SMO) lead to constitutive activation of the pathway; (II) autocrine signaling: Hh ligand produced by tumor cells stimulates Hh signaling in the tumor cells themselves; (III) paracrine signaling: tumor cell-produced Hh ligand activates stromal and endothelial cells, which produce growth factors in the microenvironment that support tumor growth and survival; and (IV) reverse paracrine signaling: Hh ligand produced by stromal cells supports tumor growth and survival. Upon pathway activation, the Gli transcription factors, the effectors of Hh signaling, activate or inhibit transcription by binding to their responsive genes and interacting with the transcriptional complex. The Gli transcription factor family includes Gli1, Gli2, and Gli3 (1). Gli1 is a transcriptional activator whose expression is recognized as an indicator of Hh pathway activation; Gli2 can act as either an activator or a repressor; and Gli3 is a strong repressor of transcription. To date, a ligand-dependent autocrine model of Hh signaling activation has been described in breast cancer, and both autocrine and paracrine mechanisms have been described in colorectal, pancreatic, and prostate cancer (2,3). Notably, a ligand-independent mechanism (mutations in PTCH1 and SMO) has been well demonstrated in basal cell carcinoma and medulloblastoma (4,5).
Funding: Supported by the National Natural Science Foundation of China (No. 11971433), the First Class Discipline of Zhejiang-A (Zhejiang Gongshang University-Statistics), and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development.
Abstract: In this paper, we consider testing a hypothesis concerning the means of two independent semicontinuous distributions whose observations are zero-inflated, that is, characterized by a sizable number of zeros together with positive observations from a continuous distribution. The continuous parts of the two semicontinuous distributions are assumed to follow a density ratio model. A new two-part test is developed for this kind of data. The proposed test statistic is the sum of a test statistic for the equality of the proportions of zeros and a conditional test statistic for the continuous distributions. The test is proved to follow a χ² distribution with two degrees of freedom. Simulation studies show that the proposed test controls the type I error rate at the desired level and is competitive with, and most of the time more powerful than, two popular tests. A real data example from a dietary intervention study illustrates the usefulness of the proposed test.
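The additive structure of a two-part statistic can be sketched as follows. The density-ratio-model conditional test from this abstract is not reproduced here; as a plainly stated substitution, the continuous part uses a Wilcoxon rank-sum z-statistic on the positive values, and the zero part uses the pooled two-proportion z-statistic. Each squared component is approximately χ² with one degree of freedom under the null, so their sum is referred to χ² with two degrees of freedom, whose survival function is exactly exp(-x/2).

```python
import math
from typing import List, Sequence, Tuple


def proportion_z(zeros1: int, n1: int, zeros2: int, n2: int) -> float:
    """Pooled two-proportion z-statistic for equal proportions of zeros."""
    pooled = (zeros1 + zeros2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (zeros1 / n1 - zeros2 / n2) / se


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_z(x: Sequence[float], y: Sequence[float]) -> float:
    """Normal-approximation Wilcoxon rank-sum z-statistic (no tie correction)."""
    m, n = len(x), len(y)
    if m == 0 or n == 0:
        return 0.0
    w = sum(average_ranks(list(x) + list(y))[:m])
    mean = m * (m + n + 1) / 2
    var = m * n * (m + n + 1) / 12
    return (w - mean) / math.sqrt(var)


def two_part_test(sample1: Sequence[float], sample2: Sequence[float]) -> Tuple[float, float]:
    """Return (statistic, p-value) of a two-part test for zero-inflated data."""
    n1, n2 = len(sample1), len(sample2)
    zeros1 = sum(1 for v in sample1 if v == 0)
    zeros2 = sum(1 for v in sample2 if v == 0)
    pos1 = [v for v in sample1 if v > 0]
    pos2 = [v for v in sample2 if v > 0]
    stat = proportion_z(zeros1, n1, zeros2, n2) ** 2 + rank_sum_z(pos1, pos2) ** 2
    return stat, math.exp(-stat / 2)  # survival function of chi-square with 2 df
```

Swapping the rank-sum component for a density-ratio-model statistic would change only the second term; the zero-proportion term and the χ² reference with two degrees of freedom stay the same.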
Funding: Partially supported by the Beijing Natural Science Foundation under Grant No. Z180006.
Abstract: The distance-based regression model, a nonparametric multivariate method, has been widely used to detect associations between the variation in a distance or dissimilarity matrix of outcomes and predictor variables of interest in genetic association studies, genomic analyses, and many other research areas. To this end, a pseudo-F statistic that partitions the variation in the distance matrix is often constructed. To the best of our knowledge, the statistical properties of the pseudo-F statistic have not yet been well established in the literature. To fill this gap, the authors study the asymptotic null distribution of the pseudo-F statistic and show that it is asymptotically equivalent to a mixture of chi-squared random variables. Because the pseudo-F test has unsatisfactory power when the correlations among the response variables are large, the authors propose a square-root F-type test statistic that replaces the similarity matrix with its square root. The asymptotic null distribution of the new test statistic and the power of both tests are also investigated. Simulation studies validate the asymptotic distributions of the tests and demonstrate that the proposed test has more robust power than the pseudo-F test. Both test statistics are illustrated with a gene expression dataset for a prostate cancer pathway.
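To see the pseudo-F statistic concretely, the sketch below uses the common Gower-centering construction, G = J(-D²/2)J with J the centering matrix and D² the elementwise squared distances, and a hat matrix H from a design that includes an intercept. The `sqrt_similarity` step replaces G by its symmetric positive semidefinite square root; plugging that matrix into the same trace ratio is an assumption about the form of the square-root statistic, not necessarily the authors' exact definition. A permutation p-value is included as a reference point for the asymptotic results.

```python
import numpy as np


def gower_centered(D: np.ndarray) -> np.ndarray:
    """Gower-centered similarity matrix G = J (-D**2 / 2) J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ (-0.5 * D ** 2) @ J


def pseudo_f(G: np.ndarray, X: np.ndarray) -> float:
    """Pseudo-F for an n x p design X that includes an intercept column."""
    n, p = X.shape
    H = X @ np.linalg.pinv(X)  # projection onto the column space of X
    R = np.eye(n) - H          # residual projection
    explained = np.trace(H @ G @ H) / (p - 1)
    residual = np.trace(R @ G @ R) / (n - p)
    return float(explained / residual)


def sqrt_similarity(G: np.ndarray) -> np.ndarray:
    """Symmetric square root of G; small negative eigenvalues are clipped to zero."""
    w, V = np.linalg.eigh(G)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def permutation_pvalue(G, X, stat=pseudo_f, n_perm=999, seed=0):
    """Permute subjects (rows and columns of G together) to approximate the null."""
    rng = np.random.default_rng(seed)
    observed = stat(G, X)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(G.shape[0])
        if stat(G[np.ix_(idx, idx)], X) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 30
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = rng.normal(size=(n, 3)) + 0.5 * X[:, [1]]  # three responses sharing one predictor
    D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))  # Euclidean distances
    G = gower_centered(D)
    print(pseudo_f(G, X), permutation_pvalue(G, X))
    print(pseudo_f(sqrt_similarity(G), X))
```

With a single response and Euclidean distance, this pseudo-F reduces to the ordinary regression F statistic, which is a convenient sanity check on the construction.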
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11801102, 11861017), the Beijing Natural Science Foundation (Z180006), and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health.
Abstract: We investigated the false-negative, true-negative, false-positive, and true-positive predictive values of a general group testing procedure for a heterogeneous population. We show that, compared with individual testing, its false (true)-negative predictive value for a specimen is larger (smaller) and its false (true)-positive predictive value is smaller (larger); the former is undesirable. We then propose a nested group testing procedure and show that it retains these favorable characteristics while also improving the false-negative predictive value of a specimen, making it no larger than that from individual testing. These characteristics are studied from both theoretical and numerical points of view. The nested group testing procedure is better than individual testing in both false-positive and false-negative predictive values, while retaining efficiency, a basic characteristic of group testing procedures. Applications to Dorfman's, halving, and Sterrett procedures are discussed. Results from extensive simulation studies and an application to malaria infection in microscopy-negative Malawian women illustrate the findings.
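To make the comparison concrete in the simplest case, the sketch below computes the positive predictive value and the false-negative predictive value of Dorfman's two-stage procedure alongside individual testing. It assumes a homogeneous prevalence p (the abstract treats heterogeneous populations), a sensitivity and specificity that do not depend on pool size, and test errors that are conditionally independent given true status; under Dorfman's procedure a specimen is declared positive only if its pool and its individual retest are both positive. The nested procedure proposed in the abstract is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class PredictiveValues:
    ppv: float   # P(infected | declared positive)
    fnpv: float  # P(infected | declared negative), the false-negative predictive value


def individual_testing(p: float, se: float, sp: float) -> PredictiveValues:
    pos_inf = p * se              # infected and declared positive
    pos_non = (1 - p) * (1 - sp)  # uninfected and declared positive
    neg_inf = p * (1 - se)
    neg_non = (1 - p) * sp
    return PredictiveValues(pos_inf / (pos_inf + pos_non), neg_inf / (neg_inf + neg_non))


def dorfman_testing(p: float, k: int, se: float, sp: float) -> PredictiveValues:
    # Pool-positive probability given that the focal specimen is uninfected:
    # the other k - 1 specimens decide whether the pool truly contains an infection.
    clean_others = (1 - p) ** (k - 1)
    pool_pos_given_non = se * (1 - clean_others) + (1 - sp) * clean_others
    pos_inf = p * se * se                              # pool positive, then retest positive
    pos_non = (1 - p) * pool_pos_given_non * (1 - sp)  # false positive at both stages
    neg_inf = p - pos_inf
    neg_non = (1 - p) - pos_non
    return PredictiveValues(pos_inf / (pos_inf + pos_non), neg_inf / (neg_inf + neg_non))


if __name__ == "__main__":
    print(individual_testing(0.05, 0.95, 0.98))
    print(dorfman_testing(0.05, 5, 0.95, 0.98))
```

For these illustrative inputs (p = 0.05, pools of 5, Se = 0.95, Sp = 0.98), the Dorfman false-negative predictive value is roughly twice the individual one while its positive predictive value is higher, which matches the qualitative pattern described in the abstract.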
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11722113).
Abstract: The distance-based regression model has many applications in the analysis of multivariate response regression in various fields, such as ecology, genomics, genetics, human microbiomics, and neuroimaging. It yields a pseudo F test statistic that assesses the relation between the distance (dissimilarity) among subjects and the predictors of interest. Despite its popularity in recent decades, the statistical properties of the pseudo F test statistic have, to our knowledge, not been established. This study derives the asymptotic properties of the pseudo F test statistic using spectral decomposition under the matrix normal assumption, when the dissimilarity measure used is the Euclidean or Mahalanobis distance. With the Euclidean distance, the pseudo F test statistic has the same distribution as the quotient of two chi-squared-type mixtures. The numerator and denominator of the quotient are each approximated by a random variable of the form ξχ_d^2 + η, and an approximation error bound is given. With the Mahalanobis distance, the pseudo F test statistic follows an F distribution. In simulation studies, the approximated distribution closely matched the "exact" distribution obtained by the permutation procedure. The obtained distribution was further validated on H1N1 influenza data, aging human brain data, and embryonic imprint data.
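One standard way to approximate a chi-squared-type mixture Q = Σ λ_i χ²₁ (positive weights λ_i) by a variable of the form ξχ²_d + η is to match the first three cumulants; the abstract does not say which matching scheme the authors use, so treat this as an assumed, illustrative choice. Matching κ₁ = Σλ_i, κ₂ = 2Σλ_i², κ₃ = 8Σλ_i³ against ξd + η, 2ξ²d, 8ξ³d gives ξ = Σλ³/Σλ², d = (Σλ²)³/(Σλ³)², and η = Σλ − ξd.

```python
import numpy as np


def three_moment_match(weights):
    """Return (xi, d, eta) such that xi * chi2_d + eta matches the first three
    cumulants of sum_i w_i * chi2_1 (weights assumed positive)."""
    w = np.asarray(weights, dtype=float)
    c1, c2, c3 = w.sum(), (w ** 2).sum(), (w ** 3).sum()
    xi = c3 / c2
    d = c2 ** 3 / c3 ** 2
    eta = c1 - xi * d
    return xi, d, eta


if __name__ == "__main__":
    weights = [3.0, 1.5, 0.5, 0.2]
    xi, d, eta = three_moment_match(weights)
    rng = np.random.default_rng(0)
    mixture = rng.chisquare(1, size=(200_000, len(weights))) @ np.asarray(weights)
    approx = xi * rng.chisquare(d, size=200_000) + eta
    # Compare upper quantiles of the simulated mixture and of the approximation.
    print(np.quantile(mixture, [0.9, 0.95, 0.99]))
    print(np.quantile(approx, [0.9, 0.95, 0.99]))
```

When all weights are equal the match is exact (ξ equals the common weight, d equals the number of terms, η = 0); unequal weights are where the shift η and the non-integer d earn their keep.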
Funding: The authors are grateful to the referees for their valuable suggestions, which considerably improved the paper. This work was supported by the National Natural Science Foundation of China (Grant Nos. 11171011, 11471036), the Natural Science Foundation of Beijing (Grant No. 1132007), and the Beijing Municipal Science and Technology Project (Grant No. km201410005011). Research of A. Liu was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and the National Institutes of Health (NIH).
Abstract: Linear mixed effects models with general skew normal-symmetric (SNS) errors are considered, and several properties of SNS distributions are obtained. Under the SNS setting, the ANOVA-type estimates of the variance components in the model are unbiased, the ANOVA-type F-tests are exact F-tests, and exact confidence intervals for the fixed effects are constructed. In addition, the power of the ANOVA-type F-tests for the variance components is free of the skewing function if the random effects are normally distributed. To illustrate the main results, simulation studies on the robustness of the models compare multivariate skew-normal, multivariate skew normal-Laplace, multivariate skew normal-uniform, multivariate skew normal-symmetric, and multivariate normally distributed errors. A real example illustrates the proposed method.
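A balanced one-way random-effects layout gives a small, concrete view of ANOVA-type variance component estimates and the F statistic. The sketch also illustrates, under stated assumptions, why skewing can leave such quantities unaffected: if the error vector is produced from a standard normal vector z by keeping z when W < w(z) and flipping it to -z otherwise (W standard normal, w odd), the result has the skew-symmetric density 2φ(z)Φ(w(z)), and any statistic that is an even function of the errors, such as the F ratio under no group effect, has the same distribution as in the normal case. The linear skewing function w(z) = α·Σz and the helper names are hypothetical choices; this is not the general model of the paper.

```python
import numpy as np


def anova_components(y: np.ndarray):
    """ANOVA-type estimates for a balanced one-way layout y of shape (a groups, n reps).

    Returns (sigma2_between, sigma2_error, F) with F = MSA / MSE.
    """
    a, n = y.shape
    group_means = y.mean(axis=1)
    msa = n * np.sum((group_means - y.mean()) ** 2) / (a - 1)
    mse = np.sum((y - group_means[:, None]) ** 2) / (a * (n - 1))
    return (msa - mse) / n, mse, msa / mse


def skew_symmetric_errors(shape, alpha, rng):
    """Hypothetical generator: keep a standard normal vector z if W < alpha * sum(z),
    otherwise return -z, with W ~ N(0, 1). The density is 2 * phi(z) * Phi(alpha * sum(z))."""
    z = rng.standard_normal(shape)
    keep = rng.standard_normal() < alpha * z.sum()
    return z if keep else -z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = skew_symmetric_errors((5, 4), alpha=0.8, rng=rng)
    mu = 10.0
    # With no group effect, F is an even function of the error vector.
    print(anova_components(mu + e)[2], anova_components(mu - e)[2])
```

Because the generated error vector is always either z or -z, every even statistic computed from it coincides with the one computed from the normal vector z, which is the intuition behind exact F-tests in this setting.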
Funding: Supported by the Korea Research Environment Open Network (KREONET), the National Research Foundation of Korea Grant funded by the Korean government (Grant No. NRF-2019R1A2C1003401), and the National Cancer Center Grant (Grant No. NCC-1910040).
Abstract: Alternative splicing (AS) regulates biological processes governing phenotypes and diseases. Differential AS (DAS) gene test methods have been developed to investigate important exonic expression from high-throughput datasets. However, the DAS events extracted using statistical tests are insufficient to delineate the relevant biological processes. In this study, we developed a novel application, Alternative Splicing Encyclopedia: Functional Interaction (ASpediaFI), to systematically identify DAS events and co-regulated genes and pathways. ASpediaFI builds a heterogeneous interaction network of genes and their feature nodes (i.e., AS events and pathways), connected by co-expression or pathway gene set knowledge. Next, ASpediaFI explores the interaction network using the random walk with restart algorithm and interrogates proximity to a query gene set. Finally, ASpediaFI extracts significant AS events, genes, and pathways. To evaluate the performance of our method, we simulated RNA sequencing (RNA-seq) datasets under various conditions of sequencing depth and sample size and compared the performance with that of other methods. Additionally, we analyzed three public datasets of cancer patients or cell lines to evaluate how well ASpediaFI detects biologically relevant candidates. ASpediaFI performs strongly on both the simulated and public datasets. Our integrative approach reveals that DAS events, considered together with a global co-expression network and relevant pathways, determine the functional importance of spliced genes in the subnetwork. ASpediaFI is publicly available at https://bioconductor.org/packages/ASpediaFI.
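The random walk with restart step can be sketched with plain power iteration on a column-normalized adjacency matrix. This is the textbook formulation, not ASpediaFI's code: the restart probability, the uniform restart vector built from the query set, and the convergence tolerance are assumptions for illustration, and the heterogeneous gene, AS-event, and pathway network is reduced to a single adjacency matrix.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-12, max_iter=10_000):
    """Steady-state visiting probabilities of a walker that, at each step,
    jumps back to the seed set with probability `restart`."""
    A = np.asarray(adj, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column (mass reaching them is dropped)
    W = A / col_sums               # column-stochastic transition matrix
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # uniform restart distribution over the query set
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3, walk restarting at node 0.
    path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    scores = random_walk_with_restart(path, seeds=[0])
    print(np.argsort(-scores))  # nodes ranked by proximity to the query set
```

The restart probability controls locality: larger values keep the walker near the query set, smaller values let scores spread further through the network.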