Abstract: The genome-wide association study (GWAS) is a powerful experimental design for detecting disease-susceptibility genetic variants. The main goal of these studies is to provide a better understanding of the biology of disease, which in turn facilitates prevention or better treatment. The final step of such a study is statistical inference, in which associations between single-nucleotide polymorphisms (SNPs) and traits are examined, usually in a case-control setting. To identify disease-associated loci correctly, the statistical association analysis must be conducted carefully, together with the other necessary steps. This research provides an introductory guideline for conducting such statistical association tests using SNP genotype data.
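One common association test for case-control SNP data is the allelic chi-square test on a 2×2 table of allele counts. The following Python sketch computes it for a single biallelic SNP; the function name, the (n_AA, n_Aa, n_aa) input layout, and the example counts are illustrative assumptions, and the guideline itself may recommend other tests as well.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Allelic chi-square test for one biallelic SNP (illustrative sketch).

    Each argument is a tuple (n_AA, n_Aa, n_aa) of genotype counts.
    Returns (chi2_statistic, p_value) with 1 degree of freedom.
    """
    def allele_counts(g):
        n_AA, n_Aa, n_aa = g
        # Each individual carries two alleles.
        return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa

    a_case, b_case = allele_counts(case_genotypes)
    a_ctrl, b_ctrl = allele_counts(control_genotypes)
    table = [[a_case, b_case], [a_ctrl, b_ctrl]]

    total = a_case + b_case + a_ctrl + b_ctrl
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    # Pearson statistic: sum over cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts (not data from the paper).
chi2, p = allelic_chi_square((30, 50, 20), (20, 50, 30))
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

The allelic test treats an individual's two alleles as independent observations, which is reasonable under Hardy-Weinberg equilibrium; genotype-based alternatives such as the Cochran-Armitage trend test do not rely on that assumption.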
Funding: Supported by the National Natural Science Foundation of China (71401134, 71571144, 71171164) and the Program of International Cooperation and Exchanges in Science and Technology Funded by Shaanxi Province (2016KW-033).
Abstract: Under Type-II progressive hybrid censoring, this paper discusses statistical inference and optimal design for step-stress partially accelerated life tests of hybrid systems in the presence of masked data. The lifetimes of the components in the hybrid systems are assumed to follow independent and identical modified Weibull distributions. The maximum likelihood estimates (MLEs) of the unknown parameters, the acceleration factor, and the reliability indexes are derived using the Newton-Raphson algorithm. The asymptotic variance-covariance matrix and approximate confidence intervals are obtained from the normal approximation to the asymptotic distribution of the MLEs of the model parameters. Moreover, two bootstrap confidence intervals are constructed using the parametric bootstrap method. The optimal time for changing stress levels is determined under the D-optimality and A-optimality criteria. Finally, a Monte Carlo simulation study is carried out to illustrate the proposed procedures.
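The paper's estimator handles a modified Weibull lifetime, progressive hybrid censoring, and masked data. As a much simpler, hedged illustration of the Newton-Raphson step alone, the sketch below solves the profile likelihood equation for the shape of an ordinary two-parameter Weibull distribution from complete (uncensored, unmasked) data. The function name, starting value, tolerance, and simulated data are assumptions.

```python
import math
import random


def weibull_mle_newton(data, k0=1.0, tol=1e-10, max_iter=100):
    """Weibull MLE (shape k, scale lam) from complete data by Newton-Raphson.

    Simplifying assumption: no censoring, no masking, ordinary (not modified)
    Weibull. Solves the profile score equation
        g(k) = 1/k + mean(log x) - sum(x^k log x) / sum(x^k) = 0
    and then sets lam = mean(x^k) ** (1/k).
    """
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / len(data)
    k = k0
    for _ in range(max_iter):
        w = [x ** k for x in data]
        s0 = sum(w)
        s1 = sum(wi * li for wi, li in zip(w, logs))
        s2 = sum(wi * li * li for wi, li in zip(w, logs))
        g = 1.0 / k + mean_log - s1 / s0
        # g'(k) < 0 everywhere (Cauchy-Schwarz), so the root is unique.
        dg = -1.0 / k ** 2 - (s2 * s0 - s1 ** 2) / s0 ** 2
        step = g / dg
        new_k = k - step
        # Keep the shape positive: halve k instead of stepping below zero.
        k = new_k if new_k > 0 else k / 2
        if abs(step) < tol:
            break
    lam = (sum(x ** k for x in data) / len(data)) ** (1.0 / k)
    return k, lam


# Hypothetical simulated data with scale 1.5 and shape 2.0
# (random.Random.weibullvariate takes the scale first, then the shape).
rng = random.Random(1)
sample = [rng.weibullvariate(1.5, 2.0) for _ in range(5000)]
k_hat, lam_hat = weibull_mle_newton(sample)
print(f"shape = {k_hat:.3f}, scale = {lam_hat:.3f}")
```

Profiling out the scale reduces the problem to a one-dimensional root search, which keeps the Newton update to a scalar division; the paper's multi-parameter setting would instead need the full score vector and Hessian.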
Abstract: Comparing two population proportions using a confidence interval can be misleading in many cases, such as when the sample size is small and the test is based on the normal approximation. In such cases, the only option is to collect a large sample. Unfortunately, a large sample might not be obtainable, for example when studying people suffering from a rare disease. The main purpose of this paper is to derive a closed formula for the exact distribution of the difference between two independent sample proportions, to use it for related inferences such as confidence intervals regardless of the sample sizes, and to compare it with the existing Wald, Agresti-Caffo, and score intervals. We have derived a closed formula for the exact distribution of the difference between two independent sample proportions. This distribution requires no additional conditions and can be used for inferences such as a hypothesis test for two population proportions, regardless of the nature of the distribution and the sample sizes. We claim that the exact distribution yields the narrowest confidence interval compared with the Wald, Agresti-Caffo, and score intervals, so it is suitable for inference on the difference between two population proportions regardless of sample size.
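The paper's closed formula is not reproduced here. As a hedged numerical cross-check that needs no closed form, the exact distribution of D = X/n1 - Y/n2, with independent X ~ Bin(n1, p1) and Y ~ Bin(n2, p2), can be obtained by enumerating all (n1 + 1)(n2 + 1) outcome pairs. The sketch also computes the Wald and Agresti-Caffo intervals mentioned in the abstract; function names and example values are illustrative assumptions.

```python
import math
from collections import defaultdict
from fractions import Fraction

Z_95 = 1.959963984540054  # standard normal 97.5% quantile


def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def exact_difference_pmf(n1, p1, n2, p2):
    """Exact pmf of D = X/n1 - Y/n2 for independent X ~ Bin(n1, p1), Y ~ Bin(n2, p2).

    Obtained by enumerating all outcome pairs (not the paper's closed formula).
    Keys are exact Fractions so that pairs giving the same difference are merged.
    """
    pmf = defaultdict(float)
    for x in range(n1 + 1):
        px = binom_pmf(x, n1, p1)
        for y in range(n2 + 1):
            pmf[Fraction(x, n1) - Fraction(y, n2)] += px * binom_pmf(y, n2, p2)
    return dict(sorted(pmf.items()))


def wald_interval(x, n1, y, n2, z=Z_95):
    """Wald interval for p1 - p2 based on the normal approximation."""
    p1_hat, p2_hat = x / n1, y / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    d = p1_hat - p2_hat
    return d - z * se, d + z * se


def agresti_caffo_interval(x, n1, y, n2, z=Z_95):
    """Agresti-Caffo interval: add one success and one failure to each sample, then Wald."""
    return wald_interval(x + 1, n1 + 2, y + 1, n2 + 2, z)


# Hypothetical small samples.
pmf = exact_difference_pmf(5, 0.3, 4, 0.6)
mean = sum(float(d) * prob for d, prob in pmf.items())
print(f"support size = {len(pmf)}, total probability = {sum(pmf.values()):.12f}")
print(f"E[D] = {mean:.6f}")  # equals p1 - p2 up to rounding
print("Wald:", wald_interval(0, 5, 0, 4))  # degenerate zero-width interval
print("Agresti-Caffo:", agresti_caffo_interval(0, 5, 0, 4))
```

The zero-count example shows why small samples are troublesome for the normal approximation: the Wald interval collapses to a single point, while the Agresti-Caffo adjustment keeps a positive width.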
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61673027 and 62106047), the Beijing Social Science Foundation (Grant No. 21GLC042), and the Humanity and Social Science Youth Foundation of the Ministry of Education, China (Grant No. 20YJCZH228).
Abstract: This paper examines the effect of the observation time on source identification for a discrete-time susceptible-infected-recovered (SIR) diffusion process in a network, given a snapshot of a subset of nodes. We formulate source identification as a maximum likelihood (ML) estimation problem and develop a statistical inference method based on Monte Carlo simulation (MCS) to estimate the source location and the initial time of the diffusion. Experimental results on synthetic and real-world networks demonstrate a clear impact of both the observation time and the fraction of observer nodes on this problem.
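A hedged sketch of the Monte Carlo idea: for each candidate source, simulate the discrete-time SIR process up to the observation time and estimate the likelihood of the observed snapshot as the fraction of runs whose states at the observer nodes match it exactly. The infection probability `beta`, recovery probability `gamma`, the exact-match likelihood, and all function names are assumptions; the paper also estimates the initial time of diffusion, which this sketch treats as known.

```python
import random


def simulate_sir(adj, source, steps, beta, gamma, rng):
    """One run of a discrete-time SIR process on graph `adj` started at `source`.

    Assumed dynamics: each step, every infected node infects each susceptible
    neighbour with probability beta, then recovers with probability gamma.
    """
    state = {v: "S" for v in adj}
    state[source] = "I"
    for _ in range(steps):
        infected = [v for v in adj if state[v] == "I"]
        newly_infected = set()
        for u in infected:
            for w in adj[u]:
                if state[w] == "S" and rng.random() < beta:
                    newly_infected.add(w)
        for u in infected:
            if rng.random() < gamma:
                state[u] = "R"
        for w in newly_infected:
            state[w] = "I"
    return state


def ml_source(adj, snapshot, steps, beta, gamma, runs=1000, seed=0):
    """Candidate source with the highest Monte Carlo likelihood of `snapshot`.

    `snapshot` maps observer nodes to their observed state ("S", "I" or "R").
    """
    rng = random.Random(seed)
    scores = {}
    for candidate in adj:
        hits = 0
        for _ in range(runs):
            final = simulate_sir(adj, candidate, steps, beta, gamma, rng)
            if all(final[v] == s for v, s in snapshot.items()):
                hits += 1
        scores[candidate] = hits / runs
    return max(scores, key=scores.get), scores


# Hypothetical example: a path graph 0-1-2-3-4 with observers at nodes 1 and 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
best, scores = ml_source(path, {1: "I", 3: "I"}, steps=2, beta=0.6, gamma=0.2)
print(best, scores)
```

With many observers an exact match may become rare, in which case a softer matching rule might be needed to keep the estimated likelihoods informative.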
Abstract: This article gives an introduction to information geometry and surveys its applications in machine learning, optimization, and statistical inference. Information geometry is explained intuitively using divergence functions defined on a manifold of probability distributions and on other general manifolds. These divergences give a Riemannian structure together with a pair of dual flatness criteria. Many manifolds are dually flat. When a manifold is dually flat, a generalized Pythagorean theorem and a related projection theorem hold. They provide useful tools for various approximation and optimization problems. We apply them to alternating minimization problems, Ying-Yang machines, and the belief propagation algorithm in machine learning.
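As a hedged numerical illustration of the generalized Pythagorean theorem, take joint distributions on a finite grid and the (e-flat) submanifold of independent, product distributions. The m-projection of a joint p onto that submanifold is the product of its marginals, p_bar, and for any product distribution q the Kullback-Leibler divergence splits as D(p||q) = D(p||p_bar) + D(p_bar||q). The example distributions below are arbitrary choices, not taken from the article.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for dicts over the same finite support."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def marginals(joint):
    """Marginal distributions of a joint distribution given as {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), v in joint.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return px, py


def product(px, py):
    """Independent joint distribution built from two marginals."""
    return {(x, y): px[x] * py[y] for x in px for y in py}


# An arbitrary correlated joint distribution on {0, 1} x {0, 1, 2}.
p = {(0, 0): 0.25, (0, 1): 0.10, (0, 2): 0.05,
     (1, 0): 0.05, (1, 1): 0.20, (1, 2): 0.35}

# m-projection of p onto the independence submanifold: product of marginals.
p_bar = product(*marginals(p))

# Any other independent distribution q (arbitrary illustrative choice).
q = product({0: 0.6, 1: 0.4}, {0: 0.2, 1: 0.3, 2: 0.5})

lhs = kl(p, q)
rhs = kl(p, p_bar) + kl(p_bar, q)
print(f"D(p||q)                   = {lhs:.12f}")
print(f"D(p||p_bar) + D(p_bar||q) = {rhs:.12f}")
```

Here D(p||p_bar) is the mutual information of p, so the identity says that moving from p to any independent model costs that mutual information plus the divergence measured inside the independent family.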
Abstract: In this paper, the probability distribution of fatigue strength as a function of fatigue life is discussed. Previously developed methods for transforming between fatigue life and fatigue strength did not account for the fact that the scatter factor of fatigue life increases at low stress levels. A modified model of the fatigue strength distribution, based on an investigation of fatigue scatter, is presented. The results of statistical inference from a large-sample fatigue test indicate that the proposed model is not rejected at the 10% significance level.
Abstract: Let $\{T_n\}$ be a renewal process on $\mathbb{R}_+$ representing the successive arrival times of some natural events. We studied this process using a record-process approach under the assumption that the interarrival times $\Delta T_n = T_n - T_{n-1}$, $n = 1, 2, \ldots$, are i.i.d. (independent and identically distributed) exponential random variables. The goal is to test whether the first observed events are sporadic. To test the hypothesis "sporadic", we used a non-parametric test based on the probability distribution of the number of records $N_n$ among $\{X_k\}_{k=1}^{n}$, where $X_k = (\Delta T_k)^{-1}$. We showed that this distribution does not depend on the cumulative distribution function of the observations and that it can be computed exactly for each $n$. We illustrated this statistic on a simulated trajectory and compared it with descriptive smoothing methods. We also studied an application to storm data from France and the US.
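Under the null hypothesis of i.i.d. continuous observations, the number of upper records $N_n$ among $X_1, \ldots, X_n$ is distribution-free, with $P(N_n = k) = |s(n,k)|/n!$, where $|s(n,k)|$ is an unsigned Stirling number of the first kind. This is a standard result; the sketch below shows one way to compute that exact null distribution. Counting upper records, the one-sided upper-tail p-value, the function names, and the interarrival times are assumptions, not necessarily the paper's exact test.

```python
from fractions import Fraction
from math import factorial


def count_records(xs):
    """Number of upper records: X_1 is a record; X_k is one if it exceeds all earlier values."""
    count, best = 0, float("-inf")
    for x in xs:
        if x > best:
            count, best = count + 1, x
    return count


def record_count_pmf(n):
    """Exact null pmf of N_n: P(N_n = k) = c(n, k) / n!.

    c(n, k) are unsigned Stirling numbers of the first kind, built from the
    recurrence c(m, k) = c(m-1, k-1) + (m-1) * c(m-1, k), with c(0, 0) = 1.
    """
    c = [1]  # row m = 0
    for m in range(1, n + 1):
        row = [0] * (m + 1)
        for k in range(1, m + 1):
            row[k] = c[k - 1] + (m - 1) * (c[k] if k < m else 0)
        c = row
    return {k: Fraction(c[k], factorial(n)) for k in range(1, n + 1)}


def upper_tail_p_value(xs):
    """P(N_n >= observed) under the i.i.d. null (the one-sided choice is an assumption)."""
    n, observed = len(xs), count_records(xs)
    pmf = record_count_pmf(n)
    return sum(p for k, p in pmf.items() if k >= observed)


# Hypothetical interarrival times; X_k = 1 / Delta T_k as in the abstract.
delta_t = [5.0, 4.2, 6.1, 3.0, 2.5, 2.7, 1.1, 0.9]
xs = [1.0 / d for d in delta_t]
print(count_records(xs), float(upper_tail_p_value(xs)))
```

Using exact Fractions keeps the null probabilities free of rounding, which matters when tail probabilities are very small; ties among the observations are assumed not to occur.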
Abstract: Market beta is a measure of the volatility, or systematic risk, of a security or portfolio relative to the market as a whole. This paper considers distributed estimation of market beta for massive data and establishes the consistency and asymptotic normality of the estimator. Simulations further illustrate the finite-sample properties of this estimator.
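A hedged sketch of one way to distribute the computation: if beta is estimated by ordinary least squares of asset returns on market returns, each data block only needs to report five sums, and pooling them reproduces the full-sample OLS beta exactly. The paper's estimator, and hence its asymptotic theory, may be constructed differently (for example by averaging block-wise estimates); the function names, block layout, and simulated returns are assumptions.

```python
import random


def block_stats(market, asset):
    """Sufficient statistics that one machine reports for its block of data."""
    return (
        len(market),
        sum(market),
        sum(asset),
        sum(m * m for m in market),
        sum(m * a for m, a in zip(market, asset)),
    )


def combine_beta(stats):
    """OLS beta = Cov(r_m, r_i) / Var(r_m), computed from pooled block statistics."""
    n = sum(s[0] for s in stats)
    sx = sum(s[1] for s in stats)
    sy = sum(s[2] for s in stats)
    sxx = sum(s[3] for s in stats)
    sxy = sum(s[4] for s in stats)
    return (sxy - sx * sy / n) / (sxx - sx * sx / n)


# Hypothetical simulated returns with true beta 1.3, split into 8 blocks.
rng = random.Random(42)
market = [rng.gauss(0.0, 0.01) for _ in range(10_000)]
asset = [0.0002 + 1.3 * m + rng.gauss(0.0, 0.005) for m in market]
size = len(market) // 8
blocks = [block_stats(market[i:i + size], asset[i:i + size])
          for i in range(0, len(market), size)]
beta_distributed = combine_beta(blocks)
beta_full = combine_beta([block_stats(market, asset)])
print(f"distributed beta = {beta_distributed:.6f}, full-sample beta = {beta_full:.6f}")
```

Only five numbers per block need to be communicated. For series whose mean is large relative to their spread, centered (Welford-style) updates would be numerically safer than raw sums.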
Abstract: We study a new family of random variables whose distributions arise as those of the maximum or minimum of a random number $N$ of i.i.d. random variables $X_1, X_2, \ldots, X_N$, each distributed as a variable $X$ supported on $[0, 1]$. The general scheme is outlined first, and several special cases are studied in detail. Where appropriate, we find estimates of the parameter $\theta$ of the one-parameter family in question.
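A hedged illustration of the general scheme: conditioning on $N$ gives $P(\max_i X_i \le x) = E[F(x)^N] = G_N(F(x))$ and $P(\min_i X_i > x) = G_N(1 - F(x))$, where $F$ is the cdf of $X$ and $G_N$ is the probability generating function of $N$. Purely for illustration, the sketch assumes $N$ is geometric on $\{1, 2, \ldots\}$ with parameter $\theta$, so $G_N(s) = \theta s / (1 - (1-\theta) s)$, and $X$ is uniform on $[0, 1]$; the paper's special cases may differ. It checks the closed forms against simulation.

```python
import random


def pgf_geometric(s, theta):
    """PGF E[s^N] of N ~ Geometric(theta) on {1, 2, ...} (illustrative assumption)."""
    return theta * s / (1 - (1 - theta) * s)


def cdf_max(x, theta):
    """P(max(X_1..X_N) <= x) = G_N(F(x)), with X ~ Uniform[0, 1] so F(x) = x."""
    return pgf_geometric(x, theta)


def cdf_min(x, theta):
    """P(min(X_1..X_N) <= x) = 1 - G_N(1 - F(x))."""
    return 1 - pgf_geometric(1 - x, theta)


def sample_geometric(theta, rng):
    """Number of Bernoulli(theta) trials up to and including the first success."""
    n = 1
    while rng.random() >= theta:
        n += 1
    return n


def simulate(theta, runs, seed=0):
    rng = random.Random(seed)
    maxima, minima = [], []
    for _ in range(runs):
        xs = [rng.random() for _ in range(sample_geometric(theta, rng))]
        maxima.append(max(xs))
        minima.append(min(xs))
    return maxima, minima


theta, x = 0.3, 0.7
maxima, minima = simulate(theta, 100_000)
sim_max = sum(m <= x for m in maxima) / len(maxima)
sim_min = sum(m <= x for m in minima) / len(minima)
print(f"P(max <= {x}): formula {cdf_max(x, theta):.4f}, simulated {sim_max:.4f}")
print(f"P(min <= {x}): formula {cdf_min(x, theta):.4f}, simulated {sim_min:.4f}")
```

Swapping in another distribution for $N$ only requires replacing `pgf_geometric`, and another $X$ only requires replacing $F$; this is what makes the pgf form convenient for generating a whole family.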
Abstract: Background: Dengue, an infectious tropical disease, has recently emerged as one of the most important mosquito-borne viral diseases in the world. We perform a retrospective analysis of the 2011 dengue fever epidemic in Pakistan to assess the transmissibility of the disease. We obtain estimates of the basic reproduction number R0 from epidemic data by applying different methodologies to different epidemic models in order to evaluate the robustness of our estimates. Results: We first estimate model parameters by fitting a deterministic ODE vector-host model for the transmission dynamics of single-strain dengue to the epidemic data, using both a basic ordinary least squares (OLS) and a generalized least squares (GLS) scheme. We then perform the same analysis for a direct-transmission ODE model, which allows us to compare our results across models. In addition, we formulate a direct-transmission stochastic model for the transmission dynamics of dengue and estimate its parameters using Markov chain Monte Carlo (MCMC) methods. In every case considered, the estimate of R0 is initially greater than unity, leading to an epidemic outbreak. However, control measures implemented several weeks after the initial outbreak successfully reduce R0 below unity, resulting in disease elimination. Furthermore, our estimates of the pre-control value of R0 agree strongly both across methodologies and across models. However, our estimates of the post-control value of the basic reproduction number differ significantly between the two models. Conclusion: We have obtained robust estimates of the basic reproduction number R0 for the 2011 dengue fever epidemic before the implementation of public health control measures. Furthermore, we have shown that our estimates of the post-control value of R0 agree closely across the different methodologies. Nevertheless, the post-control estimates of R0 differ significantly between the two models.
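The paper fits ODE and stochastic models by OLS, GLS, and MCMC. As a much simpler, swapped-in illustration of how R0 can be read off epidemic data, the sketch below fits the early exponential growth rate r by least squares on log incidence and applies R0 = 1 + r/gamma, which holds for a simple SIR model with an exponentially distributed infectious period of mean 1/gamma. This is not the paper's vector-host model; the function names, the case counts, the fitting window, and the value of gamma are assumptions.

```python
import math


def growth_rate(times, incidence):
    """OLS slope of log(incidence) on time; assumes all counts are positive."""
    logs = [math.log(c) for c in incidence]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(logs) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def r0_sir(r, gamma):
    """R0 = 1 + r / gamma for a simple SIR model (exponential infectious period, mean 1/gamma)."""
    return 1 + r / gamma


# Hypothetical weekly case counts from the early, uncontrolled phase.
weeks = list(range(8))
cases = [12, 17, 25, 34, 50, 71, 101, 145]
r = growth_rate(weeks, cases)  # per week
gamma = 1.0  # assumption: mean infectious period of one week
print(f"r = {r:.3f} per week, R0 = {r0_sir(r, gamma):.2f}")
```

Restricting the fit to pre-control weeks mirrors the abstract's distinction between pre-control and post-control R0; for a vector-host model the relation between r and R0 would take a different form.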
Funding: This paper is partially supported by the Basic Scientific Research Business Expenses of Universities in Xinjiang, China [Grant Number XQZX20230057] and the National Natural Science Foundation of China [Grant Number 11671142].
Abstract: In this paper, the autoregressive moving average model for matrix time series (MARMA) is investigated. Its properties, including necessary and sufficient conditions for stationarity and invertibility, model parameter estimation, model testing, and model forecasting, are studied using conditional least squares estimation, conditional maximum likelihood estimation, the projection theorem in Hilbert space, and the decomposition technique of time series.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11571337), the Ministry of Education of Singapore (Grant No. # ARC 14/11), and the National University of Singapore (Grant No. R-155-151-112).
Abstract: New statistics are proposed to estimate and test for a structural change when the data dimension is comparable to or larger than the sample size. The consistency of the new statistic in estimating the change-point location is established under the alternative hypothesis. The asymptotic distribution of the new statistic for testing the existence of a change point is obtained under the null hypothesis. Simulation results show that the numerical performance of the method is satisfactory. The method is illustrated by an analysis of the US house price index.
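The paper's statistics are designed for dimensions comparable to or larger than the sample size, and their exact form is not given in the abstract. As a hedged, classical baseline only, the sketch below estimates a single change point in the mean of a multivariate sequence by maximizing tau * (n - tau) / n * ||mean before - mean after||^2 over candidate splits (a CUSUM-type criterion). Function names and the synthetic data are assumptions.

```python
import random


def cusum_change_point(data):
    """Split maximizing tau * (n - tau) / n * ||mean_before - mean_after||^2.

    `data` is a list of n observations, each a list of p floats; tau is the
    number of observations before the change (1 <= tau <= n - 1).
    Returns (tau_hat, scores).
    """
    n, p = len(data), len(data[0])
    total = [sum(row[j] for row in data) for j in range(p)]
    prefix = [0.0] * p
    scores = {}
    for tau in range(1, n):
        # Running sums of the first tau observations.
        for j in range(p):
            prefix[j] += data[tau - 1][j]
        diff_sq = sum(
            (prefix[j] / tau - (total[j] - prefix[j]) / (n - tau)) ** 2
            for j in range(p)
        )
        scores[tau] = tau * (n - tau) / n * diff_sq
    return max(scores, key=scores.get), scores


# Hypothetical synthetic example: n = 60 observations in p = 100 dimensions,
# with a mean shift of 1.0 in every coordinate from observation 40 onward.
rng = random.Random(7)
n, p, change = 60, 100, 40
data = [[rng.gauss(1.0 if t >= change else 0.0, 1.0) for _ in range(p)]
        for t in range(n)]
tau_hat, _ = cusum_change_point(data)
print(f"estimated change point: {tau_hat}")
```

With i.i.d. noise of covariance Sigma, the noise contribution to each weighted score has expectation tr(Sigma) for every tau, so it does not by itself favour any split; the high-dimensional statistics in the paper presumably refine this kind of baseline, but that is not shown here.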