Medical research data are often skewed and heteroscedastic. It has therefore become common practice to log-transform data in regression analysis in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated with a large simulation study. Log-normal observations were generated according to the simulation models, and parameters were estimated using the new ML method, ordinary least-squares regression (LS) and weighted least-squares regression (WLS). All three methods produced unbiased estimates of parameters and expected response, and ML and WLS yielded smaller standard errors than LS. The approximate normality of the Wald statistic, used for tests of the ML estimates, produced the correct type I error risk in most situations. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β1.
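As an illustration of the kind of model described above, the sketch below fits a linear model for the expected value of log-normal, heteroscedastic observations by maximizing the likelihood directly with a general-purpose optimizer. The parameterization (E[Y | x] = β0 + β1·x, with log Y normal with constant log-scale variance) is an assumption chosen for illustration, not necessarily the authors' exact model:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated log-normal data with E[Y | x] = b0 + b1*x (assumed parameterization):
# if log Y ~ N(m, s^2) then E[Y] = exp(m + s^2/2), so set m = log(b0 + b1*x) - s^2/2.
b0_true, b1_true, s_true = 2.0, 3.0, 0.4
x = rng.uniform(0.5, 2.0, size=500)
m = np.log(b0_true + b1_true * x) - s_true**2 / 2
y = rng.lognormal(mean=m, sigma=s_true)

def neg_loglik(theta):
    b0, b1, log_s = theta
    s2 = np.exp(2 * log_s)                 # variance on the log scale
    mean_y = b0 + b1 * x
    if np.any(mean_y <= 0):                # outside the parameter space
        return np.inf
    mu = np.log(mean_y) - s2 / 2
    z = np.log(y) - mu
    # negative log-density of the log-normal distribution, summed over observations
    return np.sum(0.5 * z**2 / s2 + 0.5 * np.log(2 * np.pi * s2) + np.log(y))

res = minimize(neg_loglik, x0=[1.0, 1.0, np.log(0.5)], method="Nelder-Mead")
b0_hat, b1_hat, _ = res.x
```

With 500 observations the ML estimates land close to the generating values, mirroring the unbiasedness reported in the simulation study.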
One of the most powerful algorithms for obtaining maximum likelihood estimates in many incomplete-data problems is the EM algorithm. However, when the parameters satisfy a set of nonlinear restrictions, it is difficult to apply the EM algorithm directly. In this paper, we propose an asymptotic maximum likelihood estimation procedure under a set of nonlinear inequality restrictions on the parameters, in which the EM algorithm can be used. Essentially, this kind of estimation problem is a stochastic optimization problem in the M-step. We make use of methods from stochastic optimization to overcome the difficulty caused by the nonlinearity of the given constraints.
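The idea of handling a constrained M-step with a numerical optimizer can be sketched on a toy problem: EM for right-censored exponential data, with a nonlinear inequality constraint on the rate imposed in the M-step. This is only an illustration of a constrained M-step, not the paper's asymptotic procedure (here the constraint is chosen so it does not bind at the optimum):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Right-censored exponential sample: observe min(X, cutoff) and a censoring flag.
lam_true, cutoff = 2.0, 0.8
x = rng.exponential(1 / lam_true, size=1000)
obs = np.minimum(x, cutoff)
cens = x > cutoff
n = len(x)

lam = 1.0                                   # initial value
for _ in range(100):
    # E-step: memorylessness gives E[X | X > c] = c + 1/lam under the current lam
    filled = np.where(cens, cutoff + 1 / lam, obs)
    total = filled.sum()
    # M-step: maximize n*log(lam) - lam*total subject to lam^2 <= 9,
    # a (nonlinear) inequality constraint handled by SLSQP
    res = minimize(lambda t: -(n * np.log(t[0]) - t[0] * total),
                   x0=[lam], method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda t: 9.0 - t[0] ** 2}],
                   bounds=[(1e-6, None)])
    lam = res.x[0]
```

The EM fixed point here coincides with the usual censored-data MLE (events divided by total observed exposure), so the iterates converge toward the true rate of 2.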
There exist many iterative methods for computing the maximum likelihood estimator, but most of them suffer from one or several drawbacks, such as the need to invert a Hessian matrix and the need to find good initial approximations of the parameters, which are unknown in practice. In this paper, we present an estimation method without matrix inversion, based on a linear approximation of the likelihood equations in a neighborhood of the constrained maximum likelihood estimator. We obtain closed-form approximations of the solutions and their standard errors. We then propose an iterative algorithm that cycles through the components of the parameter vector and updates one component at a time. The initial solution, which is necessary to start the iterative procedure, is chosen automatically. The proposed algorithm is compared with some of the best iterative optimization algorithms available in R and MATLAB through a simulation study, and is applied to the statistical analysis of a road safety measure.
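A minimal sketch of the general idea of cycling through the parameter vector one component at a time, avoiding any matrix inversion: each coordinate of a logistic-regression likelihood gets a scalar Newton step, so only scalar gradients and curvatures are computed (an illustration of coordinate-wise updating, not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated logistic-regression data with an intercept and two slopes.
n, p = 2000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, -1.0, 2.0])
y = rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))

beta = np.zeros(p)                               # automated initial solution
for _ in range(50):                              # outer sweeps
    for j in range(p):                           # cycle through components
        mu = 1 / (1 + np.exp(-X @ beta))         # current fitted probabilities
        g = X[:, j] @ (y - mu)                   # scalar gradient for coordinate j
        h = np.sum(X[:, j] ** 2 * mu * (1 - mu)) # scalar curvature for coordinate j
        beta[j] += g / h                         # one-dimensional Newton step
```

No Hessian matrix is ever formed or inverted; each update divides a scalar gradient by a scalar curvature, and repeated sweeps drive the iterates to the MLE.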
Compositional data, such as relative information, are a crucial aspect of machine learning and related fields. They are typically recorded as closed data, i.e., components that sum to a constant such as 100%. The linear regression model is the most widely used statistical technique for identifying hidden relationships between underlying random variables of interest, and when estimating its parameters, which are useful for prediction and for the partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, data quality is a significant challenge in machine learning, and many datasets contain missing observations, whose recovery can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study examined how well the EM algorithm performs on a simulated compositional dataset with missing observations, using both ordinary least-squares and robust least-squares regression.
The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
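The Aitchison distance used for such comparisons is the Euclidean distance between the centred log-ratio (clr) transforms of two compositions; a minimal implementation:

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform: log of each part over the geometric mean."""
    x = np.asarray(x, dtype=float)
    g = np.exp(np.mean(np.log(x)))   # geometric mean of the parts
    return np.log(x / g)

def aitchison(a, b):
    """Aitchison distance: Euclidean distance in clr coordinates."""
    return np.linalg.norm(clr(a) - clr(b))
```

Because the clr transform is invariant to the closure constant, compositions that differ only by scale have distance zero, which is why this metric is preferred over the plain Euclidean distance for compositional data.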
Adaptation techniques can adjust acoustic-model parameters with relatively little data to achieve better speech recognition performance, and they are mostly used to adapt to accented speech. Here, Maximum Likelihood Linear Regression (MLLR) and Maximum A Posteriori (MAP) adaptation techniques were applied in a far-field noisy, reverberant environment to analyze their recognition performance in this setting. Experimental results show that, under simulated conditions with a wall reflection coefficient of 0.6, MAP had the best adaptation performance across the various noise environments; at Signal-to-Noise Ratios (SNR) of 5 dB, 10 dB and 15 dB, MAP reduced the Word Error Rate (WER) of far-field continuous speech by 1.51%, 12.82% and 2.95% on average, respectively. Under real conditions, MAP reduced the WER by up to 37.13%. The good asymptotic behavior of MAP was further verified: with 1,000 adaptation sentences, the MAP-adapted acoustic model reduced the recognition WER of far-field noisy, reverberant continuous speech by an average of 12.5% compared with the unadapted model.
Logistic regression is usually used to model probabilities of categorical responses as functions of covariates. However, the link connecting the probabilities to the covariates is non-linear. We show in this paper that when the cross-classification of all the covariates and the dependent variable has no empty cells, the probabilities of responses can be expressed as linear functions of the covariates. We demonstrate this for both dichotomous and polytomous dependent variables.
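The claim can be checked numerically in the simplest case of a single binary covariate: with no empty cells, the saturated logistic model's fitted probabilities equal the cell proportions, which are exactly linear in the covariate. This sketch covers only that special case; the paper's construction handles arbitrary cross-classifications:

```python
import numpy as np

rng = np.random.default_rng(3)

# One binary covariate x; true P(y=1 | x=0) = 0.3 and P(y=1 | x=1) = 0.7.
x = rng.integers(0, 2, size=1000)
y = rng.random(1000) < np.where(x == 1, 0.7, 0.3)

# Saturated-model fitted probabilities are the observed cell proportions.
p0 = y[x == 0].mean()
p1 = y[x == 1].mean()

def prob_linear(xv):
    # linear-in-the-covariate form: P(y=1 | x) = p0 + (p1 - p0) * x
    return p0 + (p1 - p0) * xv

# The logistic form with the saturated coefficients yields the same values.
b0 = np.log(p0 / (1 - p0))
b1 = np.log(p1 / (1 - p1)) - b0

def prob_logistic(xv):
    return 1 / (1 + np.exp(-(b0 + b1 * xv)))
```

Both parameterizations reproduce the same cell probabilities, which is the sense in which the non-linear link can be replaced by a linear expression when no cell is empty.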
Funding: Supported by the Teaching Reform Project of Zhengzhou University of Science and Technology (KFCZ201909), the National Foundation for Cultivating Scientific Research Projects of Zhengzhou Institute of Technology (GJJKTPY2018K4), the Henan Big Data Double Base of Zhengzhou Institute of Technology (20174101546503022265), and the Key Scientific Research Foundation of the Education Bureau of Henan Province (20B110020).