The purpose of this study was to determine a suitable model for investigating the effects of climate factors on the area burned by forest fire in the Tahe forest region, Daxing'an Mountains, in northeast China. The response variables were the area burned by lightning-caused fire, the area burned by human-caused fire, and the total burned area. The predictor variables were nine climate variables collected from the local weather station. Three regression models were used: multiple linear regression, a log-linear model (log-transformation of both response and predictor variables), and a gamma generalized linear model. The goodness-of-fit of the models was compared using model fitting statistics such as R², AIC, and RMSE. The results revealed that the gamma generalized linear model was generally superior to both the multiple linear regression model and the log-linear model for fitting the fire data. Further, the best models were selected on the criterion that the climate variables were statistically significant at α = 0.05. The best gamma models indicated that maximum wind speed, precipitation, and the number of days with rainfall greater than 0.1 mm had significant impacts on the area burned by lightning-caused fire, while mean temperature and minimum relative humidity were the main drivers of the area burned by human-caused fire. Overall, the total area burned by forest fire was significantly influenced by the number of days with rainfall greater than 0.1 mm and by minimum relative humidity, indicating that the moisture condition of forest stands determines the area burned by forest fire.
In this paper we consider the empirical Bayes (EB) estimation problem for an estimable function of the regression coefficient in the multiple linear regression model Y = Xβ + e, where e, given β, has a multivariate standard normal distribution. We obtain the EB estimators by using kernel estimation of the multivariate density function and its first-order partial derivatives. It is shown that the convergence rates of the EB estimators are under the condition where an integer k > 1 . is an arbitrarily small number and m is the dimension of the vector Y.
This paper converts fuzzy numbers into crisp numbers using the centroid method, so that the fuzzy linear regression model is transformed into a traditional linear regression model that can be studied directly. The model's input and output are fuzzy numbers, and the regression coefficients are crisp numbers. The paper considers parameter estimation and impact analysis based on data deletion. A worked example and a comparison with other models suggest that the proposed model is easy to apply and performs better.
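The centroid step described above can be illustrated with a minimal sketch. The abstract does not say which membership shape is used, so the triangular form (left, peak, right) is an assumption here, and the function names are hypothetical. For a triangular fuzzy number the centroid reduces to the mean of the three defining points.

```python
def triangular_centroid(left, peak, right):
    """Centroid (crisp value) of a triangular fuzzy number.

    Assumption: the fuzzy number is triangular with support [left, right]
    and membership 1 at `peak`; the source does not specify the shape.
    """
    if not left <= peak <= right:
        raise ValueError("expected left <= peak <= right")
    return (left + peak + right) / 3.0


def defuzzify(observations):
    """Convert a list of triangular fuzzy numbers into crisp numbers."""
    return [triangular_centroid(*obs) for obs in observations]


if __name__ == "__main__":
    # Hypothetical fuzzy observations, for illustration only.
    fuzzy_inputs = [(1.0, 2.0, 6.0), (4.0, 5.0, 6.0)]
    print(defuzzify(fuzzy_inputs))
```

Once every fuzzy input and output has been replaced by its centroid, an ordinary linear regression can be fitted to the crisp values.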
Data on coal production and consumption in Jilin Province over the last ten years were collected, and the Grey System GM(1,1) model and a unary linear regression model were applied to predict the coal consumption of Jilin Province in 2014 and 2015. The predicted coal consumption was 114.84 × 10⁶ t for 2014 and 117.98 × 10⁶ t for 2015. Analysis of the error data indicated that the prediction accuracy of the Grey System GM(1,1) model for coal consumption in Jilin Province was 0.21% higher than that of the unary linear regression model.
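For readers unfamiliar with GM(1,1), the following is a minimal sketch of the textbook form of the model, not the authors' implementation. It accumulates the series, regresses each original value on the mean of adjacent accumulated values to obtain the development coefficient a and grey input b, and forecasts from the resulting exponential response. The input series is made up for illustration and is not the Jilin data.

```python
import math


def fit_gm11(x0):
    """Estimate the GM(1,1) parameters (a, b) from a positive series x0."""
    # Accumulated generating operation (1-AGO): running totals of x0.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of adjacent accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    # Least squares for the grey equation x0(k) = -a * z(k) + b.
    m = len(z)
    z_mean, y_mean = sum(z) / m, sum(y) / m
    sxx = sum((zi - z_mean) ** 2 for zi in z)
    sxy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    slope = sxy / sxx
    return -slope, y_mean - slope * z_mean


def predict_gm11(x0, a, b, k):
    """Predicted value of the original series at 1-based index k (k >= 2)."""
    def x1_hat(j):
        # Time response of the accumulated series (requires a != 0).
        return (x0[0] - b / a) * math.exp(-a * (j - 1)) + b / a
    return x1_hat(k) - x1_hat(k - 1)


if __name__ == "__main__":
    series = [100 * 1.1 ** i for i in range(6)]  # hypothetical data
    a, b = fit_gm11(series)
    print(a, b, predict_gm11(series, a, b, 7))
```

On a smoothly growing series such as this one, the forecast for the next period lies close to the continued exponential trend, which is why GM(1,1) is often tried on short annual series.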
A linear regression model in conjunction with cluster analysis was applied to the groundwater quality parameters for the Vaniyambadi industrial area, Tamil Nadu, India. These physico-chemical parameters were collected from 25 wells by intensive groundwater sampling conducted during January 2010. All the major ions, pH, and electrical conductivity were analyzed. The abundances of cations were in the order of Na < Ca < Mg < K and those of anions were in the order of Cl < HCO₃ < SO₄ < CO₃, respectively. This was in agreement with the water types, Na-Cl and Na-Ca-HCO₃, determined by the Piper plot. High concentrations of the ions Na, Cl, and SO₄ were recorded near the tanneries that operate within the study area, whereas elevated concentrations of HCO₃ and F were observed away from the tanneries. This hydrochemical behaviour suggests that the chemistry of the water is predominantly influenced by tannery effluents and the weathering of silicate minerals. The linear regression model yielded 11 regression equations for the 5 most correlated parameters. A dendrogram from the cluster analysis showed 2 major clusters representing the influence of tanneries and geological formations in the study area, which confirmed the results of the major ion chemistry.
Taking the nonlinear nature of the runoff system into account, and combining the auto-regression and multi-regression methods, a Nonlinear Mixed Regression Model (NMR) was established to analyze the impact of temperature and precipitation changes on the annual river runoff process. The model was calibrated and verified using a BP neural network with observed meteorological and runoff data for 1956–2000 from the Daiying Hydrological Station on the Chaohe River, Hebei Province. Compared with the auto-regression model, the linear multi-regression model, and the linear mixed regression model, NMR improved forecasting precision remarkably. The simulation of climate change scenarios was therefore carried out with NMR. The results show that the nonlinear mixed regression model can simulate annual river runoff well.
Cost-effective sampling design is a major concern in some experiments, especially when measuring the characteristic of interest is costly, painful, or time-consuming. Ranked set sampling (RSS) was first proposed by McIntyre [1952. A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research 3, 385-390] as an effective way to estimate the pasture mean. In the current paper, a modification of ranked set sampling called moving extremes ranked set sampling (MERSS) is considered for the best linear unbiased estimators (BLUEs) of the simple linear regression model. The BLUEs for this model under MERSS are derived. The BLUEs under MERSS are shown to be markedly more efficient for normal data than the BLUEs under simple random sampling.
In this paper, we investigate empirical likelihood diagnostics for modal linear regression models. The empirical likelihood ratio function for the regression coefficient, based on the modal regression estimation method, is introduced. First, the estimating equation based on the empirical likelihood method is established. Then, some diagnostic statistics are proposed. Finally, we examine the finite-sample performance of the proposed method through a simulation study.
In this paper, the relationship between earthquake frequency and magnitude is modeled with generalized linear models for a set of earthquake data from Nepal. Goodness-of-fit measures were applied to the generalized linear models and, based on the model selection criteria (the Akaike information criterion and the Bayesian information criterion), the generalized Poisson regression model was selected as a suitable model for the study. The objective of this study is to determine the parameters (a and b values) and to estimate the probability of earthquake occurrence and its return period using a Poisson regression model, and to compare the results with those of the Gutenberg-Richter model. The study suggests that the probabilities of earthquake occurrence and the return periods estimated by the two models are relatively close to each other. The return periods from the generalized Poisson regression model are somewhat shorter than those from the Gutenberg-Richter model.
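The quantities named in this abstract can be made concrete with a short sketch of the standard Gutenberg-Richter relation, log10 N(M) = a - b·M, combined with a Poisson occurrence assumption. The a and b values below are hypothetical and are not the Nepal estimates, and N is assumed to be an annual rate of events with magnitude at least M.

```python
import math


def annual_rate(magnitude, a, b):
    """Expected number of events per year with magnitude >= `magnitude`,
    from the Gutenberg-Richter relation log10 N = a - b * M."""
    return 10 ** (a - b * magnitude)


def return_period(magnitude, a, b):
    """Mean recurrence interval in years (reciprocal of the annual rate)."""
    return 1.0 / annual_rate(magnitude, a, b)


def prob_at_least_one(magnitude, a, b, years):
    """Probability of at least one such event within `years`,
    assuming occurrences follow a Poisson process."""
    return 1.0 - math.exp(-annual_rate(magnitude, a, b) * years)


if __name__ == "__main__":
    a_value, b_value = 4.0, 1.0  # hypothetical parameters
    print(return_period(6.0, a_value, b_value))
    print(prob_at_least_one(6.0, a_value, b_value, 50))
```

A Poisson regression of counts on magnitude with a log link produces parameters that play the same role as a and b, which is what makes the two approaches directly comparable.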
The development of many estimators of the parameters of the linear regression model is traceable to the non-validity of the assumptions under which the model is formulated, especially when it is applied to real-life situations. This notwithstanding, regression analysis may aim at prediction. Consequently, this paper examines the performance of the Ordinary Least Squares (OLS) estimator, the Cochrane-Orcutt (COR) estimator, the Maximum Likelihood (ML) estimator, and estimators based on Principal Component (PC) analysis in prediction with the linear regression model under joint violation of the assumptions of non-stochastic regressors, independent regressors, and independent error terms. With correlated stochastic normal variables as regressors and autocorrelated error terms, Monte Carlo experiments were conducted, and the study identifies the best estimator for prediction purposes using the goodness-of-fit statistics of the estimators. The results show that the performance of COR at each level of correlation (multicollinearity), and that of ML, especially when the sample size is large, has a convex-like pattern over the levels of autocorrelation, while that of OLS and PC is concave-like. Also, as the level of multicollinearity increases, the estimators, except the PC estimators when multicollinearity is negative, rapidly perform better over the levels of autocorrelation. The COR and ML estimators are generally best for prediction in the presence of multicollinearity and autocorrelated error terms. However, at low levels of autocorrelation, the OLS estimator is either best or competes consistently with the best estimator, while the PC estimator is either best or competes with the best when the multicollinearity level is high (λ>0.8 or λ-0.49).
Piecewise linear regression models are very flexible models for data. When a piecewise linear regression model is fitted to data, its parameters are generally unknown. This paper studies the problem of parameter estimation for piecewise linear regression models. The parameters are estimated with a Bayesian method, but the Bayes estimator cannot be found analytically. To overcome this problem, the reversible jump MCMC (Markov Chain Monte Carlo) algorithm is proposed. The reversible jump MCMC algorithm generates a Markov chain that converges to the posterior distribution of the parameters of the piecewise linear regression model as its limiting distribution. The resulting Markov chain is used to calculate the Bayes estimator of the parameters of the piecewise linear regression model.
This paper selects seven indicators of fiscal revenue and housing sales prices in China over the last 19 years and uses SPSS and Excel to carry out descriptive statistics, independent-samples t-tests, correlation analysis, and regression analysis in order to study the correlation between fiscal revenue and housing sales prices in China. It establishes the relationship between the two: when the average selling price of commercial housing increases by one unit, fiscal revenue increases by 27.855 points.
Funding: National Key Research and Development Program of China (No. 2022YFD2200503-02).
Abstract: The diameter distribution function (DDF) is a crucial tool for accurately predicting stand carbon storage (CS). The current key issue, however, is how to construct a high-precision DDF based on stand factors, site quality, and aridity index to predict stand CS in multi-species mixed forests with complex structures. This study used data from 70 survey plots of mixed broadleaf Populus davidiana and Betula platyphylla forests in the Mulan Rangeland State Forest, Hebei Province, China, to construct the DDF based on maximum likelihood estimation (MLE) and the finite mixture model (FMM). Ordinary least squares (OLS), linear seemingly unrelated regression (LSUR), and a back propagation neural network (BPNN) were used to investigate the influences of stand factors, site quality, and aridity index on the shape and scale parameters of the DDF and to predict the stand CS of mixed broadleaf forests. The results showed that the FMM accurately described the stand-level diameter distribution of the mixed P. davidiana and B. platyphylla forests, whereas the Weibull function constructed by MLE was more accurate in describing the species-level diameter distribution. The combined variable of quadratic mean diameter (Dq), stand basal area (BA), and site quality improved the accuracy of the shape parameter models of the FMM; the combined variable of Dq, BA, and the De Martonne aridity index improved the accuracy of the scale parameter models. Compared to OLS and LSUR, the BPNN had higher accuracy in the re-parameterization process of the FMM. OLS, LSUR, and BPNN overestimated the CS of P. davidiana but underestimated the CS of B. platyphylla in the large diameter classes (DBH ≥ 18 cm). BPNN accurately estimated stand- and species-level CS, but it was more suitable for estimating stand-level CS than species-level CS, thereby providing a scientific basis for the optimization of stand structure and the assessment of carbon sequestration capacity in mixed broadleaf forests.
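To make the idea of a finite mixture diameter distribution concrete, here is a minimal sketch assuming a mixture of two-parameter Weibull components. The abstract does not give the component form or the fitted values, so the weights, shapes, and scales below are hypothetical. The sketch evaluates the mixture density and the expected number of stems in a diameter class.

```python
import math


def weibull_pdf(d, shape, scale):
    """Two-parameter Weibull density at diameter d (cm)."""
    if d <= 0:
        return 0.0
    z = d / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def mixture_pdf(d, weights, shapes, scales):
    """Finite mixture density: weighted sum of Weibull components."""
    return sum(w * weibull_pdf(d, c, s)
               for w, c, s in zip(weights, shapes, scales))


def stems_in_class(lower, upper, n_trees, weights, shapes, scales, steps=200):
    """Expected stems with lower <= DBH < upper (midpoint-rule integration)."""
    h = (upper - lower) / steps
    area = h * sum(mixture_pdf(lower + (i + 0.5) * h, weights, shapes, scales)
                   for i in range(steps))
    return n_trees * area


if __name__ == "__main__":
    # Hypothetical two-component parameters (for example, one per species).
    weights, shapes, scales = (0.6, 0.4), (2.5, 3.0), (12.0, 22.0)
    print(stems_in_class(18, 20, 1000, weights, shapes, scales))
```

Multiplying the stems in each class by a per-tree carbon estimate for that class and summing over classes is one common way to move from a diameter distribution to a stand-level total.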
Abstract: Compositional data, which carry relative information, are important in machine learning and related fields. They are typically recorded as closed data, that is, data that sum to a constant such as 100%. The linear regression model is one of the most widely used statistical techniques for identifying relationships between variables of interest. However, data quality is a significant challenge in machine learning, especially when data are missing. When estimating linear regression parameters, which are useful for tasks such as prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering them can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
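The Aitchison distance used as a comparison criterion above can be sketched in a few lines. This follows the standard definition via the centred log-ratio (clr) transform and is not taken from the paper. All parts are assumed to be strictly positive, and the example compositions are made up.

```python
import math


def clr(composition):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


if __name__ == "__main__":
    observed = [0.2, 0.3, 0.5]  # hypothetical true composition
    imputed = [0.1, 0.6, 0.3]   # hypothetical imputed composition
    print(aitchison_distance(observed, imputed))
```

Because the clr transform removes the overall scale, the distance is unchanged if a composition is expressed in proportions, percentages, or raw amounts, which is the property that makes it suitable for closed data.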
Funding: Supported by the Natural Science Foundation of Anhui Education Committee.
Abstract: In this paper, based on the theory of parameter estimation, we give a selection method that we consider reasonable in the sense that the resulting parameter estimates have good properties. We also provide a method for calculating the selection statistic, together with an applied example.
Funding: Supported by the Natural Sciences and Engineering Research Council of Canada, the National Natural Science Foundation of China, and the Doctoral Fund of the Education Ministry of China.
Abstract: Recursive algorithms are very useful for computing M-estimators of regression coefficients and scatter parameters. In this article, it is shown that for a nondecreasing ul (t), under some mild conditions, the recursive M-estimators of regression coefficients and scatter parameters are strongly consistent, and the recursive M-estimator of the regression coefficients is also asymptotically normally distributed. Furthermore, optimal recursive M-estimators, asymptotic efficiencies of recursive M-estimators, and asymptotic relative efficiencies between recursive M-estimators of regression coefficients are studied.
Funding: Financial support from the Brazilian institution Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq).
Abstract: In this paper, we study some robustness aspects of linear regression models in the presence of outliers or discordant observations, using stable distributions for the response in place of the usual normality assumption. It is well known that, in general, there is no closed form for the probability density function of stable distributions. However, under a Bayesian approach, the use of a latent or auxiliary random variable simplifies the derivation of posterior distributions involving stable distributions. To show the usefulness of the computational aspects, the methodology is applied to two examples: one is a standard linear regression model with one explanatory variable, and the other is a simulated data set from a 2³ factorial experiment. Posterior summaries of interest are obtained using MCMC (Markov Chain Monte Carlo) methods and the OpenBugs software.
Funding: This work was supported by the 2021 Project of the "14th Five-Year Plan" of Shaanxi Education Science, "Research on the Application of Educational Data Mining in Applied Undergraduate Teaching-Taking the Course of 'Computer Application Technology' as an Example" (SGH21Y0403); the Teaching Reform and Research Projects for Practical Teaching in 2022, "Research on Practical Teaching of Applied Undergraduate Projects Based on 'Combination of Courses and Certificates'-Taking Computer Application Technology Courses as an Example" (SJJG02012); and the 11th batch of the Teaching Reform Research Project of Xi'an Jiaotong University City College, "Project-Driven Cultivation and Research on Information Literacy of Applied Undergraduate Students in the Information Times-Taking Computer Application Technology Course Teaching as an Example" (111001).
Abstract: Social networks are the mainstream medium of information dissemination today, and it is particularly important to predict accurately how information propagates through them. In this paper, we introduce a social network propagation model that integrates multiple linear regression with an infectious disease model. First, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Lastly, we use the node influence as the state transition of the infectious disease model to predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
Funding: The National Natural Science Foundation of China under contract No. 11174235; the Science and Technology Development Project of Shaanxi Province of China under contract No. 2010KJXX-02; the Program for New Century Excellent Talents in University of China under contract No. NCET-08-0455; the Science and Technology Innovation Foundation of Northwestern Polytechnical University of China; the Doctorate Foundation of Northwestern Polytechnical University of China under contract No. CX201226.
Abstract: The multiple linear regression (MLR) method was applied to quantify the effects of the net heat flux (NHF), the net freshwater flux (NFF) and the wind stress on the mixed layer depth (MLD) of the South China Sea (SCS) based on the simple ocean data assimilation (SODA) dataset. The spatio-temporal distributions of the MLD, the buoyancy flux (combining the NHF and the NFF) and the wind stress of the SCS were presented. Then, using an oceanic vertical mixing model, the MLD after a certain time under the same initial conditions but various sets of boundary conditions (the three factors) was simulated. Applying the MLR method to the results, regression equations that model the relationship between the simulated MLD and the three factors were calculated. The equations indicate that when the NHF was negative, it was the primary driver of mixed layer deepening; when the NHF was positive, the wind stress played a more important role than the NHF, while the NFF had the least effect. When the NHF was positive, the relative quantitative effects of the wind stress, the NHF, and the NFF were about 10, 6 and 2. These conclusions were applied to explain the spatio-temporal distributions of the MLD in the SCS and thus proved to be valid.
Abstract: This study explored and reviewed the logistic regression (LR) model, a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, with emphasis on medical research. Thirty-seven research articles published between 2000 and 2018 that employed logistic regression as the main statistical tool, as well as six textbooks on logistic regression, were reviewed. Logistic regression concepts such as odds, odds ratio, logit transformation, logistic curve, assumptions, selection of dependent and independent variables, model fitting, reporting and interpretation were presented. Upon perusing the literature, considerable deficiencies were found in both the use and reporting of LR. For many studies, the ratio of the number of outcome events to predictor variables (events per variable) was sufficiently small to call into question the accuracy of the regression model. Also, most studies did not report validation analysis, regression diagnostics or goodness-of-fit measures, which authenticate the robustness of the LR model. Here, we demonstrate a good example of the application of the LR model using data obtained on a cohort of pregnant women and the factors that influence their decision to opt for caesarean delivery or vaginal birth. It is recommended that researchers be more rigorous and pay greater attention to guidelines concerning the use and reporting of LR models.
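The quantities named in this abstract (odds, odds ratio, logit transformation, logistic curve, events per variable) can be illustrated with a minimal sketch. The function names and the numbers used below are hypothetical and do not come from the reviewed study; the events-per-variable helper assumes the common convention of counting the less frequent outcome.

```python
import math

def odds(p):
    """Odds corresponding to a probability p, with 0 < p < 1."""
    return p / (1.0 - p)

def logit(p):
    """Logit transformation: natural log of the odds."""
    return math.log(odds(p))

def inverse_logit(z):
    """Logistic curve: maps a linear predictor z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the odds in two groups."""
    return odds(p_exposed) / odds(p_unexposed)

def events_per_variable(n_events, n_nonevents, n_predictors):
    """Events per variable (EPV), assuming the less frequent outcome is counted."""
    return min(n_events, n_nonevents) / n_predictors

# Hypothetical numbers, for illustration only.
print(odds(0.8))                        # about 4
print(odds_ratio(0.8, 0.5))             # about 4
print(events_per_variable(40, 160, 5))  # 8.0
```

A low EPV from a calculation like the last one is the kind of signal the abstract refers to when it questions the accuracy of a fitted model.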
Funding: Funded by Asia-Pacific Forests Net (APFNET/2010/FPF/001); the National Natural Science Foundation of China (Grant No. 31400552); and Forestry industry research special funds for public welfare projects (201404402).
Abstract: The purpose of this study was to determine a suitable model for investigating the effects of climate factors on the area burned by forest fire in the Tahe forest region, Daxing'an Mountains, in northeast China. The response variables were the area burned by lightning-caused fire, the area burned by human-caused fire, and the total burned area. The predictor variables were nine climate variables collected from the local weather station. Three regression models were used: multiple linear regression, a log-linear model (log-transformation of both response and predictor variables), and a gamma generalized linear model. The goodness-of-fit of the models was compared based on model fitting statistics such as R^2, AIC, and RMSE. The results revealed that the gamma generalized linear model was generally superior to both the multiple linear regression model and the log-linear model for fitting the fire data. Further, the best models were selected based on the criterion that the climate variables were statistically significant at α = 0.05. The best gamma models indicated that maximum wind speed, precipitation, and the number of days with rainfall greater than 0.1 mm had significant impacts on the area burned by lightning-caused fire, while mean temperature and minimum relative humidity were the main drivers of the area burned by human-caused fire. Overall, the total area burned by forest fire was significantly influenced by the number of days with rainfall greater than 0.1 mm and by minimum relative humidity, indicating that the moisture condition of forest stands determines the area burned by forest fire.
Abstract: In this paper we consider the empirical Bayes (EB) estimation problem for an estimable function of the regression coefficient in a multiple linear regression model Y = Xβ + e, where e, given β, has a multivariate standard normal distribution. We obtain the EB estimators by using kernel estimation of the multivariate density function and its first-order partial derivatives. It is shown that the convergence rates of the EB estimators are under the condition where an integer k > 1 . is an arbitrary small number and m is the dimension of the vector Y.
Abstract: This paper transforms fuzzy numbers into crisp numbers using the centroid method, so that the fuzzy linear regression model is converted into a traditional linear regression model that can be studied with standard tools. The model's input and output are fuzzy numbers, and the regression coefficients are crisp numbers. This paper considers parameter estimation and influence analysis based on data deletion. An example and a comparison with other models indicate that the proposed model is easy to apply and performs better.
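The centroid step described above can be sketched as follows. The abstract does not state the shape of the fuzzy numbers, so this sketch assumes triangular fuzzy numbers (lower, mode, upper), whose centroid is the mean of the three values; the data, the function names, and the use of ordinary least squares on the crisp values are illustrative assumptions, not the authors' procedure.

```python
def centroid(tri):
    """Centroid of a triangular fuzzy number (lower, mode, upper). Assumed shape."""
    lower, mode, upper = tri
    return (lower + mode + upper) / 3.0

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x on crisp data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical fuzzy observations, for illustration only.
fuzzy_x = [(0.5, 1.0, 1.5), (1.5, 2.0, 2.5), (2.5, 3.0, 3.5), (3.0, 4.0, 5.0)]
fuzzy_y = [(2.0, 3.0, 4.0), (4.0, 5.0, 6.0), (6.0, 7.0, 8.0), (8.0, 9.0, 10.0)]

crisp_x = [centroid(t) for t in fuzzy_x]
crisp_y = [centroid(t) for t in fuzzy_y]
intercept, slope = fit_line(crisp_x, crisp_y)
print(intercept, slope)
```

Once the inputs and outputs are crisp, the coefficients are ordinary real numbers, which is what allows deletion-based diagnostics from classical regression to be reused.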
Funding: Supported by a project of the National Natural Science Foundation of China (No. 41272360).
Abstract: Data on coal production and consumption in Jilin Province for the last ten years were collected, and the Grey System GM(1,1) model and a unary linear regression model were applied to predict the coal consumption of Jilin Province in 2014 and 2015. The predicted coal consumption of Jilin Province was 114.84 × 10^6 t for 2014 and 117.98 × 10^6 t for 2015. Analysis of the error data indicated that the prediction accuracy of the Grey System GM(1,1) model for coal consumption in Jilin Province was 0.21% higher than that of the unary linear regression model.
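For readers unfamiliar with GM(1,1), the standard textbook form of the model can be sketched in a few lines. The series below is hypothetical and is not the Jilin Province data; the function names are illustrative, and the sketch assumes a strictly positive series with a nonzero development coefficient.

```python
import math

def gm11_fit(x0):
    """Estimate the GM(1,1) development coefficient a and grey input b."""
    # Accumulated generating operation (1-AGO): running sums of the series.
    x1 = []
    total = 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    # Least squares for the grey equation x0(k) = -a * z(k) + b.
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    slope = szy / szz
    return -slope, mean_y - slope * mean_z

def gm11_predict(x0, a, b, k):
    """Predicted value of the original series at 0-based index k (k >= 1)."""
    def x1_hat(i):
        # Time response of the accumulated series.
        return (x0[0] - b / a) * math.exp(-a * i) + b / a
    return x1_hat(k) - x1_hat(k - 1)

# Hypothetical series growing by 10% per step, for illustration only.
series = [100.0, 110.0, 121.0, 133.1, 146.41]
a, b = gm11_fit(series)
next_value = gm11_predict(series, a, b, len(series))
print(a, b, next_value)
```

The forecast for the next step lands close to, but not exactly on, the value a pure 10% growth rule would give, because GM(1,1) approximates the growth through the averaged background values.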
Abstract: A linear regression model in conjunction with cluster analysis was applied to the groundwater quality parameters for the Vaniyambadi industrial area, Tamil Nadu, India. These physico-chemical parameters were collected from 25 wells by intensive groundwater sampling conducted during January 2010. All the major ions, pH and electrical conductivity were analyzed. The abundances of cations were in the order of Na < Ca < Mg < K and those of anions were in the order of Cl < HCO3 < SO4 < CO3, respectively. This was in agreement with the water types, Na-Cl and Na-Ca-HCO3, determined by the Piper plot. High concentrations of the ions Na, Cl and SO4 were recorded near the tanneries that operate within the study area, whereas elevated concentrations of HCO3 and F were observed away from the tanneries. This peculiar hydrochemical behaviour suggests that the chemistry of the water is predominantly influenced by tannery effluents and weathering of silicate minerals. The linear regression model yielded 11 regression equations for the 5 most correlated parameters. A dendrogram from the cluster analysis showed 2 major clusters representing the influence of tanneries and geological formations in the study area, which confirmed the results of the major ion chemistry.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 50809004)
Abstract: Taking the nonlinear nature of the runoff system into account, and combining the auto-regression and multi-regression methods, a Nonlinear Mixed Regression (NMR) model was established to analyze the impact of temperature and precipitation changes on the annual river runoff process. The model was calibrated and verified using a BP neural network with observed meteorological and runoff data from Daiying Hydrological Station on the Chaohe River, Hebei Province, for 1956–2000. Compared with the auto-regression model, the linear multi-regression model and the linear mixed regression model, NMR improves forecasting precision remarkably. Therefore, the simulation of climate change scenarios was carried out with NMR. The results show that the nonlinear mixed regression model can simulate annual river runoff well.
Funding: Supported by the National Natural Science Foundation of China (11901236); the Scientific Research Fund of Hunan Provincial Science and Technology Department (2019JJ50479); the Scientific Research Fund of Hunan Provincial Education Department (18B322); the Winning Bid Project of Hunan Province for the 4th National Economic Census ([2020]1); the Young Core Teacher Foundation of Hunan Province ([2020]43); the Fundamental Research Fund of Xiangxi Autonomous Prefecture (2018SF5026).
Abstract: Cost-effective sampling design is a major concern in some experiments, especially when the measurement of the characteristic of interest is costly, painful or time-consuming. Ranked set sampling (RSS) was first proposed by McIntyre [1952. A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research 3, 385-390] as an effective way to estimate the pasture mean. In the current paper, a modification of ranked set sampling called moving extremes ranked set sampling (MERSS) is considered for the best linear unbiased estimators (BLUEs) for the simple linear regression model. The BLUEs for this model under MERSS are derived. The BLUEs under MERSS are shown to be markedly more efficient for normal data when compared with the BLUEs under simple random sampling.
Abstract: In this paper, we investigate empirical likelihood diagnostics for modal linear regression models. The empirical likelihood ratio function based on the modal regression estimation method for the regression coefficient is introduced. First, the estimating equation based on the empirical likelihood method is established. Then, some diagnostic statistics are proposed. Finally, we examine the performance of the proposed method for finite sample sizes through a simulation study.
Abstract: In this paper, the relationship between earthquake occurrence frequency and magnitude is modeled with generalized linear models for a set of earthquake data from Nepal. Goodness of fit was assessed for the generalized linear models and, based on the Akaike information criterion and the Bayesian information criterion, the generalized Poisson regression model was selected as a suitable model for the study. The objective of this study is to determine the parameters (a and b values) and to estimate the probability of an earthquake occurrence and its return period using a Poisson regression model, and to compare the results with those of the Gutenberg-Richter model. The study suggests that the probabilities of earthquake occurrences and the return periods estimated by the two models are relatively close to each other. The return periods from the generalized Poisson regression model are comparatively smaller than those from the Gutenberg-Richter model.
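As background to the a and b values mentioned above, the Gutenberg-Richter relation log10 N = a - b M and the usual Poisson occurrence probability can be sketched as follows. The parameter values are hypothetical and are not estimates for Nepal; treating N as an annual rate of events with magnitude at least M is also an assumption of this sketch.

```python
import math

def gr_annual_rate(a, b, magnitude):
    """Gutenberg-Richter rate of events with magnitude >= M: N = 10 ** (a - b * M)."""
    return 10.0 ** (a - b * magnitude)

def return_period(rate):
    """Mean recurrence interval in years for a given annual rate."""
    return 1.0 / rate

def prob_at_least_one(rate, years):
    """Poisson probability of at least one event within the given number of years."""
    return 1.0 - math.exp(-rate * years)

# Hypothetical a and b values, for illustration only.
a_value, b_value = 4.0, 1.0
rate_m6 = gr_annual_rate(a_value, b_value, 6.0)
print(rate_m6)                           # about 0.01 events per year
print(return_period(rate_m6))            # about 100 years
print(prob_at_least_one(rate_m6, 50.0))  # about 0.39
```

A regression-based model such as the one selected in the study would supply its own rate estimate in place of `gr_annual_rate`, after which the return period and probability follow in the same way.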
Abstract: The development of many estimators of the parameters of the linear regression model is traceable to the non-validity of the assumptions under which the model is formulated, especially when it is applied to real-life situations. This notwithstanding, regression analysis may aim at prediction. Consequently, this paper examines the performance of the Ordinary Least Squares (OLS) estimator, the Cochrane-Orcutt (COR) estimator, the Maximum Likelihood (ML) estimator and the estimators based on Principal Component (PC) analysis in prediction with the linear regression model under joint violations of the assumptions of non-stochastic regressors, independent regressors and independent error terms. With correlated stochastic normal variables as regressors and autocorrelated error terms, Monte Carlo experiments were conducted, and the study further identifies the best estimator for prediction purposes by adopting the goodness-of-fit statistics of the estimators. From the results, it is observed that the performance of COR at each level of correlation (multicollinearity) and that of ML, especially when the sample size is large, have a convex-like pattern over the levels of autocorrelation, while those of OLS and PC are concave-like. Also, as the level of multicollinearity increases, the estimators, except the PC estimators when multicollinearity is negative, rapidly perform better over the levels of autocorrelation. The COR and ML estimators are generally best for prediction in the presence of multicollinearity and autocorrelated error terms. However, at low levels of autocorrelation, the OLS estimator is either best or competes consistently with the best estimator, while the PC estimator is either best or competes with the best when the multicollinearity level is high (λ>0.8 or λ-0.49).
Abstract: Piecewise linear regression models are very flexible models for data. When a piecewise linear regression model is fitted to data, its parameters are generally not known. This paper studies the problem of parameter estimation for piecewise linear regression models. The parameters are estimated using a Bayesian method, but the Bayes estimator cannot be found analytically. To overcome this problem, the reversible jump MCMC (Markov Chain Monte Carlo) algorithm is proposed. The reversible jump MCMC algorithm generates a Markov chain that converges to a limit distribution equal to the posterior distribution of the parameters of the piecewise linear regression model. The resulting Markov chain is used to calculate the Bayes estimator for the parameters of the piecewise linear regression model.
Funding: Thank you for your valuable comments and suggestions. This research was supported by the Yunnan applied basic research project (NO.2017FD150) and the Chuxiong Normal University General Research Project (NO.XJYB2001).
Abstract: This paper selects seven indicators of financial revenue and housing sales price in China over the past 19 years and uses SPSS and Excel to carry out descriptive statistics, independent-samples t-tests, correlation analysis and regression analysis to comprehensively study the correlation between financial revenue and housing sales price in China. It establishes the relationship between financial revenue and housing sales price: when the average selling price of commercial housing increases by one unit, the fiscal revenue increases by 27.855 points.