In several LUCC studies, statistical methods are used to analyze land use data. A problem with conventional statistical methods in land use analysis is that they assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when observations are few. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset, and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, and illustrate that the first PLS factor best describes land use patterns quantitatively; most of the statistical relations derived from it accord with reality. As the explanatory capacity of successive PLS factors decreases, the reliability of the model outcome decreases correspondingly. (Funding: National Natural Science Foundation of China, No. 40301038.)
The UV absorption spectra of o-naphthol, α-naphthylamine, 2,7-dihydroxynaphthalene, 2,4-dimethoxybenzaldehyde and methyl salicylate overlap severely; it is therefore impossible to determine them in mixtures by traditional spectrophotometric methods. In this paper, partial least-squares (PLS) regression is applied to the simultaneous determination of these compounds in mixtures by UV spectrophotometry without any pretreatment of the samples. Ten synthetic mixture samples are analyzed by the proposed method. The mean recoveries are 99.4%, 99.6%, 100.2%, 99.3% and 99.1%, and the relative standard deviations (RSD) are 1.87%, 1.98%, 1.94%, 0.960% and 0.672%, respectively.
Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, hyperspectral leaf reflectance of rice (Oryza sativa L.) was measured on groups of healthy leaves and leaves infected by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda de Haan) over the wavelength range from 350 to 2500 nm. The percentage of leaf surface covered by lesions was estimated and defined as the disease severity. Statistical methods including multiple stepwise regression, principal component analysis and partial least-squares regression were used to estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regression could efficiently estimate disease severity with three wavebands selected in seven steps. The root mean square errors (RMSEs) for the training (n=210) and testing (n=53) datasets were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted disease severity with RMSEs of 16.3% and 13.9% for the training and testing datasets, respectively. Partial least-squares regression with seven extracted factors predicted disease severity most effectively, with RMSEs of 4.1% and 2.0% for the training and testing datasets, respectively. Our research demonstrates that it is feasible to estimate the disease severity of rice brown spot from hyperspectral reflectance data at the leaf level. (Funding: the Hi-Tech Research and Development Program (863) of China (No. 2006AA10Z203) and the National Science and Technology Task Force Project (No. 2006BAD10A01), China.)
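The principal-component step reported above (PC1 explaining roughly 80% of reflectance variance) can be illustrated with a centred SVD; the "spectra" below are synthetic stand-ins, not the study's measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
n_leaves, n_bands = 100, 50
# rank-1 signal (one dominant spectral shape) plus band-wise noise
base = rng.normal(size=(n_leaves, 1)) @ rng.normal(size=(1, n_bands))
spectra = base + 0.3 * rng.normal(size=(n_leaves, n_bands))

Xc = spectra - spectra.mean(axis=0)            # center each waveband
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)                # variance fraction per PC
print(f"PC1 explains {explained[0]:.1%} of the variance")
```

Hyperspectral bands are highly correlated, so a single component typically dominates, which is why a two-PC regression can still underperform PLS: PLS factors are extracted to predict severity, not merely to summarise reflectance.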
In this paper, we propose an iterative reweighted l1-penalty regression approach to solve the line spectral estimation problem. In each iteration, we first use the idea of the Bayesian lasso to update the sparse vectors; the derivative of the penalty function forms the regularization parameter. We choose an inverse trigonometric function as the penalty function to approximate the l0 norm. Then we use the gradient descent method to update the dictionary parameters. The theoretical analysis and simulation results demonstrate the effectiveness of the method and show that the proposed algorithm outperforms other state-of-the-art methods in many practical cases.
In this paper, auxiliary information is used to determine an estimator of the finite population total using nonparametric regression under stratified random sampling. To achieve this, a model-based approach is adopted, using local polynomial regression estimation to predict the nonsampled values of the survey variable y. The performance of the proposed estimator is investigated against some design-based and model-based regression estimators. The simulation experiments show that the resulting estimator exhibits good properties. Generally, good confidence intervals are obtained for the nonparametric regression estimators, and use of the proposed estimator leads to relatively smaller values of RE compared to other estimators.
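The prediction step, fitting a local polynomial (here local linear) regression of y on the auxiliary variable and evaluating it at non-sampled points, can be sketched as follows; the data, kernel and bandwidth are illustrative choices, not the paper's design:

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear fit at x0 with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])  # intercept + local slope
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]                                  # fitted value at x0

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 80))                  # auxiliary variable (sampled)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=80)
# predict y at "non-sampled" auxiliary values
yhat = np.array([local_linear(t, x, y, h=0.05) for t in [0.25, 0.5, 0.75]])
print(yhat)
```

Summing observed y over the sample and predicted y over the non-sample then yields a model-based estimate of the population total.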
Field experiment designs with a single replication are frequently used for factorial experiments in which the number of field plots is limited, but the experimental error is then difficult to estimate. To study a new statistical method for improving the precision of regression analysis of such experiments in rice, 84 fertilizer experiments were conducted in 15 provinces of China, including Zhejiang, Jiangsu, Anhui, Hunan, Sichuan and Heilongjiang. Three factors with 14 treatments (N: 0–225 kg/ha, P: 0–112.5 kg/ha, K: 0–150 kg/ha) and two replications were employed using an approaching-optimum design. There were 2352 (84×14×2) yield deviations (d) between the individual treatment yields and their arithmetic means. The results indicated that:
To predict the economic loss of crops caused by acid rain, we used partial least-squares (PLS) regression to build a model with a single dependent variable, the economic loss calculated from the decrease in yield, related to the pH value and the levels of Ca2+, NH4+, Na+, K+, Mg2+, SO42-, NO3- and Cl- in acid rain. We selected vegetables sensitive to acid rain as the sample crops and collected 12 groups of data, of which 8 groups were used for modeling and 4 groups for testing. Using cross validation to evaluate the performance of the prediction model indicates that the optimum number of principal components is 3, determined by the minimum of the prediction residual error sum of squares (PRESS), and the prediction error of the regression equation ranges from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively correlated with pH and the concentrations of NH4+, SO42-, NO3- and Cl- in the rain, and positively correlated with the concentrations of Ca2+, Na+, K+ and Mg2+. The precision of the model may be improved if the non-linearity of the original data is addressed. (Funding: the National Basic Research Program of China, grant No. 2005CB422207.)
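Choosing the number of PLS components by the minimum PRESS under leave-one-out cross-validation, as described above, can be sketched like this; the data are synthetic, with sizes loosely echoing the study (12 samples, 9 predictors):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(4)
n, p = 12, 9                       # 12 rain samples, 9 ion/pH predictors
latent = rng.normal(size=(n, 3))   # three underlying drivers
X = latent @ rng.normal(size=(3, p)) + 0.05 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -0.5, 0.8]) + 0.05 * rng.normal(size=n)

press = {}
for k in range(1, 6):
    resid = []
    for tr, te in LeaveOneOut().split(X):
        model = PLSRegression(n_components=k).fit(X[tr], y[tr])
        resid.append((y[te] - model.predict(X[te]).ravel()).item())
    press[k] = float(np.sum(np.square(resid)))  # PRESS for k components
best = min(press, key=press.get)
print("PRESS by components:", press, "-> best:", best)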
When the population from which the samples are drawn is not normally distributed, or when the sample size is particularly small, nonparametric statistical tests become preferable. An alternative to the normal model is the permutation or randomization model. The permutation model is nonparametric because no formal assumptions are made about the population parameters of the reference distribution, i.e., the distribution to which an obtained result is compared to determine its probability when the null hypothesis is true. Typically the reference distribution is a sampling distribution for parametric tests and a permutation distribution for many nonparametric tests. Within regression models, permutation tests can be used, given their optimality properties, especially in the multivariate context and when normality of the response variables is not guaranteed. The literature offers numerous permutation tests applicable to the estimation of regression models. The purpose of this study is to examine different kinds of permutation tests applied to linear models, focusing on the specific test statistics on which they are based. In particular, we consider the permutation test of the independent variables proposed by Oja, and other methods for carrying out inference nonparametrically in a regression model. Moreover, we review recent advances in this context and compare them.
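The basic mechanism shared by these tests can be shown in a few lines: permute the response, refit the statistic, and locate the observed value in the permutation distribution. This is the simplest permute-y scheme, not Oja's specific test:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)     # true slope 1, unit noise

def slope(x, y):
    """OLS slope of y on x."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

obs = slope(x, y)
# permutation distribution of the slope under H0: no association
perm = np.array([slope(x, rng.permutation(y)) for _ in range(2000)])
p_value = np.mean(np.abs(perm) >= np.abs(obs))
print(f"observed slope {obs:.2f}, permutation p = {p_value:.4f}")
```

No distributional assumption on y is needed; the p-value is exact up to Monte Carlo error, which is the appeal in small or non-normal samples.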
Indicator kriging (IK) is one of the most efficient nonparametric methods in geostatistics. The order relation problem in the conditional cumulative distribution values obtained by IK is its most severe drawback, and the correction of order relation deviations is an essential part of the IK approach. A monotone regression was proposed as a new correction method which minimizes the deviation from the original quantile values while ensuring all order relations. (Funding: the National Natural Science Foundation of China (No. 10571074).)
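Monotone (isotonic) regression of this kind is classically computed with the pool-adjacent-violators algorithm, sketched below on toy CDF estimates; the thresholds and values are illustrative, not from the paper:

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares monotone non-decreasing fit."""
    out = []                                  # list of [block mean, block size]
    for v in np.asarray(y, dtype=float):
        out.append([v, 1.0])
        # merge backwards while a block exceeds its successor
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, w2 = out.pop()
            m1, w1 = out.pop()
            out.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    return np.concatenate([[m] * int(w) for m, w in out])

# IK-style CDF estimates at increasing thresholds, with one order violation
F_hat = np.array([0.10, 0.35, 0.30, 0.55, 0.80])
F_corr = pava(F_hat)
print(F_corr)   # [0.1 0.325 0.325 0.55 0.8] -- the violating pair is pooled
```

The corrected values are non-decreasing and are the closest such sequence to the originals in least squares, which matches the stated goal of minimizing the deviation while enforcing all order relations.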
Challenges in Big Data analysis arise from the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, the Boosted Regression Tree (BRT), can address Big Data challenges to drive decision making. The challenge in this study is a lack of interoperability, since the data, a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information, are stored in monolithic hardware components. For the modelling process, it was necessary to create one common input file. By merging the data sources, a structured but noisy input file, showing inconsistencies and redundancies, was created. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT deals with missing data by default, by allowing a split on whether or not a value is missing as well as on what the value is. Most importantly, BRT offers a wide range of possibilities for interpreting results, and variable selection is performed automatically by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and the Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms them in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real-world scenarios. For example, a single or ensemble BRT approach could be tested alongside existing models to improve results for a wide range of data-driven decisions and applications.
Pedestrian safety has recently been considered one of the most serious issues in traffic safety research. This study aims at analyzing the spatial correlation between the frequency of pedestrian crashes and various predictor variables based on open-source point-of-interest (POI) data, which provide specific land use features and user characteristics. Spatial regression models were developed at the Traffic Analysis Zone (TAZ) level using 10,333 pedestrian crash records within the Fifth Ring of Beijing in 2015. Several spatial econometric approaches were used to examine the spatial autocorrelation in crash counts per TAZ, and spatial heterogeneity was investigated with a geographically weighted regression model. The results showed that the spatial error model performed better than the other two spatial models and a traditional ordinary least squares model. Specifically, bus stops, hospitals, pharmacies, restaurants and office buildings had positive impacts on pedestrian crashes, while hotels were negatively associated with the occurrence of pedestrian crashes. In addition, significant localization effects were found for different POIs. Based on these findings, recommendations and countermeasures can be proposed to improve traffic safety for pedestrians.
Medical research data are often skewed and heteroscedastic. It has therefore become common practice to log-transform data in regression analysis in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated with a large simulation study. Log-normal observations were generated according to the simulation models, and parameters were estimated using the new ML method, ordinary least-squares regression (LS) and weighted least-squares regression (WLS). All three methods produced unbiased estimates of parameters and expected response, and ML and WLS yielded smaller standard errors than LS. The approximate normality of the Wald statistic, used for tests of the ML estimates, produced correct type I error risk in most situations. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β1.
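An ML fit of this flavour can be sketched by maximising a log-normal likelihood directly. The parameterisation below (location of log y equal to the log of a linear predictor, constant log-scale variance) is one plausible choice for illustration, not necessarily the authors' model:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(1, 3, n)
b0, b1, sigma = 2.0, 1.5, 0.3
mu = b0 + b1 * x                                      # linear predictor, original scale
y = np.exp(np.log(mu) + sigma * rng.normal(size=n))   # log-normal response

def negloglik(theta):
    a, b, s = theta
    m = a + b * x
    if np.any(m <= 0) or s <= 0:
        return np.inf                                 # keep parameters feasible
    z = np.log(y) - np.log(m)
    return float(np.sum(np.log(s) + 0.5 * (z / s) ** 2))

fit = minimize(negloglik, x0=[1.0, 1.0, 0.5], method="Nelder-Mead")
print("ML estimates (b0, b1, sigma):", np.round(fit.x, 2))
```

Unlike regression on log y against x, the slope b here retains its absolute-scale interpretation, which is the point the abstract makes about relative versus absolute effects.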
The pricing of moving-window Asian options with an early exercise feature is a challenging problem in option pricing. The computational challenge lies in the unknown optimal exercise strategy and in the high dimensionality required for approximating the early exercise boundary. We use sparse grid basis functions in the Least Squares Monte Carlo approach to address this "curse of dimensionality" problem. The resulting algorithm provides a general and convergent method for pricing moving-window Asian options. The sparse grid technique presented in this paper can be generalized to pricing other high-dimensional, early-exercisable derivatives.
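The underlying Least Squares Monte Carlo (Longstaff-Schwartz) machinery can be sketched on a plain American put with a polynomial basis; the paper's contribution is to replace this basis with sparse-grid functions so the regression survives the higher-dimensional moving-window state. Parameters below are generic:

```python
import numpy as np

rng = np.random.default_rng(8)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_paths, n_steps = 20000, 50
dt = T / n_steps

# simulate geometric Brownian motion paths
z = rng.normal(size=(n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                          + sigma * np.sqrt(dt) * z, axis=1))

cash = np.maximum(K - S[:, -1], 0.0)          # payoff if held to maturity
for t in range(n_steps - 2, -1, -1):
    cash *= np.exp(-r * dt)                   # discount one step back
    itm = K - S[:, t] > 0                     # regress on in-the-money paths only
    if itm.sum() > 10:
        basis = np.vander(S[itm, t] / K, 4)   # cubic polynomial basis
        coef, *_ = np.linalg.lstsq(basis, cash[itm], rcond=None)
        cont = basis @ coef                   # estimated continuation value
        exercise = (K - S[itm, t]) > cont
        idx = np.where(itm)[0][exercise]
        cash[idx] = K - S[idx, t]             # exercise now on these paths
price = np.exp(-r * dt) * cash.mean()
print(f"LSMC American put price: {price:.2f}")
```

For a moving-window Asian payoff the regression state includes the window of past prices, and a tensor-product polynomial basis blows up exponentially; sparse grids keep the basis size manageable, which is the "curse of dimensionality" fix the abstract refers to.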
Fuzzy regression provides additional approaches for dealing with imprecise or vague problems. Traditional fuzzy regression is established on triangular fuzzy numbers, which can be represented by trapezoidal numbers. The independent variables, the coefficients of the independent variables, and the dependent variable in the regression model are all fuzzy numbers, and TW, the shape-preserving operator, is the only T-norm that induces a shape-preserving multiplication of LL-type fuzzy numbers. In this paper, we therefore propose a new fuzzy regression model based on LL-type trapezoidal fuzzy numbers and TW. First, we introduce basic fuzzy set theory, the basic arithmetic propositions of the shape-preserving operator, and a new distance measure between trapezoidal numbers. Second, we develop the model algorithms for the FIFCFO model (fuzzy input-fuzzy coefficient-fuzzy output model) and introduce three goodness-of-fit criteria: Error Index, Similarity Measure and Distance Criterion. Third, we use a design set and two reference sets to compare our proposed model with the reference models and assess their goodness by the above three criteria. Finally, we conclude that our proposed model is reasonable and has better prediction accuracy, though it is less robust than the reference models according to the three goodness-of-fit criteria. We can therefore expand the traditional fuzzy regression model to our proposed new model.
We construct a fuzzy varying-coefficient bilinear regression model to deal with interval financial data, and adopt a least-squares method based on symmetric fuzzy number space. First, we propose a varying-coefficient model on the basis of the fuzzy bilinear regression model. Second, we develop the least-squares method according to the complete distance between fuzzy numbers to estimate the coefficients, and test the adaptability of the proposed model by means of a generalized likelihood ratio test with the SSE composite index. Finally, mean square errors and mean absolute errors are employed to evaluate and compare the fitting of the fuzzy autoregression, fuzzy bilinear regression and fuzzy varying-coefficient bilinear regression models, as well as the forecasting of the three models. Empirical analysis shows that the proposed model has good fitting and forecasting accuracy relative to the other regression models for the capital market.
Government credibility is an important asset of contemporary national governance, an important criterion for evaluating government legitimacy, and a key factor in measuring the effectiveness of governance. In recent years, research on government credibility has mostly focused on theories and mechanisms, with little empirical work on the topic. This article applies variable selection models from social statistics to the issue of government credibility, in order to study it empirically and explore its core influencing factors from a statistical perspective. Specifically, we use four regression-based methods and three random-forest-based methods to study the influencing factors of government credibility in the provinces of China, and compare the performance of these seven variable selection methods along different dimensions. The results show that the variable selection methods differ in simplicity, accuracy, and variable importance rankings, and thus assign different importance to factors in the study of government credibility. This study provides a methodological reference for variable selection models in social science research, and offers a multidimensional comparative perspective for analyzing the influencing factors of government credibility.
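The two families of methods compared in the study can be illustrated side by side, a regression-based selector (lasso) and a random-forest importance ranking; the data are synthetic and the specific methods are generic stand-ins, not the article's seven:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(9)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.5 * rng.normal(size=n)  # only x0, x1 matter

# regression-based selection: lasso zeroes out irrelevant coefficients
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)

# random-forest-based selection: rank variables by impurity importance
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
top2 = np.argsort(rf.feature_importances_)[-2:]

print("lasso keeps:", selected, "| RF top-2 importance:", sorted(top2.tolist()))
```

The two families can disagree on borderline variables and on ranking, which is exactly the kind of cross-method discrepancy the article documents across its seven selectors.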
文摘The indicator kriging (IK) is one of the most efficient nonparametric methods in geo-statistics. The order relation problem in the conditional cumulative distribution values obtained by IK is the most severe drawback of it. The correction of order relation deviations is an essential and important part of IK approach. A monotone regression was proposed as a new correction method which could minimize the deviation from original quintiles value, although, ensuring all order relations.
文摘Challenges in Big Data analysis arise due to the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely Boosted Regression Tree (BRT) can address Big Data challenges to drive decision making. The challenge of this study is lack of interoperability since the data, a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information, are stored in monolithic hardware components. For the modelling process, it was necessary to create one common input file. By merging the data sources together, a structured but noisy input file, showing inconsistencies and redundancies, was created. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT has the advantage of dealing with missing data by default by allowing a split on whether or not a value is missing as well as what the value is. Most importantly, the BRT offers a wide range of possibilities regarding the interpretation of results and variable selection is automatically performed by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms these in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real world scenarios. For example, a single or ensemble approach of BRT could be tested with existing models in order to improve results for a wide range of data-driven decisions and applications.
文摘Pedestrian safety has recently been considered as one of the most serious issues in the research of traffic safety. This study aims at analyzing the spatial correlation between the frequency of pedestrian crashes and various predictor variables based on open source point-of-interest (POI) data which can provide specific land use features and user characteristics. Spatial regression models were developed at Traffic Analysis Zone (TAZ) level using 10,333 pedestrian crash records within the Fifth Ring of Beijing in 2015. Several spatial econometrics approaches were used to examine the spatial autocorrelation in crash count per TAZ, and the spatial heterogeneity was investigated by a geographically weighted regression model. The results showed that spatial error model performed better than other two spatial models and a traditional ordinary least squares model. Specifically, bus stops, hospitals, pharmacies, restaurants, and office buildings had positive impacts on pedestrian crashes, while hotels were negatively associated with the occurrence of pedestrian crashes. In addition, it was proven that there was a significant sign of localization effects for different POIs. Depending on these findings, lots of recommendations and countermeasures can be proposed to better improve the traffic safety for pedestrians.
Medical research data are often skewed and heteroscedastic. It has therefore become common practice to log-transform data in regression analysis in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated with a large simulation study. Log-normal observations were generated according to the simulation models and parameters were estimated using the new ML method, ordinary least-squares regression (LS) and weighted least-squares regression (WLS). All three methods produced unbiased estimates of parameters and expected response, and ML and WLS yielded smaller standard errors than LS. The approximate normality of the Wald statistic, used for tests of the ML estimates, in most situations yielded the correct type I error rate. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β1.
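One plausible parameterization of the model described above is that Y is log-normal with its expectation on the original (absolute) scale being linear in x, i.e. E[Y] = β0 + β1·x, which pins down the log-scale mean as μ = ln(β0 + β1·x) − σ²/2. The sketch below (an assumption about the exact likelihood, not the paper's code) maximizes that likelihood numerically with `scipy.optimize.minimize`:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1.0, 5.0, size=n)
beta_true, sigma_true = (2.0, 1.5), 0.4
# Generate log-normal responses whose *absolute* expectation is linear in x.
mu = np.log(beta_true[0] + beta_true[1] * x) - sigma_true**2 / 2
y = np.exp(rng.normal(mu, sigma_true))

def nll(params):
    """Negative log-likelihood (up to constants) of ln(y) ~ N(mu_i, sigma^2)."""
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)          # keep sigma positive
    mean = b0 + b1 * x
    if np.any(mean <= 0):
        return np.inf                  # E[Y] must be positive for log-normal Y
    m = np.log(mean) - sigma**2 / 2
    r = np.log(y) - m
    return n * log_sigma + 0.5 * np.sum(r**2) / sigma**2

res = minimize(nll, x0=np.array([1.0, 1.0, 0.0]),
               method="Nelder-Mead", options={"maxiter": 2000})
b0_hat, b1_hat = res.x[0], res.x[1]
sigma_hat = np.exp(res.x[2])
```

Unlike LS on ln(y), the estimated β1 here is directly interpretable as the absolute effect of a unit change in x on E[Y].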
The pricing of moving window Asian options with an early exercise feature is considered a challenging problem in option pricing. The computational challenge lies in the unknown optimal exercise strategy and in the high dimensionality required for approximating the early exercise boundary. We use sparse grid basis functions in the Least Squares Monte Carlo approach to address this "curse of dimensionality" problem. The resulting algorithm provides a general and convergent method for pricing moving window Asian options. The sparse grid technique presented in this paper can be generalized to pricing other high-dimensional, early-exercisable derivatives.
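The Least Squares Monte Carlo (Longstaff–Schwartz) backbone of the method can be sketched in a few lines. The example below uses a plain cubic polynomial basis on a vanilla Bermudan put, not the paper's sparse-grid basis or a moving window Asian payoff, purely to show the backward-induction regression step that the sparse grid plugs into:

```python
import numpy as np

def lsmc_bermudan_put(s0=36.0, K=40.0, r=0.06, sigma=0.2, T=1.0,
                      steps=50, paths=20000, seed=42):
    """Longstaff-Schwartz: regress continuation value on a basis of the spot,
    stepping backwards from maturity and exercising when intrinsic > continuation."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    disc = np.exp(-r * dt)
    # Simulate geometric Brownian motion paths (columns are times dt..T).
    z = rng.standard_normal((paths, steps))
    logs = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
    S = s0 * np.exp(logs)
    payoff = np.maximum(K - S[:, -1], 0.0)         # value if held to maturity
    for t in range(steps - 2, -1, -1):
        payoff *= disc                              # discount one step back
        itm = K - S[:, t] > 0                       # regress on in-the-money paths
        if itm.sum() < 10:
            continue
        basis = np.vander(S[itm, t], 4)             # cubic polynomial basis
        coef, *_ = np.linalg.lstsq(basis, payoff[itm], rcond=None)
        cont = basis @ coef                         # estimated continuation value
        exercise = K - S[itm, t]
        payoff[itm] = np.where(exercise > cont, exercise, payoff[itm])
    return disc * payoff.mean()                     # discount first date to t=0

price = lsmc_bermudan_put()
```

For these parameters the American put is worth roughly 4.48, so the estimate should land nearby; in the full method the `np.vander` call would be replaced by sparse-grid basis functions over the higher-dimensional state (spot plus running-window averages).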
Fuzzy regression offers additional approaches for dealing with imprecise or vague problems. Traditional fuzzy regression is built on triangular fuzzy numbers, which can be represented by trapezoidal numbers. The independent variables, the coefficients of the independent variables and the dependent variable in the regression model may all be fuzzy numbers, and TW, the shape-preserving operator, is the only t-norm that induces a shape-preserving multiplication of LL-type fuzzy numbers. In this paper, we therefore propose a new fuzzy regression model based on LL-type trapezoidal fuzzy numbers and TW. Firstly, we introduce the basic fuzzy set theory, the basic arithmetic propositions of the shape-preserving operator and a new distance measure between trapezoidal numbers. Secondly, we investigate the specific model algorithms for the FIFCFO model (fuzzy input-fuzzy coefficient-fuzzy output model) and introduce three goodness-of-fit criteria: Error Index, Similarity Measure and Distance Criterion. Thirdly, we use a design set and two reference sets to compare our proposed model with the reference models and assess their goodness of fit with the above three criteria. Finally, we conclude that, by these criteria, the proposed model is reasonable and has better prediction accuracy than the reference models, though it is less robust. The traditional fuzzy regression model can thus be extended to the proposed new model.
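To make the objects concrete, here is a minimal sketch of a trapezoidal fuzzy number with standard fuzzy addition and a simple vertex-based distance. Both the class and the distance are illustrative stand-ins, not the paper's definitions (in particular, the paper's TW-based multiplication and its distance measure are not reproduced here):

```python
from dataclasses import dataclass

@dataclass
class Trapezoid:
    """Trapezoidal fuzzy number (a <= b <= c <= d): support [a, d], core [b, c].
    A triangular fuzzy number is the special case b == c."""
    a: float
    b: float
    c: float
    d: float

    def add(self, other):
        # Standard fuzzy addition: vertices add component-wise.
        return Trapezoid(self.a + other.a, self.b + other.b,
                         self.c + other.c, self.d + other.d)

def distance(u, v):
    """A simple L2 distance over the four vertices, as an illustrative
    stand-in for the paper's distance measure between trapezoidal numbers."""
    return (((u.a - v.a) ** 2 + (u.b - v.b) ** 2
             + (u.c - v.c) ** 2 + (u.d - v.d) ** 2) / 4) ** 0.5

u = Trapezoid(1, 2, 3, 4)
v = Trapezoid(2, 3, 4, 5)
s = u.add(v)           # Trapezoid(3, 5, 7, 9)
d = distance(u, v)     # 1.0
```

A fitting criterion like the paper's Distance Criterion would then aggregate such distances between observed and predicted fuzzy outputs.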
We construct a fuzzy varying coefficient bilinear regression model to deal with interval financial data and then adopt a least-squares method based on symmetric fuzzy number space. Firstly, we propose a varying coefficient model on the basis of the fuzzy bilinear regression model. Secondly, we develop the least-squares method according to the complete distance between fuzzy numbers to estimate the coefficients, and test the adaptability of the proposed model by means of a generalized likelihood ratio test with the SSE composite index. Finally, mean square errors and mean absolute errors are employed to evaluate and compare the fitting and forecasting of the fuzzy autoregression, fuzzy bilinear regression and fuzzy varying coefficient bilinear regression models. Empirical analysis shows that the proposed model has good fitting and forecasting accuracy relative to the other regression models for the capital market.
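For symmetric fuzzy observations represented as (center, spread) pairs, one common least-squares scheme fits the centers and the spreads by separate ordinary regressions. The sketch below shows that simplified scheme on hypothetical data; it is a stand-in for intuition only, not the paper's varying-coefficient bilinear estimator or its complete-distance criterion:

```python
import numpy as np

# Hypothetical symmetric fuzzy observations y_i = (center c_i, spread s_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
centers = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
spreads = np.array([0.5, 0.6, 0.8, 0.9, 1.1])

A = np.column_stack([np.ones_like(x), x])
beta_c, *_ = np.linalg.lstsq(A, centers, rcond=None)  # center regression
beta_s, *_ = np.linalg.lstsq(A, spreads, rcond=None)  # spread regression

pred_center = A @ beta_c
mse = float(np.mean((centers - pred_center) ** 2))    # fitting criterion
```

The full method replaces these two independent fits with a single objective built from a distance between whole fuzzy numbers, and lets the coefficients vary over time.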
Government credibility is an important asset of contemporary national governance, an important criterion for evaluating government legitimacy, and a key factor in measuring the effectiveness of government governance. In recent years, research on government credibility has mostly focused on exploring theories and mechanisms, with little empirical work on the topic. This article applies variable selection models from the field of social statistics to the issue of government credibility, in order to study it empirically and explore its core influencing factors from a statistical perspective. Specifically, this article uses four regression-analysis-based methods and three random-forest-based methods to study the influencing factors of government credibility in the provinces of China, and compares the performance of these seven variable selection methods along different dimensions. The results show that the variable selection methods differ in simplicity, accuracy, and variable importance ranking, and accordingly play different roles in the study of government credibility. This study provides a methodological reference for variable selection models in social science research, and also offers a multidimensional comparative perspective for analyzing the influencing factors of government credibility.
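The two families of methods compared above can be contrasted on synthetic data in a few lines: a regression-based selector (LASSO with cross-validation) against a random-forest importance ranking. The data and the "true" factors here are invented for illustration; the study's actual covariates are provincial indicators:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
n, p = 200, 8                      # modest sample, several candidate factors
X = rng.normal(size=(n, p))
# Only the first two predictors truly drive the response.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Regression-based selection: LASSO zeroes out irrelevant coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected_lasso = np.flatnonzero(np.abs(lasso.coef_) > 1e-3)

# Random-forest-based selection: rank variables by impurity importance.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranking_rf = np.argsort(rf.feature_importances_)[::-1]
```

Comparing which variables each method retains, and in what order, is exactly the kind of simplicity/accuracy/ranking comparison the abstract describes.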