In several land use and land cover change (LUCC) studies, statistical methods are used to analyze land use data. A problem with conventional statistical methods in land use analysis is that they assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a partial least-squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset, and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, and illustrate that the first PLS factor is sufficient to describe land use patterns quantitatively; most of the statistical relations derived from it accord with the facts. As the explanatory capacity of the successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.
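As a rough illustration of why PLS regression copes with multicollinearity, the following minimal sketch fits a one-component PLS1 model in plain Python. The function name `pls1_one_component` and the toy data are assumptions made for illustration; they are not taken from the study, which selects four PLS factors on real land use data.

```python
import math

def pls1_one_component(X, y):
    """Fit a one-component PLS1 model; return (coefficients, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center predictors and response.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in predictor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))  # zero if X and y are uncorrelated
    w = [v / norm for v in w]
    # Scores: projection of each observation onto w.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered response on the single score vector.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    coef = [q * v for v in w]
    intercept = y_mean - sum(coef[j] * x_mean[j] for j in range(p))
    return coef, intercept

# Hypothetical toy data: two perfectly collinear predictors, y = x1 + x2 + 1.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]
coef, intercept = pls1_one_component(X, y)
print(coef, intercept)
```

Ordinary least squares has no unique solution for this toy dataset because the two columns are identical, whereas the PLS factor spreads the effect evenly across them.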
The UV absorption spectra of o-naphthol, α-naphthylamine, 2,7-dihydroxynaphthalene, 2,4-dimethoxybenzaldehyde and methyl salicylate overlap severely; therefore it is impossible to determine them in mixtures by traditional spectrophotometric methods. In this paper, partial least-squares (PLS) regression is applied to the simultaneous determination of these compounds in mixtures by UV spectrophotometry without any pretreatment of the samples. Ten synthetic mixture samples are analyzed by the proposed method. The mean recoveries are 99.4%, 99.6%, 100.2%, 99.3% and 99.1%, and the relative standard deviations (RSD) are 1.87%, 1.98%, 1.94%, 0.960% and 0.672%, respectively.
Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, hyperspectral leaf reflectance of rice (Oryza sativa L.) was measured over the wavelength range from 350 to 2500 nm on groups of healthy leaves and leaves infected by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda de Haan). The percentage of leaf surface covered by lesions was estimated and defined as the disease severity. Multiple stepwise regression, principal component analysis and partial least-squares regression were used to estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regression could efficiently estimate disease severity with three wavebands in seven steps. The root mean square errors (RMSEs) for the training (n=210) and testing (n=53) datasets were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted disease severity with RMSEs of 16.3% and 13.9% for the training and testing datasets, respectively. Partial least-squares regression with seven extracted factors predicted disease severity most effectively among the methods compared, with RMSEs of 4.1% and 2.0% for the training and testing datasets, respectively. Our research demonstrates that it is feasible to estimate the disease severity of rice brown spot using hyperspectral reflectance data at the leaf level.
To predict the economic loss of crops caused by acid rain, we used partial least squares (PLS) regression to build a model with a single dependent variable: the economic loss, calculated from the decrease in yield, related to the pH value and the levels of Ca²⁺, NH₄⁺, Na⁺, K⁺, Mg²⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in acid rain. We selected vegetables that are sensitive to acid rain as the sample crops and collected 12 groups of data, of which 8 groups were used for modeling and 4 groups for testing. Cross validation of the prediction model indicated that the optimum number of principal components was 3, determined by the minimum of the prediction residual error sum of squares, and that the prediction error of the regression equation ranged from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively correlated with pH and the concentrations of NH₄⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in the rain, and positively correlated with the concentrations of Ca²⁺, Na⁺, K⁺ and Mg²⁺. The precision of the model may be improved if the non-linearity of the original data is addressed.
Medical research data are often skewed and heteroscedastic. It has therefore become common practice to log-transform data in regression analysis in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated with a large simulation study. Log-normal observations were generated according to the simulation models, and parameters were estimated using the new ML method, ordinary least-squares regression (LS) and weighted least-squares regression (WLS). All three methods produced unbiased estimates of parameters and expected response, and ML and WLS yielded smaller standard errors than LS. The approximate normality of the Wald statistic, used for tests of the ML estimates, in most situations produced the correct type I error risk. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β1.
The pricing of moving window Asian options with an early exercise feature is considered a challenging problem in option pricing. The computational challenge lies in the unknown optimal exercise strategy and in the high dimensionality required for approximating the early exercise boundary. We use sparse grid basis functions in the Least Squares Monte Carlo approach to solve this “curse of dimensionality” problem. The resulting algorithm provides a general and convergent method for pricing moving window Asian options. The sparse grid technique presented in this paper can be generalized to pricing other high-dimensional, early-exercisable derivatives.
Fuzzy regression provides more approaches for dealing with imprecise or vague problems. Traditional fuzzy regression is built on triangular fuzzy numbers, which can be represented by trapezoidal numbers. The independent variables, the coefficients of the independent variables and the dependent variable in the regression model are fuzzy numbers at different times, and TW, the shape preserving operator, is the only T-norm that induces a shape preserving multiplication of LL-type fuzzy numbers. In this paper, we therefore propose a new fuzzy regression model based on LL-type trapezoidal fuzzy numbers and TW. Firstly, we introduce the basic fuzzy set theory, the basic arithmetic propositions of the shape preserving operator and a new distance measure between trapezoidal numbers. Secondly, we investigate the specific model algorithms for the FIFCFO model (fuzzy input-fuzzy coefficient-fuzzy output model) and introduce three goodness-of-fit criteria: Error Index, Similarity Measure and Distance Criterion. Thirdly, we use a design set and two reference sets to compare our proposed model with the reference models and assess their goodness of fit with the above three criteria. Finally, we conclude that, by the three goodness-of-fit criteria, our proposed model is reasonable and has better prediction accuracy than the reference models but is less robust. The traditional fuzzy regression model can thus be extended to the proposed new model.
We construct a fuzzy varying coefficient bilinear regression model to deal with interval financial data and adopt the least-squares method based on the symmetric fuzzy number space. Firstly, we propose a varying coefficient model on the basis of the fuzzy bilinear regression model. Secondly, we develop the least-squares method according to the complete distance between fuzzy numbers to estimate the coefficients, and test the adaptability of the proposed model by means of a generalized likelihood ratio test with an SSE composite index. Finally, mean square errors and mean absolute errors are employed to evaluate and compare the fitting and the forecasting of the fuzzy autoregression, fuzzy bilinear regression and fuzzy varying coefficient bilinear regression models. Empirical analysis shows that the proposed model has good fitting and forecasting accuracy compared with the other regression models for the capital market.
China has the largest apple planting area and total yield in the world, and the Fuji apple is the major cultivar, accounting for more than 70% of apple planting acreage in China. Apple quality is affected by meteorological conditions, soil types, soil nutrient content, and management practices. Meteorological factors such as light, temperature and moisture are key environmental conditions affecting apple quality that are difficult to regulate and control. This study was performed to determine the effect of meteorological factors on the quality of Fuji apples and to provide evidence for a reasonable regional layout and planting of Fuji apple in China. Fruit samples and meteorological data were collected from 153 commercial Fuji apple orchards located in 51 counties of 11 regions in China from 2010 to 2011. Partial least-squares regression and linear programming were used to analyze the effect model and impact weight of meteorological factors on fruit quality, to determine the major meteorological factors influencing fruit quality attributes, and to establish a regression equation to optimize meteorological factors for high-quality Fuji apples. Results showed relationships between fruit quality attributes and meteorological factors among the various apple producing counties in China. The mean, minimum, and maximum temperatures from April to October had the highest positive effects on fruit quality in the model effect loadings and weights, followed by the mean annual temperature and the sunshine percentage, the temperature difference between day and night, and the total precipitation for the same period. In contrast, annual total precipitation and relative humidity from April to October had negative effects on fruit quality.
The meteorological factors exhibited distinct effects on the different fruit quality attributes. Soluble solid content was affected, in descending order of effect, by annual total precipitation, the minimum temperature from April to October, the mean temperature from April to October, the temperature difference between day and night, and the mean annual temperature. The regression equation showed that the optimum meteorological factors for fruit quality were a mean annual temperature of 5.5-18°C and an annual total precipitation of 602-1121 mm for the whole year, and a mean temperature of 13.3-19.6°C, a minimum temperature of 7.8-18.5°C, a maximum temperature of 19.5°C, a temperature difference of 13.7°C between day and night, a total precipitation of 227 mm, a relative humidity of 57.5-84.0%, and a sunshine percentage of 36.5-70.0% during the growing period (from April to October).
Unusually severe weather is occurring more frequently due to global climate change. Heat waves, rainstorms, snowstorms, and droughts are becoming increasingly common all over the world, threatening human lives and property. Temperature and precipitation are representative variables usually used to directly reflect and forecast the influences of climate change. In this study, daily data (from 1953 to 1995) and monthly data (from 1950 to 2010) of temperature and precipitation in five regions of the Amur River were examined. The significance of changes in temperature and precipitation was tested using the Mann-Kendall test. The amplitudes were computed using a linear least-squares regression model, and the extreme temperature and precipitation were analyzed using hydrological statistical methods. The results show the following: the mean annual temperature increased significantly from 1950 to 2010 in the five regions, mainly due to warming in spring and winter; the annual precipitation changed significantly from 1950 to 2010 only in the lower mainstream of the Amur River; the frequency of extremely low temperature events decreased from 1953 to 1995 in the mainstream of the Amur River; the frequency of high temperature events increased from 1953 to 1995 in the mainstream of the Amur River; and the frequency of extreme precipitation events did not change significantly from 1953 to 1995 in the mainstream of the Amur River. This study provides a valuable theoretical basis for settling disputes between China and Russia on the sustainable development and utilization of the water resources of the Amur River.
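The Mann-Kendall trend test mentioned above can be sketched in a few lines. This is a minimal version that assumes no tied values and no serial correlation (the abstract does not say which variant the authors used); the function name `mann_kendall` and the temperature series are hypothetical.

```python
import math

def mann_kendall(series):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test.

    Assumes no ties in the data, so no tie correction is applied to Var(S).
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return s, z, p_value

# Hypothetical annual mean temperatures (deg C), made up for illustration.
temps = [0.1, 0.3, 0.4, 0.6, 0.9, 1.0, 1.2, 1.5, 1.7, 1.8]
s, z, p_value = mann_kendall(temps)
print(s, round(z, 3), p_value)
```

A positive S with a small p-value indicates a significant increasing trend; the magnitude of the change would then be estimated separately, for example with the linear least-squares slope the abstract refers to.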
Field infiltration measurement is often a tedious task, so infiltration is commonly estimated from proposed infiltration models instead. The Horton equation is one of the popular models used in the characterization of field infiltration. In this study, the least-squares curve fitting technique was employed to estimate the model parameters from fifteen field measurements and gave a mean coefficient of determination (R²) of 0.811. Furthermore, plotting the measured against the calculated infiltration rate for the first six measurement points yielded R² values close to unity in the regression curve, indicating a marked relationship between the two. This indicates that the Horton infiltration model can be applied to estimate the infiltration characteristics of soils in Samaru, Northern Guinea Savanna of Nigeria.
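One way to fit the Horton equation, f(t) = fc + (f0 - fc)·exp(-k·t), by least squares is sketched below. The abstract does not describe the authors' fitting procedure, so this is only an assumed approach: for each candidate decay constant k the model is linear in fc and (f0 - fc), which are solved by ordinary least squares, and k is picked from a grid by minimum sum of squared errors. The function name, grid, units and data are all hypothetical.

```python
import math

def fit_horton(times, rates, k_grid):
    """Least-squares fit of f(t) = fc + (f0 - fc) * exp(-k * t).

    Returns (f0, fc, k, sse). k is chosen from k_grid (all values > 0).
    """
    n = len(times)
    f_mean = sum(rates) / n
    best = None
    for k in k_grid:
        g = [math.exp(-k * t) for t in times]
        g_mean = sum(g) / n
        sgg = sum((gi - g_mean) ** 2 for gi in g)
        sgf = sum((gi - g_mean) * (fi - f_mean) for gi, fi in zip(g, rates))
        a = sgf / sgg              # a = f0 - fc (slope on exp(-k t))
        fc = f_mean - a * g_mean   # final (steady) infiltration rate
        sse = sum((fi - (fc + a * gi)) ** 2 for gi, fi in zip(g, rates))
        if best is None or sse < best[0]:
            best = (sse, fc + a, fc, k)
    sse, f0, fc, k = best
    return f0, fc, k, sse

# Synthetic data generated from f0 = 60, fc = 10, k = 0.05 (made-up values;
# think of rates in mm/h and times in minutes).
times = [0.0, 5.0, 10.0, 15.0, 20.0, 30.0, 45.0, 60.0]
rates = [10.0 + 50.0 * math.exp(-0.05 * t) for t in times]
k_grid = [i / 1000 for i in range(1, 201)]
f0, fc, k, sse = fit_horton(times, rates, k_grid)
print(f0, fc, k, sse)
```

With real field data the grid search could be replaced by a general nonlinear least-squares routine; the grid keeps the sketch free of external dependencies.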
Six atomic fragment types of organic compounds have been defined, and a multilevel atom-pair frequency matrix has been constructed according to the number of occurrences of pairs of atomic fragments at different bond lengths in the molecule. On this basis, a novel molecular coding technique, the characteristic atom-pair holographic code (CAHC), is obtained. The method combines several benefits: it is as easy to calculate and operate as 2D molecular topological descriptors, it retains the definite physicochemical meaning of 3D molecular structural characterization methods, and it can capture complex molecular information. It is therefore appropriate for the study of quantitative structure-property/activity relationships (QSPR/QSAR) of medicines and biological molecules. In this paper we apply CAHC to the quantitative prediction of reversed-phase liquid chromatography (RPLC) retention data of 33 purine derivatives and 24 steroids. The fitting multiple correlation coefficient R², the cross-validated multiple correlation coefficient Q² and the predictive ability Q²pred over the test set samples of the obtained partial least-squares (PLS) regression models are 0.990, 0.893 and 0.977, 0.897, 0.941, respectively.
Determining a material formula requires trial-and-error experiments and consumes a large amount of time and funds. To solve this problem, a comprehensive method is established through an experiment on the artificial similar-material formula of a mine slope. We controlled the samples by compactness and arranged the formulas of the test groups with the uniform formula experiment method. The physical and mechanical parameters of these samples were analyzed using partial least-squares (PLS) regression, and a mathematical model relating the physical and mechanical parameters to the formulation constituents was established. We used the model to analyze the effect of each formulation constituent on the physical and mechanical parameters of the samples. The experimental results and analysis illustrate that 1) in the formulation of similar material, the effects of the raw materials on the internal friction angle φ and the cohesion C are opposite; and 2) the method greatly facilitates the preparation of artificial similar materials and is more economical and effective.
Logistic regression is usually used to model probabilities of categorical responses as functions of covariates. However, the link connecting the probabilities to the covariates is non-linear. We show in this paper that when the cross-classification of all the covariates and the dependent variable has no empty cells, the probabilities of responses can be expressed as linear functions of the covariates. We demonstrate this for both dichotomous and polytomous dependent variables.
In this study, the simultaneous determination of verapamil hydrochloride and gliclazide in pharmaceuticals by chemometric approaches using UV spectrophotometry is reported. Verapamil hydrochloride (VER) (Benzeneacetonitrile, α-[3-[[2-(3,4-dimethoxyphenyl) ethyl] methylamino]propyl]-3, 4-dimethoxy-α-(1-methylethyl) hydrochloride) is an L-type calcium channel blocker of the phenylalkylamine class. It has been used in the treatment of hypertension, angina pectoris, and cardiac arrhythmia. Gliclazide (GLZ) (1-(Hexahydrocyclopenta[c]pyrrol-2(1H)-yl)-3-[(4-methylphenyl) sulphonyl]urea) is an oral hypoglycaemic (anti-diabetic) drug classified as a second-generation sulfonylurea. Spectra of VER and GLZ were recorded at several concentrations within their linear ranges over the wavelength range 200 nm to 400 nm in 0.1 N HCl. Partial least squares regression (PLS) and principal component regression (PCR) were used for chemometric analysis of the data, and the parameters of the chemometric procedures were optimized. The recoveries were satisfactory and statistically comparable. The method was successfully applied to a pharmaceutical tablet formulation, with no interference from excipients as indicated by the recovery study results. The proposed methods are simple and rapid, and can easily be used in the quality control of drugs as alternative analysis tools.
In recent years, with rapid increases in the number of vehicles in China, the contribution of vehicle exhaust emissions to air pollution has become increasingly prominent. To achieve precise control of emissions, on-road remote sensing (RS) technology has been developed and applied for law enforcement and supervision. However, data quality remains an issue affecting the development and application of RS. In this study, RS data from a cross-road RS system used at a single site (from 2012 to 2015) were collected, the data screening process was reviewed, the issues with data quality were summarized, a new method of data screening and calibration was proposed, and the effectiveness of the improved data quality control methods was evaluated. The results showed that this method reduces the skewness and kurtosis of the data distribution by up to nearly 67%, which restores the actual characteristics of exhaust diffusion and is conducive to the identification of actual clean and high-emission vehicles. The annual variability of the emission factors of nitric oxide decreases by 60% on average, eliminating the annual drift of fleet emissions and improving data reliability.
The Angstrom-Prescott equation (AP) is the algorithm recommended by the Food and Agriculture Organization (FAO) of the United Nations for calculating the surface solar radiation (R_s) to support the estimation of crop evapotranspiration. The a_s and b_s coefficients in the AP are therefore vital. This study aims to obtain a_s and b_s coefficients for the AP that are optimized for China's comprehensive agricultural divisions. The average monthly solar radiation and relative sunshine duration data at 121 stations from 1957 to 2016 were collected. Using data from 1957 to 2010, we calculated the monthly a_s and b_s coefficients for each subregion by least-squares regression. Then, taking the observed values of R_s from 2011 to 2016 as the true values, we estimated and compared the relative accuracy of R_s calculated using the regression values of a_s and b_s and that calculated with the FAO recommended coefficients. The monthly coefficients a_s and b_s of each subregion are significantly different, both temporally and spatially, from the FAO recommended coefficients. The relative error range (0-54%) of R_s calculated via the regression values of a_s and b_s is better than the relative error range (0-77%) of R_s calculated using the FAO suggested coefficients. The station-mean relative error was reduced by 1% to 6%. However, the regression values of a_s and b_s performed worse in certain months and agricultural subregions during verification. Therefore, we selected the a_s and b_s coefficients with the minimum R_s estimation error as the final coefficients and constructed a coefficient recommendation table for 36 agricultural production and management subregions in China. These coefficient recommendations enrich the case study of coefficient calibration for the AP in China and can improve the accuracy of calculating R_s and crop evapotranspiration based on existing data.
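The calibration step can be illustrated with a short sketch. It assumes the usual form of the Angstrom-Prescott equation, R_s = (a_s + b_s·n/N)·R_a, where n/N is the relative sunshine duration and R_a the extraterrestrial radiation, so that a_s and b_s follow from a simple linear regression of R_s/R_a on n/N. The function names and the data are made up for illustration; the default values 0.25 and 0.50 are the FAO-56 recommendations for sites without calibration.

```python
FAO_A_S, FAO_B_S = 0.25, 0.50  # FAO-56 default coefficients

def fit_angstrom_prescott(rel_sunshine, clearness):
    """Ordinary least-squares fit of R_s / R_a = a_s + b_s * (n / N)."""
    m = len(rel_sunshine)
    x_mean = sum(rel_sunshine) / m
    y_mean = sum(clearness) / m
    sxx = sum((x - x_mean) ** 2 for x in rel_sunshine)
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(rel_sunshine, clearness))
    b_s = sxy / sxx
    a_s = y_mean - b_s * x_mean
    return a_s, b_s

def estimate_rs(a_s, b_s, rel_sunshine, r_a):
    """Surface solar radiation from relative sunshine duration and R_a."""
    return (a_s + b_s * rel_sunshine) * r_a

# Hypothetical monthly values for one subregion (n/N and R_s / R_a),
# constructed to lie exactly on the line 0.21 + 0.45 * (n / N).
rel_sunshine = [0.2, 0.4, 0.6, 0.8]
clearness = [0.30, 0.39, 0.48, 0.57]
a_s, b_s = fit_angstrom_prescott(rel_sunshine, clearness)

# Compare the calibrated and FAO-default estimates for R_a = 30 MJ m^-2 day^-1.
rs_calibrated = estimate_rs(a_s, b_s, 0.5, 30.0)
rs_fao = estimate_rs(FAO_A_S, FAO_B_S, 0.5, 30.0)
print(a_s, b_s, rs_calibrated, rs_fao)
```

In the study this fit would be repeated per month and per subregion, and the coefficient set with the smaller R_s estimation error on the 2011 to 2016 verification data would be kept.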
Dry rubber content (DRC) is an important factor in evaluating the quality of cup lump rubber. DRC analysis requires prolonged laboratory validation. To develop fast and effective DRC determination methods, this study proposed methods to evaluate the DRC of cup lump rubber using different spectroscopic measurement approaches. This involved a complete fundamental analysis leading to an efficient measurement method based on either point-based measurement using an NIR reflectance spectrometer or area-based measurement using hyperspectral imaging. A dataset of 120 samples was randomly divided into a calibration set of 90 samples and a validation set of 30 samples. To obtain an average spectrum representing each cup lump rubber sample, the spectral data were collected by locating and scanning for the point-based and area-based measurements, respectively. The spectral data were calibrated against the reference values using partial least squares regression (PLSR) and the least-squares support vector machine (LS-SVM). The experiments showed that the area-based measurement approach performed outstandingly with both algorithms in predicting the DRC of cup lump rubber and was clearly better than the point-based measurement approach. The best PLSR predictions, expressed by the coefficient of determination (R²), the root mean square error of prediction (RMSEP) and the residual predictive deviation (RPD), were 0.99, 0.72% and 15.17, while the best LS-SVM predictions were 0.99, 0.64% and 16.83, respectively. In summary, the area-based measurement with the LS-SVM prediction model provided a highly accurate estimate of the DRC of cup lump rubber.
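The three figures of merit quoted above can be computed as in the following sketch. Definitions vary slightly between authors; here R² is taken as 1 - SSE/SST, RMSEP as the root of the mean squared prediction error, and RPD as the sample standard deviation of the reference values divided by RMSEP. These conventions, the function name and the numbers are assumptions, not taken from the study.

```python
import math

def prediction_metrics(reference, predicted):
    """Return (R2, RMSEP, RPD) for a validation set."""
    n = len(reference)
    mean_ref = sum(reference) / n
    sse = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    sst = sum((r - mean_ref) ** 2 for r in reference)
    r2 = 1.0 - sse / sst                 # coefficient of determination
    rmsep = math.sqrt(sse / n)           # root mean square error of prediction
    sd_ref = math.sqrt(sst / (n - 1))    # sample SD of the reference values
    rpd = sd_ref / rmsep                 # residual predictive deviation
    return r2, rmsep, rpd

# Hypothetical DRC reference values and model predictions (in %).
reference = [50.0, 60.0, 70.0, 80.0]
predicted = [51.0, 59.0, 71.0, 79.0]
r2, rmsep, rpd = prediction_metrics(reference, predicted)
print(r2, rmsep, rpd)
```

Because RPD scales the prediction error by the spread of the reference values, a high RPD together with a low RMSEP indicates that the error is small relative to the natural variation in the samples.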
Funding (land use study of the Suzhou-Wuxi-Changzhou region): National Natural Science Foundation of China, No. 40301038
Funding (rice brown spot hyperspectral study): the Hi-Tech Research and Development Program (863) of China (No. 2006AA10Z203) and the National Science and Technology Task Force Project (No. 2006BAD10A01), China
Funding (acid rain crop loss study): National Basic Research Program of China, grant No. 2005CB422207
Abstract: Medical research data are often skewed and heteroscedastic. It has therefore become common practice to log-transform data in regression analysis in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated in a large simulation study. Log-normal observations were generated according to the simulation models, and parameters were estimated using the new ML method, ordinary least-squares regression (LS) and weighted least-squares regression (WLS). All three methods produced unbiased estimates of the parameters and the expected response, and ML and WLS yielded smaller standard errors than LS. The approximate normality of the Wald statistic, used for tests of the ML estimates, produced the correct type I error risk in most situations. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β1.
Abstract: The pricing of moving window Asian options with an early exercise feature is considered a challenging problem in option pricing. The computational challenge lies in the unknown optimal exercise strategy and in the high dimensionality required for approximating the early exercise boundary. We use sparse grid basis functions in the Least Squares Monte Carlo approach to solve this “curse of dimensionality” problem. The resulting algorithm provides a general and convergent method for pricing moving window Asian options. The sparse grid technique presented in this paper can be generalized to pricing other high-dimensional, early-exercisable derivatives.
Abstract: Fuzzy regression offers additional ways to deal with imprecise or vague problems. Traditional fuzzy regression is built on triangular fuzzy numbers, which can be represented by trapezoidal numbers. The independent variables, their coefficients, and the dependent variable in the regression model are fuzzy numbers at different times, and TW, the shape-preserving operator, is the only T-norm that induces a shape-preserving multiplication of LL-type fuzzy numbers. In this paper, we therefore propose a new fuzzy regression model based on LL-type trapezoidal fuzzy numbers and TW. First, we introduce basic fuzzy set theory, the basic arithmetic propositions of the shape-preserving operator, and a new distance measure between trapezoidal numbers. Second, we investigate the specific algorithms for the FIFCFO model (fuzzy input-fuzzy coefficient-fuzzy output model) and introduce three goodness-of-fit criteria: Error Index, Similarity Measure and Distance Criterion. Third, we use a design set and two reference sets to compare our proposed model with the reference models and assess their goodness of fit with these three criteria. Finally, we conclude that, judged by the three goodness-of-fit criteria, our proposed model is reasonable and has better prediction accuracy than the reference models but is less robust. The traditional fuzzy regression model can therefore be extended to the proposed new model.
Abstract: We construct a fuzzy varying coefficient bilinear regression model to deal with interval financial data and then adopt the least-squares method based on the symmetric fuzzy number space. First, we propose a varying coefficient model on the basis of the fuzzy bilinear regression model. Second, we develop the least-squares method according to the complete distance between fuzzy numbers to estimate the coefficients, and we test the adaptability of the proposed model by means of a generalized likelihood ratio test with the SSE composite index. Finally, mean square errors and mean absolute errors are employed to evaluate and compare the fitting and the forecasting of the fuzzy autoregression, fuzzy bilinear regression and fuzzy varying coefficient bilinear regression models. Empirical analysis shows that the proposed model has good fitting and forecasting accuracy relative to the other regression models for the capital market.
Funding: Supported by the Forest Scientific Research in the Public Interest, China (201404720), the earmarked fund for the China Agriculture Research System (CARS-27), and the Beijing Municipal Education Commission, China (CEFF-PXM2017_014207_000043).
Abstract: China has the largest apple planting area and total yield in the world, and the Fuji apple is the major cultivar, accounting for more than 70% of apple planting acreage in China. Apple quality is affected by meteorological conditions, soil types, nutrient content of the soil, and management practices. Meteorological factors such as light, temperature and moisture are key environmental conditions affecting apple quality that are difficult to regulate and control. This study was performed to determine the effect of meteorological factors on the quality of Fuji apples and to provide evidence for a reasonable regional layout and planting of Fuji apples in China. Fruit samples and meteorological data were collected from 153 commercial Fuji apple orchards located in 51 counties of 11 regions in China from 2010 to 2011. Partial least-squares regression and linear programming were used to analyze the effect model and impact weight of meteorological factors on fruit quality, to determine the major meteorological factors influencing fruit quality attributes, and to establish a regression equation to optimize meteorological factors for high-quality Fuji apples. Results showed relationships between fruit quality attributes and meteorological factors among the various apple producing counties in China. The mean, minimum, and maximum temperatures from April to October had the highest positive effects on fruit quality in model effect loadings and weights, followed by the mean annual temperature and the sunshine percentage, the temperature difference between day and night, and the total precipitation for the same period. In contrast, annual total precipitation and relative humidity from April to October had negative effects on fruit quality. The meteorological factors exhibited distinct effects on the different fruit quality attributes. Soluble solid content was affected, in descending order of influence, by annual total precipitation, the minimum temperature from April to October, the mean temperature from April to October, the temperature difference between day and night, and the mean annual temperature. The regression equation showed that the optimum meteorological conditions for fruit quality were a mean annual temperature of 5.5-18°C and an annual total precipitation of 602-1121 mm for the whole year, and, during the growing period (from April to October), a mean temperature of 13.3-19.6°C, a minimum temperature of 7.8-18.5°C, a maximum temperature of 19.5°C, a temperature difference of 13.7°C between day and night, a total precipitation of 227 mm, a relative humidity of 57.5-84.0%, and a sunshine percentage of 36.5-70.0%.
Funding: Supported by the Innovative Project of Scientific Research for Postgraduates in Ordinary Universities in Jiangsu Province (Grant No. CX09B_161Z), the Cultivation Project for Excellent Doctoral Dissertations in Hohai University, the Fundamental Research Funds for the Central Universities (Grant No. 2010B18714), and the Special Funds for Scientific Research on Public Causes of the Ministry of Water Resources of China (Grant No. 201001052).
Abstract: Unusually severe weather is occurring more frequently due to global climate change. Heat waves, rainstorms, snowstorms, and droughts are becoming increasingly common all over the world, threatening human lives and property. Temperature and precipitation are representative variables commonly used to reflect and forecast the influences of climate change directly. In this study, daily data (from 1953 to 1995) and monthly data (from 1950 to 2010) of temperature and precipitation in five regions of the Amur River were examined. The significance of changes in temperature and precipitation was tested using the Mann-Kendall test. The amplitudes of change were computed using a linear least-squares regression model, and extreme temperature and precipitation were analyzed using hydrological statistical methods. The results show the following: the mean annual temperature increased significantly from 1950 to 2010 in the five regions, mainly due to warming in spring and winter; the annual precipitation changed significantly from 1950 to 2010 only in the lower mainstream of the Amur River; the frequency of extremely low temperature events decreased from 1953 to 1995 in the mainstream of the Amur River; the frequency of high temperature events increased from 1953 to 1995 in the mainstream of the Amur River; and the frequency of extreme precipitation events did not change significantly from 1953 to 1995 in the mainstream of the Amur River. This study provides a valuable theoretical basis for settling disputes between China and Russia on sustainable development and utilization of the water resources of the Amur River.
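The two-step analysis named in the abstract above (a Mann-Kendall significance test followed by a linear least-squares estimate of the change) can be sketched as below. This is an illustrative sketch only: it assumes the basic Mann-Kendall statistic without a correction for tied values and a normal approximation for the p-value, and the function names and the example series are hypothetical rather than taken from the study.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test.

    Assumption: no tie correction is applied to the variance of S.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p_value


def ols_slope(t, y):
    """Least-squares slope of y against t (change per unit of t)."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    s_ty = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
    s_tt = sum((a - t_mean) ** 2 for a in t)
    return s_ty / s_tt


# Hypothetical annual mean temperatures (°C) for ten consecutive years.
years = list(range(2001, 2011))
temps = [1.2, 1.0, 1.5, 1.4, 1.9, 1.7, 2.1, 2.0, 2.4, 2.6]
s, z, p = mann_kendall(temps)
trend_per_decade = 10 * ols_slope(years, temps)
```

A trend would typically be called significant at the 5% level when `p < 0.05`; the slope then gives the magnitude of the change that the test alone does not provide.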
Abstract: Field infiltration measurement is often tedious, so infiltration is frequently estimated from infiltration models instead. The Horton equation is one of the popular models used to characterize field infiltration. In this study, the least-squares curve fitting technique was employed to estimate the model parameters from fifteen sets of field-measured data, giving a mean coefficient of determination (R²) of 0.811. Furthermore, plotting the measured against the calculated infiltration rate for the first six (6) measurement points yielded R² values close to unity in the regression curve, indicating a marked relationship between the two. This indicates that the Horton infiltration model can be applied to estimate the infiltration characteristics of soils in Samaru, Northern Guinea Savanna of Nigeria.
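As an illustration of least-squares curve fitting for the Horton equation, f(t) = fc + (f0 - fc)·exp(-k·t), the sketch below uses one possible approach and is not necessarily the procedure used in the study. For each trial decay constant k the model is linear in fc and (f0 - fc), so those two are obtained by ordinary least squares and k is chosen by a grid search on the residual sum of squares. The function names, the grid, the units and the data are all hypothetical.

```python
import math


def _linear_fit(z, f):
    """Ordinary least squares for f = intercept + slope * z."""
    n = len(z)
    z_mean, f_mean = sum(z) / n, sum(f) / n
    s_zz = sum((a - z_mean) ** 2 for a in z)
    s_zf = sum((a - z_mean) * (b - f_mean) for a, b in zip(z, f))
    slope = s_zf / s_zz
    return f_mean - slope * z_mean, slope


def fit_horton(t, f, k_grid):
    """Fit Horton's f(t) = fc + (f0 - fc) * exp(-k t) by separable least squares."""
    best = None
    for k in k_grid:
        z = [math.exp(-k * ti) for ti in t]
        fc, amplitude = _linear_fit(z, f)  # amplitude = f0 - fc
        sse = sum((fc + amplitude * zi - fi) ** 2 for zi, fi in zip(z, f))
        if best is None or sse < best["sse"]:
            best = {"sse": sse, "fc": fc, "f0": fc + amplitude, "k": k}
    f_mean = sum(f) / len(f)
    ss_tot = sum((fi - f_mean) ** 2 for fi in f)
    best["r2"] = 1.0 - best["sse"] / ss_tot  # coefficient of determination
    return best


# Hypothetical data: time in hours, infiltration rate in cm/h,
# generated from fc = 2, f0 = 10, k = 1.5 without noise.
t_obs = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
f_obs = [2.0 + 8.0 * math.exp(-1.5 * ti) for ti in t_obs]
fit = fit_horton(t_obs, f_obs, k_grid=[0.1 * i for i in range(1, 51)])
```

The grid search keeps the example dependency-free; a general nonlinear least-squares routine would serve the same purpose and avoids the grid resolution limit on k.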
Funding: This work was supported by the State Key Laboratory of Chemo/Biosensing and Chemometrics Foundation (No. 05-12-1), the Fok-Yingtung Educational Foundation (No. 98-7-6) and the Chongqing University Innovation Foundation of Science and Technology (No. 06-1-1).
Abstract: Six atomic fragment types of organic compounds have been defined, and a multilevel atom-pair frequency matrix has been constructed from the number of times pairs of atomic fragments occur at different bond lengths in the molecule. On this basis, a novel molecular coding technique, the characteristic atom-pair holographic code (CAHC), is obtained. The method combines several benefits: it is as easy to calculate and operate as 2D molecular topological descriptors, it carries the definite physicochemical meaning of 3D molecular structural characterization methods, and it can capture complicated molecular information. It is therefore appropriate for studies of the quantitative structure-property/activity relationship (QSPR/QSAR) of medicines and biological molecules. In this paper we apply CAHC to the quantitative prediction of reversed-phase liquid chromatography (RPLC) retention data of 33 purine derivatives and 24 steroids. The fitting multiple correlation coefficient R², the cross-validated multiple correlation coefficient Q² and the predictive ability Q²pred over the test set samples of the obtained partial least-squares (PLS) regression models are, respectively, 0.990, 0.893 and 0.977, 0.897, 0.941.
Funding: Projects (41372312, 51379194) supported by the National Natural Science Foundation of China; Project (CUGL140817) supported by the Fundamental Research Funds for the Central Universities of China University of Geosciences (Wuhan); Project (2014CFB894) supported by the Natural Science Foundation of Hubei Province of China; Project (2014M552113) supported by the China Postdoctoral Science Foundation.
Abstract: Determining a material formula requires trial-and-error experiments and consumes a large amount of time and funding. To solve this problem, a comprehensive method is established through an experiment on the artificial-similar material formula of a mine slope. We controlled the samples by compactness and arranged the formulas of the test group with the uniform formula experiment method. The physical and mechanical parameters of these samples were analyzed using partial least-squares (PLS) regression, and a mathematical model relating the physical and mechanical parameters to the formulation constituents was eventually established. We used the model to analyze the effect of each formulation constituent on the physical and mechanical parameters of the samples. The experimental results and analysis illustrate that 1) in the formulation of similar material, the raw materials have opposite effects on the internal friction angle φ and the cohesion C; and 2) the method greatly facilitates the preparation of artificial-similar materials and makes it more economical and effective.
Abstract: Logistic regression is usually used to model probabilities of categorical responses as functions of covariates. However, the link connecting the probabilities to the covariates is non-linear. We show in this paper that when the cross-classification of all the covariates and the dependent variable has no empty cells, the probabilities of responses can be expressed as linear functions of the covariates. We demonstrate this for both dichotomous and polytomous dependent variables.
Abstract: This study reports the simultaneous determination of verapamil hydrochloride and gliclazide in pharmaceuticals by chemometric approaches using UV spectrophotometry. Verapamil hydrochloride (VER) (Benzeneacetonitrile, α-[3-[[2-(3,4-dimethoxyphenyl) ethyl] methylamino]propyl]-3, 4-dimethoxy-α-(1-methylethyl) hydrochloride) is an L-type calcium channel blocker of the phenylalkylamine class. It has been used in the treatment of hypertension, angina pectoris, and cardiac arrhythmia. Gliclazide (GLZ) (1-(Hexahydrocyclopenta[c]pyrrol-2(1H)-yl)-3-[(4-methylphenyl) sulphonyl]urea) is an oral hypoglycaemic (anti-diabetic) drug and is classified as a second-generation sulfonylurea. Spectra of VER and GLZ were recorded at several concentrations within their linear ranges at wavelengths from 200 nm to 400 nm in 0.1 N HCl. Partial least squares regression (PLS) and principal component regression (PCR) were used for chemometric analysis of the data, and the parameters of the chemometric procedures were optimized. The recoveries were satisfactory and statistically comparable. The method was successfully applied to a pharmaceutical tablet formulation, with no interference from excipients, as indicated by the recovery study results. The proposed methods are simple and rapid and can easily be used in the quality control of drugs as alternative analysis tools.
Funding: Supported by the National Key R&D Program of China (Nos. 2019YFC0214800 and 2017YFC0212100) and the Beijing Municipal Science & Technology Commission (No. Z181100005418015).
Abstract: In recent years, with rapid increases in the number of vehicles in China, the contribution of vehicle exhaust emissions to air pollution has become increasingly prominent. To achieve precise control of emissions, on-road remote sensing (RS) technology has been developed and applied for law enforcement and supervision. However, data quality remains an issue affecting the development and application of RS. In this study, the RS data from a cross-road RS system used at a single site (from 2012 to 2015) were collected, the data screening process was reviewed, the issues with data quality were summarized, a new method of data screening and calibration was proposed, and the effectiveness of the improved data quality control methods was finally evaluated. The results showed that this method reduces the skewness and kurtosis of the data distribution by up to nearly 67%, which restores the actual characteristics of exhaust diffusion and is conducive to the identification of actual clean and high-emission vehicles. The annual variability of the emission factors of nitric oxide decreases by 60% on average, eliminating the annual drift of fleet emissions and improving data reliability.
Funding: National High Resolution Earth Observation System (the Civil Part) Technology Projects of China; Local Scientific & Technological Development Projects of Qinghai Guided by Central Government of China; Disaster Research Foundation of PICC P&C, No. 2017D24-03.
Abstract: The Angstrom-Prescott equation (AP) is the algorithm recommended by the Food and Agriculture Organization (FAO) of the United Nations for calculating the surface solar radiation (R_s) to support the estimation of crop evapotranspiration. Thus, the a_s and b_s coefficients in the AP are vital. This study aims to obtain coefficients a_s and b_s in the AP that are optimized for China's comprehensive agricultural divisions. The average monthly solar radiation and relative sunshine duration data at 121 stations from 1957 to 2016 were collected. Using data from 1957 to 2010, we calculated the monthly a_s and b_s coefficients for each subregion by least-squares regression. Then, taking the observed values of R_s from 2011 to 2016 as the true values, we estimated and compared the relative accuracy of R_s calculated using the regression values of coefficients a_s and b_s and that calculated with the FAO recommended coefficients. The monthly coefficients a_s and b_s of each subregion differ significantly, both temporally and spatially, from the FAO recommended coefficients. The relative error range (0-54%) of R_s calculated via the regression values of the a_s and b_s coefficients is narrower than the relative error range (0-77%) of R_s calculated using the FAO suggested coefficients. The station-mean relative error was reduced by 1% to 6%. However, the regression values of the a_s and b_s coefficients performed worse in certain months and agricultural subregions during verification. Therefore, we selected the a_s and b_s coefficients with the minimum R_s estimation error as the final coefficients and constructed a coefficient recommendation table for 36 agricultural production and management subregions in China. These coefficient recommendations enrich the case studies of coefficient calibration for the AP in China and can improve the accuracy of calculating R_s and crop evapotranspiration based on existing data.
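The calibration described in the abstract above can be illustrated with a short sketch. The Angstrom-Prescott equation in its FAO form is R_s = (a_s + b_s·n/N)·R_a, where n/N is the relative sunshine duration and R_a the extraterrestrial radiation, so a_s and b_s follow from an ordinary least-squares line of R_s/R_a against n/N. The FAO default values are a_s = 0.25 and b_s = 0.50. The function names and the monthly data below are hypothetical, and the study's own regional and monthly grouping is not reproduced.

```python
def fit_angstrom_prescott(rel_sunshine, rs, ra):
    """Least-squares estimates of (a_s, b_s) from R_s/R_a = a_s + b_s * (n/N)."""
    ratio = [s / a for s, a in zip(rs, ra)]
    n = len(ratio)
    x_mean = sum(rel_sunshine) / n
    y_mean = sum(ratio) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(rel_sunshine, ratio))
    s_xx = sum((x - x_mean) ** 2 for x in rel_sunshine)
    b_s = s_xy / s_xx
    a_s = y_mean - b_s * x_mean
    return a_s, b_s


def estimate_rs(a_s, b_s, rel_sunshine, ra):
    """Surface solar radiation predicted by the Angstrom-Prescott equation."""
    return [(a_s + b_s * x) * a for x, a in zip(rel_sunshine, ra)]


def mean_relative_error(estimated, observed):
    """Mean of |estimate - observation| / observation."""
    return sum(abs(e - o) / o for e, o in zip(estimated, observed)) / len(observed)


# Hypothetical monthly data (radiation in MJ m^-2 day^-1), generated with
# a_s = 0.20 and b_s = 0.55 so that the fit can be checked.
rel_sun = [0.35, 0.42, 0.50, 0.58, 0.63, 0.70]
ra_obs = [18.0, 24.0, 30.0, 36.0, 40.0, 41.5]
rs_obs = [(0.20 + 0.55 * x) * a for x, a in zip(rel_sun, ra_obs)]

a_fit, b_fit = fit_angstrom_prescott(rel_sun, rs_obs, ra_obs)
err_fit = mean_relative_error(estimate_rs(a_fit, b_fit, rel_sun, ra_obs), rs_obs)
err_fao = mean_relative_error(estimate_rs(0.25, 0.50, rel_sun, ra_obs), rs_obs)
```

In practice the coefficients would be fitted on one period and the errors compared on a later one, as the abstract describes with its 1957-2010 calibration and 2011-2016 verification split.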
Funding: The authors acknowledge the financial support and a research grant provided by the Thailand Research Fund (TRF) and the Faculty of Engineering at Kamphaeng Saen, Kasetsart University, Thailand.
Abstract: Dry rubber content (DRC) is an important factor in evaluating the quality of cup lump rubber. DRC analysis requires prolonged laboratory validation. To develop fast and effective DRC determination methods, this study proposed methods to evaluate the DRC of cup lump rubber using different spectroscopic measurement approaches. This involved a complete fundamental analysis leading to an efficient measurement method based on either point-based measurement using an NIR reflectance spectrometer or area-based measurement using hyperspectral imaging. A dataset of 120 samples was randomly divided into a calibration set of 90 samples and a validation set of 30 samples. To obtain an average spectrum representing each cup lump rubber sample, the spectral data were collected by locating and scanning for point-based and area-based measurement, respectively. The spectral data were calibrated against the reference values using partial least squares regression (PLSR) and least-squares support vector machine (LS-SVM) methods. The experiments showed that the area-based measurement approach performed outstandingly with both algorithms in predicting the DRC of cup lump rubber and was clearly better than the point-based measurement approach. The best PLSR predictions, expressed as the coefficient of determination (R²), the root mean square error of prediction (RMSEP) and the residual predictive deviation (RPD), were 0.99, 0.72% and 15.17, while the best LS-SVM predictions were 0.99, 0.64% and 16.83, respectively. In summary, the area-based measurement combined with the LS-SVM prediction model provided a highly accurate estimate of the DRC of cup lump rubber.
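The three figures of merit quoted in the abstract above can be computed from a validation set as sketched below. The definitions used here are common ones and are stated as assumptions, since the abstract does not spell them out: R² as one minus the ratio of residual to total sum of squares, RMSEP as the root mean squared prediction error, and RPD as the sample standard deviation of the reference values divided by RMSEP. The function name and the numbers are hypothetical.

```python
import math
import statistics


def prediction_metrics(y_ref, y_pred):
    """Return (R2, RMSEP, RPD) for reference values and model predictions."""
    n = len(y_ref)
    residual_ss = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ref_mean = sum(y_ref) / n
    total_ss = sum((r - ref_mean) ** 2 for r in y_ref)
    r2 = 1.0 - residual_ss / total_ss
    rmsep = math.sqrt(residual_ss / n)
    # Assumption: RPD uses the sample standard deviation (n - 1 denominator).
    rpd = statistics.stdev(y_ref) / rmsep
    return r2, rmsep, rpd


# Hypothetical reference and predicted values.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rmsep, rpd = prediction_metrics(reference, predicted)
```

Because RPD scales the prediction error by the spread of the reference values, it allows models calibrated on datasets with different ranges to be compared more fairly than RMSEP alone.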