Weighted total least squares (WTLS) has been regarded as the standard tool for the errors-in-variables (EIV) model, in which all elements of the observation vector and the coefficient matrix are contaminated with random errors. However, in many geodetic applications, some elements are error-free and some random observations appear repeatedly in different positions of the augmented coefficient matrix. This is called the linear structured EIV (LSEIV) model. Two kinds of methods are proposed for the LSEIV model, based on functional and stochastic modifications. On the one hand, the functional part of the LSEIV model is modified into the errors-in-observations (EIO) model. On the other hand, the stochastic model is modified by applying the Moore-Penrose inverse of the cofactor matrix. The algorithms are derived through the Lagrange multipliers method and linear approximation. The estimation principles and iterative formulas of the parameters are proven to be consistent. The first-order approximate variance-covariance matrix (VCM) of the parameters is also derived. A numerical example is given to compare the performance of the three proposed algorithms with the STLS approach. Afterwards, the least squares (LS), total least squares (TLS), and linear structured weighted total least squares (LSWTLS) solutions are compared, and the accuracy evaluation formula is shown to be feasible and effective. Finally, LSWTLS is applied to deformation analysis, where it yields better results than the traditional LS and TLS estimations.
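To make the LS/TLS distinction concrete, here is a minimal sketch for the simplest EIV case: a straight-line fit in which both coordinates carry equal-variance random errors while the intercept column is error-free. It is not the LSWTLS algorithm of the abstract; the function names `ols_line` and `tls_line`, the equal-weight assumption, and the sample data are illustrative assumptions.

```python
import math

def _centered_sums(xs, ys):
    """Return the means and the centered sums of squares and cross-products."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def ols_line(xs, ys):
    """Ordinary LS: only y is treated as noisy (vertical residuals)."""
    mx, my, sxx, _, sxy = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

def tls_line(xs, ys):
    """Equal-weight TLS: x and y are both noisy (orthogonal residuals)."""
    mx, my, sxx, syy, sxy = _centered_sums(xs, ys)
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    d = syy - sxx
    slope = (d + math.sqrt(d * d + 4 * sxy * sxy)) / (2 * sxy)
    return slope, my - slope * mx

# Hypothetical data with noise in both coordinates.
xs = [0.0, 1.1, 1.9, 3.2, 4.0]
ys = [0.2, 0.9, 2.1, 2.8, 4.1]
print("LS  (slope, intercept):", ols_line(xs, ys))
print("TLS (slope, intercept):", tls_line(xs, ys))
```

The TLS slope is never smaller in magnitude than the LS slope on the same data, which reflects the attenuation that LS suffers when the regressor itself is noisy.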
One-class classification has become a popular problem in many fields, with a wide range of applications in anomaly detection, fault diagnosis, and face recognition. We investigate the one-class classification problem for second-order tensor data. Traditional vector-based one-class classification methods such as the one-class support vector machine (OCSVM) and the least squares one-class support vector machine (LSOCSVM) have limitations when tensors are used as input data, so we propose a new tensor one-class classification method, LSOCSTM, which directly uses tensors as input data. On the one hand, using tensors as input not only makes it possible to classify tensor data; for vector data, classifying it after lifting it into a higher-dimensional tensor also improves the classification accuracy and overcomes the over-fitting problem. On the other hand, unlike the one-class support tensor machine (OCSTM), we use the squared loss instead of the original loss function, so that we solve a series of linear equations instead of quadratic programming problems. We use the distance to the hyperplane as the metric for classification, and the proposed method is more accurate and faster than existing methods. The experimental results show the high efficiency of the proposed method compared with several state-of-the-art methods.
In response to the complex characteristics of actual low-permeability tight reservoirs, this study develops a meshless numerical simulation method for oil-water two-phase flow in these reservoirs that accounts for complex boundary shapes. Using radial basis function point interpolation, the method constructs shape functions for the unknown functions within the nodal influence domain. The shape functions constructed by this meshless interpolation have δ-function properties, which facilitate the handling of essential boundary conditions such as the controlled bottom-hole flowing pressure in horizontal wells. Moreover, the meshless method offers greater flexibility than grid-cell discretization, making it simpler to discretize complex geometries. A variational principle for the system of governing flow equations is introduced using a weighted least squares meshless method, and the pressure distribution is solved implicitly. Example results demonstrate that the results of the meshless point cloud model, which has relatively few degrees of freedom, are in close agreement with those of the Discrete Fracture Model (DFM) using refined grid partitioning, with pressure calculation accuracy exceeding 98.2%. Compared to high-resolution grid-based computational methods, the meshless method can achieve a better balance between computational efficiency and accuracy. Additionally, the impact of fracture half-length on the productivity of horizontal wells is discussed. The results indicate that, in terms of cumulative oil production, increasing the fracture half-length is an effective strategy for enhancing production.
This article compares the probability method and the least squares method in the design of linear predictive models. It points out that the two approaches have distinct theoretical foundations and, depending on the assumptions made, can lead to different or similar results in terms of precision and performance. The article underlines the importance of comparing the two approaches in order to choose the one best suited to the context, the available data, and the modeling objectives.
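One standard case in which the two approaches coincide can be checked numerically. The sketch below assumes that "probability method" means maximum likelihood and that the noise is independent Gaussian with a fixed standard deviation; both are our assumptions, as are the function names and the data. Under these assumptions the negative log-likelihood is a constant plus SSE / (2σ²), so the least squares parameters also maximize the likelihood.

```python
import math

def ols_fit(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def sse(slope, intercept, xs, ys):
    """Sum of squared residuals of the line on the data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def gaussian_nll(slope, intercept, sigma, xs, ys):
    """Negative log-likelihood under i.i.d. N(0, sigma^2) residuals."""
    n = len(xs)
    const = 0.5 * n * math.log(2 * math.pi * sigma ** 2)
    return const + sse(slope, intercept, xs, ys) / (2 * sigma ** 2)

# Hypothetical data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

slope, intercept = ols_fit(xs, ys)
best = gaussian_nll(slope, intercept, 1.0, xs, ys)
print("LS fit:", slope, intercept, "NLL:", best)

# Moving away from the LS solution increases the Gaussian NLL.
for d_slope, d_int in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    nll = gaussian_nll(slope + d_slope, intercept + d_int, 1.0, xs, ys)
    print(d_slope, d_int, nll - best)
```

With a different noise model (for example, heavy-tailed errors) the likelihood is no longer a function of the SSE alone, and the two estimates generally differ.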
For multi-step prediction of chaotic time series, a direct multi-step prediction model based on partial least squares (PLS) is proposed in this article. PLS, a method for predicting a set of dependent variables from a large set of predictors, is used to model the dynamic evolution between points in the reconstructed state space and the corresponding future points. The model eliminates the error accumulation of the common single-step local model algorithm and avoids the severe multicollinearity that arises in the reconstructed state space as the embedding dimension increases. Simulation predictions are carried out on the Mackey-Glass chaotic time series. Satisfactory prediction accuracy is obtained and the efficiency of the model is verified. In the experiments, the number of extracted PLS components is set by a cross-validation procedure.
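In direct multi-step prediction, a separate regression is trained for each horizon h to map the reconstructed state straight to the value h steps ahead, instead of feeding one-step predictions back into the model. The sketch below only builds those training pairs from a delay embedding; the regression step (PLS in the abstract) is not implemented, and the function names, parameters, and toy series are illustrative assumptions.

```python
def delay_vector(series, end, dim, tau):
    """State [x(end - (dim-1)*tau), ..., x(end - tau), x(end)]."""
    return [series[end - k * tau] for k in range(dim - 1, -1, -1)]

def direct_pairs(series, dim, tau, horizon):
    """Pairs (state at time t, value at time t + horizon) for one horizon."""
    first = (dim - 1) * tau            # earliest t with a complete state
    last = len(series) - 1 - horizon   # latest t with a known target
    return [
        (delay_vector(series, t, dim, tau), series[t + horizon])
        for t in range(first, last + 1)
    ]

# Toy series so that the indexing is easy to read.
series = list(range(10))
pairs = direct_pairs(series, dim=3, tau=1, horizon=2)
print(pairs[0])    # ([0, 1, 2], 4)
print(len(pairs))  # 6
```

Because every horizon has its own targets, an error made at horizon 1 never enters the inputs of the model for horizon 2, which is the error-accumulation argument made in the abstract.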
Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares discriminant analysis (PLS-DA) to discriminate between transgenic (TCTP and mi166) and wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated from each other by the NIRS method. The performance of PLS-DA in the spectral ranges of 4 000-8 000 cm⁻¹ and 4 000-10 000 cm⁻¹ was compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000-10 000 cm⁻¹, and the correct classification rate was 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm⁻¹, again with a correct classification rate of 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and for making grain harvest and purchase policies. However, spatial variability in soil condition, temperature, and precipitation affects grain protein content, and these factors usually cannot be monitored accurately by remote sensing data from a single image. In this research, the relationships between wheat protein content at maturity and wheat agronomic parameters at different growth stages were analyzed, and multi-temporal Landsat TM images were used to estimate grain protein content by partial least squares regression. Experimental data were acquired in the suburbs of Beijing during a 2-yr experiment from 2003 to 2004. The determination coefficient, the average deviation of self-modeling, and the deviation of cross-validation were employed to assess the estimation accuracy of wheat grain protein content. Their values were 0.88, 1.30%, 3.81% and 0.72, 5.22%, 12.36% for 2003 and 2004, respectively. The research laid an agronomic foundation for grain protein content (GPC) estimation by multi-temporal remote sensing. The results showed that it is feasible to estimate the GPC of wheat over large areas from multi-temporal remote sensing data.
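For readers unfamiliar with partial least squares regression, the following is a minimal single-response (PLS1) sketch using the NIPALS deflation scheme in plain Python. It is a generic textbook variant, not the model of the study above; the function names and the tiny data set are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, [(w, p, q), ...])."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(n_components):
        # Weight vector: X-direction with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            break  # no covariance left to explain
        w = [v / norm for v in w]
        t = [dot(E[i], w) for i in range(n)]  # scores
        tt = dot(t, t)
        if tt < 1e-12:
            break
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(t, f) / tt  # regression of y on the scores
        # Deflate X and y by the part explained by this component.
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * load[j]
            f[i] -= q * t[i]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps."""
    x_mean, y_mean, comps = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat

# Hypothetical predictors; y is an exact linear function for checking.
X = [[1, 2, 0], [2, 1, 1], [3, 5, 2], [4, 3, 1], [5, 8, 4]]
y = [2 * a - b + 0.5 * c + 3 for a, b, c in X]
model = pls1_fit(X, y, n_components=3)
print(pls1_predict(model, [2, 2, 2]))
```

With as many components as the rank of the centered predictors, PLS1 reproduces the ordinary least squares fit; the practical benefit with collinear predictors such as spectral bands comes from keeping fewer components, usually chosen by cross-validation.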
Local-learning-based soft sensing methods cope successfully with the time-varying characteristics of processes as well as with nonlinearities in industrial plants. In this paper, a local partial least squares based soft sensing method for multi-output processes is proposed to accomplish process state division and local model adaptation, which are two key steps in the development of local-learning-based soft sensors. An adaptive way of partitioning process states without redundancy is proposed based on the F-test, in which unique local time regions are extracted. Subsequently, a novel anti-over-fitting criterion is proposed for online local model adaptation, which simultaneously considers the relationship between process variables and the information in labeled and unlabeled samples. A case study is carried out on two chemical processes, and the simulation results illustrate the advantages of the proposed method in several respects.
To implement real-time reduction of NOx, a rapid and accurate model is required. A PLS-ELM model, which combines partial least squares (PLS) with the extreme learning machine (ELM), is proposed for modeling the NOx emissions of utility boilers. First, the initial input variables of the NOx emission model are determined by mechanism analysis. Then, features are extracted from the initial input data by PLS. Finally, the extracted information is used as the input of the ELM model. A large amount of real data was obtained from the distributed control system (DCS) historical database of a 1 000 MW power plant boiler to train and validate the PLS-ELM model. The modeling performance of PLS-ELM was compared with that of the back propagation (BP) neural network, support vector machine (SVM), and ELM models. The mean relative errors (MRE) of the PLS-ELM model were 1.58% for the training dataset and 1.69% for the testing dataset. The prediction precision of the PLS-ELM model is higher than that of the BP, SVM, and ELM models, and its computation time is also shorter.
An approach for batch process monitoring and fault detection based on multiway kernel partial least squares (MKPLS) is presented. Conventional batch process monitoring methods, such as multiway partial least squares (MPLS), are known to be unsuitable when the variations are nonlinear because of their intrinsic linearity. To address this issue, kernel partial least squares (KPLS) was used to capture the nonlinear relationship between the latent structures and the predictive variables. In addition, KPLS requires only linear algebra and does not involve any nonlinear optimization. In this paper, the application of KPLS was extended to on-line monitoring of batch processes. The proposed batch monitoring method was applied to a simulation benchmark of the fed-batch penicillin fermentation process. The results demonstrate the superior monitoring performance of MKPLS in comparison with MPLS.
Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more accurate estimation. To develop a statistical approach to joint association analysis that includes allele detection and genetic effect estimation, we combined multivariate partial least squares regression with variable selection strategies and selected the optimal model using the Bayesian Information Criterion (BIC). We then performed extensive simulations under varying heritabilities and sample sizes to compare the performance of our method with that of single-trait multilocus methods. Joint association analysis has measurable advantages over single-trait methods, as it exhibits superior gene detection power, especially for pleiotropic genes. Sample size, heritability, polymorphic information content (PIC), and the magnitude of gene effects influence the statistical power, accuracy, and precision of effect estimation in the joint association analysis.
A quantitative structure-activity relationship (QSAR) study is proposed for predicting the solubility of some thiazolidine-4-carboxylic acid derivatives in aqueous solution. Ab initio theory was used to calculate quantum chemical descriptors, including electrostatic potentials and local charges at each atom, HOMO and LUMO energies, etc. The solubility of the thiazolidine-4-carboxylic acid derivatives was modeled as a function of molecular structure by means of partial least squares (PLS). The subset of descriptors that gave the lowest prediction error was selected by a genetic algorithm (GA). The model was then applied to predict the solubility of some thiazolidine-4-carboxylic acid derivatives that were not used in the modeling procedure. Relative errors of prediction lower than -4% were obtained with the GA-PLS method. The resulting model showed high prediction ability, with RMSEP values of 3.8836 and 2.9500 for the PLS and GA-PLS models, respectively.
The water distribution system of a residential district in Tianjin is taken as an example to analyze changes in water quality. A partial least squares (PLS) regression model, in which turbidity and Fe are regarded as the control objectives, is used to establish the statistical model. The experimental results indicate that the water quality predicted by the PLS regression model agrees well with the monitored data. The percentages of samples with absolute relative error below 15%, 20%, and 30% are 44.4%, 66.7%, 100% (turbidity) and 33.3%, 44.4%, 77.8% (Fe) at the 4th sampling point, and 77.8%, 88.9%, 88.9% (turbidity) and 44.4%, 55.6%, 66.7% (Fe) at the 5th sampling point.
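The threshold percentages quoted above can be computed as in the sketch below. The data are hypothetical values chosen only so that the output matches the pattern 44.4%, 66.7%, 100%; the function name is ours. All reported percentages are multiples of 1/9, which is consistent with nine samples per sampling point, although the abstract does not state the sample count.

```python
def share_below(measured, predicted, thresholds=(0.15, 0.20, 0.30)):
    """Percentage of samples whose absolute relative error is below each threshold."""
    rel = [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]
    return {t: 100.0 * sum(r < t for r in rel) / len(rel) for t in thresholds}

# Hypothetical monitored and predicted values (nine samples).
measured = [1.0] * 9
predicted = [1.05, 1.10, 1.12, 1.14, 1.16, 1.18, 1.22, 1.25, 1.29]

for threshold, pct in share_below(measured, predicted).items():
    print(f"error < {threshold:.0%}: {pct:.1f}% of samples")
```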
This study used near-infrared (NIR) spectroscopy to predict the mechanical properties of wood. NIR spectra were collected at wavelengths of 900–1700 nm, and the averages of the radial and tangential surface spectra were used to establish a partial least squares (PLS) model based on correlation local embedding (CLE). Mongolian oak (Quercus mongolica Fisch. ex Ledeb.) was used to test the effectiveness of the model. Cross-validation was used to verify the robustness of the CLE-PLS model. Ninety samples were used as the calibration set and forty-five as the validation set. The results show that the prediction coefficient of determination (R²p) is 0.80 for the modulus of rupture (MOR) and 0.78 for the modulus of elasticity (MOE). The ratio of performance to deviation is 2.23 for MOR and 2.15 for MOE.
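The two figures of merit quoted above are closely related. The sketch below computes the coefficient of determination and the ratio of performance to deviation (RPD, the standard deviation of the reference values divided by the root-mean-square error of prediction) under one particular convention, using the population standard deviation and an uncorrected RMSEP; conventions vary between papers, and the data and function name are hypothetical. Under this convention RPD = 1 / sqrt(1 - R²), so R² values of 0.80 and 0.78 correspond to RPD values of about 2.24 and 2.13, close to the reported 2.23 and 2.15.

```python
import math

def r2_and_rpd(reference, predicted):
    """Return (R^2, RPD) using population SD and uncorrected RMSEP."""
    n = len(reference)
    mean_ref = sum(reference) / n
    sse = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    sst = sum((r - mean_ref) ** 2 for r in reference)
    rmsep = math.sqrt(sse / n)   # root-mean-square error of prediction
    sd = math.sqrt(sst / n)      # spread of the reference values
    return 1.0 - sse / sst, sd / rmsep

# Hypothetical reference and predicted strength values.
reference = [80.0, 95.0, 110.0, 120.0, 135.0]
predicted = [85.0, 92.0, 105.0, 125.0, 130.0]

r2, rpd = r2_and_rpd(reference, predicted)
print(f"R2 = {r2:.3f}, RPD = {rpd:.2f}")
print(f"1/sqrt(1 - R2) = {1.0 / math.sqrt(1.0 - r2):.2f}")
```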
Computer-aided partial least squares is introduced to simultaneously determine the contents of deoxyschizandrin, schisandrin, and γ-schisandrin in an extract solution of wuweizi. Regression analysis of the experimental results shows that the average recovery of each component is in the range of 98.9% to 110.3%, which means that partial least squares regression spectrophotometry can circumvent the overlapping of the absorption spectra of multiple components, so that satisfactory results can be obtained without any prior separation.
The least squares projection twin support vector machine (LSPTSVM) is faster to compute than the classical least squares support vector machine (LSSVM). However, LSPTSVM is sensitive to outliers and its solution lacks sparsity. It is therefore difficult for LSPTSVM to process large-scale datasets with outliers. In this paper, we propose a robust LSPTSVM model (called R-LSPTSVM) by applying a truncated least squares loss function. The robustness of R-LSPTSVM is proved from a weighted perspective. Furthermore, we obtain a sparse solution of R-LSPTSVM by using the pivoting Cholesky factorization method in the primal space. Finally, the sparse R-LSPTSVM algorithm (SR-LSPTSVM) is proposed. Experimental results show that SR-LSPTSVM is insensitive to outliers and can handle large-scale datasets quickly.
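The effect of truncating a squared loss, and its reading as a reweighting scheme, can be illustrated on a much simpler problem than R-LSPTSVM: estimating a single location value. The abstract does not give the exact form of its loss, so the sketch assumes the common form min(r², τ²); the function names, the threshold τ, and the data are illustrative. Minimizing this loss by iterative reweighting gives weight 1 to residuals within τ and weight 0 to the rest.

```python
import statistics

def truncated_sq_loss(r, tau):
    """Squared loss capped at tau^2, so large residuals stop growing."""
    return min(r * r, tau * tau)

def robust_location(values, tau, n_iter=20):
    """Iteratively reweighted mean: weight 1 if |residual| <= tau, else 0."""
    m = statistics.median(values)  # robust starting point
    for _ in range(n_iter):
        inliers = [v for v in values if abs(v - m) <= tau]
        if not inliers:
            break
        new_m = sum(inliers) / len(inliers)  # weighted least squares step
        if new_m == m:
            break
        m = new_m
    return m

# Hypothetical sample with one gross outlier.
values = [0.9, 1.0, 1.1, 1.0, 100.0]
print("plain mean:      ", sum(values) / len(values))
print("truncated-loss m:", robust_location(values, tau=1.0))
print("loss of outlier: ", truncated_sq_loss(100.0 - 1.0, 1.0))
```

The outlier contributes at most τ² to the objective, so it cannot drag the estimate the way it drags the ordinary least squares mean.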
The identification of liquor brands is very important for food safety. Most fake liquors are made with the same flavor and alcohol content as the regular brand, so identifying liquor brands with the same flavor and the same alcohol content is essential. However, it is also difficult, because the components of such liquor samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to the identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used for identification. Samples of each type were randomly divided into modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. In the modeling and validation processes based on the PLS-DA method, the recognition rates reached 99.1% and 98.7%, respectively. The results show high prediction performance for the identification of liquor brands and are clearly better than those obtained with the principal component linear discriminant analysis method. NIR spectroscopy combined with the PLS-DA method provides a quick and effective means of discriminant analysis of liquor brands and is also a promising tool for large-scale inspection of liquor food safety.
Near-infrared (NIR) spectroscopy was applied to the reagent-free quantitative analysis of polysaccharide in oral solution samples of a branded proprietary Chinese medicine (PCM) product. A novel method, called absorbance upper optimization partial least squares (AUO-PLS), was proposed and successfully applied to wavelength selection. Parameter optimization was performed over varied partitionings of the calibration and prediction sample sets to achieve stability. With the AUO-PLS method, the selected upper bound of appropriate absorbance was 1.53 and the corresponding waveband combination was 400-1880 nm and 2088-2346 nm. Using random validation samples excluded from the modeling process, the root-mean-square error and correlation coefficient of prediction for polysaccharide were 27.09 mg·L⁻¹ and 0.888, respectively. The results indicate that the NIR predicted values are close to the measured values. NIR spectroscopy combined with the AUO-PLS method provides a promising tool for quantifying polysaccharide in PCM oral solution, and the technique is rapid and simple compared with conventional methods.
When the total least squares (TLS) solution is used to estimate the parameters of the errors-in-variables (EIV) model, the resulting parameter estimates are unreliable if the observations contain systematic errors. To solve this problem, we propose adding a nonparametric part (the systematic errors) to the partial EIV model and use the extended model to weaken the influence of systematic errors. Then, having rewritten the model as a nonlinear model, we derive the formula for the parameter estimates based on the penalized total least squares criterion. Furthermore, based on the second-order approximation method of precision estimation, we derive the second-order bias and covariance of the parameter estimates and calculate the mean square error (MSE). For the selection of the smoothing factor, we propose using the U-curve method. The experiments show that, compared with the traditional method, the proposed method can mitigate the influence of systematic errors to a certain extent and yields more reliable parameter estimates and precision information, which validates the feasibility and effectiveness of the proposed method.
As important components of air pollution, volatile organic compounds (VOCs) can cause great harm to the environment and the human body. Changes in VOC concentration should therefore be tracked in real-time environmental monitoring systems. To solve the problem of wavelength redundancy in full-spectrum partial least squares (PLS) modeling for VOC concentration analysis, a new method based on improved interval PLS (iPLS) integrated with Monte-Carlo sampling, called the iPLS-MC method, was proposed to select the optimal characteristic wavelengths of VOC spectra. The method uses iPLS modeling to preselect the characteristic wavebands of the spectra and generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling. The wavelength combination with the best prediction result in the regression model is selected as the characteristic wavelengths of the spectrum. Different wavelength selection methods were applied to Fourier transform infrared (FTIR) spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory. When the interval number of the iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times, the number of characteristic wavelengths selected by the iPLS-MC method is reduced from 8916 to 10, which is only 0.22% of the full-spectrum wavelengths. The RMSECV and correlation coefficient (Rc) for ethylene are 0.2977 ppm and 0.9999, and those for ethanol gas are 0.2977 ppm and 0.9999. The experimental results show that the iPLS-MC method can select the optimal characteristic wavelengths of VOC FTIR spectra stably and effectively, and that using the characteristic wavelengths significantly improves the prediction performance of the regression model while simplifying it.
Funding (linear structured EIV study): the National Natural Science Foundation of China (Grant Nos. 42074016, 42104025, 42274057, and 41704007), the Hunan Provincial Natural Science Foundation of China (Grant No. 2021JJ30244), and the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 22B0496).
Funding: Supported by the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No. 2010R50028), and the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No. 2006BAK02A18).
Abstract: Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) from wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated from each other by the NIRS method. The performance of PLS-DA in the spectral ranges 4 000-8 000 cm⁻¹ and 4 000-10 000 cm⁻¹ was compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range 4 000-10 000 cm⁻¹, with a correct classification rate of 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range 4 000-10 000 cm⁻¹, again with a correct classification rate of 100.0%. In conclusion, NIRS combined with PLS-DA can be used to discriminate transgenic rice.
Funding: Supported by the National Natural Science Foundation of China (41171281, 40701120) and the Beijing Nova Program, China (2008B33).
Abstract: Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and for setting grain harvest and purchase policies. However, spatial variability in soil condition, temperature, and precipitation affects grain protein content, and these factors usually cannot be monitored accurately with remote sensing data from a single image. In this research, the relationships between wheat protein content at maturity and wheat agronomic parameters at different growth stages were analyzed, and multi-temporal Landsat TM images were used to estimate grain protein content (GPC) by partial least squares regression. Experimental data were acquired in the suburbs of Beijing during a 2-yr experiment from 2003 to 2004. The determination coefficient, the average deviation of self-modeling, and the deviation of cross-validation were used to assess the estimation accuracy of wheat GPC. Their values were 0.88, 1.30%, and 3.81% for 2003 and 0.72, 5.22%, and 12.36% for 2004, respectively. The research lays an agronomic foundation for GPC estimation by multi-temporal remote sensing. The results show that it is feasible to estimate wheat GPC over large areas from multi-temporal remote sensing data.
Funding: Supported by the National Natural Science Foundation of China (61273160) and the Fundamental Research Funds for the Central Universities (14CX06067A, 13CX05021A).
Abstract: Local learning based soft sensing methods succeed in coping with the time-varying characteristics of processes as well as the nonlinearities in industrial plants. In this paper, a local partial least squares based soft sensing method for multi-output processes is proposed to accomplish process state division and local model adaptation, the two key steps in developing local learning based soft sensors. An adaptive, F-test based way of partitioning process states without redundancy is proposed, in which unique local time regions are extracted. Subsequently, a novel anti-over-fitting criterion is proposed for online local model adaptation; it simultaneously considers the relationship between process variables and the information in labeled and unlabeled samples. A case study is carried out on two chemical processes, and simulation results illustrate the advantages of the proposed method in several respects.
Funding: The National Natural Science Foundation of China (No. 71471060) and the Natural Science Foundation of Hebei Province (No. E2018502111).
Abstract: To implement real-time NOx reduction, a rapid and accurate model is required. A PLS-ELM model, which combines partial least squares (PLS) with the extreme learning machine (ELM), is proposed for modeling the NOx emissions of utility boilers. First, the initial input variables of the NOx emission model are determined by mechanism analysis. Then, PLS is used to extract information from the initial input data. Finally, the extracted information is used as the input of the ELM model. A large amount of real data was obtained from the distributed control system (DCS) historical database of a 1 000 MW power plant boiler to train and validate the PLS-ELM model. The modeling performance of PLS-ELM was compared with that of the back propagation (BP) neural network, support vector machine (SVM), and ELM models. The mean relative errors (MRE) of the PLS-ELM model were 1.58% for the training dataset and 1.69% for the testing dataset. The prediction precision of the PLS-ELM model is higher than that of the BP, SVM, and ELM models, and its computation time is also shorter.
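The ELM stage and the mean relative error (MRE) metric named in the preceding abstract can be sketched as follows. This is an illustration under assumptions rather than the paper's model: the inputs `X` are synthetic and merely stand in for PLS-extracted features, the tanh activation, the 40 hidden nodes, and the centering of the target are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def elm_fit(X, y, n_hidden, seed=0):
    """Extreme learning machine: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never trained
    b = rng.normal(size=n_hidden)                 # random hidden biases, never trained
    H = np.tanh(X @ W + b)                        # hidden-layer output matrix
    y_mean = y.mean()                             # centering the target is an assumption
    beta, *_ = np.linalg.lstsq(H, y - y_mean, rcond=None)
    return W, b, beta, y_mean

def elm_predict(model, X):
    W, b, beta, y_mean = model
    return np.tanh(X @ W + b) @ beta + y_mean

def mean_relative_error(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))

# Synthetic data (assumption): three input features and a smooth nonlinear target.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(300, 3))
y = 300.0 + 50.0 * np.sin(2.0 * X[:, 0]) + 20.0 * X[:, 1] * X[:, 2]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

model = elm_fit(X_train, y_train, n_hidden=40)
mre_train = mean_relative_error(y_train, elm_predict(model, X_train))
mre_test = mean_relative_error(y_test, elm_predict(model, X_test))
# Reference point: always predicting the training mean.
mre_baseline = mean_relative_error(y_train, np.full_like(y_train, y_train.mean()))
print(f"MRE train {mre_train:.4%}, test {mre_test:.4%}, mean-only baseline {mre_baseline:.4%}")
```

Only the output weights are fitted, and that fit is a single linear least-squares solve, which is the usual explanation for the short training time of ELM-type models.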
Funding: National Natural Science Foundation of China (No. 61074079) and Shanghai Leading Academic Discipline Project, China (No. B504).
Abstract: An approach to batch process monitoring and fault detection based on multiway kernel partial least squares (MKPLS) is presented. Conventional batch process monitoring methods such as multiway partial least squares (MPLS) are intrinsically linear and are therefore unsuitable when the variations are nonlinear. To address this issue, kernel partial least squares (KPLS) is used to capture the nonlinear relationship between the latent structures and the predictive variables. In addition, KPLS requires only linear algebra and does not involve any nonlinear optimization. In this paper, the application of KPLS is extended to the on-line monitoring of batch processes. The proposed batch monitoring method is applied to a simulation benchmark of the fed-batch penicillin fermentation process, and the results demonstrate the superior monitoring performance of MKPLS compared with MPLS.
Funding: Supported by grants from the National Program on the Development of Basic Research (2011CB100100), the Priority Academic Program Development of Jiangsu Higher Education Institutions, the National Natural Science Foundations (31391632, 31200943, 31171187, and 91535103), the National High-tech R&D Program (863 Program) (2014AA10A601-5), the Natural Science Foundations of Jiangsu Province (BK20150010), the Natural Science Foundation of the Jiangsu Higher Education Institutions (14KJA210005), and the Innovative Research Team of Universities in Jiangsu Province (KYLX_1352).
Abstract: Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more accurate estimation. To develop a statistical approach to joint association analysis that includes allele detection and genetic effect estimation, we combined multivariate partial least squares regression with variable selection strategies and selected the optimal model using the Bayesian Information Criterion (BIC). We then performed extensive simulations under varying heritabilities and sample sizes to compare the performance of our method with that of single-trait multilocus methods. Joint association analysis has measurable advantages over single-trait methods, as it exhibits superior gene detection power, especially for pleiotropic genes. Sample size, heritability, polymorphic information content (PIC), and the magnitude of gene effects influence the statistical power of the joint association analysis and the accuracy and precision of its effect estimates.
Abstract: A quantitative structure-activity relationship (QSAR) study is proposed for predicting the solubility of some thiazolidine-4-carboxylic acid derivatives in aqueous solution. Ab initio theory was used to calculate quantum chemical descriptors, including electrostatic potentials and local charges at each atom and HOMO and LUMO energies. The solubility of the thiazolidine-4-carboxylic acid derivatives was modeled as a function of molecular structure by means of partial least squares (PLS). The subset of descriptors that gave a low prediction error was selected by a genetic algorithm (GA). The model was then applied to predict the solubility of some thiazolidine-4-carboxylic acid derivatives that were not used in the modeling procedure. Relative errors of prediction lower than -4% were obtained with the GA-PLS method. The resulting models showed high prediction ability, with RMSEP values of 3.8836 and 2.9500 for the PLS and GA-PLS models, respectively.
Funding: Supported by the National Natural Science Foundation of China (No. 50478086) and the Tianjin Special Scientific Innovation Foundation (No. 06FZZDSH00900).
Abstract: The water distribution system of a residential district in Tianjin is taken as an example to analyze changes in water quality. A partial least squares (PLS) regression model, with turbidity and Fe as the control objectives, is used to establish the statistical model. The experimental results indicate that the water quality predicted by the PLS regression model agrees well with the monitored data. The percentages of predictions with absolute relative error below 15%, 20%, and 30% are 44.4%, 66.7%, and 100% (turbidity) and 33.3%, 44.4%, and 77.8% (Fe) at the 4th sampling point, and 77.8%, 88.9%, and 88.9% (turbidity) and 44.4%, 55.6%, and 66.7% (Fe) at the 5th sampling point.
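The threshold percentages in the preceding abstract are all multiples of one ninth, which is consistent with nine predictions per sampling point, although the abstract does not state the sample count. The sketch below shows how such shares are computed; the error values are hypothetical and chosen only to reproduce the pattern 44.4%, 66.7%, 100%.

```python
def share_within(relative_errors_pct, thresholds_pct):
    """Percentage of predictions whose absolute relative error is below each threshold."""
    n = len(relative_errors_pct)
    return [
        round(100.0 * sum(abs(e) < t for e in relative_errors_pct) / n, 1)
        for t in thresholds_pct
    ]

# Hypothetical relative errors (%) for nine predictions at one sampling point.
errors = [3.1, -8.4, 12.0, 14.9, 16.2, -18.7, 22.5, 25.0, -29.3]
shares = share_within(errors, [15, 20, 30])
print(shares)  # [44.4, 66.7, 100.0]
```

Four of the nine hypothetical errors fall below 15%, six below 20%, and all nine below 30%, giving 4/9, 6/9, and 9/9.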
Funding: Financially supported by the China State Forestry Administration “948” projects (2015-4-52), the Fundamental Research Funds for the Central Universities (2572017DB05), and the Heilongjiang Natural Science Foundation (C2017005).
Abstract: This study used near-infrared (NIR) spectroscopy to predict the mechanical properties of wood. NIR spectra were collected over wavelengths of 900–1700 nm, and the average of the radial and tangential surface spectra was used to establish a partial least squares (PLS) model based on correlation local embedding (CLE). Mongolian oak (Quercus mongolica Fisch. ex Ledeb.) was used to test the effectiveness of the model. Cross-validation was used to verify the robustness of the CLE–PLS model. Ninety samples were used as the calibration set and forty-five as the validation set. The results show that the prediction coefficient of determination (R²p) is 0.80 for the modulus of rupture (MOR) and 0.78 for the modulus of elasticity (MOE). The ratio of performance to deviation is 2.23 for MOR and 2.15 for MOE.
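The two validation metrics reported in the preceding abstract are closely linked, as the following sketch shows. It assumes particular definitions that the abstract does not spell out: the ratio of performance to deviation (RPD) is taken as the population standard deviation of the reference values divided by the root-mean-square error of prediction, and R²p as one minus the ratio of residual to total sum of squares. Under those assumptions R²p = 1 − 1/RPD² exactly. The sample values are hypothetical.

```python
import math
import statistics

def rmsep(y_ref, y_pred):
    """Root-mean-square error of prediction."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / len(y_ref))

def r2_prediction(y_ref, y_pred):
    """Coefficient of determination on the validation set."""
    mean_ref = statistics.fmean(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot

def rpd(y_ref, y_pred):
    """Ratio of performance to deviation (population SD over RMSEP; an assumed definition)."""
    return statistics.pstdev(y_ref) / rmsep(y_ref, y_pred)

def r2_from_rpd(value):
    """R2 implied by an RPD value under the definitions above."""
    return 1.0 - 1.0 / value ** 2

# Hypothetical reference and predicted strength values, for illustration only.
y_ref = [95.0, 110.0, 102.0, 120.0, 88.0, 105.0]
y_pred = [98.0, 106.0, 104.0, 114.0, 93.0, 103.0]
print(r2_prediction(y_ref, y_pred), r2_from_rpd(rpd(y_ref, y_pred)))

# The RPD values quoted in the abstract imply R2 values that match the quoted R2p.
print(round(r2_from_rpd(2.23), 2), round(r2_from_rpd(2.15), 2))  # 0.8 0.78
```

With these definitions the reported pairs (0.80, 2.23) and (0.78, 2.15) are mutually consistent; a different RPD convention, such as one based on the sample standard deviation or a bias-corrected error, would shift the relation slightly.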
Abstract: Computer-aided partial least squares is introduced to simultaneously determine the contents of Deoxyschizandin, Schisandrin, and r-Schisandrin in an extracted solution of wuweizi. Regression analysis of the experimental results shows that the average recovery of each component lies in the range from 98.9% to 110.3%, which means that partial least squares regression spectrophotometry can circumvent the overlapping absorption spectra of the multiple components, so that satisfactory results can be obtained without any pre-separation.
Funding: Supported by the National Natural Science Foundation of China (61772020, 62202433, 62172371, 62272422, 62036010), the Natural Science Foundation of Henan Province (22100002), and the Postdoctoral Research Grant in Henan Province (202103111).
Abstract: The least squares projection twin support vector machine (LSPTSVM) computes faster than the classical least squares support vector machine (LSSVM). However, LSPTSVM is sensitive to outliers and its solution lacks sparsity, so it has difficulty processing large-scale datasets that contain outliers. In this paper, we propose a robust LSPTSVM model (called R-LSPTSVM) by applying a truncated least squares loss function. The robustness of R-LSPTSVM is proved from a weighted perspective. Furthermore, we obtain a sparse solution of R-LSPTSVM by using the pivoting Cholesky factorization method in the primal space. Finally, the sparse R-LSPTSVM algorithm (SR-LSPTSVM) is proposed. Experimental results show that SR-LSPTSVM is insensitive to outliers and can process large-scale datasets quickly.
Abstract: The identification of liquor brands is very important for food safety. Most fake liquors are made with the same flavor and alcohol content as the regular brand, so identifying liquor brands with the same flavor and the same alcohol content is essential. It is also difficult, because the components of such liquor samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to the identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used. Samples of each type were randomly divided into modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. In the modeling and validation processes based on the PLS-DA method, the recognition rates were 99.1% and 98.7%, respectively. The results show high prediction performance for the identification of liquor brands and were clearly better than those obtained with the principal component linear discriminant analysis method. NIR spectroscopy combined with the PLS-DA method provides a quick and effective means of discriminating liquor brands and is a promising tool for large-scale inspection of liquor food safety.
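The Kennard-Stone split mentioned in the preceding abstract can be sketched as follows. This is the textbook form of the algorithm using Euclidean distance on raw features; the abstract does not say which distance or preprocessing was used, so those are assumptions, and the function name is hypothetical.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select samples (n_select >= 2) chosen to cover feature space evenly."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # For each candidate, find the distance to its nearest already-selected sample,
        # then pick the candidate for which that distance is largest.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Tiny one-feature example: samples at 0, 1, 2, and 10.
samples = [[0.0], [1.0], [2.0], [10.0]]
calibration_idx = kennard_stone(samples, 3)
print(calibration_idx)  # [0, 3, 2]
```

The two extremes (0 and 10) are chosen first; the sample at 2 follows because it lies farther from its nearest selected neighbor than the sample at 1 does. The selected indices would form the calibration set and the rest the prediction set.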
Abstract: Near-infrared (NIR) spectroscopy was applied to the reagent-free quantitative analysis of polysaccharide in oral solution samples of a brand of proprietary Chinese medicine (PCM). A novel method, called absorbance upper optimization partial least squares (AUO-PLS), was proposed and successfully applied to wavelength selection. Parameter optimization was performed over varied partitionings of the calibration and prediction sample sets to achieve stability. With the AUO-PLS method, the selected upper bound of appropriate absorbance was 1.53 and the corresponding waveband combination was 400-1880 & 2088-2346 nm. For random validation samples excluded from the modeling process, the root-mean-square error and correlation coefficient of prediction for polysaccharide were 27.09 mg·L⁻¹ and 0.888, respectively. The results indicate that the NIR predicted values are close to the measured values. NIR spectroscopy combined with the AUO-PLS method provides a promising tool for quantifying polysaccharide in PCM oral solution, and the technique is rapid and simple compared with conventional methods.
Funding: Supported by the National Natural Science Foundation of China (Nos. 41874001 and 41664001), the Support Program for Outstanding Youth Talents in Jiangxi Province (No. 20162BCB23050), and the National Key Research and Development Program (No. 2016YFB0501405).
Abstract: When the total least squares (TLS) solution is used to estimate the parameters of the errors-in-variables (EIV) model, the resulting parameter estimates are unreliable if the observations contain systematic errors. To solve this problem, we propose adding a nonparametric part (the systematic errors) to the partial EIV model, so that the extended partial EIV model weakens the influence of systematic errors. Then, having rewritten the model as a nonlinear model, we derive the formula for the parameter estimates based on the penalized total least squares criterion. Furthermore, based on the second-order approximation method of precision estimation, we derive the second-order bias and covariance of the parameter estimates and calculate the mean square error (MSE). For the selection of the smoothing factor, we propose using the U curve method. The experiments show that, compared with the traditional method, the proposed method mitigates the influence of systematic errors to a certain extent and yields more reliable parameter estimates and precision information, which validates its feasibility and effectiveness.
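As background for the preceding abstract, the classical unweighted TLS solution of an EIV model can be computed from a singular value decomposition. The sketch below shows only that baseline, with equal-variance independent errors in both the coefficient matrix and the observation vector; it is not the penalized partial EIV method the abstract proposes, and the data and the name `tls_solve` are hypothetical.

```python
import numpy as np

def tls_solve(A, b):
    """Classical total least squares solution of A x ~ b via SVD of the augmented matrix [A b]."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(np.column_stack([A, b]))
    v = Vt[-1]                      # right singular vector of the smallest singular value
    if abs(v[n]) < 1e-12:
        raise ValueError("TLS solution does not exist for this data")
    return -v[:n] / v[n]

rng = np.random.default_rng(0)
A_true = rng.normal(size=(50, 2))
x_true = np.array([2.0, -1.0])
b_true = A_true @ x_true

# Error-free data: TLS reproduces the true parameters.
x_exact = tls_solve(A_true, b_true)

# Random errors in BOTH the coefficient matrix and the observations (the EIV setting).
A_obs = A_true + 0.05 * rng.normal(size=A_true.shape)
b_obs = b_true + 0.05 * rng.normal(size=50)
x_tls = tls_solve(A_obs, b_obs)
x_ls, *_ = np.linalg.lstsq(A_obs, b_obs, rcond=None)   # ordinary LS for comparison
print("TLS:", x_tls, "LS:", x_ls)
```

This baseline treats all errors as random with zero mean, which is why a systematic component in the observations, as discussed in the abstract, would bias it.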
Funding: Supported by the National Key Scientific Instrument and Equipment Development Project of China (Grant No. 2013YQ220643) and the National 863 Program of China (Grant No. 2014AA06A503).
Abstract: As important components of air pollutants, volatile organic compounds (VOCs) can cause great harm to the environment and the human body, so real-time environmental monitoring systems should track changes in VOC concentration. To solve the problem of wavelength redundancy in full-spectrum partial least squares (PLS) modeling for VOC concentration analysis, a new method based on improved interval PLS (iPLS) integrated with Monte-Carlo sampling, called the iPLS-MC method, is proposed to select the optimal characteristic wavelengths of VOC spectra. The method uses iPLS modeling to preselect the characteristic wavebands of the spectra and then generates random wavelength combinations from the selected wavebands by Monte-Carlo sampling. The wavelength combination with the best prediction result in the regression model is selected as the characteristic wavelengths of the spectrum. Different wavelength selection methods were applied to Fourier transform infrared (FTIR) spectra of ethylene and ethanol gas at different concentrations obtained in the laboratory. When the interval number of the iPLS model is set to 30 and the Monte-Carlo sampling runs 1000 times, the iPLS-MC method reduces the number of characteristic wavelengths from 8916 to 10, which is only 0.22% of the full-spectrum wavelengths. The RMSECV and correlation coefficient (Rc) are 0.2977 ppm and 0.9999 for ethylene, and 0.2977 ppm and 0.9999 for ethanol gas. The experimental results show that the iPLS-MC method selects the optimal characteristic wavelengths of VOC FTIR spectra stably and effectively, and that using characteristic wavelengths significantly improves the prediction performance of the regression model while simplifying it.
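The two-stage selection described in the preceding abstract (preselect wavebands by interval, then sample random wavelength combinations) can be sketched as follows. This is a simplified stand-in, not the iPLS-MC method itself: ordinary least squares with a single hold-out split replaces PLS with cross-validation, the spectra are synthetic, and every name and parameter value is hypothetical.

```python
import numpy as np

def holdout_rmse(X, y, cols, n_train):
    """Fit least squares (with intercept) on the first n_train rows; RMSE on the rest."""
    A = np.column_stack([X[:, cols], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    resid = A[n_train:] @ coef - y[n_train:]
    return float(np.sqrt(np.mean(resid ** 2)))

def select_wavelengths(X, y, n_intervals, n_keep, k, n_draws, n_train, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: split the wavelength axis into equal intervals and keep the best n_keep.
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    scores = [holdout_rmse(X, y, iv, n_train) for iv in intervals]
    best = np.argsort(scores)[:n_keep]
    candidates = np.concatenate([intervals[i] for i in best])
    # Stage 2: Monte Carlo sampling of k-wavelength combinations from the kept intervals.
    best_cols, best_err = None, np.inf
    for _ in range(n_draws):
        cols = np.sort(rng.choice(candidates, size=k, replace=False))
        err = holdout_rmse(X, y, cols, n_train)
        if err < best_err:
            best_cols, best_err = cols, err
    return best_cols, best_err, candidates

# Synthetic "spectra" (assumption): 60 samples, 40 wavelengths, only two are informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 40))
y = 2.0 * X[:, 5] + X[:, 6] + 0.01 * rng.normal(size=60)

cols, err, candidates = select_wavelengths(
    X, y, n_intervals=8, n_keep=2, k=3, n_draws=200, n_train=40
)
print("selected wavelength indices:", cols, "hold-out RMSE:", err)
```

Restricting the random search to the preselected intervals is what keeps the number of draws manageable: sampling combinations directly from thousands of wavelengths would rarely hit an informative subset.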