Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more accurate estimation. To develop a statistical approach to joint association analysis that includes allele detection and genetic effect estimation, we combined multivariate partial least squares regression with variable selection strategies and selected the optimal model using the Bayesian Information Criterion (BIC). We then performed extensive simulations under varying heritabilities and sample sizes to compare the performance of our method with that of single-trait multilocus methods. Joint association analysis has measurable advantages over single-trait methods, as it exhibits superior gene detection power, especially for pleiotropic genes. Sample size, heritability, polymorphic information content (PIC), and magnitude of gene effects influence the statistical power, accuracy, and precision of effect estimation by the joint association analysis.
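The abstract above names BIC as the model selection criterion but does not give the formula used. As a minimal sketch, assuming a least-squares fit with Gaussian errors, BIC can be written as n·ln(RSS/n) + k·ln(n), and the candidate with the smallest value is kept. The candidate models, their residual sums of squares, and the sample size below are made-up illustrations, not values from the study.

```python
import math

def bic_gaussian(rss, n, k):
    """BIC for a least-squares model with Gaussian errors (additive constants dropped).

    rss: residual sum of squares, n: number of observations, k: number of fitted parameters.
    """
    return n * math.log(rss / n) + k * math.log(n)

def select_by_bic(candidates, n):
    """candidates: list of (label, rss, k). Returns the label with the smallest BIC."""
    return min(candidates, key=lambda c: bic_gaussian(c[1], n, c[2]))[0]

# Hypothetical candidate models: more loci lower the RSS but pay a larger penalty.
candidates = [
    ("3 loci", 120.0, 3),
    ("5 loci", 90.0, 5),
    ("9 loci", 88.0, 9),
]
best = select_by_bic(candidates, n=200)
print(best)
```

In this toy case the nine-locus model barely reduces the residual error, so the k·ln(n) penalty makes the five-locus model preferable.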
In several land-use/cover change (LUCC) studies, statistical methods are used to analyze land use data. A problem with conventional statistical methods in land use analysis is that they assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when observations are few. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset, and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region. They also show that the first PLS factor is sufficient to describe land use patterns quantitatively and that most of the statistical relations derived from it agree with the facts. As the explanatory capacity of the later PLS factors decreases, the reliability of the model outcome decreases correspondingly.
The paper considers a multivariate partially linear model under independent errors and investigates the asymptotic bias and variance-covariance of the parametric component β and the nonparametric component F(·) obtained by the GJS estimator and kernel estimation.
The UV absorption spectra of o-naphthol, α-naphthylamine, 2,7-dihydroxynaphthalene, 2,4-dimethoxybenzaldehyde and methyl salicylate overlap severely; therefore it is impossible to determine them in mixtures by traditional spectrophotometric methods. In this paper, partial least-squares (PLS) regression is applied to the simultaneous determination of these compounds in mixtures by UV spectrophotometry without any pretreatment of the samples. Ten synthetic mixture samples are analyzed by the proposed method. The mean recoveries are 99.4%, 99.6%, 100.2%, 99.3% and 99.1%, and the relative standard deviations (RSD) are 1.87%, 1.98%, 1.94%, 0.960% and 0.672%, respectively.
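The recovery and RSD figures quoted above are standard summary statistics. The abstract does not spell out the computation, so the following is only a sketch under the usual definitions: recovery is the found amount as a percentage of the added amount, and RSD is the sample standard deviation of the recoveries divided by their mean. The `added` and `found` values are hypothetical.

```python
import statistics

def recovery_percent(found, added):
    """Recovery of one spiked sample, in percent."""
    return 100.0 * found / added

def mean_recovery_and_rsd(found_values, added_values):
    """Return (mean recovery %, relative standard deviation %) over all samples."""
    recoveries = [recovery_percent(f, a) for f, a in zip(found_values, added_values)]
    mean = statistics.mean(recoveries)
    rsd = 100.0 * statistics.stdev(recoveries) / mean  # sample (n - 1) standard deviation
    return mean, rsd

# Hypothetical amounts for one analyte in four synthetic mixtures (arbitrary units).
added = [10.0, 10.0, 20.0, 20.0]
found = [9.9, 10.1, 19.8, 20.2]
mean_rec, rsd = mean_recovery_and_rsd(found, added)
print(f"mean recovery = {mean_rec:.1f}%, RSD = {rsd:.2f}%")
```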
Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, hyperspectral leaf reflectance of rice (Oryza sativa L.) was measured on groups of healthy leaves and leaves infected by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda de Haan) over the wavelength range from 350 to 2500 nm. The percentage of leaf surface covered by lesions was estimated and defined as the disease severity. Multiple stepwise regression, principal component analysis and partial least-square regression were used to estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regression could efficiently estimate disease severity with three wavebands in seven steps. The root mean square errors (RMSEs) for the training (n=210) and testing (n=53) datasets were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted disease severity with RMSEs of 16.3% and 13.9% for the training and testing datasets, respectively. Partial least-square regression with seven extracted factors predicted disease severity most effectively among the methods compared, with RMSEs of 4.1% and 2.0% for the training and testing datasets, respectively. Our research demonstrates that it is feasible to estimate the disease severity of rice brown spot using hyperspectral reflectance data at the leaf level.
China has the largest apple planting area and total yield in the world, and the Fuji apple is the major cultivar, accounting for more than 70% of apple planting acreage in China. Apple qualities are affected by meteorological conditions, soil types, nutrient content of soil, and management practices. Meteorological factors, such as light, temperature and moisture, are key environmental conditions affecting apple quality that are difficult to regulate and control. This study was performed to determine the effect of meteorological factors on the qualities of Fuji apple and to provide evidence for a reasonable regional layout and planting of Fuji apple in China. Fruit samples of Fuji apple and meteorological data were collected from 153 commercial Fuji apple orchards located in 51 counties of 11 regions in China from 2010 to 2011. Partial least-squares regression and linear programming were used to analyze the effect model and impact weight of meteorological factors on fruit quality, to determine the major meteorological factors influencing fruit quality attributes, and to establish a regression equation to optimize meteorological factors for high-quality Fuji apples. Results showed relationships between fruit quality attributes and meteorological factors among the various apple producing counties in China. The mean, minimum, and maximum temperatures from April to October had the highest positive effects on fruit qualities in model effect loadings and weights, followed by the mean annual temperature and the sunshine percentage, the temperature difference between day and night, and the total precipitation for the same period. In contrast, annual total precipitation and relative humidity from April to October had negative effects on fruit quality. The meteorological factors exhibited distinct effects on the different fruit quality attributes. Soluble solid content was affected, in descending order of influence, by annual total precipitation, the minimum temperature from April to October, the mean temperature from April to October, the temperature difference between day and night, and the mean annual temperature. The regression equation showed that the optimum meteorological conditions for fruit quality were a mean annual temperature of 5.5-18°C and an annual total precipitation of 602-1121 mm for the whole year, and a mean temperature of 13.3-19.6°C, a minimum temperature of 7.8-18.5°C, a maximum temperature of 19.5°C, a temperature difference of 13.7°C between day and night, a total precipitation of 227 mm, a relative humidity of 57.5-84.0%, and a sunshine percentage of 36.5-70.0% during the growing period (from April to October).
Multivariate statistical process monitoring and control (MSPM&C) methods for chemical process monitoring with statistical projection techniques such as principal component analysis (PCA) and partial least squares (PLS) are surveyed in this paper. The four-step procedure for performing MSPM&C on a chemical process is analyzed: modeling the process, detecting abnormal events or faults, identifying the variable(s) responsible for the faults, and diagnosing the root cause of the abnormal behavior. Several main research directions of MSPM&C reported in the literature are discussed, such as multi-way principal component analysis (MPCA) for batch processes, statistical monitoring and control for nonlinear processes, dynamic PCA and dynamic PLS, and on-line quality control by inferential models. Industrial applications of MSPM&C to several typical chemical processes, such as chemical reactors, distillation columns, polymerization processes, and petroleum refinery units, are summarized. Finally, some concluding remarks and future considerations are made.
Data-driven partial differential equation identification is a potential breakthrough for addressing the lack of physical equations in complex dynamic systems. However, existing equation identification methods still cannot effectively identify equations from multivariable complex systems. In this work, we combine physical constraints, such as the dimension and direction of the equation, with a data-driven method and successfully identify the Navier-Stokes equations from the flow field data of a Kármán vortex street. This method provides an effective approach to identifying partial differential equations of multivariable complex systems.
A simple and rapid analytical method for the simultaneous quantification of three commercial azo dyes, Tartrazine (TAR), Congo Red (CR), and Amido Black (AB), in water is presented. The simultaneous assessment of the individual concentration of an organic dye in mixtures using a spectrophotometric method is a difficult procedure in analytical chemistry, due to spectral overlapping. This drawback can be overcome if a multivariate calibration method such as Partial Least Squares Regression (PLSR) is used. This study presents a calibration model based on absorption spectra in the 300-650 nm range for a set of 20 different mixtures of dyes, followed by the prediction of the concentrations of dyes in 6 randomly selected validation mixtures using the PLSR method. Estimated limits of detection (LOD) were 0.106, 0.047 and 0.079 mg/L for TAR, CR, and AB, respectively, and limits of quantification (LOQ) were 0.355, 0.157 and 0.265 mg/L for TAR, CR, and AB, respectively. Quantitative determination of the three azo dyes was performed following optimized adsorption experiments of mixtures of TAR, CR and AB onto chitosan beads. Adsorption isotherm and kinetic studies were carried out, showing that the proposed PLSR method is rapid, accurate and reliable.
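The abstract reports LOD and LOQ values without stating how they were derived. A widely used convention, assumed here for illustration only, is LOD = 3·σ/S and LOQ = 10·σ/S, where σ is the standard deviation of the blank (or residual) signal and S is the calibration sensitivity. The sketch below applies that convention to hypothetical values of σ and S, and also computes the LOQ/LOD ratios of the figures reported above, which come out close to 10/3 and are therefore at least compatible with this convention.

```python
def detection_limits(sigma_blank, slope, k_lod=3.0, k_loq=10.0):
    """Return (LOD, LOQ) under the k*sigma/slope convention (an assumed convention)."""
    lod = k_lod * sigma_blank / slope
    loq = k_loq * sigma_blank / slope
    return lod, loq

# Values reported in the abstract: dye -> (LOD, LOQ) in mg/L.
reported = {"TAR": (0.106, 0.355), "CR": (0.047, 0.157), "AB": (0.079, 0.265)}
ratios = {dye: loq / lod for dye, (lod, loq) in reported.items()}

# Hypothetical blank noise and sensitivity, chosen only to show the arithmetic.
lod, loq = detection_limits(sigma_blank=0.002, slope=0.05)
print(ratios)
print(lod, loq)
```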
In this work, a multivariate detection limits (MDL) estimator was obtained based on micro-electro-mechanical systems–near infrared (MEMS–NIR) technology coupled with two sampling accessories to assess the detection capability for four quality parameters (glycyrrhizic acid, liquiritin, liquiritigenin and isoliquiritin) in licorice from different geographical regions. A total of 112 licorice samples were divided into two parts (calibration set and prediction set) using the Kennard–Stone (KS) method. The four quality parameters were measured using a high-performance liquid chromatography (HPLC) method according to the Chinese Pharmacopoeia and previous studies. The MEMS–NIR spectra were acquired with a fiber optic probe (FOP) and an integrating sphere, and the partial least squares (PLS) model was then obtained using the optimum processing method. Chemometrics indicators were used to assess the PLS model performance. Model assessment using chemometrics indicators is based on the relative mean prediction error over all concentration levels, which indicated relatively low sensitivity for low-content analytes (below 1000 parts per million (ppm)). Therefore, the MDL estimator was introduced with alpha error and beta error, based on the good prediction characteristics at low concentration levels. The results suggested that MEMS–NIR technology coupled with the fiber optic probe (FOP) and the integrating sphere was able to detect minor analytes. The results further demonstrated that the integrating sphere mode (MDL_{0.05,0.05} = 0.22%) was more robust than the FOP mode (MDL_{0.05,0.05} = 0.48%). In conclusion, this research proposed that the MDL method is helpful for determining the detection capabilities for low-content analytes using MEMS–NIR technology and can successfully compare two sampling accessories.
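The Kennard–Stone split mentioned above selects calibration samples that cover the measurement space evenly. The abstract gives no implementation details, so the following is a minimal sketch of the textbook rule: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. The two-dimensional `spectra` points are hypothetical stand-ins for real NIR spectra, and squared Euclidean distance is an assumption.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kennard_stone(samples, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Seed the selection with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: squared_distance(samples[pair[0]], samples[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Maximin step: take the candidate farthest from its closest selected sample.
        nxt = max(
            remaining,
            key=lambda i: min(squared_distance(samples[i], samples[s]) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

# Hypothetical 2-D "spectra"; real data would have one value per wavelength.
spectra = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (5.0, 5.0), (0.5, 0.5)]
calibration_idx = kennard_stone(spectra, 3)
prediction_idx = [i for i in range(len(spectra)) if i not in calibration_idx]
print(calibration_idx, prediction_idx)
```

The samples left unselected form the prediction set, so the calibration set spans the extremes and the prediction set falls inside the calibrated range.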
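Objective: To establish a high-performance liquid chromatography method based on quantitative analysis of multi-components by a single marker (HPLC-QAMS) for the simultaneous determination of syringin, chlorogenic acid, sinapaldehyde glucoside, coniferyl alcohol, rutin, kaempferol-3-O-rutinoside, 3,4-O-dicaffeoylquinic acid, 3,5-O-dicaffeoylquinic acid and 4,5-O-dicaffeoylquinic acid in the She ethnic medicine Shushen (树参), and to evaluate its quality comprehensively using multivariate statistical analysis and the weighted technique for order preference by similarity to ideal solution (TOPSIS). Methods: Separation was performed on a Waters Xbridge C18 column with acetonitrile-0.05% formic acid solution as the mobile phase under gradient elution; the detection wavelength was 260 nm. With kaempferol-3-O-rutinoside as the reference, relative correction factors (RCFs) between the internal reference and the other eight analytes were established, RCF robustness and chromatographic peak location were examined, and the results were compared with those measured by the external standard method to verify the accuracy and reliability of the HPLC-QAMS method. Multivariate statistical analyses, including principal component analysis (PCA) and orthogonal partial least squares-discriminant analysis (OPLS-DA), together with the weighted TOPSIS (W-TOPSIS) method, were used to analyze the correlations among the HPLC-QAMS content results for the nine components, to identify the main potential markers affecting the quality of Shushen products, and to establish a comprehensive quality evaluation method for Shushen. Results: The nine components showed good linearity in the ranges of 3.27-81.75 μg/mL, 9.85-246.25 μg/mL, 0.43-0.75 μg/mL, 0.31-7.75 μg/mL, 1.58-39.50 μg/mL, 0.59-14.75 μg/mL, 1.26-31.50 μg/mL, 4.55-113.75 μg/mL and 1.98-49.50 μg/mL, respectively, with average spiked recoveries of 96.82%-100.07% (RSD<2.0%). The contents determined by HPLC-QAMS and by the external standard method (ESM) did not differ significantly (P>0.05), so HPLC-QAMS can be used for multi-component quantitative control of Shushen. Multivariate statistical analysis showed that the first two principal components had a cumulative variance contribution rate of 89.589%, and that chlorogenic acid, syringin, 3,5-O-dicaffeoylquinic acid and 4,5-O-dicaffeoylquinic acid were the main potential markers affecting the quality of Shushen products. The weighted TOPSIS results showed that Shushen from Zhejiang had the best quality, followed by Shushen from Jiangxi, Anhui, Hunan and Hubei, while Shushen from Yunnan and Guizhou ranked in the last four places. Conclusion: The established HPLC-QAMS multi-component quantitative control method is convenient to operate and gives accurate results; multivariate statistical analysis combined with weighted TOPSIS is comprehensive and objective and can be used for the comprehensive quality evaluation of Shushen.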
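The weighted TOPSIS ranking used in the study above can be illustrated with a small sketch. The abstract does not give the weights or the normalisation, so this example assumes vector normalisation, treats every component content as a benefit criterion (higher is better), and uses made-up contents and weights for three hypothetical batches.

```python
import math

def weighted_topsis(matrix, weights):
    """Closeness score in [0, 1] for each alternative (row); larger means better.

    All criteria (columns) are treated as benefit criteria, which is an assumption.
    At least two rows must differ, otherwise the score is undefined.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    ideal = [max(col) for col in zip(*weighted)]  # positive ideal solution
    anti = [min(col) for col in zip(*weighted)]   # negative ideal solution
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical contents of two components in three batches, with made-up weights.
contents = [[10.0, 5.0], [8.0, 4.0], [2.0, 1.0]]
weights = [0.6, 0.4]
scores = weighted_topsis(contents, weights)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

A batch that is best on every criterion coincides with the positive ideal solution and scores 1, while one that is worst on every criterion scores 0.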
Machine learning has been widely used for solving partial differential equations (PDEs) in recent years, among which the random feature method (RFM) exhibits spectral accuracy and can compete with traditional solvers in terms of both accuracy and efficiency. Potentially, the optimization problem in the RFM is more difficult to solve than those that arise in traditional methods. Unlike the broader machine-learning research, which frequently targets tasks within the low-precision regime, our study focuses on the high-precision regime crucial for solving PDEs. In this work, we study this problem from the following aspects: (i) we analyze the coefficient matrix that arises in the RFM by studying the distribution of singular values; (ii) we investigate whether continuous training causes overfitting; (iii) we test direct and iterative methods as well as randomized methods for solving the optimization problem. Based on these results, we find that direct methods are superior to other methods if memory is not an issue, while iterative methods typically have low accuracy and can be improved by preconditioning to some extent.
To predict the economic loss of crops caused by acid rain, we used partial least squares (PLS) regression to build a model with a single dependent variable, the economic loss calculated from the decrease in yield, related to the pH value and the levels of Ca²⁺, NH₄⁺, Na⁺, K⁺, Mg²⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in acid rain. We selected vegetables that are sensitive to acid rain as the sample crops and collected 12 groups of data, of which 8 groups were used for modeling and 4 groups for testing. Evaluation of the prediction model by cross-validation indicates that the optimum number of principal components is 3, as determined by the minimum of the predicted residual error sum of squares, and that the prediction error of the regression equation ranges from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively correlated with pH and the concentrations of NH₄⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in the rain, and positively correlated with the concentrations of Ca²⁺, Na⁺, K⁺ and Mg²⁺. The precision of the model may be improved if the non-linearity of the original data is addressed.
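Choosing the number of PLS components by minimising the cross-validated prediction error, as described above, can be sketched in a few lines. The abstract does not say which PLS algorithm or cross-validation scheme was used, so this example assumes a single-response PLS (PLS1) fitted with the NIPALS deflation scheme and leave-one-out cross-validation. The data are hypothetical: two correlated predictors and a response that is an exact linear function of them.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model by NIPALS; X is a list of rows, y a list."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred predictors
    f = [v - y_mean for v in y]                                # centred response
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]  # weights ~ X'y
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # no covariance left between predictors and response
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                                  # scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = dot(f, t) / tt                                              # response loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                         # deflate y
        components.append((w, p, q))
    return x_mean, y_mean, components

def predict_pls1(model, x):
    """Predict one sample by repeating the deflation steps on the new row."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

def loo_press(X, y, n_components):
    """Leave-one-out predicted residual error sum of squares (PRESS)."""
    press = 0.0
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - predict_pls1(model, X[i])) ** 2
    return press

# Hypothetical data: correlated predictors, response y = 2*x1 - x2 + 1.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y = [2.0 * a - b + 1.0 for a, b in X]
press = {k: loo_press(X, y, k) for k in (1, 2)}
best_k = min(press, key=press.get)
print(press, best_k)
```

With real, noisy data PRESS typically falls and then rises again as components are added, and the minimum marks the number of components to keep.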
Boosting algorithms are a class of general methods used to improve the generalization performance of regression analysis. The main idea is to maintain a distribution over the training set. In order to use the given distribution directly, a modified PLS algorithm is proposed and used as the base learner to deal with nonlinear multivariate regression problems. Experiments on gasoline octane number prediction demonstrate that boosting the modified PLS algorithm gives better generalization performance than the PLS algorithm.
Breast cancer is one of the malignant tumors with a high incidence in women. Its incidence has increased in all parts of the world since the twentieth century, but its etiology is not yet completely clear, so detecting breast cells is very important. In this paper, we built a regression model to detect breast cells and, by training the model, obtained a method for predicting whether breast cells are benign or malignant. Using 10 features of breast cells, the predictions reached an accuracy of up to 93.67%. The method is effective for predicting and analysing whether breast cells are cancerous and has an important role in the diagnosis and prevention of breast cancer.
Funding (joint association analysis study): Supported by grants from the National Program on the Development of Basic Research (2011CB100100); the Priority Academic Program Development of Jiangsu Higher Education Institutions; the National Natural Science Foundations (31391632, 31200943, 31171187, and 91535103); the National High-tech R&D Program (863 Program) (2014AA10A601-5); the Natural Science Foundations of Jiangsu Province (BK20150010); the Natural Science Foundation of the Jiangsu Higher Education Institutions (14KJA210005); and the Innovative Research Team of Universities in Jiangsu Province (KYLX_1352).
Funding (PLS land use study of the Suzhou-Wuxi-Changzhou region): National Natural Science Foundation of China (No. 40301038).
Funding (multivariate partially linear model study): Supported by the Anhui Provincial Natural Science Foundation (11040606M04) and the National Natural Science Foundation of China (10871001, 10971097).
Funding (rice brown spot hyperspectral study): The Hi-Tech Research and Development Program (863) of China (No. 2006AA10Z203) and the National Science and Technology Task Force Project (No. 2006BAD10A01), China.
Funding (Fuji apple meteorological factors study): Supported by the Forest Scientific Research in the Public Interest, China (201404720); the earmarked fund for the China Agriculture Research System (CARS-27); and the Beijing Municipal Education Commission, China (CEFF-PXM2017_014207_000043).
Funding (MSPM&C survey): Supported by the National High-Tech Development Program of China (No. 863-511-920-011, 2001AA411230).
Funding (data-driven PDE identification study): Supported by the National Natural Science Foundation of China (No. 92152301).
Funding (MEMS–NIR licorice study): This work was financially supported by the National Natural Science Foundation of China (81303218), the Doctoral Fund of China (20130013120006), and the Special Fund of Outstanding Young Teachers and Innovation Team.
Abstract: In this work, a multivariate detection limits (MDL) estimator was obtained based on microelectromechanical systems–near infrared (MEMS–NIR) technology coupled with two sampling accessories, in order to assess the detection capability for four quality parameters (glycyrrhizic acid, liquiritin, liquiritigenin, and isoliquiritin) in licorice from different geographical regions. A total of 112 licorice samples were divided into a calibration set and a prediction set using the Kennard–Stone (KS) method. The four quality parameters were measured by high-performance liquid chromatography (HPLC) according to the Chinese Pharmacopoeia and previous studies. The MEMS–NIR spectra were acquired with a fiber optic probe (FOP) and an integrating sphere, and the partial least squares (PLS) model was then built using the optimum processing method. Chemometric indicators were used to assess the performance of the PLS model. Because these indicators are based on the relative mean prediction error over all concentration levels, they indicated relatively low sensitivity for low-content analytes (below 1000 parts per million (ppm)). Therefore, the MDL estimator, which accounts for alpha and beta errors, was introduced for its good prediction characteristics at low concentration levels. The results suggested that MEMS–NIR technology coupled with the FOP and the integrating sphere was able to detect minor analytes. The results further demonstrated that the integrating sphere mode (MDL_{0.05,0.05}, 0.22%) was more robust than the FOP mode (MDL_{0.05,0.05}, 0.48%). In conclusion, the MDL method was helpful for determining the detection capability for low-content analytes using MEMS–NIR technology and was successful in comparing the two sampling accessories.
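Abstract: Objective: To establish a high-performance liquid chromatography quantitative analysis of multi-components by single marker (HPLC-QAMS) method for the simultaneous determination of syringin, chlorogenic acid, sinapaldehyde glucoside, coniferyl alcohol, rutin, kaempferol-3-O-rutinoside, 3,4-O-dicaffeoylquinic acid, 3,5-O-dicaffeoylquinic acid, and 4,5-O-dicaffeoylquinic acid in the She ethnic medicine Shushen (树参, Dendropanax dentiger), and to comprehensively evaluate its quality using multivariate statistical analysis and the weighted technique for order preference by similarity to ideal solution (TOPSIS) method. Methods: Separation was performed on a Waters Xbridge C18 column with acetonitrile-0.05% formic acid solution as the mobile phase under gradient elution; the detection wavelength was 260 nm. With kaempferol-3-O-rutinoside as the internal reference, relative correction factors (RCFs) between it and the other eight analytes were established. The robustness of the RCFs was examined and the chromatographic peaks were located, and the results were compared with those measured by the external standard method (ESM) to verify the accuracy and reliability of the HPLC-QAMS method. Multivariate statistical analyses, including principal component analysis (PCA) and orthogonal partial least squares-discriminant analysis (OPLS-DA), together with the weighted TOPSIS (W-TOPSIS) method, were applied to the HPLC-QAMS content results for the nine components to analyze their correlations, identify the main potential markers affecting the quality of Shushen products, and establish a comprehensive quality evaluation method for Shushen. Results: The nine components showed good linearity within the ranges 3.27~81.75 μg/mL, 9.85~246.25 μg/mL, 0.43~0.75 μg/mL, 0.31~7.75 μg/mL, 1.58~39.50 μg/mL, 0.59~14.75 μg/mL, 1.26~31.50 μg/mL, 4.55~113.75 μg/mL, and 1.98~49.50 μg/mL, respectively, with average spiked recoveries of 96.82%~100.07% (RSD < 2.0%). The contents determined by HPLC-QAMS and by ESM showed no statistically significant difference (P > 0.05), so the HPLC-QAMS method can be used for multi-component quantitative control of Shushen. Multivariate statistical analysis showed that the first two principal components had a cumulative variance contribution of 89.589%, and that chlorogenic acid, syringin, 3,5-O-dicaffeoylquinic acid, and 4,5-O-dicaffeoylquinic acid were the main potential markers affecting product quality. The weighted TOPSIS results showed that Shushen from Zhejiang was of the best quality, followed by Shushen from Jiangxi, Anhui, Hunan, and Hubei, while Shushen from Yunnan and Guizhou occupied the last four places in the ranking. Conclusion: The established HPLC-QAMS method for multi-component quantitative control is convenient to operate and gives accurate results; multivariate statistical analysis combined with the weighted TOPSIS method is comprehensive and objective and can be used for the overall quality evaluation of Shushen.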
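The weighted TOPSIS ranking mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the content data, the weighting scheme, or the normalization used, so the three batches, the weights, and the choice of vector normalization below are all hypothetical and serve only to show the mechanics of the method.

```python
import math

def weighted_topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores in [0, 1]; higher is closer to the ideal.

    matrix  : rows are samples, columns are indicators (e.g. component contents)
    weights : one weight per indicator (assumed given; the source does not say
              how its weights were derived)
    benefit : True if larger values of the indicator are better
    """
    n_cols = len(matrix[0])
    # Vector-normalize each column, then apply the indicator weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # Ideal and anti-ideal solutions, column by column.
    best = [max(r[j] for r in v) if benefit[j] else min(r[j] for r in v)
            for j in range(n_cols)]
    worst = [min(r[j] for r in v) if benefit[j] else max(r[j] for r in v)
             for j in range(n_cols)]
    scores = []
    for r in v:
        d_pos = math.sqrt(sum((r[j] - best[j]) ** 2 for j in range(n_cols)))
        d_neg = math.sqrt(sum((r[j] - worst[j]) ** 2 for j in range(n_cols)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical contents of three components in three batches (not source data).
contents = [
    [5.0, 4.0, 3.0],  # batch A
    [3.0, 3.0, 2.0],  # batch B
    [1.0, 2.0, 1.0],  # batch C
]
scores = weighted_topsis(contents, weights=[0.5, 0.3, 0.2], benefit=[True] * 3)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

Batches are ranked by descending closeness score; a batch that is best on every indicator coincides with the ideal solution and a batch that is worst on every indicator coincides with the anti-ideal solution.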
Funding: This work was supported by the NSFC Major Research Plan on Interpretable and General-purpose Next-generation Artificial Intelligence (No. 92370205).
Abstract: Machine learning has been widely used for solving partial differential equations (PDEs) in recent years. Among these approaches, the random feature method (RFM) exhibits spectral accuracy and can compete with traditional solvers in terms of both accuracy and efficiency. However, the optimization problem in the RFM can be more difficult to solve than those that arise in traditional methods. Unlike broader machine-learning research, which frequently targets tasks in the low-precision regime, our study focuses on the high-precision regime that is crucial for solving PDEs. In this work, we study this problem from the following aspects: (i) we analyze the coefficient matrix that arises in the RFM by studying the distribution of its singular values; (ii) we investigate whether continued training causes overfitting; (iii) we test direct and iterative methods as well as randomized methods for solving the optimization problem. Based on these results, we find that direct methods are superior to other methods if memory is not an issue, while iterative methods typically have low accuracy and can be improved by preconditioning to some extent.
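As an illustration of the kind of least-squares problem this abstract describes, the sketch below sets up a toy random-feature discretization of the 1D Poisson problem -u'' = f on [0, 1] with u(0) = u(1) = 0 and exact solution sin(πx). It is not the authors' RFM implementation: the tanh features, the sampling ranges, the problem sizes, and the plain gradient-descent solver are all assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 80, 200                        # random features / collocation points (assumed sizes)
w = rng.uniform(-4.0, 4.0, size=M)    # fixed random inner weights (never trained)
b = rng.uniform(-4.0, 4.0, size=M)    # fixed random biases (never trained)
x = np.linspace(0.0, 1.0, N)

# Features phi_j(x) = tanh(w_j x + b_j) and their second derivatives in x.
t = np.tanh(np.outer(x, w) + b)
phi = t
d2phi = -2.0 * t * (1.0 - t**2) * w**2

# Rows: PDE residual -u'' = f at the collocation points, then u(0) = u(1) = 0.
A = np.vstack([-d2phi, phi[[0, -1]]])
rhs = np.concatenate([np.pi**2 * np.sin(np.pi * x), [0.0, 0.0]])

# (i) Singular value distribution of the coefficient matrix.
s = np.linalg.svd(A, compute_uv=False)
print("largest / smallest singular value:", s[0], s[-1])

# (iii) Direct solve: SVD-based linear least squares.
c_direct = np.linalg.lstsq(A, rhs, rcond=None)[0]

# (iii) Iterative solve: plain gradient descent on 0.5 * ||A c - rhs||^2,
# without preconditioning, with step size 1 / sigma_max^2.
c_gd = np.zeros(M)
step = 1.0 / s[0] ** 2
for _ in range(2000):
    c_gd -= step * (A.T @ (A @ c_gd - rhs))

exact = np.sin(np.pi * x)
err_direct = np.max(np.abs(phi @ c_direct - exact))
err_gd = np.max(np.abs(phi @ c_gd - exact))
print("max error, direct:", err_direct)
print("max error, gradient descent:", err_gd)
```

Only the outer coefficients are unknown, so the whole problem is linear least squares; the spread of the singular values indicates how ill-conditioned the system is, which is what limits unpreconditioned iterative solvers.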
Funding: Funded by the National Basic Research Program of China under grant No. 2005CB422207.
Abstract: To predict the economic loss of crops caused by acid rain, we used partial least squares (PLS) regression to build a model with a single dependent variable: the economic loss, calculated from the decrease in yield, as a function of the pH value and the levels of Ca²⁺, NH₄⁺, Na⁺, K⁺, Mg²⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in acid rain. We selected vegetables that are sensitive to acid rain as the sample crops and collected 12 groups of data, of which 8 groups were used for modeling and 4 groups for testing. Evaluating the performance of the prediction model by cross-validation indicated that the optimum number of principal components was 3, determined by the minimum of the predicted residual error sum of squares (PRESS), and that the prediction error of the regression equation ranged from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively correlated with pH and with the concentrations of NH₄⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in the rain, and positively correlated with the concentrations of Ca²⁺, Na⁺, K⁺, and Mg²⁺. The precision of the model may be improved if the non-linearity of the original data is addressed.
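Choosing the number of PLS components by the minimum PRESS, as described in this abstract, can be sketched as follows. The abstract does not say which PLS algorithm, preprocessing, or cross-validation scheme was used, so the NIPALS-style PLS1 routine, mean-centering only, leave-one-out cross-validation, and the synthetic 8 x 9 data set below are assumptions; none of the numbers come from the paper.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1) by NIPALS-style deflation.

    Returns the column means of X, the mean of y, and regression
    coefficients for the centered predictors.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)      # weight vector
        t = Xk @ w                     # score vector
        tt = t @ t
        p = Xk.T @ t / tt              # X loading
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def press_by_components(X, y, max_components):
    """Leave-one-out PRESS for 1..max_components PLS components."""
    n = X.shape[0]
    press = []
    for a in range(1, max_components + 1):
        total = 0.0
        for i in range(n):
            keep = np.arange(n) != i
            model = pls1_fit(X[keep], y[keep], a)
            total += (y[i] - pls1_predict(model, X[i:i + 1])[0]) ** 2
        press.append(total)
    return press

# Synthetic stand-in for 8 modeling samples with 9 collinear predictors
# (pH plus eight ion concentrations in the abstract); values are made up.
rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 2))
X = latent @ rng.normal(size=(2, 9)) + 0.1 * rng.normal(size=(8, 9))
y = latent @ np.array([1.0, -0.5]) + 0.05 * rng.normal(size=8)

press = press_by_components(X, y, max_components=4)
best = int(np.argmin(press)) + 1
print("PRESS per component count:", press)
print("component count with minimum PRESS:", best)
```

With so few samples the PRESS curve is noisy, which is one reason a held-out test set, like the four test groups in the abstract, is a useful additional check.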
Funding: This work was supported by the National High-tech Research and Development Program of China (No. 2003AA412110).
Abstract: Boosting algorithms are a class of general methods used to improve the generalization performance of regression analysis. The main idea is to maintain a distribution over the training set. In order to use the given distribution directly, a modified PLS algorithm is proposed and used as the base learner to deal with nonlinear multivariate regression problems. Experiments on gasoline octane number prediction demonstrate that boosting the modified PLS algorithm gives better generalization performance than the PLS algorithm.
Abstract: Breast cancer is one of the malignant tumors with a high incidence in women. Its incidence has increased in all parts of the world since the twentieth century, but its etiology is not yet completely clear, so the detection of breast cells is very important. In this paper, we built a regression model to detect breast cells and, by training the model, obtained a method for predicting whether breast cells are benign or malignant. We then used 10 features of breast cells for prediction, and the results reached an accuracy of up to 93.67%. The model was very effective for predicting and analysing whether breast cells become cancerous, and it can play an important role in the diagnosis and prevention of breast cancer.