Based on least-squares minimization, a computationally efficient learning algorithm for principal component analysis (PCA) is derived. Dual learning-rate parameters are introduced adaptively so that the proposed algorithm converges quickly and extracts all the principal components with high accuracy. It is shown that all the information needed for PCA can be completely represented by the unnormalized weight vector, which is updated based only on the corresponding neuron input-output product. The convergence performance of the proposed algorithm is briefly analyzed. The relation between Oja's rule and the least-squares learning rule is also established. Finally, a simulation example is given to illustrate the effectiveness of this algorithm for PCA.
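As a point of reference for the Oja's rule mentioned in this abstract, the following is a minimal sketch of the classical single-neuron Oja update, not the least-squares algorithm with dual learning rates that the paper proposes. The function names, the synthetic data, the starting vector and the learning rate are all assumptions made for illustration.

```python
import math
import random


def oja_first_component(samples, eta=0.005, epochs=3):
    """Estimate the first principal direction of zero-mean 2-D samples with Oja's rule."""
    w = [1.0, 0.0]  # arbitrary starting weight vector (an assumption)
    for _ in range(epochs):
        for x in samples:
            y = w[0] * x[0] + w[1] * x[1]  # neuron output y = w . x
            # Oja's update: Hebbian term y*x minus the decay term y^2 * w
            w = [w[i] + eta * y * (x[i] - y * w[i]) for i in range(2)]
    return w


def make_samples(n=3000, seed=0):
    """Synthetic zero-mean data whose dominant axis is (1, 1) / sqrt(2)."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    out = []
    for _ in range(n):
        a = rng.gauss(0.0, 2.0)  # large spread along the dominant axis
        b = rng.gauss(0.0, 0.5)  # small spread along the orthogonal axis
        out.append((s * (a - b), s * (a + b)))
    return out


w = oja_first_component(make_samples())
```

The update uses only the input-output product `y * x` and the output energy `y * y`, and the decay term keeps the weight vector close to unit length without an explicit normalization step.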
With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and that both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with the genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
With the east section of the Changji sag, Zhunger Basin, as a case study, both a principal curvature method and a moving least squares method are elaborated. The moving least squares method is introduced, for the first time, to fit a stratum surface. The results show that, with base functions of the same degree, the moving least squares method produces lower fitting errors than the traditional least squares method; the fitted surface describes the morphological characteristics of stratum surfaces more accurately; and the principal curvature values vary within a wide range and may be more suitable for predicting the distribution of structural fractures. The moving least squares method could be useful in curved surface fitting and stratum curvature analysis.
Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, hyperspectral leaf reflectance in rice (Oryza sativa L.) was measured on groups of healthy leaves and leaves infected by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda. de Hann) over the wavelength range from 350 to 2500 nm. The percentage of leaf surface lesions was estimated and defined as the disease severity. Multiple stepwise regression, principal component analysis and partial least squares regression were used to estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regression could efficiently estimate disease severity with three wavebands in seven steps. The root mean square errors (RMSEs) for the training (n=210) and testing (n=53) datasets were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted disease severity with RMSEs of 16.3% and 13.9% for the training and testing datasets, respectively. Partial least squares regression with seven extracted factors predicted disease severity most effectively among the statistical methods compared, with RMSEs of 4.1% and 2.0% for the training and testing datasets, respectively. Our research demonstrates that it is feasible to estimate the disease severity of rice brown spot using hyperspectral reflectance data at the leaf level.
The identification of liquor brands is very important for food safety. Most fake liquors are made to have the same flavor and alcohol content as a regular brand, so identifying liquor brands with the same flavor and the same alcohol content is essential. However, it is also difficult because the components of such liquor samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to the identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used for identification. Samples of each type were randomly divided into modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. In the modeling and validation processes based on the PLS-DA method, the recognition rates reached 99.1% and 98.7%, respectively. The results show high prediction performance for the identification of liquor brands and were clearly better than those obtained from the principal component linear discriminant analysis method. NIR spectroscopy combined with the PLS-DA method provides a quick and effective means of discriminant analysis of liquor brands, and is also a promising tool for large-scale inspection of liquor food safety.
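The Kennard-Stone split named in this abstract can be sketched as follows. This is a generic version of the selection rule with squared Euclidean distance, assuming at least two points are requested; the function name and the toy one-dimensional data are illustrative assumptions and do not reproduce the authors' spectra or software.

```python
def kennard_stone(points, k):
    """Return indices of k points (k >= 2) chosen by the Kennard-Stone rule."""
    def dist2(p, q):
        # squared Euclidean distance between two feature vectors
        return sum((a - b) ** 2 for a, b in zip(p, q))

    n = len(points)
    # Start with the two points that are farthest apart.
    _, i0, j0 = max((dist2(points[i], points[j]), i, j)
                    for i in range(n) for j in range(i + 1, n))
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < k and remaining:
        # Pick the candidate whose nearest selected point is farthest away.
        best = max(remaining,
                   key=lambda c: min(dist2(points[c], points[s]) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


toy = [(0.0,), (1.0,), (2.0,), (10.0,), (5.0,)]
chosen = kennard_stone(toy, 4)  # indices in selection order
```

Because each new point is the one farthest from everything already chosen, the selected subset spreads evenly over the sample space, which is the uniformity and representativeness property the abstract refers to.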
In this paper, we present a continuous iteratively reweighted least squares algorithm (CIRLS) for solving the linear models problem by convex relaxation, and prove the convergence of this algorithm. Under some conditions, we give an error bound for the algorithm. In addition, numerical results show the efficiency of the algorithm.
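For readers unfamiliar with the reweighting idea, here is a minimal sketch of classical iteratively reweighted least squares applied to an L1 (least absolute deviations) line fit. It is not the CIRLS algorithm of the paper, whose details the abstract does not give; the function name, the `eps` floor and the toy data are assumptions.

```python
def irls_l1_line(xs, ys, iters=100, eps=1e-8):
    """Fit y = a + b*x under the L1 loss by iteratively reweighted least squares."""
    w = [1.0] * len(xs)  # first pass is ordinary least squares
    a = b = 0.0
    for _ in range(iters):
        # Weighted least squares in centered form (numerically stable).
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        b = sxy / sxx
        a = ybar - b * xbar
        # Reweight by 1/|residual| so the weighted squared loss mimics the L1 loss;
        # eps keeps the weight finite when a residual reaches zero.
        w = [1.0 / max(abs(y - (a + b * x)), eps) for x, y in zip(xs, ys)]
    return a, b


xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[9] = 100.0  # one gross outlier
a, b = irls_l1_line(xs, ys)
```

On this toy data the ordinary least squares fit is pulled strongly toward the outlier, while the reweighted iterations move back toward the line through the nine clean points.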
Statistical downscaling (SD) analyzes the relationship between a local-scale response and global-scale predictors. An SD model can be used to forecast rainfall (local scale) using global-scale precipitation from global circulation model (GCM) output. The objectives of this research were to determine the time lag of the GCM data and to build an SD model using the principal component regression (PCR) method with time-lagged GCM precipitation data. Rainfall observations in Indramayu from 1979 to 2007 showed patterns similar to the GCM data on the 1st to 64th grids after a time shift (time lag). The time lag was determined using the cross-correlation function. However, the GCM data of the 64 grids showed a multicollinearity problem. This problem was solved by PCR, but the PCR model produced heterogeneous errors. The PCR model was modified to overcome these errors by adding dummy variables to the model. The dummy variables were determined based on partial least squares regression (PLSR). The PCR model with dummy variables improved the rainfall prediction. The SD model with lag-GCM predictors was also better than the SD model without lag-GCM.
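The lag-selection step described in this abstract can be sketched as a search over sample cross-correlations. The sketch below assumes two equally long series and a non-negative lag; the helper names and the synthetic series are hypothetical and stand in for the GCM grid values and the station rainfall.

```python
import random


def pearson(u, v):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def best_lag(predictor, response, max_lag):
    """Return the lag k in 0..max_lag maximizing corr(predictor[t], response[t + k])."""
    n = len(predictor)
    scores = {k: pearson(predictor[:n - k], response[k:])
              for k in range(max_lag + 1)}
    return max(scores, key=scores.get), scores


rng = random.Random(1)
gcm = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic predictor series
rain = [0.0] * 3 + gcm[:-3]                      # response trails it by 3 steps
lag, scores = best_lag(gcm, rain, 6)
```

Once a lag is chosen, the predictor series would be shifted by that amount before the regression step; that later step is not shown here.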
The neural network partial least squares (NNPLS) method was used to establish a robust reaction model for a multi-component catalyst for the oxidative coupling of methane. The details, including the learning algorithm, the number of hidden units of the inner network, the activation function, the initialization of the network weights and the principal components, are discussed. The results show that the structures of the inner neural networks are 1-10-5-1, 1-8-4-1, 1-8-5-1, 1-7-4-1, 1-8-4-1 and 1-8-6-1, respectively. The Levenberg-Marquardt method was used as the learning algorithm, and the central sigmoidal function as the activation function. Calculation results show that four principal components are suitable for modeling the multi-component catalyst for the oxidative coupling of methane. A robust reaction model expressed by NNPLS therefore succeeds in correlating the elements in the catalyst with the catalytic reaction results. Compared with direct network modeling, the NNPLS model can be adjusted by experimental data, and its calculation is simpler and faster than that of the direct network model.
Investigation of the genetic diversity of geographically distant wheat genotypes is a useful approach in wheat breeding for providing efficient crop varieties. This article presents multivariate cluster and principal component analyses (PCA) of some yield traits of wheat, such as thousand-kernel weight (TKW), grain number, grain yield and plant height. Based on the results, an evaluation of economically valuable attributes by eigenvalues made it possible to determine the components that contribute significantly to the yield of common wheat genotypes. Twenty-five genotypes were grouped into four clusters on the basis of average linkage. The PCA showed four principal components (PC) with eigenvalues > 1, explaining approximately 90.8% of the total variability. According to the PC analysis, the eigenvalues were greatest for PC-1 (4.33), followed by PC-2 (1.86) and PC-3 (1.01). The cluster analysis classified the 25 accessions into four diverse groups. Averages, standard deviations and variances for clusters based on morpho-physiological traits showed that the maximum average values for grain yield (742.2), biomass (1756.7), grains per square meter (18,373.7) and grains per spike (45.3) were higher in cluster C than in the other clusters. Cluster D exhibited the maximum thousand-kernel weight (TKW) (46.6).
In this note, the author finds an upper bound formula for the number of p × p normalized Latin squares, that is, Latin squares whose first row and first column are both in the standard order 1, 2, …, p.
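To make the counted object concrete, the sketch below enumerates normalized Latin squares by brute-force backtracking for small p. It is not the upper bound formula of the note, which the abstract does not state; the function name is an assumption, and the search is only practical for small orders.

```python
def count_normalized_latin_squares(p):
    """Count p x p Latin squares whose first row and first column are 1, 2, ..., p."""
    grid = [[0] * p for _ in range(p)]
    for i in range(p):
        grid[0][i] = i + 1  # first row in standard order
        grid[i][0] = i + 1  # first column in standard order

    def fill(pos):
        if pos == p * p:
            return 1  # every cell filled consistently
        r, c = divmod(pos, p)
        if r == 0 or c == 0:  # border cells are already fixed
            return fill(pos + 1)
        # Symbols already used to the left in this row or above in this column.
        used = set(grid[r][:c]) | {grid[k][c] for k in range(r)}
        total = 0
        for v in range(1, p + 1):
            if v not in used:
                grid[r][c] = v
                total += fill(pos + 1)
        grid[r][c] = 0
        return total

    return fill(0)


counts = [count_normalized_latin_squares(p) for p in range(1, 6)]
```

Any upper bound formula for this quantity can be checked against such exact counts for small p before being trusted at larger orders.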
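Objective: To compare the effects of Jiangxi's characteristic processing techniques on the chemical constituents of Cimicifugae Rhizoma (Shengma) and to screen for high-quality decoction-piece varieties. Methods: Ultra performance liquid chromatography-quadrupole-time of flight tandem mass spectrometry (UPLC-Q-TOF-MS) was used in positive and negative ion modes to analyze the chemical constituents of differently processed Shengma products; constituents were identified using reference standards, relative molecular mass, mass spectrometric fragmentation patterns and literature information. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) models of the processed products were built with SIMCA-P 13.0 software to obtain PCA score plots, PLS-DA score plots and variable importance in projection (VIP) values, and to screen the material basis responsible for the main differences before and after processing. Heat maps were drawn with the MetaboAnalyst web tool to show more intuitively how the constituents change after processing. Results: A total of 71 chemical constituents were identified. PCA showed large between-group differences among Shengma processed by different methods, and PLS-DA selected 33 constituents with VIP > 1 as the main chemical markers of the differences before and after processing. Triterpenoid content was higher in raw and honey-processed Shengma, phenolic acid content was higher in Shengma stir-fried with honey bran or honey chaff, and ferulic acid content was higher in honey-bran Shengma. Conclusion: Phenolic acids and triterpenoid saponins are the most important compound classes for distinguishing differently processed Shengma products, which provides a basis for research on the pharmacodynamic material basis and superior varieties of Jiangxi's characteristic Shengma decoction pieces.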
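The basic principles of laser induced breakdown spectroscopy (LIBS), principal component analysis (PCA) and partial least squares (PLS) are introduced. Principal component information was extracted from the 36 dimensions near the characteristic spectral line of Pb. After the 36-dimensional wavelength data were compressed to 2 dimensions, the principal component scores of 20 pulses per sample were used for a partial least squares fit. After the data were averaged, the fit quality was high: the squared fitting coefficient rose from 0.49810 to 0.97000, and the residual sum of squares fell from 0.72529 to 1.36366×10^(-4). PCA can effectively reduce the data space of samples with a certain degree of correlation and significantly improves efficiency when processing high-dimensional data. Combined with PLS fitting of the compressed principal components, the experiments lead to the conclusion that PLS is suitable for quantitative LIBS analysis.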
Funding (least-squares PCA learning algorithm): Supported by the National Natural Science Foundation of China and the Science Foundation of Guangxi Educational Administration.
Funding (GWAS comparison of PC-LR, PLS-LR and LR): Funded by the National Natural Science Foundation of China (81202283, 81473070, 81373102 and 81202267); the Key Grant of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJA330034 and 11KJA330001); the Research Fund for the Doctoral Program of Higher Education of China (20113234110002); and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
Funding (moving least squares stratum surface fitting): Projects 2007CB209405 and 2002CB412702 supported by the National Basic Research Program of China; KZCX2-YW-113 by the Important Directive Item of the Knowledge Innovation Project of the Chinese Academy of Sciences; and 40772100 by the National Natural Science Foundation of China.
Funding (rice brown spot hyperspectral study): The Hi-Tech Research and Development Program (863) of China (No. 2006AA10Z203) and the National Science and Technology Task Force Project (No. 2006BAD10A01), China.