An evaluation model was established to estimate the number of houses that collapse during typhoon disasters in Zhejiang Province. The disaster-causing factors, the disaster-fostering environment, and the exposure of buildings were processed by principal component analysis (PCA). The key factors were extracted as inputs to a support vector machine model to build the evaluation model; the historical fitting results were consistent with the recorded data. In the evaluation of two typhoons that made landfall in Zhejiang Province in 2008 and 2009, the agreement between the estimated and actual values demonstrated the feasibility of the model.
Financial time series forecasting can benefit individual as well as institutional investors. However, the high noise and complexity of financial data make this task extremely challenging. Over the years, many researchers have used support vector regression (SVR) quite successfully to address this challenge. In this paper, an SVR-based forecasting model is proposed that first uses principal component analysis (PCA) to extract low-dimensional, efficient feature information and then uses independent component analysis (ICA) to preprocess the extracted features and suppress the influence of noise. Experiments were carried out on 16 years of historical data for three prominent stocks from three different sectors listed on the Dhaka Stock Exchange (DSE), Bangladesh. Predictions were made 1 to 4 days ahead, targeting short-term prediction. For comparison, PCA with SVR (PCA-SVR), ICA with SVR (ICA-SVR), and SVR alone were applied to evaluate the prediction accuracy of the proposed approach. Experimental results show that the proposed model (PCA-ICA-SVR) outperforms the PCA-SVR, ICA-SVR, and single SVR methods.
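Predicting 1 to 4 days ahead requires turning a price series into input/target pairs before any PCA, ICA, or SVR step. The sketch below shows one common way to do that with lagged windows; the function name `make_supervised`, the window length, and the toy prices are illustrative assumptions, not details taken from the paper.

```python
def make_supervised(series, n_lags, horizon):
    """Build (window, target) pairs for forecasting `horizon` steps ahead.

    Each input is the `n_lags` most recent values; the target is the value
    `horizon` steps after the end of that window (horizon=1 is the next value).
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    inputs, targets = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        inputs.append(series[t - n_lags:t])
        targets.append(series[t + horizon - 1])
    return inputs, targets


# Hypothetical closing prices, used only to show the shapes involved.
prices = [10, 11, 12, 13, 14, 15, 16]
X, y = make_supervised(prices, n_lags=3, horizon=2)
```

Calling the function once per horizon (1, 2, 3, 4) would give one training set per forecast lead time, each of which could then feed a separate regressor.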
The selection of important factors in machine learning-based susceptibility assessments is crucial to obtaining reliable susceptibility results. In this study, metaheuristic optimization and feature selection techniques were applied to identify the most important input parameters for mapping debris flow susceptibility with machine learning algorithms in the southern mountain area of Chengde City, Hebei Province, China. In total, 133 historical debris flow records and 16 related factors were selected. The support vector machine (SVM) was first used as the base classifier, and a hybrid model was then introduced in a two-step process. First, the particle swarm optimization (PSO) algorithm was employed to select the SVM hyperparameters. Second, two feature selection algorithms, namely principal component analysis (PCA) and PSO, were integrated into the PSO-based SVM model, yielding the PCA-PSO-SVM and FS-PSO-SVM models, respectively. Three statistical metrics (accuracy, recall, and specificity) and the area under the receiver operating characteristic curve (AUC) were employed to evaluate and validate the performance of the models. The results indicated that the feature selection-based models exhibited the best performance, followed by the PSO-based SVM and SVM models. Moreover, the FS-PSO-SVM model performed better than the PCA-PSO-SVM model, showing the highest AUC, accuracy, recall, and specificity values in both the training and testing processes. The selection of optimal features was found to be crucial to improving the reliability of debris flow susceptibility assessment results. In addition, the PSO algorithm was found to be not only an effective tool for hyperparameter optimization but also a useful feature selection algorithm for improving the accuracy of machine learning-based debris flow susceptibility prediction. The high and very high debris flow susceptibility zones cover approximately 38.01% of the study area, where debris flows may occur under intensive human activities and heavy rainfall events.
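The AUC used to compare the models can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The following minimal sketch computes it that way by direct pairwise comparison; the labels and scores are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores: 1 = debris flow recorded, 0 = none.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the sample count, which is acceptable for a few hundred records; a rank-based formula would be the usual choice for larger sets.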
On-line monitoring and fault diagnosis of chemical processes are extremely important for operational safety and product quality. Principal component analysis (PCA) has been widely used in multivariate statistical process monitoring for its ability to reduce process dimensionality. PCA and other statistical techniques, however, have difficulty differentiating faults correctly in complex chemical processes. The support vector machine (SVM) is a novel approach based on statistical learning theory that has emerged for feature identification and classification. In this paper, an integrated method is applied to process monitoring and fault diagnosis, combining PCA for fault feature extraction with multiple SVMs for identifying different fault sources. The approach is verified and illustrated on the Tennessee Eastman benchmark process as a case study. Results show that the proposed PCA-SVMs method has good diagnostic capability and overall diagnostic accuracy.
A novel configuration performance prediction approach combining principal component analysis (PCA) and the support vector machine (SVM) was proposed. The method estimates the performance parameter values of a newly configured product through soft computing rather than practical test experiments, which helps to evaluate whether the product variant can satisfy customers' individual requirements. PCA was used to reduce and orthogonalize the module parameters that affect product performance. The extracted features were then used as new input variables in the SVM model to mine knowledge from the limited existing product data. The performance values of a newly configured product can be predicted with the trained SVM models. The PCA-SVM method allows performance prediction to be carried out rapidly and accurately, even under small-sample conditions. The applicability of the proposed method was verified on a family of plate electrostatic precipitators.
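To make the "reduce and orthogonalize" step concrete, the sketch below extracts the leading principal component of a small data set with power iteration on the covariance matrix and projects the samples onto it. It is a standard-library illustration only: the data, the function names, and the choice of power iteration are assumptions, and a full PCA would continue with the remaining components.

```python
import math


def center(rows):
    """Subtract the column means so every variable has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_component(cov, iterations=200):
    """Power iteration: multiply by the covariance matrix and renormalize.

    Assumes a nonzero covariance matrix and a start vector that is not
    orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    """Scores of each centered sample along one component direction."""
    return [sum(x * w for x, w in zip(row, direction)) for row in centered]


# Hypothetical two-parameter samples lying on the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(rows)
pc1 = leading_component(covariance(centered))
scores = project(centered, pc1)
```

Because the toy samples are perfectly collinear, a single component captures all of their variance, so the one-dimensional `scores` could replace the two original parameters as model inputs.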
This article presents an anomaly detection system based on principal component analysis (PCA) and the support vector machine (SVM). The system first creates a profile defining normal behavior with a frequency-based scheme and then compares the similarity of the current behavior with the created profile to decide whether the input instance is normal or anomalous. To avoid overfitting and reduce the computational burden, the principal features of normal behavior are extracted by PCA. Once training is complete, the SVM is used to classify user behavior as normal or anomalous. In the performance evaluation experiments, the system achieved a correct detection rate of 92.2% and a false detection rate of 2.8%.
Laser-induced breakdown spectroscopy (LIBS) is a versatile tool for both qualitative and quantitative analysis. In this paper, LIBS combined with principal component analysis (PCA) and the support vector machine (SVM) is applied to rock analysis. Fourteen emission lines of Fe, Mg, Ca, Al, Si, and Ti are selected as analysis lines. Good accuracy (91.38% for real rock) is achieved by using the SVM to analyze spectroscopic peak area data processed by PCA. Combining PCA and SVM not only reduces noise and dimensionality, which improves the efficiency of the program, but also solves the problem of linear inseparability. The method validates the ability of LIBS to classify rock.
Support vector classifiers (SVCs) offer advantages for small-sample, high-dimensional learning problems, in particular better generalization ability. However, there is some redundancy among the high dimensions of the original samples, and the main features of the samples may be selected first to improve the performance of the SVC. Principal component analysis (PCA) is employed to reduce the feature dimensions of the original samples and pre-select the main features efficiently, and an SVC is constructed in the selected feature space to improve the learning speed and identification rate. Furthermore, a heuristic genetic algorithm-based automatic model selection is proposed to determine the hyperparameters of the SVC and evaluate the performance of the learning machines. Experiments on the Heart and Adult benchmark data sets demonstrate that the proposed PCA-based SVC not only reduces the test time drastically but also improves the identification rate effectively.
Accurate cost estimation at the early stage of a construction project is a key factor in the project's success. However, it is difficult to estimate construction costs quickly and accurately at the planning stage, when drawings, documentation, and the like are still incomplete. Various techniques have therefore been applied to estimate construction costs accurately at an early stage, when project information is limited. While these techniques have their pros and cons, little effort has been made to determine which performs best in cost estimation. The objective of this research is to compare the accuracy of three estimating techniques, regression analysis (RA), neural networks (NN), and support vector machines (SVM), by estimating construction costs. A comparison of the techniques on historical cost data showed that the NN model gave more accurate estimates than the RA and SVM models. Consequently, the NN model is considered the most suitable for estimating the cost of school building projects.
Rainfall forecasting is becoming increasingly important, as precipitation anomalies can lead to droughts and floods. However, the complexity and non-stationarity of rainfall data make it difficult to forecast. In this paper, a novel hybrid rainfall forecasting model is developed by incorporating singular spectrum analysis (SSA) and the dragonfly algorithm (DA) into the support vector regression (SVR) method. First, SSA is used to extract the trend components of the hydrological data. Then, SVR is used to deal with the volatility and irregularity of the precipitation series. Finally, the SVR parameters are optimized by DA. The proposed SSA-DA-SVR method is used to forecast monthly precipitation at the Songbai, Panshui, Lanma, and Jiulongchi stations. To validate the method, four comparison models, DA-SVR, SSA-GWO-SVR, SSA-PSO-SVR, and SSA-CS-SVR, are established. The results show that the proposed method performs best among the five models and that its predictions are highly precise and accurate.
Machine learning algorithms (MLs) can potentially improve disease diagnostics, leading to earlier detection and treatment. As a malignant tumor whose primary focus is located in the bronchial mucosal epithelium, lung cancer has the highest mortality and morbidity among cancer types, threatening the health and life of patients. Machine learning algorithms such as Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naïve Bayes (NB) have been used for lung cancer prediction. However, they still face challenges such as high dimensionality of the feature space, over-fitting, high computational complexity, noise and missing data, low accuracy, low precision, and high error rates. Ensemble learning, which combines classifiers, may help to improve prediction on new data. However, current ensemble ML techniques rarely use comprehensive evaluation metrics to assess the performance of individual classifiers. The main purpose of this study was to develop an ensemble classifier that improves lung cancer prediction. An ensemble machine learning algorithm is developed based on RF, SVM, NB, and KNN. Feature selection is based on Principal Component Analysis (PCA) and Analysis of Variance (ANOVA). The algorithm is then executed on lung cancer data and evaluated using execution time, true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), false positive rate (FPR), recall (R), precision (P), and F-measure (FM). Experimental results show that the proposed ensemble classifier has the best classification of 0.9825% with the lowest error rate of 0.0193. It is followed by SVM, for which the probability of having the best classification is 0.9652% at an error rate of 0.0206. NB, on the other hand, had the worst performance, with 0.8475% classification at an error rate of 0.0738.
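The evaluation metrics listed in the abstract all derive from the four confusion-matrix counts. A minimal sketch of those definitions follows; the counts are invented for illustration and do not reproduce the study's results, and the function assumes every denominator is nonzero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes each denominator below is nonzero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    error_rate = (fp + fn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "fpr": false_positive_rate,
        "f_measure": f_measure,
        "error_rate": error_rate,
    }


# Hypothetical counts for 100 patients.
m = classification_metrics(tp=40, tn=50, fp=5, fn=5)
```

Reporting the F-measure alongside the error rate matters when classes are imbalanced, since a low error rate alone can hide poor recall on the minority class.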
A novel method was proposed for developing a reliable data-driven soft sensor to improve the prediction accuracy of sulfur content in the hydrodesulfurization (HDS) process. To this end, an integrated approach using support vector regression (SVR) based on the wavelet transform (WT) and principal component analysis (PCA) was used. Experimental data from the HDS setup were employed to validate the proposed model. The results reveal that integrating WT-PCA with SVR increased the prediction accuracy of the SVR model. The proposed model delivers the most satisfactory prediction performance (E_AARE = 0.058 and R² = 0.97) in comparison with SVR. The results indicate that the proposed model is more reliable and more precise than multiple linear regression (MLR), SVR, and PCA-SVR.
With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR, and LR in GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped SNPs and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to non-small cell lung cancer GWAS data.
A combined model based on principal component analysis (PCA) and the generalized regression neural network (GRNN) was adopted to forecast electricity prices in the day-ahead electricity market. PCA was applied to extract the main influences on the day-ahead price, removing the strong correlation among input factors that may affect the electricity price, such as the load of the forecast hour, other historical loads and prices, weather, and temperature; GRNN was then employed to forecast the electricity price from the main information extracted by PCA. To demonstrate the efficiency of the combined model, a case from the PJM (Pennsylvania-New Jersey-Maryland) day-ahead electricity market was evaluated. Compared with the back-propagation (BP) neural network and standard GRNN, the combined method reduces the mean absolute percentage error by about 3%.
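The comparison above is stated in terms of the mean absolute percentage error (MAPE). As a reminder of what that figure measures, here is a minimal sketch; the prices are made up for illustration and are not PJM data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes the two sequences have equal length and no actual value is zero.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical hourly prices and forecasts.
actual_prices = [50.0, 40.0, 80.0]
forecast_prices = [45.0, 44.0, 80.0]
error_percent = mape(actual_prices, forecast_prices)
```

Because each error is divided by the actual price, MAPE becomes unstable when prices approach zero, which is worth keeping in mind for electricity markets.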
Using time series data on the factors influencing rural consumer demand in Hebei Province from 2000 to 2010, this paper applies principal component analysis from multivariate econometric statistical analysis to construct the principal components of consumer demand in Hebei Province, regresses per capita consumer spending in Hebei Province (the dependent variable) on these principal components to obtain a principal component regression, and then analyzes the principal components quantitatively and qualitatively. The results show that total output value per capita (yuan), the employment rate, and the income gap are positively correlated with rural residents' consumer demand in Hebei Province; the consumer price index, the child dependency ratio, and the one-year interest rate are negatively correlated with it; and the elderly dependency ratio and medical care spending per capita are positively correlated with it. The following countermeasures and suggestions are put forward to promote residents' consumer demand in Hebei Province: develop the county economy and increase rural residents' consumer demand; use industry to support agriculture and coordinate urban-rural development; and improve the rural medical care and health system and resolve the practical difficulties of the population.
This paper studies the problem of tensor principal component analysis (PCA). Tensor PCA is usually viewed as a low-rank matrix completion problem solved via matrix factorization, with the nuclear norm used as a convex approximation of the rank operator under mild conditions. However, most nuclear norm minimization approaches are based on SVD operations. Given an m × n matrix, the time complexity of the SVD operation is O(mn²), which leads to prohibitive computational cost in large-scale problems. In this paper, an efficient and scalable algorithm for tensor principal component analysis is proposed, called the Linearized Alternating Direction Method with Vectorized technique for Tensor Principal Component Analysis (LADMVTPCA). Unlike traditional matrix factorization methods, LADMVTPCA uses the vectorized technique to formulate the tensor as an outer product of vectors, which greatly improves computational efficiency compared with matrix factorization. In the experiments, synthetic tensor data of different orders are used to evaluate LADMVTPCA empirically. The results show that LADMVTPCA outperforms the matrix factorization-based method.
Objective: To introduce a method for calculating cardiovascular age, a new, accurate, and much simpler index for assessing cardiovascular autonomic regulatory function, based on statistical analysis of heart rate and blood pressure variability (HRV and BPV) and baroreflex sensitivity (BRS) data. Methods: First, the HRV and BPV of 89 healthy aviation personnel were analyzed by conventional autoregressive (AR) spectral analysis, and their spontaneous BRS was obtained by the sequence method. Second, principal component analysis was conducted on the original and derived indices of the HRV, BPV, and BRS data, yielding the relevant principal components PCi_orig and PCi_deri (i = 1, 2, 3, ...). Finally, the equation for calculating cardiovascular age was obtained by multiple regression, with chronological age as the dependent variable and the principal components significantly related to age as the regressors. Results: The first four principal components of the original indices accounted for over 90% of the total variance of those indices, as did the first three principal components of the derived indices. These seven principal components could therefore reflect the information on cardiovascular autonomic regulation embodied in the 17 indices of HRV, BPV, and BRS with minimal loss of information. Of the seven principal components, PC2_orig, PC4_orig, and PC2_deri were negatively correlated with chronological age (P < 0.05), whereas PC3_orig was positively correlated with chronological age (P < 0.01). The cardiovascular age calculated from the regression equation was significantly correlated with chronological age among the 89 aviation personnel (r = 0.73, P < 0.01). Conclusion: The cardiovascular age calculated from a multivariate analysis of HRV, BPV, and BRS can be regarded as a comprehensive indicator reflecting the age dependency of autonomic regulation of the cardiovascular system in healthy aviation personnel.
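The statement that the first few principal components account for over 90% of the total variance corresponds to a simple cumulative-variance rule for choosing how many components to keep. The sketch below applies that rule to a list of eigenvalues; the eigenvalues are invented for illustration and are not taken from the study.

```python
def components_for_variance(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose variance share meets the threshold.

    Returns (count, cumulative share of total variance).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count, cumulative / total
    return len(ordered), 1.0


# Hypothetical eigenvalues of a correlation matrix of physiological indices.
eigenvalues = [4.0, 2.5, 1.5, 1.2, 0.5, 0.3]
n_components, explained = components_for_variance(eigenvalues)
```

With these values the rule keeps four components, which together explain 92% of the variance; the retained component scores would then serve as candidate regressors.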
There are a variety of classification techniques, such as neural networks, decision trees, support vector machines, and logistic regression. The problem of dimensionality is pertinent to many learning algorithms and denotes the drastic rise in computational complexity as the number of dimensions grows; dimensionality reduction methods are therefore needed. These methods include principal component analysis (PCA) and locality preserving projection (LPP). In many real-world classification problems, the local structure is more important than the global structure, while dimensionality reduction techniques tend to ignore the local structure and preserve the global structure. The objectives are to compare PCA and LPP in terms of accuracy, to develop appropriate representations of complex data by reducing the dimensions of the data, and to explain the importance of using LPP with logistic regression. The results show that the proposed LPP approach provides a better representation and higher accuracy than the PCA approach.
As market competition among steel mills is severe, deoxidization alloying is an important link in the metallurgical process. To address this, principal component regression analysis is adopted to reduce the dimension of the influencing factors, and a reasonable and reliable prediction model of element yield is established. Subject to constraints such as the target cost function constraint, the yield constraint, and the non-negativity constraint, linear programming is used to design the lowest-cost batching scheme that meets national standards and production requirements. The research results provide a reliable optimization model for the deoxidization and alloying process of steel mills, which is of positive significance for improving the market competitiveness of steel mills, reducing waste discharge, and protecting the environment.
To predict coal and gas outburst risk quickly and accurately, a prediction model based on PCA-FA-SVM was designed. Principal component analysis (PCA) was used to pre-process the original data samples and extract their principal components, the firefly algorithm (FA) was used to improve the support vector machine model, and the prediction results of the PCA-FA-SVM model were compared with those of the BP, FA-SVM, FA-BP, and SVM models. Accuracy, recall, Macro-F1, and model prediction time were used as evaluation indexes. The results show that principal component analysis improves the prediction efficiency and accuracy of the FA-SVM model. For predicting coal and gas outburst risk, the PCA-FA-SVM model achieves an accuracy of 0.962, a recall of 0.955, and a Macro-F1 of 0.957, with a model prediction time of 0.312 s. Compared with the other models, the PCA-FA-SVM model has better overall performance.
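Macro-F1, one of the evaluation indexes above, is the unweighted mean of the per-class F1 scores, so every risk class counts equally regardless of how many samples it has. The sketch below computes it from true and predicted labels; the three risk levels and the sample labels are hypothetical, since the abstract does not list the classes used.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    A class with no true and no predicted samples contributes 0.0.
    """
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical risk labels for six samples.
y_true = ["low", "low", "mid", "mid", "high", "high"]
y_pred = ["low", "mid", "mid", "mid", "high", "low"]
score = macro_f1(y_true, y_pred)
```

Here the per-class F1 values are 0.5 (low), 0.8 (mid), and 2/3 (high), which average to roughly 0.656.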
Funding: Supported by the Scientific Research Project for Commonwealth (GYHY200806017) and the Innovation Project for Graduate of Jiangsu Province (CX09S-018Z).
Funding: Supported by the Second Tibetan Plateau Scientific Expedition and Research Program (Grant No. 2019QZKK0904), the Natural Science Foundation of Hebei Province (Grant No. D2022403032), and the S&T Program of Hebei (Grant No. E2021403001).
Funding: Project (9140A18010210KG01) supported by the Departmental Pre-Research Fund of China
Abstract: A novel configuration performance prediction approach combining principal component analysis (PCA) and support vector machine (SVM) was proposed. This method can estimate the performance parameter values of a newly configured product through soft computing techniques instead of practical test experiments, which helps to evaluate whether or not the product variant can satisfy the customers' individual requirements. The PCA technique was used to reduce and orthogonalize the module parameters that affect the product performance. Then, these extracted features were used as new input variables in the SVM model to mine knowledge from the limited existing product data. The performance values of a newly configured product can be predicted by means of the trained SVM models. This PCA-SVM method can ensure that the performance prediction is executed rapidly and accurately, even under small-sample conditions. The applicability of the proposed method was verified on a family of plate electrostatic precipitators.
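The "reduce and orthogonalize" step described above can be sketched with NumPy as follows. This is an illustrative PCA via singular value decomposition on synthetic data, under the assumption that the inputs form a samples-by-features matrix; the function name `pca_fit_transform` and the data are hypothetical and not taken from the paper.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project X (samples x features) onto its leading principal components.

    Returns the scores, the component directions, and the fraction of
    variance explained by each retained component.
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA requires centered data
    # Rows of Vt are principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, components, explained[:n_components]


# Synthetic data: four correlated "module parameters" driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = np.array([[1.0, 0.5, 0.2, 0.0],
                   [0.0, 0.3, 1.0, 0.8]])
X = latent @ mixing + 0.01 * rng.normal(size=(50, 4))

scores, components, explained = pca_fit_transform(X, n_components=2)
```

The columns of `scores` are mutually uncorrelated, which is the orthogonalization the abstract refers to. In a PCA-SVM pipeline these scores, rather than the raw parameters, would be passed to the SVM as input features.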
Funding: Supported by the Natural Science Foundation of Hubei Province (2005ABA256)
Abstract: This article presents an anomaly detection system based on principal component analysis (PCA) and support vector machine (SVM). The system first creates a profile defining normal behavior by a frequency-based scheme, and then compares the similarity of the current behavior with the created profile to decide whether the input instance is normal or anomalous. In order to avoid overfitting and reduce the computational burden, the principal features of normal behavior are extracted by the PCA method. After training is complete, the SVM is used to classify user behavior as normal or anomalous. In the performance evaluation experiments, the system achieved a correct detection rate of 92.2% and a false detection rate of 2.8%.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 11075184), the Knowledge Innovation Program of the Chinese Academy of Sciences (CAS) (Grant No. Y03RC21124), and the CAS President’s International Fellowship Initiative Foundation (Grant No. 2015VMA007)
Abstract: Laser-induced breakdown spectroscopy (LIBS) is a versatile tool for both qualitative and quantitative analysis. In this paper, LIBS combined with principal component analysis (PCA) and support vector machine (SVM) is applied to rock analysis. Fourteen emission lines including Fe, Mg, Ca, Al, Si, and Ti are selected as analysis lines. A good accuracy (91.38% for the real rock) is achieved by using SVM to analyze the spectroscopic peak area data processed by PCA. Combining PCA and SVM not only reduces the noise and dimensionality, which improves the efficiency of the program, but also solves the problem of linear inseparability. By this method, the ability of LIBS to classify rock is validated.
Funding: The National Natural Science Foundation of China (50675167) and a Foundation for the Author of National Excellent Doctoral Dissertation of China (200535)
Abstract: The support vector classifier (SVC) has clear advantages for small-sample learning problems with high dimensions, in particular better generalization ability. However, there is some redundancy among the high dimensions of the original samples, and the main features of the samples may be extracted first to improve the performance of the SVC. Principal component analysis (PCA) is employed to reduce the feature dimensions of the original samples and to pre-select the main features efficiently, and an SVC is constructed in the selected feature space to improve the learning speed and identification rate of the SVC. Furthermore, a heuristic genetic algorithm-based automatic model selection is proposed to determine the hyperparameters of the SVC and to evaluate the performance of the learning machines. Experiments performed on the Heart and Adult benchmark data sets demonstrate that the proposed PCA-based SVC not only reduces the test time drastically, but also improves the identification rates effectively.
Abstract: Accurate cost estimation at the early stage of a construction project is a key factor in a project’s success. But it is difficult to quickly and accurately estimate construction costs at the planning stage, when drawings, documentation and the like are still incomplete. As such, various techniques have been applied to accurately estimate construction costs at an early stage, when project information is limited. While the various techniques have their pros and cons, little effort has been made to determine the best technique in terms of cost estimating performance. The objective of this research is to compare the accuracy of three estimating techniques (regression analysis (RA), neural network (NN), and support vector machine (SVM)) by performing estimations of construction costs. By comparing the accuracy of these techniques using historical cost data, it was found that the NN model showed more accurate estimation results than the RA and SVM models. Consequently, it is determined that the NN model is the most suitable for estimating the cost of school building projects.
Abstract: Rainfall forecasting is becoming more and more significant, as precipitation anomalies can lead to drought and flood disasters. However, because of the complexity and non-stationarity of rainfall data, it is difficult to forecast. In this paper, a novel hybrid model to forecast rainfall is developed by incorporating singular spectrum analysis (SSA) and the dragonfly algorithm (DA) into the support vector regression (SVR) method. Firstly, SSA is used for extracting the trend components of the hydrological data. Then, SVR is utilized to deal with the volatility and irregularity of the precipitation series. Finally, the parameters of SVR are optimized by DA. The proposed SSA-DA-SVR method is used to forecast the monthly precipitation for the Songbai, Panshui, Lanma and Jiulongchi stations. To validate the efficiency of the method, four comparison models, DA-SVR, SSA-GWO-SVR, SSA-PSO-SVR and SSA-CS-SVR, are established. The results show that the proposed method has the best performance among all five models, and its prediction has high precision and accuracy.
Abstract: Machine learning algorithms (MLs) can potentially improve disease diagnostics, leading to early detection and treatment of these diseases. As a malignant tumor whose primary focus is located in the bronchial mucosal epithelium, lung cancer has the highest mortality and morbidity among cancer types, threatening the health and life of patients suffering from the disease. Machine learning algorithms such as Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) have been used for lung cancer prediction. However, they still face challenges such as high dimensionality of the feature space, over-fitting, high computational complexity, noise and missing data, low accuracies, low precision and high error rates. Ensemble learning, which combines classifiers, may be helpful to boost prediction on new data. However, current ensemble ML techniques rarely consider comprehensive evaluation metrics to evaluate the performance of individual classifiers. The main purpose of this study was to develop an ensemble classifier that improves lung cancer prediction. An ensemble machine learning algorithm is developed based on RF, SVM, NB, and KNN. Feature selection is done based on Principal Component Analysis (PCA) and Analysis of Variance (ANOVA). This algorithm is then executed on lung cancer data and evaluated using execution time, true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), false positive rate (FPR), recall (R), precision (P) and F-measure (FM). Experimental results show that the proposed ensemble classifier has the best classification of 0.9825% with the lowest error rate of 0.0193. This is followed by SVM, in which the probability of having the best classification is 0.9652% at an error rate of 0.0206. On the other hand, NB had the worst performance of 0.8475% classification at 0.0738 error rate.
Abstract: A novel method for developing a reliable data-driven soft sensor to improve the prediction accuracy of sulfur content in the hydrodesulfurization (HDS) process was proposed. To this end, an integrated approach using support vector regression (SVR) based on wavelet transform (WT) and principal component analysis (PCA) was used. Experimental data from the HDS setup were employed to validate the proposed model. The results reveal that the integrated WT-PCA with SVR model was able to increase the prediction accuracy of the SVR model. Implementation of the proposed model delivers the most satisfactory predictive performance (EAARE = 0.058 and R² = 0.97) in comparison with SVR. The obtained results indicate that the proposed model is more reliable and more precise than multiple linear regression (MLR), SVR and PCA-SVR.
Funding: Funded by the National Natural Science Foundation of China (81202283, 81473070, 81373102 and 81202267), the Key Grant of Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJA330034 and 11KJA330001), the Research Fund for the Doctoral Program of Higher Education of China (20113234110002), and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine)
Abstract: With recent advances in biotechnology, genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR, which is corrected by the Bonferroni method, was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and they both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
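The conservativeness of Bonferroni-corrected single-locus testing mentioned above follows from how the correction works: each of the m tests is held to a threshold of alpha / m. A minimal sketch, with invented p-values and a hypothetical function name:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one rejection decision per test, holding each p-value
    to the Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Four hypothetical single-SNP p-values from one SNP set.
p_values = [0.001, 0.02, 0.04, 0.3]
decisions = bonferroni_reject(p_values, alpha=0.05)  # per-test threshold is 0.0125
```

With four tests, p-values of 0.02 and 0.04 would pass an uncorrected 0.05 level but fail the corrected one. When SNPs in a region are correlated, the effective number of independent tests is smaller than m, so this threshold is stricter than necessary.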
Funding: Project (70671039) supported by the National Natural Science Foundation of China
Abstract: A combined model based on principal component analysis (PCA) and generalized regression neural network (GRNN) was adopted to forecast electricity price in the day-ahead electricity market. PCA was applied to extract the main influences on day-ahead price, avoiding the strong correlation between the input factors that might influence electricity price, such as the load of the forecasting hour, other historical loads and prices, weather and temperature; then GRNN was employed to forecast electricity price according to the main information extracted by PCA. To prove the efficiency of the combined model, a case from the PJM (Pennsylvania-New Jersey-Maryland) day-ahead electricity market was evaluated. Compared to the back-propagation (BP) neural network and standard GRNN, the combined method reduces the mean absolute percentage error by about 3%.
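For reference, the mean absolute percentage error (MAPE) used to compare the forecasts can be computed as in the sketch below. The prices are invented toy values and the function name is hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is nonzero, since each error is
    divided by the corresponding actual value.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical hourly prices and forecasts.
prices = [40.0, 50.0, 80.0]
predicted = [38.0, 55.0, 80.0]
error_percent = mape(prices, predicted)  # relative errors: 5%, 10%, 0%
```

Because each error is scaled by the actual price, MAPE becomes unstable when prices approach zero, which is worth keeping in mind for electricity markets.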
Funding: Supported by Hebei Province Regional Economic Development Countermeasures Research Program (Fs201010)
Abstract: Using time series data on the factors influencing rural consumer demand in Hebei Province from 2000 to 2010, this paper applies principal component analysis from multivariate econometric statistical analysis, constructs the principal components of consumer demand in Hebei Province, regresses the dependent variable (consumer spending per capita in Hebei Province) on these principal components to obtain a principal component regression, and then conducts quantitative and qualitative analysis of the principal components. The results show that total output value per capita (yuan), employment rate, and income gap are positively correlated with rural residents' consumer demand in Hebei Province; consumer price index, upbringing ratio of children, and one-year interest rate are negatively correlated with rural residents' consumer demand in Hebei Province; and the ratio of supporting the elderly and medical care spending per capita are positively correlated with rural residents' consumer demand in Hebei Province. The following countermeasures and suggestions are put forward to promote residents' consumer demand in Hebei Province: develop the county economy in Hebei Province and increase rural residents' consumer demand; use industry to support agriculture and coordinate urban-rural development; and improve the rural medical care and health system and resolve the actual difficulties of the masses.
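Principal component regression, as outlined in this abstract, regresses the response on a few principal component scores and then maps the coefficients back to the original factors, whose signs indicate positive or negative association. The NumPy sketch below shows that procedure on synthetic data; the function names, the dictionary layout and the data are assumptions for illustration only.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Fit a principal component regression on standardized predictors."""
    x_mean = X.mean(axis=0)
    x_std = X.std(axis=0, ddof=1)
    Z = (X - x_mean) / x_std                  # standardized predictors
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T                   # loadings (features x components)
    T = Z @ V                                 # principal component scores
    y_mean = y.mean()
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta_std = V @ gamma                      # coefficients on standardized predictors
    return {"x_mean": x_mean, "x_std": x_std,
            "y_mean": y_mean, "beta_std": beta_std}


def pcr_predict(model, X):
    """Predict the response for new rows of X using a fitted PCR model."""
    Z = (X - model["x_mean"]) / model["x_std"]
    return model["y_mean"] + Z @ model["beta_std"]


# Synthetic example: 11 "years" and 3 influencing factors (hypothetical values).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(11, 3))
y_demo = 3.0 + X_demo @ np.array([1.0, -2.0, 0.5])
demo_model = pcr_fit(X_demo, y_demo, n_components=2)
demo_pred = pcr_predict(demo_model, X_demo)
```

Keeping fewer components than predictors trades a small bias for stability when the factors are strongly correlated, which is common in short annual series such as an eleven-year sample.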
Abstract: This paper studies the problem of tensor principal component analysis (PCA). Usually tensor PCA is viewed as a low-rank matrix completion problem via a matrix factorization technique, and the nuclear norm is used as a convex approximation of the rank operator under mild conditions. However, most nuclear norm minimization approaches are based on SVD operations. Given an m × n matrix, the time complexity of the SVD operation is O(mn^2), which brings prohibitive computational complexity in large-scale problems. In this paper, an efficient and scalable algorithm for tensor principal component analysis is proposed, called Linearized Alternating Direction Method with Vectorized technique for Tensor Principal Component Analysis (LADMVTPCA). Different from traditional matrix factorization methods, LADMVTPCA utilizes the vectorized technique to formulate the tensor as an outer product of vectors, which greatly improves the computational efficiency compared to the matrix factorization method. In the experimental part, synthetic tensor data of different orders are used to empirically evaluate the proposed algorithm LADMVTPCA. Results show that LADMVTPCA outperforms the matrix factorization-based method.
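To make the phrase "tensor as an outer product of vectors" concrete, the sketch below builds a rank-one third-order tensor from three vectors with NumPy and compares the number of stored values. It illustrates the representation only and is not an implementation of LADMVTPCA; the function name and vectors are hypothetical.

```python
import numpy as np


def rank_one_tensor(vectors):
    """Outer product of 1-D vectors, giving a tensor of order len(vectors)."""
    tensor = np.asarray(vectors[0], dtype=float)
    for v in vectors[1:]:
        tensor = np.multiply.outer(tensor, np.asarray(v, dtype=float))
    return tensor


a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 2.0, 4.0])
c = np.array([1.0, 0.0, -2.0, 3.0, 5.0])

T = rank_one_tensor([a, b, c])      # entry T[i, j, k] equals a[i] * b[j] * c[k]
dense_entries = T.size              # values needed for the full tensor
factor_entries = a.size + b.size + c.size  # values needed for the three factors
```

The factored form stores the sum of the dimensions rather than their product, which is one reason vector-based formulations scale better than methods that operate on full unfolded matrices.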
Abstract: Objective: To introduce a method to calculate cardiovascular age, a new, accurate and much simpler index for assessing cardiovascular autonomic regulatory function, based on statistical analysis of heart rate and blood pressure variability (HRV and BPV) and baroreflex sensitivity (BRS) data. Methods: Firstly, the HRV and BPV of 89 healthy aviation personnel were analyzed by conventional autoregressive (AR) spectral analysis, and their spontaneous BRS was obtained by the sequence method. Secondly, principal component analysis was conducted on the original and derived indices of the HRV, BPV and BRS data, and the relevant principal components, PCi orig and PCi deri (i = 1, 2, 3, ...), were obtained. Finally, the equation for calculating cardiovascular age was obtained by multiple regression, with the chronological age as the dependent variable and the principal components significantly related to age as the regressors. Results: The first four principal components of the original indices accounted for over 90% of the total variance of the indices, as did the first three principal components of the derived indices. Thus, these seven principal components could reflect the information on cardiovascular autonomic regulation embodied in the 17 indices of HRV, BPV and BRS with minimal loss of information. Of the seven principal components, PC2 orig, PC4 orig and PC2 deri were negatively correlated with the chronological age (P < 0.05), whereas PC3 orig was positively correlated with the chronological age (P < 0.01). The cardiovascular age thus calculated from the regression equation was significantly correlated with the chronological age among the 89 aviation personnel (r = 0.73, P < 0.01). Conclusion: The cardiovascular age calculated from a multivariate analysis of HRV, BPV and BRS could be regarded as a comprehensive indicator reflecting the age dependency of autonomic regulation of the cardiovascular system in healthy aviation personnel.
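The choice of "the first four principal components accounting for over 90% of total variance" reflects a common selection rule: keep the smallest number of components whose cumulative explained variance reaches a target. A minimal sketch of that rule, using invented eigenvalues rather than values from the study:

```python
def components_for_variance(eigenvalues, target=0.90):
    """Smallest number of leading components whose cumulative share of
    total variance reaches the target fraction."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value / total
        if cumulative >= target:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of a correlation matrix (they sum to 10).
eigenvalues = [4.0, 2.5, 1.5, 1.2, 0.5, 0.3]
n_keep = components_for_variance(eigenvalues, target=0.90)
```

With these toy eigenvalues the cumulative shares are 0.40, 0.65, 0.80 and 0.92, so four components are retained.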
Abstract: There are a variety of classification techniques, such as neural networks, decision trees, support vector machines and logistic regression. The problem of dimensionality is pertinent to many learning algorithms and denotes the drastic rise in computational complexity; therefore, dimensionality reduction methods are needed. These methods include principal component analysis (PCA) and locality preserving projection (LPP). In many real-world classification problems, the local structure is more important than the global structure, yet many dimensionality reduction techniques ignore the local structure and preserve the global structure. The objectives are to compare PCA and LPP in terms of accuracy, to develop appropriate representations of complex data by reducing the dimensions of the data, and to explain the importance of using LPP with logistic regression. The results of this paper show that the proposed LPP approach provides a better representation and higher accuracy than the PCA approach.
Abstract: As the market competition of steel mills is severe, deoxidization alloying is an important link in the metallurgical process. To solve this problem, principal component regression analysis is adopted to reduce the dimension of the influencing factors, and a reasonable and reliable prediction model of element yield is established. Based on constraint conditions such as the target cost function constraint, yield constraint and non-negativity constraint, linear programming is adopted to design the lowest-cost batching scheme that meets the national standards and production requirements. The research results provide a reliable optimization model for the deoxidization and alloying process of steel mills, which is of positive significance for improving the market competitiveness of steel mills, reducing waste discharge and protecting the environment.
Funding: Financially supported by the National Natural Science Foundation of China (52174117, 52004117), the Postdoctoral Science Foundation of China (2021T140290, 2020M680975), and the Science and Technology Research Project of Liaoning Provincial Department of Education (LJ2020JCL005).
Abstract: In order to predict the coal and gas outburst risk quickly and accurately, a PCA-FA-SVM based coal and gas outburst risk prediction model was designed. Principal component analysis (PCA) was used to pre-process the original data samples and extract their principal components, the firefly algorithm (FA) was used to improve the support vector machine model, and the prediction results of the PCA-FA-SVM model were compared with those of the BP, FA-SVM, FA-BP and SVM models. Accuracy rate, recall rate, Macro-F1 and model prediction time were used as evaluation indexes. The results show that principal component analysis improves the prediction efficiency and accuracy of the FA-SVM model. The accuracy rate of the PCA-FA-SVM model in predicting coal and gas outburst risk is 0.962, the recall rate is 0.955, Macro-F1 is 0.957, and the model prediction time is 0.312 s. Compared with the other models, the comprehensive performance of the PCA-FA-SVM model is better.
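Macro-F1, one of the evaluation indexes listed above, averages the per-class F1 scores with equal weight, so a rare risk level counts as much as a common one. A minimal sketch in plain Python follows; the class labels and predictions are invented, and the function name is hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical three-level risk labels (0 = low, 1 = medium, 2 = high).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred)  # per-class F1: 0.5, 0.8, 2/3
```

Classes with no predicted or true samples are assigned an F1 of zero here; other conventions exist, so this choice is an assumption of the sketch.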