The computational approaches of support vector machine (SVM), support vector regression (SVR) and molecular docking have been widely used for the computation of active compounds. In this work, to improve the accuracy and reliability of prediction, the three approaches were combined to predict potential cytochrome P450 1A2 (CYP1A2) inhibitors. The accuracy of the optimal SVM qualitative model was 99.432%, 97.727%, and 91.667% for the training set, internal test set and external test set, respectively, showing that this model had high discrimination ability. The R² and mean square error of the optimal SVR quantitative model were 0.763 and 0.013 for the training set, and 0.753 and 0.056 for the test set, respectively, indicating that this SVR model has high predictive ability for the biological activities of compounds. According to the results of the SVM and SVR models, some types of descriptors were identified as essential to bioactivity prediction, including the connectivity indices, constitutional descriptors and functional group counts. Moreover, molecular docking studies were used to reveal the binding poses and binding affinity of potential inhibitors interacting with CYP1A2. The amino acids THR124 and ASP320 could form key hydrogen bond interactions with active compounds, and the amino acids ALA317 and GLY316 could form strong hydrophobic interactions with active compounds. The models obtained above were applied to discover potential CYP1A2 inhibitors from natural products, which could predict CYP-mediated drug-drug interactions and provide useful guidance and reference for rational drug combination therapy. A set of 20 potential CYP1A2 inhibitors was obtained. Part of the results was consistent with the literature, which further indicates the accuracy of these models and the reliability of this combinatorial computation strategy.
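The two figures quoted for the SVR model can be computed from paired observed and predicted activities. The sketch below assumes that R² denotes the coefficient of determination (the abstract does not say which definition was used); the function names and sample values are hypothetical and are not data from the study.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical activities, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
mse = mean_squared_error(observed, predicted)
r2 = r_squared(observed, predicted)
```

Reporting both values for the training and test sets, as the abstract does, makes it possible to see whether the fit degrades on unseen compounds.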
The distribution of data has a significant impact on the results of classification. When the distribution of one class is insignificant compared to the distribution of another class, data imbalance occurs. This results in more outliers and noise, so the speed and performance of classification can be greatly affected. Given these problems, this paper starts from the motivation and mathematical representation of classification and puts forward a new classification method based on the relationship between different classification formulations. Combining the vector characteristics of the actual problem with the choice of matrix characteristics, we first analyze the orderly regression and introduce slack variables to solve the constraint problem of the lone point. Then we introduce fuzzy factors to solve the problem of the gap between the isolated points on the basis of the support vector machine. We introduce cost control to solve the problem of sample skew. Finally, based on the bi-boundary support vector machine, a two-step weight-setting twin classifier is constructed. This can help to identify multitasks with feature-selected patterns without the need for additional optimizers, addressing the difficulty that large-scale classification has in dealing effectively with a very low category distribution gap.
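The abstract does not say how its cost control is parameterized. As a rough illustration of the general idea only, the sketch below uses a common heuristic in which each class receives a misclassification cost inversely proportional to its frequency; the function name `balanced_class_weights` is hypothetical and this is not the weighting scheme of the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is a widely used heuristic for skewed samples, shown here as an
    assumption; the paper's own two-step weight setting is not described
    in the abstract.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical skewed sample: nine majority points and one minority point.
weights = balanced_class_weights([0] * 9 + [1])
```

With such weights, an error on the rare class costs more in the objective than an error on the common class, which counteracts the pull of the majority class on the decision boundary.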
Support Vector-based learning methods are an important part of Computational Intelligence techniques. Recent efforts have dealt with the problem of learning from very large datasets. This paper reviews the most commonly used formulations of support vector machines for regression (SVRs), aiming to emphasize their usability in large-scale applications. We review the general concept of support vector machines (SVMs), address the state of the art in training methods for SVMs, and explain the fundamental principle of SVRs. The most common learning methods for SVRs are introduced, and linear programming-based SVR formulations are explained with emphasis on their suitability for large-scale learning. Finally, this paper also discusses some open problems and current trends.
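For orientation, the standard ε-insensitive SVR primal problem is written out below. The abstract does not state which formulations the review covers in detail, so the notation here (weight vector $w$, bias $b$, feature map $\phi$, slack variables $\xi_i, \xi_i^*$, penalty $C$, tube width $\varepsilon$) is the usual textbook convention and is given as an assumption, not as the authors' notation.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\,\lVert w \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right) \\
\text{subject to}\quad & y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
& \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i \ge 0,\quad \xi_i^* \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

Residuals smaller than $\varepsilon$ incur no penalty, which is what makes the solution depend only on a subset of the training points (the support vectors).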
In this paper we apply the nonlinear time series analysis method to small-time scale traffic measurement data. The prediction-based method is used to determine the embedding dimension of the traffic data. Based on the reconstructed phase space, the local support vector machine prediction method is used to predict the traffic measurement data, and the BIC-based neighbouring point selection method is used to choose the number of the nearest neighbouring points for the local support vector machine regression model. The experimental results show that the local support vector machine prediction method whose neighbouring points are optimized can effectively predict the small-time scale traffic measurement data and can reproduce the statistical features of real traffic measurements.
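Phase-space reconstruction is usually done by time-delay embedding, in which each point of the reconstructed space is a vector of lagged samples. The sketch below shows that step only; the function name `delay_embed`, the delay of 1, and the toy series are assumptions for illustration, since the abstract gives neither the delay nor the embedding dimension it found.

```python
def delay_embed(series, dim, delay=1):
    """Build delay vectors [x[i], x[i+delay], ..., x[i+(dim-1)*delay]].

    `dim` is the embedding dimension and `delay` the lag in samples; both
    are free parameters here (the paper selects the dimension with a
    prediction-based method that is not reproduced in this sketch).
    """
    if dim < 1 or delay < 1:
        raise ValueError("dim and delay must be positive integers")
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors < 1:
        raise ValueError("series too short for this dim and delay")
    return [[series[i + j * delay] for j in range(dim)] for i in range(n_vectors)]


# Toy series, not traffic data.
vectors = delay_embed([1, 2, 3, 4, 5], dim=3, delay=1)
```

A local predictor would then fit a regression model on the nearest neighbours of the latest delay vector, with the number of neighbours being the quantity the abstract chooses by BIC.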
Because the solutions of the least squares support vector regression machine (LS-SVRM) are not sparse, prediction is slow and its applications are limited. The defects of the existing adaptive pruning algorithm for LS-SVRM are that the training speed is slow and the generalization performance is not satisfactory, especially for large-scale problems. Hence an improved algorithm is proposed. In order to accelerate the training speed, the pruned data point and fast leave-one-out error are employed to validate the temporary model obtained after decremental learning. A novel objective function in the termination condition, which involves all the constraints generated by all training data points, and three pruning strategies are employed to improve the generalization performance. The effectiveness of the proposed algorithm is tested on six benchmark datasets. The sparse LS-SVRM model has a faster training speed and better generalization performance.
Tensor representation is useful for reducing the overfitting problem of vector-based learning algorithms in pattern recognition. This is mainly because the structure information of objects in pattern analysis is a reasonable constraint that reduces the number of unknown parameters used to model a classifier. In this paper, we generalize the vector-based learning algorithm TWin Support Vector Machine (TWSVM) to the tensor-based method TWin Support Tensor Machines (TWSTM), which accepts general tensors as input. To examine the effectiveness of TWSTM, we implement the TWSTM method for Microcalcification Clusters (MCs) detection. In the tensor subspace domain, the MCs detection procedure is formulated as a supervised learning and classification problem, and TWSTM is used as a classifier to decide whether MCs are present. A large number of experiments were carried out to evaluate and compare the performance of the proposed MCs detection algorithm. In comparison with TWSVM, the tensor version reduces the overfitting problem.
In this paper a new continuous variable called core-ratio is defined to describe the probability for a residue to be in a binding site, thereby replacing the previous binary description of the interface residue using 0 and 1. So we can use the support vector machine regression method to fit the core-ratio value and predict the protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites. The new descriptors are more effective, with an averaging procedure used. Our test shows that much better prediction results can be obtained by the support vector regression (SVR) method than by the support vector classification method.
Accurate cost estimation at the early stage of a construction project is a key factor in a project's success. But it is difficult to quickly and accurately estimate construction costs at the planning stage, when drawings, documentation and the like are still incomplete. As such, various techniques have been applied to accurately estimate construction costs at an early stage, when project information is limited. While the various techniques have their pros and cons, little effort has been made to determine the best technique in terms of cost estimating performance. The objective of this research is to compare the accuracy of three estimating techniques, namely regression analysis (RA), neural network (NN), and support vector machine (SVM) techniques, by performing estimations of construction costs. By comparing the accuracy of these techniques using historical cost data, it was found that the NN model gave more accurate estimation results than the RA and SVM models. Consequently, the NN model is determined to be the most suitable for estimating the cost of school building projects.
The stochastic simulation algorithm (SSA) accurately depicts spatially homogeneous, well-stirred chemically reacting systems with small populations of chemical species and properly represents noise, but it is often abandoned when modeling larger systems because of its computational complexity. In this work, a twin support vector regression based stochastic simulation algorithm (TS^3A) is proposed by combining twin support vector regression with the SSA; the former is a well-known robust regression method in machine learning. Numerical results indicate that the proposed algorithm can be applied to a wide range of chemically reacting systems and obtains significant improvements in efficiency and accuracy with fewer simulation runs than the existing methods.
Path loss prediction models are vital for accurate signal propagation in wireless channels. Empirical and deterministic models used in path loss predictions have not produced optimal results. In this paper, we introduced machine learning algorithms to path loss predictions because they offer a flexible network architecture and can make use of extensive data. We introduced support vector regression (SVR) and radial basis function (RBF) models to path loss predictions in the investigated environments. The SVR model was able to process several input parameters without introducing complexity to the network architecture. The RBF model, for its part, provides a good function approximation. Hyperparameter tuning of the machine learning models was carried out in order to achieve optimal results. The performances of the SVR and RBF models were compared and the results validated using the root-mean-squared error (RMSE). The two machine learning algorithms were also compared with the Cost-231, SUI, Egli, Freespace, and Cost-231 W-I models. The analytical models overpredicted path loss. Overall, the machine learning models predicted path loss with greater accuracy than the empirical models. The SVR model performed best across all the indices, with RMSE values of 1.378 dB, 1.4523 dB, and 2.1568 dB in rural, suburban and urban settings respectively, and should therefore be adopted for signal propagation in the investigated environments and beyond.
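To make the comparison concrete, the sketch below evaluates one of the baselines named in the abstract, the free-space model, and scores predictions with RMSE. The free-space formula is the standard one for distance in kilometres and frequency in megahertz; the function names and the sample distances, frequency and measured values are hypothetical and are not taken from the study.

```python
import math


def free_space_path_loss_db(distance_km, frequency_mhz):
    """Free-space path loss in dB: 20*log10(d_km) + 20*log10(f_MHz) + 32.44."""
    if distance_km <= 0 or frequency_mhz <= 0:
        raise ValueError("distance and frequency must be positive")
    return 20 * math.log10(distance_km) + 20 * math.log10(frequency_mhz) + 32.44


def rmse(measured, predicted):
    """Root-mean-squared error between measured and predicted path loss (dB)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    return math.sqrt(mean_sq)


# Hypothetical link distances (km) at an assumed 1800 MHz carrier.
distances_km = [0.5, 1.0, 2.0]
predicted_db = [free_space_path_loss_db(d, 1800.0) for d in distances_km]
measured_db = [95.0, 101.0, 108.0]  # made-up measurements for illustration
baseline_error = rmse(measured_db, predicted_db)
```

The same `rmse` call applied to the predictions of a trained SVR or RBF model would give the figure that the abstract reports per environment.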
In order to handle the semi-supervised problem quickly and efficiently in the twin support vector machine (TWSVM) field, a semi-supervised twin support vector machine (S2TSVM) is proposed by adding the original unlabeled samples. In S2TSVM, the addition of unlabeled samples can easily cause the classification hyperplane to deviate from the sample points. A center-distance principle is therefore proposed to pre-classify unlabeled samples, and a pre-classified S2TSVM (PS2TSVM) is proposed. Compared with S2TSVM, PS2TSVM not only alleviates the problem of the samples deviating from the classification hyperplane, but also improves the training speed. PS2TSVM is then smoothed, giving the pre-classified smooth S2TSVM (PS3TSVM), and its convergence is deduced. Finally, nine datasets from the UCI machine learning database are selected for comparison with other types of semi-supervised models. The experimental results show that the proposed PS3TSVM model has better classification results.
The principle of the support vector regression machine (SVR) is first analysed. Then a new data-dependent kernel function is constructed from the perspective of information geometry. The current waveforms change regularly with the horizontal offset when the rotational frequency of the high-speed rotational arc sensor is in the range from 15 Hz to 30 Hz. The welding current data are pretreated by wavelet filtering, mean filtering and normalization. The SVR model is constructed by making use of these evolvement laws; the decision function is obtained by training the SVR, and the seam offset can then be identified. The experimental results show that the precision of the offset identification can be greatly improved by modifying the SVR and applying mean filtering from the longitudinal direction.
The prediction of the magnitude (M) of reservoir induced earthquakes is an important task in earthquake engineering. In this article, we employ a Support Vector Machine (SVM) and Gaussian Process Regression (GPR) for prediction of reservoir induced earthquake M based on reservoir parameters. The comprehensive parameter (E) and maximum reservoir depth (H) are considered as inputs to the SVM and GPR. We give an equation for determination of reservoir induced earthquake M. The developed SVM and GPR have been compared with the Artificial Neural Network (ANN) method. The results show that the developed SVM and GPR are efficient tools for prediction of reservoir induced earthquake M.
With the progress of deep learning research, convolutional neural networks have become the most important method in feature extraction. How effectively the extracted features are classified and recognized directly affects the performance of the entire network. Traditional processing methods include classification models such as fully connected network models and support vector machines. To address the tendency of the traditional convolutional neural network to over-fit when classifying small samples, a CNN-TWSVM hybrid model was proposed by using the twin support vector machine (TWSVM), which has higher computational efficiency, as the CNN classifier, and it was applied to the traffic sign recognition task. In order to improve the generalization ability of the model, the wavelet kernel function is introduced to deal with the nonlinear classification task. The method uses a network initialized from the ImageNet dataset, fine-tunes it on the specific domain, and takes an inner layer of the network to extract highly abstract features of the traffic sign image. Finally, the TWSVM based on the wavelet kernel function is used to identify the traffic signs, so as to effectively solve the over-fitting problem of traffic sign classification. On the GTSRB and BELGIUMTS datasets, the validity and generalization ability of the improved model are verified by comparison with different kernel functions and different SVM classifiers.
Prostate cancer (PCa) symptoms are commonly confused with benign prostate hyperplasia (BPH), particularly in the early stages due to similarities between symptoms, and in some instances, underdiagnoses. Clinical methods have been utilized to diagnose PCa; however, at the full-blown stage, clinical methods usually present high risks of complicated side effects. Therefore, we proposed the use of support vector machine for early differential diagnosis of PCa (SVM-PCa-EDD). SVM was used to classify persons with and without PCa. We used the PCa dataset from the Kaggle Healthcare repository to develop and validate the SVM model for classification. The PCa dataset consisted of 250 features and one class of features. Attributes considered in this study were age, body mass index (BMI), race, family history, obesity, trouble urinating, urine stream force, blood in semen, bone pain, and erectile dysfunction. The SVM-PCa-EDD was used for preprocessing the PCa dataset, specifically dealing with class imbalance, and for dimensionality reduction. After eliminating class imbalance, the area under the receiver operating characteristic (ROC) curve (AUC) of the logistic regression (LR) model trained with the downsampled dataset was 58.4%, whereas the AUC of the LR model trained with the class-imbalanced dataset was 54.3%. The SVM-PCa-EDD achieved 90% accuracy, 80% sensitivity, and 80% specificity. The validation of SVM-PCa-EDD using random forest and LR showed that SVM-PCa-EDD performed better in early differential diagnosis of PCa. The proposed model can assist medical experts in early diagnosis of PCa, particularly in resource-constrained healthcare settings, and in making further recommendations for PCa testing and treatment.
Objective: Support Vector Machine (SVM) is a machine-learning method, based on the principle of structural risk minimization, which performs well when applied to data outside the training set. In this paper, SVM was applied to predict the 5-year survival status of patients with nasopharyngeal carcinoma (NPC) after treatment; we expect to find a new approach to prognosis studies in cancer so as to assist correct clinical decisions for individual patients. Methods: Two modelling methods, an SVM network and a standard parametric logistic regression, were used to model 5-year survival status. The two methods were compared on a prospective set of patients not used in model construction via receiver operating characteristic (ROC) curve analysis. Results: SVM1, trained with the 25 original input variables without screening, yielded a ROC area of 0.868, with a sensitivity to mortality of 79.2% and a specificity of 94.5%. Similarly, SVM2, trained with 9 input variables obtained from the 25 original variables by logistic regression screening, yielded a ROC area of 0.874, with a sensitivity to mortality of 79.2% and a specificity of 95.6%, while the logistic regression yielded a ROC area of 0.751, with a sensitivity to mortality of 66.7% and a specificity of 83.5%. Conclusion: SVM found a strong pattern in the database predictive of 5-year survival status. The logistic regression produced somewhat similar but weaker results. These results show that the SVM models have the potential to predict an individual patient's 5-year survival status after treatment and to assist clinicians in making good clinical decisions.
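Sensitivity and specificity, as reported above for each model, follow directly from the confusion counts on the held-out patients. The sketch below is illustrative only: the function name and the label convention (1 for death within five years, 0 for survival) are assumptions, and the sample labels are invented rather than drawn from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    `positive` marks the event of interest (assumed here to be mortality).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Invented outcomes for seven hypothetical patients.
actual = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(actual, predicted)
```

Both values depend on the decision threshold, which is why the abstract also reports the ROC area as a threshold-independent summary.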
The endpoint parameters are very important to the process of EAF steel-making, but their on-line measurement is difficult. Soft sensor technology is widely used for the prediction of endpoint parameters. Based on an analysis of the smelting process of EAF and the advantages of support vector machines, a soft sensor model for predicting the endpoint parameters was built using multiple support vector machines (MSVM). In this model, the input space was divided by subtractive clustering and a sub-model based on LS-SVM was built in each sub-space. To decrease the correlation among the sub-models and to improve the accuracy and robustness of the model, the sub-models were combined by Principal Components Regression. The accuracy of the soft sensor model is markedly improved. The simulation result demonstrates the practicability and efficiency of the MSVM model for the endpoint prediction of EAF.
In this study, we developed multiple hybrid machine-learning models to address parameter optimization limitations and enhance the spatial prediction of landslide susceptibility models. We created a geographic information system database, and our analysis results were used to prepare a landslide inventory map containing 359 landslide events identified from Google Earth, aerial photographs, and other validated sources. A support vector regression (SVR) machine-learning model was used to divide the landslide inventory into training (70%) and testing (30%) datasets. The landslide susceptibility map was produced using 14 causative factors. We applied the established gray wolf optimization (GWO) algorithm, bat algorithm (BA), and cuckoo optimization algorithm (COA) to fine-tune the parameters of the SVR model to improve its predictive accuracy. The resultant hybrid models, SVR-GWO, SVR-BA, and SVR-COA, were validated in terms of the area under curve (AUC) and root mean square error (RMSE). The AUC values for the SVR-GWO (0.733), SVR-BA (0.724), and SVR-COA (0.738) models indicate their good prediction rates for landslide susceptibility modeling. SVR-COA had the greatest accuracy, with an RMSE of 0.21687, and SVR-BA had the least accuracy, with an RMSE of 0.23046. The three optimized hybrid models outperformed the SVR model (AUC = 0.704, RMSE = 0.26689), confirming the ability of metaheuristic algorithms to improve model performance.
Support vector machine (SVM) has shown great potential in pattern recognition and regressive estimation. Due to industrial development demands, such as fermentation process modeling, improving the training performance on increasingly large sample sets is an important problem. However, solving a large optimization problem is computationally intensive and memory intensive. In this paper, a geometric interpretation of SVM regression (SVR) is derived, and μ-SVM is extended for both L1-norm and L2-norm penalty SVR. Further, the Gilbert algorithm, a well-known geometric algorithm, is modified to solve SVR problems. Theoretical analysis indicates that the presented SVR training geometric algorithms have the same convergence and almost identical cost of computation as their corresponding algorithms for SVM classification. Experimental results show that the geometric methods are more efficient than conventional methods using quadratic programming and require much less memory.
A new multiple models (MM) approach was proposed to model complex industrial processes by using Fuzzy Support Vector Machines (F-SVMs). When the proposed approach is applied to a pH neutralization titration experiment, F-SVMs MM not only provides satisfactory approximation and generalization properties, but also achieves superior performance to the USOCPN multiple modeling method and the single modeling method based on standard SVMs.
Funding: Hebei Province Key Research and Development Project (No. 20313701D); Hebei Province Key Research and Development Project (No. 19210404D); Mobile computing and universal equipment for the Beijing Key Laboratory Open Project; The National Social Science Fund of China (17AJL014); Beijing University of Posts and Telecommunications Construction of World-Class Disciplines and Characteristic Development Guidance Special Fund "Cultural Inheritance and Innovation" Project (No. 505019221); National Natural Science Foundation of China (No. U1536112); National Natural Science Foundation of China (No. 81673697); National Natural Science Foundation of China (61872046); The National Social Science Fund Key Project of China (No. 17AJL014); "Blue Fire Project" (Huizhou) University of Technology Joint Innovation Project (CXZJHZ201729); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201902218004); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201902024006); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201901197007); Industry-University Cooperation Collaborative Education Project of the Ministry of Education (No. 201901199005); The Ministry of Education Industry-University Cooperation Collaborative Education Project (No. 201901197001); Shijiazhuang science and technology plan project (236240267A); Hebei Province key research and development plan project (20312701D).
Funding: Project supported by the National Natural Science Foundation of China (Grant No 60573065), the Natural Science Foundation of Shandong Province, China (Grant No Y2007G33), and the Key Subject Research Foundation of Shandong Province, China (Grant No XTD0708).
Abstract: In this paper we apply the nonlinear time series analysis method to small-time scale traffic measurement data. The prediction-based method is used to determine the embedding dimension of the traffic data. Based on the reconstructed phase space, the local support vector machine prediction method is used to predict the traffic measurement data, and the BIC-based neighbouring point selection method is used to choose the number of nearest neighbouring points for the local support vector machine regression model. The experimental results show that the local support vector machine prediction method with optimized neighbouring points can effectively predict the small-time scale traffic measurement data and can reproduce the statistical features of real traffic measurements.
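The phase-space reconstruction and neighbour selection steps mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `delay_embed` and `nearest_neighbours`, the delay `tau`, and the toy series are assumptions made for illustration, and the BIC-based choice of the neighbour count and the local SVR fit itself are not reproduced here.

```python
import math

def delay_embed(series, dim, tau=1):
    """Build delay vectors [x[t], x[t - tau], ..., x[t - (dim - 1) * tau]]."""
    start = (dim - 1) * tau
    return [[series[t - k * tau] for k in range(dim)]
            for t in range(start, len(series))]

def nearest_neighbours(vectors, query, k):
    """Return the indices of the k vectors closest to query (Euclidean distance)."""
    order = sorted(range(len(vectors)),
                   key=lambda i: math.dist(vectors[i], query))
    return order[:k]

# Toy series (hypothetical data, not traffic measurements from the paper).
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
vectors = delay_embed(series, dim=3)

# Neighbours of the most recent state among all earlier states; a local
# regression model would then be trained on these neighbours only.
neighbours = nearest_neighbours(vectors[:-1], vectors[-1], k=2)
```

In a local prediction scheme of this kind, only the selected neighbours and their successors are passed to the regression model, which keeps each training problem small.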
Funding: Supported by the National Natural Science Foundation of China (61074127)
Abstract: Because the solutions of the least squares support vector regression machine (LS-SVRM) are not sparse, prediction is slow and its applications are limited. The defects of the existing adaptive pruning algorithm for LS-SVRM are that the training speed is slow and the generalization performance is not satisfactory, especially for large-scale problems. Hence an improved algorithm is proposed. In order to accelerate training, the pruned data point and the fast leave-one-out error are employed to validate the temporary model obtained after decremental learning. A novel objective function in the termination condition, which involves all the constraints generated by all training data points, and three pruning strategies are employed to improve the generalization performance. The effectiveness of the proposed algorithm is tested on six benchmark datasets. The sparse LS-SVRM model has a faster training speed and better generalization performance.
Funding: Supported by the National Natural Science Foundation of China (No. 60771068) and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2007F248)
Abstract: Tensor representation is useful for reducing the overfitting problem in vector-based learning algorithms in pattern recognition. This is mainly because the structure information of objects in pattern analysis is a reasonable constraint that reduces the number of unknown parameters used to model a classifier. In this paper, we generalize the vector-based learning algorithm TWin Support Vector Machine (TWSVM) to the tensor-based method TWin Support Tensor Machines (TWSTM), which accepts general tensors as input. To examine the effectiveness of TWSTM, we implement the TWSTM method for Microcalcification Clusters (MCs) detection. In the tensor subspace domain, the MCs detection procedure is formulated as a supervised learning and classification problem, and TWSTM is used as a classifier to decide whether MCs are present. A large number of experiments were carried out to evaluate and compare the performance of the proposed MCs detection algorithm. Compared with TWSVM, the tensor version reduces the overfitting problem.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 10674172 and 10874229)
Abstract: In this paper a new continuous variable called core-ratio is defined to describe the probability for a residue to be in a binding site, thereby replacing the previous binary description of the interface residue using 0 and 1. This allows the support vector machine regression method to be used to fit the core-ratio value and predict the protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites. The new descriptors, which use an averaging procedure, are more effective. Our test shows that much better prediction results can be obtained by the support vector regression (SVR) method than by the support vector classification method.
Abstract: Accurate cost estimation at the early stage of a construction project is a key factor in a project's success. But it is difficult to quickly and accurately estimate construction costs at the planning stage, when drawings, documentation and the like are still incomplete. As such, various techniques have been applied to accurately estimate construction costs at an early stage, when project information is limited. While the various techniques have their pros and cons, little effort has been made to determine the best technique in terms of cost estimating performance. The objective of this research is to compare the accuracy of three estimating techniques (regression analysis (RA), neural network (NN), and support vector machine (SVM) techniques) by performing estimations of construction costs. By comparing the accuracy of these techniques using historical cost data, it was found that the NN model gave more accurate estimation results than the RA and SVM models. Consequently, the NN model is determined to be the most suitable for estimating the cost of school building projects.
Funding: This work was supported by the National Natural Science Foundation of China (No. 30871341), the National High-Tech Research and Development Program of China (No. 2006AA02-Z190), the Shanghai Leading Academic Discipline Project (No. S30405), and the Natural Science Foundation of Shanghai Normal University (No. SK200937).
Abstract: The stochastic simulation algorithm (SSA) accurately depicts spatially homogeneous, well-stirred chemically reacting systems with small populations of chemical species and properly represents noise, but it is often abandoned when modeling larger systems because of its computational complexity. In this work, a twin support vector regression based stochastic simulations algorithm (TS^3A) is proposed by combining twin support vector regression, a well-known robust regression method in machine learning, with the SSA. Numerical results indicate that the proposed algorithm can be applied to a wide range of chemically reacting systems and obtains significant improvements in efficiency and accuracy with fewer simulation runs than the existing methods.
Abstract: Path loss prediction models are vital for accurate signal propagation in wireless channels. Empirical and deterministic models used in path loss predictions have not produced optimal results. In this paper, we introduced machine learning algorithms to path loss predictions because they offer a flexible network architecture and can use extensive data. We introduced support vector regression (SVR) and radial basis function (RBF) models to path loss predictions in the investigated environments. The SVR model was able to process several input parameters without introducing complexity to the network architecture. The RBF model, for its part, provides a good function approximation. Hyperparameter tuning of the machine learning models was carried out in order to achieve optimal results. The performances of the SVR and RBF models were compared and the results validated using the root-mean-squared error (RMSE). The two machine learning algorithms were also compared with the Cost-231, SUI, Egli, Freespace, and Cost-231 W-I models. The analytical models overpredicted path loss. Overall, the machine learning models predicted path loss with greater accuracy than the empirical models. The SVR model performed best across all the indices, with RMSE values of 1.378 dB, 1.4523 dB, and 2.1568 dB in rural, suburban and urban settings respectively, and should therefore be adopted for signal propagation in the investigated environments and beyond.
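The RMSE used for validation in this abstract is the square root of the mean squared difference between measured and predicted values. A minimal sketch follows; the function name `rmse` and the sample values are hypothetical and are not data from the paper.

```python
import math

def rmse(measured, predicted):
    """Root-mean-squared error between two equally long sequences."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical path loss values in dB (illustrative only).
measured_db = [100.0, 110.0, 120.0]
predicted_db = [103.0, 106.0, 120.0]
error_db = rmse(measured_db, predicted_db)  # sqrt((9 + 16 + 0) / 3)
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones, and the result is in the same unit as the inputs (dB here).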
Funding: Supported by the Fundamental Research Funds for University of Science and Technology Beijing (FRF-BR-12-021)
Abstract: In order to handle the semi-supervised problem quickly and efficiently in the twin support vector machine (TWSVM) field, a semi-supervised twin support vector machine (S2TSVM) is proposed by adding the original unlabeled samples. In S2TSVM, the addition of unlabeled samples can easily cause the classification hyperplane to deviate from the sample points. A center-distance principle is therefore proposed to pre-classify unlabeled samples, and a pre-classified S2TSVM (PS2TSVM) is proposed. Compared with S2TSVM, PS2TSVM not only alleviates the problem of the samples deviating from the classification hyperplane, but also improves the training speed. PS2TSVM is then smoothed. After smoothing the model, the pre-classified smooth S2TSVM (PS3TSVM) is obtained, and its convergence is deduced. Finally, nine datasets are selected from the UCI machine learning database for comparison with other types of semi-supervised models. The experimental results show that the proposed PS3TSVM model has better classification results.
Funding: Supported by the National Natural Science Foundation of China (No. 50705030).
Abstract: The principle of the support vector regression machine (SVR) is first analysed. Then a new data-dependent kernel function is constructed from an information geometry perspective. The current waveforms change regularly in accordance with the horizontal offset when the rotational frequency of the high speed rotational arc sensor is in the range from 15 Hz to 30 Hz. The welding current data is pretreated by wavelet filtering, mean filtering and normalization. The SVR model is constructed by making use of these evolvement laws; the decision function is obtained by training the SVR, and the seam offset can then be identified. The experimental results show that the precision of the offset identification can be greatly improved by modifying the SVR and applying mean filtering from the longitudinal direction.
Abstract: The prediction of the magnitude (M) of reservoir induced earthquakes is an important task in earthquake engineering. In this article, we employ a Support Vector Machine (SVM) and Gaussian Process Regression (GPR) for prediction of reservoir induced earthquake M based on reservoir parameters. Comprehensive parameter (E) and maximum reservoir depth (H) are considered as inputs to the SVM and GPR. We give an equation for determination of reservoir induced earthquake M. The developed SVM and GPR have been compared with the Artificial Neural Network (ANN) method. The results show that the developed SVM and GPR are efficient tools for prediction of reservoir induced earthquake M.
Abstract: With the progress of deep learning research, convolutional neural networks have become the most important method in feature extraction. How effectively the extracted features are classified and recognized directly affects the performance of the entire network. Traditional processing methods include classification models such as fully connected network models and support vector machines. In order to solve the problem that the traditional convolutional neural network is prone to over-fitting when classifying small samples, a CNN-TWSVM hybrid model was proposed by using the twin support vector machine (TWSVM), which has higher computational efficiency, as the CNN classifier, and it was applied to the traffic sign recognition task. In order to improve the generalization ability of the model, the wavelet kernel function is introduced to deal with the nonlinear classification task. The method fine-tunes a network initialized from the ImageNet dataset to the specific domain and intercepts the inner layer of the network to extract the highly abstract features of the traffic sign image. Finally, the TWSVM based on the wavelet kernel function is used to identify the traffic signs, so as to effectively solve the over-fitting problem of traffic sign classification. On the GTSRB and BELGIUMTS datasets, the validity and generalization ability of the improved model are verified by comparison with different kernel functions and different SVM classifiers.
Abstract: Prostate cancer (PCa) symptoms are commonly confused with benign prostate hyperplasia (BPH), particularly in the early stages due to similarities between symptoms, and in some instances, underdiagnoses. Clinical methods have been utilized to diagnose PCa; however, at the full-blown stage, clinical methods usually present high risks of complicated side effects. Therefore, we proposed the use of support vector machine for early differential diagnosis of PCa (SVM-PCa-EDD). SVM was used to classify persons with and without PCa. We used the PCa dataset from the Kaggle Healthcare repository to develop and validate the SVM model for classification. The PCa dataset consisted of 250 features and one class of features. Attributes considered in this study were age, body mass index (BMI), race, family history, obesity, trouble urinating, urine stream force, blood in semen, bone pain, and erectile dysfunction. The SVM-PCa-EDD was used for preprocessing the PCa dataset, specifically dealing with class imbalance, and for dimensionality reduction. After eliminating class imbalance, the area under the receiver operating characteristic (ROC) curve (AUC) of the logistic regression (LR) model trained with the downsampled dataset was 58.4%, whereas the AUC-ROC of LR trained with the class-imbalanced dataset was 54.3%. The SVM-PCa-EDD achieved 90% accuracy, 80% sensitivity, and 80% specificity. The validation of SVM-PCa-EDD using random forest and LR showed that SVM-PCa-EDD performed better in early differential diagnosis of PCa. The proposed model can assist medical experts in early diagnosis of PCa, particularly in resource-constrained healthcare settings, and in making further recommendations for PCa testing and treatment.
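The abstract reports training on a downsampled dataset to remove class imbalance but does not say how the downsampling was done. One common approach, random undersampling of the larger classes, can be sketched as follows; the function name `downsample`, the fixed seed, and the toy data are assumptions for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict

def downsample(rows, labels, seed=0):
    """Randomly undersample every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    balanced = []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            balanced.append((row, label))
    rng.shuffle(balanced)  # avoid returning the samples grouped by class
    return balanced

# Toy imbalanced data (hypothetical): 2 positive and 8 negative samples.
rows = list(range(10))
labels = [1] * 2 + [0] * 8
balanced = downsample(rows, labels)
```

Undersampling discards data from the larger class, so it trades information for balance; oversampling or class weights are alternatives when the dataset is small.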
Abstract: Objective: Support Vector Machine (SVM) is a machine-learning method, based on the principle of structural risk minimization, which performs well when applied to data outside the training set. In this paper, SVM was applied to predict the 5-year survival status of patients with nasopharyngeal carcinoma (NPC) after treatment. We expect to find a new approach for prognosis studies in cancer so as to assist the right clinical decision for the individual patient. Methods: Two modelling methods were used in the study: an SVM network and a standard parametric logistic regression were used to model 5-year survival status. The two methods were compared on a prospective set of patients not used in model construction via receiver operating characteristic (ROC) curve analysis. Results: SVM1, trained with the 25 original input variables without screening, yielded a ROC area of 0.868, at a sensitivity to mortality of 79.2% and a specificity of 94.5%. Similarly, SVM2, trained with 9 input variables obtained from the 25 original variables by logistic regression screening, yielded a ROC area of 0.874, at a sensitivity to mortality of 79.2% and a specificity of 95.6%, while the logistic regression yielded a ROC area of 0.751, at a sensitivity to mortality of 66.7% and a specificity of 83.5%. Conclusion: SVM found a strong pattern in the database predictive of 5-year survival status. The logistic regression produced somewhat similar, but poorer, results. These results show that the SVM models have the potential to predict an individual patient's 5-year survival status after treatment, and to assist clinicians in making a good clinical decision.
Funding: Sponsored by the National Natural Science Foundation of China (60374003)
Abstract: The endpoint parameters are very important to the process of EAF steel-making, but their on-line measurement is difficult. Soft sensor technology is widely used for the prediction of endpoint parameters. Based on the analysis of the smelting process of EAF and the advantages of support vector machines, a soft sensor model for predicting the endpoint parameters was built using multiple support vector machines (MSVM). In this model, the input space was divided by subtractive clustering and a sub-model based on LS-SVM was built in each sub-space. To decrease the correlation among the sub-models and to improve the accuracy and robustness of the model, the sub-models were combined by Principal Components Regression. The accuracy of the soft sensor model is greatly improved. The simulation result demonstrates the practicability and efficiency of the MSVM model for the endpoint prediction of EAF.
Funding: Supported by the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (KIGAM) and the Project of Environmental Business Big Data Platform and Center Construction funded by the Ministry of Science and ICT.
Abstract: In this study, we developed multiple hybrid machine-learning models to address parameter optimization limitations and enhance the spatial prediction of landslide susceptibility models. We created a geographic information system database, and our analysis results were used to prepare a landslide inventory map containing 359 landslide events identified from Google Earth, aerial photographs, and other validated sources. A support vector regression (SVR) machine-learning model was used to divide the landslide inventory into training (70%) and testing (30%) datasets. The landslide susceptibility map was produced using 14 causative factors. We applied the established gray wolf optimization (GWO) algorithm, bat algorithm (BA), and cuckoo optimization algorithm (COA) to fine-tune the parameters of the SVR model to improve its predictive accuracy. The resultant hybrid models, SVR-GWO, SVR-BA, and SVR-COA, were validated in terms of the area under curve (AUC) and root mean square error (RMSE). The AUC values for the SVR-GWO (0.733), SVR-BA (0.724), and SVR-COA (0.738) models indicate their good prediction rates for landslide susceptibility modeling. SVR-COA had the greatest accuracy, with an RMSE of 0.21687, and SVR-BA had the least accuracy, with an RMSE of 0.23046. The three optimized hybrid models outperformed the SVR model (AUC = 0.704, RMSE = 0.26689), confirming the ability of metaheuristic algorithms to improve model performance.
Funding: Supported by the National Natural Science Foundation of China (20476007, 20676013)
Abstract: Support vector machine (SVM) has shown great potential in pattern recognition and regressive estimation. Due to the demands of industrial development, such as fermentation process modeling, improving the training performance on increasingly large sample sets is an important problem. However, solving a large optimization problem is computationally intensive and memory intensive. In this paper, a geometric interpretation of SVM regression (SVR) is derived, and μ-SVM is extended for both L1-norm and L2-norm penalty SVR. Further, the Gilbert algorithm, a well-known geometric algorithm, is modified to solve SVR problems. Theoretical analysis indicates that the presented geometric algorithms for SVR training have the same convergence and almost identical computational cost as their corresponding algorithms for SVM classification. Experimental results show that the geometric methods are more efficient than conventional methods using quadratic programming and require much less memory.
Funding: National High Technology Research and Development Program of China (Project 863 G2001AA413130)
Abstract: A new multiple models (MM) approach was proposed to model complex industrial processes by using Fuzzy Support Vector Machines (F-SVMs). When the proposed approach is applied to a pH neutralization titration experiment, the F-SVMs MM not only provides satisfactory approximation and generalization properties, but also achieves superior performance to the USOCPN multiple modeling method and the single modeling method based on standard SVMs.