Funding: Supported by the National Natural Science Key Foundation of China (69974021).
Abstract: A new incremental support vector machine (SVM) algorithm based on multiple kernel learning is proposed. By introducing multiple kernel learning into SVM incremental learning, the large-scale data set learning problem can be solved effectively. Furthermore, different penalties are applied to the training subset and to the previously acquired support vectors, which helps to improve the performance of the SVM. Simulation results indicate that the proposed algorithm not only solves the model selection problem in SVM incremental learning, but also improves classification and prediction accuracy.
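A minimal sketch of the two ideas in this abstract, assuming scikit-learn: the multiple kernels are reduced to a fixed weighted sum of an RBF and a polynomial kernel, and the different penalties are imitated with per-sample weights (the mixing weights and the 2:1 penalty ratio are illustrative assumptions, not values from the paper).

```python
# Hedged sketch: one incremental step of an SVM with a two-kernel combination,
# penalising errors on previously acquired support vectors more heavily.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def combined_kernel(A, B, w_rbf=0.6, w_poly=0.4):
    """Weighted sum of an RBF and a polynomial kernel (illustrative weights)."""
    return w_rbf * rbf_kernel(A, B, gamma=0.5) + w_poly * polynomial_kernel(A, B, degree=2)

def incremental_step(old_sv_X, old_sv_y, new_X, new_y):
    """Retrain on the old support vectors plus the new chunk; sample weights
    scale the penalty C per sample, so old SVs are punished more for errors."""
    X = np.vstack([old_sv_X, new_X])
    y = np.concatenate([old_sv_y, new_y])
    weights = np.concatenate([np.full(len(old_sv_y), 2.0),   # heavier penalty on old SVs
                              np.full(len(new_y), 1.0)])     # standard penalty on new data
    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(combined_kernel(X, X), y, sample_weight=weights)
    return clf, X[clf.support_], y[clf.support_]
```

Prediction afterwards needs the kernel between test points and the full set passed to fit, e.g. clf.predict(combined_kernel(X_test, X)).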
Funding: Supported by the National Natural Science Foundation of China (Grant No. 50675219) and the Hunan Provincial Science Committee Excellent Youth Foundation of China (Grant No. 08JJ1008).
Abstract: Turbopump condition monitoring is a significant approach to ensuring the safety of the liquid rocket engine (LRE). Because of the lack of fault samples, a monitoring system cannot be trained on all possible condition patterns, so it is important to differentiate abnormal or unknown patterns from the normal pattern with novelty detection methods. The one-class support vector machine (OCSVM), which is commonly used for novelty detection, cannot deal well with large-scale samples. In order to model the normal pattern of the turbopump with an OCSVM and thus monitor its condition, a monitoring method that integrates the OCSVM with incremental clustering is presented. In this method, the incremental clustering is used for sample reduction by extracting representative vectors from a large training set. The representative vectors are expected to be distributed uniformly over the target region and to cover it, and training the OCSVM on these representative vectors yields a novelty detector. Applying this method to the turbopump's historical test data shows that the incremental clustering algorithm can extract 91 representative points from more than 36 000 training vectors, and that the OCSVM detector trained on these 91 representative points can recognize spikes in vibration signals caused by different abnormal events such as vane shedding, rub-impact and sensor faults. Unlike classical recognition methods, this monitoring method does not need fault samples during training. The method resolves the large-sample learning problem and is an alternative method for condition monitoring of the LRE turbopump.
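A minimal sketch of the sample-reduction idea, assuming scikit-learn and using MiniBatchKMeans as a stand-in for the paper's incremental clustering algorithm; the number of representatives, nu and gamma are illustrative values, not the paper's settings.

```python
# Hedged sketch: compress a large set of normal-condition vectors into a few
# representative vectors, then train a one-class SVM on the representatives.
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import OneClassSVM

def build_novelty_detector(normal_X, n_representatives=100, nu=0.05, gamma=0.1):
    # Extract representative vectors from the large normal-condition training set.
    km = MiniBatchKMeans(n_clusters=n_representatives, random_state=0).fit(normal_X)
    representatives = km.cluster_centers_
    # The one-class SVM trained only on the representatives models the normal region.
    return OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(representatives)

# detector = build_novelty_detector(training_vectors)
# detector.predict(new_vectors) returns -1 for samples flagged as abnormal.
```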
Abstract: The structure and function of proteins are closely related: protein structure determines function, so protein structure prediction is very important. β-turns are important components of protein secondary structure, so developing an accurate method for predicting β-turn types is necessary. In this paper, we used a composite vector built from a position conservation scoring function, increment of diversity and predicted secondary structure information as the input to a support vector machine for predicting β-turn types in a database of 426 protein chains. We obtained prediction accuracies of 95.6%, 97.8%, 97.0%, 98.9%, 99.2%, 91.8%, 99.4% and 83.9%, with Matthews correlation coefficient values of 0.74, 0.68, 0.20, 0.49, 0.23, 0.47, 0.49 and 0.53, for types I, II, VIII, I', II', IV, VI and non-turn, respectively, which is better than other prediction methods.
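A brief sketch of how per-type accuracy and Matthews correlation coefficient figures of this kind could be computed in a one-vs-rest fashion, assuming scikit-learn; the feature matrix X (composite vectors) and label array y are placeholders prepared elsewhere, and the single train/test split only stands in for the paper's evaluation protocol.

```python
# Hedged sketch: evaluate a multi-class beta-turn type SVM, reporting accuracy
# and the Matthews correlation coefficient per type in a one-vs-rest view.
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, matthews_corrcoef

def evaluate_per_type(X, y, turn_types):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    y_pred = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr).predict(X_te)
    for t in turn_types:
        true_bin = (y_te == t).astype(int)    # one-vs-rest view of type t
        pred_bin = (y_pred == t).astype(int)
        print(t, accuracy_score(true_bin, pred_bin), matthews_corrcoef(true_bin, pred_bin))
```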
Abstract: Building on research into predicting β-hairpin motifs in proteins, we apply the Random Forest and Support Vector Machine algorithms to predict β-hairpin motifs in the ArchDB40 dataset. Motifs with loop lengths of 2 to 8 amino acid residues are extracted as the research object, and a fixed-length pattern of 12 amino acids is selected. With the same characteristic parameters and the same test method, the Random Forest algorithm is more effective than the Support Vector Machine. In addition, because the Random Forest algorithm does not overfit when the dimension of the characteristic parameters is high, we use Random Forest with higher-dimensional characteristic parameters to predict β-hairpin motifs. Better prediction results are obtained: the overall accuracy and Matthews correlation coefficient under 5-fold cross-validation reach 83.3% and 0.59, respectively.
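A small sketch of the model comparison described above, assuming scikit-learn; X would hold the fixed-length 12-residue encodings and y the hairpin/non-hairpin labels, both prepared elsewhere, and the hyperparameters are illustrative.

```python
# Hedged sketch: 5-fold cross-validated comparison of Random Forest and SVM
# on fixed-length encodings of candidate beta-hairpin motifs.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def compare_models(X, y):
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    svm = SVC(kernel="rbf", C=1.0, gamma="scale")
    rf_acc = cross_val_score(rf, X, y, cv=5, scoring="accuracy").mean()
    svm_acc = cross_val_score(svm, X, y, cv=5, scoring="accuracy").mean()
    return rf_acc, svm_acc
```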
Funding: Supported by the National Natural Science Foundation of China (61074127).
Abstract: Because the solutions of the least squares support vector regression machine (LS-SVRM) are not sparse, prediction is slow and applications are limited. The defects of the existing adaptive pruning algorithm for LS-SVRM are that training is slow and the generalization performance is unsatisfactory, especially for large-scale problems. Hence an improved algorithm is proposed. To accelerate training, the pruned data point and a fast leave-one-out error are employed to validate the temporary model obtained after decremental learning. A novel objective function in the termination condition, which involves the constraints generated by all training data points, and three pruning strategies are employed to improve the generalization performance. The effectiveness of the proposed algorithm is tested on six benchmark datasets. The resulting sparse LS-SVRM model has a faster training speed and better generalization performance.
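To make the sparseness problem concrete, the sketch below shows a plain LS-SVR trained by solving the usual linear system, together with a naive pruning loop that repeatedly drops the point with the smallest |alpha|. This is only the baseline picture under stated assumptions, not the improved algorithm above (no fast leave-one-out validation, decremental update, or the three pruning strategies).

```python
# Hedged sketch: baseline LS-SVR (every training point gets a nonzero alpha)
# plus a naive pruning loop; illustrates why pruning is needed for sparseness.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def train_ls_svr(X, y, gamma_reg=10.0, kernel_gamma=0.5):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    K = rbf_kernel(X, X, gamma=kernel_gamma)
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma_reg
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]          # alpha, b

def naive_prune(X, y, keep_ratio=0.5):
    """Repeatedly retrain and drop the point with the least influential alpha."""
    idx = np.arange(len(y))
    while len(idx) > keep_ratio * len(y):
        alpha, _ = train_ls_svr(X[idx], y[idx])
        idx = np.delete(idx, np.argmin(np.abs(alpha)))
    return idx                       # indices retained by the sparse model
```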
Funding: Supported by the National Natural Science Foundation of China (Nos. U1509207 and 61325019).
Abstract: According to the classic Karush-Kuhn-Tucker (KKT) theorem, at every step of incremental support vector machine (SVM) learning, a newly added sample that violates the KKT conditions becomes a new support vector (SV) and may cause old samples to migrate between the SV set and the non-support-vector (NSV) set; at the same time, the learning model should be updated based on the SVs. However, it is not clear at that moment which of the old samples will move between the SV and NSV sets. Additionally, the learning model may be updated unnecessarily, which does not greatly increase its accuracy but does decrease the training speed. Therefore, how to choose new SVs from the old sets during the incremental stages, and when to perform the incremental steps, greatly influences the accuracy and efficiency of incremental SVM learning. In this work, a new algorithm is proposed that selects candidate SVs and uses wrongly predicted samples to trigger the incremental processing. Experimental results show that the proposed algorithm achieves good performance with high efficiency, high speed and good accuracy.
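A rough sketch of the triggering idea, assuming scikit-learn and binary labels in {-1, +1}: incoming samples are screened against the current model's KKT conditions (y·f(x) < 1 marks a candidate SV), and a full retrain is triggered only when some sample is actually mispredicted (y·f(x) < 0). The paper's exact candidate-selection rule is not reproduced.

```python
# Hedged sketch: KKT-based candidate selection with misprediction-triggered updates.
import numpy as np
from sklearn.svm import SVC

def incremental_update(clf, sv_X, sv_y, new_X, new_y):
    margins = new_y * clf.decision_function(new_X)   # y * f(x), labels in {-1, +1}
    violators = margins < 1.0                         # KKT violators: candidate SVs
    if not (margins < 0.0).any():                     # nothing mispredicted:
        return clf, sv_X, sv_y                        # skip the costly model update
    X = np.vstack([sv_X, new_X[violators]])
    y = np.concatenate([sv_y, new_y[violators]])
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
    return clf, X[clf.support_], y[clf.support_]      # carry only the new SV set forward
```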
Abstract: This paper analyzes the theory of incremental learning of the SVM (support vector machine) and points out a shortcoming of current research on SVM incremental learning: only the optimization of support vectors is considered. According to the significance of keywords in training, a new incremental training method with keyword adjusting is proposed, which eliminates the difference between incremental learning and batch learning through the keyword adjusting. The experimental results show that the improved method outperforms the method without keyword adjusting and achieves the same precision as the batch method. Key words: SVM (support vector machine); incremental training; classification; keyword adjusting. CLC number: TP 18. Foundation item: Supported by the National Information Industry Development Foundation of China. Biography: SUN Jin-wen (1972-), male, Post-Doctoral; research directions: artificial intelligence, data mining and system integration.
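One plausible reading of the keyword adjusting step, sketched with scikit-learn under stated assumptions: at each incremental round the keyword statistics (here simply the TF-IDF vocabulary and IDF weights) are re-estimated on the retained support-vector documents plus the new batch before retraining, so the incremental model sees the same keyword weighting a batch learner would. This is not the paper's exact weighting scheme.

```python
# Hedged sketch: one incremental round of text classification in which the
# keyword weights are recomputed on retained SV documents plus the new batch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

def incremental_round(sv_docs, sv_labels, new_docs, new_labels):
    docs = list(sv_docs) + list(new_docs)
    labels = list(sv_labels) + list(new_labels)
    vectorizer = TfidfVectorizer(max_features=20000)     # keyword weights re-estimated
    X = vectorizer.fit_transform(docs)
    clf = SVC(kernel="linear", C=1.0).fit(X, labels)
    kept_docs = [docs[i] for i in clf.support_]           # documents carried forward
    kept_labels = [labels[i] for i in clf.support_]
    return vectorizer, clf, kept_docs, kept_labels
```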
Funding: Supported by the National Natural Science Foundation of China (30660044).
Abstract: Based on the concept of the pseudo amino acid composition (PseAAC), protein structural classes are predicted using an approach that combines increment of diversity with a support vector machine (ID-SVM), in which the dipeptide composition of proteins is used as the source of diversity. A jackknife test shows that the total prediction accuracy is 96.6%, higher than that of other approaches. In addition, the specificity (Sp) and the Matthews correlation coefficient (MCC) are calculated for each protein structural class: the Sp is more than 88% and the MCC is higher than 92%, and the high MCC and Sp imply that the ID-SVM model is credible for predicting protein structural classes. The results indicate that (1) the choice of the source of diversity is reasonable, (2) the predictive performance of ID-SVM is excellent, and (3) the amino acid sequences of proteins contain information about protein structural classes.
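A small sketch of the feature side of this approach, assuming scikit-learn: the 400-dimensional dipeptide composition named above as the source of diversity is computed and fed to an SVM. The increment-of-diversity term itself is not reproduced, so this is only the SVM half of ID-SVM.

```python
# Hedged sketch: dipeptide composition features for a protein structural class SVM.
import numpy as np
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}

def dipeptide_composition(seq):
    """400-dimensional normalised dipeptide frequency vector of a sequence."""
    v = np.zeros(400)
    for a, b in zip(seq[:-1], seq[1:]):
        if a in INDEX and b in INDEX:
            v[INDEX[a] * 20 + INDEX[b]] += 1.0
    return v / max(len(seq) - 1, 1)

def train_structural_class_svm(sequences, classes):
    X = np.vstack([dipeptide_composition(s) for s in sequences])
    return SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, classes)
```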
Funding: Supported by the National Natural Science Foundation of China (60421002) and, as a priority, financially supported by the "New Century 151 Talent Project" of Zhejiang Province.
Abstract: To overcome the problem that soft sensor models cannot be updated as the process changes, a soft sensor modeling algorithm based on a hybrid of the fuzzy c-means (FCM) algorithm and incremental support vector machines (ISVM) is proposed. This hybrid algorithm, FCMISVM, includes three parts: sample clustering based on the FCM algorithm, a learning algorithm based on the ISVM, and a heuristic sample displacement method. In the training process, the training samples are first clustered by the FCM algorithm, and a sub-model is then built for each cluster by training it with the SVM algorithm. In the prediction process, when an incremental sample that represents new operation information is introduced to the model, the fuzzy membership of the sample in each cluster is first computed by the FCM algorithm. The SVM sub-model of the cluster with the largest fuzzy membership is then used to predict and to perform incremental learning, so that the model can be updated on-line. An old sample chosen by the heuristic sample displacement method is then discarded from the sub-model to control the size of the working set. The proposed method is applied to predict the p-xylene (PX) purity in the adsorption separation process. Simulation results indicate that the proposed method increases the model's adaptability to various operating conditions and improves its generalization capability.
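A minimal sketch of the prediction path described above, assuming scikit-learn and numpy: the new sample's fuzzy memberships are computed against fixed cluster centres with the standard FCM formula, the SVR sub-model of the winning cluster predicts, and that sub-model is then retrained with the sample added. The heuristic sample-displacement step and the full FCM training are omitted.

```python
# Hedged sketch: route a sample to the sub-model with the largest fuzzy
# membership, predict, then incrementally refresh that sub-model.
import numpy as np
from sklearn.svm import SVR

def fuzzy_memberships(x, centers, m=2.0):
    """Standard FCM membership of x in each cluster: u_i = 1 / sum_j (d_i/d_j)^(2/(m-1))."""
    d = np.linalg.norm(centers - x, axis=1) + 1e-12
    return 1.0 / ((d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))).sum(axis=1)

def predict_and_update(x, y_true, centers, sub_models, sub_data):
    k = int(np.argmax(fuzzy_memberships(x, centers)))     # cluster with largest membership
    y_pred = sub_models[k].predict(x.reshape(1, -1))[0]
    X_k, y_k = sub_data[k]                                 # add the new sample and retrain
    X_k, y_k = np.vstack([X_k, x]), np.append(y_k, y_true)
    sub_models[k] = SVR(kernel="rbf", C=10.0).fit(X_k, y_k)
    sub_data[k] = (X_k, y_k)
    return y_pred
```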