Funding: Hebei Province Key Research and Development Project (No. 20313701D); Hebei Province Key Research and Development Project (No. 19210404D); Mobile Computing and Universal Equipment for the Beijing Key Laboratory Open Project; The National Social Science Fund of China (17AJL014); Beijing University of Posts and Telecommunications Construction of World-Class Disciplines and Characteristic Development Guidance Special Fund “Cultural Inheritance and Innovation” Project (No. 505019221); National Natural Science Foundation of China (No. U1536112); National Natural Science Foundation of China (No. 81673697); National Natural Science Foundation of China (61872046); The National Social Science Fund Key Project of China (No. 17AJL014); “Blue Fire Project” (Huizhou) University of Technology Joint Innovation Project (CXZJHZ201729); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201902218004); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201902024006); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201901197007); Industry-University Cooperation Collaborative Education Project of the Ministry of Education (No. 201901199005); The Ministry of Education Industry-University Cooperation Collaborative Education Project (No. 201901197001); Shijiazhuang Science and Technology Plan Project (236240267A); Hebei Province Key Research and Development Plan Project (20312701D).
Abstract: The distribution of data has a significant impact on classification results. Data imbalance occurs when one class is far less represented than another. This leads to more outliers and noise, so the speed and performance of classification can be greatly affected. To address these problems, this paper starts from the motivation and mathematical representation of classification and puts forward a new classification method based on the relationship between different classification formulations. Combining the vector characteristics of the actual problem with the choice of matrix characteristics, we first analyze ordinal regression and introduce slack variables to solve the constraint problem of lone points. Then, on the basis of the support vector machine, we introduce fuzzy factors to solve the problem of the gap between isolated points. We introduce cost control to solve the problem of sample skew. Finally, based on the bi-boundary support vector machine, a two-step weight-setting twin classifier is constructed. This helps to identify multitasks with feature-selected patterns without the need for additional optimizers, which addresses the difficulty that large-scale classification cannot deal effectively with a very low category distribution gap.
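The two weighting ideas named in the abstract above (cost control for sample skew, fuzzy factors for isolated points) can be illustrated with a minimal sketch. It is not the paper's formulation, which the abstract does not give. It assumes two widely used heuristics: an inverse-frequency class cost `n / (K * n_k)`, and a distance-based fuzzy membership `1 - d_i / (r + delta)` measured from each class centroid. The function names, the toy data, and the default `delta=1.0` are hypothetical.

```python
import math
from collections import Counter

def class_costs(labels):
    """Per-class penalty multiplier n / (K * n_k): rare classes cost more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def fuzzy_memberships(points, labels, delta=1.0):
    """Membership 1 - d_i / (r + delta), where d_i is the distance of a point
    to its own class centroid and r is the largest such distance in the class.
    delta > 0 keeps every membership strictly positive (assumed value)."""
    by_class = {}
    for p, c in zip(points, labels):
        by_class.setdefault(c, []).append(p)
    centroids = {c: tuple(sum(col) / len(ps) for col in zip(*ps))
                 for c, ps in by_class.items()}
    dists = [math.dist(p, centroids[c]) for p, c in zip(points, labels)]
    radius = {c: max(d for d, l in zip(dists, labels) if l == c)
              for c in by_class}
    return [1.0 - d / (radius[c] + delta) for d, c in zip(dists, labels)]

def sample_weights(points, labels):
    """Combine both factors into one per-sample penalty weight."""
    costs = class_costs(labels)
    members = fuzzy_memberships(points, labels)
    return [costs[c] * s for c, s in zip(labels, members)]

# Toy data (hypothetical): class "a" has six points, one of them far away;
# class "b" is the minority class with two points.
points = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1), (10, 10), (20, 20), (22, 22)]
labels = ["a"] * 6 + ["b"] * 2
weights = sample_weights(points, labels)
```

In this sketch the isolated point `(10, 10)` receives the smallest membership in class "a", so it would contribute least to a weighted hinge loss, while every minority-class sample carries a larger class cost.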
Funding: Supported by the China 973 Program (No. 2002CB312200), the National Natural Science Foundation of China (No. 60574019 and No. 60474045), the Key Technologies R&D Program of Zhejiang Province (No. 2005C21087) and the Academician Foundation of Zhejiang Province (No. 2005A1001-13).
Abstract: Key variable identification for classification is related to many troubleshooting problems in the process industries. Recursive feature elimination based on the support vector machine (SVM-RFE) was recently proposed for feature selection in cancer diagnosis. In this paper, SVM-RFE is applied to key variable selection in fault diagnosis, and an accelerated SVM-RFE procedure based on a heuristic criterion is proposed. Data from the Tennessee Eastman process (TEP) simulator are used to evaluate the effectiveness of key variable selection using accelerated SVM-RFE (A-SVM-RFE). A-SVM-RFE integrates computational rate and algorithm effectiveness into a consistent framework. It not only identifies the key variables correctly but also computes quickly. A-SVM-RFE performs better than contribution charts combined with principal component analysis (PCA) and two other SVM-RFE algorithms, and it is better suited to industrial application.
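For readers unfamiliar with the baseline, plain SVM-RFE can be sketched as follows: train a linear SVM, score each variable by its squared weight, drop the lowest-scoring variable, and repeat. This is only the basic procedure, not the accelerated A-SVM-RFE of the paper, whose heuristic criterion the abstract does not describe. The linear SVM here is a deliberately simple stand-in (full-batch subgradient descent on the regularized hinge loss, no bias term), and all names, hyperparameters, and the toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe(X, y, n_keep=1):
    """Return (surviving variables, elimination order), one removal per round."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w = train_linear_svm(Xa, y)
        scores = [wj * wj for wj in w]        # ranking criterion w_j^2
        worst = scores.index(min(scores))     # ties: first lowest score
        eliminated.append(active.pop(worst))
    return active, eliminated

# Toy data (hypothetical): variable 0 determines the class, variables 1 and 2
# are uncorrelated with it, variable 3 agrees with the class on 6 of 8 samples.
X = [[2, 1, 1, 1], [2, -1, 1, 1], [2, 1, -1, 1], [2, -1, -1, -1],
     [-2, 1, 1, -1], [-2, -1, 1, -1], [-2, 1, -1, -1], [-2, -1, -1, 1]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
survivors, order = svm_rfe(X, y, n_keep=1)
```

Each round retrains the classifier on the remaining variables, which is the cost an accelerated variant would aim to reduce, for example by removing several variables per round.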
Funding: Supported by the Special Funds for Major State Basic Research Program of China (973 Program, No. 2002CB312200), the National Natural Science Foundation of China (No. 60574019, No. 60474045), the Key Technologies R&D Program of Zhejiang Province (No. 2005C21087), and the Academician Foundation of Zhejiang Province (No. 2005A1001-13).
Abstract: This study describes a classification methodology based on support vector machines (SVMs), which offer superior classification performance for fault diagnosis in chemical process engineering. The method incorporates an efficient parameter tuning procedure (based on minimization of the radius/margin bound for the SVM's leave-one-out errors) into a multi-class classification strategy using a fuzzy decision factor, which is named the fuzzy support vector machine (FSVM). The datasets generated from the Tennessee Eastman process (TEP) simulator were used to evaluate the classification performance. To decrease the negative influence of auto-correlated and irrelevant variables, a key variable identification procedure using SVM-based recursive feature elimination, with time lags incorporated, is run before every classifier is trained, and the number of relatively important variables for each classifier is determined essentially by 10-fold cross-validation. Performance comparisons among several kinds of multi-class decision machines demonstrate the effectiveness of the proposed approach.
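Two of the data-handling steps mentioned in the abstract above, incorporating time lags and splitting data for 10-fold cross-validation, can be sketched in a few lines. This is an illustrative reading only: the abstract does not say how many lags are used or how folds are formed, so the helper names, the lag layout, and the contiguous fold scheme are assumptions.

```python
def add_time_lags(series, lags):
    """series: list of rows, one per time step.
    Returns rows [x_t, x_(t-1), ..., x_(t-lags)] concatenated, so each
    sample also carries the recent history of every variable."""
    out = []
    for t in range(lags, len(series)):
        row = []
        for k in range(lags + 1):
            row.extend(series[t - k])
        out.append(row)
    return out

def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous folds whose sizes
    differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# Hypothetical two-variable process record with four time steps.
series = [[1, 10], [2, 20], [3, 30], [4, 40]]
lagged = add_time_lags(series, lags=1)
folds = list(k_fold_indices(23, k=10))
```

With one lag, each row of `lagged` holds the current and the previous measurement of both variables, so a feature-elimination step can rank lagged copies of a variable alongside its current value.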
Funding: The National Natural Science Foundation of China (70471074); China Postdoctoral Science Foundation (2005038042); Department of Science and Technology of Guangdong Province (2004B36001051).
Abstract: To solve the problems SVMs face with large sample sizes and asymmetrically distributed samples, a support vector classification algorithm based on variable-parameter linear programming is proposed. The algorithm uses linear programming to solve the classification optimization problem, which decreases computation time and reduces complexity compared with the original model. The adjusted penalty parameter greatly reduces the classification error resulting from asymmetrically distributed samples. The detailed procedure of the proposed algorithm is given, and an experiment is conducted to verify whether the algorithm is suitable for asymmetrically distributed samples.
Abstract: Large-scale wind turbine generator systems have strongly nonlinear, multivariable characteristics with many uncertain factors and disturbances. Automatic control is crucial for the efficiency and reliability of wind turbines. Based on a simplified, appropriate model of variable-speed, variable-pitch wind turbines, the effective wind speed is estimated using an extended Kalman filter. The intelligent control scheme proposed in the paper includes two loops that operate in synchronism with each other. At below-rated wind speed, the inner loop adopts adaptive fuzzy control based on a variable universe for generator torque regulation to maximize wind energy capture. At above-rated wind speed, a controller based on a least squares support vector machine is proposed to adjust the pitch angle and maintain rated output power. The simulation shows the effectiveness of the intelligent control.
Funding: Supported by the National Program on Key Basic Research Project (No. 2013CB329502), the National Natural Science Foundation of China (No. 61202212), the Special Research Project of the Educational Department of Shaanxi Province of China (No. 15JK1038), and the Key Research Project of Baoji University of Arts and Sciences (No. ZK16047).
Abstract: In recent years, the multimedia annotation problem has attracted significant research attention in the multimedia and computer vision areas, especially automatic image annotation, whose purpose is to provide an efficient and effective search environment in which users can query their images more easily. In this paper, a semi-supervised learning based probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since labeled images are often hard to obtain or create in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. In addition, image features with different magnitudes lead to different annotation performance. To this end, a Gaussian normalization method is used to normalize the different features extracted from effective image regions segmented by the normalized cuts algorithm, so as to preserve the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model significantly improves on the performance of traditional PLSA for automatic image annotation.
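The normalization step in the abstract above addresses features whose magnitudes differ by orders of magnitude. A minimal sketch of one common reading of "Gaussian normalization" follows: shift each feature to zero mean and divide by a multiple of its standard deviation. The abstract does not specify the variant used in the paper, so the function name and the `scale` parameter are assumptions. `scale=1.0` gives the ordinary z-score, while `scale=3.0` gives the 3-sigma variant that places most values in [-1, 1].

```python
import statistics

def gaussian_normalize(features, scale=1.0):
    """features: list of rows (one per image region), columns are features.
    Each column is mapped to (x - mean) / (scale * std)."""
    cols = list(zip(*features))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / (scale * s) for v, m, s in zip(row, means, stds)]
            for row in features]

# Hypothetical region features with very different magnitudes.
rows = [[1.0, 100.0, 5.0],
        [2.0, 300.0, 5.0],
        [3.0, 500.0, 5.0]]
normalized = gaussian_normalize(rows)
```

After normalization the first two columns become identical even though their raw magnitudes differ by a factor of about one hundred, so neither dominates a later distance or likelihood computation.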
Abstract: We propose the threshold updating method for terminating variable selection and two variable selection methods. In the threshold updating method, we update the threshold value whenever an approximation error smaller than the current threshold value is obtained. The first variable selection method combines forward selection by block addition with backward selection by block deletion. In this method, starting from the empty set of input variables, we add several input variables at a time until the approximation error is below the threshold value. Then we search for deletable variables by block deletion. The second method combines the first method with variable selection by Linear Programming Support Vector Regressors (LPSVRs). By training an LPSVR with linear kernels, we evaluate the weights of the decision function and delete the input variables whose associated absolute weights are zero. Then we carry out block addition and block deletion. Through computer experiments on benchmark data sets, we show that the proposed methods perform variable selection faster than the method using block deletion alone, and that the threshold updating method yields a lower approximation error than the fixed threshold method. We also compare our method with an embedded method, which determines the optimal variables during training, and show that our method gives comparable or better variable selection performance.
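The first method described above can be sketched as a short procedure: block addition until the error falls below the threshold, a threshold update, then block deletion. The abstract does not state how variables are ranked for addition or how a failed block deletion is handled, so this sketch assumes that candidates are ranked by the error obtained when each is added alone, and that a failed deletion block is halved and retried. The `error` callback, the function name, and the toy error table are hypothetical.

```python
def select_variables(n_vars, error, threshold, block=2):
    """error(subset) returns the approximation error of a model trained on
    the given list of variable indices. Returns (selected, final threshold)."""
    selected = []
    # Forward selection by block addition, starting from the empty set.
    while error(selected) > threshold and len(selected) < n_vars:
        remaining = [v for v in range(n_vars) if v not in selected]
        remaining.sort(key=lambda v: error(selected + [v]))
        selected = selected + remaining[:block]
    # Threshold updating: adopt a smaller error as the new threshold.
    threshold = min(threshold, error(selected))
    # Backward selection by block deletion.
    while True:
        candidates = [v for v in selected
                      if error([u for u in selected if u != v]) <= threshold]
        deleted = False
        while candidates:
            trial = [u for u in selected if u not in candidates]
            e = error(trial)
            if e <= threshold:
                selected, threshold, deleted = trial, min(threshold, e), True
                break
            candidates = candidates[:len(candidates) // 2]  # halve and retry
        if not deleted:
            return selected, threshold

# Toy error model (hypothetical): each variable removes a fixed number of
# error units; variables 2 and 4 carry no information.
CONTRIBUTION = {0: 50, 1: 30, 2: 0, 3: 15, 4: 0}

def toy_error(subset):
    return 100 - sum(CONTRIBUTION[v] for v in subset)

chosen, final_threshold = select_variables(5, toy_error, threshold=10, block=2)
```

On the toy model, block addition ends with variables 0, 1, 3 and 2 at an error of 5, the threshold tightens from 10 to 5, and block deletion then removes the uninformative variable 2.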