Funding: Supported by the China 973 Program (No. 2002CB312200), the National Natural Science Foundation of China (Nos. 60574019 and 60474045), the Key Technologies R&D Program of Zhejiang Province (No. 2005C21087), and the Academician Foundation of Zhejiang Province (No. 2005A1001-13).
Abstract: Key variable identification for classification is related to many troubleshooting problems in the process industries. Recursive feature elimination based on the support vector machine (SVM-RFE) has recently been proposed for feature selection in cancer diagnosis. In this paper, SVM-RFE is applied to key variable selection in fault diagnosis, and an accelerated SVM-RFE procedure based on a heuristic criterion is proposed. Data from the Tennessee Eastman process (TEP) simulator are used to evaluate the effectiveness of key variable selection using accelerated SVM-RFE (A-SVM-RFE). A-SVM-RFE integrates computational speed and algorithmic effectiveness into a consistent framework. It not only identifies the key variables correctly but also runs quickly. In comparison with contribution charts combined with principal component analysis (PCA) and two other SVM-RFE algorithms, A-SVM-RFE performs better and is more suitable for industrial application.
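The following is a minimal sketch of baseline SVM-RFE for ranking process variables, written with scikit-learn; the accelerated heuristic of A-SVM-RFE and the actual TEP data are not reproduced here, and the variable count and class labels below are placeholders.

```python
# Minimal sketch of standard SVM-RFE for key-variable ranking.
# The accelerated heuristic (A-SVM-RFE) described in the paper is not
# reproduced here; this only illustrates the baseline recursive elimination.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler

def rank_key_variables(X, y, n_keep=5):
    """Rank variables by recursively eliminating the ones with the
    smallest squared linear-SVM weights, keeping n_keep variables."""
    Xs = StandardScaler().fit_transform(X)
    svm = LinearSVC(C=1.0, dual=False, max_iter=5000)
    rfe = RFE(estimator=svm, n_features_to_select=n_keep, step=1)
    rfe.fit(Xs, y)
    # ranking_ == 1 marks the selected (key) variables
    return rfe.ranking_, rfe.support_

# Example with random stand-in data (the TEP data are not included here)
X = np.random.randn(200, 52)       # 52 process variables, as in TEP
y = np.random.randint(0, 2, 200)   # normal vs. one fault class
ranking, selected = rank_key_variables(X, y)
print("selected variable indices:", np.where(selected)[0])
```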
Funding: Supported by the National Natural Science Foundation of China (No. 60175020) and the National High-Tech Development '863' Program of China (No. 2002AA117010-09).
Abstract: This letter adopts a genetic algorithm (GA) to learn feature scalings that are most favorable to a support vector machine (SVM) classifier; the combined method is named GA-SVM. The relevance of each feature to the classification task, measured by a real-valued scaling coefficient, is estimated efficiently by the GA, which exploits a heavy-bias operator to promote sparsity in the feature scalings. This method has two potential benefits: feature selection is performed by eliminating irrelevant features whose scaling is zero, and an SVM classifier with enhanced generalization ability is learned simultaneously. Experimental comparisons between the original SVM and GA-SVM demonstrate both economical feature selection and excellent classification accuracy on a junk e-mail recognition problem and an Internet advertisement recognition problem. The results show that, compared with the original SVM classifier, GA-SVM significantly reduces the number of support vectors and achieves better classification results. They also demonstrate that a GA provides a simple, general, and powerful framework for tuning parameters in an optimization problem, which directly improves the recognition performance and recognition rate of the SVM.
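A rough sketch of the idea, under the assumption of a plain real-coded GA whose fitness is cross-validated SVM accuracy on the scaled features and whose mutation is biased toward zero scalings; the paper's exact heavy-bias operator, encoding, and datasets are not reproduced.

```python
# Sketch: evolve a real-valued feature-scaling vector for an SVM with a GA
# whose mutation pushes some scalings to zero (a stand-in for the heavy-bias
# operator). Zeroed features are effectively removed from the classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(scaling, X, y):
    # Fitness = cross-validated accuracy of an RBF SVM on scaled features
    return cross_val_score(SVC(kernel="rbf"), X * scaling, y, cv=3).mean()

def ga_feature_scaling(X, y, pop_size=20, n_gen=30, zero_bias=0.3):
    n = X.shape[1]
    pop = rng.uniform(0.0, 1.0, size=(pop_size, n))
    for _ in range(n_gen):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(n) < 0.5, a, b)      # uniform crossover
            child = np.clip(child + rng.normal(0.0, 0.1, n), 0.0, 1.0)  # mutation
            child[rng.random(n) < zero_bias] = 0.0            # bias toward sparsity
            children.append(child)
        pop = np.vstack([parents, children])
    return pop[np.argmax([fitness(ind, X, y) for ind in pop])]

X = rng.normal(size=(150, 20))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # only two informative features
scaling = ga_feature_scaling(X, y)
print("retained features:", np.where(scaling > 0)[0])
```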
Funding: Supported partially by the National Science Foundation of China (Nos. 10571109 and 60573018) and the Science & Technology Research Project of Shanxi Higher Education (No. 20051277).
Abstract: An optimization model for the support vector machine (SVM) is derived from the definitions of the dual norm and the distance between a point and its projection onto a given plane. From this optimization problem, an improved SVM based on the 1-norm (1-SVM) is obtained, but the resulting model is a discrete program. Using a smoothing technique and optimality conditions, the discrete program is converted into a continuous program. Experimental results show that the algorithm is easy to implement and that the method selects and suppresses features more efficiently. Illustrative examples show that the 1-SVM handles both linear and nonlinear classification well.
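To illustrate the feature-suppressing effect of a 1-norm penalty, the sketch below uses scikit-learn's l1-penalized LinearSVC as a stand-in; it is not the smoothed 1-SVM formulation developed in the paper.

```python
# Compare how many weights an l1-penalized linear SVM zeroes out
# relative to the usual l2-penalized formulation.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=30, n_informative=4,
                           n_redundant=0, random_state=0)

l1_svm = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1,
                   max_iter=10000).fit(X, y)
l2_svm = LinearSVC(penalty="l2", dual=False, C=0.1, max_iter=10000).fit(X, y)

print("nonzero weights, 1-norm SVM:", np.sum(l1_svm.coef_ != 0))
print("nonzero weights, 2-norm SVM:", np.sum(l2_svm.coef_ != 0))
```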
Abstract: Depth estimation of subsurface faults is one of the problems in gravity interpretation. We tried using the support vector classifier (SVC) method for this estimation. Using forward modeling and nonlinear inversion, the depth of subsurface faults can be determined with an associated error, but an initial guess for the depth is required, and this guess usually comes from non-gravity data. In this paper we introduce the SVC as a tool for estimating the depth of subsurface faults from gravity data. Each subsurface fault depth can be treated as a class, so that depth estimation becomes a classification problem for the SVC. To make better use of the SVC algorithm, we select suitable depth-estimation features using a feature selection (FS) algorithm. In this research, we produce a training set consisting of synthetic gravity profiles created by subsurface faults at different depths to train the SVC code to estimate the depth of real subsurface faults. We then test the trained SVC code on a test set consisting of other synthetic gravity profiles created by subsurface faults at different depths, and we also test it on real data.
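A sketch of the classification framing on synthetic data: each candidate depth is a class and an SVC is trained on gravity profiles generated by a forward model. The forward model here is a crude semi-infinite-slab (fault-step) approximation with made-up density contrast, thickness, and depths; it stands in for, and is not, the forward modeling used in the paper.

```python
# Treat each candidate fault depth as a class; train an SVC on noisy
# synthetic gravity profiles and classify a new profile by depth.
import numpy as np
from sklearn.svm import SVC

G = 6.674e-11           # gravitational constant
rho, t = 400.0, 100.0   # assumed density contrast (kg/m^3) and slab thickness (m)

def fault_profile(depth, x):
    """Approximate gravity anomaly (mGal) of a faulted semi-infinite slab."""
    g = 2.0 * G * rho * t * (np.pi / 2.0 + np.arctan(x / depth))
    return g * 1e5  # convert m/s^2 to mGal

x = np.linspace(-2000.0, 2000.0, 101)               # observation points (m)
depths = np.array([200.0, 500.0, 1000.0, 2000.0])   # candidate depth classes (m)

# Build a noisy synthetic training set: each row is one gravity profile
rng = np.random.default_rng(1)
X_train, y_train = [], []
for label, d in enumerate(depths):
    for _ in range(50):
        X_train.append(fault_profile(d, x) + rng.normal(0.0, 0.01, x.size))
        y_train.append(label)

clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(np.array(X_train), y_train)

# Classify a new profile of "unknown" depth
test = fault_profile(1000.0, x) + rng.normal(0.0, 0.01, x.size)
print("estimated depth (m):", depths[clf.predict(test[None, :])[0]])
```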
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61673384 and 61502497), the Guangxi Key Laboratory of Trusted Software (No. kx201530), the China Postdoctoral Science Foundation (No. 2015M581887), and the Scientific Research Innovation Project for Graduate Students of Jiangsu Province, China (No. KYLX15 1443).
Abstract: Software defect prediction aims to find potential defects based on historical data and software features. Software features can reflect the characteristics of software modules. However, some of these features may be more relevant to the class (defective or non-defective), while others may be redundant or irrelevant. To fully measure the correlation between different features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, the feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and all feature subsets are selected from the feature ranking list in sequence. Finally, all feature subsets are evaluated with a k-nearest neighbor (KNN) model and measured by the area under the curve (AUC) metric for classification performance. The experiments are conducted on 11 National Aeronautics and Space Administration (NASA) datasets, and the results show that our approach performs better than or comparably to the compared feature selection approaches in terms of classification performance.
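A rough sketch of this workflow: weight features by how well they separate samples of different classes, rank them, and evaluate nested subsets with KNN and AUC. The weight update below is a simple Relief-style approximation on synthetic data, not the paper's exact similarity measure or the NASA datasets.

```python
# Relief-style feature weighting, ranking, and nested-subset evaluation
# with a KNN classifier scored by cross-validated AUC.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

def relief_style_weights(X, y, n_rounds=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_rounds):
        i = rng.integers(n)
        same = X[(y == y[i]) & (np.arange(n) != i)]   # same-class samples
        other = X[y != y[i]]                          # other-class samples
        hit = same[np.argmin(np.linalg.norm(same - X[i], axis=1))]
        miss = other[np.argmin(np.linalg.norm(other - X[i], axis=1))]
        w += np.abs(X[i] - miss) - np.abs(X[i] - hit)  # reward class separation
    return w / n_rounds

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=0)
ranking = np.argsort(relief_style_weights(X, y))[::-1]  # descending weights

# Evaluate nested subsets taken from the ranking list
for k in range(1, len(ranking) + 1):
    auc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, ranking[:k]], y, cv=5, scoring="roc_auc").mean()
    print(f"top {k:2d} features: AUC = {auc:.3f}")
```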
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11401497 and 11301435), the Fundamental Research Funds for the Central Universities (Grant No. T2013221043), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, the Fundamental Research Funds for the Central Universities (Grant No. 20720140034), the National Institute on Drug Abuse, National Institutes of Health (Grant Nos. P50 DA036107 and P50 DA039838), and the National Science Foundation (Grant No. DMS1512422). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Drug Abuse, National Institutes of Health, National Science Foundation, or National Natural Science Foundation of China.
Abstract: High-dimensional data have frequently been collected in many scientific areas, including genome-wide association studies, biomedical imaging, tomography, tumor classification, and finance. Analysis of high-dimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data. Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While penalized variable selection methods have been successfully applied in many high-dimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension may grow exponentially with the sample size. Such data have been called ultrahigh-dimensional in the literature. This work aims to present a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening under specific models and on the motivation for model-free feature screening procedures.
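As an illustration of a marginal utility for screening, the sketch below ranks predictors by their absolute marginal correlation with the response (in the spirit of sure independence screening) and keeps the top d_n of them; the dimensions and sparse truth are made up.

```python
# Marginal (correlation-based) feature screening on a sparse linear model
# where the dimension far exceeds the sample size.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5000                           # sample size << dimension
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[10, 200, 3000]] = [2.0, -1.5, 1.0]   # sparse truth: 3 active predictors
y = X @ beta + rng.standard_normal(n)

# Marginal utility: |corr(X_j, y)| for each predictor j
Xc = (X - X.mean(0)) / X.std(0)
yc = (y - y.mean()) / y.std()
utility = np.abs(Xc.T @ yc) / n

d_n = int(n / np.log(n))                   # a common choice of screening size
screened = np.argsort(utility)[::-1][:d_n]
print("kept", d_n, "of", p, "predictors")
print("true active set recovered:", {10, 200, 3000} <= set(screened))
```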
Funding: Supported by the National Natural Science Foundation of China (Nos. 51173183, 51233004, 51390484, 51321062, and 51473029).
Abstract: A series of Schiff base aluminum(III) complexes bearing morpholinomethyl substituents were synthesized. Comprehensive investigations of their stereoselective and kinetic features in the ring-opening polymerization of lactide were carried out. The ring-opening polymerization proved to be first-order in both the catalyst and the monomer. Linear relationships between the number-average molecular weight of the polylactide and the monomer conversion were consistent with a well-controlled polymerization. The propagation rate was strongly affected by the morpholinomethyl substituents on the salicylaldehyde moiety.
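For reference, the kinetic statements above correspond to a standard first-order rate treatment; the symbols below ([LA] for lactide, [Al] for the aluminum complex, k_p for the propagation rate constant, M_LA for the lactide molar mass) are generic choices of ours, not taken from the paper:

$$ -\frac{d[\mathrm{LA}]}{dt} = k_p\,[\mathrm{Al}]\,[\mathrm{LA}] \;\Longrightarrow\; \ln\frac{[\mathrm{LA}]_0}{[\mathrm{LA}]_t} = k_{\mathrm{obs}}\,t, \qquad k_{\mathrm{obs}} = k_p\,[\mathrm{Al}], $$

and, for a well-controlled (living) polymerization,

$$ M_n \approx \frac{[\mathrm{LA}]_0}{[\mathrm{Al}]}\, M_{\mathrm{LA}} \times \mathrm{conversion}, $$

so the semilogarithmic monomer-consumption plot and the M_n-versus-conversion plot are both linear, consistent with the observations above.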