Funding: Supported by the National Natural Science Key Foundation of China (69974021).
Abstract: A new incremental support vector machine (SVM) algorithm based on multiple kernel learning is proposed. Introducing multiple kernel learning into SVM incremental learning allows learning problems on large-scale data sets to be solved effectively. Furthermore, different penalties are applied to the training subset and to the acquired support vectors, which may help to improve the performance of the SVM. Simulation results indicate that the proposed algorithm not only solves the model selection problem in SVM incremental learning but also improves classification or prediction precision.
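As a rough illustration of the multiple-kernel idea mentioned in this abstract, the sketch below builds a combined kernel as a convex combination of base kernels. The choice of base kernels, their parameters, and the weights are all assumptions made for illustration; in multiple kernel learning the weights would be learned together with the SVM, and the abstract does not say which kernels or weighting scheme the authors use.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (<x, z> + coef0)^degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree

def combined_kernel(x, z, weights):
    """Convex combination of three base kernels.

    The base kernels and their parameters are hypothetical choices;
    weights are assumed non-negative and to sum to 1.
    """
    base = [rbf_kernel(x, z, 0.1), rbf_kernel(x, z, 1.0), poly_kernel(x, z)]
    return sum(w * k for w, k in zip(weights, base))

def gram_matrix(points, weights):
    """Gram matrix of the combined kernel over a list of points."""
    return [[combined_kernel(p, q, weights) for q in points] for p in points]

# Hypothetical weights, fixed by hand here rather than learned.
weights = [0.5, 0.3, 0.2]
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points, weights)
```

A Gram matrix built this way can be handed to any SVM solver that accepts a precomputed kernel. Because a non-negative combination of valid kernels is itself a valid kernel, the combined matrix stays positive semidefinite.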
Abstract: Blasting is a widely used rock excavation method in civil and mining construction works, and both the blasting operations and their induced side effects have been investigated extensively. Flyrock is regarded as one of the most important issues induced by blasting operations, because its accurate prediction is crucial for delineating the safety zone. To this end, this study developed a flyrock prediction model based on 234 sets of blasting data collected from the Sugun Copper Mine site. A stacked multiple kernel support vector machine (stacked MK-SVM) model was proposed for flyrock prediction. The proposed stacked structure can effectively improve model performance by addressing the importance level of different features. For comparison, six other machine learning models were developed: SVM, MK-SVM, Lagrangian Twin SVM (LTSVM), Artificial Neural Network (ANN), Random Forest (RF) and M5 Tree. A 5-fold cross-validation process was used for hyperparameter tuning. According to the evaluation results, the proposed stacked MK-SVM model achieved the best overall performance, with RMSE of 1.73 and 1.74, MAE of 0.58 and 1.08, and VAF of 98.95 and 99.25 in the training and testing phases, respectively.
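The three evaluation metrics reported in this abstract can be sketched as follows. The abstract does not define them, so the formulas below are the commonly used definitions and should be read as an assumption; in particular VAF (variance accounted for) is taken as (1 - var(y - ŷ) / var(y)) × 100. The sample values are made up for illustration and are not from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def vaf(y_true, y_pred):
    """Variance accounted for, in percent (assumed definition)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return (1.0 - variance(residuals) / variance(y_true)) * 100.0

# Made-up values, for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
scores = (rmse(y_true, y_pred), mae(y_true, y_pred), vaf(y_true, y_pred))
```

A perfect prediction gives RMSE and MAE of 0 and VAF of 100, which is why VAF values close to 100 indicate a good fit.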
Funding: Sponsored by the National Natural Science Foundation of China (51006052).
Abstract: Extreme learning machine (ELM) has attracted much attention in recent years due to its fast convergence and good performance. Merging ELM and the support vector machine is an important trend, which has yielded the ELM kernel. Unlike commonly used kernels such as the Gaussian kernel, ELM-kernel-based methods solve nonlinear problems by inducing an explicit mapping. In this paper, the ELM kernel is extended to least squares support vector regression (LSSVR), giving ELM-LSSVR. ELM-LSSVR reduces training and test time simultaneously without extra techniques such as sequential minimal optimization or pruning mechanisms. Moreover, the memory required for training and testing is reduced. Experiments are reported to confirm the efficacy and feasibility of ELM-LSSVR; they show that it has an advantage in training and test time while achieving accuracy comparable to other algorithms.
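A minimal sketch of the idea in this abstract, under stated assumptions: the ELM kernel is taken to be the inner product of an explicit random sigmoid feature map, and LSSVR is fitted by solving its standard dual linear system. The hidden-layer size, the regularization value `gamma`, the toy data and the small dense solver are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random

def elm_features(X, W, bias):
    """Explicit ELM mapping h(x) = sigmoid(W x + bias); W and bias are random, not trained."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bj)))
             for row, bj in zip(W, bias)] for x in X]

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_lssvr(K, y, gamma):
    """Solve the LSSVR system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [K[i][j] + (1.0 / gamma if i == j else 0.0) for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

# Toy 1-D regression data (hypothetical).
random.seed(0)
X = [[i / 10.0] for i in range(20)]
y = [math.sin(x[0]) for x in X]

L = 10  # number of hidden nodes, chosen arbitrarily
W = [[random.uniform(-1, 1)] for _ in range(L)]
bias = [random.uniform(-1, 1) for _ in range(L)]

H = elm_features(X, W, bias)
# ELM kernel as the inner product of the explicit features: K = H H^T.
K = [[sum(a * c for a, c in zip(hi, hj)) for hj in H] for hi in H]

gamma = 100.0
b, alpha = fit_lssvr(K, y, gamma)
train_pred = [sum(a * k for a, k in zip(alpha, row)) + b for row in K]
```

Because the feature map is explicit and has only `L` dimensions, the same model could in principle be solved in the feature space with an `L`-sized system instead of an `n`-sized one, which is one plausible reading of the time and memory savings the abstract describes.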
Abstract: Word sense disambiguation has been a trending research topic in natural language processing and machine learning. Mining core features and performing text classification remain challenging tasks. Here, features of the context, such as neighboring words like adjectives, provide the evidence for classification using a machine learning approach. This paper presents text document classification, which has wide applications in information retrieval, using movie review datasets. These applications include document indexing based on controlled vocabulary, adjectives, word sense disambiguation, generating hierarchical categorization of web pages, spam detection, topic labeling, web search, document summarization, etc. A kernel support vector machine learning algorithm classifies the text, and feature extraction is performed by cuckoo search optimization. Positive and negative reviews from the movie dataset are used to obtain better classification accuracy. The experimental results focus on context mining, feature analysis and classification. Compared with previous work, the proposed work is designed to achieve efficient results. The overall design is implemented with the MATLAB 2020a tool.
Abstract: A hybrid feature selection and classification strategy was proposed based on the simulated annealing genetic algorithm and multiple instance learning (MIL). The band selection method was derived from subspace decomposition and combines the simulated annealing algorithm with the genetic algorithm in choosing different crossover and mutation probabilities, as well as the individuals to mutate. MIL was then combined with image segmentation, clustering and support vector machine algorithms to classify hyperspectral images. The experimental results show that the proposed method achieves a high classification accuracy of 93.13% with small training samples and overcomes the weaknesses of conventional methods.
Abstract: This paper concerns the error analysis of multicategory support vector machine (MSVM) classifiers based on reproducing kernel Hilbert spaces. We choose the polynomial kernel as the Mercer kernel and give the error estimate with de la Vallée Poussin means. We also introduce the standard estimation of the sample error and derive the explicit learning rate.
Abstract: With the progress of deep learning research, convolutional neural networks (CNNs) have become the most important method for feature extraction. How effectively the extracted features are classified and recognized directly affects the performance of the entire network. Traditional approaches use classification models such as fully connected networks and support vector machines. To address the tendency of traditional CNNs to overfit when classifying small samples, a CNN-TWSVM hybrid model was proposed, which uses the twin support vector machine (TWSVM), with its higher computational efficiency, as the CNN classifier, and it was applied to the traffic sign recognition task. To improve the generalization ability of the model, a wavelet kernel function is introduced to handle the nonlinear classification task. The method takes a network initialized on the ImageNet dataset, fine-tunes it on the specific domain, and taps an inner layer of the network to extract highly abstract features of the traffic sign image. Finally, the TWSVM with the wavelet kernel function is used to identify the traffic signs, which effectively solves the overfitting problem in traffic sign classification. On the GTSRB and BELGIUMTS datasets, the validity and generalization ability of the improved model are verified by comparison with different kernel functions and different SVM classifiers.
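The abstract does not say which wavelet its kernel is built from. As a hedged illustration only, the sketch below uses one widely cited translation-invariant form based on a Morlet-type mother wavelet, K(x, z) = Π_i cos(1.75 u_i) exp(-u_i² / 2) with u_i = (x_i - z_i) / a. The dilation parameter `a` and the sample feature vectors are assumptions.

```python
import math

def wavelet_kernel(x, z, a=1.0):
    """Morlet-type wavelet kernel (assumed form; not confirmed by the abstract).

    a is the dilation (scale) parameter and must be positive.
    """
    value = 1.0
    for xi, zi in zip(x, z):
        u = (xi - zi) / a
        value *= math.cos(1.75 * u) * math.exp(-0.5 * u * u)
    return value

# Hypothetical feature vectors, e.g. activations taken from an inner CNN layer.
f1 = [0.2, -0.4, 1.1]
f2 = [0.1, 0.3, 0.9]
k_same = wavelet_kernel(f1, f1)
k_diff = wavelet_kernel(f1, f2, a=2.0)
```

Unlike the Gaussian kernel, this kernel oscillates with distance because of the cosine factor, so its value can be negative for some pairs of points.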
Funding: Supported by the National Natural Science Foundation of China (60736021) and the Joint Funds of NSFC-Guangdong Province (U0735003).
Abstract: Kernel-based methods work by embedding the data into a feature space and then searching for a linear hypothesis among the embedded data points. Performance depends mostly on which kernel is used. A promising approach is to learn the kernel from the data automatically. A general regularized risk functional (RRF) criterion for kernel matrix learning is proposed. Compared with the RRF criterion, the general RRF criterion takes into account the geometric distributions of the embedded data points. It is proven that the distance between different geometric distributions can be estimated by their centroid distance in the reproducing kernel Hilbert space. Using this criterion for kernel matrix learning leads to a convex quadratically constrained quadratic programming (QCQP) problem. Mathematical formulations are given for several commonly used loss functions. Experimental results on a collection of benchmark data sets demonstrate the effectiveness of the proposed method.
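The centroid distance in the reproducing kernel Hilbert space mentioned in this abstract can be computed from kernel evaluations alone, without ever forming the embedded points. The sketch below uses the standard identity ||μ_A - μ_B||² = mean K(A, A) - 2 mean K(A, B) + mean K(B, B). The function names and the toy data are illustrative assumptions, and the paper's full criterion is not reproduced here.

```python
def linear_kernel(x, z):
    """Plain inner product; with it the RKHS is the input space itself."""
    return sum(a * b for a, b in zip(x, z))

def mean_kernel(A, B, kernel):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    return sum(kernel(a, b) for a in A for b in B) / (len(A) * len(B))

def centroid_distance_sq(A, B, kernel):
    """Squared distance between the feature-space centroids of A and B."""
    return (mean_kernel(A, A, kernel)
            - 2.0 * mean_kernel(A, B, kernel)
            + mean_kernel(B, B, kernel))

# Toy classes (hypothetical). Their input-space means are (1, 0) and (4, 4).
class_a = [(0.0, 0.0), (2.0, 0.0)]
class_b = [(4.0, 3.0), (4.0, 5.0)]
dist_sq = centroid_distance_sq(class_a, class_b, linear_kernel)
```

With the linear kernel the result is simply the squared Euclidean distance between the class means, which gives an easy sanity check before swapping in a nonlinear kernel.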
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61966011.
Abstract: While malicious samples are widely found in many application fields of machine learning, suitable countermeasures have been investigated in the field of adversarial machine learning. Given the importance and popularity of support vector machines (SVMs), this paper first describes the evasion attack against SVM classification and then proposes a defense strategy. The evasion attack uses the classification surface of the SVM to iteratively find the minimal perturbations that mislead the nonlinear classifier. Specifically, we propose a vulnerability function to measure the vulnerability of SVM classifiers. Using this vulnerability function, we put forward an effective defense strategy against the evasion attack, based on kernel optimization of SVMs with a Gaussian kernel. Our defense method is verified to be very effective on benchmark datasets, and the SVM classifier becomes more robust after applying our kernel optimization scheme.
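To make the evasion attack described in this abstract concrete, here is a simplified sketch: plain gradient descent on the decision function of a Gaussian-kernel SVM, stopping as soon as the predicted sign flips. It is not the authors' algorithm. The support vectors, dual coefficients, step size and starting point are hypothetical, and no attempt is made to guarantee that the perturbation found is minimal.

```python
import math

def rbf(x, s, gamma):
    """Gaussian kernel exp(-gamma * ||x - s||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, s)))

def decision(x, svs, coefs, bias, gamma):
    """SVM decision value f(x) = sum_i coef_i * K(s_i, x) + bias."""
    return sum(c * rbf(x, s, gamma) for s, c in zip(svs, coefs)) + bias

def decision_grad(x, svs, coefs, gamma):
    """Gradient of f with respect to the input x."""
    grad = [0.0] * len(x)
    for s, c in zip(svs, coefs):
        k = rbf(x, s, gamma)
        for d in range(len(x)):
            grad[d] += c * k * (-2.0 * gamma) * (x[d] - s[d])
    return grad

def evade(x0, svs, coefs, bias, gamma, step=0.1, max_iter=200):
    """Move a positively classified point downhill on f until f(x) < 0."""
    x = list(x0)
    for _ in range(max_iter):
        if decision(x, svs, coefs, bias, gamma) < 0:
            break
        g = decision_grad(x, svs, coefs, gamma)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical two-support-vector classifier; coefs play the role of alpha_i * y_i.
svs = [(1.0, 0.0), (-1.0, 0.0)]
coefs = [1.0, -1.0]
bias, gamma = 0.0, 0.5

x0 = (0.8, 0.3)
x_adv = evade(x0, svs, coefs, bias, gamma)
perturbation = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_adv, x0)))
```

The kernel width `gamma` controls how steep the decision surface is around the data, which is presumably why a defense based on kernel optimization can change how far a point must move before it is misclassified.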
Funding: Supported by the National Natural Science Foundation of China (No. 52178062), the Alfred P. Sloan Foundation (No. G-2016-7050), and the Opening Fund of State Key Laboratory of Green Building in Western China (LSKF202311).
Abstract: Indoor air quality is becoming increasingly important, partly because the COVID-19 pandemic has increased the time people spend indoors. Research into the prediction of indoor volatile organic compounds (VOCs) has traditionally been confined to building materials and furniture. Relatively little research focuses on estimating human-related VOCs, which have been shown to contribute significantly to indoor air quality, especially in densely occupied environments. This study applies a machine learning approach to accurately estimate human-related VOC emissions in a university classroom. The time-resolved concentrations of two typical human-related (ozone-related) VOCs in the classroom, 6-methyl-5-hepten-2-one (6-MHO) and 4-oxopentanal (4-OPA), were analyzed over a five-day period. Comparing the 6-MHO concentrations predicted by five machine learning approaches, namely random forest regression (RFR), adaptive boosting (AdaBoost), gradient boosting regression tree (GBRT), extreme gradient boosting (XGBoost), and least squares support vector machine (LSSVM), we find that the LSSVM approach achieves the best performance when multi-feature parameters (number of occupants, ozone concentration, temperature, relative humidity) are used as the input. The LSSVM approach is then used to predict the 4-OPA concentration, with a mean absolute percentage error (MAPE) of less than 5%, indicating high accuracy. By combining the LSSVM with a kernel density estimation (KDE) method, we further establish an interval prediction model, which can provide uncertainty information and viable options for decision-makers. The machine learning approach in this study can easily incorporate the impact of various factors on VOC emission behaviors, making it especially suitable for concentration prediction and exposure assessment in realistic indoor settings.
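The abstract does not describe how the KDE-based interval is constructed. One plausible reading, sketched below as an assumption, is to fit a Gaussian kernel density to the point model's residuals and add the residual quantiles to each point prediction. The residual values, the bandwidth and the coverage level are hypothetical, and `mape` follows the usual definition of mean absolute percentage error.

```python
import math

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def kde_cdf(x, samples, bandwidth):
    """CDF of a Gaussian KDE: the average of normal CDFs centred on each sample."""
    scale = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - s) / scale)) for s in samples) / len(samples)

def kde_quantile(p, samples, bandwidth, tol=1e-9):
    """Invert the KDE CDF by bisection."""
    lo = min(samples) - 10.0 * bandwidth
    hi = max(samples) + 10.0 * bandwidth
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, bandwidth) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def prediction_interval(point_prediction, residuals, bandwidth, coverage=0.9):
    """Shift the point prediction by the lower and upper residual quantiles."""
    tail = (1.0 - coverage) / 2.0
    return (point_prediction + kde_quantile(tail, residuals, bandwidth),
            point_prediction + kde_quantile(1.0 - tail, residuals, bandwidth))

# Hypothetical validation residuals (observed minus predicted).
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
lower, upper = prediction_interval(10.0, residuals, bandwidth=0.5)
error_pct = mape([100.0, 200.0], [95.0, 210.0])
```

A wider interval signals that the point model's errors on held-out data were more spread out, which is the kind of uncertainty information the abstract refers to.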
Abstract: A crucial task in hyperspectral image (HSI) classification is exploring effective methodologies to fully use the 3-D and spectral data delivered by the data cube. For image classification, 3-D data is considered in the phases of pre-classification, sample selection, classifiers, post-classification, and accuracy estimation. A viewpoint on future research directions for advancing 3-D and spectral approaches is also offered. In recent years, sparse representation (SR) has been acknowledged as a powerful classification tool that effectively handles diverse problems and has been extensively exploited in several image processing tasks. Encouraged by those successful applications, SR has likewise been introduced to classify HSIs and has shown good performance. This paper offers an overview of the literature on HSI classification technology and its applications. The assessment is centered on a methodical review of SR-based and support vector machine (SVM)-based HSI classification works and compares numerous approaches to this problem. We form an outline that splits the corresponding methods into spectral feature networks and spectral–spatial feature networks in order to methodically analyze recent achievements in HSI classification. Furthermore, considering that the training samples available in the remote sensing field are generally quite limited, whereas training neural networks (NNs) requires a very large number of samples, we include certain approaches to improve classification performance, which can provide some strategies for future studies on this issue. Lastly, several representative neural-learning-based classification approaches are run on real HSIs in our experiments.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant 61271093 and Grant 61471146, and by the Program of Ministry of Education for New Century Excellent Talents under Grant NCET-12-0150.
Abstract: Distance metric learning plays an important role in many machine learning tasks. In this paper, we propose a method for learning a Mahalanobis distance metric. By formulating the metric learning problem with relative distance constraints, we suggest a Relative Distance Constrained Metric Learning (RDCML) model, which can be easily implemented and effectively solved by a modified support vector machine (SVM) approach. Experimental results on UCI datasets and handwritten digit datasets show that RDCML achieves better or comparable classification accuracy when compared with state-of-the-art metric learning methods.
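The two ingredients named in this abstract, a Mahalanobis distance and a relative distance constraint, can be sketched as follows. The abstract does not give the exact form of the RDCML constraints, so the triplet-style hinge below ("a similar pair should be closer than a dissimilar pair by a margin") is a generic stand-in. The matrices and points are hypothetical.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y); M is assumed positive semidefinite."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

def relative_constraint_loss(anchor, similar, dissimilar, M, margin=1.0):
    """Hinge loss for the constraint d(anchor, dissimilar) >= d(anchor, similar) + margin."""
    return max(0.0, margin
               + mahalanobis_sq(anchor, similar, M)
               - mahalanobis_sq(anchor, dissimilar, M))

anchor, similar, dissimilar = (0.0, 0.0), (1.0, 0.0), (0.0, 3.0)

identity = [[1.0, 0.0], [0.0, 1.0]]   # plain squared Euclidean distance
stretched = [[4.0, 0.0], [0.0, 0.1]]  # hypothetical metric that violates the constraint

loss_ok = relative_constraint_loss(anchor, similar, dissimilar, identity)
loss_bad = relative_constraint_loss(anchor, similar, dissimilar, stretched)
```

A metric learner would adjust `M` to drive such losses to zero over many triplets. Since each loss is a hinge that is linear in the entries of `M`, it is plausible that the problem maps onto an SVM-style solver, as the abstract reports.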