Imbalanced data classification is the task of classifying datasets where there is a significant disparity in the number of samples between different classes.This task is prevalent in practical scenarios such as indust...Imbalanced data classification is the task of classifying datasets where there is a significant disparity in the number of samples between different classes.This task is prevalent in practical scenarios such as industrial fault diagnosis,network intrusion detection,cancer detection,etc.In imbalanced classification tasks,the focus is typically on achieving high recognition accuracy for the minority class.However,due to the challenges presented by imbalanced multi-class datasets,such as the scarcity of samples in minority classes and complex inter-class relationships with overlapping boundaries,existing methods often do not perform well in multi-class imbalanced data classification tasks,particularly in terms of recognizing minority classes with high accuracy.Therefore,this paper proposes a multi-class imbalanced data classification method called CSDSResNet,which is based on a cost-sensitive dualstream residual network.Firstly,to address the issue of limited samples in the minority class within imbalanced datasets,a dual-stream residual network backbone structure is designed to enhance the model’s feature extraction capability.Next,considering the complexities arising fromimbalanced inter-class sample quantities and imbalanced inter-class overlapping boundaries in multi-class imbalanced datasets,a unique cost-sensitive loss function is devised.This loss function places more emphasis on the minority class and the challenging classes with high interclass similarity,thereby improving the model’s classification ability.Finally,the effectiveness and generalization of the proposed method,CSDSResNet,are evaluated on two datasets:‘DryBeans’and‘Electric Motor Defects’.The experimental results demonstrate that CSDSResNet achieves the best performance on imbalanced datasets,with macro_F1-score values improving by 2.9%and 1.9%on the two datasets compared to current state-of-the-art classification methods,respectively.Furthermore,it achieves the highest precision in single-class recognition tasks for the minority class.展开更多
It is quite common that both categorical and continuous covariates appear in the data. But, most feature screening methods for ultrahigh-dimensional classification assume the covariates are continuous. And applicable ...It is quite common that both categorical and continuous covariates appear in the data. But, most feature screening methods for ultrahigh-dimensional classification assume the covariates are continuous. And applicable feature screening method is very limited;to handle this non-trivial situation, we propose a model-free feature screening for ultrahigh-dimensional multi-classification with both categorical and continuous covariates. The proposed feature screening method will be based on Gini impurity to evaluate the prediction power of covariates. Under certain regularity conditions, it is proved that the proposed screening procedure possesses the sure screening property and ranking consistency properties. We demonstrate the finite sample performance of the proposed procedure by simulation studies and illustrate using real data analysis.展开更多
It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous. This limit...It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous. This limits the applicability of existing methods in handling this complex scenario. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. Our proposed feature screening method utilizes the Maximal Information Coefficient to assess the predictive power of the variables. By satisfying certain regularity conditions, we have proven that our screening procedure possesses the sure screening property and ranking consistency properties. To validate the effectiveness of our approach, we conduct simulation studies and provide real data analysis examples to demonstrate its performance in finite samples. In summary, our proposed method offers a solution for effectively screening features in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.展开更多
Support vector machines (SVMs) are initially designed for binary classification. How to effectively extend them for multiclass classification is still an ongoing research topic. A multiclass classifier is constructe...Support vector machines (SVMs) are initially designed for binary classification. How to effectively extend them for multiclass classification is still an ongoing research topic. A multiclass classifier is constructed by combining SVM^light algorithm with directed acyclic graph SVM (DAGSVM) method, named DAGSVM^light A new method is proposed to select the working set which is identical to the working set selected by SVM^light approach. Experimental results indicate DAGSVM^light is competitive with DAGSMO. It is more suitable for practice use. It may be an especially useful tool for large-scale multiclass classification problems and lead to more widespread use of SVMs in the engineering community due to its good performance.展开更多
Feature extraction is the most critical step in classification of multispectral image.The classification accuracy is mainly influenced by the feature sets that are selected to classify the image.In the past,handcrafte...Feature extraction is the most critical step in classification of multispectral image.The classification accuracy is mainly influenced by the feature sets that are selected to classify the image.In the past,handcrafted feature sets are used which are not adaptive for different image domains.To overcome this,an evolu-tionary learning method is developed to automatically learn the spatial-spectral features for classification.A modified Firefly Algorithm(FA)which achieves maximum classification accuracy with reduced size of feature set is proposed to gain the interest of feature selection for this purpose.For extracting the most effi-cient features from the data set,we have used 3-D discrete wavelet transform which decompose the multispectral image in all three dimensions.For selecting spatial and spectral features we have studied three different approaches namely overlapping window(OW-3DFS),non-overlapping window(NW-3DFS)adaptive window cube(AW-3DFS)and Pixel based technique.Fivefold Multiclass Support Vector Machine(MSVM)is used for classification purpose.Experiments con-ducted on Madurai LISS IV multispectral image exploited that the adaptive win-dow approach is used to increase the classification accuracy.展开更多
Quantum computing is a promising new approach to tackle the complex real-world computational problems by harnessing the power of quantum mechanics principles.The inherent parallelism and exponential computational powe...Quantum computing is a promising new approach to tackle the complex real-world computational problems by harnessing the power of quantum mechanics principles.The inherent parallelism and exponential computational power of quantum systems hold the potential to outpace classical counterparts in solving complex optimization problems,which are pervasive in machine learning.Quantum Support Vector Machine(QSVM)is a quantum machine learning algorithm inspired by classical Support Vector Machine(SVM)that exploits quantum parallelism to efficiently classify data points in high-dimensional feature spaces.We provide a comprehensive overview of the underlying principles of QSVM,elucidating how different quantum feature maps and quantum kernels enable the manipulation of quantum states to perform classification tasks.Through a comparative analysis,we reveal the quantum advantage achieved by these algorithms in terms of speedup and solution quality.As a case study,we explored the potential of quantum paradigms in the context of a real-world problem:classifying pancreatic cancer biomarker data.The Support Vector Classifier(SVC)algorithm was employed for the classical approach while the QSVM algorithm was executed on a quantum simulator provided by the Qiskit quantum computing framework.The classical approach as well as the quantum-based techniques reported similar accuracy.This uniformity suggests that these methods effectively captured similar underlying patterns in the dataset.Remarkably,quantum implementations exhibited substantially reduced execution times demonstrating the potential of quantum approaches in enhancing classification efficiency.This affirms the growing significance of quantum computing as a transformative tool for augmenting machine learning paradigms and also underscores the potency of quantum execution for computational acceleration.展开更多
tmbalanced data is a common and serious problem in many biomedical classification tasks. It causes a bias on the training of classifiers and results in lower accuracy of minority classes prediction. This problem has a...tmbalanced data is a common and serious problem in many biomedical classification tasks. It causes a bias on the training of classifiers and results in lower accuracy of minority classes prediction. This problem has attracted a lot of research interests in the past decade. Unfortunately, most research efforts only concentrate on 2-class problems. In this paper, we study a new method of formulating a multiclass Support Vector Machine (SVM) problem for imbalanced biomedical data to improve the classification performance. The proposed method applies cost-sensitive approach and ramp loss function to the Crammer and Singer multiclass SVM formulation. Experimental results on multiple biomedical datasets show that the proposed solution can effectively cure the problem when the datasets are noisy and highly imbalanced.展开更多
Head pose estimation has been considered an important and challenging task in computer vision. In this paper we propose a novel method to estimate head pose based on a deep convolutional neural network (DCNN) for 2D...Head pose estimation has been considered an important and challenging task in computer vision. In this paper we propose a novel method to estimate head pose based on a deep convolutional neural network (DCNN) for 2D face images. We design an effective and simple method to roughly crop the face from the input image, maintaining the individual-relative facial features ratio. The method can be used in various poses. Then two convolutional neural networks are set up to train the head pose classifier and then compared with each other. The simpler one has six layers. It performs well on seven yaw poses but is somewhat unsatisfactory when mixed in two pitch poses. The other has eight layers and more pixels in input layers. It has better performance on more poses and more training samples. Before training the network, two reasonable strategies including shift and zoom are executed to prepare training samples. Finally, feature extraction filters are optimized together with the weight of the classification component through training, to minimize the classification error. Our method has been evaluated on the CAS-PEAL-R1, CMU PIE, and CUBIC FacePix databases. It has better performance than state-of-the-art methods for head pose estimation.展开更多
提出一种基于双支持向量机的偏二叉树多类分类算法,偏二叉树双支持向量机多类分类算法.该算法综合了二叉树支持向量机和双支持向量机的优势,实现了在不降低分类性能的前提下,大大缩短训练时间.理论分析和UCI(University of California I...提出一种基于双支持向量机的偏二叉树多类分类算法,偏二叉树双支持向量机多类分类算法.该算法综合了二叉树支持向量机和双支持向量机的优势,实现了在不降低分类性能的前提下,大大缩短训练时间.理论分析和UCI(University of California Irvine)机器学习数据库数据集上的实验结果共同证明,偏二叉树双支持向量机多类分类算法在训练时间上具有绝对的优势,尤其在处理稍大数据集的多类分类问题时,这一优势尤为突出;实验仿真结果还证明,在采用非线性核时,该算法取得了比基于经典支持向量机的一对其余多类分类算法及二叉树支持向量机更好的分类效果;同时该算法还解决了后两种算法可能存在的样本不平衡问题,以及基于经典支持向量机的一对其余多类分类算法可能存在的不可分区域问题.展开更多
基金supported by Beijing Municipal Science and Technology Project(No.Z221100007122003)。
文摘Imbalanced data classification is the task of classifying datasets where there is a significant disparity in the number of samples between different classes.This task is prevalent in practical scenarios such as industrial fault diagnosis,network intrusion detection,cancer detection,etc.In imbalanced classification tasks,the focus is typically on achieving high recognition accuracy for the minority class.However,due to the challenges presented by imbalanced multi-class datasets,such as the scarcity of samples in minority classes and complex inter-class relationships with overlapping boundaries,existing methods often do not perform well in multi-class imbalanced data classification tasks,particularly in terms of recognizing minority classes with high accuracy.Therefore,this paper proposes a multi-class imbalanced data classification method called CSDSResNet,which is based on a cost-sensitive dualstream residual network.Firstly,to address the issue of limited samples in the minority class within imbalanced datasets,a dual-stream residual network backbone structure is designed to enhance the model’s feature extraction capability.Next,considering the complexities arising fromimbalanced inter-class sample quantities and imbalanced inter-class overlapping boundaries in multi-class imbalanced datasets,a unique cost-sensitive loss function is devised.This loss function places more emphasis on the minority class and the challenging classes with high interclass similarity,thereby improving the model’s classification ability.Finally,the effectiveness and generalization of the proposed method,CSDSResNet,are evaluated on two datasets:‘DryBeans’and‘Electric Motor Defects’.The experimental results demonstrate that CSDSResNet achieves the best performance on imbalanced datasets,with macro_F1-score values improving by 2.9%and 1.9%on the two datasets compared to current state-of-the-art classification methods,respectively.Furthermore,it achieves the highest precision in single-class recognition tasks for the minority class.
文摘It is quite common that both categorical and continuous covariates appear in the data. But, most feature screening methods for ultrahigh-dimensional classification assume the covariates are continuous. And applicable feature screening method is very limited;to handle this non-trivial situation, we propose a model-free feature screening for ultrahigh-dimensional multi-classification with both categorical and continuous covariates. The proposed feature screening method will be based on Gini impurity to evaluate the prediction power of covariates. Under certain regularity conditions, it is proved that the proposed screening procedure possesses the sure screening property and ranking consistency properties. We demonstrate the finite sample performance of the proposed procedure by simulation studies and illustrate using real data analysis.
文摘It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous. This limits the applicability of existing methods in handling this complex scenario. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. Our proposed feature screening method utilizes the Maximal Information Coefficient to assess the predictive power of the variables. By satisfying certain regularity conditions, we have proven that our screening procedure possesses the sure screening property and ranking consistency properties. To validate the effectiveness of our approach, we conduct simulation studies and provide real data analysis examples to demonstrate its performance in finite samples. In summary, our proposed method offers a solution for effectively screening features in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
文摘Support vector machines (SVMs) are initially designed for binary classification. How to effectively extend them for multiclass classification is still an ongoing research topic. A multiclass classifier is constructed by combining SVM^light algorithm with directed acyclic graph SVM (DAGSVM) method, named DAGSVM^light A new method is proposed to select the working set which is identical to the working set selected by SVM^light approach. Experimental results indicate DAGSVM^light is competitive with DAGSMO. It is more suitable for practice use. It may be an especially useful tool for large-scale multiclass classification problems and lead to more widespread use of SVMs in the engineering community due to its good performance.
文摘Feature extraction is the most critical step in classification of multispectral image.The classification accuracy is mainly influenced by the feature sets that are selected to classify the image.In the past,handcrafted feature sets are used which are not adaptive for different image domains.To overcome this,an evolu-tionary learning method is developed to automatically learn the spatial-spectral features for classification.A modified Firefly Algorithm(FA)which achieves maximum classification accuracy with reduced size of feature set is proposed to gain the interest of feature selection for this purpose.For extracting the most effi-cient features from the data set,we have used 3-D discrete wavelet transform which decompose the multispectral image in all three dimensions.For selecting spatial and spectral features we have studied three different approaches namely overlapping window(OW-3DFS),non-overlapping window(NW-3DFS)adaptive window cube(AW-3DFS)and Pixel based technique.Fivefold Multiclass Support Vector Machine(MSVM)is used for classification purpose.Experiments con-ducted on Madurai LISS IV multispectral image exploited that the adaptive win-dow approach is used to increase the classification accuracy.
文摘Quantum computing is a promising new approach to tackle the complex real-world computational problems by harnessing the power of quantum mechanics principles.The inherent parallelism and exponential computational power of quantum systems hold the potential to outpace classical counterparts in solving complex optimization problems,which are pervasive in machine learning.Quantum Support Vector Machine(QSVM)is a quantum machine learning algorithm inspired by classical Support Vector Machine(SVM)that exploits quantum parallelism to efficiently classify data points in high-dimensional feature spaces.We provide a comprehensive overview of the underlying principles of QSVM,elucidating how different quantum feature maps and quantum kernels enable the manipulation of quantum states to perform classification tasks.Through a comparative analysis,we reveal the quantum advantage achieved by these algorithms in terms of speedup and solution quality.As a case study,we explored the potential of quantum paradigms in the context of a real-world problem:classifying pancreatic cancer biomarker data.The Support Vector Classifier(SVC)algorithm was employed for the classical approach while the QSVM algorithm was executed on a quantum simulator provided by the Qiskit quantum computing framework.The classical approach as well as the quantum-based techniques reported similar accuracy.This uniformity suggests that these methods effectively captured similar underlying patterns in the dataset.Remarkably,quantum implementations exhibited substantially reduced execution times demonstrating the potential of quantum approaches in enhancing classification efficiency.This affirms the growing significance of quantum computing as a transformative tool for augmenting machine learning paradigms and also underscores the potency of quantum execution for computational acceleration.
基金Supported by GSU Molecular Basis of Disease Graduate Fellow, 2011-2012
文摘tmbalanced data is a common and serious problem in many biomedical classification tasks. It causes a bias on the training of classifiers and results in lower accuracy of minority classes prediction. This problem has attracted a lot of research interests in the past decade. Unfortunately, most research efforts only concentrate on 2-class problems. In this paper, we study a new method of formulating a multiclass Support Vector Machine (SVM) problem for imbalanced biomedical data to improve the classification performance. The proposed method applies cost-sensitive approach and ramp loss function to the Crammer and Singer multiclass SVM formulation. Experimental results on multiple biomedical datasets show that the proposed solution can effectively cure the problem when the datasets are noisy and highly imbalanced.
基金Project supported by the National Key Scientific Instrument and Equipment Development Project of China(No.2013YQ49087903)the National Natural Science Foundation of China(No.61402307)the Educational Commission of Sichuan Province,China(No.15ZA0007)
文摘Head pose estimation has been considered an important and challenging task in computer vision. In this paper we propose a novel method to estimate head pose based on a deep convolutional neural network (DCNN) for 2D face images. We design an effective and simple method to roughly crop the face from the input image, maintaining the individual-relative facial features ratio. The method can be used in various poses. Then two convolutional neural networks are set up to train the head pose classifier and then compared with each other. The simpler one has six layers. It performs well on seven yaw poses but is somewhat unsatisfactory when mixed in two pitch poses. The other has eight layers and more pixels in input layers. It has better performance on more poses and more training samples. Before training the network, two reasonable strategies including shift and zoom are executed to prepare training samples. Finally, feature extraction filters are optimized together with the weight of the classification component through training, to minimize the classification error. Our method has been evaluated on the CAS-PEAL-R1, CMU PIE, and CUBIC FacePix databases. It has better performance than state-of-the-art methods for head pose estimation.
文摘提出一种基于双支持向量机的偏二叉树多类分类算法,偏二叉树双支持向量机多类分类算法.该算法综合了二叉树支持向量机和双支持向量机的优势,实现了在不降低分类性能的前提下,大大缩短训练时间.理论分析和UCI(University of California Irvine)机器学习数据库数据集上的实验结果共同证明,偏二叉树双支持向量机多类分类算法在训练时间上具有绝对的优势,尤其在处理稍大数据集的多类分类问题时,这一优势尤为突出;实验仿真结果还证明,在采用非线性核时,该算法取得了比基于经典支持向量机的一对其余多类分类算法及二叉树支持向量机更好的分类效果;同时该算法还解决了后两种算法可能存在的样本不平衡问题,以及基于经典支持向量机的一对其余多类分类算法可能存在的不可分区域问题.