Abstract: Datasets commonly contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits their applicability to such mixed data. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. The proposed method uses the Maximal Information Coefficient to assess the predictive power of the variables. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening and ranking consistency properties. We conduct simulation studies and real data analyses to demonstrate its finite-sample performance. In summary, the proposed method offers a way to screen features effectively in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
Abstract: It is quite common for both categorical and continuous covariates to appear in the data. However, most feature screening methods for ultrahigh-dimensional classification assume that the covariates are continuous, so applicable feature screening methods are very limited. To handle this non-trivial situation, we propose a model-free feature screening procedure for ultrahigh-dimensional multi-classification with both categorical and continuous covariates. The proposed method uses Gini impurity to evaluate the predictive power of covariates. Under certain regularity conditions, it is proved that the proposed screening procedure possesses the sure screening and ranking consistency properties. We demonstrate the finite-sample performance of the proposed procedure through simulation studies and illustrate it with a real data analysis.
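As an illustration of how a Gini-impurity utility can rank mixed covariates, the following is a minimal sketch and not the authors' estimator. It assumes the utility is the impurity reduction Gini(Y) - sum_j P(X=j) Gini(Y | X=j), and that continuous covariates are first sliced into rank-based bins; the function names, the binning rule, and the toy data are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def quantile_bins(x, n_bins=4):
    """Assumed discretization: slice a continuous covariate into rank-based bins."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    bins = [0] * len(x)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(x)
    return bins


def gini_utility(x, y, continuous=False, n_bins=4):
    """Impurity reduction Gini(Y) - sum_j P(X=j) * Gini(Y | X=j)."""
    keys = quantile_bins(x, n_bins) if continuous else x
    groups = {}
    for key, label in zip(keys, y):
        groups.setdefault(key, []).append(label)
    n = len(y)
    within = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(y) - within


def screen(features, y, keep):
    """features maps name -> (values, is_continuous); returns the top `keep` names."""
    scores = {name: gini_utility(v, y, c) for name, (v, c) in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y = ["a", "a", "b", "b", "c", "c"]
    features = {
        "cat": (["u", "u", "v", "v", "w", "w"], False),
        "noise": (["u", "v", "u", "v", "u", "v"], False),
        "cont": ([0.1, 0.2, 1.1, 1.2, 2.1, 2.2], True),
    }
    print(screen(features, y, keep=2))
```

A covariate that separates the classes perfectly attains the full impurity of Y as its utility, while a covariate independent of Y scores zero, which is what makes the ranking usable for screening.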
Funding: Supported by GSU Molecular Basis of Disease Graduate Fellow, 2011-2012
Abstract: Imbalanced data is a common and serious problem in many biomedical classification tasks. It biases the training of classifiers and lowers prediction accuracy for minority classes. This problem has attracted considerable research interest in the past decade. Unfortunately, most research efforts concentrate only on 2-class problems. In this paper, we study a new method of formulating a multiclass Support Vector Machine (SVM) problem for imbalanced biomedical data to improve classification performance. The proposed method applies a cost-sensitive approach and a ramp loss function to the Crammer and Singer multiclass SVM formulation. Experimental results on multiple biomedical datasets show that the proposed solution can effectively address the problem when the datasets are noisy and highly imbalanced.
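The combination of a Crammer and Singer margin, a ramp (truncated hinge) loss, and class-dependent costs can be sketched as below. This is an assumption-laden illustration rather than the paper's formulation: the truncation level `cap`, the inverse-frequency cost rule, and all function names are hypothetical choices.

```python
def crammer_singer_margin(scores, y):
    """Margin f_y(x) - max_{k != y} f_k(x) for one sample with per-class scores."""
    rival = max(s for k, s in enumerate(scores) if k != y)
    return scores[y] - rival


def ramp_loss(margin, cap=2.0):
    """Hinge loss max(0, 1 - margin), truncated at `cap` so outliers add a bounded loss."""
    return min(cap, max(0.0, 1.0 - margin))


def class_costs(labels, n_classes):
    """Assumed weighting: cost n / (K * n_k), inversely proportional to class frequency."""
    n = len(labels)
    counts = [labels.count(k) for k in range(n_classes)]
    return [n / (n_classes * c) if c else 0.0 for c in counts]


def cost_sensitive_ramp_risk(score_rows, labels, n_classes, cap=2.0):
    """Average of cost[y_i] * ramp_loss(margin_i) over the sample."""
    costs = class_costs(labels, n_classes)
    total = sum(
        costs[y] * ramp_loss(crammer_singer_margin(s, y), cap)
        for s, y in zip(score_rows, labels)
    )
    return total / len(labels)


if __name__ == "__main__":
    # Three well-classified majority samples and one badly misclassified minority sample.
    rows = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    labels = [0, 0, 0, 1]
    print(cost_sensitive_ramp_risk(rows, labels, n_classes=2))
```

The truncation keeps a single noisy, badly misclassified point from dominating the objective, while the class costs keep minority-class errors from being drowned out by the majority class.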
Abstract: Quantum computing is a promising new approach to tackling complex real-world computational problems by harnessing the principles of quantum mechanics. The inherent parallelism and exponential computational power of quantum systems hold the potential to outpace classical counterparts in solving complex optimization problems, which are pervasive in machine learning. Quantum Support Vector Machine (QSVM) is a quantum machine learning algorithm inspired by the classical Support Vector Machine (SVM) that exploits quantum parallelism to efficiently classify data points in high-dimensional feature spaces. We provide a comprehensive overview of the underlying principles of QSVM, elucidating how different quantum feature maps and quantum kernels enable the manipulation of quantum states to perform classification tasks. Through a comparative analysis, we reveal the quantum advantage achieved by these algorithms in terms of speedup and solution quality. As a case study, we explored the potential of quantum paradigms in the context of a real-world problem: classifying pancreatic cancer biomarker data. The Support Vector Classifier (SVC) algorithm was employed for the classical approach, while the QSVM algorithm was executed on a quantum simulator provided by the Qiskit quantum computing framework. The classical approach and the quantum-based techniques reported similar accuracy. This uniformity suggests that these methods effectively captured similar underlying patterns in the dataset. Remarkably, the quantum implementations exhibited substantially reduced execution times, demonstrating the potential of quantum approaches in enhancing classification efficiency. This affirms the growing significance of quantum computing as a transformative tool for augmenting machine learning paradigms and also underscores the potency of quantum execution for computational acceleration.
Abstract: Support vector machines (SVMs) were originally designed for binary classification. How to extend them effectively to multiclass classification is still an ongoing research topic. A multiclass classifier, named DAGSVM^light, is constructed by combining the SVM^light algorithm with the directed acyclic graph SVM (DAGSVM) method. A new method is proposed to select the working set, which is identical to the working set selected by the SVM^light approach. Experimental results indicate that DAGSVM^light is competitive with DAGSMO and is more suitable for practical use. It may be an especially useful tool for large-scale multiclass classification problems and, owing to its good performance, may lead to more widespread use of SVMs in the engineering community.
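The DAGSVM decision step can be illustrated with a short sketch. It shows only the standard list-elimination way of evaluating a decision DAG over pairwise classifiers, not the SVM^light working-set selection; the `pairwise` callback and the prototype-based stand-in classifier are hypothetical and take the place of trained binary SVMs.

```python
def dag_predict(classes, pairwise, x):
    """Evaluate a decision DAG over `classes` for sample x.

    pairwise(i, j, x) must return i or j, the class preferred by the
    binary classifier trained on classes i and j. Each node compares the
    first and last remaining classes and eliminates the loser, so K classes
    need exactly K - 1 binary evaluations.
    """
    remaining = list(classes)
    evaluations = 0
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        winner = pairwise(first, last, x)
        evaluations += 1
        if winner == first:
            remaining.pop()    # the last class is eliminated
        else:
            remaining.pop(0)   # the first class is eliminated
    return remaining[0], evaluations


# Hypothetical stand-in for trained binary SVMs: prefer the class whose
# prototype value is closer to the scalar input x.
PROTOTYPES = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}


def nearest_prototype(i, j, x):
    return i if abs(x - PROTOTYPES[i]) <= abs(x - PROTOTYPES[j]) else j


if __name__ == "__main__":
    print(dag_predict(["a", "b", "c", "d"], nearest_prototype, 1.9))
```

Because only K - 1 of the K(K - 1)/2 pairwise classifiers are evaluated per sample, prediction cost grows linearly in the number of classes.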
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A2C1010362) and the Soonchunhyang University Research Fund.
Abstract: While digital ocular fundus images are widely used in ophthalmology practice, interpreting them is still in the hands of ophthalmologists, which is quite costly. We explored a robust deep learning system that detects three major ocular diseases: diabetic retinopathy (DR), glaucoma (GLC), and age-related macular degeneration (AMD). The proposed method is composed of two steps. First, an initial quality evaluation is added to the classification system to filter out poor-quality images and enhance its performance, a technique that has not been explored previously. Second, transfer learning is used with various convolutional neural network (CNN) models that automatically learn a thousand features from the digital retinal image and diagnose eye diseases based on those features. The performance of many models is compared to find the optimal model for fundus classification. Among the different CNN models, DenseNet-201 outperforms the others with an area under the receiver operating characteristic curve of 0.99. Furthermore, the corresponding specificities for healthy, DR, GLC, and AMD patients are found to be 89.52%, 96.69%, 89.58%, and 100%, respectively. These results demonstrate that the proposed method can reduce time consumption by automatically diagnosing multiple eye diseases using computer-aided assistance tools.
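Per-class specificity figures of this kind are usually computed one-vs-rest from a multiclass confusion matrix, as TN / (TN + FP). The sketch below shows that computation under the assumption that rows are true classes and columns are predicted classes; the function name and the example matrix are invented for illustration and are not the paper's data.

```python
def per_class_specificity(confusion):
    """One-vs-rest specificity TN / (TN + FP) for each class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    result = []
    for c in range(k):
        actual_c = sum(confusion[c])
        predicted_c = sum(confusion[i][c] for i in range(k))
        false_pos = predicted_c - confusion[c][c]
        negatives = total - actual_c          # samples whose true class is not c
        true_neg = negatives - false_pos
        result.append(true_neg / negatives if negatives else float("nan"))
    return result


if __name__ == "__main__":
    # Made-up 3-class confusion matrix, for illustration only.
    confusion = [
        [8, 2, 0],
        [1, 9, 0],
        [0, 0, 10],
    ]
    print(per_class_specificity(confusion))
```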
Abstract: Imaging logging has become a popular means of well logging because it can visually represent the lithologic and structural characteristics of strata. The manual interpretation of imaging logging is affected by the limitations of the naked eye and by experiential factors. As a result, manual interpretation accuracy is low. Therefore, it is highly useful to develop effective automatic imaging logging interpretation by machine learning. Resistivity imaging logging is the most widely used technology for imaging logging. In this paper, we propose an automatic extraction procedure for the geological features in resistivity imaging logging images. This procedure is based on machine learning and achieves good results in practical applications. Acknowledging that the existence of valueless data significantly affects the recognition effect, we propose three strategies for identifying valueless data based on binary classification. We compare the three strategies both on an experimental dataset and in a production environment, and find that the merging method performs best. It effectively identifies the valueless data in the well logging images, thus significantly improving the automatic recognition of geological features in resistivity logging images.