Abstract: It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits their applicability to mixed data. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. The proposed method uses the Maximal Information Coefficient to assess the predictive power of the variables. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening and ranking consistency properties. We conduct simulation studies and real data analyses to demonstrate its finite-sample performance. In summary, the proposed method offers a way to screen features effectively in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
Abstract: It is common for data to contain both categorical and continuous covariates. However, most feature screening methods for ultrahigh-dimensional classification assume that the covariates are continuous, and screening methods applicable to mixed covariates are very limited. To handle this non-trivial situation, we propose a model-free feature screening procedure for ultrahigh-dimensional multi-classification with both categorical and continuous covariates. The proposed method uses Gini impurity to evaluate the predictive power of covariates. Under certain regularity conditions, the proposed screening procedure is proved to possess the sure screening and ranking consistency properties. We demonstrate its finite-sample performance through simulation studies and illustrate it with a real data analysis.
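The idea of ranking covariates by Gini impurity can be illustrated with a minimal sketch. It is not the authors' statistic: the impurity-reduction utility, the rank-based binning of continuous covariates, and all names (`gini`, `quantile_bins`, `gini_utility`, `screen`, `n_bins`) are assumptions made here for illustration only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def quantile_bins(values, n_bins=4):
    """Assign each continuous value a bin index from its rank.
    Assumption: rank-based binning stands in for whatever treatment
    of continuous covariates the paper actually uses."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins

def gini_utility(x, y, continuous=False, n_bins=4):
    """Reduction in Gini impurity of y after grouping by x.
    Larger values suggest a more predictive covariate."""
    keys = quantile_bins(x, n_bins) if continuous else x
    groups = {}
    for key, label in zip(keys, y):
        groups.setdefault(key, []).append(label)
    n = len(y)
    conditional = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(y) - conditional

def screen(features, y, continuous_flags, keep):
    """Rank covariates by utility; return the top `keep` indices and all scores."""
    scores = [gini_utility(x, y, c) for x, c in zip(features, continuous_flags)]
    ranked = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores

# Toy data (made up): three classes, one noise covariate, two informative ones.
y = [0, 0, 1, 1, 2, 2]
x_noise = ["a", "b", "a", "b", "a", "b"]
x_cat = ["a", "a", "b", "b", "c", "c"]
x_cont = [0.1, 0.2, 1.1, 1.2, 2.1, 2.2]
selected, scores = screen([x_noise, x_cat, x_cont], y, [False, False, True], keep=2)
```

On this toy data the noise covariate receives a utility of zero and the two informative covariates are retained. In an ultrahigh-dimensional setting the same marginal score would be computed for every covariate, with `keep` chosen far smaller than the number of covariates.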
Funding: Supported by GSU Molecular Basis of Disease Graduate Fellow, 2011-2012.
Abstract: Imbalanced data is a common and serious problem in many biomedical classification tasks. It biases the training of classifiers and lowers prediction accuracy on minority classes. This problem has attracted considerable research interest in the past decade. Unfortunately, most research efforts concentrate only on two-class problems. In this paper, we study a new method of formulating a multiclass Support Vector Machine (SVM) problem for imbalanced biomedical data to improve classification performance. The proposed method applies a cost-sensitive approach and the ramp loss function to the Crammer and Singer multiclass SVM formulation. Experimental results on multiple biomedical datasets show that the proposed solution is effective when the datasets are noisy and highly imbalanced.
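A small sketch may help show how the three ingredients named in this abstract fit together: the Crammer and Singer margin, the ramp loss (a hinge loss clipped at a constant), and per-class costs. The clipping parameter `s`, the cost values, and the function names are assumptions chosen for illustration; the paper's exact formulation and optimization procedure are not reproduced here.

```python
def cs_margin(scores, y):
    """Crammer-Singer margin: true-class score minus the best rival score."""
    rival = max(s for k, s in enumerate(scores) if k != y)
    return scores[y] - rival

def hinge(margin):
    """Standard hinge loss, unbounded for badly misclassified points."""
    return max(0.0, 1.0 - margin)

def ramp(margin, s=-1.0):
    """Ramp loss: hinge loss clipped at 1 - s (s < 1 is an assumed parameter),
    so a single noisy point can contribute at most 1 - s."""
    return min(1.0 - s, hinge(margin))

def weighted_ramp_risk(score_rows, labels, class_cost, s=-1.0):
    """Average cost-weighted ramp loss; class_cost[k] is a hypothetical
    misclassification cost, typically larger for minority classes."""
    total = 0.0
    for scores, y in zip(score_rows, labels):
        total += class_cost[y] * ramp(cs_margin(scores, y), s)
    return total / len(labels)

# Made-up decision scores for three samples and three classes.
rows = [[2.0, 0.5, -1.0], [0.0, 5.0, 0.0], [1.0, 0.5, 0.0]]
labels = [0, 1, 1]
costs = [1.0, 3.0, 3.0]
risk = weighted_ramp_risk(rows, labels, costs)
```

Because the loss is bounded, a grossly mislabeled sample cannot dominate the objective, which is the usual motivation for preferring the ramp loss to the hinge loss on noisy data.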
Funding: Project supported by the National Key Scientific Instrument and Equipment Development Project of China (No. 2013YQ49087903), the National Natural Science Foundation of China (No. 61402307), and the Educational Commission of Sichuan Province, China (No. 15ZA0007).
Abstract: Head pose estimation is considered an important and challenging task in computer vision. In this paper we propose a novel method to estimate head pose from 2D face images based on a deep convolutional neural network (DCNN). We design a simple and effective method to roughly crop the face from the input image while maintaining the individual-relative facial features ratio. The method can be used for various poses. Two convolutional neural networks are then set up to train the head pose classifier and are compared with each other. The simpler one has six layers. It performs well on seven yaw poses but is somewhat unsatisfactory when two pitch poses are mixed in. The other has eight layers and more pixels in its input layer. It performs better on more poses and with more training samples. Before training the network, two strategies, shift and zoom, are applied to prepare the training samples. Finally, the feature extraction filters are optimized together with the weights of the classification component through training to minimize the classification error. Our method has been evaluated on the CAS-PEAL-R1, CMU PIE, and CUBIC FacePix databases, where it performs better than state-of-the-art methods for head pose estimation.
Abstract: Quantum computing is a promising new approach to tackling complex real-world computational problems by harnessing the principles of quantum mechanics. The inherent parallelism and exponential computational power of quantum systems hold the potential to outpace classical counterparts in solving complex optimization problems, which are pervasive in machine learning. Quantum Support Vector Machine (QSVM) is a quantum machine learning algorithm inspired by the classical Support Vector Machine (SVM) that exploits quantum parallelism to efficiently classify data points in high-dimensional feature spaces. We provide a comprehensive overview of the underlying principles of QSVM, elucidating how different quantum feature maps and quantum kernels enable the manipulation of quantum states to perform classification tasks. Through a comparative analysis, we reveal the quantum advantage achieved by these algorithms in terms of speedup and solution quality. As a case study, we explored the potential of quantum paradigms in the context of a real-world problem: classifying pancreatic cancer biomarker data. The Support Vector Classifier (SVC) algorithm was employed for the classical approach, while the QSVM algorithm was executed on a quantum simulator provided by the Qiskit quantum computing framework. The classical approach and the quantum-based techniques reported similar accuracy. This uniformity suggests that these methods effectively captured similar underlying patterns in the dataset. Remarkably, quantum implementations exhibited substantially reduced execution times, demonstrating the potential of quantum approaches in enhancing classification efficiency. This affirms the growing significance of quantum computing as a transformative tool for augmenting machine learning paradigms and also underscores the potency of quantum execution for computational acceleration.
Abstract: Support vector machines (SVMs) were initially designed for binary classification. How to extend them effectively to multiclass classification is still an ongoing research topic. A multiclass classifier, named DAGSVM^light, is constructed by combining the SVM^light algorithm with the directed acyclic graph SVM (DAGSVM) method. A new method is proposed to select the working set, which is identical to the working set selected by the SVM^light approach. Experimental results indicate that DAGSVM^light is competitive with DAGSMO and is more suitable for practical use. Owing to its good performance, it may be an especially useful tool for large-scale multiclass classification problems and may lead to more widespread use of SVMs in the engineering community.
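The way a directed acyclic graph combines pairwise classifiers at prediction time can be sketched as a list-elimination procedure: each node compares the first and last remaining classes and discards the loser, so K classes need K - 1 pairwise evaluations. The sketch below shows only that evaluation step. The `pairwise_decide` callback is a hypothetical stand-in for a trained binary SVM, and the nearest-centroid rule used in the example is made up for illustration; the working-set selection described in the abstract is not covered.

```python
def dag_predict(x, classes, pairwise_decide):
    """Evaluate a decision DAG over pairwise classifiers.
    pairwise_decide(a, b, x) must return either a or b (hypothetical callback)."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        if pairwise_decide(a, b, x) == a:
            remaining.pop()     # class b is eliminated
        else:
            remaining.pop(0)    # class a is eliminated
    return remaining[0]

# Stand-in pairwise rule (an assumption, not an SVM): pick the nearer 1-D centroid.
CENTROIDS = {"A": 0.0, "B": 5.0, "C": 10.0}

def nearest_centroid(a, b, x):
    return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b

label_mid = dag_predict(4.2, ["A", "B", "C"], nearest_centroid)
label_high = dag_predict(9.0, ["A", "B", "C"], nearest_centroid)
```

With real models, `pairwise_decide` would look up the binary classifier trained on classes `a` and `b` and return the class it predicts for `x`.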
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A2C1010362) and the Soonchunhyang University Research Fund.
Abstract: While digital ocular fundus images are widely used in ophthalmology practice, their interpretation still rests with ophthalmologists, which is quite costly. We explored a robust deep learning system that detects three major ocular diseases: diabetic retinopathy (DR), glaucoma (GLC), and age-related macular degeneration (AMD). The proposed method is composed of two steps. First, an initial quality evaluation is added to the classification system to filter out poor-quality images and enhance its performance, a technique that has not been explored previously. Second, transfer learning is used with various convolutional neural network (CNN) models, which automatically learn a thousand features from the digital retinal image and diagnose eye diseases based on those features. The performance of many models is compared to find the model best suited to fundus classification. Among the different CNN models, DenseNet-201 outperforms the others, with an area under the receiver operating characteristic curve of 0.99. Furthermore, the corresponding specificities for healthy, DR, GLC, and AMD patients are found to be 89.52%, 96.69%, 89.58%, and 100%, respectively. These results demonstrate that the proposed method can reduce time consumption by automatically diagnosing multiple eye diseases using computer-aided assistance tools.
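Per-class specificity in a multiclass setting, as reported in this abstract, is commonly computed one-vs-rest from the confusion matrix as TN / (TN + FP). The sketch below illustrates that calculation under the assumption that the paper uses the same one-vs-rest convention; the function name and the confusion counts are made up and are not the paper's data.

```python
def per_class_specificity(confusion):
    """One-vs-rest specificity for each class.
    confusion[i][j] = number of samples with true class i predicted as class j."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    result = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # true c, predicted other
        fp = sum(confusion[r][c] for r in range(k)) - tp     # other, predicted c
        tn = total - tp - fn - fp
        result.append(tn / (tn + fp) if (tn + fp) else float("nan"))
    return result

# Made-up counts for three classes, for illustration only.
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
specificities = per_class_specificity(toy_confusion)
```

A class reaches 100% specificity exactly when no sample from another class is predicted as that class, as in the third row and column of the toy matrix.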
Funding: Supported by the National Key Research and Development Program of China (grant nos. 2021YFC3200902 and 2021YFC3200905) and the National Natural Science Foundation of China (grant no. U2243222).
Abstract: Inter-basin water diversion projects have led to accelerated colonization of aquatic organisms, including the freshwater golden mussel (Limnoperna fortunei), exacerbating global biofouling concerns. While the influence of environmental factors on the mussel's invasion and biofouling impact has been studied, quantitative correlations and underlying mechanisms remain unclear, particularly in large-scale inter-basin water diversion projects with diverse hydrodynamic and environmental conditions. Here, we examine the comprehensive impact of environmental variables on the establishment risk of the golden mussel in China's 1432-km-long Middle Route of the South-to-North Water Diversion Project. Logistic regression and multiclass classification models were used to investigate the environmental influence on the occurrence probability and reproductive density of the golden mussel. Total nitrogen, ammonia nitrogen, water temperature, pH, and velocity were identified as crucial environmental variables affecting the biofouling risk in the project. Logistic regression analysis revealed a negative correlation between the occurrence probability of all larval stages and levels of total nitrogen and ammonia nitrogen. The multiclass classification model showed that elevated levels of total nitrogen hindered mussel reproduction, while optimal water temperature enhanced their reproductive capacity. Appropriate velocity and pH levels were crucial in maintaining moderate larval density. This research presents a quantitative analytical framework for assessing establishment risks associated with invasive mussels, and the framework is expected to enhance invasion management and mitigate biofouling issues in water diversion projects worldwide.
Abstract: Imaging logging has become a popular means of well logging because it can visually represent the lithologic and structural characteristics of strata. The manual interpretation of imaging logging is affected by the limitations of the naked eye and experiential factors. As a result, manual interpretation accuracy is low. Therefore, it is highly useful to develop effective automatic imaging logging interpretation by machine learning. Resistivity imaging logging is the most widely used technology for imaging logging. In this paper, we propose an automatic extraction procedure for the geological features in resistivity imaging logging images. This procedure is based on machine learning and achieves good results in practical applications. Acknowledging that the existence of valueless data significantly affects the recognition effect, we propose three strategies for the identification of valueless data based on binary classification. We compare the effect of the three strategies both on an experimental dataset and in a production environment, and find that the merging method is the best performing of the three strategies. It effectively identifies the valueless data in the well logging images, thus significantly improving the automatic recognition effect of geological features in resistivity logging images.