Funding: Supported by the Project of the Science and Technology Plan of Chongqing City.
Abstract: Data mining aims to extract useful information from the huge volumes of data held in very large databases. Such databases, however, generally contain redundant and irrelevant attributes, which degrade performance and increase computational complexity. Feature Subset Selection (FSS) has therefore become an important issue in data mining. In this letter, an FSS model based on the filter approach is built using a simulated annealing genetic algorithm. Experimental results show that the algorithm achieves adequate convergence and stability.
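The letter does not specify its filter criterion or its genetic operators. As a minimal sketch of the general idea, the Python below assumes a CFS-style filter score: the mean absolute feature-class correlation minus the mean absolute feature-feature correlation. It pairs that score with a genetic algorithm in which each offspring replaces its parent under a simulated annealing (Metropolis) acceptance rule with geometric cooling. The names `merit` and `sa_ga_select`, and all parameter defaults, are hypothetical.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def merit(mask, cols, labels):
    """Assumed filter criterion: mean |corr(feature, class)| minus mean
    pairwise |corr| among selected features. No classifier is trained."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return -1.0  # penalize the empty subset
    relevance = mean(abs(pearson(cols[i], labels)) for i in idx)
    pairs = [(i, j) for k, i in enumerate(idx) for j in idx[k + 1:]]
    if not pairs:
        return relevance
    redundancy = mean(abs(pearson(cols[i], cols[j])) for i, j in pairs)
    return relevance - redundancy


def sa_ga_select(cols, labels, pop_size=10, generations=40,
                 t0=1.0, alpha=0.9, seed=0):
    """Hypothetical SA-GA: offspring replace parents under a Metropolis rule."""
    rng = random.Random(seed)
    n = len(cols)

    def score(m):
        return merit(m, cols, labels)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    temp = t0
    for _ in range(generations):
        next_pop = []
        for parent in pop:
            mate = rng.choice(pop)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = parent[:cut] + mate[cut:]
            d = rng.randrange(n)                        # single bit-flip mutation
            child[d] = 1 - child[d]
            delta = score(child) - score(parent)
            # Always accept improvements; accept a worse child with
            # probability exp(delta / temp), which shrinks as temp cools.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                next_pop.append(child)
            else:
                next_pop.append(parent)
        pop = next_pop
        best = max(pop + [best], key=score)
        temp *= alpha                                   # geometric cooling
    return best, score(best)
```

The score uses only correlations, so no classifier is trained during the search. This is what distinguishes a filter approach from a wrapper approach.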
Funding: This work was supported by the National Basic Research Program of China (No. 2001CB309403).
Abstract: To select effective feature subsets for pattern classification, a novel statistical rough set method based on generalized attribute reduction is presented. Unlike in classical reduction approaches, the objects in the universe of discourse are the signs of the training sample sets, and the attribute values are taken as statistical parameters. The binary relation and the discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by the Mahalanobis distance, effective feature subsets are obtained as generalized attribute reducts. Experimental results show that classification performance can be improved by using the selected feature subsets.
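The abstract does not detail how the reduct itself is constructed, but the monotonicity it relies on can be checked numerically. The sketch below assumes two classes and a pooled within-class covariance. It computes the squared Mahalanobis distance between the two class means on a feature subset; adding features to the subset never decreases this distance. `solve`, `pooled_cov` and `mahalanobis_sq` are illustrative names, not the authors' code.

```python
from statistics import mean


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def pooled_cov(groups, subset):
    """Pooled within-class covariance restricted to the features in subset."""
    k = len(subset)
    cov = [[0.0] * k for _ in range(k)]
    dof = 0
    for rows in groups:
        mu = [mean(r[f] for r in rows) for f in subset]
        for r in rows:
            d = [r[f] - mu[i] for i, f in enumerate(subset)]
            for i in range(k):
                for j in range(k):
                    cov[i][j] += d[i] * d[j]
        dof += len(rows) - 1
    return [[v / dof for v in row] for row in cov]


def mahalanobis_sq(class_a, class_b, subset):
    """Squared Mahalanobis distance between the two class means on subset."""
    diff = [mean(r[f] for r in class_a) - mean(r[f] for r in class_b)
            for f in subset]
    x = solve(pooled_cov([class_a, class_b], subset), diff)
    return sum(d * xi for d, xi in zip(diff, x))
```

For nested subsets S of T, the distance on T is at least the distance on S whenever the covariance is positive definite. This is the property that lets a distance-based discernibility relation behave monotonically.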
Abstract: Feature Subset Selection (FSS), the removal of redundant and irrelevant features, particularly from medical data, is an NP-hard problem that can be effectively addressed by metaheuristic algorithms. However, existing binary versions of metaheuristic algorithms have convergence issues and lack an effective binarization method, resulting in suboptimal solutions that hinder diagnosis and prediction accuracy. This paper proposes an Improved Binary Quantum-based Avian Navigation Optimizer Algorithm (IBQANA) for FSS in medical data preprocessing to address these suboptimal solutions. IBQANA's contributions are the Hybrid Binary Operator (HBO) and the Distance-based Binary Search Strategy (DBSS). HBO converts continuous values into binary solutions, even for values outside the [0, 1] range, ensuring accurate binary mapping. DBSS is a two-phase search strategy that enhances the performance of inferior search agents and accelerates convergence; by combining exploration and exploitation phases based on an adaptive probability function, it effectively avoids local optima. HBO is compared with five transfer function families and with thresholding on 12 medical datasets whose numbers of features range from 8 to 10,509. IBQANA is evaluated in terms of accuracy, fitness, and selected features, and is compared with seven binary metaheuristic algorithms. Furthermore, IBQANA is applied to COVID-19 detection. The results reveal that IBQANA outperforms all compared algorithms on the COVID-19 dataset and 11 other medical datasets. The proposed method presents a promising solution to the FSS problem in medical data preprocessing.
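The abstract does not define HBO, so it is not reproduced here. For context, the sketch below shows the kind of conventional baselines that HBO is compared against. It includes an S-shaped (sigmoid) transfer function, a V-shaped (|tanh|) transfer function, and plain thresholding with clipping for values outside [0, 1]. These are common textbook forms chosen for illustration; the five families used in the paper may differ, and the function names are hypothetical.

```python
import math
import random


def s_shaped(x):
    """S-shaped (sigmoid) transfer function, computed without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def v_shaped(x):
    """V-shaped transfer function |tanh(x)|: maps any real x into [0, 1)."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: set bit d to 1 with probability s_shaped(x_d)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, previous, rng):
    """V-shaped rule: flip the previous bit d with probability v_shaped(x_d)."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, previous)]


def binarize_threshold(position, threshold=0.5):
    """Plain thresholding; values outside [0, 1] are clipped first."""
    return [1 if min(max(x, 0.0), 1.0) > threshold else 0 for x in position]
```

Thresholding with clipping discards how far a value lies outside [0, 1]. The transfer functions keep that information as a probability, which is one reason the mapping of out-of-range values matters.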
Abstract: The relationship between lung transplant success and the many features of recipients and donors has long been studied. However, building a robust model of their potential impact on organ transplant success has proved challenging. In this study, a hybrid feature selection model was developed based on ant colony optimization (ACO) and a k-nearest neighbor (kNN) classifier. It was used to investigate the relationship between the most defining features of recipients/donors and lung transplant success, using data from the United Network for Organ Sharing (UNOS). The proposed ACO-kNN approach explores the feature space to identify representative attributes and to classify patients' functional status (i.e., quality of life) after lung transplantation. The efficacy of the proposed model was verified using 3,684 records and 118 input features from UNOS. The developed approach examined the reliability and validity of the lung allocation process. The results are promising, with a prediction accuracy of 91.3%, low computational time and better decision capabilities, emphasizing the potential for automatic classification in lung and other organ allocation processes. In addition, the proposed model offers a new perspective on how medical experts and clinicians respond to uncertain and challenging lung allocation strategies. With such an ACO-kNN model, a medical professional can summarize information and make decisions on allocating donor organs for upcoming transplants.
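The abstract does not give the paper's transition rule, heuristic information or parameters. As a rough sketch of the wrapper idea behind ACO-kNN, the code below makes three assumptions: Euclidean distance, k = 3, and a simplified pheromone model in which each ant includes feature f with probability tau[f] / (1 + tau[f]). Each candidate subset is scored by leave-one-out kNN accuracy, and pheromone is reinforced on the best subset found so far. `loo_knn_accuracy` and `aco_knn` are hypothetical names.

```python
import random
from collections import Counter


def loo_knn_accuracy(rows, labels, subset, k=3):
    """Leave-one-out kNN accuracy using only the feature indices in subset."""
    if not subset:
        return 0.0
    correct = 0
    for i, xi in enumerate(rows):
        # (squared Euclidean distance, label) for every other sample
        neighbours = sorted(
            (sum((xi[f] - xj[f]) ** 2 for f in subset), labels[j])
            for j, xj in enumerate(rows) if j != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(rows)


def aco_knn(rows, labels, n_ants=8, n_iter=15, rho=0.2, k=3, seed=0):
    """Simplified ACO wrapper (hypothetical): an ant includes feature f with
    probability tau[f] / (1 + tau[f]); the best subset so far is reinforced."""
    rng = random.Random(seed)
    n = len(rows[0])
    tau = [1.0] * n
    best_subset, best_acc = [], 0.0
    for _ in range(n_iter):
        for _ in range(n_ants):
            subset = [f for f in range(n)
                      if rng.random() < tau[f] / (1.0 + tau[f])]
            acc = loo_knn_accuracy(rows, labels, subset, k)
            # Prefer higher accuracy, then fewer features.
            if acc > best_acc or (acc == best_acc
                                  and len(subset) < len(best_subset)):
                best_subset, best_acc = subset, acc
        tau = [(1.0 - rho) * t for t in tau]  # evaporation
        for f in best_subset:                 # deposit on the best subset
            tau[f] += rho * best_acc
    return best_subset, best_acc
```

Because the classifier itself scores every subset, this is a wrapper method. It is more expensive than a filter, but the selected features are tuned to the kNN decision rule.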
Funding: The authors acknowledge that this study was financially supported by the National Key R&D Programs of China (No. 2017YFB0504201), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA20020101), and the Natural Science Foundation of China (No. 61473286 and No. 61375002). Our sincere thanks go to the students at the State Key Laboratory of Remote Sensing Science for their assistance during the field survey campaigns.
Abstract: Remote sensing is an important technical means of investigating land resources. Optical imagery has been widely used in crop classification and can show changes in the moisture and chlorophyll content of crop leaves, whereas synthetic aperture radar (SAR) imagery is sensitive to changes in growth states and morphological structures. Crop-type mapping with a single type of imagery sometimes yields unsatisfactory precision, which makes it difficult to provide precise spatiotemporal crop-type information at a local scale for agricultural applications. This paper proposes a new method to improve crop-type identification accuracy, with two aims: to explore the benefits of combining optical and SAR images, and to address the problem of inaccurate spatial information for land parcels. Multiple features were derived from full polarimetric SAR data (GaoFen-3) and a high-resolution optical image (GaoFen-2). The farmland parcels used as the basis for object-oriented classification were obtained from the GaoFen-2 image using optimal scale segmentation. A novel feature subset selection method based on within-class aggregation and between-class scatter (WA-BS) is proposed to extract the optimal feature subset. Finally, a crop-type map was produced with a support vector machine (SVM) classifier. The proposed method achieved good classification results, with an overall accuracy of 89.50%, which is better than the crop classification results derived from SAR-based segmentation. Compared with the ReliefF, mRMR and LeastC feature selection algorithms, the WA-BS algorithm effectively removes strongly correlated redundant features and obtains high classification accuracy with the resulting optimal feature subset. This study shows that the accuracy of crop-type mapping in an area with multiple cropping patterns can be improved by combining optical and SAR remote sensing images.
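The abstract does not give the WA-BS formulas. One plausible reading, assumed here for illustration only, works in two steps. First, rank each feature by its between-class scatter divided by its within-class scatter (a Fisher-ratio-style score). Second, discard any feature whose absolute correlation with an already kept feature exceeds a threshold, mirroring the stated goal of removing strongly correlated redundant features. `scatter_ratio`, `select_features` and the default `max_corr = 0.9` are hypothetical.

```python
import math
from statistics import mean


def scatter_ratio(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        vals = [v for v, lab in zip(values, labels) if lab == c]
        mu = mean(vals)
        between += len(vals) * (mu - overall) ** 2
        within += sum((v - mu) ** 2 for v in vals)
    return between / within if within > 0 else math.inf


def corr(x, y):
    """Pearson correlation; 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return sxy / den if den > 0 else 0.0


def select_features(cols, labels, max_corr=0.9):
    """Rank features by scatter ratio (best first), then skip any feature
    strongly correlated with one that has already been kept."""
    order = sorted(range(len(cols)),
                   key=lambda i: scatter_ratio(cols[i], labels), reverse=True)
    kept = []
    for i in order:
        if all(abs(corr(cols[i], cols[j])) <= max_corr for j in kept):
            kept.append(i)
    return kept
```

A feature that is a rescaled copy of a kept feature has the same scatter ratio but is dropped by the correlation check. This is the redundancy that rankers based only on per-feature relevance tend to miss.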
Abstract: The significance of the preprocessing stage in any data mining task is well known. Before attempting medical data classification, characteristics of medical datasets, including noise, incompleteness, and the existence of multiple and possibly irrelevant features, need to be addressed. In this paper, we show that selecting the right combination of preprocessing methods has a considerable impact on the classification potential of a dataset. The preprocessing operations considered include the discretization of numeric attributes, the selection of attribute subset(s), and the handling of missing values. As a case study, classification is performed by an ant colony optimization algorithm. Experimental results on 25 real-world medical datasets show a significant relative improvement in predictive accuracy, exceeding 60% in some cases.
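To make two of these preprocessing operations concrete, here is a minimal sketch that assumes mean imputation for missing values (encoded as `None`) and equal-width discretization. The paper evaluates combinations of methods; these particular choices and the helper names are illustrative assumptions. Attribute subset selection, with any subset selector, would be a further step in such a pipeline.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def equal_width_bins(column, n_bins=3):
    """Map each value to one of n_bins equal-width intervals, numbered 0..n_bins-1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def preprocess(columns, n_bins=3):
    """Impute first, then discretize each numeric column (one possible ordering)."""
    return [equal_width_bins(impute_mean(col), n_bins) for col in columns]
```

The order matters. Imputing before discretizing places the filled value in a bin, whereas discretizing first would require a separate rule for missing entries.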
Abstract: Purpose - Conventional diagnostic techniques may be prone to subjectivity, since they depend on the assessment of movements that are often too subtle for the individual eye and hence hard to classify, potentially resulting in misdiagnosis. Meanwhile, early nonmotor signs of Parkinson's disease (PD) can be mild and may be due to a variety of other conditions. As a result, these signs are usually ignored, making early PD diagnosis difficult. Machine learning approaches that distinguish PD from healthy controls or from individuals with similar symptoms (such as other movement disorders or Parkinsonian syndromes) have been introduced to solve these problems and to enhance the diagnosis and assessment of PD. Design/methodology/approach - PD is commonly diagnosed through medical observation and evaluation of symptoms, including characterization of a wide range of motor indications. Because the quantity of data being processed has grown over the last five years, feature selection has become a prerequisite for any classification. To address this issue, this study introduces a feature selection method based on the score-based artificial fish swarm algorithm (SAFSA). Findings - This study improves the accuracy of PD identification by reducing the number of selected vocal features while using the most recent and largest publicly accessible database. Feature subset selection in PD detection starts by eliminating irrelevant or redundant features; according to a few objective functions, the chosen feature subset should provide the best performance. Research limitations/implications - In many situations, this is a nondeterministic polynomial-time hard (NP-hard) problem. The method enhances the PD detection rate by selecting the most essential features from the database. First, the dimensionality of the data set is reduced using singular value decomposition (SVD). Next, Biogeography-Based Optimization (BBO) is used for feature selection, where the weight value is a vital parameter for finding the best features in PD classification. Originality/value - PD classification is performed with ensemble learning approaches, such as a hybrid classifier combining fuzzy k-nearest neighbor, kernel support vector machines, a fuzzy convolutional neural network and random forest. The suggested classifiers are trained on data from the UCI Machine Learning Repository, and their results are verified using leave-one-person-out cross-validation. The measures used to assess classifier efficiency include accuracy, F-measure and the Matthews correlation coefficient.
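The evaluation protocol named above can be sketched as follows. The sketch assumes binary labels (1 = PD, 0 = healthy) and a list giving the subject ID of each voice recording, so that all recordings of one person are held out together. The function names are hypothetical; the F-measure and Matthews correlation coefficient formulas are the standard ones.

```python
import math


def leave_one_person_out(subject_ids):
    """Yield (train_idx, test_idx) pairs, holding out every recording of one subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield train, test


def binary_metrics(y_true, y_pred):
    """Accuracy, F-measure (F1) and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc
```

Holding out whole subjects matters because several recordings from the same person are strongly correlated. Splitting those recordings across training and test folds would inflate the measured accuracy.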