Although airborne hyperspectral data with detailed spatial and spectral information has demonstrated significant potential for tree species classification, it has not been widely used over large areas. A comprehensive process based on multi-flightline airborne hyperspectral data is lacking for large forested areas affected by both bidirectional reflectance distribution function (BRDF) effects and cloud shadow contamination. In this study, hyperspectral data were collected over the Mengjiagang Forest Farm in Northeast China in the summer of 2017 using the Chinese Academy of Forestry's LiDAR, CCD, and hyperspectral system (CAF-LiCHy). After BRDF correction and cloud shadow detection, a tree species classification workflow was developed for sunlit and cloud-shaded forest areas, with minimum noise fraction reduced bands, spectral vegetation indices, and texture information as input features. Results indicate that BRDF-corrected sunlit hyperspectral data can provide stable and high classification accuracy when representative training data are used. Cloud-shaded pixels also show good spectral separability for species classification. Red-edge spectral information and ratio-based spectral indices, which had high importance scores, are recommended as input features for species classification under varying light conditions. Classification accuracies assessed against field survey data at multiple spatial scales show that species classification over an extensive forest area using airborne hyperspectral data under various illumination conditions can be carried out successfully with an effective radiometric consistency process and feature selection strategy.
Parkinson’s disease (PD) is a neurodegenerative disease caused by a deficiency of dopamine. Investigators have identified voice changes as an underlying symptom of PD. Advanced studies of vocal disorders support accurate PD detection and adequate treatment. Machine learning (ML) models have recently helped to solve problems in the classification of chronic diseases. This work analyzes the effect of feature selection on ML efficiency in a voice-based PD detection system. It includes PD classification models based on random forest (RF), decision tree, neural network, logistic regression, and support vector machine. Feature selection is performed with the RF mean-decrease-in-accuracy and mean-decrease-in-Gini techniques. Random forest kerb feature selection (RFKFS) selects only 17 features from 754 attributes. The proposed technique uses validation metrics to assess the performance of the ML models. The RF model with feature selection performed best among all models, with an accuracy of 96.56%, a precision of 88.02%, a sensitivity of 98.26%, and a specificity of 96.06%. The corresponding validation scores are a negative predictive value (NPV) of 99.47%, a geometric mean (GM) of 97.15%, a Youden’s index (YI) of 94.32%, and a Matthews correlation coefficient (MCC) of 90.84%. The proposed model is also more robust than the other models. The study concludes that applying the RFKFS approach to PD detection yields an effective, high-performing medical classifier.
Object-based classification differentiates forest gaps from canopies at a large regional scale by using remote sensing data. To study the segmentation and classification processes of object-based forest gap classification at a regional scale, we sampled a natural secondary forest in northeast China at Maoershan Experimental Forest Farm. Airborne light detection and ranging (LiDAR; 3.7 points/m²) data were collected as the original data source, and the canopy height model (CHM) and a topographic dataset were extracted from the LiDAR data. The accuracy of object-based forest gap classification depends on the preceding segmentation. Thus our first step was to define 10 different scale parameters for CHM image segmentation. After image segmentation, machine learning classification was used to classify three kinds of object classes, namely forest gaps, tree canopies, and others. The common support vector machine (SVM) classifier with the radial basis function (RBF) kernel was first adopted to test the effect of classification features (vegetation height features and some typical topographic features) on forest gap classification. Different classifiers (KNN, Bayes, decision tree, and SVM with linear kernel) were then adopted to compare the effect of the classifier on machine learning forest gap classification. Segmentation accuracy and classification accuracy were evaluated using Möller's method and confusion matrices, respectively. The scale parameter had a significant effect on object-based forest gap segmentation and classification. Classification accuracies at different scales revealed two optimal scales (10 and 20) that provided similar accuracy, with the scale of 10 yielding slightly greater accuracy than 20. The accuracy of the classification using the combination of height features and the SVM classifier with linear kernel was 91% at the optimal scale parameter of 10, the highest among the classifiers compared: SVM RBF (90%), decision tree (90%), Bayes (90%), and KNN (87%). The classifiers had no significant effect on forest gap classification, but fewer parameters in the classifier equation and a higher speed of operation probably lead to higher accuracy of the final classification. Our results confirm that object-based classification can extract forest gaps at a large regional scale with appropriate classification features and classifiers using LiDAR data. We note, however, that satisfactory forest gap classification ultimately depends on determining the optimal scale(s) of segmentation.
As one of the most widely used non-ferrous metal structural materials in industry and production, aluminum (Al) alloy is of great value to the national economy and industrial manufacturing. Classifying Al alloy rapidly and accurately is therefore a significant and popular task. Classification methods based on laser-induced breakdown spectroscopy (LIBS) have been reported in recent years. Although LIBS is an advanced detection technology, it must be combined with a suitable algorithm to achieve rapid and accurate classification. As an important machine learning method, the random forest (RF) algorithm plays a major role in pattern recognition and material classification. This paper introduces a rapid classification method for Al alloy based on LIBS and the RF algorithm. The results show that the best accuracy reached when classifying Al alloy samples with this method is 98.59%, with an average accuracy of 98.45%. The study also describes how accuracy varies with the number of trees in the RF and with the size of the training sample set. Based on these relationships, researchers can identify optimized parameters for the RF algorithm in order to achieve good results. These results indicate that LIBS combined with the RF algorithm can classify Al alloy effectively, precisely, and rapidly with high accuracy, which has significant practical value.
We developed a forest type classification technology for the Daxing'an Mountains of northeast China using multisource remote sensing data. A SPOT-5 image and two temporal images of RADARSAT-2 full-polarization SAR were used to identify forest types in the Pangu Forest Farm of the Daxing'an Mountains. Forest types were identified using random forest (RF) classification with the following data combinations: SPOT-5 alone, SPOT-5 and SAR images in August or November, and SPOT-5 and two temporal SAR images. We identified many forest types using a combination of multitemporal SAR and SPOT-5 images, including Betula platyphylla, Larix gmelinii, Pinus sylvestris, and Picea koraiensis forests. The accuracy of classification exceeded 88% and improved by 12% compared to the classification results obtained using SPOT data alone. RF classification using a combination of multisource remote sensing data improved classification accuracy compared to that achieved using single-source remote sensing data.
Automatic sleep staging of neonates is essential for monitoring their brain development and the maturity of the nervous system. EEG-based neonatal sleep staging provides valuable information about an infant’s growth and health, but is challenging due to the unique characteristics of neonatal EEG and the lack of standardized protocols. This study aims to develop and compare 18 machine learning models using an automated machine learning (AutoML) technique for accurate and reliable multi-channel EEG-based neonatal sleep-wake classification. The study investigates the feasibility of AutoML without extensive manual feature selection or hyperparameter tuning. The data were obtained from neonates at a post-menstrual age of 37±05 weeks. A total of 3525 30-s EEG segments from 19 infants were used to train and test the proposed models. Twelve time- and frequency-domain features were extracted from each channel. Each model receives the common features of nine channels as an input vector of size 108. Each model’s performance was evaluated using a variety of evaluation metrics. The maximum mean accuracy of 84.78% and kappa of 69.63% were obtained by the AutoML-based Random Forest estimator. According to the authors, this is the highest accuracy reported so far for EEG-based sleep-wake classification. For the AutoML-based AdaBoost Random Forest model, accuracy and kappa were 84.59% and 69.24%, respectively. The high performance achieved by the proposed AutoML-based approach can facilitate early identification and treatment of sleep-related issues in neonates.
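The input size of 108 follows from concatenating 12 features for each of 9 channels (12 × 9 = 108). A minimal sketch of that assembly step is shown below; the abstract does not list the individual features, so the function name `build_input_vector` and the placeholder values are assumptions for illustration only.

```python
N_CHANNELS = 9                # EEG channels, as stated in the abstract
N_FEATURES_PER_CHANNEL = 12   # time- and frequency-domain features per channel


def build_input_vector(per_channel_features):
    """Concatenate per-channel feature lists into one flat input vector."""
    if len(per_channel_features) != N_CHANNELS:
        raise ValueError("expected one feature list per channel")
    vector = []
    for features in per_channel_features:
        if len(features) != N_FEATURES_PER_CHANNEL:
            raise ValueError("expected 12 features per channel")
        vector.extend(features)
    return vector


# Placeholder values only; real features would be computed from a 30-s segment.
segment_features = [[0.0] * N_FEATURES_PER_CHANNEL for _ in range(N_CHANNELS)]
input_vector = build_input_vector(segment_features)
```

Keeping a fixed channel order in the concatenation matters, because tree-based estimators treat each position in the vector as a distinct feature.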
Based on plot data from field investigation and on the theory of forest ecology and ecological systems, a site classification of the eastern forest region of the Daxing’an Mountains was made by means of mathematical methods. The main factors were slope, thickness of soil layer, slope position, and slope aspect. Slope grades were used as the division standard for site type groups. Slope aspect, slope position, and thickness of soil layer were used as the division standards for site types. Altogether 7 site type groups and 15 main site types were determined for the region. The classification provides a reliable basis for rational management and planting design in the area.
Acute leukemia is an aggressive disease that has high mortality rates worldwide. The error rate can be as high as 40% when classifying acute leukemia into its subtypes. There is therefore an urgent need to support hematologists during the classification process. More than two decades ago, researchers used microarray gene expression data to classify cancer and adopted acute leukemia as a test case. The high classification accuracy they achieved confirmed that it is possible to classify cancer subtypes using microarray gene expression data. Ensemble machine learning is an effective method that combines individual classifiers to classify new samples. Ensemble classifiers are recognized as powerful algorithms with numerous advantages over traditional classifiers. Over the past few decades, researchers have focused a great deal of attention on ensemble classifiers in a wide variety of fields, including but not limited to disease diagnosis, finance, bioinformatics, healthcare, manufacturing, and geography. This paper reviews the recent ensemble classifier approaches used for acute leukemia gene expression data classification. Moreover, a framework for classifying acute leukemia gene expression data is proposed. The pairwise correlation gene selection method and the Rotation Forest of Bayesian Networks are both used in this framework. Experimental outcomes show that the classification accuracy achieved by the acute leukemia ensemble classifiers constructed according to the suggested framework is good compared to the classification accuracy achieved in other studies.
The performance of medical image classification has been enhanced by deep convolutional neural networks (CNNs), which are typically trained with cross-entropy (CE) loss. However, when the labels have an intrinsic ordinal property, e.g., the development from benign to malignant tumor, CE loss cannot take such ordinal information into account to allow for better generalization. To improve model generalization with ordinal information, we propose a novel meta ordinal regression forest (MORF) method for medical image classification with ordinal labels, which learns the ordinal relationship through the combination of a convolutional neural network and a differential forest in a meta-learning framework. The merits of the proposed MORF come from two components: a tree-wise weighting net (TWW-Net) and a grouped feature selection (GFS) module. First, the TWW-Net assigns each tree in the forest a specific weight that is mapped from the classification loss of the corresponding tree. Hence, all the trees possess varying weights, which helps alleviate the tree-wise prediction variance. Second, the GFS module enables a dynamic forest rather than the fixed one used previously, allowing for random feature perturbation. During training, we alternately optimize the parameters of the CNN backbone and the TWW-Net in the meta-learning framework by calculating the Hessian matrix. Experimental results on two medical image classification datasets with ordinal labels, i.e., the LIDC-IDRI and Breast Ultrasound datasets, demonstrate the superior performance of our MORF method over existing state-of-the-art methods.
Decision forest is a well-known machine learning technique for addressing detection and prediction problems related to clinical data. However, traditional decision forest (DF) algorithms have lower classification accuracy and cannot handle high-dimensional feature spaces effectively. In this work, we propose a bootstrap decision forest using penalizing attributes (BFPA) algorithm to predict heart disease with higher accuracy. This work integrates a significance-based attribute selection (SAS) algorithm with the BFPA classifier to improve the performance of the diagnostic system in identifying cardiac illness. The proposed SAS algorithm is used to determine the correlation among attributes and to select the optimum subset of the feature space for the learning and testing processes. BFPA selects the optimal number of learning and testing data points as well as the density of trees in the forest to achieve higher prediction accuracy when classifying imbalanced datasets. The effectiveness of the developed classifier is carefully verified on a real-world database (i.e., the Heart Disease dataset from the UCI repository) by comparing its performance with many advanced approaches with respect to accuracy, sensitivity, specificity, precision, and intersection over union (IoU). The empirical results demonstrate that the proposed classification approach outperforms other approaches, with accuracy, precision, sensitivity, specificity, and IoU of 94.7%, 99.2%, 90.1%, 91.1%, and 90.4%, respectively. Additionally, we carry out Wilcoxon’s rank-sum test to determine whether the proposed classifier with feature selection provides a significant improvement over other classifiers. From the experimental results, we conclude that the integration of SAS and BFPA outperforms other classifiers recently reported in the literature.
With the rapid development and popularization of new-generation technologies such as cloud computing, big data, and artificial intelligence, the construction of smart grids has become more diversified. Accurate and quick reading and classification of the electricity consumption of residential users can provide deeper insight into the actual power consumption of residents, which is essential for ensuring the normal operation of the power system and for energy management and planning. Based on the distributed architecture of cloud computing, this paper designs an improved random forest method for residential electricity classification. It uses the out-of-bag error that is specific to random forest, combined with the Drosophila algorithm, to optimize the internal parameters of the random forest, thereby improving the performance of the random forest algorithm. The method uses MapReduce to train the improved random forest model on a cloud computing platform, then uses the trained model to analyze a residential electricity consumption data set, divides all residents into 5 categories, and verifies the effectiveness and feasibility of the model through experiments.
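The out-of-bag (OOB) error mentioned here comes from bootstrap sampling: each tree is trained on a sample drawn with replacement, and the records it never saw can be used to score it without a separate validation set. The sketch below illustrates only that general idea in plain Python; the helper names and the toy votes are assumptions, and it does not reproduce the paper's Drosophila-based parameter search or its MapReduce training.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def oob_error(y_true, oob_votes):
    """Estimate the error rate from out-of-bag votes.

    oob_votes[i] holds the predictions for sample i made only by trees
    whose bootstrap sample did NOT contain sample i.
    """
    wrong = 0
    scored = 0
    for truth, votes in zip(y_true, oob_votes):
        if not votes:
            continue  # sample was in every bag, so it cannot be scored
        majority = Counter(votes).most_common(1)[0][0]
        scored += 1
        wrong += int(majority != truth)
    return wrong / scored if scored else float("nan")


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(100, rng)

# Toy example: 4 samples with hypothetical OOB votes from a small forest.
error = oob_error([0, 1, 1, 0], [[0, 0, 1], [1], [0, 0], []])
```

A parameter search such as the one the abstract describes could, under this reading, use `oob_error` as its objective: lower OOB error for a candidate parameter set indicates a better forest.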
Highly accurate information on the distribution of vegetation types is of great significance for forestry resource monitoring and management. To improve the classification accuracy of forest types, Sentinel-1 and Sentinel-2 data of the Changbai Mountain protection development zone were selected and combined with a DEM to construct a multi-feature random forest model for forest type classification. The model fuses intensity, texture, spectral, vegetation index, and topographic information, and uses the random forest Gini index (GI) for feature optimization. The overall classification accuracy was 94.60% and the Kappa coefficient was 0.933. A comparison of the classification results before and after feature optimization shows that feature optimization has a considerable impact on classification accuracy. A comparison of the classification results of random forest, the maximum likelihood method, and the CART decision tree under the same conditions shows that random forest performs better and can be applied to forestry research work such as forest resource survey and monitoring.
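Overall accuracy and the Kappa coefficient, the two figures reported in this abstract, are both computed from a confusion matrix. The following is a minimal sketch of the standard definitions (Cohen's kappa compares observed agreement with the agreement expected by chance); the function name and the example matrix are illustrative assumptions, not data from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that were assigned to class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        return observed, float("nan")  # kappa is undefined in this case
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative two-class matrix only.
oa, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
```

Kappa is usually lower than overall accuracy because it discounts the agreement a classifier would reach by chance given the class proportions.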
This paper reviews sixty-eight research articles published between 2000 and 2017, as well as textbooks, that employed four classification algorithms as their main statistical tools: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), and Neural Network (NN). The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performance, strengths, and shortcomings of each algorithm were examined, and a conclusion was reached on which one performs best. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slower as the size of the data grows, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm. However, because of its time-consuming parameter tuning procedure, the high computational complexity, the numerous types of NN architectures to choose from, and the large number of algorithms used for training, most researchers recommend SVM and RF as easier and widely used methods that repeatedly achieve high accuracies and are often faster to implement.
Many studies have compared object-based classification (OBC) and pixel-based classification (PBC), particularly for classifying high-resolution satellite images. VNREDSat-1 is the first optical remote sensing satellite of Vietnam, with a resolution of 2.5 m (panchromatic) and 10 m (multispectral). The objective of this research is to compare the two classification approaches using a VNREDSat-1 image for mapping mangrove forest in Vien An Dong commune, Ngoc Hien district, Ca Mau province. The ISODATA algorithm (for the PBC method) and a membership function classifier (for the OBC method) were chosen to classify the same image. The results show that the overall accuracies of OBC and PBC are 73% and 62.16%, respectively, and that OBC also resolved the “salt and pepper” effect, which is the main issue of PBC. Therefore, OBC is considered the better approach for classifying VNREDSat-1 imagery for mapping mangrove forest in Ngoc Hien district.
Land for protective forest on the coast has special site conditions, and site classification is the scientific basis for seaboard afforestation. The site classification system for the coastal zone and islands of China may be divided into five levels: site region (sub-region), district, class, group, and type. Land division for afforestation at the large scale is carried out according to the principle of environmental heterogeneity among regions, sub-regions, and districts, based on differences in air temperature, moisture, and type of coastal geomorphology. The area may be classified into 7 regions, 12 sub-regions, and 55 districts. The medium- and small-scale divisions into site class, group, and type, which subdivide a site district, are based on medium topography, topographic climate, micro-relief, and soil conditions.
This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The document thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated and demonstrated through a case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets is also discussed. These studies provide an overview of the difficulties faced and the areas requiring further improvement and focus.
The aim of this work was to differentiate Atlantic Forest patches, as well as their spatial distribution, from other tree covers that compose the landscape, by comparing three methods of digital image classification using geoprocessing and remote sensing techniques. The study area was a sub-basin of the Iperó River, tributary of the Iperó-Mirim stream, Sarapuí River basin, in Araçoiaba da Serra, State of São Paulo, Brazil. The research was developed in a Geographic Information System environment, using medium-resolution images from the Sentinel-2 satellite. Three image classification algorithms, Maximum Likelihood Classification (MLC), Support Vector Machines (SVM), and Random Tree (RT), were applied to verify the separability of forest patches, forestry, and other uses. The results were analyzed by means of a confusion matrix, accuracy, and the kappa index, showing that the three algorithms were able to successfully differentiate the targets, with the highest efficiency attributed to MLC and the lowest to RT. Overall, all three classifiers presented errors, but specifically for the forest patches, the highest accuracy was obtained with SVM.
This study designed an approach to derive land cover in South Africa with insufficient ground samples, and demonstrated it in the Nzhelele and Levhuvu catchments, South Africa. The method was developed based on an integration of Landsat 8, Sentinel-1, and the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) on the Google Earth Engine (GEE) platform. A random forest classifier with 300 trees was employed as the land-cover classification model. To overcome the lack of ground data, a stratified sampling method was used to generate the training and validation samples from an existing land-cover product. Likewise, to distinguish different land-cover categories, percentile and monthly median composites were employed to expand the input metrics of the random forest classifier. Results showed that the overall accuracy of the land-cover map of the Nzhelele and Levhuvu catchments in 2017–2018 reached 76.43%. Three important results can be drawn from our research. 1) Including Sentinel-1 data can slightly improve the overall accuracy of land-cover classification, while its contribution varies with land type. 2) An under-fitting problem was observed when training non-dominant land-cover categories with random sampling; the stratified sampling method is recommended to ensure the classification accuracy of non-dominant classes. 3) When the related reflectance bands participated in the training process, the individual Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Soil Adjusted Vegetation Index (SAVI), and Normalized Difference Built-up Index (NDBI) had little effect on the final land-cover classification result.
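The four indices named in result 3 are deterministic functions of a few reflectance bands, which is one plausible reason they add little once those bands are already model inputs. The sketch below uses the commonly cited formulas with the usual coefficients (EVI gain 2.5 and coefficients 6, 7.5, and 1; SAVI soil factor L = 0.5); the function names, band arguments, and sample reflectances are assumptions for illustration, and the paper's exact band choices are not stated in the abstract.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor is the L term."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ndbi(swir, nir):
    """Normalized Difference Built-up Index."""
    return (swir - nir) / (swir + nir)


# Hypothetical surface reflectances (0 to 1) for a vegetated pixel.
pixel = {"blue": 0.05, "red": 0.10, "nir": 0.50, "swir": 0.30}
indices = {
    "ndvi": ndvi(pixel["nir"], pixel["red"]),
    "evi": evi(pixel["nir"], pixel["red"], pixel["blue"]),
    "savi": savi(pixel["nir"], pixel["red"]),
    "ndbi": ndbi(pixel["swir"], pixel["nir"]),
}
```

All inputs are assumed to be reflectances rather than raw digital numbers; the EVI and SAVI constants are only meaningful on that scale.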
The diversity of tree species and the complexity of land use in cities create challenging issues for tree species classification. The combination of deep learning methods and RGB optical images obtained by unmanned aerial vehicles (UAVs) provides a new research direction for urban tree species classification. We proposed an RGB optical image dataset with 10 urban tree species, termed TCC10, which is a benchmark for tree canopy classification (TCC). The TCC10 dataset contains two types of data: tree canopy images with simple backgrounds and those with complex backgrounds. The objective was to examine the possibility of using deep learning methods (AlexNet, VGG-16, and ResNet-50) for individual tree species classification. The results of the convolutional neural networks (CNNs) were compared with those of K-nearest neighbor (KNN) and a BP neural network. Our results demonstrated: (1) ResNet-50 achieved an overall accuracy (OA) of 92.6% and a kappa coefficient of 0.91 for tree species classification on TCC10 and outperformed AlexNet and VGG-16. (2) The classification accuracy of KNN and the BP neural network was less than 70%, while the accuracy of the CNNs was higher. (3) The classification accuracy for tree canopy images with complex backgrounds was lower than that for images with simple backgrounds. For the deciduous tree species in TCC10, the classification accuracy of ResNet-50 was higher in summer than in autumn. Therefore, deep learning is effective for urban tree species classification using RGB optical images.
Funding (airborne hyperspectral tree species classification study): supported by the National Natural Science Foundation of China (Grant No. 42101403) and the National Key Research and Development Program of China (Grant No. 2017YFD0600404).
文摘Although airborne hyperspectral data with detailed spatial and spectral information has demonstrated significant potential for tree species classification,it has not been widely used over large areas.A comprehensive process based on multi-flightline airborne hyperspectral data is lacking over large,forested areas influenced by both the effects of bidirectional reflectance distribution function(BRDF)and cloud shadow contamination.In this study,hyperspectral data were collected over the Mengjiagang Forest Farm in Northeast China in the summer of 2017 using the Chinese Academy of Forestry's LiDAR,CCD,and hyperspectral systems(CAF-LiCHy).After BRDF correction and cloud shadow detection processing,a tree species classification workflow was developed for sunlit and cloud-shaded forest areas with input features of minimum noise fraction reduced bands,spectral vegetation indices,and texture information.Results indicate that BRDF-corrected sunlit hyperspectral data can provide a stable and high classification accuracy based on representative training data.Cloud-shaded pixels also have good spectral separability for species classification.The red-edge spectral information and ratio-based spectral indices with high importance scores are recommended as input features for species classification under varying light conditions.According to the classification accuracies through field survey data at multiple spatial scales,it was found that species classification within an extensive forest area using airborne hyperspectral data under various illuminations can be successfully carried out using the effective radiometric consistency process and feature selection strategy.
Abstract: Parkinson’s disease (PD) is a neurodegenerative disease caused by a deficiency of dopamine. Investigators have identified voice impairment as an underlying symptom of PD. Advanced vocal disorder studies provide adequate treatment and support for accurate PD detection. Machine learning (ML) models have recently helped to solve problems in the classification of chronic diseases. This work aims to analyze the effect of feature selection on ML efficiency in a voice-based PD detection system. It includes PD classification models based on random forest (RF), decision tree, neural network, logistic regression, and support vector machine. Feature selection is performed using the RF mean-decrease-in-accuracy and mean-decrease-in-Gini techniques. Random forest kerb feature selection (RFKFS) selects only 17 features from 754 attributes. The proposed technique uses validation metrics to assess the performance of the ML models. The RF model with feature selection performed best among all models, with an accuracy of 96.56%, a precision of 88.02%, a sensitivity of 98.26%, and a specificity of 96.06%. The corresponding validation scores are a negative predictive value (NPV) of 99.47%, a geometric mean (GM) of 97.15%, a Youden’s index (YI) of 94.32%, and a Matthews correlation coefficient (MCC) of 90.84%. The proposed model is also more robust than the other models. Using the RFKFS approach for PD detection results in an effective and high-performing medical classifier.
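The validation metrics listed in this abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, assuming that NPV denotes negative predictive value, GM the geometric mean of sensitivity and specificity, and MCC the Matthews correlation coefficient; the function name and the example counts are hypothetical and are not taken from the paper. The GM and YI readings agree with the reported figures: the square root of 0.9826 × 0.9606 is about 0.9715, and 0.9826 + 0.9606 − 1 = 0.9432.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common validation metrics from binary confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives (hypothetical inputs for illustration).
    """
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)            # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    gm = math.sqrt(sensitivity * specificity)   # geometric mean
    youden = sensitivity + specificity - 1      # Youden's index
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": npv,
        "gm": gm,
        "youden": youden,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting GM, YI, and MCC alongside accuracy is useful on imbalanced data, because each of them drops sharply when one class is classified poorly even if overall accuracy stays high.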
Funding: Financially supported by a grant from the National Natural Science Foundation of China (No. 31300533).
Abstract: Object-based classification differentiates forest gaps from canopies at a large regional scale using remote sensing data. To study the segmentation and classification processes of object-based forest gap classification at a regional scale, we sampled a natural secondary forest in northeast China at Maoershan Experimental Forest Farm. Airborne light detection and ranging (LiDAR; 3.7 points/m²) data were collected as the original data source, and the canopy height model (CHM) and topographic dataset were extracted from the LiDAR data. The accuracy of object-based forest gap classification depends on the preceding segmentation. Thus, our first step was to define 10 different scale parameters for CHM image segmentation. After image segmentation, a machine learning classification method was used to classify three kinds of object classes, namely forest gaps, tree canopies, and others. The common support vector machine (SVM) classifier with the radial basis function (RBF) kernel was first adopted to test the effect of classification features (vegetation height features and some typical topographic features) on forest gap classification. Different classifiers (KNN, Bayes, decision tree, and SVM with linear kernel) were then adopted to compare the effect of classifiers on machine learning forest gap classification. Segmentation accuracy and classification accuracy were evaluated using Möller's method and confusion matrices, respectively. The scale parameter had a significant effect on object-based forest gap segmentation and classification. Classification accuracies at different scales revealed two optimal scales (10 and 20) that provided similar accuracy, with the scale of 10 yielding slightly greater accuracy than 20. The accuracy of the classification using the combination of height features and the SVM classifier with linear kernel was 91% at the optimal scale parameter of 10, the highest among the classifiers compared: SVM RBF (90%), decision tree (90%), Bayes (90%), and KNN (87%). The classifiers had no significant effect on forest gap classification, but fewer parameters in the classifier equation and a higher speed of operation probably lead to a higher accuracy of the final classification. Our results confirm that object-based classification can extract forest gaps at a large regional scale with appropriate classification features and classifiers using LiDAR data. We note, however, that the final quality of forest gap classification depends on determining the optimal scale(s) of segmentation.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program, No. 2013AA102402).
Abstract: As an important non-ferrous metal structural material widely used in industry and production, aluminum (Al) alloy is of great value to the national economy and industrial manufacturing. Classifying Al alloy rapidly and accurately is therefore a significant task. Classification methods based on laser-induced breakdown spectroscopy (LIBS) have been reported in recent years. Although LIBS is an advanced detection technology, it must be combined with a suitable algorithm to achieve rapid and accurate classification. As an important machine learning method, the random forest (RF) algorithm plays a major role in pattern recognition and material classification. This paper introduces a rapid classification method for Al alloy based on LIBS and the RF algorithm. The results show that the best accuracy reached when classifying Al alloy samples with this method is 98.59%, with an average of 98.45%. The paper also describes how the accuracy varies with the number of trees in the RF and with the size of the training sample set. Based on these relationships, researchers can identify optimized parameters for the RF algorithm in order to achieve a good result. These results show that LIBS combined with the RF algorithm can classify Al alloy rapidly and with high accuracy, which has significant practical value.
Funding: Supported by the National Natural Science Foundation of China (Nos. 31500518, 31500519, and 31470640).
Abstract: We developed a forest type classification technology for the Daxing'an Mountains of northeast China using multisource remote sensing data. A SPOT-5 image and two temporal images of RADARSAT-2 full-polarization SAR were used to identify forest types in the Pangu Forest Farm of the Daxing'an Mountains. Forest types were identified using random forest (RF) classification with the following data combinations: SPOT-5 alone, SPOT-5 and SAR images in August or November, and SPOT-5 and two temporal SAR images. We identified many forest types using a combination of multitemporal SAR and SPOT-5 images, including Betula platyphylla, Larix gmelinii, Pinus sylvestris, and Picea koraiensis forests. The classification accuracy exceeded 88% and improved by 12% compared with the classification results obtained using SPOT data alone. RF classification using a combination of multisource remote sensing data improved classification accuracy compared with that achieved using single-source remote sensing data.
Abstract: Automatic sleep staging of neonates is essential for monitoring their brain development and the maturity of the nervous system. EEG-based neonatal sleep staging provides valuable information about an infant’s growth and health, but it is challenging because of the unique characteristics of EEG and the lack of standardized protocols. This study aims to develop and compare 18 machine learning models using the Automated Machine Learning (autoML) technique for accurate and reliable multi-channel EEG-based neonatal sleep-wake classification. The study investigates the feasibility of autoML without extensive manual feature selection or hyperparameter tuning. The data were obtained from neonates at post-menstrual age 37±05 weeks. 3525 30-s EEG segments from 19 infants are used to train and test the proposed models. Twelve time- and frequency-domain features are extracted from each channel. Each model receives the common features of nine channels as an input vector of size 108. Each model’s performance was evaluated using a variety of evaluation metrics. The maximum mean accuracy of 84.78% and kappa of 69.63% were obtained by the AutoML-based Random Forest estimator, which is reported as the highest accuracy to date for EEG-based sleep-wake classification. For the AutoML-based AdaBoost Random Forest model, accuracy and kappa were 84.59% and 69.24%, respectively. The high performance achieved by the proposed autoML-based approach can facilitate early identification and treatment of sleep-related issues in neonates.
Abstract: Based on plot data from the investigation and on the theory of forest ecology and ecological systems, the site classification of the eastern forest region of the Daxing’an Mountains was made by means of mathematical methods. The main factors were slope, thickness of soil layer, slope position, and slope aspect. Grades of slope were used as the division standard for site type groups. Slope aspect, slope position, and thickness of soil layer were used as the division standards for site types. Altogether, 7 site type groups and 15 main site types were determined for the region. The classification provides a reliable basis for reasonable management and planting design in the area.
Abstract: Acute leukemia is an aggressive disease that has high mortality rates worldwide. The error rate can be as high as 40% when classifying acute leukemia into its subtypes, so there is an urgent need to support hematologists during the classification process. More than two decades ago, researchers used microarray gene expression data to classify cancer and adopted acute leukemia as a test case. The high classification accuracy they achieved confirmed that it is possible to classify cancer subtypes using microarray gene expression data. Ensemble machine learning is an effective method that combines individual classifiers to classify new samples. Ensemble classifiers are recognized as powerful algorithms with numerous advantages over traditional classifiers. Over the past few decades, researchers have focused a great deal of attention on ensemble classifiers in a wide variety of fields, including but not limited to disease diagnosis, finance, bioinformatics, healthcare, manufacturing, and geography. This paper reviews the recent ensemble classifier approaches utilized for acute leukemia gene expression data classification. Moreover, a framework for classifying acute leukemia gene expression data is proposed. The pairwise correlation gene selection method and the Rotation Forest of Bayesian Networks are both used in this framework. Experimental outcomes show that the classification accuracy achieved by the acute leukemia ensemble classifiers constructed according to the suggested framework is good compared with the classification accuracy achieved in other studies.
Funding: This work was supported in part by the Natural Science Foundation of Shanghai (21ZR1403600), the National Natural Science Foundation of China (62176059), the Shanghai Municipal Science and Technology Major Project (2018SHZDZX01), Zhang Jiang Laboratory, the Shanghai Sailing Program (21YF1402800), the Shanghai Municipal of Science and Technology Project (20JC1419500), and the Shanghai Center for Brain Science and Brain-inspired Technology.
Abstract: The performance of medical image classification has been enhanced by deep convolutional neural networks (CNNs), which are typically trained with cross-entropy (CE) loss. However, when the labels have an intrinsic ordinal property, e.g., the development from benign to malignant tumor, CE loss cannot take such ordinal information into account to allow for better generalization. To improve model generalization with ordinal information, we propose a novel meta ordinal regression forest (MORF) method for medical image classification with ordinal labels, which learns the ordinal relationship through the combination of a convolutional neural network and a differential forest in a meta-learning framework. The merits of the proposed MORF come from two components: a tree-wise weighting net (TWW-Net) and a grouped feature selection (GFS) module. First, the TWW-Net assigns each tree in the forest a specific weight that is mapped from the classification loss of the corresponding tree. Hence, the trees possess varying weights, which helps alleviate the tree-wise prediction variance. Second, the GFS module enables a dynamic forest rather than the fixed one used previously, allowing for random feature perturbation. During training, we alternately optimize the parameters of the CNN backbone and the TWW-Net in the meta-learning framework by calculating the Hessian matrix. Experimental results on two medical image classification datasets with ordinal labels, i.e., the LIDC-IDRI and Breast Ultrasound datasets, demonstrate the superior performance of our MORF method over existing state-of-the-art methods.
Abstract: Decision forest is a well-known machine learning technique for addressing detection and prediction problems related to clinical data. However, traditional decision forest (DF) algorithms have lower classification accuracy and cannot handle high-dimensional feature spaces effectively. In this work, we propose a bootstrap decision forest using penalizing attributes (BFPA) algorithm to predict heart disease with higher accuracy. This work integrates a significance-based attribute selection (SAS) algorithm with the BFPA classifier to improve the performance of the diagnostic system in identifying cardiac illness. The proposed SAS algorithm is used to determine the correlation among attributes and to select the optimum subset of the feature space for the learning and testing processes. BFPA selects the optimal number of learning and testing data points as well as the density of trees in the forest to achieve higher prediction accuracy when classifying imbalanced datasets. The effectiveness of the developed classifier is carefully verified on a real-world database (i.e., the Heart Disease dataset from the UCI repository) by comparing its performance with many advanced approaches with respect to accuracy, sensitivity, specificity, precision, and intersection over union (IoU). The empirical results demonstrate that the proposed classification approach outperforms other approaches, with accuracy, precision, sensitivity, specificity, and IoU of 94.7%, 99.2%, 90.1%, 91.1%, and 90.4%, respectively. Additionally, we carry out Wilcoxon’s rank-sum test to determine whether the proposed classifier with the feature selection method yields a significant improvement over other classifiers. From the experimental results, we conclude that the integration of SAS and BFPA outperforms other classifiers recently reported in the literature.
Funding: This work was partially supported by the National Natural Science Foundation of China (61876089).
Abstract: With the rapid development and popularization of new-generation technologies such as cloud computing, big data, and artificial intelligence, the construction of smart grids has become more diversified. Accurate and quick reading and classification of the electricity consumption of residential users can provide a deeper understanding of residents' actual power consumption, which is essential for ensuring the normal operation of the power system and for energy management and planning. Based on the distributed architecture of cloud computing, this paper designs an improved random forest method for residential electricity classification. It uses the out-of-bag error of the random forest, combined with the Drosophila algorithm, to optimize the internal parameters of the random forest, thereby improving the performance of the random forest algorithm. The method uses MapReduce to train the improved random forest model on the cloud computing platform, then uses the trained model to analyze the residential electricity consumption dataset, divides all residents into 5 categories, and verifies the effectiveness and feasibility of the model through experiments.
Funding: Supported by projects of the National Natural Science Foundation of China (Nos. 42171407, 42077242), the Natural Science Foundation of Jilin Province (No. 20210101098JC), the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, MNR (No. KF-2020-05-024), and the National Key R&D Program of China (No. 2021YFD1500100).
Abstract: Highly accurate information on the distribution of vegetation types is of great significance for forestry resource monitoring and management. To improve the classification accuracy of forest types, Sentinel-1 and Sentinel-2 data of the Changbai Mountain protection development zone were selected and combined with a DEM to construct a multi-feature random forest model for forest type classification, fusing intensity, texture, spectral, vegetation index, and topography information and using the random forest Gini index (GI) for feature optimization. The overall classification accuracy was 94.60% and the Kappa coefficient was 0.933. A comparison of the classification results before and after feature optimization shows that feature optimization has a considerable impact on classification accuracy. A comparison of the classification results of random forest, the maximum likelihood method, and the CART decision tree under the same conditions shows that random forest performs better and can be applied to forestry research work such as forest resource survey and monitoring.
Abstract: In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, that employed four classification algorithms, K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), and Neural Network (NN), as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths, and shortcomings of each of the algorithms were examined, and finally, a conclusion was reached on which one has higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slower as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability in dealing with unbalanced data. Larger input datasets will lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, the high level of complexity in computational processing, the numerous types of NN architectures to choose from, and the high number of algorithms used for training, most researchers recommend SVM and RF as easier and widely used methods that repeatedly achieve results with high accuracies and are often faster to implement.
Abstract: Many studies have compared object-based classification (OBC) and pixel-based classification (PBC), particularly for classifying high-resolution satellite images. VNREDSat-1 is the first optical remote sensing satellite of Vietnam, with a resolution of 2.5 m (panchromatic) and 10 m (multispectral). The objective of this research is to compare the two classification approaches using a VNREDSat-1 image for mapping mangrove forest in Vien An Dong commune, Ngoc Hien district, Ca Mau province. The ISODATA algorithm (in the PBC method) and a membership function classifier (in the OBC method) were chosen to classify the same image. The results show that the overall accuracies of OBC and PBC are 73% and 62.16%, respectively, and that OBC also resolved the “salt and pepper” effect, which is the main issue of PBC. Therefore, OBC is considered the better approach for classifying VNREDSat-1 imagery for mapping mangrove forest in Ngoc Hien commune.
Abstract: Land for protective forest on the coast has special site conditions, and site classification is the scientific basis for seaboard afforestation. The site classification system for the coastal zone and islands of China may be divided into five levels: site region (sub-region), district, class, group, and type. The land division for afforestation follows the principle of environmental heterogeneity among regions, sub-regions, and districts at a large scale, according to differences in air temperature, moisture, and type of coastal geomorphology. The land may be classified into 7 regions, 12 sub-regions, and 55 districts. The medium- and small-scale divisions into site class, group, and type, subdivided within a site district, are based on medium topography, topographic climate, micro-relief, and soil conditions.
Abstract: This article analyzes the performance and use of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, known for their strong classification capabilities, are proficient at recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The article examines the use of SVMs, covering crucial elements such as data preprocessing, feature extraction, and model training, and evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is investigated and demonstrated through a case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets is also discussed. These studies result in an overview of the difficulties faced and the areas requiring further improvement and focus.
Abstract: The aim of this work was to differentiate Atlantic Forest patches, as well as their spatial distribution, from other tree covers that compose the landscape, by comparing three methods of digital image classification using geoprocessing and remote sensing techniques. The study area was a sub-basin of the Iperó River, tributary of the Iperó-Mirim stream, Sarapuí River basin, in Araçoiaba da Serra, State of São Paulo, Brazil. This research was developed in a Geographic Information System environment, using medium-resolution images from the Sentinel-2 satellite. Three image classification algorithms, Maximum Likelihood Classification (MLC), Support Vector Machines (SVM), and Random Tree (RT), were applied to verify the separability of forest patches, forestry, and other uses. The results were analyzed by means of a confusion matrix, accuracy, and the kappa index, showing that the three algorithms were able to successfully differentiate the targets, with the highest efficiency attributed to MLC and the lowest to RT. Overall, the three classifiers presented errors, but specifically for the forest patches, the highest accuracy was obtained from SVM.
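Overall accuracy and the kappa index, which this and several other abstracts in the list report, are both computed from the confusion matrix. The following is a minimal sketch using Cohen's kappa, under the assumption that rows hold reference classes and columns hold predicted classes; the function name and the example matrix are hypothetical and do not come from any of the studies.

```python
def overall_accuracy_and_kappa(cm):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    cm[i][j] is the number of samples of reference class i assigned to
    predicted class j (row/column convention is an assumption here).
    """
    n = sum(sum(row) for row in cm)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 3-class matrix (e.g. forest patches, forestry, other uses).
example = [
    [50, 5, 5],
    [4, 40, 6],
    [6, 4, 30],
]
oa, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {oa:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance given the class totals, which is why the two are usually reported together.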
Funding: Under the auspices of the National Natural Science Foundation of China (Nos. 4171101213, 41561144013, 41991232), the National Key R&D Program of China (Nos. 2016YFC0503401, 2016YFA0600304), and the International Partnership Program of the Chinese Academy of Sciences (No. 121311KYSB20170004).
Abstract: This study designed an approach to derive land cover in South Africa where ground samples are insufficient, and demonstrated it in the Nzhelele and Levhuvu catchments, South Africa. The method was developed by integrating Landsat 8, Sentinel-1, and the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) on the Google Earth Engine (GEE) platform. A random forest classifier with 300 trees was employed as the land-cover classification model. To overcome the lack of ground data, stratified sampling was used to generate the training and validation samples from an existing land-cover product. Likewise, to distinguish different land-cover categories, percentile and monthly median composites were employed to expand the input metrics of the random forest classifier. Results showed that the overall accuracy of the land cover of the Nzhelele and Levhuvu catchments in 2017–2018 reached 76.43%. Three important results can be drawn from our research. 1) Including Sentinel-1 data can slightly improve the overall accuracy of land cover, while its contribution to land-cover classification varies with land type. 2) An under-fitting problem was observed when training non-dominant land-cover categories using random sampling; stratified sampling is recommended to ensure the classification accuracy of non-dominant classes. 3) When the related reflectance bands participate in the training process, the individual Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Soil Adjusted Vegetation Index (SAVI), and Normalized Difference Built-up Index (NDBI) have little effect on the final land-cover classification result.
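The third finding is easier to interpret once the four indices are written out: each is a fixed algebraic combination of reflectance bands that the classifier already receives. The following is a minimal sketch of the commonly used definitions. The coefficient values (soil adjustment factor L = 0.5 for SAVI; 2.5, 6, 7.5, and 1 for EVI) are widely used defaults and are an assumption here, since the abstract does not state which variants were applied; the function names and example reflectances are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor is the adjustment term L."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ndbi(swir, nir):
    """Normalized Difference Built-up Index."""
    return (swir - nir) / (swir + nir)


# Hypothetical surface reflectances (0-1 scale) for a vegetated pixel.
pixel = {"blue": 0.05, "red": 0.10, "nir": 0.50, "swir": 0.30}
print("NDVI:", round(ndvi(pixel["nir"], pixel["red"]), 4))
print("EVI: ", round(evi(pixel["nir"], pixel["red"], pixel["blue"]), 4))
print("SAVI:", round(savi(pixel["nir"], pixel["red"]), 4))
print("NDBI:", round(ndbi(pixel["swir"], pixel["nir"]), 4))
```

Because a tree-based classifier can approximate such band combinations through successive splits on the raw bands, adding the indices as extra inputs may contribute little new information, which is consistent with the finding reported in the abstract.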
Funding: Supported by the Joint Fund of the Natural Science Foundation of Zhejiang-Qingshanhu Science and Technology City (Grant No. LQY18C160002), the National Natural Science Foundation of China (Grant No. U1809208), the Zhejiang Science and Technology Key R&D Program Funded Project (Grant No. 2018C02013), and the Natural Science Foundation of Zhejiang Province (Grant No. LQ20F020005).
Abstract: The diversity of tree species and the complexity of land use in cities create challenging issues for tree species classification. The combination of deep learning methods and RGB optical images obtained by unmanned aerial vehicles (UAVs) provides a new research direction for urban tree species classification. We propose an RGB optical image dataset with 10 urban tree species, termed TCC10, as a benchmark for tree canopy classification (TCC). The TCC10 dataset contains two types of data: tree canopy images with simple backgrounds and those with complex backgrounds. The objective was to examine the possibility of using deep learning methods (AlexNet, VGG-16, and ResNet-50) for individual tree species classification. The results of the convolutional neural networks (CNNs) were compared with those of K-nearest neighbor (KNN) and a BP neural network. Our results demonstrated the following: (1) ResNet-50 achieved an overall accuracy (OA) of 92.6% and a kappa coefficient of 0.91 for tree species classification on TCC10 and outperformed AlexNet and VGG-16. (2) The classification accuracy of KNN and the BP neural network was less than 70%, while the accuracy of the CNNs was higher. (3) The classification accuracy for tree canopy images with complex backgrounds was lower than that for images with simple backgrounds. For the deciduous tree species in TCC10, the classification accuracy of ResNet-50 was higher in summer than in autumn. Therefore, deep learning is effective for urban tree species classification using RGB optical images.