Abstract: To explore the potential of conventional image processing techniques for classifying cervical cancer cells, this work employed a co-occurrence histogram method for image feature extraction and developed an ensemble classifier that combines three base classifiers, namely an artificial neural network (ANN), a random forest (RF), and a support vector machine (SVM). A dataset of segmented pap-smear cell images was constructed with the k-means clustering technique and used to evaluate the ensemble classifier. The results were compared with those achieved by the individual base classifiers and with those obtained when training on color, texture, and shape features. The maximum average classification accuracy of 93.44% was obtained when the ensemble classifier was trained with co-occurrence histogram features, which indicates that this combination is well suited to the classification of cervical cancer cells.
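As an illustration of the feature-extraction step, the following is a minimal sketch of a gray-level co-occurrence histogram in plain Python. The abstract does not give the offsets, quantization, or normalization used, so the function name `cooccurrence_histogram`, the horizontal-neighbor offset, and the toy image are all assumptions made for this example.

```python
from collections import Counter

def cooccurrence_histogram(image, levels, offset=(0, 1)):
    """Normalized histogram of gray-level pairs separated by `offset` (rows, cols).

    `image` is a list of rows of integer gray levels in range(levels).
    The offset and normalization are illustrative choices, not the paper's.
    """
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on tiny images
    # Flatten into a fixed-length feature vector of size levels * levels.
    return [counts[(a, b)] / total for a in range(levels) for b in range(levels)]

# Hypothetical 3x3 image with three gray levels.
toy_image = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
features = cooccurrence_histogram(toy_image, levels=3)
```

A vector like `features` could then be passed to each base classifier; how the ANN, RF, and SVM outputs are combined is not stated in the abstract.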
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups (Project under Grant Number RGP.2/49/43).
Abstract: Textual data streams have been extensively used in practical applications where consumers express their views on online products. Because of changes in data distribution, commonly referred to as concept drift, mining such a data stream is a challenging problem for researchers. Most existing drift detection techniques are based on classification errors, which have higher probabilities of false-positive or missed detections. To improve classification accuracy, there is a need for more intuitive detection techniques that can identify a large number of drifts in the data streams. This paper presents an adaptive unsupervised learning technique, an ensemble classifier based on drift detection for opinion mining and sentiment classification. To improve classification performance, the approach uses four different dissimilarity measures to determine the degree of concept drift in the data stream. Whenever a drift is detected, the proposed method builds a new classifier and adds it to the ensemble. Before the new classifier is added, the total number of classifiers in the ensemble is checked; if the limit is exceeded, the classifier with the least weight is removed. To this end, a weighting mechanism calculates the weight of each classifier, which determines its contribution to the final classification result. Several experiments were conducted on real-world datasets, and the results were evaluated on the false positive rate, miss detection rate, and accuracy measures. The proposed method is also compared with state-of-the-art methods, including DDM, EDDM, and Page-Hinkley with support vector machine (SVM) and Naive Bayes classifiers, which are frequently used in concept drift detection studies. In all cases, the results show the efficiency of the proposed method.
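The ensemble maintenance described above can be sketched as follows. The abstract does not name its four dissimilarity measures or its weighting formula, so this example substitutes total variation distance between word-frequency distributions as a stand-in measure, and the weights, threshold, and toy classifiers are hypothetical.

```python
from collections import Counter

def total_variation(window_a, window_b):
    """Total variation distance between the word distributions of two windows.

    Used here only as an assumed stand-in for the paper's dissimilarity measures.
    """
    freq_a, freq_b = Counter(window_a), Counter(window_b)
    n_a, n_b = len(window_a), len(window_b)
    vocab = set(freq_a) | set(freq_b)
    return 0.5 * sum(abs(freq_a[w] / n_a - freq_b[w] / n_b) for w in vocab)

class WeightedEnsemble:
    def __init__(self, max_size):
        self.max_size = max_size
        self.members = []  # list of (classifier, weight) pairs

    def add(self, classifier, weight):
        # Check the size limit first; evict the least-weighted member if full.
        if len(self.members) >= self.max_size:
            lightest = min(range(len(self.members)),
                           key=lambda i: self.members[i][1])
            self.members.pop(lightest)
        self.members.append((classifier, weight))

    def predict(self, text):
        # Each member votes with its weight; the heaviest label wins.
        scores = {}
        for classifier, weight in self.members:
            label = classifier(text)
            scores[label] = scores.get(label, 0.0) + weight
        return max(scores, key=scores.get)

# Hypothetical reference and current windows of tokens.
reference = ["good", "great", "good", "fine"]
current = ["bad", "poor", "bad", "fine"]
drift_score = total_variation(reference, current)
DRIFT_THRESHOLD = 0.5  # assumed value

ensemble = WeightedEnsemble(max_size=2)
ensemble.add(lambda text: "pos", 0.6)
ensemble.add(lambda text: "neg", 0.4)
if drift_score > DRIFT_THRESHOLD:
    # A drift was flagged: add a classifier built on recent data.
    ensemble.add(lambda text: "neg" if "bad" in text else "pos", 0.7)

prediction = ensemble.predict("bad service")
```

In this toy run the drift score exceeds the threshold, so the member with weight 0.4 is evicted before the new classifier joins the ensemble.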
Funding: Supported by the National Natural Science Foundation of China (61202082) and the Fundamental Research Funds for the Central Universities (BUPT2012RC0218, BUPT2012RC0219).
Abstract: The main challenges of data stream classification include infinite length, concept drift, the arrival of novel classes, and the lack of labeled instances. Most existing techniques address only some of these challenges and ignore the others. This paper therefore presents an ensemble classification model based on decision feedback (ECM-BDF) to address all of them. First, a data stream is divided into sequential chunks, and a classification model is trained from each labeled data chunk. To address the infinite length and concept drift problems, a fixed number of such models constitute an ensemble model E, and subsequent labeled chunks are used to update E. To deal with novel classes and limited labeled instances, the model incorporates a novel class detection mechanism that detects the arrival of a novel class without training E on labeled instances of that class. Meanwhile, unsupervised models are trained from unlabeled instances to provide useful constraints for E. With these constraints as feedback information, an extended ensemble model Ex is obtained, and unlabeled instances can then be classified more accurately by satisfying the maximum consensus of Ex. Experimental results demonstrate that the proposed ECM-BDF outperforms traditional techniques in classifying data streams with limited labeled data.
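The chunk-based part of this scheme (one model per labeled chunk, a fixed-size ensemble E updated by later chunks) might look like the sketch below. The abstract does not say how E is updated, so the replacement rule here (keep the models that score best on the newest chunk) is an assumption, as are the nearest-centroid base learner and the toy stream. Novel class detection and the feedback constraints of Ex are not modeled.

```python
def split_into_chunks(stream, size):
    """Yield consecutive fixed-size chunks of a finite stream prefix."""
    for start in range(0, len(stream), size):
        yield stream[start:start + size]

def train_centroids(chunk):
    """Per-class mean of a 1-D feature; a stand-in for any base learner."""
    sums, counts = {}, {}
    for x, y in chunk:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    # Assign the class whose centroid is closest to x.
    return min(model, key=lambda y: abs(model[y] - x))

def accuracy(model, chunk):
    return sum(predict(model, x) == y for x, y in chunk) / len(chunk)

def update_ensemble(ensemble, chunk, max_models):
    """Assumed rule: add a model for the new chunk, keep the best on that chunk."""
    candidates = ensemble + [train_centroids(chunk)]
    candidates.sort(key=lambda m: accuracy(m, chunk), reverse=True)
    return candidates[:max_models]

def ensemble_predict(ensemble, x):
    votes = [predict(m, x) for m in ensemble]
    return max(set(votes), key=votes.count)

# Toy stream: the two classes swap positions halfway (a concept drift).
stream = [
    (0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b"),
    (0.5, "a"), (1.5, "a"), (9.5, "b"), (10.5, "b"),
    (10.0, "a"), (9.0, "a"), (0.0, "b"), (1.0, "b"),
    (10.5, "a"), (9.5, "a"), (0.5, "b"), (1.5, "b"),
]
ensemble = []
for chunk in split_into_chunks(stream, size=4):
    ensemble = update_ensemble(ensemble, chunk, max_models=2)

label = ensemble_predict(ensemble, 10.2)
```

After the drift, the two models trained on the old concept score zero on the new chunks and drop out, so the ensemble follows the current concept.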
Funding: Supported by the National Natural Science Foundation of China (Nos. 61174103 and 61603032), the National Key Technologies R&D Program of China (No. 2015BAK38B01), the National Key Research and Development Program of China (No. 2017YFB0702300), the China Postdoctoral Science Foundation (No. 2016M590048), and the University of Science and Technology Beijing–Taipei University of Technology Joint Research Program (TW201705).
Abstract: The Extreme Learning Machine (ELM) is an effective learning algorithm for a Single-Layer Feedforward Network (SLFN). It performs well on some problems because of its fast learning speed. In practical applications, however, its performance might be affected by noise in the training data. To tackle the noise issue, this article proposes a novel heterogeneous ensemble of ELMs. Specifically, correntropy is used to make performance insensitive to outliers, while Negative Correlation Learning (NCL) is implemented to enhance diversity within the ensemble. The proposed Heterogeneous Ensemble of ELMs (HE2LM) for classification contains different ELM algorithms, including the Regularized ELM (RELM), the Kernel ELM (KELM), and the L2-norm-optimized ELM (ELML2). The ensemble is constructed by training a randomly selected ELM classifier on a subset of the training data selected through random resampling. The class label of unseen data is then predicted using a maximum weighted sum approach. After the training data is split into subsets, the proposed HE2LM is tested on classification and regression tasks using real-world benchmark datasets and synthetic datasets. The simulation results show that, compared with other algorithms, the proposed method achieves higher prediction accuracy, better generalization, and less sensitivity to outliers.
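To see why correntropy helps with outliers, the sketch below compares a correntropy-induced loss with mean squared error on a set of residuals. It uses the standard Gaussian-kernel sample estimate of correntropy; the kernel width `sigma`, the residual values, and how the paper embeds this criterion in ELM training are assumptions not given in the abstract.

```python
import math

def correntropy(errors, sigma=1.0):
    """Sample correntropy of residuals under a Gaussian kernel of width sigma."""
    return sum(math.exp(-e * e / (2 * sigma ** 2)) for e in errors) / len(errors)

def correntropy_loss(errors, sigma=1.0):
    # Bounded in [0, 1]: a huge residual contributes at most 1 / len(errors).
    return 1.0 - correntropy(errors, sigma)

def mean_squared_error(errors):
    return sum(e * e for e in errors) / len(errors)

# Hypothetical residuals: a clean set, then the same set plus one gross outlier.
clean = [0.1, -0.2, 0.05, 0.0]
noisy = clean + [50.0]

mse_clean, mse_noisy = mean_squared_error(clean), mean_squared_error(noisy)
loss_clean, loss_noisy = correntropy_loss(clean), correntropy_loss(noisy)
```

The single outlier inflates the mean squared error by several orders of magnitude, while the correntropy-induced loss stays below 1 because the Gaussian kernel saturates for large residuals.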
Funding: Supported by grants from the National Key R&D Program of China (Grant No. 2018YFB1702504), the National Natural Science Foundation of China (Grant Nos. 52179121, 51879284), the State Key Laboratory of Simulations and Regulation of Water Cycle in River Basin, China (Grant No. SKL2022ZD05), the IWHR Research & Development Support Program, China (Grant No. GE0145B012021), the Natural Science Foundation of Shaanxi Province, China (Grant No. 2021JLM-50), and the National Key R&D Program of China (Grant No. 2022YFE0200400).
Abstract: This review discusses the application scenarios of machine learning-supported performance prediction and efficiency optimization of tunnel boring machines (TBMs). The rock mass quality ratings, which are based on the Chinese code for geological survey, were used to provide "labels" suitable for supervised learning. The resulting machine predictions of rock mass grades agreed reasonably well with the ground truth documented in geological maps. The main operational parameters, i.e., thrust and torque, can also be reasonably predicted from historical data. Consequently, 18 collapse sections of the Yinsong project have been successfully predicted by several researchers. Preliminary studies on the selection of the optimal penetration rate and cost were conducted. This review also summarizes the main achievements made in response to the initiatives of the Lotus Pool Contest in China, in which, for the first time, large and well-documented TBM performance data were shared for joint scientific research. Moreover, the review discusses the technical problems that require further study and the prospects for the future development of intelligent TBM construction based on big data and machine learning.
Funding: Supported by the National Key Research and Development Program of China (2016YFC1306300), the National Natural Science Foundation of China (Grant Nos. 61633018, 81622025, and 81471731), and the Beijing Municipal Commission of Health and Family Planning (PXM2019_026283_000002).
Abstract: Previous studies have shown that amnestic mild cognitive impairment (aMCI) involves morphological abnormalities in multiple regions, as captured by measures including cortical thickness, sulcus depth, surface area, gray matter volume, Jacobian metric, and average curvature. Each measure has unique neuropathological and genetic meanings. However, most existing methods simply average or concatenate these measures when constructing classifiers, which may include redundant information and ignore the relationships among them. In this study, each measure is treated as a task in a multitask learning framework. Because the correlation between tasks is not known in advance, a robust multitask feature learning (rMTFL) method is used to select a group of features among correlated measures and, at the same time, to provide additional information by identifying outlier tasks. Several SVM classifiers are then trained, and for each measure the selected features are input into the corresponding SVM classifier. Finally, an ensemble classification strategy combines the results of these classifiers, based on their accuracy, to make the final prediction. The proposed method is evaluated with leave-one-out cross-validation on 46 aMCI patients and 52 normal controls (NC). The results show that the rMTFL algorithm is superior to the group lasso method and that, among the multidimensional surface measures, average curvature is the outlier task.
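The evaluation protocol mentioned above, leave-one-out cross-validation, can be sketched generically as follows. The nearest-centroid learner stands in for the per-measure SVM classifiers, and the six one-dimensional samples are hypothetical values, not data from the study.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)

def fit_centroids(xs, ys):
    """Per-class mean of a 1-D feature; a stand-in for an SVM on one measure."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(vals) / len(vals) for y, vals in groups.items()}

def predict_centroid(model, x):
    return min(model, key=lambda y: abs(model[y] - x))

# Hypothetical values of a single morphological measure for six subjects.
samples = [1.0, 1.2, 0.9, 3.0, 3.1, 2.8]
labels = ["NC", "NC", "NC", "aMCI", "aMCI", "aMCI"]

loo_accuracy = leave_one_out_accuracy(samples, labels,
                                      fit_centroids, predict_centroid)
```

An accuracy estimated this way for each measure is one plausible source for the per-classifier weights in an accuracy-based ensemble vote, although the abstract does not specify how the weights are derived.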