The Repository Mahasiswa (RAMA) is a national repository of research reports in the form of final assignments, student projects, theses, dissertations, and research reports by lecturers or researchers that have not yet been published in journals, conferences, or integrated books, drawn from the scientific repositories of universities and research institutes in Indonesia. The increasing popularity of the RAMA Repository raises security issues, including the two most widespread attacks, namely Structured Query Language (SQL) injection and cross-site scripting (XSS). An attacker gaining access to data and performing unauthorized data modifications is extremely dangerous. This paper aims to provide an attack detection system for securing the repository portal from these attacks. The proposed system uses a combined Long Short-Term Memory and Principal Component Analysis (LSTM-PCA) model as a classifier. This model can effectively solve the vanishing gradient problem caused by excessive positive samples. The experimental results show that the proposed system achieves an accuracy of 96.85% using an 80%:20% ratio of training data to testing data. The rationale for this result is that the LSTM's Forget Gate works very well because the PCA supplies only selected features that are significantly relevant to the attack patterns. The Forget Gate in LSTM is responsible for deciding which information should be kept for computing the cell state and which is not relevant and can be discarded. In addition, the LSTM's Input Gate helps identify crucial information and stores specific relevant data in the memory.
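The roles of the Forget Gate and Input Gate described in this abstract can be written out with the textbook LSTM update. The abstract does not give equations, so the notation below (weights $W$, biases $b$, input $x_t$, hidden state $h_{t-1}$, cell state $c_t$) is the conventional one and only an assumption about the authors' formulation; here $x_t$ would be the PCA-reduced feature vector.

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1},\, x_t] + b_f\right) && \text{(forget gate: what to keep from } c_{t-1}\text{)}\\
i_t &= \sigma\!\left(W_i\,[h_{t-1},\, x_t] + b_i\right) && \text{(input gate: what new information to store)}\\
\tilde{c}_t &= \tanh\!\left(W_c\,[h_{t-1},\, x_t] + b_c\right) && \text{(candidate cell content)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(updated cell state)}
\end{aligned}
```

Entries of $f_t$ close to 0 discard the corresponding components of the previous cell state, while entries close to 1 retain them.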
Multiple classifier systems exhibit strong classification capacity compared with single classifiers, but they require significant computational resources. Selective ensemble systems aim to attain equivalent or better classification accuracy with fewer classifiers. However, current methods fail to identify precise solutions for constructing an ensemble classifier. In this study, we propose an ensemble classifier design technique based on the perturbation binary salp swarm algorithm (ECDPB). Because extreme learning machines (ELMs) have rapid learning rates and good generalization ability, they can serve as the base classifiers for creating multiple candidates while using fewer computational resources. Meanwhile, we introduce a combined diversity measure that takes the complementarity and accuracy of ELMs into account; it is used to identify the ELMs that have good diversity and low error. In addition, we propose an ECDPB with powerful optimizing ability; it is employed to find the optimal subset of ELMs. The selected ELMs can then be used to form an ensemble classifier. Experiments on 10 benchmark datasets have been conducted, and the results demonstrate that the proposed ECDPB delivers superior classification capacity compared with alternative methods.
Recently, the ensemble classifier has been shown to be an effective way to enhance prediction performance. However, it is usually difficult to construct an appropriate classifier from a set of complex data, for example, data with many dimensions or hierarchical attributes. This study proposes a method to construct an ensemble classifier based on the key attributes. In addition to the high precision it shares with common ensemble classifiers, its calculation results are highly intelligible and thus easy to understand. Furthermore, experimental results based on real data collected from China Mobile show that the key-attributes-based ensemble classifier performs well in both classifier construction and customer churn prediction.
For many real-world multiobjective optimization problems, the evaluations of the objective functions are computationally expensive. Such problems are usually called expensive multiobjective optimization problems (EMOPs). One feasible approach for EMOPs is to introduce computationally efficient surrogates to reduce the number of function evaluations. Inspired by ensemble learning, this paper proposes a multiobjective evolutionary algorithm with an ensemble classifier (MOEA-EC) for EMOPs. More specifically, multiple decision tree models are used as an ensemble classifier for pre-selection, which is more helpful for further reducing the function evaluations of the solutions than using a single inaccurate model. Extensive experimental studies have been conducted to verify the efficiency of MOEA-EC by comparing it with several advanced multiobjective expensive optimization algorithms. The experimental results show that MOEA-EC outperforms the compared algorithms.
A robust smile recognition system could be widely used in many real-world applications. Classification of a facial smile in an unconstrained setting is difficult due to the invertible and wide variety in face images. In this paper, an adaptive model for smile expression classification is suggested that integrates a fast feature extraction algorithm and cascade classifiers. Our model takes advantage of the intrinsic association between face detection, smile, and other face features to alleviate the over-fitting issue on the limited training set and improve classification results. The features are extracted so as to exclude any unnecessary coefficients from the feature vector, thereby enhancing the discriminatory capacity of the extracted features and reducing the computational cost. Still, the main causes of error in learning are noise, bias, and variance. Ensembles help to minimize these factors. Combinations of multiple classifiers decrease variance, especially in the case of unstable classifiers, and may produce a more reliable classification than a single classifier. However, a shortcoming of bagging as the best ensemble classifier is its random selection, where the classification performance relies on the chance of picking an appropriate subset of training items. To deal with this challenge, the suggested model employs a modified form of bagging when creating training sets (error-based bootstrapping). The experimental results for smile classification on the JAFFE, CK+, and CK+48 benchmark datasets show the feasibility of our proposed model.
The departure of a good employee incurs direct and indirect costs and impacts for an organization. The direct costs arise from hiring and training the replacement employee. The replacement time and lost productivity affect the running of business processes. This work presents the use of ensemble classifiers to identify important attributes that affect attrition significantly. The data consist of attributes related to job function, education level, satisfaction with work and working relationships, compensation, and frequency of business travel. Both bagging and boosting classifiers were used for testing. The results show that the selected features (nine selected features) achieve the same result as the full feature set. The selected features are age, income, working years, source of employment, years since last promotion, salary hike, and business travelling frequency. These features were selected using ensemble classifiers. According to the ensemble classifiers' results, satisfaction with work and relationships do not appear to be significant attributes in attrition.
The common spatial pattern (CSP) algorithm is a successful tool for feature estimation in brain-computer interfaces (BCI). However, CSP is sensitive to outliers and may give poor results because it is based on pooling the covariance matrices of trials. In this paper, we propose a simple yet effective approach, named the common spatial pattern ensemble (CSPE) classifier, to improve CSP performance. Through division of the recording channels, multiple CSP filters are constructed. By projection, log-operation, and subtraction on the original signal, an ensemble classifier based on majority voting is achieved and outlier contamination is alleviated. Experimental results demonstrate that the proposed CSPE classifier is robust to various artifacts and can achieve an average accuracy of 83.02%.
In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High-Dimensional Imbalanced Classification (HDIC) problems based on data processed using the Multi-Dimensional Scaling (MDS) feature extraction technique. First, the dimensionality of the original imbalanced data is reduced using MDS so that distances between any two different samples are preserved as well as possible. Second, a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in a low-dimensional data space. Third, optimization of the observation point mappings is carried out to obtain a reliable assessment of the unknown samples. Exhaustive experiments have been conducted to evaluate the feasibility, rationality, and effectiveness of the proposed OPCE algorithm using seven benchmark HDIC data sets. Experimental results show that (1) the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data; (2) the OPCE algorithm can correctly identify samples as the number of optimised observation points is increased; and (3) statistical analysis reveals that OPCE yields better HDIC performance on the selected data sets in comparison with eight other HDIC algorithms. This demonstrates that OPCE is a viable algorithm for HDIC problems.
In today's digital world, millions of individuals are linked to one another via the Internet and social media. This opens up new avenues for information exchange with others. Sentiment analysis (SA) has received a lot of attention during the last decade. In this study, we analyse the challenges of SA in Marathi, one of the Asian regional languages, by providing a benchmark setup in which we first produced an annotated dataset composed of Marathi text acquired from microblogging websites such as Twitter. We also chose domain experts to manually annotate Marathi microblogging posts with positive, negative, and neutral polarity. In addition, to show the efficient use of the annotated dataset, an ensemble-based model for sentiment analysis was created. Compared with other machine learning classifiers, the ensemble classifier achieved better performance with 10-fold cross-validation (cv): an accuracy of 97.77% and an F-score of 97.89%.
Objective: The annual influenza epidemic is a heavy burden on the health care system and has increasingly become a major public health problem in some areas, such as Hong Kong (China). Therefore, based on a variety of machine learning methods and considering seasonal influenza in Hong Kong, the study aims to establish a Combinatorial Judgment Classifier (CJC) model to classify the epidemic trend and improve the accuracy of influenza epidemic early warning.
The use of machine learning to predict student employability is important for analysing a student's capability to get a job. Based on the results of this type of analysis, university managers can improve the employability of their students, which can help in attracting students in the future. In addition, learners can focus during their studies on the essential skills identified through this analysis, to increase their employability. An effective method called OPT-BAG (OPTimisation of BAGging classifiers) was therefore developed to model the problem of predicting the employability of students. This model can help predict the employability of students based on their competencies and can reveal weaknesses that need to be improved. First, we analyse the relationships between several variables and the outcome variable using a correlation heatmap for a student employability dataset. Next, a standard scaler function is applied in the preprocessing module to normalise the variables in the student employability dataset. The training set is then input to our model to identify the optimal parameters for the bagging classifier using a grid search cross-validation technique. Finally, the OPT-BAG model, based on a bagging classifier with the optimal parameters found in the previous step, is trained on the training dataset to predict student employability. The empirical outcomes in terms of accuracy, precision, recall, and F1 indicate that the OPT-BAG approach outperforms other cutting-edge machine learning models in predicting student employability. In this study, we also analyse the factors affecting the recruitment process of employers, and find that general appearance, mental alertness, and communication skills are the most important. This indicates that educational institutions should focus on these factors during the learning process to improve student employability.
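The OPT-BAG steps (standard scaling, then grid search with cross-validation over bagging parameters) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names no library, so the scaler is assumed to be the usual z-score transform, the base learner is assumed to be a 1-nearest-neighbour rule, and all function names (`standard_scale`, `fit_bagging`, `predict_bagging`, `grid_search_cv`) and the `n_estimators` parameter are hypothetical.

```python
import random
from collections import Counter
from itertools import product
from statistics import fmean, pstdev


def standard_scale(rows):
    """Z-score every column: (x - column mean) / column population std."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant column: avoid /0
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def fit_bagging(X, y, n_estimators, seed=0):
    """Draw one bootstrap sample (with replacement) per base learner."""
    rng = random.Random(seed)
    n = len(X)
    samples = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(([X[i] for i in idx], [y[i] for i in idx]))
    return samples


def predict_bagging(samples, X):
    """Each base learner is a 1-NN rule (an assumption); combine by majority vote."""
    def nearest_label(Xs, ys, x):
        dists = [sum((a - b) ** 2 for a, b in zip(p, x)) for p in Xs]
        return ys[dists.index(min(dists))]

    predictions = []
    for x in X:
        votes = Counter(nearest_label(Xs, ys, x) for Xs, ys in samples)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def grid_search_cv(X, y, param_grid, k=4):
    """Return the parameter combination with the best mean k-fold accuracy."""
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for fold in range(k):
            test_idx = set(range(fold, len(X), k))  # every k-th sample is held out
            X_tr = [x for i, x in enumerate(X) if i not in test_idx]
            y_tr = [t for i, t in enumerate(y) if i not in test_idx]
            X_te = [X[i] for i in sorted(test_idx)]
            y_te = [y[i] for i in sorted(test_idx)]
            model = fit_bagging(X_tr, y_tr, **params)
            preds = predict_bagging(model, X_te)
            hits = sum(p == t for p, t in zip(preds, y_te))
            fold_scores.append(hits / len(y_te))
        score = fmean(fold_scores)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score
```

A call such as `grid_search_cv(standard_scale(X), y, {"n_estimators": [1, 5, 9]})` returns the best parameter dictionary and its mean validation accuracy; a final model would then be refitted on the full training set with those parameters.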
Liver cancer is the second leading cause of cancer death worldwide. Early tumor detection may help identify suitable treatment and increase the survival rate. Medical imaging is a non-invasive tool that can help uncover abnormalities in human organs. Magnetic Resonance Imaging (MRI), in particular, uses magnetic fields and radio waves to differentiate the tissues of internal human organs. However, the interpretation of medical images requires the subjective expertise of a radiologist and oncologist. Thus, building an automated computer-based diagnosis system can help specialists reduce incorrect diagnoses. This paper proposes a hybrid automated system to compare the performance of 3D features and 2D features in classifying magnetic resonance liver tumor images. Two models are proposed; the first employs the 3D features, while the second exploits the 2D features. The first system uses 3D texture attributes, 3D shape features, and 3D graphical deep descriptors together with an ensemble classifier to differentiate between four 3D tumor categories. In addition, the proposed method is applied to 2D slices for comparison purposes. For 3D liver tumors, the proposed approach attained 100% accuracy in discriminating between all types of tumors, as well as 100% Area Under the Curve (AUC), 100% sensitivity, and 100% specificity and precision. On the other hand, the performance is lower in 2D classification: the maximum accuracy reached 96.4% for two classes and 92.1% for four classes. The top-class performance of the proposed system can be attributed to the exploitation of various types of feature selection methods, besides utilizing the ReliefF feature selection technique to choose the most relevant features associated with the different classes. The novelty of this work lies in building a highly accurate system under specific circumstances without any processing of the images or human input, besides comparing the performance of 2D and 3D classification. In the future, the presented work can be extended to huge datasets. It could then serve as a reliable, efficient Computer Aided Diagnosis (CAD) system employed in hospitals in rural areas.
Redundancy, correlation, feature irrelevance, and missing samples are just a few of the problems that make it difficult to analyze software defect data. Additionally, it can be challenging to maintain an even distribution of data relating to both defective and non-defective software. In the majority of experimental situations, data from the latter class predominate in the dataset. The objective of this review study is to demonstrate the effectiveness of combining ensemble learning and feature selection in improving the performance of defect classification. Besides the successful feature selection approach, a novel variant of the ensemble learning technique is analyzed to address the challenges of feature redundancy and data imbalance, providing robustness in the classification process. To overcome these problems and lessen their impact on fault classification performance, the authors carefully integrate effective feature selection with ensemble learning models. Forward selection demonstrates that a significant area under the receiver operating curve (ROC) can be attributed to only a small subset of features. The Greedy forward selection (GFS) technique outperformed Pearson's correlation method when feature selection techniques were evaluated on the datasets. Ensemble learners, such as random forests (RF) and the proposed average probability ensemble (APE), demonstrate greater resistance to the impact of weak features than weighted support vector machines (W-SVMs) and extreme learning machines (ELM). Furthermore, on the NASA and Java datasets, the enhanced average probability ensemble model, which combines the Greedy forward selection technique with the average probability ensemble model, achieved a remarkably high area under the ROC. It approached a value of 1.0, indicating exceptional performance. This review emphasizes the importance of meticulously selecting attributes in a software dataset to accurately classify damaged components. In addition, the suggested ensemble learning model successfully addressed the aforementioned problems with software data and produced outstanding classification performance.
The Hand Gestures Recognition (HGR) System can be employed to facilitate communication between humans and computers instead of using special input and output devices. These devices may complicate communication with computers, especially for people with disabilities. Hand gestures can be defined as a natural human-to-human communication method, which can also be used in human-computer interaction. Many researchers have developed techniques and methods that aim to understand and recognize specific hand gestures by employing one or two machine learning algorithms with reasonable accuracy. This work aims to develop a powerful hand gesture recognition model with a 100% recognition rate. We propose an ensemble classification model that combines the most powerful machine learning classifiers to obtain diversity and improve accuracy. The majority voting method was used to aggregate accuracies produced by each classifier and get the final classification result. Our model was trained using a self-constructed dataset containing 1600 images of ten different hand gestures. Employing Canny's edge detector together with the histogram of oriented gradients method proved a strong combination with the ensemble classifier for the recognition rate. The experimental results showed the robustness of our proposed model. Logistic Regression and Support Vector Machine achieved 100% accuracy. The developed model was validated using two public datasets, and the findings showed that our model outperformed the other compared studies.
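Majority voting over the outputs of several classifiers can be sketched as below. This is only an illustration of the general technique: the abstract does not specify the classifiers' output format or how ties are broken, so the list-of-lists input, the tie rule (the label that was voted first wins), and the name `majority_vote` are assumptions.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists (aligned by sample) into one label per sample.

    Ties go to the label voted first, i.e. by the earliest classifier in the list.
    """
    n_samples = len(predictions_per_classifier[0])
    final = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_classifier)
        final.append(votes.most_common(1)[0][0])
    return final


# Three hypothetical classifiers voting on three gesture images.
votes = [
    ["fist", "palm", "peace"],
    ["fist", "palm", "palm"],
    ["palm", "palm", "peace"],
]
print(majority_vote(votes))  # ['fist', 'palm', 'peace']
```

Using an odd number of classifiers avoids ties in two-class problems, but with ten gesture classes a tie rule is still needed.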
Accurately predicting and estimating squeezing and the ground response to tunneling remains challenging. Moreover, tunnel-squeezing hazards are much more likely to occur in deeply buried long tunnels with complex engineering-geological environments. Therefore, a high-performance predictive model for tunnel squeezing is necessary. A superior ensemble classifier is put forward in this study, which is composed of four individual classifiers (gradient boosting classifier, extra-trees classifier, AdaBoost classifier, and logistic regression classifier) and two optimization algorithms (Bayesian optimization (BO) and the sparrow search algorithm (SSA)). The training database covers five parameters: tunnel depth (H), rock tunneling quality index (Q), tunnel diameter (D), support stiffness (K), and strength stress ratio (SSR), for which basic information is accessible at the early design phases. However, the dataset compiled from the literature is incomplete. Thus, the ten proposed methods are used to replace the missing values. During the model training process, BO shows a strong ability to optimize seventeen hyperparameters. When applied to tune the classifiers' weights, SSA is fast and efficient. The novel Shapley Additive Explanations–LightGBM method indicates that K is the most important input feature, followed by SSR, Q, H, and D, respectively. The ensemble classifier is then validated using the test set and additional historical case projects. The validation shows that the model achieves an accuracy of 98% (i.e., an error rate of 2%) on the test set, higher than that achieved by previous prediction models. Moreover, the predicted probability can provide warning information for timely support measures. Finally, the application results are illustrated through tests on tunnel sections that have not yet been excavated on the line of the Sichuan–Tibet railway project. The predicted tendencies and laws are in line with practical experience. In summary, the proposed model's prediction results are reasonable, and its predictions will become more accurate as more data are collected and used in training for pre-warning of the tunnel squeezing hazard.
Membrane proteins are embedded in the lipid bilayer, which creates a suitable environment for their actions. Membrane proteins have different types, and it is important to decide which type a membrane protein belongs to because the type is closely related to its biological function and its interaction with other molecules in a biological system. In this study, on the basis of the concept of pseudo amino acid (PseAA) composition originally introduced by Chou, the approximate entropy (ApEn) value of the query membrane protein was used to integrate the complementary information. By fusing fifteen powerful individual fuzzy K-nearest neighbor (FKNN) classifiers, an ensemble classifier was constructed. Each basic classifier was trained on the PseAA composition of membrane protein sequences with different parameters. The experimental results demonstrate that it is efficient for the structural prediction of membrane proteins.
Leukemia is a blood cancer, including bone marrow and lymphatic tissues, typically involving white blood cells. Leukemia produces an abnormal amount of white blood cells compared to normal blood. Deoxyribonucleic acid (DNA) microarrays provide reliable medical diagnostic services to help more patients find the proposed treatment for infections. DNA microarrays, also known as biochips, consist of microscopic DNA spots attached to a solid glass surface. Currently, it is difficult to classify cancers using microarray data. Many data mining techniques have failed because of the small sample size, which has become more critical for organizations. However, they are not highly effective in improving results and are frequently employed by doctors for cancer diagnosis. This study proposes a novel method using machine learning algorithms based on microarrays of leukemia GSE9476 cells. The main aim was to predict the initial leukemia disease. The machine learning algorithms used were decision tree (DT), naive Bayes (NB), random forest (RF), gradient boosting machine (GBM), linear regression (LinR), support vector machine (SVM), and a novel approach based on the combination of Logistic Regression (LR), DT, and SVM, named the ensemble LDSVM model. The k-fold cross-validation and grid search optimization methods were used with the LDSVM model to classify leukemia in patients and comparatively analyze their impacts. The proposed approach achieved better accuracy, precision, recall, and F1 scores than the other algorithms. Furthermore, the results were comparatively assessed, which showed the LDSVM performance. This study aims to successfully predict leukemia in patients and enhance prediction accuracy in minimum time. Moreover, the Synthetic Minority Oversampling Technique (SMOTE) and Principal Component Analysis (PCA) approaches were implemented. This makes the records generalized and evaluates the outcomes well. PCA reduces the feature count without losing any information and deals with class-imbalanced datasets, as well as giving faster model execution with less computation cost. In this study, a novel process was used to reduce the column results to achieve faster experiment execution.
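The core idea of SMOTE mentioned in this abstract, creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function name `smote`, the neighbour count `k`, and the fixed seed are assumptions, and the sketch needs at least two minority samples.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating towards neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen point (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Each synthetic point lies on the line segment between two existing minority samples, so the new samples stay inside the region already occupied by the minority class.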
One recent area of interest in computer science is data stream management and processing. By "data stream", we refer to continuous and rapidly generated packages of data. Specific features of data streams are immense volume, high production rate, limited data processing time, and data concept drift; these features differentiate the data stream from standard types of data. An issue for the data stream is classification of input data. A novel ensemble classifier is proposed in this paper. The classifier uses base classifiers of two weighting functions under different data input conditions. In addition, a new method is used to determine drift, which emphasizes the precision of the algorithm. Another characteristic of the proposed method is removal of different numbers of the base classifiers based on their quality. Implementation of a weighting mechanism to the base classifiers at the decision-making stage is another advantage of the algorithm. This facilitates adaptability when drifts take place, which leads to classifiers with higher efficiency. Furthermore, the proposed method is tested on a set of standard data and the results confirm higher accuracy compared to available ensemble classifiers and single classifiers. In addition, in some cases the proposed classifier is faster and needs less storage space.
Recognition and counting of greenhouse pests are important for monitoring and forecasting pest population dynamics. This study used image processing techniques to recognize and count whiteflies and thrips on a sticky trap located in a greenhouse environment. The digital images of sticky traps were collected using an image-acquisition system under different greenhouse conditions. If a single color space is used, it is difficult to segment the small pests correctly because of the detrimental effects of non-uniform illumination in complex scenarios. Therefore, a method was proposed that first segments object pests in two color spaces, using the Prewitt operator in the I component of the hue-saturation-intensity (HSI) color space and the Canny operator in the B component of the Lab color space. Then, the segmented results for the two color spaces were summed, achieving 91.57% segmentation accuracy. Next, because different features of pests contribute differently to the classification of pest species, the study extracted multiple features (e.g., color and shape features) in different color spaces for each segmented pest region to improve the recognition performance. Twenty decision trees were used to form a strong ensemble learning classifier that used a majority voting mechanism and obtained 95.73% recognition accuracy. The proposed method is a feasible and effective way to process greenhouse pest images. The system accurately recognized and counted pests in sticky trap images captured under real greenhouse conditions.
Funding: Supported in part by the Anhui Provincial Natural Science Foundation [1908085QG298, 1908085MG232]; the National Nature Science Foundation of China [91546108, 61806068]; the National Social Science Foundation of China [21BTJ002]; the Anhui Provincial Science and Technology Major Projects Grant [201903a05020020]; the Fundamental Research Funds for the Central Universities [Z2019HGTA0053, JZ2019HG BZ0128]; the Humanities and Social Science Fund of Ministry of Education of China [20YJA790021]; the Major Project of Philosophy and Social Science Planning of Zhejiang Province [22YJRC07ZD]; and the Open Research Fund Program of Key Laboratory of Process Optimization and Intelligent Decision-Making (Hefei University of Technology), Ministry of Education.
Abstract: Multiple classifier systems exhibit strong classification capacity compared with single classifiers, but they require significant computational resources. A selective ensemble system aims to attain equivalent or better classification accuracy with fewer classifiers. However, current methods fail to identify precise solutions for constructing an ensemble classifier. In this study, we propose an ensemble classifier design technique based on the perturbation binary salp swarm algorithm (ECDPB). Because extreme learning machines (ELMs) have rapid learning rates and good generalization ability, they can serve as the basic classifier for creating multiple candidates while using fewer computational resources. Meanwhile, we introduce a combined diversity measure that takes the complementarity and accuracy of ELMs into account; it is used to identify the ELMs that have good diversity and low error. In addition, we propose an ECDPB with powerful optimizing ability; it is employed to find the optimal subset of ELMs. The selected ELMs can then be used to form an ensemble classifier. Experiments on 10 benchmark datasets have been conducted, and the results demonstrate that the proposed ECDPB delivers superior classification capacity when compared with alternative methods.
Funding: Supported by the National Natural Science Foundation of China under Grants No. 71271044 and 71572029.
Abstract: Recently, the ensemble classifier has been shown to be an effective way to enhance prediction performance. However, it usually suffers from the problem of how to construct an appropriate classifier from a set of complex data, for example, data with many dimensions or hierarchical attributes. This study proposes a method to construct an ensemble classifier based on the key attributes. In addition to the high precision shared by common ensemble classifiers, the calculation results are highly intelligible and thus easy to understand. Furthermore, the experimental results based on real data collected from China Mobile show that the key-attributes-based ensemble classifier performs well in both classifier construction and customer churn prediction.
Abstract: For many real-world multiobjective optimization problems, the evaluations of the objective functions are computationally expensive. Such problems are usually called expensive multiobjective optimization problems (EMOPs). One feasible approach to EMOPs is to introduce computationally efficient surrogates to reduce the number of function evaluations. Inspired by ensemble learning, this paper proposes a multiobjective evolutionary algorithm with an ensemble classifier (MOEA-EC) for EMOPs. More specifically, multiple decision tree models are used as an ensemble classifier for the pre-selection, which is more helpful for further reducing the function evaluations of the solutions than using a single inaccurate model. Extensive experimental studies have been conducted to verify the efficiency of MOEA-EC by comparing it with several advanced multiobjective expensive optimization algorithms. The experimental results show that MOEA-EC outperforms the compared algorithms.
Abstract: A robust smile recognition system could be widely used for many real-world applications. Classification of a facial smile in an unconstrained setting is difficult due to the invertible and wide variety in face images. In this paper, an adaptive model for smile expression classification is suggested that integrates a fast feature extraction algorithm and cascade classifiers. Our model takes advantage of the intrinsic association between face detection, smile, and other face features to alleviate the over-fitting issue on the limited training set and improve classification results. The features are extracted so as to exclude any unnecessary coefficients in the feature vector, thereby enhancing the discriminatory capacity of the extracted features and reducing the computational cost. Still, the main causes of error in learning are noise, bias, and variance. Ensembles help to minimize these factors. Combinations of multiple classifiers decrease variance, especially in the case of unstable classifiers, and may produce a more reliable classification than a single classifier. However, a shortcoming of bagging as the best ensemble classifier is its random selection, where the classification performance relies on the chance of picking an appropriate subset of training items. To deal with this challenge, the suggested model employs a modified form of bagging (error-based bootstrapping) while creating training sets. The experimental results for smile classification on the JAFFE, CK+, and CK+48 benchmark datasets show the feasibility of our proposed model.
Abstract: The departure of a good employee incurs direct and indirect costs and impacts for an organization. The direct cost arises from hiring and training the relevant employee. The replacement time and lost productivity affect the running of business processes. This work presents the use of ensemble classifiers to identify important attributes that affect attrition significantly. The data consist of attributes related to job function, education level, satisfaction with work and working relationships, compensation, and frequency of business travel. Both bagging and boosting classifiers were used for testing. The results show that the selected features (nine selected features) achieve the same result as the full feature set. The selected features are age, income, working years, source of employment, years since last promotion, salary hike, and business travelling frequency. These features were selected using ensemble classifiers. Satisfaction with work and relationships does not appear to be a significant attribute in attrition according to the ensemble classifiers' results.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 30525030, 60701015, and 60736029.
Abstract: The common spatial pattern (CSP) algorithm is a successful tool for feature estimation in brain-computer interfaces (BCI). However, CSP is sensitive to outliers and may produce poor outcomes since it is based on pooling the covariance matrices of trials. In this paper, we propose a simple yet effective approach, named the common spatial pattern ensemble (CSPE) classifier, to improve CSP performance. Through division of recording channels, multiple CSP filters are constructed. By projection, log-operation, and subtraction on the original signal, an ensemble classifier based on majority voting is obtained and outlier contaminations are alleviated. Experimental results demonstrate that the proposed CSPE classifier is robust to various artifacts and can achieve an average accuracy of 83.02%.
Funding: National Natural Science Foundation of China, Grant/Award Number: 61972261; Basic Research Foundations of Shenzhen, Grant/Award Numbers: JCYJ20210324093609026, JCYJ20200813091134001.
Abstract: In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High-Dimensional Imbalanced Classification (HDIC) problems based on data processed using the Multi-Dimensional Scaling (MDS) feature extraction technique. First, the dimensionality of the original imbalanced data is reduced using MDS so that distances between any two different samples are preserved as well as possible. Second, a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in a low-dimensional data space. Third, optimization of the observation point mappings is carried out to obtain a reliable assessment of the unknown samples. Exhaustive experiments have been conducted to evaluate the feasibility, rationality, and effectiveness of the proposed OPCE algorithm using seven benchmark HDIC data sets. Experimental results show that (1) the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data; (2) the OPCE algorithm can correctly identify samples as the number of optimised observation points is increased; and (3) statistical analysis reveals that OPCE yields better HDIC performance on the selected data sets in comparison with eight other HDIC algorithms. This demonstrates that OPCE is a viable algorithm for HDIC problems.
Funding: This paper was supported by Wonkwang University in 2022.
Abstract: In today's digital world, millions of individuals are linked to one another via the Internet and social media. This opens up new avenues for information exchange with others. Sentiment analysis (SA) has received a lot of attention during the last decade. In this study, we analyse the challenges of SA in Marathi, one of the Asian regional languages, by providing a benchmark setup in which we first produced an annotated dataset composed of Marathi text acquired from microblogging websites such as Twitter. We also chose domain experts to manually annotate Marathi microblogging posts with positive, negative, and neutral polarity. In addition, to show the efficient use of the annotated dataset, an ensemble-based model for sentiment analysis was created. Compared with other machine learning classifiers, the ensemble classifier achieved better performance under 10-fold cross-validation (cv), with an accuracy of 97.77% and an F-score of 97.89%.
Funding: This project was supported by grants from the Ministry of Education Humanities and Social Sciences Research Fund Project.
Abstract: Objective: The annual influenza epidemic is a heavy burden on the health care system, and has increasingly become a major public health problem in some areas, such as Hong Kong (China). Therefore, based on a variety of machine learning methods, and considering the seasonal influenza in Hong Kong, the study aims to establish a Combinatorial Judgment Classifier (CJC) model to classify the epidemic trend and improve the accuracy of influenza epidemic early warning.
Abstract: The use of machine learning to predict student employability is important for analysing a student's capability to get a job. Based on the results of this type of analysis, university managers can improve the employability of their students, which can help in attracting students in the future. In addition, learners can focus on the essential skills identified through this analysis during their studies, to increase their employability. An effective method called OPT-BAG (OPTimisation of BAGging classifiers) was therefore developed to model the problem of predicting the employability of students. This model can help predict the employability of students based on their competencies and can reveal weaknesses that need to be improved. First, we analyse the relationships between several variables and the outcome variable using a correlation heatmap for a student employability dataset. Next, a standard scaler function is applied in the preprocessing module to normalise the variables in the student employability dataset. The training set is then input to our model to identify the optimal parameters for the bagging classifier using a grid search cross-validation technique. Finally, the OPT-BAG model, based on a bagging classifier with the optimal parameters found in the previous step, is trained on the training dataset to predict student employability. The empirical outcomes in terms of accuracy, precision, recall, and F1 indicate that the OPT-BAG approach outperforms other cutting-edge machine learning models in predicting student employability. In this study, we also analyse the factors affecting the recruitment process of employers, and find that general appearance, mental alertness, and communication skills are the most important. This indicates that educational institutions should focus on these factors during the learning process to improve student employability.
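The OPT-BAG steps described above (standard scaling, grid search cross-validation over bagging parameters, then training with the best setting) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic data stands in for the student employability dataset, and the parameter grid, fold count, and 80/20 split are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the student employability dataset (an assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("bag", BaggingClassifier(random_state=0)),  # default base learner: decision tree
])

# Hypothetical search space; the abstract does not list the tuned parameters.
param_grid = {
    "bag__n_estimators": [10, 25],
    "bag__max_samples": [0.5, 1.0],
}

# Grid search with 5-fold cross-validation; the best setting is refit on all training data.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline is a design choice that avoids leaking statistics from validation folds into training; whether the original work does the same is not stated in the abstract.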
Abstract: Liver cancer is the second leading cause of cancer death worldwide. Early tumor detection may help identify suitable treatment and increase the survival rate. Medical imaging is a non-invasive tool that can help uncover abnormalities in human organs. Magnetic Resonance Imaging (MRI), in particular, uses magnetic fields and radio waves to differentiate the tissues of internal human organs. However, the interpretation of medical images requires the subjective expertise of a radiologist and oncologist. Thus, building an automated computer-based diagnosis system can help specialists reduce incorrect diagnoses. This paper proposes a hybrid automated system to compare the performance of 3D features and 2D features in classifying magnetic resonance liver tumor images. Two models are proposed: the first employs the 3D features, while the second exploits the 2D features. The first system uses 3D texture attributes, 3D shape features, and 3D graphical deep descriptors together with an ensemble classifier to differentiate between four 3D tumor categories. The proposed method is also applied to 2D slices for comparison purposes. For 3D liver tumors, the proposed approach attained 100% accuracy in discriminating between all types of tumors, as well as 100% Area Under the Curve (AUC), 100% sensitivity, and 100% specificity and precision. The performance is lower in 2D classification, where the maximum accuracy reached 96.4% for two classes and 92.1% for four classes. The top-class performance of the proposed system can be attributed to the exploitation of various types of feature selection methods, in addition to the ReliefF feature selection technique used to choose the most relevant features associated with the different classes. The novelty of this work lies in building a highly accurate system under specific circumstances without any image processing or human input, and in comparing the performance of 2D and 3D classification. In the future, the presented work can be extended for use on a huge dataset. It could then become a reliable, efficient Computer Aided Diagnosis (CAD) system employed in hospitals in rural areas.
Abstract: Redundancy, correlation, feature irrelevance, and missing samples are just a few problems that make it difficult to analyze software defect data. Additionally, it might be challenging to maintain an even distribution of data relating to both defective and non-defective software. Data from the latter class predominate in the dataset in the majority of experimental situations. The objective of this review study is to demonstrate the effectiveness of combining ensemble learning and feature selection in improving the performance of defect classification. Besides the successful feature selection approach, a novel variant of the ensemble learning technique is analyzed to address the challenges of feature redundancy and data imbalance, providing robustness in the classification process. To overcome these problems and lessen their impact on the fault classification performance, the authors carefully integrate effective feature selection with ensemble learning models. Forward selection demonstrates that a significant area under the receiver operating characteristic (ROC) curve can be attributed to only a small subset of features. The Greedy forward selection (GFS) technique outperformed Pearson's correlation method when evaluating feature selection techniques on the datasets. Ensemble learners, such as random forests (RF) and the proposed average probability ensemble (APE), demonstrate greater resistance to the impact of weak features when compared to weighted support vector machines (W-SVMs) and extreme learning machines (ELM). Furthermore, in the case of the NASA and Java datasets, the enhanced average probability ensemble model, which incorporates the Greedy forward selection technique with the average probability ensemble model, achieved a remarkably high area under the ROC curve. It approached a value of 1.0, indicating exceptional performance. This review emphasizes the importance of meticulously selecting attributes in a software dataset to accurately classify damaged components. In addition, the suggested ensemble learning model successfully addressed the aforementioned problems with software data and produced outstanding classification performance.
Abstract: The Hand Gestures Recognition (HGR) System can be employed to facilitate communication between humans and computers instead of using special input and output devices. These devices may complicate communication with computers, especially for people with disabilities. Hand gestures can be defined as a natural human-to-human communication method, which can also be used in human-computer interaction. Many researchers have developed various techniques and methods that aim to understand and recognize specific hand gestures by employing one or two machine learning algorithms with reasonable accuracy. This work aims to develop a powerful hand gesture recognition model with a 100% recognition rate. We propose an ensemble classification model that combines the most powerful machine learning classifiers to obtain diversity and improve accuracy. The majority voting method was used to aggregate the outputs produced by each classifier and obtain the final classification result. Our model was trained using a self-constructed dataset containing 1600 images of ten different hand gestures. Combining Canny's edge detector and the histogram of oriented gradients method with the ensemble classifier worked well and improved the recognition rate. The experimental results show the robustness of our proposed model. Logistic Regression and Support Vector Machine achieved 100% accuracy. The developed model was validated using two public datasets, and the findings show that our model outperformed the other compared studies.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U21A20153, 41941018, 52074258, 41807250, 42177140) and the Key Research and Development Project of Hubei Province, China (Grant No. 2021BCA133).
Abstract: Accurately predicting and estimating the squeezing and ground response to tunneling remains challenging. Moreover, tunnel-squeezing hazards are much more likely to occur in deeply buried long tunnels with complex engineering-geological environments. Therefore, a high-performance predictive model for tunnel squeezing is necessary. A superior ensemble classifier is put forward in this study, which is composed of four individual classifiers (gradient boosting classifier, extra-trees classifier, AdaBoost classifier, and logistic regression classifier) and two optimization algorithms (Bayesian optimization (BO) and sparrow search algorithm (SSA)). The training database covers five parameters: tunnel depth (H), rock tunneling quality index (Q), tunnel diameter (D), support stiffness (K), and strength stress ratio (SSR), about which the basic information is accessible at the early design phases. However, the dataset compiled from the literature is insufficient. Thus, the ten proposed methods are used to replace the missing values. During the model training process, BO shows its strong ability to optimize seventeen hyperparameters. When applied to tune the classifiers' weights, SSA achieves a fast and efficient performance. The novel Shapley Additive Explanations–LightGBM method indicates that K is the most important input feature, followed by SSR, Q, H, and D, respectively. The ensemble classifier is then validated using the test set and additional historical case projects. The validation shows that the model can achieve an accuracy of 98% (i.e., the error rate is 2%) on the test set, higher than those achieved by previous prediction models. Moreover, the predicted probability could provide warning information for timely support measures. Finally, the application results are illustrated through tests on the tunnel sections that have not yet been excavated in the line of the Sichuan–Tibet railway project. The applied predictive tendencies and laws are in line with practical experience. In summary, the proposed model's prediction results are reasonable, and its predictions will become more accurate as more data are collected and used in training for prewarning of the tunnel squeezing hazard.
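The idea of combining several classifiers through tuned weights and reading off a predicted probability can be illustrated with a small weighted soft-voting sketch. It is not the authors' implementation: the function name `weighted_soft_vote`, the four probability vectors, and the fixed weights are hypothetical, and in the study the weights are tuned by SSA rather than set by hand.

```python
def weighted_soft_vote(probabilities, weights):
    """Combine per-classifier class probabilities using normalised weights.

    probabilities: one list of class probabilities per classifier.
    weights: one non-negative weight per classifier.
    Returns (combined_probabilities, predicted_class_index).
    """
    if len(probabilities) != len(weights):
        raise ValueError("one weight is required per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(probabilities[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(probabilities, weights):
        for k, p in enumerate(probs):
            combined[k] += (weight / total) * p

    predicted = max(range(n_classes), key=combined.__getitem__)
    return combined, predicted


# Hypothetical outputs of four classifiers for one tunnel section:
# index 0 = non-squeezing, index 1 = squeezing.
probs = [[0.2, 0.8], [0.4, 0.6], [0.7, 0.3], [0.1, 0.9]]
weights = [0.4, 0.3, 0.2, 0.1]  # illustrative values only

combined, label = weighted_soft_vote(probs, weights)
print(combined, label)
```

The combined probability for the squeezing class is the quantity that could serve as the warning signal mentioned in the abstract; a threshold on it would be a further design decision.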
Funding: National Nature Science Foundations of China (No. 60975059, No. 60775052); Specialized Research Fund for the Doctoral Program of Higher Education from Ministry of Education of China (No. 20090075110002); Projects of the Shanghai Committee of Science and Technology (No. 09JC1400900, No. 08JC1400100, No. 10DZ0506500).
Abstract: Membrane proteins are embedded in the lipid bilayer, which creates a suitable environment for their actions. Membrane proteins are of different types, and it is important to decide which type a protein belongs to because the type is closely related to its biological function and its interactions with other molecules in a biological system. In this study, on the basis of the concept of pseudo amino acid (PseAA) composition originally introduced by Chou, the value of approximate entropy (ApEn) of the query membrane protein was used to integrate the complementary information. By fusing fifteen powerful individual fuzzy K-nearest neighbor (FKNN) classifiers, an ensemble classifier was presented. Each basic classifier was trained on the PseAA composition of membrane protein sequences with different parameters. The experimental results demonstrate that it is efficient for the structural prediction of membrane proteins.
Abstract: Leukemia is a blood cancer that affects bone marrow and lymphatic tissues, typically involving white blood cells. Leukemia produces an abnormal amount of white blood cells compared to normal blood. Deoxyribonucleic acid (DNA) microarrays provide reliable medical diagnostic services to help more patients find the proposed treatment for infections. DNA microarrays are also known as biochips that consist of microscopic DNA spots attached to a solid glass surface. Currently, it is difficult to classify cancers using microarray data. Many data mining techniques have failed because of the small sample size, which has become more critical for organizations. However, they are not highly effective in improving results and are frequently employed by doctors for cancer diagnosis. This study proposes a novel method using machine learning algorithms based on microarrays of leukemia GSE9476 cells. The main aim was to predict the initial leukemia disease. The machine learning algorithms evaluated were decision tree (DT), naive Bayes (NB), random forest (RF), gradient boosting machine (GBM), linear regression (LinR), support vector machine (SVM), and a novel approach based on the combination of Logistic Regression (LR), DT, and SVM, named the ensemble LDSVM model. The k-fold cross-validation and grid search optimization methods were used with the LDSVM model to classify leukemia in patients and comparatively analyze their impacts. The proposed approach achieved better accuracy, precision, recall, and F1 scores than the other algorithms. Furthermore, the results were relatively assessed, which showed LDSVM performance. This study aims to successfully predict leukemia in patients and enhance prediction accuracy in minimum time. Moreover, the synthetic minority oversampling technique (SMOTE) and principal component analysis (PCA) approaches were implemented. This makes the records generalized and evaluates the outcomes well. PCA reduces the feature count without losing any information and deals with class imbalanced datasets, as well as enabling faster model execution with less computation cost. In this study, a novel process was used to reduce the column results to develop a faster experiment execution.
Abstract: One recent area of interest in computer science is data stream management and processing. By 'data stream', we refer to continuous and rapidly generated packages of data. Specific features of data streams are immense volume, high production rate, limited data processing time, and data concept drift; these features differentiate the data stream from standard types of data. An issue for the data stream is classification of input data. A novel ensemble classifier is proposed in this paper. The classifier uses base classifiers with two weighting functions under different data input conditions. In addition, a new method is used to determine drift, which emphasizes the precision of the algorithm. Another characteristic of the proposed method is the removal of different numbers of the base classifiers based on their quality. Applying a weighting mechanism to the base classifiers at the decision-making stage is another advantage of the algorithm. This facilitates adaptability when drifts take place, which leads to classifiers with higher efficiency. Furthermore, the proposed method is tested on a set of standard data, and the results confirm higher accuracy compared to available ensemble classifiers and single classifiers. In addition, in some cases the proposed classifier is faster and needs less storage space.
Funding: This work was financially supported by the National Natural Science Foundation of China (Grant No. 61601034) and the National Natural Science Foundation of China (Grant No. 31871525). The authors acknowledge Kimberly Moravec, PhD, from Liwen Bianji, Edanz Editing China (www.liwenbianji.cn/ac), for editing the English text of a draft of this manuscript.
Abstract: Recognition and counting of greenhouse pests are important for monitoring and forecasting pest population dynamics. This study used image processing techniques to recognize and count whiteflies and thrips on a sticky trap located in a greenhouse environment. The digital images of sticky traps were collected using an image-acquisition system under different greenhouse conditions. If a single color space is used, it is difficult to segment the small pests correctly because of the detrimental effects of non-uniform illumination in complex scenarios. Therefore, a method was proposed that first segments object pests in two color spaces, using the Prewitt operator in the I component of the hue-saturation-intensity (HSI) color space and the Canny operator in the B component of the Lab color space. Then, the segmented results for the two color spaces were summed, achieving 91.57% segmentation accuracy. Next, because different features of pests contribute differently to the classification of pest species, the study extracted multiple features (e.g., color and shape features) in different color spaces for each segmented pest region to improve the recognition performance. Twenty decision trees were used to form a strong ensemble learning classifier that used a majority voting mechanism and obtained 95.73% recognition accuracy. The proposed method is a feasible and effective way to process greenhouse pest images. The system accurately recognized and counted pests in sticky trap images captured under real greenhouse conditions.
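The majority voting mechanism used to combine the twenty decision trees can be sketched in a few lines. This is only an illustration of the voting step, under the assumption that each tree outputs one class label per segmented pest region; the function name `majority_vote`, the label strings, and the tie-breaking rule (the first label in input order that reaches the top count) are choices made for the example, not details from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the label that appears first in
    `predictions` (an assumption made for this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical votes from twenty decision trees for one segmented region.
votes = ["whitefly"] * 12 + ["thrips"] * 8
winner = majority_vote(votes)
print(winner)
```

With an even number of voters, ties are possible, so an explicit tie-breaking rule such as the one above (or an odd ensemble size) is worth deciding up front.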