The Repository Mahasiswa (RAMA) is a national repository of research reports in Indonesia. It holds final assignments, student projects, theses, dissertations, and research reports by lecturers or researchers that have not yet been published in journals, conferences, or integrated books, drawn from the scientific repositories of universities and research institutes. The increasing popularity of the RAMA Repository has led to security issues, including the two most widespread attacks, i.e., Structured Query Language (SQL) injection and cross-site scripting (XSS). An attacker who gains access to the data and makes unauthorized modifications is extremely dangerous. This paper aims to provide an attack detection system that secures the repository portal against these attacks. The proposed system uses a combined Long Short-Term Memory and Principal Component Analysis (LSTM-PCA) model as a classifier. This model can effectively solve the vanishing gradient problem caused by excessive positive samples. The experimental results show that the proposed system achieves an accuracy of 96.85% using an 80%:20% ratio of training to testing data. This result is attributed to the LSTM's Forget Gate, which works well because PCA supplies only the selected features that are most relevant to the attack patterns. The Forget Gate in an LSTM decides which information should be kept for computing the cell state and which is irrelevant and can be discarded. In addition, the LSTM's Input Gate helps identify crucial information and stores relevant data in memory.
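The gate descriptions above can be made concrete with a minimal scalar LSTM cell step. This is a generic textbook LSTM in plain Python, not the paper's LSTM-PCA model. The parameter layout is an assumption: `p` maps a gate name to hypothetical weights `(w_x, w_h, bias)`. In the described system the input would be a PCA-reduced feature vector rather than a single number.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x: float, h_prev: float, c_prev: float, p: dict) -> tuple:
    """One step of a scalar LSTM cell; p maps gate name -> (w_x, w_h, bias) (hypothetical layout)."""
    def pre(gate: str) -> float:
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))  # forget gate: fraction of c_prev to keep
    i = sigmoid(pre("input"))   # input gate: fraction of the candidate to write
    g = math.tanh(pre("cell"))  # candidate cell content
    o = sigmoid(pre("output"))  # output gate: how much of the cell state to expose
    c = f * c_prev + i * g      # new cell state
    h = o * math.tanh(c)        # new hidden state
    return h, c
```

With a large positive forget-gate bias, the cell state carries over almost unchanged. With a large negative one, it is almost entirely discarded.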
In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High-Dimensional Imbalanced Classification (HDIC) problems, using data processed with the Multi-Dimensional Scaling (MDS) feature extraction technique. First, the dimensionality of the original imbalanced data is reduced using MDS so that the distances between any two different samples are preserved as well as possible. Second, a novel OPCE algorithm classifies imbalanced samples by placing optimised observation points in the low-dimensional data space. Third, the observation point mappings are optimised to obtain a reliable assessment of unknown samples. Exhaustive experiments on seven benchmark HDIC data sets have been conducted to evaluate the feasibility, rationality, and effectiveness of the proposed OPCE algorithm. The results show that (1) the OPCE algorithm trains faster on low-dimensional imbalanced data than on high-dimensional data; (2) the OPCE algorithm correctly identifies samples as the number of optimised observation points increases; and (3) statistical analysis reveals that OPCE yields better HDIC performance on the selected data sets than eight other HDIC algorithms. This demonstrates that OPCE is a viable algorithm for HDIC problems.
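MDS is described as preserving pairwise distances "as well as possible". One way to check that is a normalized stress value that compares distances before and after reduction, where 0 means perfect preservation. The sketch below computes it. The function names and this particular normalization are illustrative choices, not taken from the OPCE paper, and the observation-point step itself is not shown.

```python
import math
from itertools import combinations

def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalized_stress(original, embedded) -> float:
    """sqrt(sum (d_orig - d_emb)^2 / sum d_orig^2) over all sample pairs.

    original[i] and embedded[i] are the same sample before and after reduction.
    Requires at least two distinct original points (otherwise the denominator is 0).
    """
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(original)), 2):
        d_orig = euclidean(original[i], original[j])
        d_emb = euclidean(embedded[i], embedded[j])
        num += (d_orig - d_emb) ** 2
        den += d_orig ** 2
    return math.sqrt(num / den)
```

A pure translation or rotation of the embedding gives a stress of 0. Uniformly doubling every distance gives a stress of 1.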
The common spatial pattern (CSP) algorithm is a successful tool for feature estimation in brain-computer interfaces (BCIs). However, CSP is sensitive to outliers and may give poor results, since it is based on pooling the covariance matrices of trials. In this paper, we propose a simple yet effective approach, the common spatial pattern ensemble (CSPE) classifier, to improve CSP performance. Multiple CSP filters are constructed by dividing the recording channels into subsets. Projection, log-operation, and subtraction on the original signal yield a majority-voting ensemble classifier, which alleviates outlier contamination. Experimental results demonstrate that the proposed CSPE classifier is robust to various artifacts and achieves an average accuracy of 83.02%.
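The sketch below shows the voting step. It assumes each channel subset already has a pair of trained CSP filters; `w_a` and `w_b` are hypothetical placeholders, and the CSP eigen-decomposition needed to train them is omitted. Each subset projects the trial, takes the log-variance of the two filtered signals, subtracts them, and casts a vote. The majority label is the ensemble output.

```python
import math
from collections import Counter

def log_variance(signal) -> float:
    """Log of the (population) variance of a 1-D signal; the signal must not be constant."""
    mean = sum(signal) / len(signal)
    return math.log(sum((s - mean) ** 2 for s in signal) / len(signal))

def project(trial, w):
    """Spatially filter a multichannel trial (channels x samples) with weight vector w."""
    n_samples = len(trial[0])
    return [sum(w[c] * trial[c][t] for c in range(len(w))) for t in range(n_samples)]

def cspe_predict(sub_trials, filter_pairs) -> str:
    """Each channel subset votes by the sign of its log-variance difference; majority wins."""
    votes = []
    for trial, (w_a, w_b) in zip(sub_trials, filter_pairs):
        score = log_variance(project(trial, w_a)) - log_variance(project(trial, w_b))
        votes.append("class_a" if score > 0 else "class_b")
    return Counter(votes).most_common(1)[0][0]
```

An odd number of channel subsets avoids tied votes in the two-class case.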
Ensemble classifiers have recently proved to be an effective way to enhance prediction performance. However, it is often unclear how to construct an appropriate ensemble from complex data, for example, data with many dimensions or hierarchical attributes. This study proposes a method to construct an ensemble classifier based on key attributes. Like common ensemble classifiers, it achieves high precision, and in addition its results are highly intelligible and thus easy to understand. Furthermore, experiments on real data collected from China Mobile show that the key-attributes-based ensemble classifier performs well in both classifier construction and customer churn prediction.
Multiple classifier systems exhibit stronger classification capacity than single classifiers, but they require significant computational resources. A selective ensemble system aims to attain equivalent or better classification accuracy with fewer classifiers. However, current methods fail to identify precise solutions for constructing an ensemble classifier. In this study, we propose an ensemble classifier design technique based on the perturbation binary salp swarm algorithm (ECDPB). Extreme learning machines (ELMs) have rapid learning rates and good generalization ability, so they can serve as the base classifier for creating multiple candidates with fewer computational resources. Meanwhile, we introduce a combined diversity measure that takes both the complementarity and the accuracy of ELMs into account; it is used to identify the ELMs that have good diversity and low error. In addition, we propose an ECDPB with powerful optimizing ability and use it to find the optimal subset of ELMs. The selected ELMs can then be used to form an ensemble classifier. Experiments on 10 benchmark datasets demonstrate that the proposed ECDPB delivers superior classification capacity compared with alternative methods.
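ECDPB's perturbation scheme and combined diversity measure are not reproduced here. The sketch shows only two common building blocks of binary swarm-based ensemble selection:

- an S-shaped (sigmoid) transfer function that turns continuous salp positions into a 0/1 mask over candidate ELMs;
- the majority-vote accuracy of the selected subset, which a fitness function could use.

Both function names are hypothetical.

```python
import math
import random
from collections import Counter

def to_binary(position, rng: random.Random):
    """S-shaped transfer: select classifier k with probability sigmoid(position[k])."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0 for v in position]

def subset_accuracy(mask, predictions, labels) -> float:
    """Majority-vote accuracy of the classifiers selected by a 0/1 mask.

    predictions[k][n] is candidate classifier k's label for validation sample n.
    Ties are broken in favour of the label seen first among the selected classifiers.
    """
    chosen = [p for p, m in zip(predictions, mask) if m]
    if not chosen:
        return 0.0
    correct = 0
    for n, y in enumerate(labels):
        vote = Counter(p[n] for p in chosen).most_common(1)[0][0]
        correct += vote == y
    return correct / len(labels)
```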
In many real-world multiobjective optimization problems, evaluating the objective functions is computationally expensive. Such problems are usually called expensive multiobjective optimization problems (EMOPs). One feasible approach to EMOPs is to introduce computationally efficient surrogates that reduce the number of function evaluations. Inspired by ensemble learning, this paper proposes a multiobjective evolutionary algorithm with an ensemble classifier (MOEA-EC) for EMOPs. More specifically, multiple decision tree models are used as an ensemble classifier for pre-selection, which reduces function evaluations further than a single inaccurate model would. Extensive experimental studies compare MOEA-EC with several advanced algorithms for expensive multiobjective optimization to verify its efficiency. The experimental results show that MOEA-EC outperforms the compared algorithms.
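The abstract does not say how the pre-selection classifier is trained. One plausible scheme is sketched below purely as an assumption:

- label already-evaluated solutions by Pareto non-dominance (minimisation);
- let an ensemble of cheap classifiers vote on which new candidates deserve an expensive evaluation.

Decision trees are replaced by arbitrary callables to keep the sketch self-contained. All function names are hypothetical.

```python
def dominates(a, b) -> bool:
    """True if objective vector a Pareto-dominates b (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def label_by_dominance(objectives):
    """Label 1 for non-dominated solutions, 0 otherwise (a solution never dominates itself)."""
    return [0 if any(dominates(o, obj) for o in objectives) else 1 for obj in objectives]

def preselect(candidates, classifiers, k):
    """Keep the k candidates that receive the most 'promising' votes from the ensemble."""
    votes = [sum(int(clf(c)) for clf in classifiers) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: votes[i], reverse=True)
    return [candidates[i] for i in order[:k]]
```

Only the pre-selected candidates would then be passed to the expensive objective functions.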
A robust smile recognition system could be widely used in many real-world applications. Classifying a facial smile in an unconstrained setting is difficult because of the wide variety in face images. In this paper, an adaptive model for smile expression classification is suggested that integrates a fast feature extraction algorithm and cascade classifiers. Our model exploits the intrinsic association between face detection, smiles, and other facial features to alleviate over-fitting on the limited training set and improve classification results. Unnecessary coefficients are excluded from the feature vector during extraction, which enhances the discriminatory capacity of the extracted features and reduces computation. Still, the main causes of error in learning are noise, bias, and variance, and ensembles help to minimize these factors. Combining multiple classifiers decreases variance, especially for unstable classifiers, and may produce a more reliable classification than a single classifier. However, a shortcoming of bagging, regarded here as the best ensemble classifier, is its random selection: classification performance depends on the chance of picking an appropriate subset of training items. To deal with this challenge, the suggested model uses a modified form of bagging to create training sets (error-based bootstrapping). Experimental results for smile classification on the JAFFE, CK+, and CK+48 benchmark datasets show the feasibility of the proposed model.
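The paper's exact error-based bootstrapping rule is not given in the abstract. The sketch contrasts standard bagging with one hypothetical error-weighted variant, in which training items misclassified by earlier models are more likely to be drawn. The `errors` input and the `emphasis` parameter are assumptions for illustration.

```python
import random

def bootstrap(n: int, rng: random.Random):
    """Standard bagging: draw n training indices uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def error_based_bootstrap(errors, rng: random.Random, emphasis: float = 1.0):
    """Hypothetical error-weighted variant of bagging.

    errors[i] in [0, 1] is how often item i was misclassified by earlier models;
    item i is drawn with weight 1 + emphasis * errors[i].
    """
    weights = [1.0 + emphasis * e for e in errors]
    return rng.choices(range(len(errors)), weights=weights, k=len(errors))
```

With `emphasis=0` the variant reduces to ordinary uniform bootstrapping.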
Deep learning is a powerful technique that is widely applied to image recognition and natural language processing, among many other tasks. In this work, we propose an efficient technique that uses pre-trained Convolutional Neural Network (CNN) architectures to extract powerful image features for object recognition. We build on the existing idea of transferring what pre-trained CNNs have learned to new databases through their activations, and propose to consider multiple deep layers. We exploit the progressive learning that happens in the intermediate layers of the CNNs to construct Deep Multi-Layer (DM-L) feature extraction vectors that achieve excellent object recognition performance. Two popular pre-trained CNN models, VGG_16 and VGG_19, are used to extract feature sets from three deep fully connected layers, namely “fc6”, “fc7” and “fc8”. The dimensionality of the DM-L feature vectors is reduced with Principal Component Analysis (PCA). The resulting feature vectors are fed to an external classifier ensemble, which replaces the Softmax-based classification layers of the two original pre-trained CNN models. The proposed DM-L technique has been applied to the benchmark Caltech-101 object recognition database. Conventional wisdom may suggest that features from the deepest layer, “fc8”, give better recognition performance than those from “fc6”, but our results show otherwise for both models considered. Our experiments reveal that, for both models, the “fc6”-based feature vectors achieve the best recognition performance. State-of-the-art recognition performances of 91.17% and 91.35% have been achieved with the “fc6”-based feature vectors for the VGG_16 and VGG_19 models, respectively. These results were obtained with 30 sample images per class, and the proposed system is capable of improved performance when all sample images per class are considered. Our research shows that for CNN-based feature extraction, multiple layers should be considered and the layer that maximizes recognition performance can then be selected.
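For the PCA step on the fc6/fc7/fc8 activations, a library implementation would normally be used. The plain-Python sketch below only illustrates the idea: it finds the first principal direction with power iteration on the covariance matrix, then projects a vector onto it. It assumes non-degenerate data and a start vector that is not orthogonal to the top component. Extracting further components (deflation) is omitted.

```python
import math

def first_principal_component(rows, iters: int = 100):
    """Return (column means, unit top principal direction) via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # start vector; must not be orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(row, means, component) -> float:
    """Coordinate of a (centered) row along the principal direction."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))
```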
The departure of a good employee incurs direct and indirect costs for an organization. The direct costs arise from hiring and training a replacement, while the replacement time and lost productivity disrupt business processes. This work uses ensemble classifiers to identify the attributes that significantly affect attrition. The data consist of attributes related to job function, education level, satisfaction with work and working relationships, compensation, and frequency of business travel. Both bagging and boosting classifiers were used for testing. The results show that the nine selected features achieve the same result as the full feature set. The selected features include age, income, working years, source of employment, years since last promotion, salary hike, and business travel frequency, and they were selected using ensemble classifiers. According to the ensemble classifiers' results, satisfaction with work and with working relationships does not appear to be a significant attribute in attrition.
Remarkable progress has been made in human action recognition (HAR), significantly impacting computer vision applications in sports analytics. However, identifying dynamic and complex movements in sports like badminton remains challenging, because it demands high recognition accuracy and better handling of complex motion patterns. Deep learning techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM), and graph convolutional networks (GCNs) improve recognition on large datasets. Traditional machine learning methods such as support vector machines (SVM), random forest (RF), and logistic regression (LR), combined with handcrafted features and ensemble approaches, perform well but struggle with the complexity of fast-paced sports like badminton. We propose an ensemble learning model combining SVM, LR, RF, and adaptive boosting (AdaBoost) for badminton action recognition. The data in this study consist of video recordings of badminton stroke techniques, from which spatiotemporal data have been extracted. The spatial features are the three-dimensional distances between each skeleton point and the right hip. The temporal features are Fast Dynamic Time Warping (FDTW) results computed over 15 frames of each video sequence. The weighted ensemble model applies soft voting over SVM, LR, RF, and AdaBoost to enhance the accuracy of badminton action recognition. The E2 ensemble model, which combines SVM, LR, and AdaBoost, achieves the highest accuracy of 95.38%.
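The two feature types can be sketched as follows. The joint layout and `hip_index` are hypothetical. The second function computes exact dynamic time warping; FastDTW, which the paper uses, is a multi-resolution approximation of the same quantity and is not reproduced here.

```python
import math

def hip_distances(frame, hip_index: int):
    """Spatial feature: 3-D Euclidean distance from every skeleton joint to the right hip."""
    hip = frame[hip_index]
    return [math.dist(joint, hip) for joint in frame]

def dtw_distance(a, b) -> float:
    """Exact O(len(a) * len(b)) dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]
```

Because warping lets one frame align with several frames, sequences that differ only in timing, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, get a distance of 0.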
Redundancy, correlation, feature irrelevance, and missing samples are just a few of the problems that make software defect data difficult to analyze. It can also be hard to keep the data for defective and non-defective software evenly distributed, since in most experimental settings the non-defective class dominates the dataset. The objective of this review study is to demonstrate that combining ensemble learning with feature selection improves defect classification performance. Besides a successful feature selection approach, a novel variant of ensemble learning is analyzed to address feature redundancy and data imbalance and to make the classification process robust. To overcome these problems and lessen their impact on fault classification performance, the authors carefully integrate effective feature selection with ensemble learning models. Forward selection shows that a significant share of the area under the receiver operating characteristic (ROC) curve can be attributed to only a small subset of features. When feature selection techniques were evaluated on the datasets, the greedy forward selection (GFS) technique outperformed Pearson's correlation method. Ensemble learners, such as random forests (RF) and the proposed average probability ensemble (APE), are more resistant to the impact of weak features than weighted support vector machines (W-SVMs) and extreme learning machines (ELM). Furthermore, on the NASA and Java datasets, the enhanced average probability ensemble model, which combines greedy forward selection with the average probability ensemble, achieved a remarkably high area under the ROC curve, approaching 1.0, which indicates exceptional performance. This review emphasizes the importance of carefully selecting the attributes of a software dataset in order to classify defective components accurately. In addition, the suggested ensemble learning model successfully addressed the aforementioned problems with software data and produced outstanding classification performance.
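Below is a minimal sketch of greedy forward selection and of averaging predicted probabilities across models. In practice, `score` would be something like a cross-validated area under the ROC curve; here it is any callable on a feature subset. The averaging function shows only what the name "average probability ensemble" suggests. The paper's enhanced APE may include further details that the abstract does not describe.

```python
def greedy_forward_selection(features, score, max_features=None):
    """Repeatedly add the feature that most improves score(subset); stop when none helps."""
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate every remaining feature added to the current subset; max() keeps the first on ties.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best

def average_probability(prob_per_model):
    """Average each sample's predicted defect probability across models."""
    return [sum(ps) / len(ps) for ps in zip(*prob_per_model)]
```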
In today's digital world, millions of individuals are linked to one another via the Internet and social media, which opens up new avenues for exchanging information. Sentiment analysis (SA) has received a lot of attention during the last decade. In this study, we analyse the challenges of SA in Marathi, one of the Asian regional languages, by providing a benchmark setup. First, we produced an annotated dataset of Marathi text collected from microblogging websites such as Twitter. We also selected domain experts to manually annotate the Marathi microblogging posts with positive, negative, and neutral polarity. In addition, to show how the annotated dataset can be used effectively, we built an ensemble-based sentiment analysis model. Under 10-fold cross-validation (CV), the ensemble classifier outperformed the other machine learning classifiers in accuracy, reaching an accuracy of 97.77% and an F-score of 97.89%.
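The 10-fold cross-validation protocol mentioned above can be sketched as splitting the sample indices into ten folds, each used once as the test set. This version uses contiguous, unshuffled folds and no stratification by polarity, which are simplifying assumptions.

```python
def kfold_indices(n: int, k: int = 10):
    """Yield (train, test) index lists for k contiguous folds over indices 0..n-1."""
    # The first n % k folds get one extra item so that all n indices are covered.
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

Averaging the accuracy over the ten test folds gives the cross-validated estimate.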
Cardiotocography (CTG) represents the health of the fetus inside the womb during labor. However, assessing its readings can be highly subjective and depends on the expertise of the obstetrician. Digital signals from fetal monitors capture parameters such as fetal heart rate, contractions, and accelerations. Objective: This paper aims to classify imbalanced CTG readings of healthy, suspected, and pathological fetuses. Method: We perform two sets of experiments. First, without over-sampling, we use five classifiers to sort CTG readings into three categories (healthy, suspected, and pathological): Random Forest (RF), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM). Second, we use an ensemble of these classifiers with over-sampling, balancing the CTG records with a random over-sampling technique before training the ensemble models. We use 3602 CTG readings to train the ensemble classifiers and 1201 records to evaluate them. The outputs of these classifiers are then fed into a soft voting classifier to obtain the most accurate results. Results: Each classifier is evaluated by accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUROC). The XGBoost, LGBM, and CatBoost classifiers yielded 99% accuracy. Conclusion: Using ensemble classifiers on a balanced CTG dataset improves detection accuracy compared with previous studies and with our first experiment. The soft voting classifier offsets the weaknesses of individual classifiers, so the overall model performs better.
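Random over-sampling and soft voting are both simple to sketch:

- random over-sampling duplicates randomly chosen rows of the smaller classes until every class matches the largest one, and should be applied to the training split only;
- soft voting averages the models' class-probability vectors and then takes the argmax.

The function names are illustrative. Library implementations exist and would normally be preferred.

```python
import random
from collections import Counter

def random_oversample(X, y, rng: random.Random):
    """Duplicate random rows of each smaller class until all classes match the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out

def soft_vote(prob_rows):
    """Average class-probability vectors from several models; return (argmax index, averages)."""
    n_classes = len(prob_rows[0])
    avg = [sum(p[c] for p in prob_rows) / len(prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg
```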
Fuel injectors are an important component of combustion engines. An operational weakness can lead to complete machine malfunction, which decreases reliability and causes loss of production. Various condition monitoring techniques can be applied to prevent this. Acoustic signals are commonly used for fault diagnosis of rotating machinery, and advanced signal processing is used here to construct features specialized for detecting fuel injector faults. This work compares the performance of novelty detection algorithms in the form of one-class classifiers, namely the One-Class Support Vector Machine (OCSVM) and the One-Class Self-Organizing Map (OCSOM). Features were extracted from acoustic signals of fuel injectors under different operational conditions, and the features from all the signals were used as input to the one-class classifiers. The classifiers were trained only on healthy fuel injector conditions. They were then tested on new experimental data from other operational conditions not included in the training set, in order to assess generalization. The results show the effectiveness of one-class classifiers for detecting faults in fuel injectors.
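The sketch below illustrates the one-class idea: fit on healthy data only, then flag anything that looks different. It deliberately uses a simpler detector than OCSVM or OCSOM. A feature vector is flagged as faulty when its distance to the centroid of the healthy training vectors exceeds a quantile of the training distances. This swapped-in method and its `quantile` parameter are assumptions for illustration only.

```python
import math

class CentroidNoveltyDetector:
    """Toy one-class detector fitted on healthy feature vectors only."""

    def __init__(self, quantile: float = 0.95):
        self.quantile = quantile

    def fit(self, healthy):
        d = len(healthy[0])
        self.centroid = [sum(x[j] for x in healthy) / len(healthy) for j in range(d)]
        # Threshold = chosen quantile of the healthy training distances to the centroid.
        dists = sorted(math.dist(x, self.centroid) for x in healthy)
        idx = min(len(dists) - 1, int(self.quantile * len(dists)))
        self.threshold = dists[idx]
        return self

    def is_fault(self, x) -> bool:
        return math.dist(x, self.centroid) > self.threshold
```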
Objective: The annual influenza epidemic is a heavy burden on the health care system and has increasingly become a major public health problem in some areas, such as Hong Kong (China). Therefore, drawing on a variety of machine learning methods and considering seasonal influenza in Hong Kong, this study aims to establish a Combinatorial Judgment Classifier (CJC) model to classify the epidemic trend and improve the accuracy of influenza epidemic early warning.
Automatic Speech Emotion Recognition (SER) recognizes emotion from speech automatically. SER works well in a laboratory environment, but real-time emotion recognition is affected by variations in the speaker's gender, age, and cultural and acoustical background. The acoustical resemblance between emotional expressions further increases the complexity of recognition. Many recent works address each of these effects individually. Instead, we design a system that reduces the effect of any such factor. We propose a two-level hierarchical classifier named Interpreter of Responses (IR). The first level of IR is realized with Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers. In the second level, a discriminative SVM classifier is trained and tested on meta-information from the first-level classifiers, together with the input acoustical feature vector used by the primary classifiers. To train the system on a versatile corpus, we composed an integrated emotion corpus from emotion samples of five speech corpora: EMO-DB, IITKGP-SESC, the SAVEE corpus, a Spanish emotion corpus, and CMU's Woogle corpus. The hierarchical classifier has been trained and tested using MFCC and Low-Level Descriptors (LLD). The empirical analysis shows that the proposed classifier outperforms traditional classifiers. The proposed ensemble design is generic and can be adapted even when the number and nature of the features change. The first-level GMM or SVM classifiers may be replaced with any other learning algorithm.
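The second-level input described above can be sketched as the first-level class scores concatenated with the original acoustic feature vector. The first-level models here are hypothetical stand-in callables, not trained SVM or GMM classifiers.

```python
def first_level_outputs(x, first_level_models):
    """Concatenate each first-level model's class scores for input x."""
    meta = []
    for model in first_level_models:
        meta.extend(model(x))
    return meta

def second_level_input(x, first_level_models):
    """Second-level feature vector: first-level meta-information followed by the original features."""
    return first_level_outputs(x, first_level_models) + list(x)
```

When the second level is trained, first-level scores are usually produced on held-out folds, so that the meta-classifier does not learn from overfit outputs. The abstract does not say how IR handles this.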
Most ensemble learning algorithms use a centralized model in which the training instances must be gathered on a single station, yet centralizing the training data on one station is often difficult. We propose a distributed ensemble learning algorithm that keeps two kinds of instance weights ("weight genes"), one for the global distribution and one for the local distribution. Instead of the repeated sampling of standard ensemble learning, non-balanced sampling from each station is used to train that station's set of base classifiers. The concept of an effective nearby region for the local integrated classifier is proposed and used to dynamically integrate multiple classifiers in a distributed environment. Experiments show that the proposed algorithm effectively reduces the time needed to train the base classifiers while matching the classification performance of the centralized learning method.
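The abstract does not define the "effective nearby region" precisely. A common related technique, dynamic classifier selection by local accuracy, is sketched below as an assumption. For each query, the station classifier that is most accurate on the k nearest validation points makes the prediction. The classifiers and validation data are hypothetical placeholders.

```python
import math

def local_accuracy(clf, query, val_X, val_y, k: int) -> float:
    """Accuracy of clf on the k validation points nearest to the query."""
    nearest = sorted(range(len(val_X)), key=lambda i: math.dist(val_X[i], query))[:k]
    return sum(clf(val_X[i]) == val_y[i] for i in nearest) / len(nearest)

def dynamic_select_predict(query, station_classifiers, val_X, val_y, k: int = 3):
    """Predict with the station classifier that is most accurate in the query's neighbourhood."""
    best = max(station_classifiers, key=lambda c: local_accuracy(c, query, val_X, val_y, k))
    return best(query)
```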
Malicious software is one of the most common threats in the digital world. It is very important to detect and prevent existing and new malware before it damages information assets, and machine learning approaches are used effectively for this purpose. In this study, we present a model that uses supervised and unsupervised learning algorithms together, with clustering used to enhance the prediction performance of the supervised classifiers. The proposed model aims to make predictions in the shortest possible time with high accuracy and F1 score. In the first stage of the model, the data are clustered with the k-means algorithm. In the second stage, predictions are made with the classifier combination that performs best for the relevant cluster. The best classifiers for each cluster are chosen from triple combinations of ten machine learning algorithms:

- kernel support vector machine;
- k-nearest neighbor;
- naive Bayes;
- decision tree;
- random forest;
- extreme gradient boosting;
- categorical boosting;
- adaptive boosting;
- extra trees;
- gradient boosting.

The selected triple classifier combination is positioned in two tiers, and placing the classifier with the longest prediction time in the second tier improves the model's prediction time. Clustering before classification is shown to improve prediction performance on three datasets: the Blue Hexagon Open Dataset for Malware Analysis (BODMAS), the Elastic Malware Benchmark for Empowering Researchers (EMBER) 2018, and the Kaggle malware detection dataset. The model achieves 99.74% accuracy and a 99.77% F1 score on BODMAS, 99.04% accuracy and a 98.63% F1 score on the Kaggle malware detection dataset, and 96.77% accuracy and a 96.77% F1 score on EMBER 2018. In addition, the tiered positioning of classifiers shortened the average prediction time by 76.13% on BODMAS and by 95.95% on EMBER 2018. The proposed method's prediction performance is better than that of the other studies in the literature that use the BODMAS and EMBER 2018 datasets.
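The two-tier idea can be sketched for a binary malware/benign label. The sketch assumes the k-means centroids are already computed and that each cluster has a triple of classifiers, with the slowest one in tier 2. If the two fast classifiers agree, the slow one cannot change a three-way majority vote, so it is skipped. This reading of the tiering is an assumption based on the abstract, and all names are hypothetical.

```python
import math

def nearest_centroid(x, centroids) -> int:
    """Index of the k-means centroid closest to x (centroids assumed precomputed)."""
    return min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))

def tiered_predict(x, centroids, combos):
    """combos[c] = (fast_a, fast_b, slow) for cluster c; binary labels assumed."""
    fast_a, fast_b, slow = combos[nearest_centroid(x, centroids)]
    a, b = fast_a(x), fast_b(x)
    if a == b:
        return a  # tier 1: two of three votes already agree, so the majority is decided
    return slow(x)  # tier 2: with binary labels the slow model's vote breaks the tie
```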
Land cover classification of mountainous environments remains a challenging remote sensing problem because of the landscape complexity of such regions. This study explored a multiple classifier system (MCS) approach to classifying mountain land cover in the Khumbu region of the Himalayas, using Sentinel-2 images and a cloud-based model framework. We investigated the relationship between classification accuracy and MCS diversity, and compared how different diversification and combination methods affect MCS classification performance in this environment. We present ten MCS models that implement a homogeneous ensemble approach, using the high-performing Random Forest (RF) algorithm as the base classifier. The base classifiers of each MCS model were built with different combinations of three diversity techniques: (1) distinct training sets, (2) Mean Decrease Accuracy feature selection, and (3) ‘One-vs-All’ problem reduction. The base classifier predictions of each RF MCS model were combined using (1) majority vote, (2) weighted argmax, and (3) a meta RF classifier. All MCS models reported higher classification accuracies than the benchmark classifier (overall accuracy with 95% confidence interval: 87.33% ± 0.97%). The best-performing model reported an overall accuracy (± 95% confidence interval) of 90.95% ± 0.84%. Our key findings include:

1. MCS is effective in mountainous environments prone to noise from landscape complexity.
2. Problem reduction appears to be a stronger method than feature selection for improving MCS diversity.
3. Although MCS diversity and accuracy are positively correlated, our results suggest the relationship is weak for mountainous classifications.
4. The selected diversity methods improve the MCS's discrimination of vegetation and forest classes in mountainous land cover classifications, and they have a cumulative effect on MCS diversity in this context.
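Two of the three combination rules named above can be sketched directly: majority vote over predicted labels, and weighted argmax over class probabilities. How the weights are chosen is an assumption; one option is per-classifier validation accuracy. The meta RF combiner is not shown.

```python
from collections import Counter

def weighted_argmax(prob_rows, weights) -> int:
    """Weight each base classifier's class probabilities, sum per class, and return the top class."""
    n_classes = len(prob_rows[0])
    totals = [sum(w * p[c] for p, w in zip(prob_rows, weights)) for c in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)

def majority_vote(labels):
    """Most frequent predicted label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]
```

Down-weighting a confident but less reliable base classifier can flip the weighted-argmax decision, as the test values illustrate.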
Funding: National Natural Science Foundation of China, Grant/Award Number: 61972261; Basic Research Foundations of Shenzhen, Grant/Award Numbers: JCYJ20210324093609026, JCYJ20200813091134001.
Abstract: In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High-Dimensional Imbalanced Classification (HDIC) problems, based on data processed using the Multi-Dimensional Scaling (MDS) feature extraction technique. First, the dimensionality of the original imbalanced data is reduced using MDS so that the distances between any two samples are preserved as well as possible. Second, a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in the low-dimensional data space. Third, the observation point mappings are optimised to obtain a reliable assessment of unknown samples. Exhaustive experiments have been conducted on seven benchmark HDIC data sets to evaluate the feasibility, rationality, and effectiveness of the proposed OPCE algorithm. Experimental results show that (1) the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data; (2) the OPCE algorithm identifies samples correctly as the number of optimised observation points increases; and (3) statistical analysis reveals that OPCE yields better HDIC performance on the selected data sets than eight other HDIC algorithms. This demonstrates that OPCE is a viable algorithm for HDIC problems.
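The first step reduces dimensionality while preserving pairwise distances. Assuming classical (Torgerson) metric MDS, which the abstract does not specify, the embedding can be written as follows, where $D^{(2)}$ is the $n \times n$ matrix of squared pairwise distances between the $n$ samples and $k$ is the target dimension:

$$
\begin{aligned}
J &= I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},\\
B &= -\tfrac{1}{2}\, J D^{(2)} J,\\
B &\approx V_k \Lambda_k V_k^{\top}, \qquad Y = V_k \Lambda_k^{1/2} \in \mathbb{R}^{n \times k},
\end{aligned}
$$

where $\Lambda_k$ holds the $k$ largest eigenvalues of $B$ and $V_k$ the corresponding eigenvectors. The rows of $Y$ are the low-dimensional samples; under this assumption, they form the space in which the observation points would then be placed.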
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 30525030, 60701015, and 60736029.
Abstract: The common spatial pattern (CSP) algorithm is a successful tool for feature estimation in brain-computer interfaces (BCIs). However, CSP is sensitive to outliers and may yield poor results because it is based on pooling the covariance matrices of trials. In this paper, we propose a simple yet effective approach, named the common spatial pattern ensemble (CSPE) classifier, to improve CSP performance. By dividing the recording channels, multiple CSP filters are constructed. Through projection, log-operation, and subtraction on the original signal, an ensemble classifier based on majority voting is obtained and outlier contamination is alleviated. Experimental results demonstrate that the proposed CSPE classifier is robust to various artifacts and achieves an average accuracy of 83.02%.
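The CSPE scheme can be sketched as follows. This is a minimal illustration, not the authors' implementation: the CSP filter pairs are assumed to be already estimated (estimating them requires an eigendecomposition of the class covariance matrices, omitted here), and the channel groups, filter values, and class labels "A"/"B" are hypothetical. Each member projects its own channel subset, takes the log-variance of both projections, subtracts them, and votes; the ensemble returns the majority.

```python
import math
from collections import Counter


def project(trial, w):
    """Apply a spatial filter w to one trial; trial[c][t] is channel c at sample t."""
    n_samples = len(trial[0])
    return [sum(w[c] * trial[c][t] for c in range(len(w))) for t in range(n_samples)]


def log_variance(signal):
    """Log of the (population) variance of a 1-D signal."""
    mean = sum(signal) / len(signal)
    return math.log(sum((s - mean) ** 2 for s in signal) / len(signal))


def member_vote(sub_trial, w_a, w_b):
    """One ensemble member: log-variance difference of two projections, thresholded at 0."""
    score = log_variance(project(sub_trial, w_a)) - log_variance(project(sub_trial, w_b))
    return "A" if score > 0 else "B"


def cspe_predict(trial, channel_groups, filter_pairs):
    """Majority vote over members, each built on its own subset of channels."""
    votes = []
    for group, (w_a, w_b) in zip(channel_groups, filter_pairs):
        sub_trial = [trial[c] for c in group]
        votes.append(member_vote(sub_trial, w_a, w_b))
    return Counter(votes).most_common(1)[0][0]


# Toy trial: channels 0 and 2 oscillate strongly, channels 1 and 3 weakly.
trial = [[1, -1, 1, -1], [0.1, -0.1, 0.1, -0.1], [2, -2, 2, -2], [0.2, -0.2, 0.2, -0.2]]
groups = [[0, 1], [2, 3], [0, 3]]
filters = [([1, 0], [0, 1])] * 3  # hypothetical filters; real ones come from CSP training
print(cspe_predict(trial, groups, filters))  # prints A
```

Using an odd number of members avoids tied votes in the two-class case; a contaminated channel only corrupts the members whose group contains it, which is the intuition behind the reported robustness to outliers.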
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 71271044 and 71572029.
Abstract: Ensemble classifiers have recently been shown to be an effective way to enhance prediction performance. However, they usually face the problem of how to construct an appropriate classifier from a set of complex data, for example, data with many dimensions or hierarchical attributes. This study proposes a method to construct an ensemble classifier based on key attributes. In addition to the high precision shared by common ensemble classifiers, its results are highly intelligible and thus easy to understand. Furthermore, experimental results based on real data collected from China Mobile show that the key-attributes-based ensemble classifier performs well in both classifier construction and customer churn prediction.
Funding: Supported in part by the Anhui Provincial Natural Science Foundation [1908085QG298, 1908085MG232], the National Natural Science Foundation of China [91546108, 61806068], the National Social Science Foundation of China [21BTJ002], the Anhui Provincial Science and Technology Major Projects Grant [201903a05020020], the Fundamental Research Funds for the Central Universities [Z2019HGTA0053, JZ2019HGBZ0128], the Humanities and Social Science Fund of the Ministry of Education of China [20YJA790021], the Major Project of Philosophy and Social Science Planning of Zhejiang Province [22YJRC07ZD], and the Open Research Fund Program of the Key Laboratory of Process Optimization and Intelligent Decision-Making (Hefei University of Technology), Ministry of Education.
Abstract: Multiple classifier systems exhibit stronger classification capacity than single classifiers, but they require significant computational resources. Selective ensemble systems aim to attain equivalent or better classification accuracy with fewer classifiers. However, current methods fail to identify precise solutions for constructing an ensemble classifier. In this study, we propose an ensemble classifier design technique based on the perturbation binary salp swarm algorithm (ECDPB). Because extreme learning machines (ELMs) learn rapidly and generalize well, they serve as the base classifiers for creating multiple candidates while using fewer computational resources. Meanwhile, we introduce a combined diversity measure that takes the complementarity and accuracy of ELMs into account; it is used to identify ELMs that have good diversity and low error. In addition, we propose an ECDPB with powerful optimization ability, which is employed to find the optimal subset of ELMs. The selected ELMs are then used to form an ensemble classifier. Experiments have been conducted on 10 benchmark datasets, and the results demonstrate that the proposed ECDPB delivers superior classification capacity compared with alternative methods.
Abstract: For many real-world multiobjective optimization problems, evaluating the objective functions is computationally expensive. Such problems are usually called expensive multiobjective optimization problems (EMOPs). One feasible type of approach for EMOPs is to introduce computationally efficient surrogates that reduce the number of function evaluations. Inspired by ensemble learning, this paper proposes a multiobjective evolutionary algorithm with an ensemble classifier (MOEA-EC) for EMOPs. More specifically, multiple decision tree models are used as an ensemble classifier for pre-selection, which helps reduce the number of function evaluations further than a single, inaccurate model would. Extensive experimental studies have been conducted to verify the efficiency of MOEA-EC by comparing it with several advanced multiobjective expensive optimization algorithms. The experimental results show that MOEA-EC outperforms the compared algorithms.
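A hedged sketch of classifier-based pre-selection follows. The abstract states only that MOEA-EC uses multiple decision trees; the tree depth, labelling rule, and bagging scheme here are assumptions. Depth-1 trees (stumps) are trained on bootstrap resamples of already-evaluated solutions labelled promising (1) or not (0), and only offspring that a majority of trees label as promising would be passed on to the expensive objective functions. The archive and offspring values are made up for illustration.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Depth-1 decision tree: best single (feature, threshold, direction) split."""
    if len(set(y)) == 1:  # bootstrap drew a single class: predict it constantly
        label = y[0]
        return lambda row: label
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                # sign=1: predict 1 when x <= thr; sign=-1: predict 1 when x >= thr
                preds = [1 if sign * (row[f] - thr) <= 0 else 0 for row in X]
                err = sum(p != t for p, t in zip(preds, y))
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    _, f, thr, sign = best
    return lambda row: 1 if sign * (row[f] - thr) <= 0 else 0


def fit_ensemble(X, y, n_models=15, seed=0):
    """Train each stump on a bootstrap resample of the evaluated archive."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def preselect(models, candidates):
    """Keep only candidates that a majority of models label as promising (1)."""
    kept = []
    for c in candidates:
        votes = Counter(m(c) for m in models)
        if votes[1] > votes[0]:
            kept.append(c)
    return kept


# Archive of already-evaluated solutions: label 1 = promising (e.g. non-dominated).
archive_X = [[0.1, 0.5], [0.2, 0.4], [0.8, 0.5], [0.9, 0.6]]
archive_y = [1, 1, 0, 0]
models = fit_ensemble(archive_X, archive_y)
offspring = [[0.0, 0.5], [1.0, 0.5]]
print(preselect(models, offspring))  # only these would get the expensive evaluation
```

Majority voting across bootstrapped trees is what makes the ensemble less fragile than one inaccurate surrogate: a single stump trained on an unlucky resample is outvoted by the others.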
Abstract: A robust smile recognition system could be widely used in many real-world applications. Classifying a facial smile in an unconstrained setting is difficult because of the wide variety of face images. In this paper, an adaptive model for smile expression classification is proposed that integrates a fast feature extraction algorithm and cascade classifiers. Our model takes advantage of the intrinsic association between face detection, smiles, and other facial features to alleviate over-fitting on the limited training set and improve classification results. Features are extracted so as to exclude unnecessary coefficients from the feature vector, thereby enhancing the discriminatory capacity of the extracted features and reducing computation. Still, the main causes of error in learning are noise, bias, and variance, and ensembles help to minimize these factors. Combining multiple classifiers decreases variance, especially for unstable classifiers, and may produce a more reliable classification than a single classifier. However, a shortcoming of bagging, considered the best ensemble classifier, is its random selection: classification performance depends on the chance of picking an appropriate subset of training items. To address this challenge, the proposed model employs a modified form of bagging when creating training sets (error-based bootstrapping). Experimental results for smile classification on the JAFFE, CK+, and CK+48 benchmark datasets show the feasibility of the proposed model.
Abstract: Deep learning is a powerful technique that is widely applied to image recognition and natural language processing, among many other tasks. In this work, we propose an efficient technique that uses pre-trained convolutional neural network (CNN) architectures to extract powerful image features for object recognition. We build on the existing concept of transferring learning from pre-trained CNNs to new databases through activations by proposing to consider multiple deep layers. We exploit the progressive learning that happens at the various intermediate layers of CNNs to construct Deep Multi-Layer (DM-L) feature extraction vectors that achieve excellent object recognition performance. Two popular pre-trained CNN models, VGG_16 and VGG_19, are used to extract feature sets from three deep fully connected layers inside the models, namely “fc6”, “fc7”, and “fc8”. Using principal component analysis (PCA), the dimensionality of the DM-L feature vectors is reduced to form powerful feature vectors that are fed to an external classifier ensemble for classification, instead of the softmax-based classification layers of the two original pre-trained CNN models. The proposed DM-L technique has been applied to the benchmark Caltech-101 object recognition database. Conventional wisdom may suggest that features extracted from the deepest layer, “fc8”, rather than “fc6”, will yield the best recognition performance, but our results show otherwise: for both models considered, the “fc6”-based feature vectors achieved the best recognition performance. State-of-the-art recognition performances of 91.17% and 91.35% have been achieved using the “fc6”-based feature vectors for the VGG_16 and VGG_19 models, respectively.
This recognition performance was achieved using 30 sample images per class, and the proposed system is capable of improved performance when all sample images per class are used. Our research shows that for CNN-based feature extraction, multiple layers should be considered, and the layer that maximizes recognition performance can then be selected.
Abstract: The departure of a good employee incurs direct and indirect costs and impacts for an organization. The direct costs range from hiring to training the replacement employee, and the replacement time and lost productivity affect the running of business processes. This work presents the use of ensemble classifiers to identify the attributes that significantly affect attrition. The data consist of attributes related to job function, education level, satisfaction with work and working relationships, compensation, and frequency of business travel. Both bagging and boosting classifiers were used for testing. The results show that the nine selected features achieve the same result as the full feature set. The selected features include age, income, working years, source of employment, years since last promotion, salary hike, and business travel frequency; they were selected using ensemble classifiers. According to the ensemble classifiers’ results, satisfaction with work and relationships does not appear to be a significant attribute for attrition.
Funding: Supported by the Center for Higher Education Funding (BPPT) and the Indonesia Endowment Fund for Education (LPDP), as acknowledged in decree number 02092/J5.2.3/BPI.06/9/2022.
Abstract: Considerable progress has been made in human action recognition (HAR), significantly impacting computer vision applications in sports analytics. However, identifying dynamic and complex movements in sports like badminton remains challenging because it requires precise recognition and better handling of complex motion patterns. Deep learning techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM), and graph convolutional networks (GCNs) improve recognition on large datasets, while traditional machine learning methods such as support vector machines (SVM), random forest (RF), and logistic regression (LR), combined with handcrafted features and ensemble approaches, perform well but struggle with the complexity of fast-paced sports like badminton. We propose an ensemble learning model combining SVM, LR, RF, and adaptive boosting (AdaBoost) for badminton action recognition. The data in this study consist of video recordings of badminton stroke techniques, from which spatiotemporal data have been extracted. The spatial features are the three-dimensional distances between each skeleton point and the right hip. The temporal features are the results of Fast Dynamic Time Warping (FDTW) calculations applied to 15 frames of each video sequence. The weighted ensemble model uses soft voting over SVM, LR, RF, and AdaBoost to improve badminton action recognition accuracy. The E2 ensemble model, which combines SVM, LR, and AdaBoost, achieves the highest accuracy of 95.38%.
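The two feature types can be illustrated with a small sketch. Assumptions: a frame is a list of (x, y, z) joint coordinates, `hip_index` (hypothetical) marks the right hip, and the temporal feature is a DTW cost between a per-joint distance trajectory and a reference trajectory. The paper uses Fast DTW (FDTW), an approximation of DTW; for clarity this sketch computes exact DTW instead, which is cheap for short 15-frame sequences.

```python
import math


def hip_distances(frame, hip_index):
    """Spatial features for one frame: 3-D distance of every joint to the right hip."""
    hip = frame[hip_index]
    return [math.dist(joint, hip) for joint in frame]


def dtw_distance(a, b):
    """Exact dynamic time warping cost between two 1-D sequences, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


frame = [(0, 0, 0), (3, 4, 0), (0, 0, 1)]  # joint 0 plays the right hip here
print(hip_distances(frame, hip_index=0))
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # same motion at a different pace
```

DTW tolerates differences in stroke speed (the repeated 2 above aligns at zero cost), which is why it suits comparing strokes performed at different tempos.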
Abstract: Redundancy, correlation, feature irrelevance, and missing samples are just a few of the problems that make software defect data difficult to analyze. Additionally, it can be challenging to maintain an even distribution of data between defective and non-defective software; in most experimental situations, data on non-defective software predominate in the dataset. The objective of this review study is to demonstrate the effectiveness of combining ensemble learning and feature selection to improve defect classification performance. Besides a successful feature selection approach, a novel variant of the ensemble learning technique is analyzed to address feature redundancy and data imbalance, providing robustness in the classification process. To overcome these problems and lessen their impact on fault classification performance, the authors carefully integrate effective feature selection with ensemble learning models. Forward selection demonstrates that a significant area under the receiver operating characteristic (ROC) curve can be attributed to only a small subset of features. The greedy forward selection (GFS) technique outperformed Pearson’s correlation method when feature selection techniques were evaluated on the datasets. Ensemble learners, such as random forests (RF) and the proposed average probability ensemble (APE), show greater resistance to the impact of weak features than weighted support vector machines (W-SVMs) and extreme learning machines (ELM). Furthermore, on the NASA and Java datasets, the enhanced average probability ensemble model, which combines the greedy forward selection technique with the average probability ensemble model, achieved a remarkably high area under the ROC curve, approaching 1.0. This review emphasizes the importance of carefully selecting the attributes in a software dataset to accurately classify defective components. In addition, the suggested ensemble learning model successfully addressed the aforementioned problems with software data and produced outstanding classification performance.
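Greedy forward selection, as named in the abstract, can be sketched generically. The `score` callback stands in for whatever evaluation the study used (for example, cross-validated ROC AUC of an ensemble trained on the candidate subset); the feature names and the toy additive score in the usage lines are hypothetical.

```python
def greedy_forward_selection(features, score, max_features=None):
    """Repeatedly add the feature that most improves score(subset); stop when none helps."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        candidate_scores = [(score(selected + [f]), f) for f in remaining]
        cand_score, cand = max(candidate_scores, key=lambda pair: pair[0])
        if cand_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score


# Toy score: a baseline of 0.5 plus a fixed, hypothetical contribution per feature.
weights = {"loc": 0.30, "cyclomatic": 0.15, "comments": 0.0, "noise": -0.05}
chosen, best = greedy_forward_selection(list(weights), lambda s: 0.5 + sum(weights[f] for f in s))
print(chosen, round(best, 2))
```

The search stops as soon as no feature adds value, so weak features ("comments") and harmful ones ("noise") are never added; this mirrors the abstract's observation that a small subset of features accounts for most of the ROC area.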
Funding: This paper was supported by Wonkwang University in 2022.
Abstract: In today’s digital world, millions of individuals are linked to one another via the Internet and social media, which opens up new avenues for exchanging information. Sentiment analysis (SA) has received a lot of attention during the last decade. In this study, we analyse the challenges of SA in Marathi, an Asian regional language, by providing a benchmark setup in which we first produced an annotated dataset composed of Marathi text acquired from microblogging websites such as Twitter. We also chose domain experts to manually annotate the Marathi microblogging posts with positive, negative, and neutral polarity. In addition, to show the efficient use of the annotated dataset, an ensemble-based model for sentiment analysis was created. Compared with other machine learning classifiers, the ensemble classifier achieved better performance under 10-fold cross-validation (CV), with an accuracy of 97.77% and an F-score of 97.89%.
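The 10-fold cross-validation protocol behind the reported Marathi results can be sketched as follows. `evaluate` is a hypothetical callback that would train the ensemble on `train_idx` and return its accuracy on `test_idx`; the data are assumed to be shuffled beforehand, and this simple version does not stratify folds by polarity.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most 1."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n_samples, evaluate, k=10):
    """Average evaluate(train_idx, test_idx) over k folds; each sample is tested exactly once."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k


print([len(f) for f in k_fold_indices(25)])
```

Because every post is held out exactly once, the averaged score estimates performance on unseen posts rather than on the data the ensemble was trained on.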
Abstract: Cardiotocography (CTG) represents the health of the fetus inside the womb during labor. However, assessing its readings can be a highly subjective process that depends on the expertise of the obstetrician. Digital signals from fetal monitors capture parameters (e.g., fetal heart rate, contractions, accelerations). Objective: This paper aims to classify imbalanced CTG readings of healthy, suspected, and pathological fetuses. Method: We perform two sets of experiments. First, we employ five classifiers, Random Forest (RF), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM), without over-sampling to classify CTG readings into three categories: healthy, suspected, and pathological. Second, we employ an ensemble of the above classifiers with an over-sampling method: a random over-sampling technique balances the CTG records used to train the ensemble models. We use 3602 CTG readings to train the ensemble classifiers and 1201 records to evaluate them. The outputs of these classifiers are then fed into a soft voting classifier to obtain the most accurate results. Results: Each classifier is evaluated by accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic curve (AUROC). The results reveal that the XGBoost, LGBM, and CatBoost classifiers yielded 99% accuracy. Conclusion: Using ensemble classifiers on a balanced CTG dataset improves detection accuracy compared with previous studies and with our first experiment. The soft voting classifier compensates for the weaknesses of individual classifiers, yielding superior performance of the overall model.
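The two ingredients of the second experiment, random over-sampling and soft voting, can be sketched in plain Python. This illustrates the general techniques under assumptions, not the study's pipeline: class names and probability values are hypothetical, and each classifier is assumed to output one probability per class in a fixed order. Over-sampling should be applied only to the training split (here, the 3602 training readings), never to the evaluation records.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def soft_vote(prob_lists, classes):
    """Average per-class probabilities over classifiers; return the argmax class and averages."""
    n = len(prob_lists)
    avg = [sum(p[c] for p in prob_lists) / n for c in range(len(classes))]
    return classes[max(range(len(classes)), key=lambda c: avg[c])], avg


classes = ["healthy", "suspected", "pathological"]
# Two members lean slightly to "suspected"; one is very confident in "healthy".
probs = [[0.45, 0.55, 0.0], [0.45, 0.55, 0.0], [0.9, 0.1, 0.0]]
print(soft_vote(probs, classes)[0])
```

In this example a hard majority vote would pick "suspected" (two of three members), whereas soft voting picks "healthy" because it weighs how confident each member is; this is the sense in which one classifier's weakness can be offset by the others.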
Abstract: Fuel injectors are an important component of combustion engines. Operational weaknesses can lead to complete machine malfunction, decreasing reliability and causing loss of production. To overcome these problems, various condition monitoring techniques can be applied. Acoustic signals are commonly used for fault diagnosis of rotating machinery. Advanced signal processing is used to construct features specialized in detecting fuel injector faults. A performance comparison between novelty detection algorithms in the form of one-class classifiers is presented. The one-class classifiers tested were the One-Class Support Vector Machine (OCSVM) and the One-Class Self-Organizing Map (OCSOM). The acoustic signals of fuel injectors under different operating conditions were processed for feature extraction, and the features from all signals were used as input to the one-class classifiers. The one-class classifiers were trained only on healthy fuel injector conditions and then tested on new experimental data from different operating conditions that were not included in the training set, so as to test generalization. The results demonstrate the effectiveness of one-class classifiers for detecting faults in fuel injectors.
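To make the one-class (novelty detection) setting concrete, the sketch below trains only on healthy feature vectors and flags anything that falls outside the healthy region. It deliberately swaps in a much simpler detector than the OCSVM and OCSOM evaluated in the study, a centroid-plus-radius rule, so only the training protocol (healthy data only) carries over; the feature values and the `margin` parameter are hypothetical.

```python
import math


def fit_one_class(healthy):
    """Fit on healthy samples only: their centroid plus the largest training distance as radius."""
    dim = len(healthy[0])
    centroid = [sum(x[d] for x in healthy) / len(healthy) for d in range(dim)]
    radius = max(math.dist(x, centroid) for x in healthy)
    return centroid, radius


def is_faulty(sample, model, margin=1.0):
    """Flag a sample as novel (a possible fault) if it lies outside the scaled healthy radius."""
    centroid, radius = model
    return math.dist(sample, centroid) > margin * radius


healthy_features = [[1.0, 1.0], [1.5, 0.5], [0.5, 1.5], [1.0, 1.0]]
model = fit_one_class(healthy_features)
print(is_faulty([1.25, 1.0], model), is_faulty([3.0, 3.0], model))
```

No fault examples are needed for training, which matches the practical situation the abstract describes: faulty operating conditions only need to look different from healthy ones.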
Funding: This project was supported by grants from the Ministry of Education Humanities and Social Sciences Research Fund Project.
Abstract: Objective: The annual influenza epidemic is a heavy burden on the health care system and has increasingly become a major public health problem in some areas, such as Hong Kong (China). Therefore, based on a variety of machine learning methods and considering seasonal influenza in Hong Kong, this study aims to establish a Combinatorial Judgment Classifier (CJC) model to classify the epidemic trend and improve the accuracy of influenza epidemic early warning.
Abstract: Automatic Speech Emotion Recognition (SER) is used to recognize emotion from speech automatically. SER works well in laboratory environments, but real-time emotion recognition is affected by variations in the speaker’s gender, age, and cultural and acoustic background. The acoustic resemblance between emotional expressions further increases the complexity of recognition. Many recent research works concentrate on addressing these effects individually. Instead of addressing each influencing attribute individually, we design a system that reduces the effect of any such factor. We propose a two-level hierarchical classifier named Interpreter of responses (IR). The first level of IR is realized using Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers. In the second level of IR, a discriminative SVM classifier is trained and tested with the meta information of the first-level classifiers, along with the input acoustic feature vector used by the primary classifiers. To train the system with a corpus of versatile nature, an integrated emotion corpus was composed from the emotion samples of five speech corpora, namely EMO-DB, IITKGP-SESC, the SAVEE corpus, a Spanish emotion corpus, and CMU’s Woogle corpus. The hierarchical classifier was trained and tested using MFCC and Low-Level Descriptors (LLD). The empirical analysis shows that the proposed classifier outperforms traditional classifiers. The proposed ensemble design is very generic and can be adapted even when the number and nature of features change; the first-level GMM or SVM classifiers may be replaced with any other learning algorithm.
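The data flow of the two-level Interpreter of responses design can be sketched as a stacking step: the second-level classifier sees the original acoustic feature vector concatenated with the first-level outputs. The model objects below are hypothetical callables standing in for the trained GMM/SVM classifiers and the second-level SVM; only the feature construction is illustrated.

```python
def meta_features(x, first_level_models):
    """Second-level input: the acoustic feature vector followed by every first-level output."""
    meta = list(x)
    for model in first_level_models:
        meta.extend(model(x))  # each model returns one score per emotion class
    return meta


def hierarchical_predict(x, first_level_models, second_level_model):
    """Run all first-level models, then let the second-level model decide on the combined vector."""
    return second_level_model(meta_features(x, first_level_models))


# Hypothetical stand-ins with two emotion classes ("angry", "neutral").
gmm_like = lambda x: [0.7, 0.3]
svm_like = lambda x: [0.4, 0.6]
second_level = lambda m: "angry" if m[-4] + m[-2] > m[-3] + m[-1] else "neutral"
print(meta_features([0.1, 0.2], [gmm_like, svm_like]))
print(hierarchical_predict([0.1, 0.2], [gmm_like, svm_like], second_level))
```

Swapping a first-level learner only changes what `meta_features` appends, which reflects the abstract's point that the design stays generic. When training the second level in practice, first-level outputs are usually produced on held-out folds so the second level does not learn from overfit scores.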
Funding: Supported by the Natural Science Foundation of Shaan’xi Province (2005F51).
Abstract: Most ensemble learning algorithms use a centralized model in which the training instances must be gathered on a single station, yet it is often difficult to centralize the training data in this way. A distributed ensemble learning algorithm is proposed that uses two kinds of instance weight genes, denoting the global distribution and the local distribution. Instead of the repeated sampling used in standard ensemble learning, non-balanced sampling from each station is used to train the base classifier set of each station. The concept of an effective nearby region for the local integration classifier is proposed and used for the dynamic integration of multiple classifiers in a distributed environment. Experiments show that the proposed distributed ensemble learning algorithm effectively reduces the time needed to train the base classifiers while keeping classification performance equal to that of the centralized learning method.
Abstract: Malicious software is one of the most common threats to the digital world, and it is very important to detect and prevent existing and new malware before it damages information assets. Machine learning approaches are used effectively for this purpose. In this study, we present a model in which supervised and unsupervised learning algorithms are used together: clustering is used to enhance the prediction performance of the supervised classifiers. The aim of the proposed model is to make predictions in the shortest possible time with high accuracy and F1 score. In the first stage of the model, the data are clustered with the k-means algorithm. In the second stage, the prediction is made with the classifier combination that has the best prediction performance for the related cluster. When choosing the best classifiers for the given clusters, triple combinations of ten machine learning algorithms (kernel support vector machine, k-nearest neighbor, naive Bayes, decision tree, random forest, extreme gradient boosting, categorical boosting, adaptive boosting, extra trees, and gradient boosting) are used. The selected triple classifier combination is positioned in two tiers, and the prediction time of the model is improved by placing the classifier with the longest prediction time in the second tier. Clustering before classification is shown to improve prediction performance on the Blue Hexagon Open Dataset for Malware Analysis (BODMAS), the Elastic Malware Benchmark for Empowering Researchers (EMBER) 2018, and the Kaggle malware detection datasets. The model achieves 99.74% accuracy and a 99.77% F1 score on the BODMAS dataset, 99.04% accuracy and a 98.63% F1 score on the Kaggle malware detection dataset, and 96.77% accuracy and a 96.77% F1 score on the EMBER 2018 dataset. In addition, the tiered positioning of classifiers shortened the average prediction time by 76.13% for the BODMAS dataset and 95.95% for the EMBER 2018 dataset. The proposed method’s prediction performance is better than that of other studies in the literature that use the BODMAS and EMBER 2018 datasets.
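A minimal sketch of the two-stage idea, under stated assumptions: k-means routes each sample to a cluster, and each cluster owns its own triple of binary classifiers (1 = malware). The abstract does not say exactly how the second tier is triggered; here we assume a majority vote of three in which the slowest classifier is consulted only when the two faster ones disagree, since their agreement already decides the majority. The classifiers in the usage lines are hypothetical callables.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the final centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def tiered_vote(x, fast_a, fast_b, slow):
    """Majority of three binary classifiers; the slow one runs only if the fast two disagree."""
    a, b = fast_a(x), fast_b(x)
    if a == b:
        return a, False  # majority already decided; slow classifier skipped
    return slow(x), True  # the slow vote breaks the tie


def predict(x, centroids, per_cluster):
    """Route x to its cluster, then apply that cluster's classifier triple."""
    return tiered_vote(x, *per_cluster[nearest(x, centroids)])


pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
cents = kmeans(pts, 2)
ia, ib = nearest([0.5, 0.5], cents), nearest([10.5, 10.5], cents)
table = {ia: (lambda x: 0, lambda x: 0, lambda x: 1), ib: (lambda x: 1, lambda x: 0, lambda x: 1)}
print(predict([0.2, 0.3], cents, table), predict([10.2, 10.4], cents, table))
```

The second element of each result records whether the slow classifier had to run; the fewer disagreements there are among the fast classifiers, the larger the time saving from placing the slowest one in the second tier.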
Abstract: Land cover classification of mountainous environments continues to be a challenging remote sensing problem owing to the landscape complexity of such regions. This study explored a multiple classifier system (MCS) approach to classifying mountain land cover in the Khumbu region of the Himalayas using Sentinel-2 images and a cloud-based model framework. The relationship between classification accuracy and MCS diversity was investigated, and the effects of different diversification and combination methods on MCS classification performance were comparatively assessed for this environment. We present ten MCS models that implement a homogeneous ensemble approach, using the high-performing Random Forest (RF) algorithm as the base classifier. The base classifiers of each MCS model were developed using different combinations of three diversity techniques: (1) distinct training sets, (2) Mean Decrease Accuracy feature selection, and (3) ‘One-vs-All’ problem reduction. The base classifier predictions of each RFMCS model were combined using (1) majority vote, (2) weighted argmax, and (3) a meta RF classifier. All MCS models reported higher classification accuracies than the benchmark classifier (overall accuracy ± 95% confidence interval: 87.33% ± 0.97%), with the highest-performing model reporting an overall accuracy of 90.95% ± 0.84%. Our key findings are: (1) MCS is effective in mountainous environments prone to noise from landscape complexity; (2) problem reduction appears to be a stronger method than feature selection for improving MCS diversity; (3) although MCS diversity and accuracy are positively correlated, our results suggest that this relationship is weak for mountainous classifications; and (4) the selected diversity methods improve the discriminability of MCS for vegetation and forest classes in mountainous land cover classification and exhibit a cumulative effect on MCS diversity in this context.
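Two of the three combination rules can be sketched directly. What the per-classifier weights represent is an assumption here (for example, each base classifier's validation accuracy), and the land cover labels and weight values are hypothetical. The third rule, a meta RF classifier, would instead be trained on the base predictions (stacking) and is not shown.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Most frequent class among base-classifier predictions (ties: first seen wins)."""
    return Counter(labels).most_common(1)[0][0]


def weighted_argmax(labels, weights):
    """Add each classifier's weight to the class it predicts; return the heaviest class."""
    totals = defaultdict(float)
    for label, w in zip(labels, weights):
        totals[label] += w
    return max(totals, key=totals.get)


base_predictions = ["forest", "shrub", "shrub"]
base_weights = [0.9, 0.3, 0.4]  # hypothetical per-classifier reliabilities
print(majority_vote(base_predictions), weighted_argmax(base_predictions, base_weights))
```

The two rules can disagree: two weak classifiers outvote one strong classifier under a majority vote, but not under weighting, which is one reason the choice of combination method affects MCS accuracy.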