期刊文献+
共找到5篇文章
< 1 >
每页显示 20 50 100
Fast Maximum Entropy Machine for Big Imbalanced Datasets
1
作者 Feng Yin Shuqing Lin +1 位作者 Chuxin Piao Shuguang(Robert)Cui 《Journal of Communications and Information Networks》 2018年第3期20-30,共11页
Driven by the need of a plethora of machine learning applications,several attempts have been made at improving the performance of classifiers applied to imbalanced datasets.In this paper,we present a fast maximum entr... Driven by the need of a plethora of machine learning applications,several attempts have been made at improving the performance of classifiers applied to imbalanced datasets.In this paper,we present a fast maximum entropy machine(MEM)combined with a synthetic minority over-sampling technique for handling binary classification problems with high imbalance ratios,large numbers of data samples,and medium/large numbers of features.A random Fourier feature representation of kernel functions and primal estimated sub-gradient solver for support vector machine(PEGASOS)are applied to speed up the classic MEM.Experiments have been conducted using various real datasets(including two China Mobile datasets and several other standard test datasets)with various configurations.The obtained results demonstrate that the proposed algorithm has extremely low complexity but an excellent overall classification performance(in terms of several widely used evaluation metrics)as compared to the classic MEM and some other state-of-the-art methods.The proposed algorithm is particularly valuable in big data applications owing to its significantly low computational complexity. 展开更多
关键词 binary classification imbalanced datasets maximum entropy machine PEGASOS random Fourier feature SMOTE
原文传递
An Imbalanced Dataset and Class Overlapping Classification Model for Big Data
2
作者 Mini Prince P.M.Joe Prathap 《Computer Systems Science & Engineering》 SCIE EI 2023年第2期1009-1024,共16页
Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imba... Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imbalance arises.When dealing with large datasets,most traditional classifiers are stuck in the local optimum problem.As a result,it’s necessary to look into new methods for dealing with large data collections.Several solutions have been proposed for overcoming this issue.The rapid growth of the available data threatens to limit the usefulness of many traditional methods.Methods such as oversampling and undersampling have shown great promises in addressing the issues of class imbalance.Among all of these techniques,Synthetic Minority Oversampling TechniquE(SMOTE)has produced the best results by generating synthetic samples for the minority class in creating a balanced dataset.The issue is that their practical applicability is restricted to problems involving tens of thousands or lower instances of each.In this paper,we have proposed a parallel mode method using SMOTE and MapReduce strategy,this distributes the operation of the algorithm among a group of computational nodes for addressing the aforementioned problem.Our proposed solution has been divided into three stages.Thefirst stage involves the process of splitting the data into different blocks using a mapping function,followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algo-rithm for solving the class imbalanced problem.On each map block,a decision tree model would be constructed.Finally,the decision tree blocks would be com-bined for creating a classification model.We have used numerous datasets with up to 4 million instances in our experiments for testing the proposed scheme’s cap-abilities.As a result,the Hybrid SMOTE appears to have good scalability within the framework proposed,and it also cuts down the processing time. 展开更多
关键词 imbalanced dataset class overlapping SMOTE MAPREDUCE parallel programming OVERSAMPLING
下载PDF
AMDnet:An Academic Misconduct Detection Method for Authors’Behaviors
3
作者 Shihao Zhou Ziyuan Xu +2 位作者 JinHan Xingming Sun Yi Cao 《Computers, Materials & Continua》 SCIE EI 2022年第6期5995-6009,共15页
In recent years,academic misconduct has been frequently exposed by the media,with serious impacts on the academic community.Current research on academic misconduct focuses mainly on detecting plagiarism in article con... In recent years,academic misconduct has been frequently exposed by the media,with serious impacts on the academic community.Current research on academic misconduct focuses mainly on detecting plagiarism in article content through the application of character-based and non-text element detection techniques over the entirety of a manuscript.For the most part,these techniques can only detect cases of textual plagiarism,which means that potential culprits can easily avoid discovery through clever editing and alterations of text content.In this paper,we propose an academic misconduct detection method based on scholars’submission behaviors.The model can effectively capture the atypical behavioral approach and operation of the author.As such,it is able to detect various types of misconduct,thereby improving the accuracy of detection when combined with a text content analysis.The model learns by forming a dual network group that processes text features and user behavior features to detect potential academic misconduct.First,the effect of scholars’behavioral features on the model are considered and analyzed.Second,the Synthetic Minority Oversampling Technique(SMOTE)is applied to address the problem of imbalanced samples of positive and negative classes among contributing scholars.Finally,the text features of the papers are combined with the scholars’behavioral data to improve recognition precision.Experimental results on the imbalanced dataset demonstrate that our model has a highly satisfactory performance in terms of accuracy and recall. 展开更多
关键词 Academic misconduct neural network imbalanced dataset
下载PDF
Tackling imbalanced data in cybersecurity with transfer learning: a case with ROP payload detection
4
作者 Haizhou Wang Anoop Singhal Peng Liu 《Cybersecurity》 EI CSCD 2023年第2期29-43,共15页
In recent years,deep learning gained proliferating popularity in the cybersecurity application domain,since when being compared to traditional machine learning methods,it usually involves less human efforts,produces b... In recent years,deep learning gained proliferating popularity in the cybersecurity application domain,since when being compared to traditional machine learning methods,it usually involves less human efforts,produces better results,and provides better generalizability.However,the imbalanced data issue is very common in cybersecurity,which can substantially deteriorate the performance of the deep learning models.This paper introduces a transfer learning based method to tackle the imbalanced data issue in cybersecurity using return-oriented programming payload detection as a case study.We achieved 0.0290 average false positive rate,0.9705 average F1 score and 0.9521 average detection rate on 3 different target domain programs using 2 different source domain programs,with 0 benign training data sample in the target domain.The performance improvement compared to the baseline is a trade-off between false positive rate and detection rate.Using our approach,the total number of false positives is reduced by 23.16%,and as a trade-off,the number of detected malicious samples decreases by 0.68%. 展开更多
关键词 Domain adaptation Return-oriented programming imbalanced dataset
原文传递
Electricity Theft Detection Method Based on Ensemble Learning and Prototype Learning
5
作者 Xinwu Sun Jiaxiang Hu +4 位作者 Zhenyuan Zhang Di Cao Qi Huang Zhe Chen Weihao Hu 《Journal of Modern Power Systems and Clean Energy》 SCIE EI CSCD 2024年第1期213-224,共12页
With the development of advanced metering infrastructure(AMI),large amounts of electricity consumption data can be collected for electricity theft detection.However,the imbalance of electricity consumption data is vio... With the development of advanced metering infrastructure(AMI),large amounts of electricity consumption data can be collected for electricity theft detection.However,the imbalance of electricity consumption data is violent,which makes the training of detection model challenging.In this case,this paper proposes an electricity theft detection method based on ensemble learning and prototype learning,which has great performance on imbalanced dataset and abnormal data with different abnormal level.In this paper,convolutional neural network(CNN)and long short-term memory(LSTM)are employed to obtain abstract feature from electricity consumption data.After calculating the means of the abstract feature,the prototype per class is obtained,which is used to predict the labels of unknown samples.In the meanwhile,through training the network by different balanced subsets of training set,the prototype is representative.Compared with some mainstream methods including CNN,random forest(RF)and so on,the proposed method has been proved to effectively deal with the electricity theft detection when abnormal data only account for 2.5%and 1.25%of normal data.The results show that the proposed method outperforms other state-of-the-art methods. 展开更多
关键词 Electricity theft detection ensemble learning prototype learning imbalanced dataset deep learning abnormal level
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部