5 articles found
Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions
1
Authors: Minh Thanh Vo, Anh H. Vo, Trang Nguyen, Rohit Sharma, Tuong Le. Computers, Materials & Continua (SCIE, EI), 2021, Issue 7, pp. 521-535 (15 pages)
In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. However, the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs. This causes a reduction in the predictability and performance of traditional machine learning models. We therefore present an efficient framework that uses an oversampling technique called FJD-OT (Fake Job Description Detection Using Oversampling Techniques) to improve the predictability of detecting fake job descriptions. In the proposed framework, we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module. We then use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract the features from the text data to create the feature dataset in the second module. Next, our framework applies k-fold cross-validation, a commonly used technique to test the effectiveness of machine learning models, that splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.
Keywords: fake job description detection; class imbalance problem; oversampling techniques
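The oversampling step described in this abstract can be illustrated with a minimal sketch. It uses plain SMOTE-style interpolation between minority samples, not the SVMSMOTE variant named in the abstract, and it balances only the training fold so that the test fold keeps its original class distribution. The function names `smote_like_oversample` and `balance_training_fold`, the parameter `k`, and the toy data are assumptions made for illustration; they do not come from the paper.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (plain SMOTE idea).
    Assumes the minority list holds at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(max(0, n_new)):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def balance_training_fold(X_train, y_train, minority_label=1, seed=0):
    """Oversample the minority class of one training fold until both
    classes have the same size. The test fold is never touched."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_majority = len(y_train) - len(minority)
    extra = smote_like_oversample(minority, n_majority - len(minority), seed=seed)
    return X_train + extra, y_train + [minority_label] * len(extra)


# Toy training fold: four "genuine" rows (label 0), two "fake" rows (label 1).
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [6.0, 6.0]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance_training_fold(X, y)
```

In a k-fold loop, `balance_training_fold` would be called once per fold on the training split only; resampling before the split would leak synthetic copies of test rows into training.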
MCBC-SMOTE:A Majority Clustering Model for Classification of Imbalanced Data
2
Authors: Jyoti Arora, Meena Tushir, Keshav Sharma, Lalit Mohan, Aman Singh, Abdullah Alharbi, Wael Alosaimi. Computers, Materials & Continua (SCIE, EI), 2022, Issue 12, pp. 4801-4817 (17 pages)
Datasets with the imbalanced class distribution are difficult to handle with the standard classification algorithms. In supervised learning, dealing with the problem of class imbalance is still considered to be a challenging research problem. Various machine learning techniques are designed to operate on balanced datasets; therefore, the state of the art, different undersampling, over-sampling and hybrid strategies have been proposed to deal with the problem of imbalanced datasets, but highly skewed datasets still pose the problem of generalization and noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for classification of imbalanced datasets known as MCBC-SMOTE (Majority Clustering for balanced Classification-SMOTE). The model provides a method to convert the problem of binary classification into a multi-class problem. In the proposed algorithm, the number of clusters for the majority class is calculated using the elbow method and the minority class is over-sampled as an average of clustered majority classes to generate a symmetrical class distribution. The proposed technique is cost-effective, reduces the problem of noise generation and successfully disables the imbalances present in between and within classes. The results of the evaluations on diverse real datasets proved to provide better classification results as compared to state of the art existing methodologies based on several performance metrics.
Keywords: imbalance class problem; classification; SMOTE; K-means; clustering; sampling
Optimized Stacked Autoencoder for IoT Enabled Financial Crisis Prediction Model (cited by: 2)
3
Authors: Mesfer Al Duhayyim, Hadeel Alsolai, Fahd N. Al-Wesabi, Nadhem Nemri, Hany Mahgoub, Anwer Mustafa Hilal, Manar Ahmed Hamza, Mohammed Rizwanullah. Computers, Materials & Continua (SCIE, EI), 2022, Issue 4, pp. 1079-1094 (16 pages)
Recently, Financial Technology (FinTech) has received more attention among financial sectors and researchers to derive effective solutions for any financial institution or firm. Financial crisis prediction (FCP) is an essential topic in business sector that finds it useful to identify the financial condition of a financial institution. At the same time, the development of the internet of things (IoT) has altered the mode of human interaction with the physical world. The IoT can be combined with the FCP model to examine the financial data from the users and perform decision making process. This paper presents a novel multi-objective squirrel search optimization algorithm with stacked autoencoder (MOSSA-SAE) model for FCP in IoT environment. The MOSSA-SAE model encompasses different subprocesses namely preprocessing, class imbalance handling, parameter tuning, and classification. Primarily, the MOSSA-SAE model allows the IoT devices such as smartphones, laptops, etc., to collect the financial details of the users which are then transmitted to the cloud for further analysis. In addition, SMOTE technique is employed to handle class imbalance problems. The goal of MOSSA in SMOTE is to determine the oversampling rate and area of nearest neighbors of SMOTE. Besides, SAE model is utilized as a classification technique to determine the class label of the financial data. At the same time, the MOSSA is applied to appropriately select the ‘weights’ and ‘bias’ values of the SAE. An extensive experimental validation process is performed on the benchmark financial dataset and the results are examined under distinct aspects. The experimental values ensured the superior performance of the MOSSA-SAE model on the applied dataset.
Keywords: financial data; financial crisis prediction; class imbalance problem; internet of things; stacked autoencoder
A New Classifier for Imbalanced Data Based on a Generalized Density Ratio Model
4
Authors: Junjun Li, Wenquan Cui. Communications in Mathematics and Statistics (SCIE, CSCD), 2023, Issue 2, pp. 369-401 (33 pages)
Achieving higher true positive rate when decreasing false positive rate is always a great challenge to the imbalance learning community. This work combines penalized empirical likelihood method, lower bound algorithm and Nyström method and applies these techniques along with kernel method to density ratio model. The resulting classifier, density ratio classifier (DRC), is a combination of kernelization, regularization, efficient implementation and threshold moving, all of which are critical to enable DRC to be an effective and powerful method for solving difficult imbalance problems. Compared with other methods, DRC is competitive in that it is widely applicable and it is simple and easy to use without additional imbalance handling skills. In addition, the convergence rate of the estimate of log density ratio is discussed as well. And the results of numerical analysis also show that DRC outperforms other methods in AUC and G-mean score.
Keywords: classifier; density ratio model; imbalance problems; kernel method; ROC curve
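Two notions from this abstract, the G-mean score and threshold moving, can be made concrete with a short sketch. It does not implement the density ratio classifier itself: the scores below are hypothetical outputs of an arbitrary classifier, and the helper names `g_mean` and `best_threshold` are assumptions made for illustration. G-mean is taken here in its usual sense, the geometric mean of the true positive rate and the true negative rate.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of true positive rate and true negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def predict(scores, threshold):
    """Label a sample positive when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def best_threshold(y_true, scores):
    """Threshold moving: try each observed score as the cut-off and keep
    the one that maximises G-mean on the given data."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: g_mean(y_true, predict(scores, c)))


# Hypothetical scores for four negatives followed by two positives.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.45]

default_score = g_mean(y_true, predict(scores, 0.5))  # no positive is found
moved = best_threshold(y_true, scores)
moved_score = g_mean(y_true, predict(scores, moved))
```

On imbalanced data a fixed cut-off of 0.5 often labels every sample as the majority class, which drives G-mean to zero; moving the threshold trades a few false positives for recovered minority samples. In practice the threshold would be chosen on validation data, not on the test set.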
SEQUENCE-BASED PROTEIN-PROTEIN INTERACTION PREDICTION VIA SUPPORT VECTOR MACHINE (cited by: 1)
5
Authors: Yongcui WANG, Jiguang WANG, Zhixia YANG, Naiyang DENG. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2010, Issue 5, pp. 1012-1023 (12 pages)
This paper develops sequence-based methods for identifying novel protein-protein interactions (PPIs) by means of support vector machines (SVMs). The authors encode proteins not only in the gene level but also in the amino acid level, and design a procedure to select negative training set for dealing with the training dataset imbalance problem, i.e., the number of interacting protein pairs is scarce relative to large scale non-interacting protein pairs. The proposed methods are validated on PPIs data of Plasmodium falciparum and Escherichia coli, and yield the predictive accuracy of 93.8% and 95.3%, respectively. The functional annotation analysis and database search indicate that our novel predictions are worthy of future experimental validation. The new methods will be useful supplementary tools for the future proteomics studies.
Keywords: imbalance problem; protein-protein interactions; sequence-based; support vector machine