Abstract: In recent years, detecting fake job descriptions has become increasingly necessary, because social networking has changed the way people access the rapidly growing volume of information in the internet age. Identifying fraud in job descriptions can help jobseekers avoid many of the risks of job hunting. However, the task runs into the problem of class imbalance, because genuine jobs outnumber fake ones. This imbalance reduces the predictive performance of traditional machine learning models. We therefore present an efficient framework called FJD-OT (Fake Job Description Detection Using Oversampling Techniques), which uses oversampling to improve the detection of fake job descriptions. In the first module, we preprocess the text data with several techniques, including stop-word removal and tokenization. In the second module, we use a bag-of-words representation combined with term frequency-inverse document frequency (TF-IDF) to extract features from the text and build the feature dataset. Next, the framework applies k-fold cross-validation, a commonly used technique for testing the effectiveness of machine learning models, to split the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method balances the data, before the classifiers are trained in the last module. The experimental results indicate that the proposed approach significantly improves fake job description detection on the ESA dataset according to several popular performance metrics.
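The module order described in this abstract (preprocess, extract TF-IDF features, split into folds, oversample only the training part) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the stop-word list, the function names, and the idf formula ln(N / df) are assumptions, and a simple interpolation between random minority pairs stands in for SVMSMOTE, which additionally uses an SVM to focus on borderline samples.

```python
import math
import random
from collections import Counter

# Illustrative stop-word list (an assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "for", "in", "is", "at"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if w not in STOP_WORDS]

def tfidf(docs):
    """Bag-of-words counts weighted by idf = ln(N / df). Returns (vocab, vectors)."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    df = Counter(w for t in tokens for w in set(t))  # document frequency per word
    n = len(docs)
    vectors = []
    for t in tokens:
        counts = Counter(t)
        vectors.append([counts[w] * math.log(n / df[w]) for w in vocab])
    return vocab, vectors

def oversample(X, y, rng):
    """Stand-in for SVMSMOTE: interpolate between random minority pairs until balanced.

    Labels are assumed to be 1 for fake (minority) and 0 for genuine (majority).
    """
    minority = [x for x, label in zip(X, y) if label == 1]
    n_majority = len(y) - len(minority)
    X_out, y_out = list(X), list(y)
    if not minority:  # nothing to interpolate from
        return X_out, y_out
    while sum(y_out) < n_majority:
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        X_out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_out.append(1)
    return X_out, y_out

def cross_validation_splits(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test); only the training part is oversampled."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_tr, y_tr = oversample([X[j] for j in train_idx], [y[j] for j in train_idx], rng)
        yield X_tr, y_tr, [X[j] for j in test_idx], [y[j] for j in test_idx]
```

Splitting before oversampling keeps the test folds free of synthetic samples, so the reported metrics reflect the original class distribution.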
Funding: This research was supported by the Taif University Researchers Supporting Project (number TURSP-2020/254), Taif University, Taif, Saudi Arabia.
Abstract: Datasets with an imbalanced class distribution are difficult to handle with standard classification algorithms. In supervised learning, class imbalance is still considered a challenging research problem. Various machine learning techniques are designed to operate on balanced datasets; therefore, different undersampling, oversampling, and hybrid strategies have been proposed in the state of the art to deal with imbalanced datasets, but highly skewed datasets still pose problems of generalization and of noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for the classification of imbalanced datasets, known as MCBC-SMOTE (Majority Clustering for balanced Classification-SMOTE). The model converts the binary classification problem into a multi-class problem. In the proposed algorithm, the number of clusters for the majority class is calculated using the elbow method, and the minority class is oversampled as an average of the clustered majority classes to generate a symmetrical class distribution. The proposed technique is cost-effective, reduces noise generation, and addresses the imbalances present both between and within classes. Evaluations on diverse real datasets show that it gives better classification results than existing state-of-the-art methodologies on several performance metrics.
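The idea of clustering the majority class and relabelling the clusters as separate classes can be sketched as follows. This is one possible reading of the abstract, not the published MCBC-SMOTE algorithm: the deterministic farthest-point k-means initialisation, the second-difference elbow rule, the interpretation of "an average of clustered majority classes" as a target size equal to the mean cluster size, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter

def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-point initialisation. Returns (labels, inertia)."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(dist2(p, c) for c in centers))))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
    inertia = sum(dist2(p, centers[l]) for p, l in zip(points, labels))
    return labels, inertia

def elbow_k(points, k_max=5):
    """Pick the k with the largest second difference of the inertia curve."""
    w = [kmeans(points, k)[1] for k in range(1, k_max + 1)]  # w[k - 1] is the inertia at k
    return max(range(2, k_max), key=lambda k: w[k - 2] - 2 * w[k - 1] + w[k])

def cluster_and_balance(majority, minority, seed=0):
    """Relabel majority clusters as classes 1..k and oversample the minority (class 0)."""
    rng = random.Random(seed)
    k = elbow_k(majority)
    cluster_labels, _ = kmeans(majority, k)
    X = [list(p) for p in majority] + [list(p) for p in minority]
    y = [c + 1 for c in cluster_labels] + [0] * len(minority)
    target = round(len(majority) / k)  # assumed target: mean majority-cluster size
    while y.count(0) < target:
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        X.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])  # SMOTE-style interpolation
        y.append(0)
    return k, X, y

# Toy data: three well-separated majority blobs of four points each, two minority points.
majority = [(bx + dx, by + dy) for bx, by in [(0, 0), (10, 0), (0, 10)]
            for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]
minority = [(5, 5), (6, 5)]
k, X, y = cluster_and_balance(majority, minority)
print(k, sorted(Counter(y).items()))
```

On this toy data the binary problem (12 majority versus 2 minority points) becomes a four-class problem with four points per class, which is the kind of symmetrical distribution the abstract describes.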
Abstract: Recently, financial technology (FinTech) has received growing attention from the financial sector and from researchers seeking effective solutions for financial institutions and firms. Financial crisis prediction (FCP) is an essential topic in the business sector, where it is used to identify the financial condition of a financial institution. At the same time, the development of the Internet of Things (IoT) has altered the way humans interact with the physical world. The IoT can be combined with an FCP model to examine users' financial data and support decision making. This paper presents a novel multi-objective squirrel search optimization algorithm with stacked autoencoder (MOSSA-SAE) model for FCP in an IoT environment. The MOSSA-SAE model comprises several subprocesses, namely preprocessing, class imbalance handling, parameter tuning, and classification. First, the model allows IoT devices such as smartphones and laptops to collect users' financial details, which are then transmitted to the cloud for further analysis. Next, the SMOTE technique is employed to handle class imbalance; the role of MOSSA in SMOTE is to determine the oversampling rate and the area of nearest neighbors. The SAE model is then used as the classifier to determine the class label of the financial data, and MOSSA is also applied to select the weight and bias values of the SAE. An extensive experimental validation is performed on a benchmark financial dataset, and the results are examined from several aspects. The experimental results confirm the superior performance of the MOSSA-SAE model on the applied dataset.
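To make concrete what the two tuned SMOTE parameters control, here is a minimal sketch of plain SMOTE in Python. It assumes that the "oversampling rate" means the number of synthetic points generated per minority sample and that the "area of nearest neighbors" means the neighbour count k; the function name and signature are hypothetical. The MOSSA search that would choose these two values is not shown.

```python
import random

def smote(minority, rate, k, seed=0):
    """Generate `rate` synthetic points per minority sample.

    Each synthetic point lies on the segment between a sample and one of its
    k nearest minority neighbours. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of x by squared Euclidean distance
        neighbours = sorted(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        for _ in range(rate):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Four minority samples, two synthetic points each, drawn from the two nearest neighbours.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
new_points = smote(samples, rate=2, k=2)
print(len(new_points))
```

A larger rate adds more synthetic samples, while a larger k lets interpolation reach more distant neighbours, which can blur class boundaries; this trade-off is the reason both values are worth tuning.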