In recent years,the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age.Identifying fraud in job ...In recent years,the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age.Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting.However,the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs.This causes a reduction in the predictability and performance of traditional machine learning models.We therefore present an efficient framework that uses an oversampling technique called FJD-OT(Fake Job Description Detection Using Oversampling Techniques)to improve the predictability of detecting fake job descriptions.In the proposed framework,we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module.We then use a bag of words in combination with the term frequency-inverse document frequency(TF-IDF)approach to extract the features from the text data to create the feature dataset in the second module.Next,our framework applies k-fold cross-validation,a commonly used technique to test the effectiveness of machine learning models,that splits the experimental dataset[the Employment Scam Aegean(ESA)dataset in our study]into training and test sets for evaluation.The training set is passed through the third module,an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module.The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.展开更多
文摘In recent years,the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age.Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting.However,the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs.This causes a reduction in the predictability and performance of traditional machine learning models.We therefore present an efficient framework that uses an oversampling technique called FJD-OT(Fake Job Description Detection Using Oversampling Techniques)to improve the predictability of detecting fake job descriptions.In the proposed framework,we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module.We then use a bag of words in combination with the term frequency-inverse document frequency(TF-IDF)approach to extract the features from the text data to create the feature dataset in the second module.Next,our framework applies k-fold cross-validation,a commonly used technique to test the effectiveness of machine learning models,that splits the experimental dataset[the Employment Scam Aegean(ESA)dataset in our study]into training and test sets for evaluation.The training set is passed through the third module,an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module.The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.