A dandelion algorithm(DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters need to be set by experience in DA,which might not be appropriate for all...A dandelion algorithm(DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters need to be set by experience in DA,which might not be appropriate for all optimization problems. A self-adapting and efficient dandelion algorithm is proposed in this work to lower the number of DA's parameters and simplify DA's structure. Only the normal sowing operator is retained;while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions with less time consumption than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection(CCFD), and the results indicate that it can obtain higher classification and detection performance than the-state-of-the-art methods.展开更多
With the popularity of online payment, how to perform creditcard fraud detection more accurately has also become a hot issue. And withthe emergence of the adaptive boosting algorithm (Adaboost), credit cardfraud detec...With the popularity of online payment, how to perform creditcard fraud detection more accurately has also become a hot issue. And withthe emergence of the adaptive boosting algorithm (Adaboost), credit cardfraud detection has started to use this method in large numbers, but thetraditional Adaboost is prone to overfitting in the presence of noisy samples.Therefore, in order to alleviate this phenomenon, this paper proposes a newidea: using the number of consecutive sample misclassifications to determinethe noisy samples, while constructing a penalty factor to reconstruct thesample weight assignment. Firstly, the theoretical analysis shows that thetraditional Adaboost method is overfitting in a noisy training set, which leadsto the degradation of classification accuracy. To this end, the penalty factorconstructed by the number of consecutive misclassifications of samples isused to reconstruct the sample weight assignment to prevent the classifierfrom over-focusing on noisy samples, and its reasonableness is demonstrated.Then, by comparing the penalty strength of the three different penalty factorsproposed in this paper, a more reasonable penalty factor is selected.Meanwhile, in order to make the constructed model more in line with theactual requirements on training time consumption, the Adaboost algorithmwith adaptive weight trimming (AWTAdaboost) is used in this paper, so thepenalty factor-based AWTAdaboost (PF_AWTAdaboost) is finally obtained.Finally, PF_AWTAdaboost is experimentally validated against other traditionalmachine learning algorithms on credit card fraud datasets and otherdatasets. The results show that the PF_AWTAdaboost method has betterperformance, including detection accuracy, model recall and robustness, thanother methods on the credit card fraud dataset. And the PF_AWTAdaboostmethod also shows excellent generalization performance on other datasets.From the experimental results, it is shown that the PF_AWTAdaboost algorithmhas better classification performance.展开更多
Credit Card Fraud Detection(CCFD)is an essential technology for banking institutions to control fraud risks and safeguard their reputation.Class imbalance and insufficient representation of feature data relating to cr...Credit Card Fraud Detection(CCFD)is an essential technology for banking institutions to control fraud risks and safeguard their reputation.Class imbalance and insufficient representation of feature data relating to credit card transactions are two prevalent issues in the current study field of CCFD,which significantly impact classification models’performance.To address these issues,this research proposes a novel CCFD model based on Multifeature Fusion and Generative Adversarial Networks(MFGAN).The MFGAN model consists of two modules:a multi-feature fusion module for integrating static and dynamic behavior data of cardholders into a unified highdimensional feature space,and a balance module based on the generative adversarial network to decrease the class imbalance ratio.The effectiveness of theMFGAN model is validated on two actual credit card datasets.The impacts of different class balance ratios on the performance of the four resamplingmodels are analyzed,and the contribution of the two different modules to the performance of the MFGAN model is investigated via ablation experiments.Experimental results demonstrate that the proposed model does better than state-of-the-art models in terms of recall,F1,and Area Under the Curve(AUC)metrics,which means that the MFGAN model can help banks find more fraudulent transactions and reduce fraud losses.展开更多
The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud,particularly in credit card transactions.Advanced technologies such as machin...The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud,particularly in credit card transactions.Advanced technologies such as machine learning have been widely employed to enhance the early detection and prevention of losses arising frompotentially fraudulent activities.However,a prevalent approach in existing literature involves the use of extensive data sampling and feature selection algorithms as a precursor to subsequent investigations.While sampling techniques can significantly reduce computational time,the resulting dataset relies on generated data and the accuracy of the pre-processing machine learning models employed.Such datasets often lack true representativeness of realworld data,potentially introducing secondary issues that affect the precision of the results.For instance,undersampling may result in the loss of critical information,while over-sampling can lead to overfitting machine learning models.In this paper,we proposed a classification study of credit card fraud using fundamental machine learning models without the application of any sampling techniques on all the features present in the original dataset.The results indicate that Support Vector Machine(SVM)consistently achieves classification performance exceeding 90%across various evaluation metrics.This discovery serves as a valuable reference for future research,encouraging comparative studies on original dataset without the reliance on sampling techniques.Furthermore,we explore hybrid machine learning techniques,such as ensemble learning constructed based on SVM,K-Nearest Neighbor(KNN)and decision tree,highlighting their potential advancements in the field.The study demonstrates that the proposed machine learning models yield promising results,suggesting that pre-processing the dataset with sampling algorithm or additional machine learning technique may not always be necessary.This research contributes to the field of credit card fraud detection by emphasizing the potential of employing machine learning models directly on original datasets,thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems.展开更多
Credit card fraud remains a significant challenge, with financial losses and consumer protection at stake. This study addresses the need for practical, real-time fraud detection methodologies. Using a Kaggle credit ca...Credit card fraud remains a significant challenge, with financial losses and consumer protection at stake. This study addresses the need for practical, real-time fraud detection methodologies. Using a Kaggle credit card dataset, I tackle class imbalance using the Synthetic Minority Oversampling Technique (SMOTE) to enhance modeling efficiency. I compare several machine learning algorithms, including Logistic Regression, Linear Discriminant Analysis, K-nearest Neighbors, Classification and Regression Tree, Naive Bayes, Support Vector, Random Forest, XGBoost, and Light Gradient-Boosting Machine to classify transactions as fraud or genuine. Rigorous evaluation metrics, such as AUC, PRAUC, F1, KS, Recall, and Precision, identify the Random Forest as the best performer in detecting fraudulent activities. The Random Forest model successfully identifies approximately 92% of transactions scoring 90 and above as fraudulent, equating to a detection rate of over 70% for all fraudulent transactions in the test dataset. Moreover, the model captures more than half of the fraud in each bin of the test dataset. SHAP values provide model explainability, with the SHAP summary plot highlighting the global importance of individual features, such as “V12” and “V14”. SHAP force plots offer local interpretability, revealing the impact of specific features on individual predictions. This study demonstrates the potential of machine learning, particularly the Random Forest model, for real-time credit card fraud detection, offering a promising approach to mitigate financial losses and protect consumers.展开更多
Credit card fraudulent data is highly imbalanced, and it has presented an overwhelmingly large portion of nonfraudulent transactions and a small portion of fraudulent transactions. The measures used to judge the verac...Credit card fraudulent data is highly imbalanced, and it has presented an overwhelmingly large portion of nonfraudulent transactions and a small portion of fraudulent transactions. The measures used to judge the veracity of the detection algorithms become critical to the deployment of a model that accurately scores fraudulent transactions taking into account case imbalance, and the cost of identifying a case as genuine when, in fact, the case is a fraudulent transaction. In this paper, a new criterion to judge classification algorithms, which considers the cost of misclassification, is proposed, and several undersampling techniques are compared by this new criterion. At the same time, a weighted support vector machine (SVM) algorithm considering the financial cost of misclassification is introduced, proving to be more practical for credit card fraud detection than traditional methodologies. This weighted SVM uses transaction balances as weights for fraudulent transactions, and a uniformed weight for nonfraudulent transactions. The results show this strategy greatly improve performance of credit card fraud detection.展开更多
In recent years,the rapid development of e-commerce exposes great vulnerabilities in online transactions for fraudsters to exploit.Credit card transactions take a salient role in nowadays’online transactions for its ...In recent years,the rapid development of e-commerce exposes great vulnerabilities in online transactions for fraudsters to exploit.Credit card transactions take a salient role in nowadays’online transactions for its obvious advantages including discounts and earning credit card points.So credit card fraudulence has become a target of concern.In order to deal with the situation,credit card fraud detection based on machine learning is been studied recently.Yet,it is difficult to detect fraudulent transactions due to data imbalance(normal and fraudulent transactions),for which Smote algorithm is proposed in order to resolve data imbalance.The assessment of Light Gradient Boosting Machine model which proposed in the paper depends much on datasets collected from clients’daily transactions.Besides,to prove the new model’s superiority in detecting credit card fraudulence,Light Gradient Boosting Machine model is compared with Random Forest and Gradient Boosting Machine algorithm in the experiment.The results indicate that Light Gradient Boosting Machine model has a good performance.The experiment in credit card fraud detection based on Light Gradient Boosting Machine model achieved a total recall rate of 99%in real dataset and fast feedback,which proves the new model’s efficiency in detecting credit card fraudulence.展开更多
Credit card fraud is a wide-ranging issue for financial institutions, involving theft and fraud committed using a payment card. In this paper, we explore the application of linear and nonlinear statistical modeling an...Credit card fraud is a wide-ranging issue for financial institutions, involving theft and fraud committed using a payment card. In this paper, we explore the application of linear and nonlinear statistical modeling and machine learning models on real credit card transaction data. The models built are supervised fraud models that attempt to identify which transactions are most likely fraudulent. We discuss the processes of data exploration, data cleaning, variable creation, feature selection, model algorithms, and results. Five different supervised models are explored and compared including logistic regression, neural networks, random forest, boosted tree and support vector machines. The boosted tree model shows the best fraud detection result (FDR = 49.83%) for this particular data set. The resulting model can be utilized in a credit card fraud detection system. A similar model development process can be performed in related business domains such as insurance and telecommunications, to avoid or detect fraudulent activity.展开更多
基金supported by the Institutional Fund Projects(IFPIP-1481-611-1443)the Key Projects of Natural Science Research in Anhui Higher Education Institutions(2022AH051909)+1 种基金the Provincial Quality Project of Colleges and Universities in Anhui Province(2022sdxx020,2022xqhz044)Bengbu University 2021 High-Level Scientific Research and Cultivation Project(2021pyxm04)。
文摘A dandelion algorithm(DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters need to be set by experience in DA,which might not be appropriate for all optimization problems. A self-adapting and efficient dandelion algorithm is proposed in this work to lower the number of DA's parameters and simplify DA's structure. Only the normal sowing operator is retained;while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions with less time consumption than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection(CCFD), and the results indicate that it can obtain higher classification and detection performance than the-state-of-the-art methods.
基金This research was funded by Innovation and Entrepreneurship Training Program for College Students in Hunan Province in 2022(3915).
文摘With the popularity of online payment, how to perform creditcard fraud detection more accurately has also become a hot issue. And withthe emergence of the adaptive boosting algorithm (Adaboost), credit cardfraud detection has started to use this method in large numbers, but thetraditional Adaboost is prone to overfitting in the presence of noisy samples.Therefore, in order to alleviate this phenomenon, this paper proposes a newidea: using the number of consecutive sample misclassifications to determinethe noisy samples, while constructing a penalty factor to reconstruct thesample weight assignment. Firstly, the theoretical analysis shows that thetraditional Adaboost method is overfitting in a noisy training set, which leadsto the degradation of classification accuracy. To this end, the penalty factorconstructed by the number of consecutive misclassifications of samples isused to reconstruct the sample weight assignment to prevent the classifierfrom over-focusing on noisy samples, and its reasonableness is demonstrated.Then, by comparing the penalty strength of the three different penalty factorsproposed in this paper, a more reasonable penalty factor is selected.Meanwhile, in order to make the constructed model more in line with theactual requirements on training time consumption, the Adaboost algorithmwith adaptive weight trimming (AWTAdaboost) is used in this paper, so thepenalty factor-based AWTAdaboost (PF_AWTAdaboost) is finally obtained.Finally, PF_AWTAdaboost is experimentally validated against other traditionalmachine learning algorithms on credit card fraud datasets and otherdatasets. The results show that the PF_AWTAdaboost method has betterperformance, including detection accuracy, model recall and robustness, thanother methods on the credit card fraud dataset. And the PF_AWTAdaboostmethod also shows excellent generalization performance on other datasets.From the experimental results, it is shown that the PF_AWTAdaboost algorithmhas better classification performance.
基金supported by the National Key R&D Program of China(Nos.2022YFB3104103,and 2019QY1406)the National Natural Science Foundation of China(Nos.61732022,61732004,61672020,and 62072131).
文摘Credit Card Fraud Detection(CCFD)is an essential technology for banking institutions to control fraud risks and safeguard their reputation.Class imbalance and insufficient representation of feature data relating to credit card transactions are two prevalent issues in the current study field of CCFD,which significantly impact classification models’performance.To address these issues,this research proposes a novel CCFD model based on Multifeature Fusion and Generative Adversarial Networks(MFGAN).The MFGAN model consists of two modules:a multi-feature fusion module for integrating static and dynamic behavior data of cardholders into a unified highdimensional feature space,and a balance module based on the generative adversarial network to decrease the class imbalance ratio.The effectiveness of theMFGAN model is validated on two actual credit card datasets.The impacts of different class balance ratios on the performance of the four resamplingmodels are analyzed,and the contribution of the two different modules to the performance of the MFGAN model is investigated via ablation experiments.Experimental results demonstrate that the proposed model does better than state-of-the-art models in terms of recall,F1,and Area Under the Curve(AUC)metrics,which means that the MFGAN model can help banks find more fraudulent transactions and reduce fraud losses.
文摘The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud,particularly in credit card transactions.Advanced technologies such as machine learning have been widely employed to enhance the early detection and prevention of losses arising frompotentially fraudulent activities.However,a prevalent approach in existing literature involves the use of extensive data sampling and feature selection algorithms as a precursor to subsequent investigations.While sampling techniques can significantly reduce computational time,the resulting dataset relies on generated data and the accuracy of the pre-processing machine learning models employed.Such datasets often lack true representativeness of realworld data,potentially introducing secondary issues that affect the precision of the results.For instance,undersampling may result in the loss of critical information,while over-sampling can lead to overfitting machine learning models.In this paper,we proposed a classification study of credit card fraud using fundamental machine learning models without the application of any sampling techniques on all the features present in the original dataset.The results indicate that Support Vector Machine(SVM)consistently achieves classification performance exceeding 90%across various evaluation metrics.This discovery serves as a valuable reference for future research,encouraging comparative studies on original dataset without the reliance on sampling techniques.Furthermore,we explore hybrid machine learning techniques,such as ensemble learning constructed based on SVM,K-Nearest Neighbor(KNN)and decision tree,highlighting their potential advancements in the field.The study demonstrates that the proposed machine learning models yield promising results,suggesting that pre-processing the dataset with sampling algorithm or additional machine learning technique may not always be necessary.This research contributes to the field of credit card fraud detection by emphasizing the potential of employing machine learning models directly on original datasets,thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems.
文摘Credit card fraud remains a significant challenge, with financial losses and consumer protection at stake. This study addresses the need for practical, real-time fraud detection methodologies. Using a Kaggle credit card dataset, I tackle class imbalance using the Synthetic Minority Oversampling Technique (SMOTE) to enhance modeling efficiency. I compare several machine learning algorithms, including Logistic Regression, Linear Discriminant Analysis, K-nearest Neighbors, Classification and Regression Tree, Naive Bayes, Support Vector, Random Forest, XGBoost, and Light Gradient-Boosting Machine to classify transactions as fraud or genuine. Rigorous evaluation metrics, such as AUC, PRAUC, F1, KS, Recall, and Precision, identify the Random Forest as the best performer in detecting fraudulent activities. The Random Forest model successfully identifies approximately 92% of transactions scoring 90 and above as fraudulent, equating to a detection rate of over 70% for all fraudulent transactions in the test dataset. Moreover, the model captures more than half of the fraud in each bin of the test dataset. SHAP values provide model explainability, with the SHAP summary plot highlighting the global importance of individual features, such as “V12” and “V14”. SHAP force plots offer local interpretability, revealing the impact of specific features on individual predictions. This study demonstrates the potential of machine learning, particularly the Random Forest model, for real-time credit card fraud detection, offering a promising approach to mitigate financial losses and protect consumers.
文摘Credit card fraudulent data is highly imbalanced, and it has presented an overwhelmingly large portion of nonfraudulent transactions and a small portion of fraudulent transactions. The measures used to judge the veracity of the detection algorithms become critical to the deployment of a model that accurately scores fraudulent transactions taking into account case imbalance, and the cost of identifying a case as genuine when, in fact, the case is a fraudulent transaction. In this paper, a new criterion to judge classification algorithms, which considers the cost of misclassification, is proposed, and several undersampling techniques are compared by this new criterion. At the same time, a weighted support vector machine (SVM) algorithm considering the financial cost of misclassification is introduced, proving to be more practical for credit card fraud detection than traditional methodologies. This weighted SVM uses transaction balances as weights for fraudulent transactions, and a uniformed weight for nonfraudulent transactions. The results show this strategy greatly improve performance of credit card fraud detection.
文摘In recent years,the rapid development of e-commerce exposes great vulnerabilities in online transactions for fraudsters to exploit.Credit card transactions take a salient role in nowadays’online transactions for its obvious advantages including discounts and earning credit card points.So credit card fraudulence has become a target of concern.In order to deal with the situation,credit card fraud detection based on machine learning is been studied recently.Yet,it is difficult to detect fraudulent transactions due to data imbalance(normal and fraudulent transactions),for which Smote algorithm is proposed in order to resolve data imbalance.The assessment of Light Gradient Boosting Machine model which proposed in the paper depends much on datasets collected from clients’daily transactions.Besides,to prove the new model’s superiority in detecting credit card fraudulence,Light Gradient Boosting Machine model is compared with Random Forest and Gradient Boosting Machine algorithm in the experiment.The results indicate that Light Gradient Boosting Machine model has a good performance.The experiment in credit card fraud detection based on Light Gradient Boosting Machine model achieved a total recall rate of 99%in real dataset and fast feedback,which proves the new model’s efficiency in detecting credit card fraudulence.
文摘Credit card fraud is a wide-ranging issue for financial institutions, involving theft and fraud committed using a payment card. In this paper, we explore the application of linear and nonlinear statistical modeling and machine learning models on real credit card transaction data. The models built are supervised fraud models that attempt to identify which transactions are most likely fraudulent. We discuss the processes of data exploration, data cleaning, variable creation, feature selection, model algorithms, and results. Five different supervised models are explored and compared including logistic regression, neural networks, random forest, boosted tree and support vector machines. The boosted tree model shows the best fraud detection result (FDR = 49.83%) for this particular data set. The resulting model can be utilized in a credit card fraud detection system. A similar model development process can be performed in related business domains such as insurance and telecommunications, to avoid or detect fraudulent activity.