Funding: Supported by the Institutional Fund Projects (IFPIP-1481-611-1443), the Key Projects of Natural Science Research in Anhui Higher Education Institutions (2022AH051909), the Provincial Quality Project of Colleges and Universities in Anhui Province (2022sdxx020, 2022xqhz044), and the Bengbu University 2021 High-Level Scientific Research and Cultivation Project (2021pyxm04).
Abstract: The dandelion algorithm (DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters must be set by experience, and such settings might not be appropriate for all optimization problems. This work proposes a self-adapting and efficient dandelion algorithm that lowers the number of DA's parameters and simplifies DA's structure. Only the normal sowing operator is retained, while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions with less time consumption than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection (CCFD), and the results indicate that it can obtain higher classification and detection performance than state-of-the-art methods.
Funding: This research was funded by the Innovation and Entrepreneurship Training Program for College Students in Hunan Province in 2022 (3915).
Abstract: With the popularity of online payment, performing credit card fraud detection more accurately has become a hot issue. Since the emergence of the adaptive boosting algorithm (Adaboost), credit card fraud detection has used this method widely, but traditional Adaboost is prone to overfitting in the presence of noisy samples. To alleviate this phenomenon, this paper proposes a new idea: using the number of consecutive misclassifications of a sample to identify noisy samples, and constructing a penalty factor to reconstruct the sample weight assignment. First, theoretical analysis shows that traditional Adaboost overfits on a noisy training set, which degrades classification accuracy. To this end, a penalty factor constructed from the number of consecutive misclassifications of samples is used to reconstruct the sample weight assignment, preventing the classifier from over-focusing on noisy samples, and its reasonableness is demonstrated. Then, by comparing the penalty strength of the three penalty factors proposed in this paper, a more reasonable penalty factor is selected. Meanwhile, to bring the constructed model more in line with practical requirements on training time, the Adaboost algorithm with adaptive weight trimming (AWTAdaboost) is used, which yields the penalty factor-based AWTAdaboost (PF_AWTAdaboost). Finally, PF_AWTAdaboost is experimentally validated against other traditional machine learning algorithms on credit card fraud datasets and other datasets. The results show that PF_AWTAdaboost performs better than other methods on the credit card fraud dataset in detection accuracy, model recall, and robustness. It also shows excellent generalization performance on other datasets. Overall, the experimental results show that PF_AWTAdaboost has better classification performance.
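The idea of penalizing samples that are misclassified many rounds in a row can be illustrated with a minimal sketch. The abstract does not give the form of the three penalty factors, so the factor used here, 1 / (1 + excess streak), and the names `update_weights` and `streak_limit` are assumptions made purely for illustration; they are not the paper's definitions.

```python
import math

def update_weights(weights, misclassified, streaks, alpha, streak_limit=3):
    """One round of AdaBoost-style weight updates with a streak-based penalty.

    weights:       current sample weights
    misclassified: list of bools, True where the weak learner was wrong
    streaks:       consecutive-misclassification counts before this round
    alpha:         the weak learner's vote weight, as in standard AdaBoost
    streak_limit:  hypothetical threshold after which a sample is treated as noisy
    """
    # A correct prediction resets the streak; a mistake extends it.
    new_streaks = [s + 1 if wrong else 0 for s, wrong in zip(streaks, misclassified)]
    raw = []
    for w, wrong, s in zip(weights, misclassified, new_streaks):
        # Standard AdaBoost step: up-weight mistakes, down-weight correct samples.
        factor = math.exp(alpha if wrong else -alpha)
        # Assumed penalty (not from the paper): shrink the weight of samples
        # whose streak exceeds the limit, so likely-noisy samples stop dominating.
        penalty = 1.0 / (1.0 + max(0, s - streak_limit))
        raw.append(w * factor * penalty)
    total = sum(raw)
    return [r / total for r in raw], new_streaks

# Sample 3 has already been misclassified five times in a row.
new_w, new_s = update_weights(
    weights=[0.25, 0.25, 0.25, 0.25],
    misclassified=[False, True, False, True],
    streaks=[0, 0, 0, 5],
    alpha=0.5,
)
```

In this toy round, samples 1 and 3 are both misclassified, but sample 3 ends up with a smaller weight than sample 1 because its long streak triggers the assumed penalty.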
Funding: Supported by the National Key R&D Program of China (Nos. 2022YFB3104103 and 2019QY1406) and the National Natural Science Foundation of China (Nos. 61732022, 61732004, 61672020, and 62072131).
Abstract: Credit Card Fraud Detection (CCFD) is an essential technology for banking institutions to control fraud risks and safeguard their reputation. Class imbalance and insufficient representation of feature data relating to credit card transactions are two prevalent issues in current CCFD research, and both significantly impact the performance of classification models. To address these issues, this research proposes a novel CCFD model based on Multifeature Fusion and Generative Adversarial Networks (MFGAN). The MFGAN model consists of two modules: a multi-feature fusion module for integrating static and dynamic behavior data of cardholders into a unified high-dimensional feature space, and a balance module based on the generative adversarial network to decrease the class imbalance ratio. The effectiveness of the MFGAN model is validated on two actual credit card datasets. The impacts of different class balance ratios on the performance of the four resampling models are analyzed, and the contribution of the two modules to the performance of the MFGAN model is investigated via ablation experiments. Experimental results demonstrate that the proposed model outperforms state-of-the-art models in terms of recall, F1, and Area Under the Curve (AUC) metrics, which means that the MFGAN model can help banks find more fraudulent transactions and reduce fraud losses.
Abstract: Two men were getting ready to leave a pub. 'You can’t live with them and you can’t live without them,' one of the men grumbled. 'That’s the way women are, pal,' agreed the other. 'Who said anything about women?' snarled the first fellow. 'I’m talking about credit cards.'
Abstract: The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud, particularly in credit card transactions. Advanced technologies such as machine learning have been widely employed to enhance the early detection and prevention of losses arising from potentially fraudulent activities. However, a prevalent approach in existing literature involves the use of extensive data sampling and feature selection algorithms as a precursor to subsequent investigations. While sampling techniques can significantly reduce computational time, the resulting dataset relies on generated data and the accuracy of the pre-processing machine learning models employed. Such datasets often lack true representativeness of real-world data, potentially introducing secondary issues that affect the precision of the results. For instance, undersampling may result in the loss of critical information, while oversampling can lead to overfitting in machine learning models. In this paper, we propose a classification study of credit card fraud using fundamental machine learning models, without applying any sampling techniques, on all the features present in the original dataset. The results indicate that Support Vector Machine (SVM) consistently achieves classification performance exceeding 90% across various evaluation metrics. This finding serves as a valuable reference for future research, encouraging comparative studies on the original dataset without reliance on sampling techniques. Furthermore, we explore hybrid machine learning techniques, such as ensemble learning constructed from SVM, K-Nearest Neighbor (KNN), and decision tree, highlighting their potential advancements in the field. The study demonstrates that the proposed machine learning models yield promising results, suggesting that pre-processing the dataset with a sampling algorithm or an additional machine learning technique may not always be necessary. This research contributes to the field of credit card fraud detection by emphasizing the potential of employing machine learning models directly on original datasets, thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems.
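One common way to build an ensemble from three classifiers such as SVM, KNN, and a decision tree is hard majority voting over their predicted labels. The abstract does not say how its ensemble combines the base models, so the sketch below is an assumption for illustration only, and the prediction lists are made-up toy values rather than results from the study.

```python
from collections import Counter

def majority_vote(*model_predictions):
    """Combine per-model label lists into one list by hard majority vote."""
    combined = []
    # zip(*...) groups the votes of all models for the same transaction.
    for votes in zip(*model_predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Toy labels (1 = fraud, 0 = genuine) from three hypothetical base models.
svm_pred = [0, 1, 1, 0, 1]
knn_pred = [0, 0, 1, 0, 1]
tree_pred = [1, 1, 0, 0, 1]

ensemble_pred = majority_vote(svm_pred, knn_pred, tree_pred)
print(ensemble_pred)  # [0, 1, 1, 0, 1]
```

With three binary voters a tie cannot occur, which is one reason an odd number of base models is convenient for this scheme.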
Abstract: Credit card fraud remains a significant challenge, with financial losses and consumer protection at stake. This study addresses the need for practical, real-time fraud detection methodologies. Using a Kaggle credit card dataset, I tackle class imbalance with the Synthetic Minority Oversampling Technique (SMOTE) to enhance modeling efficiency. I compare several machine learning algorithms, including Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbors, Classification and Regression Tree, Naive Bayes, Support Vector Machine, Random Forest, XGBoost, and Light Gradient-Boosting Machine, to classify transactions as fraudulent or genuine. Rigorous evaluation metrics, such as AUC, PRAUC, F1, KS, Recall, and Precision, identify the Random Forest as the best performer in detecting fraudulent activities. The Random Forest model successfully identifies approximately 92% of transactions scoring 90 and above as fraudulent, equating to a detection rate of over 70% for all fraudulent transactions in the test dataset. Moreover, the model captures more than half of the fraud in each bin of the test dataset. SHAP values provide model explainability, with the SHAP summary plot highlighting the global importance of individual features, such as “V12” and “V14”. SHAP force plots offer local interpretability, revealing the impact of specific features on individual predictions. This study demonstrates the potential of machine learning, particularly the Random Forest model, for real-time credit card fraud detection, offering a promising approach to mitigate financial losses and protect consumers.
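The core step of SMOTE, creating a synthetic minority sample by interpolating between a real minority sample and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration and not the study's pipeline: the function name `smote_sketch`, the default `k`, and the toy points are assumptions, and a production workflow would normally rely on an established implementation instead.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation position in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy fraud-class points with two features each (illustrative values only).
fraud_points = [[1.0, 2.0], [1.5, 2.5], [3.0, 4.0]]
new_points = smote_sketch(fraud_points, n_new=4)
```

Because every synthetic point is a convex combination of two real minority points, it always stays inside the range spanned by the original minority samples on each feature.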
Abstract: Credit card fraud data is highly imbalanced: nonfraudulent transactions make up an overwhelmingly large portion, and fraudulent transactions only a small portion. The measures used to judge the veracity of detection algorithms therefore become critical to deploying a model that accurately scores fraudulent transactions, taking into account both the case imbalance and the cost of identifying a case as genuine when it is in fact fraudulent. In this paper, a new criterion for judging classification algorithms, which considers the cost of misclassification, is proposed, and several undersampling techniques are compared by this new criterion. At the same time, a weighted support vector machine (SVM) algorithm that considers the financial cost of misclassification is introduced, proving to be more practical for credit card fraud detection than traditional methodologies. This weighted SVM uses transaction balances as weights for fraudulent transactions and a uniform weight for nonfraudulent transactions. The results show that this strategy greatly improves the performance of credit card fraud detection.
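The weighting scheme described above (transaction balance as the weight for a fraudulent transaction, one uniform weight for every nonfraudulent one) can be sketched as follows, together with a simple cost-based score. The abstract does not define its new criterion, so `misclassification_cost` and its fixed `false_alarm_cost` are assumed stand-ins for illustration, and all amounts are toy values.

```python
def sample_weights(amounts, labels, genuine_weight=1.0):
    """Per-sample weights: the transaction amount for fraud, a constant otherwise."""
    return [amt if y == 1 else genuine_weight for amt, y in zip(amounts, labels)]

def misclassification_cost(amounts, labels, predictions, false_alarm_cost=1.0):
    """Assumed cost criterion (not the paper's definition).

    A missed fraud costs the full transaction amount; a false alarm on a
    genuine transaction costs a fixed, hypothetical handling fee.
    """
    cost = 0.0
    for amt, y, p in zip(amounts, labels, predictions):
        if y == 1 and p == 0:
            cost += amt
        elif y == 0 and p == 1:
            cost += false_alarm_cost
    return cost

amounts = [120.0, 15.0, 800.0, 40.0]
labels = [0, 1, 1, 0]        # 1 = fraud, 0 = genuine
predictions = [1, 1, 0, 0]   # one false alarm, one missed fraud

weights = sample_weights(amounts, labels)
total_cost = misclassification_cost(amounts, labels, predictions)
print(weights)     # [1.0, 15.0, 800.0, 1.0]
print(total_cost)  # 801.0
```

Weights of this shape could then be handed to a learner that accepts per-sample weights, so that missing a large fraudulent transaction is penalized more heavily during training than missing a small one.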