Funding: Supported by the Institutional Fund Projects (IFPIP-1481-611-1443), the Key Projects of Natural Science Research in Anhui Higher Education Institutions (2022AH051909), the Provincial Quality Project of Colleges and Universities in Anhui Province (2022sdxx020, 2022xqhz044), and the Bengbu University 2021 High-Level Scientific Research and Cultivation Project (2021pyxm04).
Abstract: A dandelion algorithm (DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters must be set by experience, and such settings might not be appropriate for all optimization problems. A self-adapting and efficient dandelion algorithm is proposed in this work to reduce the number of DA's parameters and simplify DA's structure. Only the normal sowing operator is retained, while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions with less time consumption than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection (CCFD), and the results indicate that it can obtain higher classification and detection performance than state-of-the-art methods.
Funding: Supported by the National Key R&D Program of China (Nos. 2022YFB3104103 and 2019QY1406) and the National Natural Science Foundation of China (Nos. 61732022, 61732004, 61672020, and 62072131).
Abstract: Credit Card Fraud Detection (CCFD) is an essential technology for banking institutions to control fraud risks and safeguard their reputation. Class imbalance and insufficient representation of feature data relating to credit card transactions are two prevalent issues in current CCFD research, and both significantly impact the performance of classification models. To address these issues, this research proposes a novel CCFD model based on Multifeature Fusion and Generative Adversarial Networks (MFGAN). The MFGAN model consists of two modules: a multi-feature fusion module for integrating static and dynamic behavior data of cardholders into a unified high-dimensional feature space, and a balance module based on the generative adversarial network to decrease the class imbalance ratio. The effectiveness of the MFGAN model is validated on two actual credit card datasets. The impacts of different class balance ratios on the performance of the four resampling models are analyzed, and the contribution of the two modules to the performance of the MFGAN model is investigated via ablation experiments. Experimental results demonstrate that the proposed model outperforms state-of-the-art models in terms of recall, F1, and Area Under the Curve (AUC) metrics, which means that the MFGAN model can help banks find more fraudulent transactions and reduce fraud losses.
Funding: This research was funded by the Innovation and Entrepreneurship Training Program for College Students in Hunan Province in 2022 (3915).
Abstract: With the popularity of online payment, how to perform credit card fraud detection more accurately has become a hot issue. With the emergence of the adaptive boosting algorithm (Adaboost), credit card fraud detection has started to use this method widely, but the traditional Adaboost is prone to overfitting in the presence of noisy samples. Therefore, to alleviate this phenomenon, this paper proposes a new idea: using the number of consecutive misclassifications of a sample to identify noisy samples, while constructing a penalty factor to reconstruct the sample weight assignment. Firstly, theoretical analysis shows that the traditional Adaboost method overfits on a noisy training set, which degrades classification accuracy. To this end, the penalty factor constructed from the number of consecutive misclassifications of samples is used to reconstruct the sample weight assignment to prevent the classifier from over-focusing on noisy samples, and its reasonableness is demonstrated. Then, by comparing the penalty strength of the three different penalty factors proposed in this paper, a more reasonable penalty factor is selected. Meanwhile, to bring the constructed model more in line with actual requirements on training time, the Adaboost algorithm with adaptive weight trimming (AWTAdaboost) is used, so the penalty factor-based AWTAdaboost (PF_AWTAdaboost) is finally obtained. Finally, PF_AWTAdaboost is experimentally validated against other traditional machine learning algorithms on credit card fraud datasets and other datasets. The results show that the PF_AWTAdaboost method performs better than other methods on the credit card fraud dataset in detection accuracy, model recall, and robustness. The PF_AWTAdaboost method also shows excellent generalization performance on other datasets. Overall, the experimental results show that the PF_AWTAdaboost algorithm has better classification performance.
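The abstract above does not give the penalty factor's formula, so the following is only a minimal sketch of the general idea: an AdaBoost-style reweighting step in which samples misclassified many rounds in a row are treated as likely noise and have their weights shrunk. The function name `update_weights`, the parameters `penalty_base` and `streak_limit`, and the exponential form of the penalty are all assumptions made for illustration, not the paper's method.

```python
import math


def update_weights(weights, misclassified, streaks, alpha,
                   penalty_base=0.5, streak_limit=3):
    """One AdaBoost-style reweighting round with a hypothetical penalty factor.

    weights:       current sample weights (assumed to sum to 1)
    misclassified: booleans, True where the current weak learner was wrong
    streaks:       consecutive-misclassification counts, updated in place
    alpha:         weight of the current weak learner
    """
    new_weights = []
    for i, (w, wrong) in enumerate(zip(weights, misclassified)):
        # Track how many rounds in a row this sample has been misclassified.
        streaks[i] = streaks[i] + 1 if wrong else 0
        # Standard AdaBoost step: raise weights of misclassified samples.
        w *= math.exp(alpha if wrong else -alpha)
        # Assumed penalty: shrink the weight once the streak exceeds the limit.
        excess = max(0, streaks[i] - streak_limit)
        w *= penalty_base ** excess
        new_weights.append(w)
    total = sum(new_weights)
    return [w / total for w in new_weights]


streaks = [4, 0, 0, 0]  # sample 0 has already been wrong four rounds in a row
weights = update_weights([0.25] * 4, [True, True, False, False], streaks, alpha=0.5)
print(weights, streaks)
```

In this sketch both sample 0 and sample 1 are misclassified in the current round, but only sample 0 exceeds the streak limit, so it ends up with a smaller weight than sample 1 instead of dominating the next round.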
Abstract: The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud, particularly in credit card transactions. Advanced technologies such as machine learning have been widely employed to enhance the early detection and prevention of losses arising from potentially fraudulent activities. However, a prevalent approach in existing literature involves the use of extensive data sampling and feature selection algorithms as a precursor to subsequent investigations. While sampling techniques can significantly reduce computational time, the resulting dataset relies on generated data and the accuracy of the pre-processing machine learning models employed. Such datasets often lack true representativeness of real-world data, potentially introducing secondary issues that affect the precision of the results. For instance, undersampling may result in the loss of critical information, while oversampling can lead to overfitting of machine learning models. In this paper, we propose a classification study of credit card fraud using fundamental machine learning models, without applying any sampling techniques, on all the features present in the original dataset. The results indicate that Support Vector Machine (SVM) consistently achieves classification performance exceeding 90% across various evaluation metrics. This finding serves as a valuable reference for future research, encouraging comparative studies on the original dataset without reliance on sampling techniques. Furthermore, we explore hybrid machine learning techniques, such as ensemble learning built on SVM, K-Nearest Neighbor (KNN), and decision tree, highlighting their potential advancements in the field. The study demonstrates that the proposed machine learning models yield promising results, suggesting that pre-processing the dataset with a sampling algorithm or an additional machine learning technique may not always be necessary. This research contributes to the field of credit card fraud detection by emphasizing the potential of employing machine learning models directly on original datasets, thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems.
Abstract: Credit card fraud remains a significant challenge, with financial losses and consumer protection at stake. This study addresses the need for practical, real-time fraud detection methodologies. Using a Kaggle credit card dataset, I tackle class imbalance using the Synthetic Minority Oversampling Technique (SMOTE) to enhance modeling efficiency. I compare several machine learning algorithms, including Logistic Regression, Linear Discriminant Analysis, K-nearest Neighbors, Classification and Regression Tree, Naive Bayes, Support Vector, Random Forest, XGBoost, and Light Gradient-Boosting Machine to classify transactions as fraud or genuine. Rigorous evaluation metrics, such as AUC, PRAUC, F1, KS, Recall, and Precision, identify the Random Forest as the best performer in detecting fraudulent activities. The Random Forest model successfully identifies approximately 92% of transactions scoring 90 and above as fraudulent, equating to a detection rate of over 70% for all fraudulent transactions in the test dataset. Moreover, the model captures more than half of the fraud in each bin of the test dataset. SHAP values provide model explainability, with the SHAP summary plot highlighting the global importance of individual features, such as "V12" and "V14". SHAP force plots offer local interpretability, revealing the impact of specific features on individual predictions. This study demonstrates the potential of machine learning, particularly the Random Forest model, for real-time credit card fraud detection, offering a promising approach to mitigate financial losses and protect consumers.
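To make the SMOTE step mentioned above concrete, here is a minimal pure-Python sketch of its core idea: each synthetic minority sample is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours. The function name `smote_like` and its parameters are illustrative assumptions; this is not the implementation used in the study, which would more likely rely on an existing library.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


fraud_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # toy minority class
print(smote_like(fraud_samples, n_new=5))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already covered by the minority class.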
Abstract: In recent years, the rapid development of e-commerce has exposed great vulnerabilities in online transactions for fraudsters to exploit. Credit card transactions play a salient role in today's online transactions because of their obvious advantages, including discounts and credit card points. Credit card fraud has therefore become a target of concern. To deal with the situation, credit card fraud detection based on machine learning has been studied recently. Yet it is difficult to detect fraudulent transactions due to data imbalance (between normal and fraudulent transactions), so the SMOTE algorithm is applied to resolve the imbalance. The assessment of the Light Gradient Boosting Machine model proposed in the paper depends heavily on datasets collected from clients' daily transactions. In addition, to demonstrate the new model's superiority in detecting credit card fraud, the Light Gradient Boosting Machine model is compared with the Random Forest and Gradient Boosting Machine algorithms in the experiment. The results indicate that the Light Gradient Boosting Machine model performs well. In the experiment, credit card fraud detection based on the Light Gradient Boosting Machine model achieved a total recall rate of 99% on a real dataset with fast feedback, which demonstrates the new model's efficiency in detecting credit card fraud.
Abstract: Credit card fraud data is highly imbalanced, with an overwhelmingly large portion of nonfraudulent transactions and a small portion of fraudulent transactions. The measures used to judge the veracity of the detection algorithms become critical to the deployment of a model that accurately scores fraudulent transactions, taking into account case imbalance and the cost of identifying a case as genuine when, in fact, the case is a fraudulent transaction. In this paper, a new criterion to judge classification algorithms, which considers the cost of misclassification, is proposed, and several undersampling techniques are compared by this new criterion. At the same time, a weighted support vector machine (SVM) algorithm considering the financial cost of misclassification is introduced, proving to be more practical for credit card fraud detection than traditional methodologies. This weighted SVM uses transaction balances as weights for fraudulent transactions, and a uniform weight for nonfraudulent transactions. The results show that this strategy greatly improves the performance of credit card fraud detection.
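The weighting scheme described above can be sketched in a few lines. `sample_weights` follows the abstract directly: fraudulent transactions are weighted by their balance and genuine ones get a uniform weight. `misclassification_cost` is only an assumed stand-in for the paper's criterion, whose exact form the abstract does not state: a missed fraud costs its transaction amount, and a false alarm costs a fixed, hypothetical `false_alarm_cost`.

```python
def sample_weights(y_true, amounts, genuine_weight=1.0):
    """Weight fraud cases (label 1) by transaction amount, genuine cases uniformly."""
    return [amount if y == 1 else genuine_weight
            for y, amount in zip(y_true, amounts)]


def misclassification_cost(y_true, y_pred, amounts, false_alarm_cost=1.0):
    """Assumed cost criterion: a missed fraud costs its amount,
    a false alarm costs a fixed fee, correct predictions cost nothing."""
    cost = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if truth == 1 and pred == 0:
            cost += amount
        elif truth == 0 and pred == 1:
            cost += false_alarm_cost
    return cost


y_true = [1, 0, 1, 0]
y_pred = [0, 1, 1, 0]
amounts = [250.0, 40.0, 90.0, 10.0]
print(sample_weights(y_true, amounts))
print(misclassification_cost(y_true, y_pred, amounts))
```

Weights of this shape could be passed to any learner that accepts per-sample weights. Under the cost view, the 250.0 fraud that was missed dominates the total, while the single false alarm adds only the fixed fee.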
Abstract: Since 1995, major domestic commercial banks have begun issuing a variety of credit cards. However, at present, the profitability of China's credit card business is relatively low, and it accounts for a small proportion of total bank income. Through an analysis of the credit card revenue/cost structure, the authors found that spending and overdraft balances are important factors affecting the profitability of the credit card business. At the same time, using correlation and regression analysis in SPSS on a commercial bank's existing statistical data, the authors found that the key to improving bank card revenue is to raise the revolving credit utilization of credit cards at China's commercial banks and to expand the scale of overdraft balances.
Abstract: Frauds do not follow any recurring patterns. Because fraudulent behaviour is continually changing, detecting it requires unsupervised learning. Fraudsters have access to the most recent technology, which gives them the ability to defraud people through online transactions. Fraudsters make assumptions about consumers' routine behaviour, and fraud develops swiftly. Fraud detection systems must use unsupervised learning to recognize online payments, since some fraudsters start out using online channels before moving on to other techniques. The aim of this paper is to build a deep convolutional neural network model, using Artificial Bee Colony (ABC), that identifies anomalies from conventional competitive swarm optimization patterns, with a focus on fraud situations that cannot be identified using historical data or supervised learning. Using real-time data and other readily available datasets, the ABC-Recurrent Neural Network (RNN) categorizes fraud behaviour and is compared with current algorithms. Compared with the current approach, the findings demonstrate that accuracy is high and training error is minimal in ABC_RNN. In this paper, we measure the accuracy, F1 score, Mean Square Error (MSE), and Mean Absolute Error (MAE). Our system achieves 97% accuracy, a 92% precision rate, and an F1 score of 97%. We also compare the simulation results with existing methods.
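For reference, the metrics named above can be computed from binary labels as in the sketch below. The function name `classification_metrics` and the toy labels are assumptions for illustration and do not reproduce any result from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, MSE, and MAE for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mse": sum((t - p) ** 2 for t, p in pairs) / n,
        "mae": sum(abs(t - p) for t, p in pairs) / n,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

With hard 0/1 predictions, MSE and MAE both equal the error rate; they only diverge when the model outputs scores or probabilities.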
Abstract: Purpose: The purpose of this paper is to generate customer clusters using the self-organizing map (SOM) approach, a machine learning technique, with a big data set of credit card consumption. The authors aim to use the consumption patterns of customers over a period of three months, deduced from their credit card transactions, specifically the consumption categories (e.g. food, entertainment, etc.). Design/methodology/approach: The authors use a big data set of almost 40,000 credit card transactions to cluster customers. To deal with the size of the data set and to eliminate the required parametric assumptions, the authors use a machine learning technique, SOMs. The variables used are grouped into three sets: demographical variables, categorical consumption variables, and summary consumption variables. The variables are first converted to factors using principal component analysis. Then, the number of clusters is specified by k-means clustering trials. Next, clustering with SOM is conducted first with only the demographical variables and then with all variables. Finally, a comparison is made and the significance of the variables is examined by analysis of variance. Findings: The appropriate number of clusters is found to be 8 using k-means clustering. The differences in categorical consumption levels between the clusters are then investigated. However, they are found to be insignificant, whereas the summary consumption variables and the demographical variables are found to differ significantly between the clusters. Originality/value: The originality of the study lies in incorporating the credit card consumption variables of customers to cluster bank customers. The authors use a big data set and handle it with a machine learning technique to deduce the consumption patterns that generate the clusters. Credit card transactions generate a vast amount of data from which valuable information can be deduced. In the literature, such data is mainly used to detect fraud. To the best of the authors' knowledge, this study is the first to use consumption patterns obtained from credit card transactions for clustering customers.