The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud, particularly in credit card transactions. Advanced technologies such as machine learning have been widely employed to enhance the early detection and prevention of losses arising from potentially fraudulent activities. However, a prevalent approach in the existing literature applies extensive data sampling and feature selection algorithms as a precursor to subsequent investigations. While sampling techniques can significantly reduce computational time, the resulting dataset relies on generated data and on the accuracy of the pre-processing machine learning models employed. Such datasets often do not truly represent real-world data, potentially introducing secondary issues that affect the precision of the results. For instance, undersampling may discard critical information, while oversampling can cause machine learning models to overfit. In this paper, we present a classification study of credit card fraud that applies fundamental machine learning models to all the features of the original dataset, without any sampling technique. The results indicate that the Support Vector Machine (SVM) consistently achieves classification performance exceeding 90% across various evaluation metrics. This finding serves as a valuable reference for future research, encouraging comparative studies on original datasets without reliance on sampling techniques. Furthermore, we explore hybrid machine learning techniques, such as ensemble learning built on SVM, K-Nearest Neighbor (KNN), and decision tree, highlighting their potential to advance the field. The study demonstrates that the proposed machine learning models yield promising results, suggesting that pre-processing the dataset with a sampling algorithm or an additional machine learning technique may not always be necessary. This research contributes to the field of credit card fraud detection by emphasizing the potential of applying machine learning models directly to original datasets, thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems.
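The abstract does not say how the SVM, KNN, and decision-tree members of the ensemble are combined. It also does not list the "various evaluation metrics". The following is a minimal, stdlib-only sketch built on two assumptions. First, the ensemble uses hard (majority) voting over binary labels, with 1 meaning fraud. Second, precision, recall, and F1 are reported for the fraud class alongside accuracy. The names `hard_vote` and `fraud_metrics` are hypothetical, and the labels are toy values, not data from the study. Class-specific metrics matter when no resampling is applied. On highly imbalanced data, a model that labels every transaction as legitimate can still reach very high accuracy.

```python
from collections import Counter
from typing import Sequence


def hard_vote(*predictions: Sequence[int]) -> list[int]:
    """Majority vote over per-sample labels from several classifiers.

    Hypothetical helper: assumes an odd number of binary classifiers
    (e.g. SVM, KNN, decision tree) so that ties cannot occur.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all prediction lists must have the same length")
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def fraud_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                  positive: int = 1) -> dict[str, float]:
    """Accuracy plus precision, recall and F1 for the fraud (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy predictions for illustration only.
    svm = [0, 1, 1, 0, 0]
    knn = [0, 1, 0, 0, 1]
    tree = [0, 0, 1, 0, 0]
    ensemble = hard_vote(svm, knn, tree)
    print(ensemble)  # [0, 1, 1, 0, 0]
    print(fraud_metrics([0, 1, 1, 0, 1], ensemble))

    # "Predict everything legitimate" on 1000 transactions containing 2 frauds.
    y_true = [0] * 998 + [1] * 2
    print(fraud_metrics(y_true, [0] * 1000))  # accuracy 0.998, recall 0.0
```

Soft voting (averaging predicted probabilities) and stacking are common alternatives to hard voting. The abstract does not indicate which, if any, the authors used.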
Funding: Supported by the National Natural Science Foundation of China (61672032, 61401001), the Natural Science Foundation of Anhui Province (1408085MF121), and the Opening Foundation of Anhui Key Laboratory of Polarization Imaging Detection Technology (2016-KFKT-003).
A group activity recognition algorithm based on Cayley-Klein metric learning in the complex wavelet domain is proposed to improve recognition accuracy in video surveillance. The non-sampled dual-tree complex wavelet packet transform (NS-DTCWPT) is used to decompose the human images in videos at multiple scales and resolutions. An improved local binary pattern (ILBP) and an inner-distance shape context (IDSC), combined with a bag-of-words model, are adopted to extract features from the decomposed high- and low-frequency coefficients. The coefficient features extracted from the training samples are used to optimize the Cayley-Klein metric matrix by solving a nonlinear optimization problem. Group activities in videos are then recognized using these extracted features and the learned Cayley-Klein metric. Experimental results on the BEHAVE video set, a group activity video set, and a self-built video set show that the proposed algorithm achieves higher recognition accuracy than existing algorithms.
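Some readers may be unfamiliar with Cayley-Klein metrics. The following sketch computes the elliptic Cayley-Klein distance commonly used in Cayley-Klein metric learning:

d(x, y) = k * arccos( sigma(x, y) / sqrt(sigma(x, x) * sigma(y, y)) )

Here sigma(x, y) = [x, 1]^T G [y, 1], and G is a symmetric positive-definite (n+1) x (n+1) matrix. The sketch assumes G has already been learned. The paper obtains G by nonlinear optimization, which is not reproduced here. Recognition uses a nearest-neighbour rule, which is an assumption because the abstract does not name the final classifier. The names `ck_distance` and `nearest_label`, the scale `k`, and the toy feature vectors are all illustrative.

```python
import math
from typing import Hashable, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def _sigma(x: Vector, y: Vector, G: Matrix) -> float:
    """Bilinear form sigma(x, y) = [x, 1]^T G [y, 1] in homogeneous coordinates."""
    xh = [*x, 1.0]
    yh = [*y, 1.0]
    n = len(xh)
    return sum(xh[i] * G[i][j] * yh[j] for i in range(n) for j in range(n))


def ck_distance(x: Vector, y: Vector, G: Matrix, k: float = 1.0) -> float:
    """Elliptic Cayley-Klein distance; G must be symmetric positive definite."""
    cos_theta = _sigma(x, y, G) / math.sqrt(_sigma(x, x, G) * _sigma(y, y, G))
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return k * math.acos(cos_theta)


def nearest_label(query: Vector, train: Sequence[tuple[Vector, Hashable]],
                  G: Matrix) -> Hashable:
    """1-nearest-neighbour recognition under the Cayley-Klein metric (assumed rule)."""
    return min(train, key=lambda item: ck_distance(query, item[0], G))[1]


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(ck_distance((1.0, 0.0), (0.0, 1.0), identity))  # pi/3, about 1.0472
    train = [((0.0, 0.0), "walk"), ((5.0, 5.0), "run")]
    print(nearest_label((0.2, 0.1), train, identity))  # walk
```

With G equal to the identity, the distance reduces to the angle between the homogeneous vectors. Metric learning reshapes G so that features from the same activity class end up closer together.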
Funding: Sponsored by the project "Follow-up Research on Fertility Level and Fertility Intentions with the Help of Big Data" (No. 21BRK001), funded by the National Social Science Fund of China.
Non-sampling errors can generally be divided into three types: sampling frame errors, non-response errors, and measurement errors. Missing target units in the sampling frame, improper handling of non-responses, and misreporting or underreporting of key variables in the questionnaire can all cause deviations in a survey's results. The widespread application of Computer-Assisted Personal Interviewing (CAPI) systems and the inclusion of administrative records from government sources in surveys have strengthened the ability to control non-sampling errors. Taking a national fertility sampling survey as an example, this study summarizes the sources of various non-sampling errors and explains how big data resources such as administrative records can be harnessed to control non-sampling errors throughout the survey. The study analyzes the impact of the three types of non-sampling errors on the results of the fertility survey and examines the strategies used to address the problems they cause. The findings indicate that non-sampling errors were the main source of total error in the survey and that these errors came mainly from the sampling frame; non-response errors and measurement errors were controlled and had little impact on the survey results.
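One way administrative records can help control sampling frame error is to serve as a reference list for finding omitted target units. The sketch below illustrates this under several assumptions the abstract does not state:

- frame and register units can be matched on a shared identifier;
- the register is treated as the complete target population;
- the bias of a mean estimated only from covered units follows the standard coverage-bias expression (N_U / N)(mean_C - mean_U).

The same expression gives the non-response bias if respondents and non-respondents replace covered and missed units. The function names and the figures in the example are hypothetical.

```python
from typing import Hashable, Iterable


def frame_coverage(frame_ids: Iterable[Hashable],
                   register_ids: Iterable[Hashable]) -> tuple[float, set]:
    """Coverage rate of a sampling frame and the register units it misses.

    Assumes the administrative register lists the full target population and
    that both sources share a unit identifier.
    """
    frame = set(frame_ids)
    register = set(register_ids)
    if not register:
        raise ValueError("register must not be empty")
    missing = register - frame
    return 1 - len(missing) / len(register), missing


def mean_bias(n_excluded: int, n_total: int,
              mean_included: float, mean_excluded: float) -> float:
    """Bias of a mean computed only on included units:
    (N_excluded / N) * (mean_included - mean_excluded).

    Covers undercoverage (included = frame units) and, analogously,
    non-response (included = respondents).
    """
    return n_excluded / n_total * (mean_included - mean_excluded)


if __name__ == "__main__":
    rate, missing = frame_coverage(["a", "b", "c"], ["a", "b", "c", "d"])
    print(rate, missing)  # 0.75 {'d'}
    # Hypothetical figures: 50 of 1000 target units are missing from the frame,
    # and their mean differs from the covered units' mean.
    print(mean_bias(50, 1000, 1.6, 1.8))
```

The bias vanishes if no units are missed or if the missed units resemble the covered ones. Linking the frame to administrative records targets the first factor directly.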
Funding: This work is financially supported by the Australian Research Council (DP180101232).
The mixing index is an important parameter for understanding and assessing the mixing state in various mixers, including ribbon mixers, which are typical food processing devices. Many mixing indices based on either sample variance or non-sample variance methods have been proposed and used in the past; however, they have not been well compared in the literature in terms of their accuracy in assessing the final mixing state. In this study, discrete element method (DEM) modelling is used to investigate and compare the accuracy of these mixing indices for the mixing of uniform particles in a horizontal cylindrical ribbon mixer. The sample variance methods for mixing indices are first compared at both the particle- and macro-scale levels. In addition, non-sample variance methods, namely the entropy and non-sampling indices, are compared against the results from the sample variance methods. The simulation results indicate that, among the indices considered in this study, the Lacey index gives the most accurate results. The Lacey index is regarded as the most suitable mixing index for evaluating the steady-state mixing of the ribbon mixer in real time (that is, without stopping the impeller) at both the particle- and macro-scale levels. The study is useful for selecting a proper mixing index for a specific mixture in a given mixer.
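The Lacey index is usually defined as

M = (s0^2 - s^2) / (s0^2 - sr^2)

In this formula:

- p is the overall fraction of one particle species;
- s0^2 = p(1 - p) is the variance of a fully segregated system;
- sr^2 = p(1 - p) / n is the variance of a random mixture with n particles per sample;
- s^2 is the measured variance of the sample concentrations.

M near 0 indicates segregation and M near 1 indicates a random mixture. The sketch below assumes that s^2 is computed about the known fraction p, divided by the number of samples. Some variants use the sample mean or N - 1 instead. It also assumes every sample holds the same number of particles n. How samples are taken from the DEM domain, at particle or macro scale, is left to the caller.

```python
from typing import Sequence


def lacey_index(concentrations: Sequence[float], p: float, n: int) -> float:
    """Lacey mixing index M = (s0^2 - s^2) / (s0^2 - sr^2).

    concentrations: fraction of the tracked species in each sample
    p: overall fraction of that species in the mixer (0 < p < 1)
    n: number of particles per sample (assumed equal for all samples)
    """
    if not 0.0 < p < 1.0 or n < 2 or not concentrations:
        raise ValueError("need 0 < p < 1, n >= 2 and at least one sample")
    s0_sq = p * (1.0 - p)       # variance of a fully segregated system
    sr_sq = p * (1.0 - p) / n   # variance of a fully random mixture
    s_sq = sum((c - p) ** 2 for c in concentrations) / len(concentrations)
    return (s0_sq - s_sq) / (s0_sq - sr_sq)


if __name__ == "__main__":
    print(lacey_index([0.0, 0.0, 1.0, 1.0], p=0.5, n=100))    # 0.0 (segregated)
    print(lacey_index([0.4, 0.6, 0.45, 0.55], p=0.5, n=100))  # close to 1 (well mixed)
```

Sampling noise can push s^2 slightly below sr^2, so M can slightly exceed 1. Values at or near 1 are read as a fully mixed state.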