Supply Chain Finance (SCF) is important for improving the effectiveness of supply chain capital operations and reducing the overall management cost of a supply chain. In recent years, with the deep integration of supply chains with the Internet, Big Data, Artificial Intelligence, the Internet of Things, Blockchain, etc., the efficiency of supply chain financial services can be greatly improved by building more customized risk pricing models and conducting more rigorous investment decision-making processes. However, with the rapid development of new technologies, the volume of SCF data has increased massively, and new financial fraud behaviors or patterns are becoming more covertly scattered among normal ones. Insufficient capability to handle big data volumes and mitigate financial fraud may lead to huge losses in supply chains. In this article, a distributed big data mining approach is proposed for financial fraud detection in a supply chain. It implements a distributed deep learning model, a Convolutional Neural Network (CNN), on the big data infrastructure of Apache Spark and Hadoop to process the large dataset in parallel and significantly reduce processing time. By training and testing on the continually updated SCF dataset, the approach can intelligently and automatically classify the massive data samples and discover fraudulent financing behaviors, thereby enhancing financial fraud detection with high precision and recall rates and reducing fraud losses in a supply chain.
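The abstract above reports detection quality as precision and recall. As a minimal sketch of how those two rates are computed for a binary fraud classifier, the snippet below uses made-up labels (1 = fraudulent, 0 = normal); the function name and the sample data are illustrative assumptions, not taken from the paper.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth and predictions for eight financing records.
y_true = [1, 0, 0, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In fraud detection the positive class is usually rare, so precision and recall are more informative than plain accuracy: a classifier that labels every record as normal can score high accuracy while its recall is zero.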
To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. First, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Next, five tactics category labels are annotated, creating a multi-label dataset for tactics classification. To address the low execution efficiency and limited scalability of the sequential deep forest algorithm, the PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly enhancing its acceleration ratio. Furthermore, the PDFMLC algorithm incorporates label mutual information from the established dataset as input features. This captures latent label associations and significantly improves classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that the PDFMLC algorithm exhibits exceptional node scalability and execution efficiency. In addition, the PDFMLC-TIM method proficiently classifies the text of cybersecurity analysis reports and extracts tactics entities to construct comprehensive threat intelligence. As a result, threat intelligence formatted as STIX 2.1 is successfully produced.
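The abstract above names the Lempel-Ziv-Welch (LZW) algorithm but does not say what data is compressed or how it is integrated with the broadcast variables. The following is only a textbook LZW sketch over bytes, shown to illustrate the technique itself; the function names are illustrative and nothing here is taken from the PDFMLC implementation.

```python
def lzw_compress(data):
    """Encode a bytes object as a list of integer LZW codes."""
    table = {bytes([i]): i for i in range(256)}  # all single-byte sequences
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending the current match
        else:
            codes.append(table[current])
            table[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Invert lzw_compress, rebuilding the table while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code refers to the sequence that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
assert lzw_decompress(codes) == sample
print(len(sample), "bytes ->", len(codes), "codes")
```

Because LZW builds its dictionary from repeated substrings, it pays off mainly on repetitive input; on short or high-entropy data the code list can be as long as the input.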
The recent emergence of diverse services has led to explosive traffic growth in cellular data networks. Understanding the service dynamics in large cellular networks is important for network design, troubleshooting, quality of experience (QoE) support, and resource allocation. In this paper, we present a study that reveals the distributions and temporal patterns of different services in a cellular data network from two perspectives, namely service request times and service duration. Our study is based on big traffic data captured over a week-long period from a tier-1 mobile operator's network in China and parsed into readable records by our Hadoop-based packet parsing platform. We propose a Zipf's ranked model to characterize the distributions of traffic volume, packets, request times, and duration of cellular services. A two-stage method (Self-Organizing Map combined with k-means) is first used to cluster the time series of services into four request patterns and three duration patterns. These seven patterns are then combined to better understand the fine-grained temporal patterns of services in the cellular network. The results of our distribution models and temporal patterns give cellular network operators a better understanding of the request and duration characteristics of services, which is of great importance in network design, service generation, and resource allocation.
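The abstract above characterizes per-service traffic with a Zipf-style ranked model, in which the value at rank r is roughly C / r^alpha. As a hedged illustration of what fitting such a model can look like, the sketch below estimates alpha and C by ordinary least squares on the log-log rank plot. The synthetic data, the function name, and the choice of least squares are assumptions; the paper's actual fitting procedure is not described in the abstract.

```python
import math


def fit_zipf(values):
    """Fit value ~ C / rank**alpha; return (alpha, C).

    Expects at least two strictly positive values.
    """
    ranked = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept of log(value) against log(rank).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic per-service traffic volumes following an exact power law.
volumes = [1000.0 / rank ** 1.2 for rank in range(1, 51)]
alpha, scale = fit_zipf(volumes)
print(f"alpha={alpha:.3f} C={scale:.1f}")
```

On real traces the head and tail of the ranking often deviate from a straight line in log-log space, so the fit is usually inspected visually or restricted to a rank range before alpha is interpreted.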
In the data retrieval process of a data recommendation system, matching prediction and similarity identification play a major role in the ontology. Several methods exist to improve retrieval accuracy and reduce search time. However, in a data recommendation system, searching for the best match for given query data becomes complex, and the accuracy of query recommendation suffers. To improve the performance of data validation, this paper proposes a novel data similarity estimation and clustering method that retrieves the relevant data with the best match in big data processing. An advanced model of the Logarithmic Directionality Texture Pattern (LDTP) method, combined with a Metaheuristic Pattern Searching (MPS) system, is used to estimate the similarity between the query data and the data in the entire database. The overall work is implemented for a data recommendation application. The data are indexed and grouped into clusters to form a paged database structure, which reduces computation time during search. In addition, a neural network predicts the relevance of feature attributes in the database, and the matching index is sorted to provide the recommended data for the given query data. This is achieved using the Distributional Recurrent Neural Network (DRNN), an enhanced neural network model that finds relevance based on the correlation factor of the feature set. The DRNN classifier is trained by estimating the correlation factor of the attributes of the dataset. These are formed into clusters and paged with proper indexing based on the MPS similarity metric parameter. The overall performance of the proposed work is evaluated by varying the size of the training database (60%, 70%, and 80%). The parameters considered for performance analysis are precision, recall, F1-score, the accuracy of data retrieval, the query recommendation output, and a comparison with other state-of-the-art methods.
With the explosive increase in mobile apps, more and more threats are migrating from the traditional PC client to mobile devices. In contrast to the traditional Win+Intel alliance in PCs, the Android+ARM alliance dominates the mobile Internet, and apps have replaced PC client software as the major target of malicious usage. In this paper, to improve the security status of current mobile apps, we propose a methodology for evaluating mobile apps based on a cloud computing platform and data mining. We also present a prototype system named MobSafe to identify whether a mobile app is malicious or benign. Compared with traditional methods, such as permission-pattern-based methods, MobSafe combines dynamic and static analysis to comprehensively evaluate an Android app. In the implementation, we adopt the Android Security Evaluation Framework (ASEF) and the Static Android Analysis Framework (SAAF), two representative dynamic and static analysis methods, to evaluate Android apps and estimate the total time needed to evaluate all the apps stored in one mobile app market. Based on a real trace from a commercial mobile app market called AppChina, we collect statistics on the number of active Android apps, the average number of apps installed on one Android device, and the expansion ratio of mobile apps. Because the mobile app market serves as the main line of defence against mobile malware, our evaluation results show that it is practical to use a cloud computing platform and data mining to routinely verify all stored apps and filter malware out of mobile app markets. As future work, MobSafe can make extensive use of machine learning to conduct automated forensic analysis of mobile apps based on the multifaceted data generated at this stage.
Funding (supply chain finance fraud detection study): This research work is supported by the Hunan Provincial Education Science 13th Five-Year Plan (Grant No. XJK016BXX001, Zhou, H., http://jyt.hunan.gov.cn/jyt/sjyt/jky/index.html) and the Social Science Foundation of Hunan Province (Grant No. 17YBA049, Zhou, H., https://sk.rednet.cn/channel/7862.html). The work is also supported by the Open Foundation for University Innovation Platform from Hunan Province, China (Grant No. 18K103, Sun, G., http://kxjsc.gov.hnedu.cn/).
Funding (cellular service traffic study): Supported by the National Basic Research Program of China (973 Program: 2013CB329004).
Funding (MobSafe study): Supported by the National Key Basic Research and Development (973) Program of China (Nos. 2012CB315801 and 2011CB302805), the National Natural Science Foundation of China (Nos. 61161140320 and 61233016), and the Intel Research Council under the title "Security Vulnerability Analysis based on Cloud Platform with Intel IA Architecture".