Abstract: Big data cloud computing is a new computing paradigm that integrates distributed processing, parallel processing, network computing, virtualization, load balancing, and other network technologies. In a big data cloud computing system, computing resources are distributed across a resource pool made up of a large number of computers, allowing users to connect to remote computer systems according to their own data needs.
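The abstract lists load balancing as one of the technologies that let a resource pool serve many users. It does not specify a balancing policy. The minimal sketch below assumes a hypothetical pool of named nodes and tasks with known costs, and uses greedy least-loaded dispatch: each task goes to the node with the smallest accumulated load.

```python
import heapq
from typing import Dict, List, Tuple


def assign_tasks(node_names: List[str], task_costs: List[float]) -> Dict[str, List[int]]:
    """Greedy least-loaded dispatch over a resource pool (illustrative only).

    Node names and task costs are hypothetical inputs; ties between equally
    loaded nodes are broken by node name because heap entries are tuples.
    """
    # Heap entries are (current_load, node_name); heappop returns the least-loaded node.
    heap: List[Tuple[float, str]] = [(0.0, name) for name in node_names]
    heapq.heapify(heap)
    assignment: Dict[str, List[int]] = {name: [] for name in node_names}
    for task_id, cost in enumerate(task_costs):
        load, name = heapq.heappop(heap)
        assignment[name].append(task_id)
        # Put the node back with its load increased by this task's cost.
        heapq.heappush(heap, (load + cost, name))
    return assignment


if __name__ == "__main__":
    pool = ["node-a", "node-b", "node-c"]
    print(assign_tasks(pool, [5.0, 3.0, 2.0, 4.0, 1.0]))
```

With the example costs, node-a takes the expensive first task alone, while node-b and node-c each take two cheaper tasks. A real scheduler would also account for node capacity and task arrival over time.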
Abstract: A cloud computing platform can allocate dynamic resources efficiently, provision computing and storage dynamically according to user requests, and provide a good platform for big data feature analysis and mining. Big data feature mining in a cloud computing environment is an effective way to make efficient use of massive data in the information age. However, the existing big data feature mining method based on gradient sampling has poor logicality: it mines big data features from only a single-level perspective, which reduces mining precision.
Abstract: The rapid development of the Internet of Things (IoT) imposes new requirements on data mining systems, because traditional distributed network data mining has limited capability. To meet the needs of the IoT, this paper proposes a novel distributed data mining model that provides seamless access between cloud computing and distributed data mining. The model is based on a cloud computing architecture and is designed for untrusted nodes.
Abstract: The huge volume of structured and unstructured data known as big data now provides opportunities for companies, especially those engaged in electronic commerce (e-commerce). The data is collected from customers, internal processes, vendors, markets, and the business environment. This paper presents a data mining (DM) process for e-commerce that covers three common algorithm types: association, clustering, and prediction. It also highlights benefits these three types of algorithms can bring to e-commerce companies, including merchandise planning, sales forecasting, basket analysis, customer relationship management, and market segmentation. The main aim of this paper is to review the application of data mining in e-commerce, focusing on structured and unstructured data collected through various sources and cloud computing services, in order to justify the importance of data mining. The study also evaluates challenges of data mining such as spider identification, data transformations, making data models comprehensible to business users, supporting slowly changing dimensions of data, and making data transformation and model building accessible to business users. Finally, it provides a clear guide for e-commerce companies sitting on huge volumes of data, showing how to manipulate that data for business improvement and thereby become highly competitive.
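Association mining for basket analysis is usually measured with support and confidence. The sketch below is a simplified stand-in for a full Apriori pass: it only considers single-item rules of the form A -> B. The basket contents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

# (antecedent, consequent, support, confidence)
Rule = Tuple[str, str, float, float]


def pair_rules(baskets: List[List[str]], min_support: float) -> List[Rule]:
    """Mine single-item association rules A -> B from transaction baskets.

    support(A, B)    = fraction of baskets containing both A and B
    confidence(A->B) = count(A and B) / count(A)
    """
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # prune infrequent item pairs
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
    for rule in pair_rules(baskets, min_support=0.5):
        print(rule)
```

In the example, every basket that contains butter also contains bread. The rule butter -> bread therefore has confidence 1.0, while bread -> butter has confidence of only 2/3.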
Funding: the National Key Basic Research and Development (973) Program of China (Nos. 2012CB315801 and 2011CB302805); the National Natural Science Foundation of China (Nos. 61161140320 and 61233016); and the Intel Research Council project "Security Vulnerability Analysis based on Cloud Platform with Intel IA Architecture".
Abstract: With the explosive growth of mobile apps, more and more threats are migrating from traditional PC clients to mobile devices. Just as the Win+Intel alliance dominates the PC, the Android+ARM alliance dominates the mobile Internet, and apps have replaced PC client software as the major target of malicious use. To improve the security of current mobile apps, this paper proposes a methodology for evaluating mobile apps based on a cloud computing platform and data mining. It also presents a prototype system named MobSafe that identifies whether a mobile app is malicious or benign. Unlike traditional methods such as permission-pattern-based detection, MobSafe combines dynamic and static analysis to evaluate an Android app comprehensively. In the implementation, we adopt two representative frameworks to evaluate Android apps: the Android Security Evaluation Framework (ASEF) for dynamic analysis and the Static Android Analysis Framework (SAAF) for static analysis. We also estimate the total time needed to evaluate all the apps stored in one mobile app market. Using real traces from the commercial mobile app market AppChina, we collect statistics on the number of active Android apps, the average number of apps installed on one Android device, and the expansion ratio of mobile apps. Mobile app markets serve as the main line of defence against mobile malware, and our evaluation results show that it is practical to use a cloud computing platform and data mining to verify all stored apps routinely and filter malware out of mobile app markets. In future work, MobSafe can make extensive use of machine learning to conduct automated forensic analysis of mobile apps based on the multifaceted data generated at this stage.
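The abstract estimates the total time needed to evaluate every app in a market, but it gives no figures. The back-of-the-envelope sketch below shows the shape of such an estimate. It assumes every app takes the same dynamic and static analysis time and that each worker analyses one app at a time. All numbers in the example are hypothetical, not results from the paper.

```python
import math


def estimate_scan_hours(num_apps: int, dynamic_min_per_app: float,
                        static_min_per_app: float, workers: int) -> float:
    """Wall-clock hours to run static + dynamic analysis on every app.

    Simplifying assumptions (not from the paper): uniform per-app cost,
    one app per worker at a time, apps dispatched in rounds of `workers`.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    per_app_min = dynamic_min_per_app + static_min_per_app
    rounds = math.ceil(num_apps / workers)  # a partial last round still costs a full round
    return rounds * per_app_min / 60.0


if __name__ == "__main__":
    # Hypothetical: 120,000 apps, 5 min dynamic + 1 min static each, 100 cloud workers.
    print(estimate_scan_hours(120_000, 5.0, 1.0, 100))
```

Under these assumptions the estimate scales linearly with the number of apps and inversely with the number of workers. This linear scaling is why routinely re-scanning a whole market becomes practical on a cloud platform.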
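Abstract: With the recent development of big data, cloud computing, and artificial intelligence technologies, algorithms such as K-Means clustering, DBSCAN clustering, BIRCH clustering, and Cluster data distribution have continually emerged, but their performance differs when they face massive and diverse network data samples. This work therefore starts from two perspectives: the correlation between different data texts and the massive scale of dataset resources. It uses software including the Spark distributed cloud computing architecture, the HDFS (Hadoop Distributed File System) distributed file system, the Spark SQL data computing engine, and the YARN (Yet Another Resource Negotiator) resource manager to build a data mining model that combines the K-Means and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) clustering algorithms. The class of the tested dataset is determined by the discriminant function of CF-tree clustering. The Spark computing model then caches the clustered dataset in a distributed manner in the memory of the network nodes, enabling mining, clustering, and storage of massive network data.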
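BIRCH builds its CF tree from clustering features CF = (N, LS, SS): the point count, the linear sum, and the sum of squared norms. The minimal sketch below shows how these summaries merge additively and yield a centroid and radius. The `try_absorb` threshold test mirrors how a leaf entry accepts a new point. The paper's discriminant function is not given, so `nearest_cluster` uses a nearest-centroid rule as an assumed stand-in. All names here are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CF:
    """BIRCH clustering feature (N, LS, SS) summarising a set of points."""
    n: int           # number of points
    ls: List[float]  # linear sum of the points
    ss: float        # sum of squared Euclidean norms of the points

    @classmethod
    def of_point(cls, p: Sequence[float]) -> "CF":
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other: "CF") -> "CF":
        # CF vectors are additive, so sub-clusters merge without revisiting raw points.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # R^2 = SS/N - ||LS/N||^2, the mean squared distance to the centroid.
        c = self.centroid()
        r2 = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding errors


def try_absorb(cf: CF, point: Sequence[float], threshold: float) -> Optional[CF]:
    """Absorb `point` into `cf` only if the merged radius stays within `threshold`."""
    merged = cf.merge(CF.of_point(point))
    return merged if merged.radius() <= threshold else None


def nearest_cluster(point: Sequence[float], cfs: List[CF]) -> int:
    """Assumed discriminant: index of the CF entry whose centroid is closest."""
    def dist2(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(point, c))
    return min(range(len(cfs)), key=lambda i: dist2(cfs[i].centroid()))
```

For example, merging the points (1, 1) and (3, 1) gives N = 2, LS = (4, 2), and SS = 12, so the centroid is (2, 1) and the radius is 1.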
Funding: supported by ZTE Industry-Academia-Research Cooperation Funds.
Abstract: This paper proposes an analytical mining tool for big graph data based on the MapReduce and bulk synchronous parallel (BSP) computing models. The tool is named the MapReduce and BSP based Graph Mining tool (MBGM). The core of this mining system consists of four sets of parallel graph mining algorithms programmed in the BSP parallel model and one set of extraction-transformation-loading (ETL) algorithms implemented in MapReduce. To invoke these algorithm sets, we designed a workflow engine optimized for cloud computing. Finally, a well-designed data management function enables users to view, delete, and input data in the Hadoop Distributed File System (HDFS). Experiments on artificial data show that the graph mining algorithm components in MBGM are efficient.
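The abstract does not name MBGM's four BSP algorithm sets. To illustrate how a graph algorithm maps onto the BSP model, the sketch below uses minimum-label propagation for connected components, a standard vertex-centric BSP example in the style of Pregel. Each superstep runs a compute phase on the messages from the previous superstep, then a communication phase, then a barrier. It runs on a single machine, and the example graph is hypothetical.

```python
from typing import Dict, List


def bsp_connected_components(adjacency: Dict[int, List[int]]) -> Dict[int, int]:
    """Label each vertex with the smallest vertex id in its connected component.

    `adjacency` must be undirected (each edge listed in both directions)
    and must contain every vertex as a key.
    """
    # Every vertex starts labeled with its own id.
    labels = {v: v for v in adjacency}
    # Superstep 0: every vertex sends its label to all neighbours.
    inbox: Dict[int, List[int]] = {v: [] for v in adjacency}
    for v, nbrs in adjacency.items():
        for u in nbrs:
            inbox[u].append(labels[v])
    while any(inbox.values()):
        outbox: Dict[int, List[int]] = {v: [] for v in adjacency}
        # Compute phase: each vertex reads only messages from the previous superstep.
        for v, msgs in inbox.items():
            if msgs and min(msgs) < labels[v]:
                labels[v] = min(msgs)
                # Communication phase: only vertices whose label changed send messages.
                for u in adjacency[v]:
                    outbox[u].append(labels[v])
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
    return labels


if __name__ == "__main__":
    graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
    print(bsp_connected_components(graph))
```

The loop halts when no vertex changes its label, which is the usual "vote to halt" condition in vertex-centric BSP systems.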
Funding: supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2013RC0114 and the 111 Project of China under Grant No. B08004.
Abstract: With increasingly complex website structures and continuously advancing web technologies, accurately recognizing user clicks in massive HTTP data, which is critical for web usage mining, has become more difficult. In this paper, we propose a dependency graph model to describe the relationships between web requests. Based on this model, we design and implement a heuristic parallel algorithm that uses cloud computing technology to distinguish user clicks. We evaluate the proposed algorithm on a real, massive dataset collected from a mobile core network; it is 228.7 GB in size and covers more than three million users. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than previous methods.
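The paper's dependency graph model and heuristic are not spelled out in the abstract. The minimal sketch below uses a common simplification in their place. It treats the HTTP Referer header as a dependency edge from parent to child request, and it counts an HTML request as a user click when it has no known parent or arrives at least `min_gap` seconds after its parent. The record fields, the 1-second gap, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    ts: float               # request time in seconds
    url: str
    referer: Optional[str]  # parent URL from the Referer header, if any
    content_type: str


def find_clicks(requests: List[Request], min_gap: float = 1.0) -> List[str]:
    """Heuristic click detection over a Referer-based dependency graph (illustrative)."""
    last_seen: Dict[str, float] = {}  # url -> time of its most recent request
    clicks: List[str] = []
    for req in sorted(requests, key=lambda r: r.ts):
        # The dependency edge is referer -> url; look up when the parent was loaded.
        parent_ts = last_seen.get(req.referer) if req.referer else None
        if req.content_type.startswith("text/html"):
            # Embedded objects (images, scripts) are never clicks. A page fetched
            # almost immediately after its parent is assumed to be auto-loaded
            # (e.g. a frame or redirect) rather than clicked.
            if parent_ts is None or req.ts - parent_ts >= min_gap:
                clicks.append(req.url)
        last_seen[req.url] = req.ts
    return clicks


if __name__ == "__main__":
    log = [
        Request(0.0, "http://a/index.html", None, "text/html"),
        Request(0.2, "http://a/logo.png", "http://a/index.html", "image/png"),
        Request(0.3, "http://a/frame.html", "http://a/index.html", "text/html"),
        Request(5.0, "http://a/news.html", "http://a/index.html", "text/html"),
    ]
    print(find_clicks(log))
```

A per-user version of this function is embarrassingly parallel. Grouping the log by user and mapping the function over the groups is one plausible way to spread such a heuristic across a cluster.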
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under Grant Number (RGP 2/49/42), and to the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R237), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: The increasing quantity of sensitive and personal data gathered by data controllers has raised security needs in the cloud environment. Cloud computing (CC) is used for both storing and processing data, so security becomes important because CC handles massive quantities of outsourced, unprotected sensitive data for public access. This study introduces a novel chaotic chimp optimization with machine learning enabled information security (CCOML-IS) technique for the cloud environment. The CCOML-IS technique aims to achieve maximum security in the CC environment by identifying intrusions or anomalies in the network. It first normalizes the network data using data conversion and min-max normalization. Next, it derives a feature selection technique using the chaotic chimp optimization algorithm (CCOA), which chooses optimal features and thereby boosts classification performance. A kernel ridge regression (KRR) classifier is then used to detect security issues in the network. A wide set of experiments was carried out on benchmark datasets, and the results were assessed under several measures. The comparison study showed that the CCOML-IS technique outperforms recent approaches on several measures.
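Two of the pipeline stages named above have standard closed forms: min-max normalization and kernel ridge regression. The NumPy sketch below shows both, under stated assumptions. It uses an RBF kernel, treats the 0/1 labels (hypothetically 0 = normal, 1 = anomalous) as regression targets, and thresholds the output at 0.5. The CCOA feature selection step is not reproduced, and the kernel choice, `lam`, and `gamma` are illustrative rather than the paper's settings.

```python
import numpy as np


def min_max_fit(X: np.ndarray):
    """Per-feature minimum and range learned from training data."""
    lo = X.min(axis=0)
    span = np.maximum(X.max(axis=0) - lo, 1e-12)  # avoid dividing by zero on constant features
    return lo, span


def min_max_transform(X: np.ndarray, lo: np.ndarray, span: np.ndarray) -> np.ndarray:
    """Scale features to [0, 1] using the training minimum and range."""
    return (X - lo) / span


def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelRidgeClassifier:
    """Binary classifier: kernel ridge regression on 0/1 labels, thresholded at 0.5."""

    def __init__(self, lam: float = 1e-2, gamma: float = 1.0):
        self.lam, self.gamma = lam, gamma

    def fit(self, X: np.ndarray, y: np.ndarray) -> "KernelRidgeClassifier":
        self.X_ = X
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form dual solution: alpha = (K + lam * I)^-1 y
        self.alpha_ = np.linalg.solve(K + self.lam * np.eye(len(X)), y.astype(float))
        return self

    def decision_function(self, X: np.ndarray) -> np.ndarray:
        return rbf_kernel(X, self.X_, self.gamma) @ self.alpha_

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (self.decision_function(X) >= 0.5).astype(int)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    y = np.array([0, 0, 1, 1])
    lo, span = min_max_fit(X)
    model = KernelRidgeClassifier(lam=1e-2, gamma=5.0).fit(min_max_transform(X, lo, span), y)
    print(model.predict(min_max_transform(np.array([[9.0, 10.0], [1.0, 0.0]]), lo, span)))
```

The test data are scaled with the training minimum and range rather than their own. Fitting the scaler on test data would leak information and shift the feature scale the kernel was trained on.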
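Abstract: Safety assurance for hard-rock tunnel boring machine (TBM) construction in large underground projects under complex geological conditions currently suffers from "a lack of data, a lack of platforms, and a lack of analysis", and the massive data involved are difficult to mine and analyze computationally. To address these problems, this work proposes the concept that TBM construction data should be "Born by digit, Born in format, Born to the cloud". It builds an integrated information management platform for TBM construction under complex geological conditions based on cloud computing and big data mining technologies. Through environmental geological analysis together with data acquisition, storage, and computational analysis, the platform collects digitized rock mass information in real time. It does this using a measurement-while-drilling system on the TBM rock bolt drill and a rock chip image recognition system. The platform also collects PLC (programmable logic controller) data from the TBM construction process in real time, and it establishes a MongoDB big data warehouse of multi-source TBM construction information. In addition, a private cloud computing platform is built on OpenStack, and algorithm components are deployed at the SaaS layer to carry out TBM big data mining. The platform provides basic information and helps improve safety management and control. More importantly, its data mining and computational analysis support research into the various bottleneck problems TBMs face under complex geological conditions, and its knowledge base and rule base guide TBM design, selection, and control. The results have been applied in the Jilin Yinsong Water Supply Project in China and have delivered good benefits.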