Abstract: Objective: To study the research status, research hotspots, and development trends in the field of real-world data (RWD) through social network analysis and knowledge graph analysis. Methods: Literature on RWD from the past 10 years was retrieved from CNKI, and a bibliometric analysis was performed using UCINET and CiteSpace. Results and Conclusion: Related keywords such as real-world study, hospital information system (HIS), drug combination, data mining, and traditional Chinese medicine (TCM) have high frequency and centrality. The clusters labeled clinical medication and RWD contain the most keywords. In the most recent 4 years, more articles have involved the keywords data specification, data authenticity, data security, and information security. Among them, compound Kushen injection, HIS database, and RWD are the top three keywords. Using HIS to study clinical medication, clinical characteristics, diseases, and injections is a long-term research hotspot for both Chinese and Western medicine. In addition, research on RWD databases has shifted from construction to standardized collection and governance, which can make RWD effective. Data authenticity, data security, and information security will become the new hotspots in RWD research.
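The keyword frequency and centrality measures mentioned in the abstract above can be illustrated with a minimal sketch. It does not reproduce UCINET or CiteSpace; it assumes normalized degree centrality on a keyword co-occurrence network as the simplest centrality measure, and the `papers` list is made-up sample data that merely reuses keywords named in the abstract.

```python
from collections import Counter
from itertools import combinations


def keyword_stats(papers):
    """Return (frequency, degree centrality) for keywords across papers.

    papers: list of keyword lists, one list per article (hypothetical input).
    Two keywords are linked if they appear together in at least one article.
    """
    frequency = Counter()
    neighbors = {}
    for keywords in papers:
        unique = sorted(set(keywords))
        frequency.update(unique)  # count each keyword once per article
        for kw in unique:
            neighbors.setdefault(kw, set())
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    # Normalized degree centrality: degree divided by the maximum possible degree.
    centrality = {
        kw: (len(nb) / (n - 1) if n > 1 else 0.0) for kw, nb in neighbors.items()
    }
    return frequency, centrality


# Made-up sample data for illustration only.
papers = [["RWD", "HIS"], ["RWD", "data mining"], ["HIS", "RWD"]]
frequency, centrality = keyword_stats(papers)
```

Tools such as CiteSpace typically report other measures as well (for example betweenness centrality), so the numbers from this sketch are not comparable with the study's results.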
Funding: Supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2019R1A2C1006159 and NRF-2021R1A6A1A03039493), and by the 2021 Yeungnam University Research Grant.
Abstract: Networks are fundamental to the modern world and appear throughout science and society. Access to massive amounts of data presents a unique opportunity to the research community. As networks grow in size, their complexity increases, and our ability to analyze them with the current state of the art is at severe risk of failing to keep pace. This paper therefore initiates a discussion on graph signal processing for large-scale data analysis. We first provide a comprehensive overview of the core ideas of graph signal processing (GSP) and their connection to conventional digital signal processing (DSP). We then summarize recent developments in basic GSP tools, including methods for graph filtering and graph learning, as well as the concepts of graph signal, graph Fourier transform (GFT), spectrum, and graph frequency. Graph filtering is a basic task that isolates the contribution of individual frequencies and therefore enables the removal of noise. We then consider a graph filter as a model that helps extend GSP methods to large datasets. To show its suitability and effectiveness, we first created a noisy graph signal and then applied the filter to it. After several rounds of simulation, the filtered signal appears smoother and is closer to the original noise-free, distance-based signal. With this example application, we demonstrate that graph filtering is efficient for big data analytics.
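The denoising experiment described in the abstract above can be sketched as follows. This is not the paper's filter: it assumes a path graph, a smooth sinusoidal test signal instead of the distance-based signal, Gaussian noise, and the polynomial low-pass filter h(L) = (I - alpha L)^k, where L is the combinatorial graph Laplacian. All names and parameter values are illustrative.

```python
import math
import random


def laplacian_apply(adj, x):
    """Return L x for the combinatorial Laplacian L = D - A of an unweighted graph."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(len(x))]


def low_pass_filter(adj, x, alpha=0.25, steps=10):
    """Apply the polynomial graph filter h(L) = (I - alpha L)^steps to the signal x.

    With alpha * lambda_max <= 1 the frequency response (1 - alpha * lambda)^steps
    lies in [0, 1] and decreases with graph frequency lambda, so high graph
    frequencies are attenuated the most.
    """
    y = list(x)
    for _ in range(steps):
        ly = laplacian_apply(adj, y)
        y = [yi - alpha * li for yi, li in zip(y, ly)]
    return y


def mse(a, b):
    """Mean squared error between two signals of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


# Path graph on n vertices (largest Laplacian eigenvalue is below 4).
n = 50
adj = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

# Smooth test signal plus Gaussian noise (assumed setup, fixed seed).
clean = [math.sin(2 * math.pi * i / n) for i in range(n)]
rng = random.Random(0)
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]

filtered = low_pass_filter(adj, noisy)
print("MSE before:", mse(noisy, clean), "after:", mse(filtered, clean))
```

Because each step only touches a vertex and its neighbors, this kind of polynomial filter needs no eigendecomposition, which is one reason such filters are attractive for large graphs.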
Funding: Supported by ZTE Industry-Academia-Research Cooperation Funds.
Abstract: This paper proposes an analytical mining tool for big graph data based on MapReduce and the bulk synchronous parallel (BSP) computing model. The tool is named MapReduce and BSP based Graph Mining tool (MBGM). The core of the mining system consists of four sets of parallel graph mining algorithms programmed in the BSP parallel model and one set of data extraction-transformation-loading (ETL) algorithms implemented in MapReduce. To invoke these algorithm sets, we designed a workflow engine optimized for cloud computing. Finally, a well-designed data management function enables users to view, delete, and input data in the Hadoop distributed file system (HDFS). Experiments on artificial data show that the graph mining algorithm components in MBGM are efficient.
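As an illustration of what a MapReduce-style ETL step for graph data might look like, the sketch below converts raw edge records into adjacency lists in three phases (map, shuffle, reduce). It runs in a single process and is not MBGM's implementation; the input format ("src,dst" per line) and every function name are assumptions.

```python
from collections import defaultdict


def map_edge(line):
    """Map phase: parse one raw 'src,dst' record and emit both edge directions.

    Malformed records and self-loops are dropped (the cleaning part of ETL).
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return
    src, dst = parts[0].strip(), parts[1].strip()
    if not src or not dst or src == dst:
        return
    yield src, dst
    yield dst, src


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_neighbors(key, values):
    """Reduce phase: deduplicate and sort the neighbor list of one vertex."""
    return key, sorted(set(values))


def run_etl(lines):
    """Run map, shuffle, and reduce over an iterable of raw records."""
    mapped = (pair for line in lines for pair in map_edge(line))
    return dict(reduce_neighbors(k, vs) for k, vs in shuffle(mapped).items())


adjacency = run_etl(["a,b", "b,c", "a,b", "bad line", "c,c"])
print(adjacency)
```

In a real Hadoop job the shuffle is performed by the framework between the mapper and the reducer; it is written out here only to keep the sketch self-contained.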
Abstract: Social networks (SNs) have an enormous number of users around the world, all of whom share data such as images, audio, and video with their friends through IoT devices. This concept is known as the Social Internet of Things (SIoT). The evolution of edge-cloud computing has enabled the storage of large volumes of data from various sources, a task that demands an efficient storage procedure. For storage at this scale, data replication across edge servers and geo-distributed cloud service areas is well suited to meeting users' expectations of low latency. The major issues are how to store and replicate these large data items optimally and how to allocate requests from the data center efficiently. For efficient storage, this study uses edge servers, which are part of the cloud server. The data are thus distributed and stored for quick access, which reduces response latency. The proposed data placement approach uses a machine learning (ML) algorithm, a support vector machine with a radial basis function kernel (RBF-SVM), to classify the data center in which to store the data of users and their friends from SIoT devices. The learning algorithm predicts the workload of the data stored in the data center as either edge or cloud, depending on the existing time slots. The dynamic data placement is also optimized with the proposed dynamic graph partitioning (GP) method to meet each user's demand for low latency at minimum cost, which keeps SIoT data placement efficient and effective over time. The proposed data placement and replication approach introduces three innovations over existing data placement approaches. (i) Rather than storing user data in a single cloud, this study uses the edge server closest to the SIoT devices for faster access and reduced response time. (ii) The RBF-SVM classification algorithm is used to find storage for each user, which reduces data replication. (iii) Dynamic GP is introduced for data placement with reduced latency and minimum cost, to accommodate the dynamic nature of the SN. In simulation, the approach achieves a reduced latency of 130 ms and minimum cost compared with existing data placement approaches. The proposed data placement with ML-based learning on the edge therefore provides promising results in terms of efficiency, effectiveness, and performance.
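To make the RBF-SVM classification step concrete, the sketch below evaluates the decision function of an already trained RBF-kernel SVM, f(x) = sum_i alpha_i * y_i * K(x_i, x) + b with K(x, z) = exp(-gamma * ||x - z||^2). It does not train a model and is not the paper's classifier: the two features (normalized access frequency and normalized object size), the support vectors, the dual coefficients, `GAMMA`, and `BIAS` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical trained model: (support vector, label, dual coefficient).
# Label +1 stands for "edge" and -1 for "cloud"; features are
# (normalized access frequency, normalized object size).
SUPPORT_VECTORS = [
    ([0.9, 0.1], +1, 1.0),
    ([0.1, 0.9], -1, 1.0),
]
GAMMA = 2.0  # assumed kernel width parameter
BIAS = 0.0   # assumed intercept


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors=SUPPORT_VECTORS, gamma=GAMMA, bias=BIAS):
    """SVM decision function: weighted sum of kernel values plus the bias."""
    return sum(
        alpha * label * rbf_kernel(sv, x, gamma)
        for sv, label, alpha in support_vectors
    ) + bias


def place(x):
    """Map the sign of the decision value to a storage location."""
    return "edge" if decision_value(x) >= 0 else "cloud"


print(place([0.8, 0.2]), place([0.2, 0.8]))
```

In practice the support vectors and coefficients would come from fitting an SVM library on labeled placement data rather than being written by hand.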
Funding: Supported by the National Key Basic Research and Development (973) Program of China (No. 2013CB329603) and the National Natural Science Foundation of China (Nos. 61074128, 61375058, and 71231002).
Abstract: Designing and implementing a scalable parallel mining system for big graph analysis has proven to be challenging. In this study, we propose a parallel data mining system for analyzing big graph data, built on the Bulk Synchronous Parallel (BSP) computing model and named BSP-based Parallel Graph Mining (BPGM). The system has four sets of parallel graph mining algorithms programmed in the BSP parallel model and a well-designed workflow engine, optimized for cloud computing, to invoke these algorithms. Experimental results show that the graph mining algorithm components in BPGM are efficient and perform better than the big cloud-based parallel data miner and BC-BSP.
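For readers unfamiliar with the BSP model, the sketch below simulates its superstep structure (local computation, message exchange, barrier) in a single process, using connected components by minimum-label propagation as the example algorithm. The abstract does not say which algorithms BPGM implements, so the choice of algorithm and all names here are assumptions.

```python
def bsp_connected_components(adj):
    """Label each vertex with the smallest vertex id in its component.

    adj: dict mapping each vertex to a list of neighbors (undirected graph).
    Each loop iteration is one BSP superstep: active vertices send their
    current label to neighbors, a barrier follows, and then every vertex
    processes the messages it received.
    """
    labels = {v: v for v in adj}
    active = set(adj)  # every vertex is active in the first superstep
    supersteps = 0
    while active:
        # Computation and communication: active vertices send their label.
        inbox = {v: [] for v in adj}
        for v in active:
            for u in adj[v]:
                inbox[u].append(labels[v])
        # Barrier: messages become visible only after all sends are done.
        active = set()
        for v, messages in inbox.items():
            if messages and min(messages) < labels[v]:
                labels[v] = min(messages)
                active.add(v)  # changed vertices stay active
        supersteps += 1
    return labels, supersteps


graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
labels, supersteps = bsp_connected_components(graph)
print(labels, supersteps)
```

The computation halts when a superstep produces no label changes, which mirrors the vote-to-halt convention used by BSP-style graph systems.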
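Abstract: Online social networks have become a treasure trove of data for sociology and information science, but analyzing social network data directly can leak sensitive information and threaten user privacy. Traditional privacy protection techniques based on data anonymization are powerless against increasingly strong background-knowledge attacks. In response, differential privacy, a rigorously defined and quantifiable technique, has been introduced into privacy protection for social networks. This paper proposes DP-HRGP (Differential Privacy-Hierarchical Random Graph Publishing), a social network graph publishing algorithm based on the hierarchical random graph (HRG) that satisfies ε-differential privacy. The algorithm adds noise in two stages. First, the exponential mechanism is used to score HRG structure trees, and Markov chain Monte Carlo (MCMC) sampling is used to obtain a candidate set of HRG structure trees. Then, the Laplace mechanism adds noise to the internal nodes of the HRGs in the steady-state sample set; each noisy HRG is converted into a lower triangular matrix, and the mean lower triangular matrix over all steady-state sampled HRGs is computed. Finally, the sanitized social network graph for publication is generated from the entries of the mean matrix, that is, the connection probabilities of the internal nodes of the hierarchical random graph. Experiments show that DP-HRGP achieves good data utility while satisfying ε-differential privacy.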
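The Laplace-mechanism stage described in the abstract above can be sketched for a single internal HRG node. The abstract does not give the exact quantity that is perturbed, so this sketch assumes the noise is added to the edge count between the node's left and right subtrees with sensitivity 1, and that the connection probability is that noisy count divided by the number of possible left-right pairs. The function names are hypothetical, and splitting the privacy budget across nodes and samples is not handled.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_connection_probability(edge_count, n_left, n_right, epsilon, rng):
    """Perturb one internal node's connection probability.

    edge_count: edges between the left and right subtrees of the node.
    n_left, n_right: number of leaves in each subtree.
    epsilon: privacy budget spent on this node (assumed sensitivity of 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    noisy_count = edge_count + laplace_noise(1.0 / epsilon, rng)
    probability = noisy_count / (n_left * n_right)
    # Clamping is post-processing and does not weaken the privacy guarantee.
    return min(1.0, max(0.0, probability))


rng = random.Random(1)
p = noisy_connection_probability(edge_count=6, n_left=3, n_right=4, epsilon=0.5, rng=rng)
print(p)
```

A smaller epsilon gives stronger privacy but larger noise, which is the utility trade-off the experiments in the abstract refer to.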