Operation control of power systems has become challenging with an increase in the scale and complexity of power distribution systems and extensive access to renewable energy.Therefore,improvement of the ability of dat...Operation control of power systems has become challenging with an increase in the scale and complexity of power distribution systems and extensive access to renewable energy.Therefore,improvement of the ability of data-driven operation management,intelligent analysis,and mining is urgently required.To investigate and explore similar regularities of the historical operating section of the power distribution system and assist the power grid in obtaining high-value historical operation,maintenance experience,and knowledge by rule and line,a neural information retrieval model with an attention mechanism is proposed based on graph data computing technology.Based on the processing flow of the operating data of the power distribution system,a technical framework of neural information retrieval is established.Combined with the natural graph characteristics of the power distribution system,a unified graph data structure and a data fusion method of data access,data complement,and multi-source data are constructed.Further,a graph node feature-embedding representation learning algorithm and a neural information retrieval algorithm model are constructed.The neural information retrieval algorithm model is trained and tested using the generated graph node feature representation vector set.The model is verified on the operating section of the power distribution system of a provincial grid area.The results show that the proposed method demonstrates high accuracy in the similarity matching of historical operation characteristics and effectively supports intelligent fault diagnosis and elimination in power distribution systems.展开更多
The wide application of intelligent terminals in microgrids has fueled the surge of data amount in recent years.In real-world scenarios,microgrids must store large amounts of data efficiently while also being able to ...The wide application of intelligent terminals in microgrids has fueled the surge of data amount in recent years.In real-world scenarios,microgrids must store large amounts of data efficiently while also being able to withstand malicious cyberattacks.To meet the high hardware resource requirements,address the vulnerability to network attacks and poor reliability in the tradi-tional centralized data storage schemes,this paper proposes a secure storage management method for microgrid data that considers node trust and directed acyclic graph(DAG)consensus mechanism.Firstly,the microgrid data storage model is designed based on the edge computing technology.The blockchain,deployed on the edge computing server and combined with cloud storage,ensures reliable data storage in the microgrid.Secondly,a blockchain consen-sus algorithm based on directed acyclic graph data structure is then proposed to effectively improve the data storage timeliness and avoid disadvantages in traditional blockchain topology such as long chain construction time and low consensus efficiency.Finally,considering the tolerance differences among the candidate chain-building nodes to network attacks,a hash value update mechanism of blockchain header with node trust identification to ensure data storage security is proposed.Experimental results from the microgrid data storage platform show that the proposed method can achieve a private key update time of less than 5 milliseconds.When the number of blockchain nodes is less than 25,the blockchain construction takes no more than 80 mins,and the data throughput is close to 300 kbps.Compared with the traditional chain-topology-based consensus methods that do not consider node trust,the proposed method has higher efficiency in data storage and better resistance to network attacks.展开更多
The study investigated user experience, display complexity, display type (tables versus graphs), and task difficulty as variables affecting the user’s ability to navigate through complex visual data. A total of 64 pa...The study investigated user experience, display complexity, display type (tables versus graphs), and task difficulty as variables affecting the user’s ability to navigate through complex visual data. A total of 64 participants, 39 undergraduate students (novice users) and 25 graduate students (intermediate-level users) participated in the study. The experimental design was 2 × 2 × 2 × 3 mixed design using two between-subject variables (display complexity, user experience) and two within-subject variables (display format, question difficulty). The results indicated that response time was superior for graphs (relative to tables), especially when the questions were difficult. The intermediate users seemed to adopt more extensive search strategies than novices, as revealed by an analysis of the number of changes they made to the display prior to answering questions. It was concluded that designers of data displays should consider the (a) type of display, (b) difficulty of the task, and (c) expertise level of the user to obtain optimal levels of performance.展开更多
With increasingly complex website structure and continuously advancing web technologies,accurate user clicks recognition from massive HTTP data,which is critical for web usage mining,becomes more difficult.In this pap...With increasingly complex website structure and continuously advancing web technologies,accurate user clicks recognition from massive HTTP data,which is critical for web usage mining,becomes more difficult.In this paper,we propose a dependency graph model to describe the relationships between web requests.Based on this model,we design and implement a heuristic parallel algorithm to distinguish user clicks with the assistance of cloud computing technology.We evaluate the proposed algorithm with real massive data.The size of the dataset collected from a mobile core network is 228.7GB.It covers more than three million users.The experiment results demonstrate that the proposed algorithm can achieve higher accuracy than previous methods.展开更多
This paper proposes a new primary lazy update protocol, PTCS (Primary Transaction Commit Schedule). In the PTCS protocol, a serializable primary transaction schedule is generated firstly and then the secondary trans...This paper proposes a new primary lazy update protocol, PTCS (Primary Transaction Commit Schedule). In the PTCS protocol, a serializable primary transaction schedule is generated firstly and then the secondary transactions are committed according to the serializable primary transaction schedule. PTCS protocol can guarantee serializability if the data copy graph contains no directed circles. It can also be ex tended to eliminate all requirements on the data copy graph. Compared to earlier works, PTCS protocol not only imposes a much weaker requirement on the data placement, but also avoids the deadlock caused by transaction waits and extra message overhead. The performance experiments show that the degradation of the performance caused by the replica man- agement of the PTCS protocol is tolerable.展开更多
Integrating marketing and distribution businesses is crucial for improving the coordination of equipment and the efficient management of multi-energy systems.New energy sources are continuously being connected to dist...Integrating marketing and distribution businesses is crucial for improving the coordination of equipment and the efficient management of multi-energy systems.New energy sources are continuously being connected to distribution grids;this,however,increases the complexity of the information structure of marketing and distribution businesses.The existing unified data model and the coordinated application of marketing and distribution suffer from various drawbacks.As a solution,this paper presents a data model of"one graph of marketing and distribution"and a framework for graph computing,by analyzing the current trends of business and data in the marketing and distribution fields and using graph data theory.Specifically,this work aims to determine the correlation between distribution transformers and marketing users,which is crucial for elucidating the connection between marketing and distribution.In this manner,a novel identification algorithm is proposed based on the collected data for marketing and distribution.Lastly,a forecasting application is developed based on the proposed algorithm to realize the coordinated prediction and consumption of distributed photovoltaic power generation and distribution loads.Furthermore,an operation and maintenance(O&M)knowledge graph reasoning application is developed to improve the intelligent O&M ability of marketing and distribution equipment.展开更多
This paper proposes an analytical mining tool for big graph data based on MapReduce and bulk synchronous parallel (BSP) com puting model. The tool is named Mapreduce and BSP based Graphmining tool (MBGM). The core...This paper proposes an analytical mining tool for big graph data based on MapReduce and bulk synchronous parallel (BSP) com puting model. The tool is named Mapreduce and BSP based Graphmining tool (MBGM). The core of this mining system are four sets of parallel graphmining algorithms programmed in the BSP parallel model and one set of data extractiontransformationload ing (ETE) algorithms implemented in MapReduce. To invoke these algorithm sets, we designed a workflow engine which optimized for cloud computing. Finally, a welldesigned data management function enables users to view, delete and input data in the Ha doop distributed file system (HDFS). Experiments on artificial data show that the components of graphmining algorithm in MBGM are efficient.展开更多
With the continuous expansion of software applications,people’s requirements for software quality are increasing.Software defect prediction is an important technology to improve software quality.It often encodes the ...With the continuous expansion of software applications,people’s requirements for software quality are increasing.Software defect prediction is an important technology to improve software quality.It often encodes the software into several features and applies the machine learning method to build defect prediction classifiers,which can estimate the software areas is clean or buggy.However,the current encoding methods are mainly based on the traditional manual features or the AST of source code.Traditional manual features are difficult to reflect the deep semantics of programs,and there is a lot of noise information in AST,which affects the expression of semantic features.To overcome the above deficiencies,we combined with the Convolutional Neural Networks(CNN)and proposed a novel compiler Intermediate Representation(IR)based program encoding method for software defect prediction(CIR-CNN).Specifically,our program encoding method is based on the compiler IR,which can eliminate a large amount of noise information in the syntax structure of the source code and facilitate the acquisition of more accurate semantic information.Secondly,with the help of data flow analysis,a Data Dependency Graph(DDG)is constructed on the compiler IR,which helps to capture the deeper semantic information of the program.Finally,we use the widely used CNN model to build a software defect prediction model,which can increase the adaptive ability of the method.To evaluate the performance of the CIR-CNN,we use seven projects from PROMISE datasets to set up comparative experiments.The experiments results show that,in WPDP,with our CIR-CNN method,the prediction accuracy was improved by 12%for the AST-encoded CNN-based model and by 20.9%for the traditional features-based LR model,respectively.And in CPDP,the AST-encoded DBNbased model was improved by 9.1%and the traditional features-based TCA+model by 19.2%,respectively.展开更多
This paper proposes an object oriented model scheduling for parallel computing in media MultiProcessors System on Chip(MPSoC).Firstly,the Coarse Grain Data Flow Graph(CGDFG) parallel programming model is used in this ...This paper proposes an object oriented model scheduling for parallel computing in media MultiProcessors System on Chip(MPSoC).Firstly,the Coarse Grain Data Flow Graph(CGDFG) parallel programming model is used in this approach.Secondly,this approach has the feature of unified abstraction for software objects implementing in processor and hardware objects implementing in ASICs,easy for mapping CGDFG programming on MPSoC.This approach cuts down the kernel overhead and reduces the code size effectively.The principle of the oriented object model,the method of scheduling,and how to map a parallel programming through CGDFG to the MPSoC are analyzed in this approach.This approach also compares the code size and execution cycles with conventional control flow scheduling,and presents respective management overhead for one application in me-dia-SoC.展开更多
The utilization of computation resources and reconfiguration time has a large impact on reconfiguration system performance. In order to promote the performance, a dynamical self-reconfigurable mechanism for data-drive...The utilization of computation resources and reconfiguration time has a large impact on reconfiguration system performance. In order to promote the performance, a dynamical self-reconfigurable mechanism for data-driven cell array is proposed. Cells can be fired only when the needed data arrives, and cell array can be worked on two modes: fixed execution and reconfiguration. On reconfiguration mode, cell function and data flow direction are changed automatically at run time according to contexts. Simultaneously using an H-tree interconnection network, through pre-storing multiple application mapping contexts in reconfiguration buffer, multiple applications can execute concurrently and context switching time is the minimal. For verifying system performance, some algorithms are selected for mapping onto the proposed structure, and the amount of configuration contexts and execution time are recorded for statistical analysis. The results show that the proposed self-reconfigurable mechanism can reduce the number of contexts efficiently, and has a low computing time.展开更多
In this paper, a Graph-based semantic Data Model (GDM) is proposed with the primary objective of bridging the gap between the human perception of an enterprise and the needs of computing infrastructure to organize i...In this paper, a Graph-based semantic Data Model (GDM) is proposed with the primary objective of bridging the gap between the human perception of an enterprise and the needs of computing infrastructure to organize information in some particular manner for efficient storage and retrieval. The Graph Data Model (GDM) has been proposed as an alternative data model to combine the advantages of the relational model with the positive features of semantic data models. The proposed GDM offers a structural representation for interacting to the designer, making it always easy to comprehend the complex relations amongst basic data items. GDM allows an entire database to be viewed as a Graph (V, E) in a layered organization. Here, a graph is created in a bottom up fashion where V represents the basic instances of data or a functionally abstracted module, called primary semantic group (PSG) and secondary semantic group (SSG). An edge in the model implies the relationship among the secondary semantic groups. The contents of the lowest layer are the semantically grouped data values in the form of primary semantic groups. The SSGs are nothing but the higher-level abstraction and are created by the method of encapsulation of various PSGs, SSGs and basic data elements. This encapsulation methodology to provide a higher-level abstraction continues generating various secondary semantic groups until the designer thinks that it is sufficient to declare the actual problem domain. GDM, thus, uses standard abstractions available in a semantic data model with a structural representation in terms of a graph. The operations on the data model are formalized in the proposed graph algebra. A Graph Query Language (GQL) is also developed, maintaining similarity with the widely accepted user-friendly SQL. Finally, the paper also presents the methodology to make this GDM compatible with the distributed environment, and a corresponding query processing technique for distributed environment is also suggested for the sake of completeness.展开更多
Due to limitations in geometric representation and semantic description, the current pedestrian route analysis models are inadequate. To express the geometry of geographic entities in a micro-spatial environment accur...Due to limitations in geometric representation and semantic description, the current pedestrian route analysis models are inadequate. To express the geometry of geographic entities in a micro-spatial environment accurately, the concept of a grid is presented, and grid-based methods for modeling geospatial objects are described. The semantic constitution of a building environment and the methods for modeling rooms, corridors, and staircases with grid objects are described. Based on the topology relationship between grid objects, a grid-based graph for a building environment is presented, and the corresponding route algorithm for pedestrians is proposed. The main advantages of the graph model proposed in this paper are as follows: 1) consideration of both semantic and geometric information, 2) consideration of the need for accurate geometric representation of the micro-spatial environment and the efficiency of pedestrian route analysis, 3) applicability of the graph model to route analysis in both static and dynamic environments, and 4) ability of the multi-hierarchical route analysis to integrate the multiple levels of pedestrian decision characteristics, from the high to the low, to determine the optimal path.展开更多
Graph data publication has been considered as an important step for data analysis and mining.Graph data,which provide knowledge on interactions among entities,can be locally generated and held by distributed data owne...Graph data publication has been considered as an important step for data analysis and mining.Graph data,which provide knowledge on interactions among entities,can be locally generated and held by distributed data owners.These data are usually sensitive and private,because they may be related to owners’personal activities and can be hijacked by adversaries to conduct inference attacks.Current solutions either consider private graph data as centralized contents or disregard the overlapping of graphs in distributed manners.Therefore,this work proposes a novel framework for distributed graph publication.In this framework,differential privacy is applied to justify the safety of the published contents.It includes four phases,i.e.,graph combination,plan construction sharing,data perturbation,and graph reconstruction.The published graph selection is guided by one data coordinator,and each graph is perturbed carefully with the Laplace mechanism.The problem of graph selection is formulated and proven to be NP-complete.Then,a heuristic algorithm is proposed for selection.The correctness of the combined graph and the differential privacy on all edges are analyzed.This study also discusses a scenario without a data coordinator and proposes some insights into graph publication.展开更多
This paper studies the most similar maximal clique query(MSMCQ).Given a graph G and a set of nodes Q,MSMCQ is to find the maximal clique of G having the largest similarity with Q.MSMCQ has many real applications inclu...This paper studies the most similar maximal clique query(MSMCQ).Given a graph G and a set of nodes Q,MSMCQ is to find the maximal clique of G having the largest similarity with Q.MSMCQ has many real applications including advertising industry,public security,task crowdsourcing and social network,etc.MSMCQ can be studied as a special case of the general set similarity query(SSQ).However,the MCs of G has several specialties from the general sets.Based on the specialties of MCs,we propose a novel index,namely MCIndex.MCIndex outperforms the state-of-the-art SSQ method significantly in terms of the number of candidates and the query time.Specifically,we first construct an inverted indexⅠfor all the MCs of G.Since the MCs in a posting list often have a lot of overlaps,MCIndex selects some pivots to cluster the MCs with a small radius.Given a query Q,we compute the distance from the pivots to Q.The clusters of the pivots assured not answer can be pruned by our distance based pruning rule.Since it is NP-hard to construct a minimum MCIndex,we propose to construct a minimal MCIndex onⅠ(v)with an approximation ratio 1+ln|Ⅰ(v)|.Since the MCs have properties that are inherent of graph structure,we further propose a S Index within each cluster of a MCIndex and a structure based pruning rule.S Index can significantly reduce the number of candidates.Since the sizes of intersections between Q and many MCs need to be computed during the query evaluation,we also propose a binary representation of MCs to improve the efficiency of the intersection size computation.Our extensive experiments confirm the effectiveness and efficiency of our proposed techniques on several real-world datasets.展开更多
Malware is emerging day by day.To evade detection,many malware obfuscation techniques have emerged.Dynamicmalware detectionmethods based on data flow graphs have attracted much attention since they can deal with the o...Malware is emerging day by day.To evade detection,many malware obfuscation techniques have emerged.Dynamicmalware detectionmethods based on data flow graphs have attracted much attention since they can deal with the obfuscation problem to a certain extent.Many malware classification methods based on data flow graphs have been proposed.Some of them are based on userdefined features or graph similarity of data flow graphs.Graph neural networks have also recently been used to implement malware classification recently.This paper provides an overview of current data flow graph-based malware classification methods.Their respective advantages and disadvantages are summarized as well.In addition,the future trend of the data flow graph-based malware classification method is analyzed,which is of great significance for promoting the development of malware detection technology.展开更多
Being a kind of non-Euclidean data,spatiotemporal graph data exists everywhere from trafficflow,air quality index to crime case,etc.Unlike the raster data,the irregular and disordered characteristics of spatiotemporal...Being a kind of non-Euclidean data,spatiotemporal graph data exists everywhere from trafficflow,air quality index to crime case,etc.Unlike the raster data,the irregular and disordered characteristics of spatiotemporal graph data have attracted the research interest of scholars,with the prediction of spatiotemporal graph data being one of the research hot spots.The emergence of spatiotemporal graph neural networks(ST-GNNs)provides a new insight for solving the problem of obtaining spatial correlation for spatiotemporal graph data prediction while achieving state-of-the-art performance.In this paper,comprehensive survey of research on ST-GNNs prediction domain isa presented,where the background of ST-GNNs is introduced before the computational paradigm of ST-GNN is thoroughly reviewed.From the perspective of model construction,59 well-known models in recent years are classified and discussed.Some of these models are further analyzed in terms of performance and efficiency.Subsequently,the categories and applicationfields of spatiotemporal graph data are summarized,providing a clear idea of technology selection for different applications.Finally,the evolution history and future direction of ST-GNNs are also summarized,to facilitate future researchers to timely understand the current state of prediction research by ST-GNNs.展开更多
The design and implementation of a scalable parallel mining system target for big graph analysis has proven to be challenging. In this study, we propose a parallel data mining system for analyzing big graph data gener...The design and implementation of a scalable parallel mining system target for big graph analysis has proven to be challenging. In this study, we propose a parallel data mining system for analyzing big graph data generated on a Bulk Synchronous Parallel (BSP) computing model named BSP-based Parallel Graph Mining (BPGM). This system has four sets of parallel graph mining algorithms programmed in the BSP parallel model and a well-designed workflow engine optimized for cloud computing to invoke these algorithms. Experimental results show that the graph mining algorithm components in BPGM are efficient and have better performance than big cloud-based parallel data miner and BC-BSP.展开更多
This paper focuses on the design process for reconfigurable architecture. Our contribution focuses on introducing a new temporal partitioning algorithm. Our algorithm is based on typical mathematic flow to solve the t...This paper focuses on the design process for reconfigurable architecture. Our contribution focuses on introducing a new temporal partitioning algorithm. Our algorithm is based on typical mathematic flow to solve the temporal partitioning problem. This algorithm optimizes the transfer of data required between design partitions and the reconfiguration overhead. Results show that our algorithm considerably decreases the communication cost and the latency compared with other well known algorithms.展开更多
The design of the infrastructure for Chinese Web(CWI),a prototype system aimed at forum data analysis,is introduced.CWI takes a best effort approach.1)It tries its best to extract or annotate semantics over the web da...The design of the infrastructure for Chinese Web(CWI),a prototype system aimed at forum data analysis,is introduced.CWI takes a best effort approach.1)It tries its best to extract or annotate semantics over the web data.2)It provides flexible schemes for users to transform the web data into eXtensible Markup Language(XML)forms with more semantic annotations that are more friendly for further analytical tasks.3)A distributed graph repository,called DISGR is used as backend for management of web data.The paper introduces the design issues,reports the progress of the implementation,and discusses the research issues that are under study.展开更多
By skeptics and undecided we refer to nodes in clustered social networks that cannot be assigned easily to any of the clusters.Such nodes are typically found either at the interface between clusters(the undecided)or a...By skeptics and undecided we refer to nodes in clustered social networks that cannot be assigned easily to any of the clusters.Such nodes are typically found either at the interface between clusters(the undecided)or at their boundaries(the skeptics).Identifying these nodes is relevant in marketing applications like voter targeting,because the persons represented by such nodes are often more likely to be affected in marketing campaigns than nodes deeply within clusters.So far this identification task is not as well studied as other network analysis tasks like clustering,identifying central nodes,and detecting motifs.We approach this task by deriving novel geometric features from the network structure that naturally lend themselves to an interactive visual approach for identifying interface and boundary nodes.展开更多
基金supported by the National Key R&D Program of China(2020YFB0905900).
文摘Operation control of power systems has become challenging with an increase in the scale and complexity of power distribution systems and extensive access to renewable energy.Therefore,improvement of the ability of data-driven operation management,intelligent analysis,and mining is urgently required.To investigate and explore similar regularities of the historical operating section of the power distribution system and assist the power grid in obtaining high-value historical operation,maintenance experience,and knowledge by rule and line,a neural information retrieval model with an attention mechanism is proposed based on graph data computing technology.Based on the processing flow of the operating data of the power distribution system,a technical framework of neural information retrieval is established.Combined with the natural graph characteristics of the power distribution system,a unified graph data structure and a data fusion method of data access,data complement,and multi-source data are constructed.Further,a graph node feature-embedding representation learning algorithm and a neural information retrieval algorithm model are constructed.The neural information retrieval algorithm model is trained and tested using the generated graph node feature representation vector set.The model is verified on the operating section of the power distribution system of a provincial grid area.The results show that the proposed method demonstrates high accuracy in the similarity matching of historical operation characteristics and effectively supports intelligent fault diagnosis and elimination in power distribution systems.
文摘The wide application of intelligent terminals in microgrids has fueled the surge of data amount in recent years.In real-world scenarios,microgrids must store large amounts of data efficiently while also being able to withstand malicious cyberattacks.To meet the high hardware resource requirements,address the vulnerability to network attacks and poor reliability in the tradi-tional centralized data storage schemes,this paper proposes a secure storage management method for microgrid data that considers node trust and directed acyclic graph(DAG)consensus mechanism.Firstly,the microgrid data storage model is designed based on the edge computing technology.The blockchain,deployed on the edge computing server and combined with cloud storage,ensures reliable data storage in the microgrid.Secondly,a blockchain consen-sus algorithm based on directed acyclic graph data structure is then proposed to effectively improve the data storage timeliness and avoid disadvantages in traditional blockchain topology such as long chain construction time and low consensus efficiency.Finally,considering the tolerance differences among the candidate chain-building nodes to network attacks,a hash value update mechanism of blockchain header with node trust identification to ensure data storage security is proposed.Experimental results from the microgrid data storage platform show that the proposed method can achieve a private key update time of less than 5 milliseconds.When the number of blockchain nodes is less than 25,the blockchain construction takes no more than 80 mins,and the data throughput is close to 300 kbps.Compared with the traditional chain-topology-based consensus methods that do not consider node trust,the proposed method has higher efficiency in data storage and better resistance to network attacks.
文摘The study investigated user experience, display complexity, display type (tables versus graphs), and task difficulty as variables affecting the user’s ability to navigate through complex visual data. A total of 64 participants, 39 undergraduate students (novice users) and 25 graduate students (intermediate-level users) participated in the study. The experimental design was 2 × 2 × 2 × 3 mixed design using two between-subject variables (display complexity, user experience) and two within-subject variables (display format, question difficulty). The results indicated that response time was superior for graphs (relative to tables), especially when the questions were difficult. The intermediate users seemed to adopt more extensive search strategies than novices, as revealed by an analysis of the number of changes they made to the display prior to answering questions. It was concluded that designers of data displays should consider the (a) type of display, (b) difficulty of the task, and (c) expertise level of the user to obtain optimal levels of performance.
基金supported in part by the Fundamental Research Funds for the Central Universities under Grant No.2013RC0114111 Project of China under Grant No.B08004
文摘With increasingly complex website structure and continuously advancing web technologies,accurate user clicks recognition from massive HTTP data,which is critical for web usage mining,becomes more difficult.In this paper,we propose a dependency graph model to describe the relationships between web requests.Based on this model,we design and implement a heuristic parallel algorithm to distinguish user clicks with the assistance of cloud computing technology.We evaluate the proposed algorithm with real massive data.The size of the dataset collected from a mobile core network is 228.7GB.It covers more than three million users.The experiment results demonstrate that the proposed algorithm can achieve higher accuracy than previous methods.
基金Supported by Visiting Scholar Foundation of KeyLabin University and National Lab of Switching Technology and Tele-communication Networks ([2000]123)
文摘This paper proposes a new primary lazy update protocol, PTCS (Primary Transaction Commit Schedule). In the PTCS protocol, a serializable primary transaction schedule is generated firstly and then the secondary transactions are committed according to the serializable primary transaction schedule. PTCS protocol can guarantee serializability if the data copy graph contains no directed circles. It can also be ex tended to eliminate all requirements on the data copy graph. Compared to earlier works, PTCS protocol not only imposes a much weaker requirement on the data placement, but also avoids the deadlock caused by transaction waits and extra message overhead. The performance experiments show that the degradation of the performance caused by the replica man- agement of the PTCS protocol is tolerable.
基金This work was supported by the National Key R&D Program of China(2020YFB0905900).
文摘Integrating marketing and distribution businesses is crucial for improving the coordination of equipment and the efficient management of multi-energy systems.New energy sources are continuously being connected to distribution grids;this,however,increases the complexity of the information structure of marketing and distribution businesses.The existing unified data model and the coordinated application of marketing and distribution suffer from various drawbacks.As a solution,this paper presents a data model of"one graph of marketing and distribution"and a framework for graph computing,by analyzing the current trends of business and data in the marketing and distribution fields and using graph data theory.Specifically,this work aims to determine the correlation between distribution transformers and marketing users,which is crucial for elucidating the connection between marketing and distribution.In this manner,a novel identification algorithm is proposed based on the collected data for marketing and distribution.Lastly,a forecasting application is developed based on the proposed algorithm to realize the coordinated prediction and consumption of distributed photovoltaic power generation and distribution loads.Furthermore,an operation and maintenance(O&M)knowledge graph reasoning application is developed to improve the intelligent O&M ability of marketing and distribution equipment.
基金supported by ZTE Industry-Academia-Research Cooperaton Funds
文摘This paper proposes an analytical mining tool for big graph data based on MapReduce and bulk synchronous parallel (BSP) com puting model. The tool is named Mapreduce and BSP based Graphmining tool (MBGM). The core of this mining system are four sets of parallel graphmining algorithms programmed in the BSP parallel model and one set of data extractiontransformationload ing (ETE) algorithms implemented in MapReduce. To invoke these algorithm sets, we designed a workflow engine which optimized for cloud computing. Finally, a welldesigned data management function enables users to view, delete and input data in the Ha doop distributed file system (HDFS). Experiments on artificial data show that the components of graphmining algorithm in MBGM are efficient.
基金This work was supported by the Universities Natural Science Research Project of Jiangsu Province under Grant 20KJB520026 and 20KJA520002the Foundation for Young Teachers of Nanjing Auditing University under Grant 19QNPY018the National Nature Science Foundation of China under Grant 71972102 and 61902189.
文摘With the continuous expansion of software applications,people’s requirements for software quality are increasing.Software defect prediction is an important technology to improve software quality.It often encodes the software into several features and applies the machine learning method to build defect prediction classifiers,which can estimate the software areas is clean or buggy.However,the current encoding methods are mainly based on the traditional manual features or the AST of source code.Traditional manual features are difficult to reflect the deep semantics of programs,and there is a lot of noise information in AST,which affects the expression of semantic features.To overcome the above deficiencies,we combined with the Convolutional Neural Networks(CNN)and proposed a novel compiler Intermediate Representation(IR)based program encoding method for software defect prediction(CIR-CNN).Specifically,our program encoding method is based on the compiler IR,which can eliminate a large amount of noise information in the syntax structure of the source code and facilitate the acquisition of more accurate semantic information.Secondly,with the help of data flow analysis,a Data Dependency Graph(DDG)is constructed on the compiler IR,which helps to capture the deeper semantic information of the program.Finally,we use the widely used CNN model to build a software defect prediction model,which can increase the adaptive ability of the method.To evaluate the performance of the CIR-CNN,we use seven projects from PROMISE datasets to set up comparative experiments.The experiments results show that,in WPDP,with our CIR-CNN method,the prediction accuracy was improved by 12%for the AST-encoded CNN-based model and by 20.9%for the traditional features-based LR model,respectively.And in CPDP,the AST-encoded DBNbased model was improved by 9.1%and the traditional features-based TCA+model by 19.2%,respectively.
基金Supported by National Natural Science Foundation ofChina (No.60873112)
文摘This paper proposes an object oriented model scheduling for parallel computing in media MultiProcessors System on Chip(MPSoC).Firstly,the Coarse Grain Data Flow Graph(CGDFG) parallel programming model is used in this approach.Secondly,this approach has the feature of unified abstraction for software objects implementing in processor and hardware objects implementing in ASICs,easy for mapping CGDFG programming on MPSoC.This approach cuts down the kernel overhead and reduces the code size effectively.The principle of the oriented object model,the method of scheduling,and how to map a parallel programming through CGDFG to the MPSoC are analyzed in this approach.This approach also compares the code size and execution cycles with conventional control flow scheduling,and presents respective management overhead for one application in me-dia-SoC.
基金the National Natural Science Foundation of China (Nos. 61802304, 61834005, 61772417, 61634004, and 61602377)the Shaanxi Provincial Co-ordination Innovation Project of Science and Technology (No. 2016KTZDGY02-04-02)。
文摘The utilization of computation resources and reconfiguration time has a large impact on reconfiguration system performance. In order to promote the performance, a dynamical self-reconfigurable mechanism for data-driven cell array is proposed. Cells can be fired only when the needed data arrives, and cell array can be worked on two modes: fixed execution and reconfiguration. On reconfiguration mode, cell function and data flow direction are changed automatically at run time according to contexts. Simultaneously using an H-tree interconnection network, through pre-storing multiple application mapping contexts in reconfiguration buffer, multiple applications can execute concurrently and context switching time is the minimal. For verifying system performance, some algorithms are selected for mapping onto the proposed structure, and the amount of configuration contexts and execution time are recorded for statistical analysis. The results show that the proposed self-reconfigurable mechanism can reduce the number of contexts efficiently, and has a low computing time.
文摘In this paper, a Graph-based semantic Data Model (GDM) is proposed with the primary objective of bridging the gap between the human perception of an enterprise and the needs of computing infrastructure to organize information in some particular manner for efficient storage and retrieval. The Graph Data Model (GDM) has been proposed as an alternative data model to combine the advantages of the relational model with the positive features of semantic data models. The proposed GDM offers a structural representation for interacting to the designer, making it always easy to comprehend the complex relations amongst basic data items. GDM allows an entire database to be viewed as a Graph (V, E) in a layered organization. Here, a graph is created in a bottom up fashion where V represents the basic instances of data or a functionally abstracted module, called primary semantic group (PSG) and secondary semantic group (SSG). An edge in the model implies the relationship among the secondary semantic groups. The contents of the lowest layer are the semantically grouped data values in the form of primary semantic groups. The SSGs are nothing but the higher-level abstraction and are created by the method of encapsulation of various PSGs, SSGs and basic data elements. This encapsulation methodology to provide a higher-level abstraction continues generating various secondary semantic groups until the designer thinks that it is sufficient to declare the actual problem domain. GDM, thus, uses standard abstractions available in a semantic data model with a structural representation in terms of a graph. The operations on the data model are formalized in the proposed graph algebra. A Graph Query Language (GQL) is also developed, maintaining similarity with the widely accepted user-friendly SQL. Finally, the paper also presents the methodology to make this GDM compatible with the distributed environment, and a corresponding query processing technique for distributed environment is also suggested for the sake of completeness.
基金supported by National Natural Science Foundation of China(Nos.41571387,41201375 and 41501440)Tianjin Research Program of Application Foundation and Advanced Technology(No.14JCQNJC07900)+1 种基金Tianjin Science and Technology Planning Project(Nos.15ZCZDSF00390 and 14TXGCCX00015)Opening Fund of Tianjin Engineering Research Center of Geospatial Information Technology"Modeling and analysis of path graph in 3D indoor spatial environment"
文摘Due to limitations in geometric representation and semantic description, the current pedestrian route analysis models are inadequate. To express the geometry of geographic entities in a micro-spatial environment accurately, the concept of a grid is presented, and grid-based methods for modeling geospatial objects are described. The semantic constitution of a building environment and the methods for modeling rooms, corridors, and staircases with grid objects are described. Based on the topology relationship between grid objects, a grid-based graph for a building environment is presented, and the corresponding route algorithm for pedestrians is proposed. The main advantages of the graph model proposed in this paper are as follows: 1) consideration of both semantic and geometric information, 2) consideration of the need for accurate geometric representation of the micro-spatial environment and the efficiency of pedestrian route analysis, 3) applicability of the graph model to route analysis in both static and dynamic environments, and 4) ability of the multi-hierarchical route analysis to integrate the multiple levels of pedestrian decision characteristics, from the high to the low, to determine the optimal path.
基金supported by the National Natural Science Foundation of China(Nos.U19A2059 and 61802050)Ministry of Science and Technology of Sichuan Province Program(Nos.2021YFG0018 and 20ZDYF0343)。
文摘Graph data publication has been considered as an important step for data analysis and mining.Graph data,which provide knowledge on interactions among entities,can be locally generated and held by distributed data owners.These data are usually sensitive and private,because they may be related to owners’personal activities and can be hijacked by adversaries to conduct inference attacks.Current solutions either consider private graph data as centralized contents or disregard the overlapping of graphs in distributed manners.Therefore,this work proposes a novel framework for distributed graph publication.In this framework,differential privacy is applied to justify the safety of the published contents.It includes four phases,i.e.,graph combination,plan construction sharing,data perturbation,and graph reconstruction.The published graph selection is guided by one data coordinator,and each graph is perturbed carefully with the Laplace mechanism.The problem of graph selection is formulated and proven to be NP-complete.Then,a heuristic algorithm is proposed for selection.The correctness of the combined graph and the differential privacy on all edges are analyzed.This study also discusses a scenario without a data coordinator and proposes some insights into graph publication.
基金support of NSF of China(61502258,61806105)Major Technology Innovation Project of Shandong(2018CXGC0703)+2 种基金NSF of Shandong Province(ZR2014FQ007)the Project of Shandong Finance Society(2018SDJR31)Soft Science Fund of Shandong Province(2018RKB01373,2016RKB01043).
文摘This paper studies the most similar maximal clique query(MSMCQ).Given a graph G and a set of nodes Q,MSMCQ is to find the maximal clique of G having the largest similarity with Q.MSMCQ has many real applications including advertising industry,public security,task crowdsourcing and social network,etc.MSMCQ can be studied as a special case of the general set similarity query(SSQ).However,the MCs of G has several specialties from the general sets.Based on the specialties of MCs,we propose a novel index,namely MCIndex.MCIndex outperforms the state-of-the-art SSQ method significantly in terms of the number of candidates and the query time.Specifically,we first construct an inverted indexⅠfor all the MCs of G.Since the MCs in a posting list often have a lot of overlaps,MCIndex selects some pivots to cluster the MCs with a small radius.Given a query Q,we compute the distance from the pivots to Q.The clusters of the pivots assured not answer can be pruned by our distance based pruning rule.Since it is NP-hard to construct a minimum MCIndex,we propose to construct a minimal MCIndex onⅠ(v)with an approximation ratio 1+ln|Ⅰ(v)|.Since the MCs have properties that are inherent of graph structure,we further propose a S Index within each cluster of a MCIndex and a structure based pruning rule.S Index can significantly reduce the number of candidates.Since the sizes of intersections between Q and many MCs need to be computed during the query evaluation,we also propose a binary representation of MCs to improve the efficiency of the intersection size computation.Our extensive experiments confirm the effectiveness and efficiency of our proposed techniques on several real-world datasets.
文摘Malware is emerging day by day.To evade detection,many malware obfuscation techniques have emerged.Dynamicmalware detectionmethods based on data flow graphs have attracted much attention since they can deal with the obfuscation problem to a certain extent.Many malware classification methods based on data flow graphs have been proposed.Some of them are based on userdefined features or graph similarity of data flow graphs.Graph neural networks have also recently been used to implement malware classification recently.This paper provides an overview of current data flow graph-based malware classification methods.Their respective advantages and disadvantages are summarized as well.In addition,the future trend of the data flow graph-based malware classification method is analyzed,which is of great significance for promoting the development of malware detection technology.
基金supported by National Social Science Fund of China[grant number 21JCA004]Soft Science Research Project of Ministry of Housing and Urban-Rural Development of China[grant number R20200287]Open Research Fund of Key Laboratory of Digital Cartography and Land Information Application,Ministry of Natural Resources[grant number ZRZYBWD202102].
文摘Being a kind of non-Euclidean data,spatiotemporal graph data exists everywhere from trafficflow,air quality index to crime case,etc.Unlike the raster data,the irregular and disordered characteristics of spatiotemporal graph data have attracted the research interest of scholars,with the prediction of spatiotemporal graph data being one of the research hot spots.The emergence of spatiotemporal graph neural networks(ST-GNNs)provides a new insight for solving the problem of obtaining spatial correlation for spatiotemporal graph data prediction while achieving state-of-the-art performance.In this paper,comprehensive survey of research on ST-GNNs prediction domain isa presented,where the background of ST-GNNs is introduced before the computational paradigm of ST-GNN is thoroughly reviewed.From the perspective of model construction,59 well-known models in recent years are classified and discussed.Some of these models are further analyzed in terms of performance and efficiency.Subsequently,the categories and applicationfields of spatiotemporal graph data are summarized,providing a clear idea of technology selection for different applications.Finally,the evolution history and future direction of ST-GNNs are also summarized,to facilitate future researchers to timely understand the current state of prediction research by ST-GNNs.
基金supported by the National Key Basic Research and Department (973) Program of China (No. 2013CB329603)the National Natural Science Foundation of China (Nos. 61074128, 61375058, and 71231002)
文摘The design and implementation of a scalable parallel mining system target for big graph analysis has proven to be challenging. In this study, we propose a parallel data mining system for analyzing big graph data generated on a Bulk Synchronous Parallel (BSP) computing model named BSP-based Parallel Graph Mining (BPGM). This system has four sets of parallel graph mining algorithms programmed in the BSP parallel model and a well-designed workflow engine optimized for cloud computing to invoke these algorithms. Experimental results show that the graph mining algorithm components in BPGM are efficient and have better performance than big cloud-based parallel data miner and BC-BSP.
文摘This paper focuses on the design process for reconfigurable architecture. Our contribution focuses on introducing a new temporal partitioning algorithm. Our algorithm is based on typical mathematic flow to solve the temporal partitioning problem. This algorithm optimizes the transfer of data required between design partitions and the reconfiguration overhead. Results show that our algorithm considerably decreases the communication cost and the latency compared with other well known algorithms.
基金This work was partially supported by the National Natural Science Foundation of China(Grant Nos.60833003 and 61070051)the National Basic Research Program of China(Grant No.2010CB731402).
文摘The design of the infrastructure for Chinese Web(CWI),a prototype system aimed at forum data analysis,is introduced.CWI takes a best effort approach.1)It tries its best to extract or annotate semantics over the web data.2)It provides flexible schemes for users to transform the web data into eXtensible Markup Language(XML)forms with more semantic annotations that are more friendly for further analytical tasks.3)A distributed graph repository,called DISGR is used as backend for management of web data.The paper introduces the design issues,reports the progress of the implementation,and discusses the research issues that are under study.
文摘By skeptics and undecided we refer to nodes in clustered social networks that cannot be assigned easily to any of the clusters.Such nodes are typically found either at the interface between clusters(the undecided)or at their boundaries(the skeptics).Identifying these nodes is relevant in marketing applications like voter targeting,because the persons represented by such nodes are often more likely to be affected in marketing campaigns than nodes deeply within clusters.So far this identification task is not as well studied as other network analysis tasks like clustering,identifying central nodes,and detecting motifs.We approach this task by deriving novel geometric features from the network structure that naturally lend themselves to an interactive visual approach for identifying interface and boundary nodes.