The construction of new power systems presents higher requirements for Power Internet of Things (PIoT) technology. The "source-grid-load-storage" architecture of a new power system requires PIoT to have a stronger multi-source heterogeneous data fusion ability. Native graph databases have great advantages in dealing with multi-source heterogeneous data, which makes them suitable for an increasing number of analytical computing tasks. However, few existing graph database products natively support matrix operation-related interfaces or functions, resulting in low efficiency when handling the matrix calculations commonly encountered in power grids. In this paper, the matrix computation process is expressed by a strategy called graph description, which relies on the natural connection between a matrix and the structure of a graph. Based on that, we implement matrix operations on a graph database, including matrix multiplication and matrix decomposition. Specifically, only the nodes relevant to the computation and their neighbors are involved in the process, which prunes the influence of zero elements in the matrix and avoids useless iterations compared with conventional matrix computation. Based on the graph description, a series of power grid computations can be implemented on the graph database, which reduces redundant data import and export operations while leveraging the parallel computing capability of the graph database. This improves the efficiency of PIoT when handling multi-source heterogeneous data. A comprehensive experimental study over two power system datasets of different scales compares the proposed method with Python and MATLAB baselines. The results reveal the superior performance of the proposed method in both power flow and N-1 contingency computations.
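As an illustration of the graph-description idea, the following minimal Python sketch (an assumption-laden toy, not the paper's implementation) stores a sparse matrix as an adjacency list and computes a matrix-vector product by visiting only each node's neighbors, so zero elements never enter the computation.

```python
# Minimal sketch: a sparse matrix stored as an adjacency list ("graph description"),
# so y = A @ x only touches the neighbors of each node and zero entries are skipped.
from collections import defaultdict

def matrix_to_graph(A):
    """Keep only nonzero entries as weighted edges row -> column."""
    g = defaultdict(dict)
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            if v != 0:
                g[i][j] = v
    return g

def graph_matvec(g, x, n_rows):
    """Sparse y = A @ x driven by graph traversal instead of dense loops."""
    y = [0.0] * n_rows
    for i, neighbors in g.items():                           # only rows with edges
        y[i] = sum(w * x[j] for j, w in neighbors.items())   # only nonzero columns
    return y

A = [[4, 0, 0],
     [0, 3, 1],
     [0, 0, 2]]
x = [1.0, 2.0, 3.0]
print(graph_matvec(matrix_to_graph(A), x, len(A)))   # [4.0, 9.0, 6.0]
```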
The query optimizer uses cost-based optimization to create an execution plan with the least cost, which also consumes the least amount of resources. Query optimization for relational database systems is a combinatorial optimization problem, which renders exhaustive search impossible as query sizes grow. Increases in CPU performance have surpassed main memory and disk access speeds in recent decades, allowing data compression to be used as a strategy for improving database system performance. Compression and query optimization are the two most important factors for performance enhancement: compression reduces the volume of data, whereas query optimization minimizes execution time. Compressing the database reduces the memory requirement, data take less time to load into memory, fewer buffer misses occur, and intermediate results are smaller. This paper performs query optimization on a graph database in a cloud dew environment by considering which plan requires less time to execute a query. Together, compression and query optimization improve the performance of the databases. This research compares the performance of the MySQL and Neo4j databases in terms of memory usage and execution time running on cloud dew servers.
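A possible shape for the reported MySQL-versus-Neo4j comparison is sketched below in Python; the hosts, credentials, table, and queries are placeholders, and the official `mysql-connector-python` and `neo4j` drivers are assumed.

```python
# Hypothetical timing harness for comparing an equivalent lookup on MySQL and Neo4j;
# connection details, schema, and query text are illustrative placeholders.
import time
import mysql.connector                 # pip install mysql-connector-python
from neo4j import GraphDatabase        # pip install neo4j

def time_mysql(sql):
    conn = mysql.connector.connect(host="dew-server", user="root",
                                   password="secret", database="social")
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute(sql)
    cur.fetchall()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

def time_neo4j(cypher):
    driver = GraphDatabase.driver("bolt://dew-server:7687", auth=("neo4j", "secret"))
    with driver.session() as session:
        start = time.perf_counter()
        session.run(cypher).consume()   # force full evaluation of the result
        elapsed = time.perf_counter() - start
    driver.close()
    return elapsed

print("MySQL :", time_mysql("SELECT * FROM friends WHERE user_id = 42"))
print("Neo4j :", time_neo4j("MATCH (:User {id: 42})-[:FRIEND]->(f) RETURN f"))
```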
In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported, thus providing a flexible data management engine for diverse types of knowledge graph. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning, and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.
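The snippet below is a toy reading of the domain-graph abstraction, not MillenniumDB code: every edge carries its own identifier drawn from the same domain as the nodes, so later statements can refer to the edge itself. The example data and identifiers are assumptions.

```python
# Toy illustration of the general idea as we read it from the abstract:
# edges are first-class domain elements, so an edge id can be the source of
# another edge (e.g. a statement about a statement).
edges = {
    "e1": ("Chile", "capital", "Santiago"),
    "e2": ("e1", "stated_in", "Wikidata"),   # a statement about edge e1
}

def outgoing(element):
    """All edges whose source is the given domain element (node or edge id)."""
    return {eid: t for eid, t in edges.items() if t[0] == element}

print(outgoing("Chile"))   # {'e1': ('Chile', 'capital', 'Santiago')}
print(outgoing("e1"))      # {'e2': ('e1', 'stated_in', 'Wikidata')}
```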
The power communication network can be abstracted as a graph based on its topology. In this paper, we propose an approach to conduct simulations of the power communication network based on its graph representation. In particular, the nodes and edges in the graph represent the ports and channels in the grid topology. Different applications on the grid can be transformed into queries over the graph. Hence, we build our grid simulation model on the Neo4j graph database. We also propose a fault extension algorithm based on predicate calculus. Our experimental evaluations show that the proposed approach can effectively improve the efficiency of power grid simulation.
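A hedged sketch of the modeling step follows: ports become Neo4j nodes, channels become relationships, and a variable-length Cypher pattern approximates a fault-extension query. The labels, properties, and traversal depth are assumptions, not the paper's schema.

```python
# Illustrative loading and querying of ports/channels in Neo4j; all names,
# property values, and the fault query are assumptions for demonstration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Ports become nodes, channels become relationships between ports.
    session.run("MERGE (:Port {id: 'P1', station: 'A'})")
    session.run("MERGE (:Port {id: 'P2', station: 'B'})")
    session.run("""
        MATCH (a:Port {id: 'P1'}), (b:Port {id: 'P2'})
        MERGE (a)-[:CHANNEL {bandwidth: 622, state: 'up'}]->(b)
    """)
    # Rough fault-extension query: which ports are reachable from the faulty one,
    # i.e. which downstream services the fault would spread to.
    affected = session.run("""
        MATCH (faulty:Port {id: $pid})-[:CHANNEL*1..5]->(p:Port)
        RETURN DISTINCT p.id AS port
    """, pid="P1")
    print([r["port"] for r in affected])

driver.close()
```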
With the rapid growth in the availability of digital health-related data, there is a great demand for the utilization of intelligent information systems within the healthcare sector. These systems can manage and manipulate this massive amount of health-related data and support different decision-making tasks. They can also provide various sustainable health services such as medical error reduction, diagnosis acceleration, and clinical service quality improvement. The intensive care unit (ICU) is one of the most important hospital units. However, there are limited rooms and resources in most hospitals. During times of seasonal diseases and pandemics, ICUs face high admission demand. In line with this increasing number of admissions, determining health risk levels has become an essential and imperative task, creating a heightened demand for an expert decision support system that enables doctors to accurately and swiftly determine the risk level of patients. Therefore, this study proposes a fuzzy logic inference system built on domain-specific knowledge graphs, as a proof of concept, for tackling this healthcare-related issue. The system employs a combination of two sets of fuzzy input parameters to classify the health risk levels of new admissions to hospitals. The proposed system was implemented using the MATLAB Fuzzy Logic Toolbox, and several experiments demonstrate its validity.
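For readers without MATLAB, a comparable proof of concept can be expressed with scikit-fuzzy in Python; the vital-sign variables, membership functions, and rules below are illustrative assumptions rather than the paper's actual parameter sets.

```python
# Sketch of a Mamdani-style risk classifier with scikit-fuzzy (pip install scikit-fuzzy);
# variables, membership functions, and rules are assumptions for illustration only.
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

temp = ctrl.Antecedent(np.arange(35.0, 42.1, 0.1), "temperature")
spo2 = ctrl.Antecedent(np.arange(70, 101, 1), "spo2")
risk = ctrl.Consequent(np.arange(0, 101, 1), "risk")

temp["normal"] = fuzz.trimf(temp.universe, [35.0, 36.8, 38.0])
temp["fever"] = fuzz.trimf(temp.universe, [37.5, 39.5, 42.0])
spo2["low"] = fuzz.trimf(spo2.universe, [70, 80, 92])
spo2["normal"] = fuzz.trimf(spo2.universe, [90, 97, 100])
risk["low"] = fuzz.trimf(risk.universe, [0, 0, 50])
risk["high"] = fuzz.trimf(risk.universe, [50, 100, 100])

rules = [
    ctrl.Rule(temp["fever"] | spo2["low"], risk["high"]),
    ctrl.Rule(temp["normal"] & spo2["normal"], risk["low"]),
]

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input["temperature"] = 39.2
sim.input["spo2"] = 85
sim.compute()
print(round(sim.output["risk"], 1))   # defuzzified risk score for this admission
```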
Objective: To establish the knowledge graph of "disease-syndrome-symptom-method-formula" in the Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》) in order to reduce the fuzziness and uncertainty of the data, and to lay a foundation for later knowledge reasoning and its application. Methods: Under the guidance of experts in the classical formulas of traditional Chinese medicine (TCM), a "top-down as the main, bottom-up as the auxiliary" method was adopted to carry out knowledge extraction, knowledge fusion, and knowledge storage from the five aspects of disease, syndrome, symptom, method, and formula for the original text of the Treatise on Febrile Diseases, and thus the knowledge graph of the Treatise on Febrile Diseases was constructed. On this basis, knowledge structure queries and knowledge relevance queries were realized in a visual manner. Results: The knowledge graph of "disease-syndrome-symptom-method-formula" in the Treatise on Febrile Diseases was constructed, containing 6469 entities and 10911 relational triples, on which entities and their relationships can be queried and the query results can be visualized. Conclusion: The knowledge graph of the Treatise on Febrile Diseases systematically digitizes its knowledge system and improves the completeness and accuracy of the knowledge representation as well as the connections among "disease-syndrome-symptom-treatment-formula", which is conducive to the sharing and reuse of knowledge in a clear and efficient way.
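The following sketch shows how a few such triples could be stored and traversed in a property graph; Neo4j, the node labels, and the sample entities are assumptions introduced for illustration only, not the paper's actual storage layer.

```python
# Illustrative storage and "knowledge relevance" query for a tiny fragment of a
# disease-syndrome-formula graph; all labels and names are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run("""
        MERGE (d:Disease {name: 'Taiyang disease'})
        MERGE (s:Syndrome {name: 'Exterior deficiency syndrome'})
        MERGE (f:Formula {name: 'Guizhi Decoction'})
        MERGE (d)-[:HAS_SYNDROME]->(s)
        MERGE (s)-[:TREATED_BY]->(f)
    """)
    # Knowledge relevance query: which formulas are linked to a given disease?
    rows = session.run("""
        MATCH (:Disease {name: $d})-[:HAS_SYNDROME]->()-[:TREATED_BY]->(f:Formula)
        RETURN f.name AS formula
    """, d="Taiyang disease")
    print([r["formula"] for r in rows])

driver.close()
```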
This work presents the design of an Internet of Things (IoT) edge-based system based on model transformation and a complete weighted graph to detect violations of social distancing measures in indoor public places. A wireless sensor network based on Bluetooth Low Energy is introduced as the infrastructure of the proposed design. A hybrid model transformation strategy for generating a graph database to represent groups of people is presented as the core middleware layer of the detection system's proposed architectural design. A Neo4j graph database is used as a target implementation generated from the proposed transformational system to store all captured real-time IoT data about the distances between individuals in an indoor area and to answer user-predefined queries, expressed using Neo4j Cypher, to provide insights from the stored data for decision support. As proof of concept, a discrete-time simulation model was adopted for the design of a COVID-19 physical distancing measures case study to evaluate the introduced system architecture. Twenty-one weighted graphs were generated randomly and the degrees of violation of distancing measures were inspected. The experimental results demonstrate the capability of the proposed system design to detect violations of COVID-19 physical distancing measures within an enclosed area.
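A minimal sketch of the violation check on a complete weighted graph is given below; the 1.5 m threshold and the distance readings are assumed values, not measurements from the paper.

```python
# Minimal sketch: vertices are people, edge weights are measured distances in metres;
# the threshold and sample readings are assumptions.
THRESHOLD_M = 1.5

distances = {            # complete weighted graph stored as an edge-weight map
    ("p1", "p2"): 0.9,
    ("p1", "p3"): 2.4,
    ("p2", "p3"): 1.2,
}

def violations(edge_weights, threshold):
    """Return every pair of people standing closer than the threshold."""
    return [(a, b, d) for (a, b), d in edge_weights.items() if d < threshold]

for a, b, d in violations(distances, THRESHOLD_M):
    print(f"violation: {a} and {b} are {d} m apart")
```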
Graph databases have gained widespread adoption in various industries and have been utilized in a range of applications, including financial risk assessment, commodity recommendation, and data lineage tracking. While the principles and design of these databases have been the subject of some investigation, there remains a lack of comprehensive examination of aspects such as storage layout, query language, and deployment. The present study focuses on the design and implementation of graph storage layouts, with a particular emphasis on tree-structured key-value stores. We also examine different design choices in the graph storage layer and present our findings through the development of TuGraph, a highly efficient single-machine graph database that significantly outperforms well-known graph database management systems (GDBMSs). Additionally, TuGraph demonstrates superior performance in the Linked Data Benchmark Council (LDBC) Social Network Benchmark (SNB) interactive benchmark.
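One common graph-on-key-value layout from this design space is sketched below (an illustration, not TuGraph's actual format): edges are encoded as composite keys so that a vertex's neighborhood becomes a single range scan over a sorted store.

```python
# Sketch of adjacency encoded in composite keys "src|label|dst" over a sorted
# key-value store, emulated here with a sorted Python list and bisect.
import bisect

store = sorted([
    ("v1|knows|v2", b""),   # edge data could live in the value; here it is empty
    ("v1|knows|v3", b""),
    ("v2|knows|v3", b""),
])
keys = [k for k, _ in store]

def neighbors(src, label):
    """Range-scan all keys with the prefix 'src|label|' in the sorted store."""
    prefix = f"{src}|{label}|"
    lo = bisect.bisect_left(keys, prefix)
    out = []
    while lo < len(keys) and keys[lo].startswith(prefix):
        out.append(keys[lo].split("|")[2])
        lo += 1
    return out

print(neighbors("v1", "knows"))   # ['v2', 'v3']
```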
Scholarly communication of knowledge is predominantly document-based in digital repositories, and researchers find it tedious to automatically capture and process the semantics among related articles. Despite the present digital era of big data, there is a lack of visual representations of the knowledge present in scholarly articles, and a time-saving approach for literature search and visual navigation is warranted. The majority of knowledge display tools cannot cope with current big data trends and pose limitations in meeting the requirements of automatic knowledge representation, storage, and dynamic visualization. To address this limitation, the main aim of this paper is to model the visualization of unstructured data and explore the feasibility of achieving visual navigation for researchers to gain insight into the knowledge hidden in scientific articles of digital repositories. Contemporary topics of research and practice, including modifiable risk factors leading to a dramatic increase in Alzheimer's disease and other forms of dementia, warrant deeper insight into the evidence-based knowledge available in the literature. The goal is to provide researchers with an easy visual traversal through a digital repository of research articles. This paper takes the first step in proposing a novel integrated model using knowledge maps and next-generation graph datastores to achieve semantic visualization with domain-specific knowledge, such as dementia risk factors. The model facilitates a deep conceptual understanding of the literature by automatically establishing visual relationships among the knowledge extracted from the big data resources of research articles. It also serves as an automated tool for visual navigation through the knowledge repository for faster identification of dementia risk factors reported in scholarly articles. Further, it facilitates semantic visualization and domain-specific knowledge discovery from a large digital repository and its associations. In this study, the implementation of the proposed model in the Neo4j graph data repository, along with the results achieved, is presented as a proof of concept. Using scholarly research articles on dementia risk factors as a case study, automatic knowledge extraction, storage, intelligent search, and visual navigation are illustrated. The implementation of contextual knowledge and its relationships for visual exploration by researchers shows promising results in the knowledge discovery of dementia risk factors. Overall, this study demonstrates the significance of semantic visualization with the effective use of knowledge maps and paves the way for extending visual modeling capabilities in the future.
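A toy end-to-end sketch of such a pipeline is shown below, with keyword matching standing in for the paper's knowledge extraction; the node labels, sample abstract, and risk-factor list are assumptions.

```python
# Toy pipeline: find risk-factor mentions in an article and store them as
# Article-[:REPORTS]->RiskFactor edges in Neo4j; all names are illustrative.
from neo4j import GraphDatabase

RISK_FACTORS = ["hypertension", "smoking", "physical inactivity", "diabetes"]

article = {"doi": "10.1000/xyz123",
           "abstract": "Midlife hypertension and smoking were associated with dementia."}

found = [rf for rf in RISK_FACTORS if rf in article["abstract"].lower()]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for rf in found:
        session.run("""
            MERGE (a:Article {doi: $doi})
            MERGE (r:RiskFactor {name: $rf})
            MERGE (a)-[:REPORTS]->(r)
        """, doi=article["doi"], rf=rf)
driver.close()

print("stored:", found)   # ['hypertension', 'smoking']
```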
Information on the Internet is fragmented and presented in different data sources, which makes automatic knowledge harvesting and understanding formidable for machines, and even for humans. Knowledge graphs have become prevalent in both industry and academic circles in recent years, as one of the most efficient and effective knowledge integration approaches. Techniques for knowledge graph construction can mine information from structured, semi-structured, or even unstructured data sources, and finally integrate the information into knowledge represented in a graph. Furthermore, a knowledge graph is able to organize information in an easy-to-maintain, easy-to-understand, and easy-to-use manner. In this paper, we give a summary of techniques for constructing knowledge graphs. We review the existing knowledge graph systems developed by both academia and industry. We discuss in detail the process of building knowledge graphs, and survey state-of-the-art techniques for automatic knowledge graph checking and expansion via logical inference and reasoning. We also review the issues of graph data management by introducing knowledge data models and graph databases, especially from a NoSQL point of view. Finally, we overview current knowledge graph systems and discuss future research directions.
Graphs are widely used for modeling complicated data such as social networks, chemical compounds, protein interactions, and the semantic web. To effectively understand and utilize any collection of graphs, a graph database that efficiently supports elementary querying mechanisms is crucially required. For example, subgraph and supergraph queries are important types of graph queries which have many applications in practice. A primary challenge in computing the answers of graph queries is that pair-wise comparisons of graphs are usually hard problems. Relational database management systems (RDBMSs) have repeatedly been shown to be able to efficiently host different types of data such as complex objects and XML data. RDBMSs derive much of their performance from sophisticated optimizer components which make use of physical properties that are specific to the relational model, such as sortedness, proper join ordering, and powerful indexing mechanisms. In this article, we study the problem of indexing and querying graph databases using the relational infrastructure. We present a purely relational framework for processing graph queries. This framework relies on building a layer of graph features knowledge which captures metadata and summary features of the underlying graph database. We describe different querying mechanisms which make use of the layer of graph features knowledge to achieve scalable performance for processing graph queries. Finally, we conduct an extensive set of experiments on real and synthetic datasets to demonstrate the efficiency and scalability of our techniques.
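The sketch below illustrates the flavor of such a feature-based pruning layer on a relational backend (SQLite here); the schema, labels, and counts are assumptions rather than the paper's design.

```python
# Per-graph label counts kept in a summary table and used to prune graphs that
# cannot possibly contain the subgraph query; schema and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE label_counts (graph_id INTEGER, label TEXT, cnt INTEGER)")
conn.executemany("INSERT INTO label_counts VALUES (?, ?, ?)", [
    (1, "C", 6), (1, "O", 1),   # graph 1: six carbon atoms, one oxygen atom
    (2, "C", 2),                # graph 2: two carbon atoms, no oxygen
])

query_labels = {"C": 3, "O": 1}  # the subgraph query needs 3 C nodes and 1 O node

# A graph survives only if it has enough nodes of every label the query uses.
candidates = None
for label, need in query_labels.items():
    rows = conn.execute(
        "SELECT graph_id FROM label_counts WHERE label = ? AND cnt >= ?",
        (label, need)).fetchall()
    ids = {gid for (gid,) in rows}
    candidates = ids if candidates is None else candidates & ids

print(sorted(candidates))  # [1] -> only graph 1 is passed to exact subgraph matching
```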
gStore is an open-source native Resource Description Framework (RDF) triple store that answers SPARQL queries by subgraph matching over RDF graphs. However, there are some deficiencies in the original system design, such as in answering simple queries (including one-triple-pattern queries). To improve the efficiency of the system, we reconsider the system design in this paper. Specifically, we propose a new query plan generation module that generates different query plans according to the structures of query graphs. Furthermore, we re-design our vertex encoding strategy to achieve more pruning power and a new multi-join algorithm to speed up the subgraph matching process. Extensive experiments on synthetic and real RDF datasets show that our method outperforms the state-of-the-art algorithms significantly.
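A simplified illustration of signature-based candidate pruning in this spirit (not gStore's actual encoding) is given below: each vertex is summarized by a bitmask of its incident predicates, and a data vertex qualifies only if it covers all of the query vertex's bits.

```python
# Simplified vertex-signature pruning; predicate names and data are assumptions.
PREDICATE_BIT = {"type": 1 << 0, "bornIn": 1 << 1, "worksAt": 1 << 2}

def signature(predicates):
    """Fold a vertex's incident predicates into one bitmask."""
    sig = 0
    for p in predicates:
        sig |= PREDICATE_BIT[p]
    return sig

data_vertices = {
    "alice": signature(["type", "bornIn", "worksAt"]),
    "bob":   signature(["type"]),
}
query_sig = signature(["type", "worksAt"])     # ?x type ... ; ?x worksAt ...

candidates = [v for v, sig in data_vertices.items()
              if sig & query_sig == query_sig]  # query bits must be covered
print(candidates)   # ['alice']
```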
Traditional geographic information system models for map representation use the superposition of layers to model physical reality, neglecting the integrity of the environment and limiting the ability to express interactions between features in complex phenomena. This results in limitations regarding dynamic simulation and geographic causality reasoning. In this paper, we extend the framework of the geographic scene by formalizing the relationship between geographic processes and events to construct a dynamic data model: the process-event-centred dynamic data model. The key element of this data model is the relationships between processes, events, and states of the natural or man-made phenomenon of interest. The identified relationships can be translated into a network of hierarchical, developmental, and causal graphs and realized in the Neo4j graph database. The implementation in the graph database supports spatio-temporal reasoning in geographic scenes and provides an organizational framework for simulating spatio-temporal dynamics and performing complex calculations. The example of a 2019 mega-typhoon process is used to demonstrate the introduced process-event-centred model and its implementation in the graph database. A series of queries to the graph database shows the capabilities of the data model for spatial reasoning and dynamic modeling.
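A hedged Cypher sketch of the process-event-centred structure follows; the node labels, relationship types, and the typhoon fragment are illustrative assumptions rather than the paper's schema.

```python
# Illustrative process/event/state nodes with causal links, plus a causality query.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run("""
        MERGE (p:Process {name: 'Typhoon process'})
        MERGE (e1:Event {name: 'Landfall'})
        MERGE (e2:Event {name: 'Heavy rainfall'})
        MERGE (s:State {name: 'Urban flooding'})
        MERGE (p)-[:HAS_EVENT]->(e1)
        MERGE (p)-[:HAS_EVENT]->(e2)
        MERGE (e1)-[:CAUSES]->(e2)
        MERGE (e2)-[:CAUSES]->(s)
    """)
    # Causality reasoning: which chain of events leads from landfall to flooding?
    chain = session.run("""
        MATCH path = (:Event {name: 'Landfall'})-[:CAUSES*]->(:State {name: 'Urban flooding'})
        RETURN [n IN nodes(path) | n.name] AS chain
    """)
    print(chain.single()["chain"])   # ['Landfall', 'Heavy rainfall', 'Urban flooding']

driver.close()
```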