In recent years, a growing number of math contents are available on the Web. When conventional search engines deal with mathematical expressions, the two-dimen- sion-al structure of mathematical expressions is lost, w...In recent years, a growing number of math contents are available on the Web. When conventional search engines deal with mathematical expressions, the two-dimen- sion-al structure of mathematical expressions is lost, which results in a low performance of math retrieval. While the retrieval technology specifically designed for mathematical expressions is not mature currently. Aiming at these problems, an improved mathematical expression indexing and matching method was proposed through employing full text index method to deal with the two-dimensional structure of mathematical expressions. Firstly, through the fully consideration of LaTeX formulae’ characteristics, a feature representation method of mathematical expressions and a clustering method of feature keywords were put forward. Then, an improved inter-relevant successive trees index model was applied to the construction of the mathematical expression index, in which the cluster algorithm of mathematical expression features was employed to solve the problem of the quantity growth of the trees in processing large amount of formulae. Finally, the matching algorithms of mathematical expressions were given which provide four query modes called exact matching, compatible matching, sub-expression matching and fuzzy matching. In browser/server mode, 110027 formulae were used as experimental samples. The index file size was 29.02 Mb. The average time of retrieval was 1.092 seconds. The experimental result shows the effectiveness of the method.展开更多
The problem of subgraph matching is one fundamental issue in graph search,which is NP-Complete problem.Recently,subgraph matching has become a popular research topic in the field of knowledge graph analysis,which has ...The problem of subgraph matching is one fundamental issue in graph search,which is NP-Complete problem.Recently,subgraph matching has become a popular research topic in the field of knowledge graph analysis,which has a wide range of applications including question answering and semantic search.In this paper,we study the problem of subgraph matching on knowledge graph.Specifically,given a query graph q and a data graph G,the problem of subgraph matching is to conduct all possible subgraph isomorphic mappings of q on G.Knowledge graph is formed as a directed labeled multi-graph having multiple edges between a pair of vertices and it has more dense semantic and structural features than general graph.To accelerate subgraph matching on knowledge graph,we propose a novel subgraph matching algorithm based on subgraph index for knowledge graph,called as FGqT-Match.The subgraph matching algorithm consists of two key designs.One design is a subgraph index of matching-driven flow graph(FGqT),which reduces redundant calculations in advance.Another design is a multi-label weight matrix,which evaluates a near-optimal matching tree for minimizing the intermediate candidates.With the aid of these two key designs,all subgraph isomorphic mappings are quickly conducted only by traversing FGqj.Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.展开更多
文摘In recent years, a growing number of math contents are available on the Web. When conventional search engines deal with mathematical expressions, the two-dimen- sion-al structure of mathematical expressions is lost, which results in a low performance of math retrieval. While the retrieval technology specifically designed for mathematical expressions is not mature currently. Aiming at these problems, an improved mathematical expression indexing and matching method was proposed through employing full text index method to deal with the two-dimensional structure of mathematical expressions. Firstly, through the fully consideration of LaTeX formulae’ characteristics, a feature representation method of mathematical expressions and a clustering method of feature keywords were put forward. Then, an improved inter-relevant successive trees index model was applied to the construction of the mathematical expression index, in which the cluster algorithm of mathematical expression features was employed to solve the problem of the quantity growth of the trees in processing large amount of formulae. Finally, the matching algorithms of mathematical expressions were given which provide four query modes called exact matching, compatible matching, sub-expression matching and fuzzy matching. In browser/server mode, 110027 formulae were used as experimental samples. The index file size was 29.02 Mb. The average time of retrieval was 1.092 seconds. The experimental result shows the effectiveness of the method.
基金the National Natural Science Foundation of China(Grant Nos.61976032,62002039).
文摘The problem of subgraph matching is one fundamental issue in graph search,which is NP-Complete problem.Recently,subgraph matching has become a popular research topic in the field of knowledge graph analysis,which has a wide range of applications including question answering and semantic search.In this paper,we study the problem of subgraph matching on knowledge graph.Specifically,given a query graph q and a data graph G,the problem of subgraph matching is to conduct all possible subgraph isomorphic mappings of q on G.Knowledge graph is formed as a directed labeled multi-graph having multiple edges between a pair of vertices and it has more dense semantic and structural features than general graph.To accelerate subgraph matching on knowledge graph,we propose a novel subgraph matching algorithm based on subgraph index for knowledge graph,called as FGqT-Match.The subgraph matching algorithm consists of two key designs.One design is a subgraph index of matching-driven flow graph(FGqT),which reduces redundant calculations in advance.Another design is a multi-label weight matrix,which evaluates a near-optimal matching tree for minimizing the intermediate candidates.With the aid of these two key designs,all subgraph isomorphic mappings are quickly conducted only by traversing FGqj.Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.