Abstract
The complex and diverse structure of mathematical expressions makes them difficult to retrieve. To address this, a method for indexing and retrieving mathematical expressions is proposed. In the indexing stage, the characteristics of LaTeX mathematical expressions are analyzed and summarized to define a feature representation that captures the two-dimensional structure of an expression, and the inter-relevant successive tree index model is applied to build the expression index, which addresses the hierarchical growth problem that arises when expressions are represented as trees. In the matching stage, matching algorithms are designed for several query modes, including exact matching, compatible matching, sub-expression matching, and fuzzy matching. Indexing and matching experiments were run on 51,076 mathematical expressions in a browser/server setting. The results show that the proposed method speeds up queries, reduces index storage space, adapts to the structural characteristics of mathematical expressions, and achieves good retrieval performance.
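The abstract does not give the index layout or the matching algorithms, so the following is only a minimal sketch of the general idea of a successor-based index over LaTeX tokens, not the paper's method. It assumes a flat token stream rather than the two-dimensional feature representation the paper defines, and it covers only exact and sub-expression matching. All names (`tokenize`, `SuccessorIndex`, `exact_match`, `sub_expression_match`) are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: a LaTeX command, an escaped character,
# or any single non-whitespace character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize(latex):
    """Split a LaTeX string into a flat list of tokens."""
    return TOKEN.findall(latex)


class SuccessorIndex:
    """Simplified successor index (an assumption, not the paper's structure).

    For every distinct token there is one "tree": a mapping from each
    occurrence (expr_id, position) to the token that follows it there.
    """

    def __init__(self):
        self.trees = defaultdict(dict)  # token -> {(expr_id, pos): successor}
        self.lengths = {}               # expr_id -> number of tokens

    def add(self, expr_id, latex):
        tokens = tokenize(latex)
        self.lengths[expr_id] = len(tokens)
        for pos, tok in enumerate(tokens):
            successor = tokens[pos + 1] if pos + 1 < len(tokens) else None
            self.trees[tok][(expr_id, pos)] = successor

    def _occurrences(self, q):
        """Return (expr_id, start) pairs where token list q occurs contiguously."""
        if not q:
            return []
        hits = []
        for expr_id, start in self.trees.get(q[0], {}):
            # Follow successor links from tree to tree along the query.
            if all(
                self.trees.get(q[i], {}).get((expr_id, start + i)) == q[i + 1]
                for i in range(len(q) - 1)
            ):
                hits.append((expr_id, start))
        return hits

    def sub_expression_match(self, query):
        """Ids of expressions that contain the query as a contiguous token run."""
        q = tokenize(query)
        return sorted({eid for eid, _ in self._occurrences(q)})

    def exact_match(self, query):
        """Ids of expressions whose whole token sequence equals the query."""
        q = tokenize(query)
        return sorted(
            {eid for eid, start in self._occurrences(q)
             if start == 0 and self.lengths[eid] == len(q)}
        )
```

In this sketch, a query for `\frac{a}{b}` would be resolved by starting at the `\frac` tree and checking, occurrence by occurrence, that each recorded successor matches the next query token. Compatible and fuzzy matching would need additional rules that the abstract does not specify.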
Source
《计算机工程》
CAS
CSCD
PKU Core (北大核心)
2017, No. 6, pp. 129-135 (7 pages)
Computer Engineering
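Funding
National Natural Science Foundation of China (61375075)
Key Project of Science and Technology Research in Higher Education Institutions of Hebei Province (ZD2017208)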
Keywords
mathematical expression
indexing
retrieval
LaTeX format
inter-relevant successive tree