Abstract
To address the long retrieval times, slow responses, and high memory consumption encountered when querying large XML documents, a dual-index structure organized like a B-tree is designed, and a query method that uses this structure to locate target content quickly is proposed. A path-based inverted index reduces the time spent comparing the Dewey codes of retrieved items one by one. In addition, the content of the XML document is segmented into words to build data units, and a Path Guide index library is established from the logical relationships between these data units, which avoids visiting nodes that are irrelevant to the query. Several sets of comparative experiments show that the content-based dual-index query method and its optimization scheme achieve clearly better query efficiency.
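The idea of a path-based inverted index over Dewey-coded nodes can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's dual-index structure: the function names `build_path_index` and `lookup`, the nested-dictionary layout, and the regex tokenizer (a stand-in for word segmentation) are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(xml_text):
    """Map each term to {label path: [Dewey codes]} (illustrative layout only)."""
    root = ET.fromstring(xml_text)
    index = defaultdict(lambda: defaultdict(list))

    def visit(node, parent_path, dewey):
        path = parent_path + "/" + node.tag
        # Tokenize the node's own text only (tail text is ignored);
        # the regex is a simple stand-in for real word segmentation.
        for term in set(re.findall(r"\w+", (node.text or "").lower())):
            index[term][path].append(dewey)
        # A child's Dewey code is the parent's code plus its 1-based position.
        for position, child in enumerate(node, start=1):
            visit(child, path, dewey + (position,))

    visit(root, "", (1,))
    return index


def lookup(index, term, path=None):
    """Return Dewey codes for a term, optionally restricted to one label path."""
    postings = index.get(term.lower(), {})
    if path is not None:
        return list(postings.get(path, []))
    return sorted(code for codes in postings.values() for code in codes)
```

Because postings are grouped by label path, a query that names both a term and a path reads one posting list directly instead of comparing the Dewey codes of all matches against each other.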
Source
Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》)
2017, Issue 2, pp. 111-115 (5 pages)
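Funding
National Natural Science Foundation of China (61362021, 61661017)
Guangxi Science and Technology Innovation Capability and Condition Construction Program (桂科能1598025-21)
Guangxi Natural Science Foundation (2013GXNSFDA019030, 2014GXNSFDA118035, 2016GXNSFAA380149)
Foundation of the Key Laboratory of Cognitive Radio, Ministry of Education (CRKL150103, 2011KF11)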