期刊文献+

XML信息检索中最小子树根节点问题的分层算法 被引量:23

Layered Solution for SLCA Problem in XML Information Retrieval
下载PDF
导出
摘要 最小子树根节点问题(smallest lowest common ancestor,简称SLCA)是实现XML信息检索研究中关键字查询的一个基本问题,其主旨就是求解所有包含给定关键字的紧致子树的根节点.XU等人给出了3种算法—基于索引的搜索算法(indexed lookup eager,简称ILE)、基于堆栈的算法以及基于扫描的算法(scan eager,简称SE),并通过实验证明ILE算法具有最好的表现.与基于B+树索引结构的ILE算法不同,所给出的新算法,称为LISA(layered intersection scan algorithm)方法.该方法基于SLCA节点按“层”分布的规律,采取了逐层求解SLCA节点的思路,即在获取了包含关键字的节点的Dewey码集合后,通过计算对应于不同关键字、不同层次的Dewey码前缀集合的交集,可以得到对应不同层的SLCA节点.与ILE相比,LISA除了只需对应于关键字的节点集合信息以外,不再需要其他复杂的辅助数据结构——全部的信息只是对应不同关键字的Dewey码集合以及排序操作.同时,给出了两种实际的算法:LISAI和LISAII,二者的区别在于是否采用Dewey编码到整数的转换.其中,LISAII更具有满意的性能. SLCA (smallest lowest common ancestor) problem is a basic task of keyword search in XML information retrieval. It means to find all the nodes corresponding to the tightest subtrees in XML data, which involves the given keywords. Xu, et al., illustrate three algorithms--Indexed lookup eager (ILE), stack algorithm and scan eager (SE), and manifest that ILE has the best performance. Different from the complicated-B+-tree-based ILE algorithm, this paper proposes a layered solution for SLCA problem, named as LISA (layered intersection scan algorithm). It benefits from the distribution rule of SLCA nodes in XML tree, and calculates the SLCA nodes level by level (the deepest level runs first). That is, based on the retrieved Dewey codes corresponding to given keywords, the Dewey codes of SLCA nodes can be gotten by intersecting the prefix Dewey codes of each level. Compared with the ILE algorithm, LISA solutions need not sophisticated data structures, and have comparatively runtime performance. There are two instances following the LISA idea, called LISA I and LISA II respectively. They are distinguished from each other according to whether keeping Dewey codes in computation or transforming Dewey codes into integer sequences. Extensive experiments evaluate the performance of algorithms and prove the efficiency of LISA II.
出处 《软件学报》 EI CSCD 北大核心 2007年第4期919-932,共14页 Journal of Software
基金 SupportedbytheNationalNaturalScienceFoundationofChinaunderGrantNo.60503037(国家自然科学基金) theNationalHigh-TechResearchandDevelopmentPlanofChinaunderGrantNo.2005AA4Z307(国家高技术研究发展计划(863)) theBeijingNaturalScienceFoundationofChinaunderGrantNo.4062018(北京市自然科学基金)
关键词 XML索引 DEWEY编码 XML信息检索 关键字查询 SLCA ILE XML index Dewey code XML information retrieval keyword search SLCA (smallest lowest common ancestor) ILE (indexed lookup eager)
  • 相关文献

参考文献1

二级参考文献3

共引文献175

同被引文献148

引证文献23

二级引证文献31

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部