Abstract
The biggest challenge for keyword search over relational databases is that an answer may have to be assembled from tuples in several relations. Dominant approaches locate the tuples hit by each keyword and discover the joins between them on the fly, which is inefficient when the database is large or its schema is complex. These approaches also support only simple keyword queries, which limits their practical use. To address this, we propose indexing joinable tuples offline to speed up online search, and we improve the indexing efficiency and query capability of this scheme to make it more usable. First, to improve search and indexing efficiency, we propose a schema-graph-based technique for enumerating tuple joins: it uses the acyclic schema graph to enumerate suitable joins of relations and translates them into SQL statements, which are executed in the database to obtain the possible tuple joins. Second, to ensure the compactness of results, we propose pre-indexing joins of 1 to m tuples and searching them in order of size: smaller joins are searched first, and every join that contains an existing result is excluded from further search. Finally, to support complex queries, we propose a field-based index structure that builds, for each tuple join, fields oriented to different query types; at search time several fields are queried and their results are combined by logical operations to produce the final results. Experiments show that, compared with existing techniques, our approach indexes faster and searches more efficiently, and supports complex queries such as Boolean and attribute queries, giving query capability close to that of SQL.
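The size-ordered search with exclusion described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the in-memory index layout, the function name `compact_search`, and the sample tuple ids are all assumptions made for illustration, and the real system keeps its index on disk and obtains joins through SQL.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical offline index: each pre-indexed tuple join is identified by the
# frozenset of tuple ids it joins and mapped to the set of terms it contains.
JoinIndex = Dict[FrozenSet[str], Set[str]]


def compact_search(index: JoinIndex, keywords: Set[str], max_size: int) -> List[FrozenSet[str]]:
    """Search joins from size 1 up to max_size, skipping any join that
    contains an already accepted (smaller) result."""
    results: List[FrozenSet[str]] = []
    for size in range(1, max_size + 1):
        # Joins of the current size, sorted so the output order is deterministic.
        candidates = sorted((j for j in index if len(j) == size), key=sorted)
        for join in candidates:
            # Exclude joins that are proper supersets of an existing result.
            if any(r < join for r in results):
                continue
            # Accept the join only if it covers every query keyword.
            if keywords <= index[join]:
                results.append(join)
    return results


# Illustrative data (assumed): author tuple a1, paper tuples p1 and p2.
example_index: JoinIndex = {
    frozenset({"a1"}): {"smith"},
    frozenset({"p1"}): {"database", "search"},
    frozenset({"p2"}): {"smith", "database"},
    frozenset({"a1", "p1"}): {"smith", "database", "search"},
    frozenset({"a1", "p2"}): {"smith", "database"},
}
hits = compact_search(example_index, {"smith", "database"}, max_size=2)
```

Here `p2` alone already matches both keywords, so the larger join of `a1` and `p2` is skipped, while the join of `a1` and `p1` is kept because it contains no smaller result.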
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2016, No. 4, pp. 182-187 (6 pages)
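Funding
National Natural Science Foundation of China (61402426)
Collaborative Innovation Center of Novel Software Technology and Industrialization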
Keywords
Relational database
Keyword search
Offline index
Compactness
Complex query