Abstract
Existing spatial keyword search methods typically use a hybrid index built mainly on the R-tree to locate relevant text from the query position, and at query time perform simple text matching using edit distance or a statistical language model. However, multi-dimensional R-trees suffer from a high overlap ratio among spatial regions, and simple text matching tends to miss semantically related text. To improve spatial query efficiency and text-matching accuracy, an effective hybrid index structure, the Hilbert Retrieving Information Tree (HRI-Tree), is constructed and used for top-k queries. An inverted index of keywords is added to the nodes of a Hilbert R-tree, and the LDA topic model is used to find semantically related text more accurately through topic classification, returning the top-k results that approximately match the query text and are spatially close to the query location. In experiments, the algorithm was compared with current methods in terms of query time, overlap coverage of node MBRs, and text-matching accuracy, and showed superior performance.
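Two ingredients named in the abstract can be illustrated with a minimal sketch: ordering grid cells along a Hilbert curve (the ordering a Hilbert R-tree uses to group nearby objects) and ranking objects by a score that combines spatial proximity with topic similarity. This is not the HRI-Tree itself. The tree structure, the inverted index, and LDA training are omitted. The function names, the weight `alpha`, the linear score, and the assumption that each object already carries a topic vector are all hypothetical choices made for illustration, not taken from the paper.

```python
import heapq
import math


def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert curve position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve keeps a consistent orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def cosine(u, v):
    """Cosine similarity between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(objects, query_xy, query_topics, k, alpha=0.5, max_dist=1.0):
    """Return the k objects with the highest combined score.

    Assumed score (hypothetical, not the paper's formula):
    alpha * spatial closeness + (1 - alpha) * topic similarity.
    """
    def score(obj):
        dist = math.dist(obj["xy"], query_xy)
        spatial = max(0.0, 1.0 - dist / max_dist)
        return alpha * spatial + (1 - alpha) * cosine(obj["topics"], query_topics)

    return heapq.nlargest(k, objects, key=score)


# Hypothetical data: positions in the unit square, two-topic distributions.
objects = [
    {"id": "a", "xy": (0.1, 0.1), "topics": [0.9, 0.1]},
    {"id": "b", "xy": (0.1, 0.2), "topics": [0.1, 0.9]},
    {"id": "c", "xy": (0.9, 0.9), "topics": [0.9, 0.1]},
]
result = top_k(objects, (0.1, 0.1), [1.0, 0.0], k=2, max_dist=math.sqrt(2))
result_ids = [obj["id"] for obj in result]
```

In this toy data, object `b` lies close to the query but has a dissimilar topic vector, so the distant but topically similar `c` outranks it. A real index would prune whole subtrees using node bounds rather than scoring every object.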
Authors
XU Yi-dan (徐艺丹)
HAN Jing-yu (韩京宇)
College of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210000, China; Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing, Jiangsu 210000, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing, Jiangsu 210000, China
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal (北大核心)
2019, Issue 12, pp. 415-420 (6 pages)
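Funding
National Natural Science Foundation of China (61602260)
Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education (K93-9-2015-07C)
Natural Science Foundation of Jiangsu Province, General Program (BK20171447)
Natural Science Research Project of Jiangsu Higher Education Institutions, General Program (17KJB520024)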
Keywords
Spatial keyword
Topic model
Hybrid index structure
Query