Keyword Searches in Data-Centric XML Documents Using Tree Partitioning

Abstract: This paper presents an effective keyword search method for data-centric extensible markup language (XML) documents. The method divides an XML document into compact, connected, integral subtrees, called self-integral trees (SI-Trees), to capture the structural information in the document. The SI-Trees are generated based on a schema guide. Meaningful self-integral trees (MSI-Trees), which contain all or some of the input keywords, are then identified as candidate answers to the keyword query. Indexing is used to accelerate the retrieval of the MSI-Trees related to the input keywords. The MSI-Trees are ranked, and the top-k results with the highest ranks are returned. Extensive tests demonstrate that this method takes 10-100 ms to answer a keyword query and outperforms existing approaches by 1-2 orders of magnitude.
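The pipeline described in the abstract (partition, index, retrieve, rank) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the SI-Tree definition, the schema guide format, or the ranking function. The sketch assumes, purely for illustration, that the schema guide reduces to a set of non-nested element tags (`entity_tags`) marking subtree roots, and that subtrees are ranked by the number of distinct query keywords they contain. All function names are hypothetical.

```python
import heapq
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def partition(root, entity_tags):
    """Split the document into subtrees rooted at elements whose tag is in
    entity_tags (an assumed stand-in for the schema guide)."""
    return [el for el in root.iter() if el.tag in entity_tags]


def tokens(subtree):
    """Lower-cased word tokens from all text inside a subtree."""
    text = " ".join(subtree.itertext())
    return set(re.findall(r"\w+", text.lower()))


def build_index(subtrees):
    """Inverted index: keyword -> ids of the subtrees containing it."""
    index = defaultdict(set)
    for i, subtree in enumerate(subtrees):
        for tok in tokens(subtree):
            index[tok].add(i)
    return index


def search(index, keywords, k):
    """Return up to k (subtree_id, matched_keyword_count) pairs.
    Subtrees matching all or some keywords qualify; the ranking here
    (assumed, not from the paper) is by distinct keywords matched,
    with ties broken by lower subtree id."""
    hits = defaultdict(int)
    for kw in {w.lower() for w in keywords}:
        for i in index.get(kw, ()):
            hits[i] += 1
    return heapq.nsmallest(k, hits.items(), key=lambda p: (-p[1], p[0]))


# Illustrative document; the content is made up for the example.
DOC = """
<bib>
  <paper><title>Keyword search over XML</title><author>Li</author></paper>
  <paper><title>Relational keyword search</title><author>Agrawal</author></paper>
  <paper><title>XML indexing</title><author>Feng</author></paper>
</bib>
"""

root = ET.fromstring(DOC)
subtrees = partition(root, {"paper"})
index = build_index(subtrees)
results = search(index, ["XML", "keyword"], k=2)
print(results)
```

In this toy run the first `paper` subtree matches both keywords and is ranked first, while subtrees matching only one keyword still qualify as partial answers, mirroring the "all or some of the input keywords" criterion. Because the index maps keywords directly to subtree ids, a query touches only the posting sets of its keywords rather than the whole document.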

Authors: Li Guoliang (李国良), Feng Jianhua (冯建华), Zhou Lizhu (周立柱)

Affiliation: Department of Computer Science and Technology

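Source: Tsinghua Science and Technology (indexed in SCIE, EI, CAS), 2009, No. 1, pp. 7-18 (12 pages). Chinese title: Journal of Tsinghua University (Natural Science Edition, English Edition).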

Funding: Partly supported by the National High-Tech Research and Development (863) Program of China (No. 2007AA01Z152), the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList), and the 2008 HP Labs Innovation Research Program.

Keywords: keyword searches; extensible markup language (XML); self-integral trees; ranking; indexing

Classification: TP312.2 [Automation and Computer Technology: Computer Software and Theory]
