
Hybrid Schema Summarization Method of Large Scale Database (cited by: 4)
Abstract: As databases grow, their schemas become more complex, and complex schemas combined with missing documentation make databases harder to understand and use. Most existing schema summarization methods use the primary-key and foreign-key relationships between tables to find the most important tables in a schema, then use those tables to form a single-level schema summary. In real-world applications, such summaries often fail to capture the subjects of the database. This paper describes the limitations of the existing methods and proposes a method for generating multi-level schema summaries for large-scale databases. The method first applies several different types of community detection algorithms to partition the database schema into communities. It then uses meta-clustering to integrate these communities into subject groups, each representing one subject of the database. Finally, it clusters the subject groups further into abstract domains and selects a label for each domain, forming a multi-level schema summary that also serves as a navigation structure. The approach is evaluated on Freebase, an open, real-world large-scale database. The results show that the approach identifies the subjects of a large-scale database precisely, and that the generated multi-level summaries are easy to understand and help users browse and search the database.
Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2013, No. 8, pp. 1616-1625 (10 pages)
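Funding: Supported by the Ministry of Education's Program for New Century Excellent Talents in University, the National Natural Science Foundation of China (61272138), and the Research Funds of Renmin University of China (12XNLJ01).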
Keywords: schema summarization; large-scale database; subject group; hybrid
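
The integration step described in the abstract (combining the communities found by several detection algorithms into subject groups) can be illustrated with a small sketch. This is not the paper's algorithm: it swaps in a simple co-association consensus rule, in which two tables land in the same subject group when more than a chosen fraction of the partitions place them in the same community. The function name `consensus_groups`, the `threshold` parameter, and the table names are all hypothetical, and the partitions are assumed to have been produced already by separate community detection runs.

```python
from itertools import combinations


def consensus_groups(partitions, threshold=0.5):
    """Group tables that share a community in more than `threshold` of the partitions.

    `partitions` is a list of partitions; each partition is a list of sets of
    table names. This is a generic co-association rule used for illustration,
    not the meta-clustering procedure from the paper.
    """
    tables = sorted({t for p in partitions for community in p for t in community})

    # Count, for each pair of tables, how many partitions place them together.
    together = {}
    for partition in partitions:
        for community in partition:
            for a, b in combinations(sorted(community), 2):
                together[(a, b)] = together.get((a, b), 0) + 1

    # Union-find over tables: link every pair that co-occurs often enough.
    parent = {t: t for t in tables}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), count in together.items():
        if count / len(partitions) > threshold:
            parent[find(a)] = find(b)

    # Collect the tables under each root into one subject group.
    groups = {}
    for t in tables:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


# Hypothetical output of three community detection runs on a toy schema.
partitions = [
    [{"film", "actor", "director"}, {"album", "artist", "track"}],
    [{"film", "actor"}, {"director", "album", "artist", "track"}],
    [{"film", "actor", "director"}, {"album", "artist"}, {"track"}],
]
print(consensus_groups(partitions))
```

In this toy input the three runs disagree about where `director` and `track` belong, but the majority vote still recovers one film-related group and one music-related group. A further clustering pass over the resulting groups, as the abstract describes, would then produce the higher-level abstract domains.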

