
Functional Dependencies Discovering in Distributed Big Data (分布式大数据函数依赖发现)

Cited by: 9
Abstract: Discovering functional dependencies (FDs) in relational databases is an important database analysis technique with a wide range of applications in knowledge discovery, database semantics analysis, data quality assessment, and database design. Existing FD discovery algorithms mainly target centralized data and are usually suitable only for small datasets. Discovering FDs in a distributed environment is far more challenging, especially with big data. This paper proposes an FD discovery algorithm for big data in distributed environments. First, FD discovery is run in parallel on each node over its local data, and the candidate set of FDs is pruned based on these local results. Next, the candidate FDs are grouped according to the features of their left-hand side (LHS), and a distributed discovery algorithm is executed in parallel for each group of candidates, which eventually yields all FDs. The number of candidate FDs that can be checked under different groupings is analyzed, and both data shipment and load balancing are taken into account during execution. Experiments on real-world big datasets show that the proposed approach is noticeably more efficient than existing discovery methods.
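The three steps in the abstract (local discovery, pruning, grouped global validation) can be illustrated with a minimal single-process sketch. It is not the authors' algorithm. It assumes that each node is a list of row dictionaries, that candidates are grouped simply by identical LHS (the paper's grouping criterion is not spelled out in the abstract), and that the distributed validation step can be simulated by concatenating all nodes. Data shipment and load balancing are not modeled, and the names `holds`, `local_fds`, and `discover` are hypothetical.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # setdefault stores the first RHS value seen for this LHS value;
        # any later row with a different RHS value violates the FD.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def local_fds(rows, attrs, max_lhs=2):
    """Brute-force all FDs with |LHS| <= max_lhs that hold on one node."""
    found = set()
    for k in range(1, max_lhs + 1):
        for lhs in combinations(attrs, k):
            for rhs in attrs:
                if rhs not in lhs and holds(rows, lhs, rhs):
                    found.add((lhs, rhs))
    return found


def discover(nodes, attrs, max_lhs=2):
    """Sketch of: local discovery -> pruning -> grouped global validation."""
    # Step 1 and pruning: an FD violated on any single node cannot hold
    # globally, so only FDs that hold on every node remain candidates.
    candidates = set.intersection(
        *(local_fds(rows, attrs, max_lhs) for rows in nodes)
    )
    # Step 2 (assumption): group candidates that share the same LHS, so a
    # real system could repartition the data once per group.
    groups = {}
    for lhs, rhs in candidates:
        groups.setdefault(lhs, []).append(rhs)
    # Step 3 (simulated): validate each group against the data of all nodes.
    all_rows = [row for rows in nodes for row in rows]
    return {
        (lhs, rhs)
        for lhs, rhss in groups.items()
        for rhs in rhss
        if holds(all_rows, lhs, rhs)
    }


node1 = [{"A": 1, "B": 1, "C": 1}, {"A": 2, "B": 2, "C": 1}]
node2 = [{"A": 1, "B": 2, "C": 1}, {"A": 3, "B": 3, "C": 2}]
print(sorted(discover([node1, node2], ["A", "B", "C"], max_lhs=1)))
# prints [(('A',), 'C'), (('B',), 'C')]
```

In this toy input, A -> B holds on each node separately but fails once the nodes are combined, which is why local results can only prune candidates and a global validation pass is still needed.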
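Source: Journal of Computer Research and Development (《计算机研究与发展》); indexed in EI, CSCD, and the Peking University Core Journals list; 2015, No. 2, pp. 282-294 (13 pages)
Funding: National "973" Key Basic Research Development Program of China (2012CB316203); National Natural Science Foundation of China (61472321, 61033007); National "863" High Technology Research and Development Program of China (2012AA011004); Northwestern Polytechnical University Basic Research Fund (3102014JSJ0005, 3102014JSJ0013)
Keywords: discovering functional dependencies; functional dependencies; big data; knowledge discovery; parallel computing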