期刊文献+

基于Twitter Storm平台并行挖掘最稠密子图 被引量:1

Parallel Mining of Densest Subgraph Based on Twitter Storm
下载PDF
导出
摘要 在大规模图结构数据中发现最稠密子图具有极其广泛的应用,如社区发现、垃圾邮件检测和论文引用关系抽取等。基于带标签的无向图,提出了查询标签集的概念,设计了一个可以快速发现最稠密子图的近似算法DSFLC(Densest Subgraph Finding based on Labelset Constraint):用户提交自定义的查询标签集,算法便可保证在用户可以接受的时间内返回满足查询标签集约束的最稠密子图。对于任何参数ε(ε>0),DSFLC算法只需扫描大规模数据集O(log1+εn)次,同时可保证算法的近似因子是2(1+ε)。对DSFLC算法进行分析后,发现该算法在预处理阶段易于并行化,因此选择Twitter Storm平台,并行化地实现了DSFLC算法。最后对从DBLP数据库中抽取的合作关系图进行测试,一方面研究Storm平台对算法的加速程度;另一方面分析挖掘出的子图的稠密度与参数ε之间的关系,最终验证了DSFLC算法的实用性和可扩展性。 In large scale graph, finding densest subgraph has a wide range of applications, such as community discovery, sparn detection and reference relation extraction. Based on tagged undirected graph, we introduced the concept of QLS and designed an approximation algorithm DSFLC which can quickly find the densest subgraph: users submit a QLS and the algorithm will return the densest subgraph under QLS within the time that user can accept. For any ε〉0, DSFLC only needs to scan large-scale data sets O(logl+en)times, and can ensure the approximation algorithm is a 2 (1+ε)- approximation algorithm. After analyzing DSFLC, we found this algorithm is easy to parallelize, so we chose Twitter Storm platform to parallel DSFLC algorithm. Finally, the test data sets extracted from the DBLP database verify DS- FLC' practicality and scalability.
出处 《计算机科学》 CSCD 北大核心 2014年第1期274-278,共5页 Computer Science
关键词 最稠密子图发现 查询标签集 DSFLC算法 TWITTER Storm平台 Densest subgraph finding, QLS,DSFLC algorithm,Twitter storm platform
  • 相关文献

参考文献15

  • 1Yon D, Filippo G, Marco P. Extraction and classification of dense communities in the Web[C]//WWW 2007. 2007:461-470.
  • 2Newman M E J. Modularity and community structure in net- works[C]//PNAS 2006. 2006: 8577-8582.
  • 3William F G, Steve L, Lee G C. Efficient identification of Webcommunities[C]//KDD 2000. 2000:150-160.
  • 4https: / /github. eom/nathanmarz/storm.
  • 5http://www, dblp. org/db/.
  • 6Gotdberg A V. Finding a maximum density Subgraph[R]. UCB/ CSD-84-171. EECS Department, University of California, Berke- ley,1984.
  • 7Charikar M. Greedy approximation algorithms for finding dense components in a graph[C] // APPROX 2000. 2000:84-90.
  • 8LaMer F. Combinatorial Optimization : Networks and Matroids [M]. Holt, Rinehart, and Winston, 1976.
  • 9Gibson D, Kumar R, Tomkins A. Discovering large dense sub- graphs in massive graphs[C]//VLDB 2005. 2005:721-732.
  • 10Bahmani B, umar R. Sergei Vassilvitskii: Densest Subgraph in Streaming and MapReduce[C]//VLDB 2012. 2012:454-465.

同被引文献13

  • 1钱肖鲁,朱建秋,朱扬勇.DMVisualMiner:一个可视化数据挖掘分析平台[J].计算机工程,2003,29(z1):148-150. 被引量:5
  • 2Mayer-SehnbergerV.大数据时代:生活,工作与思维的大变革[M].盛杨燕,周涛,译.杭州:浙江人民出版社,2012.
  • 3中国人民银行.2014年第四季度支付体系运行总体情况lEB/OL].12015-02-13].http://www.pcac.org.cn/file/File/1424119020.pdf.
  • 4Yunus M. Building social business: The new kind of capitalism that serves humanity' s most pressing needs l M 1 ~ Philadelphia : Public Affairs, 2011:2 - 17.
  • 5Leung L. Generational differences in content generation in social media: The roles of the gratifications sought and of narcissism l J ]. Computers in Human Behavior, 2013,29(3) :997 - 1006.
  • 6Jonathan L, Gabriel E, Dario S. Getting started with Storml M ]. Athens : O' Reily Media, Inc ,2012.
  • 7Borthakur D, Sarma J S, Gray J. et al. Apache hadoop goes real- time at Facebook l C ]//Proceedings of the ACM SIGMOD Interna- tional Conference on Management of Data (SIGMOD 2011 and PODS 2011 ). Athens: ACM Press, 2011:1071 - 1080.
  • 8张玉峰,何超.基于Web评论挖掘的动态竞争情报分析研究(上)——问题分析与模型构建[J].情报理论与实践,2012,35(6):63-66. 被引量:10
  • 9孙大为,张广艳,郑纬民.大数据流式计算:关键技术及系统实例[J].软件学报,2014,25(4):839-862. 被引量:311
  • 10李德仁,姚远,邵振峰.智慧城市中的大数据[J].武汉大学学报(信息科学版),2014,39(6):631-640. 被引量:407

引证文献1

二级引证文献4

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部