Parallel Mining of Densest Subgraph Based on Twitter Storm

Cited by: 1

Abstract: Finding the densest subgraph in large-scale graph data has a wide range of applications, such as community discovery, spam detection, and extraction of citation relationships between papers. Based on labeled undirected graphs, we introduce the concept of a query labelset (QLS) and design DSFLC (Densest Subgraph Finding based on Labelset Constraint), an approximation algorithm that quickly finds the densest subgraph: a user submits a custom QLS, and the algorithm returns, within a time acceptable to the user, the densest subgraph satisfying the QLS constraint. For any parameter $\varepsilon > 0$, DSFLC needs only $O(\log_{1+\varepsilon} n)$ passes over the large-scale data set ($n$ being the number of vertices) and guarantees an approximation factor of $2(1+\varepsilon)$. Our analysis of DSFLC shows that its preprocessing phase is easy to parallelize, so we implemented DSFLC in parallel on the Twitter Storm platform. Finally, experiments on a co-authorship graph extracted from the DBLP database study both the speedup that Storm gives the algorithm and the relationship between the density of the mined subgraphs and the parameter $\varepsilon$, verifying the practicality and scalability of DSFLC.
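The abstract states DSFLC's guarantees but not its procedure. The $2(1+\varepsilon)$ factor and the $O(\log_{1+\varepsilon} n)$ pass bound match the peeling scheme of Bahmani, Kumar and Vassilvitskii (VLDB 2012), which this paper lists among its references. The sketch below assumes DSFLC follows that scheme: a preprocessing step restricts the graph to the query labelset, then each pass deletes every vertex whose induced degree is at most $2(1+\varepsilon)\rho(S)$, where $\rho(S) = |E(S)|/|S|$, and the densest intermediate set is returned. Because the average degree in $S$ is $2\rho(S)$, fewer than $|S|/(1+\varepsilon)$ vertices can exceed the threshold, so each pass shrinks $S$ by a factor greater than $1+\varepsilon$, which yields the pass bound. The labelset rule used here (keep a vertex if it shares at least one label with the query) and all function names are hypothetical, since the abstract does not define the QLS semantics; the code also runs on a single machine and does not model a Storm topology.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets; self-loops and duplicate edges are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def apply_labelset(adj, labels, query):
    """Hypothetical QLS filter: keep vertices sharing at least one label with `query`."""
    keep = {v for v in adj if labels.get(v, set()) & query}
    return {v: adj[v] & keep for v in keep}


def density(adj, nodes):
    """rho(S) = |E(S)| / |S| for the subgraph induced by `nodes`."""
    if not nodes:
        return 0.0
    edge_count = sum(len(adj[v] & nodes) for v in nodes) // 2
    return edge_count / len(nodes)


def peel(adj, eps):
    """Remove, in each pass, every vertex whose induced degree is <= 2(1+eps)*rho(S).

    Returns the densest vertex set seen, its density, and the number of passes.
    """
    nodes = set(adj)
    best, best_rho = set(nodes), density(adj, nodes)
    passes = 0
    while nodes:
        threshold = 2 * (1 + eps) * density(adj, nodes)
        # At least one vertex is always at or below the threshold, so the loop terminates.
        survivors = {v for v in nodes if len(adj[v] & nodes) > threshold}
        nodes = survivors
        passes += 1
        rho = density(adj, nodes)
        if rho > best_rho:
            best, best_rho = set(nodes), rho
    return best, best_rho, passes


if __name__ == "__main__":
    # A 4-clique {1, 2, 3, 4} with a tail 4-5-6.
    edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
    print(peel(build_adjacency(edges), 0.1))
```

On this example the first pass (threshold about 2.93) removes the tail vertices 5 and 6, and the second pass (threshold 3.3) empties the clique, so `peel` returns the 4-clique with density 1.5 after two passes. Keeping the best intermediate set, rather than the last non-empty one, is what makes the approximation guarantee hold.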

Authors: Wang Jinming, Wang Yuanfang

Affiliation: School of Computer Science and Engineering, Southeast University

Source: Computer Science (《计算机科学》), 2014, No. 1, pp. 274-278 (5 pages). Indexed in: CSCD, Peking University Core Journals

Keywords: densest subgraph finding; query labelset (QLS); DSFLC algorithm; Twitter Storm platform

CLC number: TP392 [Automation and Computer Technology: Computer Application Technology]

References (15)

[1] Dourisboure Y, Geraci F, Pellegrini M. Extraction and classification of dense communities in the Web[C]//WWW 2007. 2007: 461-470.
[2] Newman M E J. Modularity and community structure in networks[J]. PNAS, 2006: 8577-8582.
[3] Flake G W, Lawrence S, Giles C L. Efficient identification of Web communities[C]//KDD 2000. 2000: 150-160.
[4] https://github.com/nathanmarz/storm.
[5] http://www.dblp.org/db/.
[6] Goldberg A V. Finding a maximum density subgraph[R]. UCB/CSD-84-171. EECS Department, University of California, Berkeley, 1984.
[7] Charikar M. Greedy approximation algorithms for finding dense components in a graph[C]//APPROX 2000. 2000: 84-90.
[8] Lawler E L. Combinatorial Optimization: Networks and Matroids[M]. Holt, Rinehart and Winston, 1976.
[9] Gibson D, Kumar R, Tomkins A. Discovering large dense subgraphs in massive graphs[C]//VLDB 2005. 2005: 721-732.
[10] Bahmani B, Kumar R, Vassilvitskii S. Densest subgraph in streaming and MapReduce[C]//VLDB 2012. 2012: 454-465.

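Citing literature (1)

[1] Huang Xiaobin, Zhang Xingwang. Dynamic data mining modes for networks and their key technologies[J]. Library and Information Service, 2015, 59(10): 21-28. (Cited by: 4)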
