摘要
在大规模图结构数据中发现最稠密子图具有极其广泛的应用,如社区发现、垃圾邮件检测和论文引用关系抽取等。基于带标签的无向图,提出了查询标签集的概念,设计了一个可以快速发现最稠密子图的近似算法DSFLC(Densest Subgraph Finding based on Labelset Constraint):用户提交自定义的查询标签集,算法便可保证在用户可以接受的时间内返回满足查询标签集约束的最稠密子图。对于任何参数ε(ε>0),DSFLC算法只需扫描大规模数据集O(log1+εn)次,同时可保证算法的近似因子是2(1+ε)。对DSFLC算法进行分析后,发现该算法在预处理阶段易于并行化,因此选择Twitter Storm平台,并行化地实现了DSFLC算法。最后对从DBLP数据库中抽取的合作关系图进行测试,一方面研究Storm平台对算法的加速程度;另一方面分析挖掘出的子图的稠密度与参数ε之间的关系,最终验证了DSFLC算法的实用性和可扩展性。
In large scale graph, finding densest subgraph has a wide range of applications, such as community discovery, sparn detection and reference relation extraction. Based on tagged undirected graph, we introduced the concept of QLS and designed an approximation algorithm DSFLC which can quickly find the densest subgraph: users submit a QLS and the algorithm will return the densest subgraph under QLS within the time that user can accept. For any ε〉0, DSFLC only needs to scan large-scale data sets O(logl+en)times, and can ensure the approximation algorithm is a 2 (1+ε)- approximation algorithm. After analyzing DSFLC, we found this algorithm is easy to parallelize, so we chose Twitter Storm platform to parallel DSFLC algorithm. Finally, the test data sets extracted from the DBLP database verify DS- FLC' practicality and scalability.
出处
《计算机科学》
CSCD
北大核心
2014年第1期274-278,共5页
Computer Science