期刊文献+

基于FLINK的滑动窗口内三角形计数算法研究 被引量:2

Study of Triangle Counting Algorithm with Sliding Windows Based on FLINK
下载PDF
导出
摘要 三角形计数旨在计算图中全局三角形和局部三角形的数量,是图数据挖掘中的一类重要工作。三角形的数量被广泛应用于角色识别、推荐系统、社区发现、垃圾邮件和欺诈检测等领域。在以流形式给出的图中,边具有时间性,同时现实生活中的图存在着大量的重复边。为充分利用图中的时间信息以挖掘网络知识,研究在多图流上计算滑动窗口内全局和局部三角形数量的问题,使用窗口机制同时研究多个窗口以利用隐含的时间关系获取更多信息。文中提出基于FLINK窗口操作的三角形计数算法和基于滑动窗口的三角形增量计数算法,以现有的边采样工作为基础,使用边集存储窗口历史数据实现一遍流计算,从而准确地计算面向多图流的滑动窗口内全局和局部三角形数量。基于FLINK窗口操作的三角形计数算法使用FLINK提供的窗口机制,基于滑动窗口的三角形增量计数算法,通过计算窗口滑入和滑出数据来实现窗口计数,避免了相邻两个窗口间重合边的大量重复计算,无缝地处理多个时间窗口,对于滑入和滑出数据中的重复数据,使用去重机制来进一步减小计算量。理论证明两种算法可以实现滑动窗口内三角形准确计数,并通过实验分析了窗口大小、滑动距离、数据分布和数据流速等因素对窗口处理时间的影响。与TRIEST算法相比,当窗口较小时,基于FLINK窗口操作的三角形计数算法和基于滑动窗口的三角形增量计数算法速度更快;当窗口较大时,保证了计算结果的准确性。 Triangle Counting,calculating global and local triangle counts,is an important work in data mining,whose number is widely used in important role identification,recommendation systems,community discovery,spam and fraud detection etc.In the graphs presented as a stream of edges,edges are temporal,and there are a large of duplicate edges in real-world graphs.For full use of the time information in the graph and mining the network knowledge,this work studied the problem of estimating global and local triangle counts on a multigraph stream with sliding windows,that simultaneously studied multiple windows by window mechanism to obtain more information in implicit time relationships.A triangle counting algorithm based on FLINK window ope-ration and a triangle increment counting algorithm based on sliding window are proposed.Like the existing edge sampling work,the edge set is used to store window history data for accurately calculating global and local triangle counts on a multigraph stream with sliding windows in one-pass.The triangle counting algorithm based on FLINK window operator uses the window mechanism provided by FLINK.While the triangle increment counting algorithm based on sliding window realizes window counting by calculating slide-in and slide-out data through the window,reducing a large number of repeated calculations of coincident edges between adjacent windows,seamlessly processing multiple time windows,and for duplicate data in slide-in and slide-out,uses a deduplication mechanism to further reduce the calculation.The theory has been proven that both the algorithms can accurately count the triangles in the sliding window,and the effects of window size,sliding distance,data distribution and data flow rate on window processing time were analyzed through experiments.Compared with the TRIEST algorithm,when the window is small,the triangle counting algorithm based on FLINK window operation and the triangle increment counting algorithm based on sliding window have faster speed.When the window is large,the accuracy of the calculation result is guaranteed.
作者 王旭 杨晓春 WANG Xu;YANG Xiao-chun(School of Computer Science and Technology,Beijing Institute of Technology,Beijing 100081,China;School of Computer and Engineering,Northeastern University,Shenyang 110819,China)
出处 《计算机科学》 CSCD 北大核心 2020年第10期83-90,共8页 Computer Science
基金 国家自然科学基金(61572122,61532021)。
关键词 三角形计数 滑动窗口 FLINK 图流挖掘 准确流算法 Triangle counting Sliding window FLINK Graph stream mining Accurate stream algorithm
  • 相关文献

同被引文献11

引证文献2

二级引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部