摘要
针对大规模网络图中存在的大量4节点子图数量难以精确统计分析的问题,提出了一种大规模网络图中4节点子图数量快速估计算法(SmartMoss)。该算法通过随机变量方差分析技术对比3路径采样算法(3PS)和中心3路径采样算法(C3PS)两种前沿算法的估计误差得出其各自不同的适用范围,进而通过计算被测网络图权重密度分布与误差实时选择使用3PS算法或C3PS算法对网络图中4节点子图进行快速采样,通过采样比例混合3PS算法与C3PS算法的估计结果实现对网络图中各4节点子图出现数量的快速估计。实验结果表明,在同等估计误差下提出的SmartMoss算法比已有3PS算法和C3PS算法快10倍以上。SmartMoss算法可以实现对大规模网络图中4节点子图数量进行快速准确的估计,同时为网络社团演化和恶意代码检测等实际应用提供一定的理论参考。
A fast method(SmartMoss)for efficiently estimating the frequencies of 4-node graph-lets in large-scale networks is proposed to address the challenge of exactly counting the frequencies of 4-node graphlets occurring in large-scale networks.SmartMoss compares the estimation errors of the two state-of-the-art algorithms,3PS(3-Path Sampling)and C3PS(Center 3-Path Sampling),through random variables variance analysis to obtain their different application ranges,and then selects either 3PS algorithm or C3PS algorithm for 4-node graphlets sampling by calculating the distribution of network weight densities and real-time errors,to achieve quick estimation of the frequencies of 4-node graphlets by combing the estimation results of 3PS and C3PS based on the sampling proportions.Experimental results show that SmartMoss is more than 10 times faster than the state-of-the-art algorithms 3PS and C3PS under the same estimation error.SmartMoss efficiently and accurately estimates the frequencies of 4-node graph-lets for large network graphs,and provides theoretical reference for practical applications such as network community evolution and malicious code detection.
作者
覃遵颖
孙雨
李国栋
齐怀睿
陶敬
QIN Zunying;SUN Yu;LI Guodong;QI Huairui;TAO Jing(Ministry of Education Key Lab for Intelligent Networks and Network Security,Xi’an Jiaotong University, Xi’an 710049,China;Center of Network,Xi’an Jiaotong University,Xi’an 710049,China;School of Electronic and Information Engineering,Xi’an Jiaotong University,Xi’an 710049,China)
出处
《西安交通大学学报》
EI
CAS
CSCD
北大核心
2018年第12期57-62,共6页
Journal of Xi'an Jiaotong University
基金
深圳市基础研究计划资助项目(JCYJ20160229195940462)
陕西省自然科学基础研究计划资助项目(2016JQ6034
2017JM6095)
浙江省自然科学基金资助项目(LGG18F020016)
关键词
网络图
采样算法
子图数量估计
network graph
sampling algorithm
graphlet counting