期刊文献+

基于两种子结构感知的社交网络Graphlets采样估计算法

Estimating Graphlets via Two Common Substructures Aware Sampling in Social Networks
下载PDF
导出
摘要 graphlets是指大规模网络中节点数目较少的连通诱导子图,在社交网络和生物信息学领域有着广泛的应用。由于精确计数的计算成本较高,目前大多采用随机游走采样算法来近似估计graphlets的频率。随着节点数目的增多,graphlets的种类数增长迅速且结构变化复杂,快速估计大规模网络中所有种类的graphlets的频率是一项挑战。文中提出了基于两种子结构的随机游走采样算法CSRW2来估计graphlets频率,即给定graphlets节点数k(k=4,5),通过采样k-graphlets的子结构(k-1)-path和3-star得到两种样本,之后用比例放大法综合,以高效估计graphlets并适应graphlets结构的复杂变化。实验结果表明,CSRW2能以统一的框架估计所有k-graphlets类型的频率,其估计精度优于现有代表性算法,更适用于频率较低且结构较稠密的graphlets。例如,用CSRW2估计真实网络sofb-Penn94中的5-graphlets,当样本数为2万时,标准均方根误差的平均值由WRW算法的0.8降低至CSRW2算法的0.22左右。 Graphlets refer to the connected induced subgraphs with small amount of nodes in large-scale network,and have extensive applications in social networks and bioinformatics.Due to extremely high computational costs of exactly counting graphlets,approximately estimating graphlets concentrations via random walk sampling algorithms already becomes the mainstream approach.As node size k increases,the number of k -graphlets increases rapidly and their structures change dramatically,so it is a challenge to quickly estimate the relative frequency of all types of graphlets (graphlet concentrations) in a large-scale network.Aiming at this problem,this paper proposed a novel sampling algorithm,namely common substructures path and 3-star based graphlets sampling via random walk (CSRW2),to efficiently estimate graphlets concentrations.Given k ( k =4,5),apart from sampling path via random walk,CSRW2 also samples ano- ther substructure 3-star,and then derives graphlets concentrations by proportional amplification to find the dense graphlets with less appearance more efficiently and adapt to the complex structural changes.Experimental evaluations on real networks demonstrate that CSRW2 can estimate k -graphlets in a uniform framework.CSRW2 outperforms the representative methods in terms of accuracy and is more accurate for the k -graphlets with more edges and less appearances in graphs.For example,when 5-graphlets in sofb-Penn94 is estimated,the average NRMSE of all 5-graphlets is decreased to 0.22 via CSRW2 in contrast to 0.8 obtained by WRW.
作者 赵倩倩 吕敏 许胤龙 ZHAO Qian-qian;LV Min;XU Yin-long(Anhui Key Laboratory on High Performance Computing,School of Computer Science and Technology,University of Science and Technology of China,Hefei 230026,China)
出处 《计算机科学》 CSCD 北大核心 2019年第3期314-320,共7页 Computer Science
基金 国家自然科学基金面上项目(61672486)资助
关键词 社交网络 Graphlet Graphlet频率 随机游走 采样算法 无偏估计 Social network Graphlet Graphlet concentration Random walk Sampling algorithm Unbiased estimation
  • 相关文献

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部