Abstract
In the era of big data, processing frameworks such as MapReduce have limited capacity and are often slow and inefficient on graph data, with 3-clique counting as a typical example, so an efficient algorithm for this class of clique counting problems is needed. Because 3-clique counting has already been studied in depth in the literature, this paper investigates its extension, the 4-clique counting problem. Guided by a heuristic idea, it proposes a probabilistic sampling algorithm based on neighboring edge sampling and uses the Chernoff inequality to prove that, in the approximate setting, the algorithm needs only a certain number of samplers to guarantee the relative error. Experimental comparison shows that, relative to traditional exact algorithms, the sampling algorithm loses a small amount of accuracy but has a large advantage in running time and space usage. The paper concludes that the algorithm has considerable practical value in real applications.
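To make the exact-versus-sampling trade-off concrete, here is a minimal Python sketch. It is not the paper's neighboring edge sampling algorithm, whose details the abstract does not give. It assumes a simpler uniform edge sampling estimator: each sampler picks one edge uniformly at random and counts the 4-cliques through it, and since every 4-clique contains 6 edges, the scaled average is an unbiased estimate. All function names are hypothetical, and vertices are assumed to be sortable values such as integers.

```python
import random
from itertools import combinations


def normalize_edges(edges):
    """Deduplicate undirected edges and drop self-loops (assumes sortable vertices)."""
    return sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})


def build_adjacency(edges):
    """Return {vertex: set(neighbors)} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cliques_through_edge(adj, u, v):
    """Count 4-cliques containing edge (u, v): adjacent pairs among common neighbors."""
    common = adj[u] & adj[v]
    return sum(1 for a, b in combinations(common, 2) if b in adj[a])


def exact_4clique_count(edges):
    """Exact baseline: sum over all edges, then divide by the 6 edges per 4-clique."""
    edges = normalize_edges(edges)
    adj = build_adjacency(edges)
    return sum(cliques_through_edge(adj, u, v) for u, v in edges) // 6


def estimate_4clique_count(edges, num_samplers, seed=0):
    """Illustrative estimator (an assumption, not the paper's method).

    Each sampler draws one edge uniformly at random; the mean per-edge count,
    scaled by m / 6, is an unbiased estimate of the number of 4-cliques.
    """
    edges = normalize_edges(edges)
    adj = build_adjacency(edges)
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samplers):
        u, v = rng.choice(edges)
        total += cliques_through_edge(adj, u, v)
    return len(edges) * total / (6 * num_samplers)


if __name__ == "__main__":
    k5 = list(combinations(range(5), 2))  # complete graph on 5 vertices
    print(exact_4clique_count(k5))
    print(estimate_4clique_count(k5, num_samplers=20))
```

On a complete graph every edge lies in the same number of 4-cliques, so the estimate matches the exact count for any sample. On irregular graphs the estimate varies with the sampled edges, and more samplers are needed for a tight relative error.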
Authors
姜丽丽
李叶飞
豆龙龙
陈智麒
钱柱中
Jiang Lili; Li Yefei; Dou Longlong; Chen Zhiqi; Qian Zhuzhong (Jiangsu Frontier Electric Technology Co. Ltd., Nanjing 210000, China; Dept. of Computer Science & Technology, Nanjing University, Nanjing 210023, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2020, No. 12, pp. 3545-3551 (7 pages)
Application Research of Computers
Funding
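National Natural Science Foundation of China, General Program (61872175)
Natural Science Foundation of Jiangsu Province, General Program (BK20181252).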