Abstract
With the rapid development of mobile communication technology and the Internet, mobile communication devices have become tools that most people carry with them, and the data generated when these devices communicate with one another form a communication network. This paper proposes PMFCS, a parallel algorithm for mining frequent communication sub-graphs from massive communication data. Building on the idea of frequent itemset mining (the Apriori algorithm) and sub-graph join rules, the algorithm uses the parallel computing framework Spark to distribute all graphs, edge by edge, across the computing nodes. Each node counts the 1st-order candidate frequent sub-graphs, and the 1st-order frequent sub-graphs are obtained by aggregating these candidates. PMFCS then iteratively joins (k-1)th-order sub-graphs with 1st-order sub-graphs to generate kth-order candidate sub-graphs and computes their frequency, terminating when the set of kth-order frequent sub-graphs is empty. Experimental results show that the algorithm solves the problem of mining frequent communication relationships quickly and effectively.
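The level-wise procedure described in the abstract can be illustrated with a small single-machine sketch. This is not the authors' Spark implementation. It assumes that every node carries a unique identifier (such as a phone number), so that checking whether a sub-graph occurs in a graph reduces to an edge-subset test, and that support is the number of input graphs containing the sub-graph. The function name `mine_frequent_subgraphs` and the parameter `min_support` are hypothetical names chosen for illustration.

```python
from collections import Counter


def mine_frequent_subgraphs(graphs, min_support):
    """Level-wise mining of frequent connected edge sets (illustrative only).

    graphs: iterable of edge collections; each edge is a pair of node ids.
    Assumption: node ids are unique, so containment is a plain subset test.
    Returns a dict mapping frozenset-of-edges -> support count.
    """
    # Normalise undirected edges so that (a, b) and (b, a) compare equal.
    db = [frozenset(tuple(sorted(e)) for e in g) for g in graphs]

    # Order 1: count every edge once per graph, keep the frequent ones.
    edge_counts = Counter(e for g in db for e in g)
    frequent_edges = {e for e, c in edge_counts.items() if c >= min_support}

    level = {frozenset([e]) for e in frequent_edges}
    result = {frozenset([e]): edge_counts[e] for e in frequent_edges}

    # Order k: join each (k-1)-order sub-graph with an adjacent frequent edge.
    while level:
        candidates = set()
        for sub in level:
            vertices = {v for e in sub for v in e}
            for e in frequent_edges:
                if e not in sub and (e[0] in vertices or e[1] in vertices):
                    candidates.add(sub | {e})

        # Count support of each candidate; an empty level ends the loop.
        level = set()
        for cand in candidates:
            support = sum(1 for g in db if cand <= g)
            if support >= min_support:
                result[cand] = support
                level.add(cand)
    return result


if __name__ == "__main__":
    snapshots = [
        {("a", "b"), ("b", "c")},
        {("b", "a"), ("b", "c"), ("c", "d")},
        {("a", "b"), ("c", "d")},
    ]
    for sub, support in mine_frequent_subgraphs(snapshots, 2).items():
        print(sorted(sub), support)
```

In this sketch the two counting steps run sequentially. In a Spark setting they would correspond to the per-node counting and aggregation that the abstract describes, but how PMFCS partitions that work is not specified here.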
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2018, No. 2, pp. 103-108 (6 pages)
Computer Science
Funding
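Supported by the National Natural Science Foundation of China project "Research on Spatio-temporal Trajectory Pattern Mining Methods for Mobile Phone Users Considering User Relationships in a Cloud Computing Environment" (41471371)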
Keywords
Communication network
Frequent sub-graph
Frequent communication relationship