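Abstract: To address the low partition quality and poor execution efficiency of existing network community mining algorithms, a log-clustering-based e-mail network community mining algorithm, LENCM (the log clustering based e-mail network community mining algorithm), is proposed. The algorithm identifies core nodes from the density variation of the log-clustering nodes, builds log connected subgraphs, and from them determines the initial community centers and the number of communities for partitioning the e-mail network. It constructs operators by fault injection, compares the logs produced after execution with association rules, and uses a dynamic community-center adjustment method to assign non-core nodes to the communities they belong to. Experiments show that the algorithm achieves relatively high partition quality and relatively fast execution, and that it is effective and feasible to a certain degree.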
Funding: Project (61103046) supported in part by the National Natural Science Foundation of China; Project (B201312) supported by the DHU Distinguished Young Professor Program, China; Project (LY14F020007) supported by the Zhejiang Provincial Natural Science Funds of China; Project (2014A610072) supported by the Natural Science Foundation of Ningbo City, China.
Abstract: DNS (domain name system) query log analysis has been a popular research topic in recent years. CLOPE, a representative transactional clustering algorithm, can be readily used for DNS query log mining. However, the algorithm is inefficient when processing large-scale data. The MR-CLOPE algorithm is proposed, which extends and improves CLOPE on the basis of MapReduce. Unlike previous parallel clustering methods, it uses a two-stage MapReduce implementation framework in which each stage is implemented by one kind of MapReduce task. In the first stage, the DNS query logs are divided into multiple splits and the CLOPE algorithm is executed on each split. The second stage usually iterates many times to merge the small clusters into larger, satisfactory ones. Across these two stages, a novel partition process is designed to randomly spread out the original sub-clusters, which are then moved and merged in the map phase of the second stage according to the defined merge criteria. In this way, the advantages of the original CLOPE algorithm are kept and its disadvantages are addressed, so that the proposed framework achieves better clustering performance. The experimental results show that, compared with CLOPE, MR-CLOPE is not only faster but also produces better clustering quality on DNS query logs.
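The two-stage idea in this abstract can be illustrated with a small single-process sketch. It is not the authors' MR-CLOPE implementation: the abstract does not state the merge criteria or the details of the random partition step, so the merge rule below (merge two clusters when doing so raises their combined CLOPE profit term S·N/W^r) is an assumption, and the MapReduce tasks are replaced by plain function calls. The names `Cluster`, `clope_pass`, `merge_stage`, and `two_stage_clope` are hypothetical. Stage 1 makes one CLOPE-style allocation pass over each split, whereas the original CLOPE repeats passes until no transaction moves.

```python
from collections import Counter


class Cluster:
    """Summary of one cluster: item occurrence counts and transaction count."""

    def __init__(self):
        self.occ = Counter()  # item -> number of occurrences in the cluster
        self.n = 0            # N: number of transactions in the cluster

    @property
    def size(self):           # S: total item occurrences
        return sum(self.occ.values())

    @property
    def width(self):          # W: number of distinct items
        return len(self.occ)

    def add(self, txn):
        self.occ.update(txn)
        self.n += 1

    def merge(self, other):
        self.occ.update(other.occ)
        self.n += other.n


def score(c, r):
    """CLOPE profit numerator term for one cluster: S * N / W^r."""
    return c.size * c.n / (c.width ** r)


def delta_add(c, txn, r):
    """Change in S * N / W^r if transaction `txn` (a set) joins cluster `c`."""
    s_new = c.size + len(txn)
    w_new = c.width + sum(1 for item in txn if item not in c.occ)
    gain = s_new * (c.n + 1) / (w_new ** r)
    return gain - score(c, r) if c.n else gain


def clope_pass(transactions, r=2.0):
    """Stage 1 on one split: a single greedy CLOPE-style allocation pass."""
    clusters = []
    for txn in map(set, transactions):
        best, best_gain = None, delta_add(Cluster(), txn, r)  # open a new cluster
        for c in clusters:
            gain = delta_add(c, txn, r)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            best = Cluster()
            clusters.append(best)
        best.add(txn)
    return clusters


def merge_stage(clusters, r=2.0):
    """Stage 2 (assumed criterion): merge pairs while the merged score
    exceeds the sum of the two separate scores."""
    clusters = list(clusters)
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                occ = a.occ + b.occ
                merged = sum(occ.values()) * (a.n + b.n) / (len(occ) ** r)
                if merged > score(a, r) + score(b, r):
                    a.merge(b)
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break
    return clusters


def two_stage_clope(transactions, n_splits=2, r=2.0):
    """Cluster each split independently, then merge the sub-clusters."""
    splits = [transactions[k::n_splits] for k in range(n_splits)]
    sub_clusters = [c for split in splits for c in clope_pass(split, r)]
    return merge_stage(sub_clusters, r)
```

A possible call, treating each DNS query record as the set of domain names it contains (an illustrative encoding, not one taken from the abstract), would be `two_stage_clope([{"a.com", "b.com"}, {"x.org", "y.org"}], n_splits=2)`. The repulsion parameter `r` comes from the original CLOPE formulation; larger values favor tighter clusters with more item overlap.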