摘要
基于模拟随机流的马尔科夫分类算法(Markov Cluster Algorithm,MCL)是一种快速且可扩展的无监督图分割算法,在用户行为异常检测中具有广泛的应用,但时间复杂度为O(N3),不利于处理海量数据。为提高分割质量,同时减少计算时间,文章提出了一种改进的MCL模型。采用调整互信息(Adjusted Mutual Information,AMI)指标,对不同时间的图分割结果的相似度进行比较,判断是否有异常发生。实验表明,相较多层图分割算法(METIS),文章所提出的改进的MCL模型具有以下优点:1)无需事先规定聚类的数目;2)不易被数据中的拓扑噪声所影响;3)适合处理长尾分布的数据;4)在计算时间一定的情况下,能获得质量较高的分割结果 。
The MCL algorithm is short for Markov Cluster Algorithm, a fast and scalable unsupervised partitioning algorithm for graphs. MCL has been widely applied to anomaly detection. However, it requires O(N^3) time, Which is no good for a wide range of data processing. In order to improve the quality of clustering and save time consumption, an improved MCL algorithm is proposed. With AMI(Adjusted Mutual Information) index, the similarities of clusters of different periods are compared to determine whether the abnormal behavior occurs. Compared with multilevel k-way partitioning scheme(METIS) for graphs, experiments show that the proposed MCL algorithm has following strengths: 1) Number of clusters not specified ahead of time. 2) Robust against noise in graph data. 3) Suitable for clusters with long tail distribution. 4) Produce better clustering results in case of certain time consumption.
出处
《信息网络安全》
2016年第6期35-40,共6页
Netinfo Security
基金
国家高技术研究发展计划(国家863计划)[2013AA01A214]