Abstract
To cope with the excessive volume of network logs and the problems it causes, and to provide a concise data source for later log analysis, this paper proposes a two-step clustering method for multi-protocol network logs. The method partitions the network logs into a grid and performs an initial clustering within each grid cell. It then applies a second clustering step to the initial clusters based on a similarity judgment, and finally outputs the clustered log records together with the sparse data and outlier points. Experiments show that, without compromising the completeness or accuracy of the logs and without affecting users' normal network access, the method compresses log volume markedly, has low time complexity, and can handle real dynamic data, thereby supporting incremental clustering.
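The abstract does not specify the grid dimensions, the in-cell grouping rule, or the similarity measure, so the following is only a minimal sketch of the grid-then-merge idea under stated assumptions. The record fields (`proto`, `src`, `dst`, `port`, `ts`), the time-bucket grid, the gap-based similarity test, and the `min_count` threshold are all hypothetical choices for illustration, not the authors' definitions.

```python
from collections import defaultdict


def first_pass(records, cell_seconds=60):
    """Initial clustering inside grid cells.

    Assumption: a grid cell is (protocol, time bucket); inside a cell,
    records sharing (src, dst, port) form one initial cluster.
    """
    cells = defaultdict(list)
    for r in records:
        cell = (r["proto"], r["ts"] // cell_seconds)
        cells[(cell, r["src"], r["dst"], r["port"])].append(r["ts"])

    clusters = []
    for (cell, src, dst, port), stamps in cells.items():
        clusters.append({
            "proto": cell[0], "src": src, "dst": dst, "port": port,
            "start": min(stamps), "end": max(stamps), "count": len(stamps),
        })
    return clusters


def similar(a, b, max_gap):
    """Assumed similarity test: same flow and a small time gap."""
    same_flow = all(a[k] == b[k] for k in ("proto", "src", "dst", "port"))
    return same_flow and b["start"] - a["end"] <= max_gap


def second_pass(clusters, max_gap=60, min_count=2):
    """Merge similar initial clusters; split off sparse ones.

    Returns (dense, sparse): merged clusters with at least min_count
    records, and the remaining sparse/outlier clusters.
    """
    ordered = sorted(
        clusters,
        key=lambda c: (c["proto"], c["src"], c["dst"], c["port"], c["start"]),
    )
    merged = []
    for c in ordered:
        if merged and similar(merged[-1], c, max_gap):
            merged[-1]["end"] = max(merged[-1]["end"], c["end"])
            merged[-1]["count"] += c["count"]
        else:
            merged.append(dict(c))
    dense = [c for c in merged if c["count"] >= min_count]
    sparse = [c for c in merged if c["count"] < min_count]
    return dense, sparse


def two_step_cluster(records, cell_seconds=60, max_gap=60, min_count=2):
    """Run both passes and return (dense, sparse)."""
    initial = first_pass(records, cell_seconds)
    return second_pass(initial, max_gap, min_count)
```

Calling `two_step_cluster` on a list of record dictionaries returns the compressed clusters and, separately, the sparse clusters that stand in for the sparse and outlier data mentioned above. Incremental operation is not shown in this sketch.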
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2012, No. 10, pp. 3929-3931 (3 pages)
Funding
Supported by the Natural Science Foundation of Hunan Province (11JJ6056)
Keywords
Web log analysis
grid-based clustering
two-step clustering
incremental clustering