Abstract
Log data produced by large-scale information systems is growing rapidly, which makes line-speed compression and storage of large-scale log data a major challenge in data management. An analysis of a large volume of network system logs shows that log data contains redundant structural patterns and that its content exhibits temporal local similarity. A template-based, fine-grained differential log compression architecture is proposed, in which a fine-grained differential strategy can be configured to suit the specific log data. Experimental results show that, compared with gzip, the proposed log compression system is 2 to 10 times faster and achieves a lower (better) compression ratio than gzip, reaching 10%.
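The abstract does not give the encoding details, so the following is only a rough Python sketch of the general idea of template extraction plus field-wise differencing, not the authors' architecture. Every name in it (`compress`, `decompress`, `VAR`, `HOLE`) is hypothetical, and it rests on simplifying assumptions: variable fields are runs of ASCII digits, log lines never contain a NUL character, and zlib with a JSON container stands in for whatever back-end coder the paper actually uses.

```python
import json
import re
import zlib

VAR = re.compile(r"[0-9]+")  # assumption: variable fields are runs of ASCII digits
HOLE = "\x00"                # assumption: NUL never occurs inside a log line


def _canonical(s):
    """True if s survives a round trip through int (no leading zeros)."""
    return str(int(s)) == s


def compress(lines):
    """Split each line into template + fields, then delta-encode the fields."""
    templates = {}  # template text -> template id
    prev = {}       # template id -> fields of the last line that used it
    records = []
    for line in lines:
        fields = VAR.findall(line)
        tid = templates.setdefault(VAR.sub(HOLE, line), len(templates))
        old = prev.get(tid)
        out = []
        for i, cur in enumerate(fields):
            if old is not None and _canonical(cur) and _canonical(old[i]):
                out.append(int(cur) - int(old[i]))  # small delta for similar neighbours
            else:
                out.append(cur)                     # literal fallback keeps it lossless
        prev[tid] = fields
        records.append([tid, out])
    payload = json.dumps([list(templates), records], separators=(",", ":"))
    return zlib.compress(payload.encode("utf-8"), 9)


def decompress(blob):
    """Invert compress(): rebuild fields from deltas, then fill the templates."""
    names, records = json.loads(zlib.decompress(blob).decode("utf-8"))
    prev = {}
    lines = []
    for tid, items in records:
        old = prev.get(tid)
        fields = [
            str(int(old[i]) + item) if isinstance(item, int) else item
            for i, item in enumerate(items)
        ]
        prev[tid] = fields
        parts = names[tid].split(HOLE)
        lines.append(parts[0] + "".join(f + p for f, p in zip(fields, parts[1:])))
    return lines
```

In this sketch each line is compared with the previous line that shares its template, which is one simple way to exploit the temporal local similarity described above. A configurable system would presumably choose a differencing rule per field (timestamp, counter, address) rather than treating every digit run the same way.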
Source
Journal on Communications (《通信学报》)
EI
CSCD
PKU Core (北大核心)
2015, Issue S1, pp. 197-202 (6 pages)
Funding
Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06031000)
Keywords
log
differential compression
fine grain
template