
Noise-Tolerant and Optimal Segmentation of Message Formats for Unknown Application-Layer Protocols (cited by: 16)
Abstract: To automatically parse the message formats of unknown application-layer protocols, this paper proposes a method for optimally segmenting those formats that requires no a priori knowledge of the protocol. The method first builds a hidden semi-Markov model (HSMM) for optimal segmentation and estimates its parameters from a sample set of message sequences transmitted during network sessions of the unknown protocol. Using HSMM-based maximum-likelihood segmentation, it then optimally divides each message into fields and extracts keywords that represent the semantics of each field. The method does not require the training set to be absolutely pure: based on the likelihood distribution of the observed sequences, it can detect data from other protocols mixed into the training set (noise) and filter it out effectively. Experimental results show that the method can parse the message formats of both text and binary protocols, that protocol identification signatures built from the extracted keywords achieve high identification accuracy, and that noise is detected effectively.
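Authors: 黎敏 (Li Min), 余顺争 (Yu Shunzheng)
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core list; 2013, No. 3, pp. 604-617 (14 pages)
Funding: National Natural Science Foundation of China (60970146); NSFC-Guangdong Joint Fund (U0735002); National High Technology Research and Development Program of China (863 Program) (2007AA01Z449)
Keywords: application-layer protocol; message format; segmentation; hidden semi-Markov model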
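
The segmentation and noise-filtering ideas summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's HSMM: there are no hidden states, duration distributions, or parameter estimation here. Instead it assumes a hypothetical, already-learned table `segment_logprob` that maps candidate keywords to log-probabilities, and it uses dynamic programming to find the split with the highest total log-likelihood. The names `best_segmentation`, `is_noise`, `max_len`, `unknown_char_logprob`, and `threshold` are all illustrative assumptions.

```python
import math


def best_segmentation(message, segment_logprob, max_len=8,
                      unknown_char_logprob=math.log(1e-3)):
    """Return (score, segments) for the highest-scoring split of `message`.

    `segment_logprob` is a hypothetical learned table: keyword -> log-probability.
    Any single character missing from the table is allowed at a fixed penalty,
    so every message has at least one valid segmentation.
    """
    n = len(message)
    best = [-math.inf] * (n + 1)   # best[i]: best score for message[:i]
    back = [0] * (n + 1)           # back[i]: start of the last segment ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = message[start:end]
            if piece in segment_logprob:
                score = segment_logprob[piece]
            elif len(piece) == 1:
                score = unknown_char_logprob
            else:
                continue  # unknown multi-character pieces are not allowed
            candidate = best[start] + score
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers to recover the segments in order.
    segments = []
    pos = n
    while pos > 0:
        segments.append(message[back[pos]:pos])
        pos = back[pos]
    segments.reverse()
    return best[n], segments


def is_noise(message, segment_logprob, threshold=-3.0, **kwargs):
    """Flag a message whose per-character log-likelihood falls below `threshold`."""
    score, _ = best_segmentation(message, segment_logprob, **kwargs)
    return score / max(len(message), 1) < threshold
```

With a toy table containing `"GET "`, `"HTTP/1.1"`, and `"\r\n"`, a request line is split around those keywords, while a message made only of unseen bytes scores poorly per character and is flagged as noise. In the paper the scores come from the trained HSMM rather than a fixed table, and the threshold would be derived from the likelihood distribution of the observed sequences.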
