As the information technology rapidly develops,many network applications appear and their communication protocols are unknown.Although many protocol keyword recognition based protocol reverse engineering methods have ...As the information technology rapidly develops,many network applications appear and their communication protocols are unknown.Although many protocol keyword recognition based protocol reverse engineering methods have been proposed,most of the keyword recognition algorithms are time consuming.This paper firstly uses the traffic clustering method F-DBSCAN to cluster the unknown protocol traffic.Then an improved CFSM(Closed Frequent Sequence Mining)algorithm is used to mine closed frequent sequences from the messages and identify protocol keywords.Finally,CFGM(Closed Frequent Group Mining)algorithm is proposed to explore the parallel,sequential and hierarchical relations between the protocol keywords and obtain accurate protocol message formats.Experimental results show that the proposed protocol formats extraction method is better than Apriori algorithm and Sequence alignment algorithm in terms of time complexity and it can achieve high keyword recognition accuracy.Additionally,based on the relations between the keywords,the method can obtain accurate protocol formats.Compared with the protocol formats obtained from the existing methods,our protocol format can better grasp the overall structure of target protocols and the results perform better in the application of protocol reverse engineering such as fuzzing test.展开更多
基金supported by the National Key R&D Subsidized Project with 2017YFB0802900.
文摘As the information technology rapidly develops,many network applications appear and their communication protocols are unknown.Although many protocol keyword recognition based protocol reverse engineering methods have been proposed,most of the keyword recognition algorithms are time consuming.This paper firstly uses the traffic clustering method F-DBSCAN to cluster the unknown protocol traffic.Then an improved CFSM(Closed Frequent Sequence Mining)algorithm is used to mine closed frequent sequences from the messages and identify protocol keywords.Finally,CFGM(Closed Frequent Group Mining)algorithm is proposed to explore the parallel,sequential and hierarchical relations between the protocol keywords and obtain accurate protocol message formats.Experimental results show that the proposed protocol formats extraction method is better than Apriori algorithm and Sequence alignment algorithm in terms of time complexity and it can achieve high keyword recognition accuracy.Additionally,based on the relations between the keywords,the method can obtain accurate protocol formats.Compared with the protocol formats obtained from the existing methods,our protocol format can better grasp the overall structure of target protocols and the results perform better in the application of protocol reverse engineering such as fuzzing test.