Funding: Supported by the National Key R&D Subsidized Project (Grant No. 2017YFB0802900).
Abstract: As information technology develops rapidly, many network applications have appeared whose communication protocols are unknown. Although many protocol reverse engineering methods based on protocol keyword recognition have been proposed, most keyword recognition algorithms are time-consuming. This paper first uses the traffic clustering method F-DBSCAN to cluster the unknown protocol traffic. Then an improved CFSM (Closed Frequent Sequence Mining) algorithm is used to mine closed frequent sequences from the messages and identify protocol keywords. Finally, the CFGM (Closed Frequent Group Mining) algorithm is proposed to explore the parallel, sequential, and hierarchical relations between the protocol keywords and obtain accurate protocol message formats. Experimental results show that the proposed format extraction method outperforms the Apriori algorithm and the sequence alignment algorithm in terms of time complexity and achieves high keyword recognition accuracy. Additionally, based on the relations between the keywords, the method can obtain accurate protocol formats. Compared with the protocol formats obtained by existing methods, the resulting formats better capture the overall structure of target protocols and perform better in applications of protocol reverse engineering such as fuzz testing.
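To make the notion of closed frequent sequences concrete, here is a minimal sketch. It is not the improved CFSM algorithm from the paper, whose details the abstract does not give; it simply counts contiguous byte substrings by support (the number of messages containing them) and keeps the closed ones, meaning those with no longer frequent substring of equal support. The function names, `min_support`, `max_len`, and the sample messages are all assumptions made for illustration.

```python
from collections import defaultdict

def frequent_substrings(messages, min_support, max_len=8):
    """Return {substring: support} for contiguous byte substrings of length
    1..max_len that occur in at least min_support messages."""
    support = defaultdict(int)
    for msg in messages:
        seen = set()  # count each substring at most once per message
        for n in range(1, max_len + 1):
            for i in range(len(msg) - n + 1):
                seen.add(msg[i:i + n])
        for s in seen:
            support[s] += 1
    return {s: c for s, c in support.items() if c >= min_support}

def closed_only(freq):
    """Keep substrings that have no longer frequent super-string with the
    same support. Substrings are capped at max_len, so closure is only
    checked within that bound."""
    closed = {}
    for s, c in freq.items():
        absorbed = any(len(t) > len(s) and s in t and freq[t] == c for t in freq)
        if not absorbed:
            closed[s] = c
    return closed

# Hypothetical text-protocol messages, used only to exercise the sketch.
msgs = [b"GET /a HTTP", b"GET /b HTTP", b"PUT /c HTTP"]
keywords = closed_only(frequent_substrings(msgs, min_support=3))
print(keywords)
```

This brute-force enumeration is quadratic in message length and is meant only to show what "closed" filters out: short fragments such as `b"HT"` are absorbed by the longer `b" HTTP"` because both appear in the same messages.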
Funding: National Natural Science Foundation of China under Grant No. 61872111; Sichuan Science and Technology Program (No. 2019YFSY0049); the "Project for the Development and Application of Safety Testing and Verification Platform for Industrial Robots" of the Ministry of Industry and Information Technology.
Abstract: With the rapid development of the Internet, a large number of private protocols have emerged on the network. Some of them are constructed by attackers to evade analysis, posing a threat to computer network security. Blockchains use P2P protocols to implement various functions across the network, and the P2P protocol format of a blockchain may differ from the standard format specification, so sniffing tools such as Wireshark and Fiddler cannot recognize it. The ability to distinguish different types of unknown network protocols is therefore vital for network security. In this paper, we propose an unsupervised clustering algorithm for binary protocols based on maximum frequent sequences, which can distinguish various unknown protocols and thus support the analysis of unknown protocol formats. We mine the maximum frequent sequences of protocol message sets at the byte level. We then calculate the fuzzy membership of each protocol message to each maximum frequent sequence, based on fuzzy set theory, and construct a fuzzy membership vector for each protocol message. Finally, we adopt K-means++ to split the different types of protocol messages into several clusters and evaluate the performance by calculating homogeneity, integrity, and the Fowlkes and Mallows Index (FMI). In addition, clustering algorithms based on Needleman–Wunsch and on fixed-length prefixes are compared with the algorithm presented in this paper. Compared with these traditional clustering methods, our work demonstrates a certain improvement in clustering performance.
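The pipeline described above (membership vectors followed by K-means++) can be sketched roughly as follows. The abstract does not define the fuzzy membership function, so this sketch assumes one: the length of the longest contiguous run shared by a message and a frequent sequence, divided by the sequence length. The frequent sequences are taken as given rather than mined, and all names and sample bytes are hypothetical.

```python
import random

def longest_common_run(a: bytes, b: bytes) -> int:
    """Length of the longest contiguous byte run shared by a and b."""
    best, prev = 0, [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, 1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def membership(msg: bytes, seq: bytes) -> float:
    """Assumed membership in [0, 1]; the paper's definition may differ."""
    return longest_common_run(msg, seq) / len(seq)

def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp(points, k, rng, iters=20):
    """K-means with K-means++ seeding; returns one cluster label per point."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Seed the next center with probability proportional to squared distance.
        d2 = [min(sqdist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            pick = rng.choice(points)  # all points coincide with a center
        else:
            pick = rng.choices(points, weights=d2, k=1)[0]
        centers.append(list(pick))

    def nearest(p):
        return min(range(k), key=lambda j: sqdist(p, centers[j]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # leave an empty cluster's center unchanged
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p) for p in points]

# Hypothetical frequent sequences and binary messages.
seqs = [b"\x01\x02", b"\xff\xfe"]
msgs = [b"\x01\x02AAAA", b"\x01\x02BBBB", b"\xff\xfeCCCC", b"\xff\xfeDDDD"]
vectors = [[membership(m, s) for s in seqs] for m in msgs]
labels = kmeans_pp(vectors, k=2, rng=random.Random(0))
print(vectors, labels)
```

Each message becomes a vector with one component per frequent sequence, so messages that share the same header bytes land close together regardless of payload. Which integer label each cluster receives depends on the random seed; only the grouping is meaningful.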
Abstract: Packet analysis is very important in our digital life, but protocol analyzers are limited because they can only process data in a known format. This paper puts forward a solution for decoding raw data in an unknown format. The data can be cut into packets because packet headers usually contain characteristic bit sequences, so the key to solving the problem is finding those characteristic sequences. We present an efficient way of enumerating bit sequences. Both the Aho-Corasick (AC) algorithm and a data mining method are used to reduce the cost of the process.
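As an illustration of how Aho-Corasick matching can locate candidate header sequences and cut a raw stream into packets, here is a minimal sketch. It works on bytes rather than bits for brevity, which is a simplification of the paper's bit-level setting, and it assumes the characteristic sequences are already known. The marker values, stream contents, and function names are hypothetical.

```python
from collections import deque

def build_automaton(patterns):
    """Build an Aho-Corasick automaton: goto transitions, failure links,
    and the list of patterns ending at each state."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for sym in pat:
            if sym not in goto[state]:
                goto.append({}); fail.append(0); out.append([])
                goto[state][sym] = len(goto) - 1
            state = goto[state][sym]
        out[state].append(pat)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for sym, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and sym not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(sym, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches from the suffix state
    return goto, fail, out

def find_all(data, automaton):
    """Return (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, sym in enumerate(data):
        while state and sym not in goto[state]:
            state = fail[state]
        state = goto[state].get(sym, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

def split_packets(data, hits, marker):
    """Cut data at each occurrence of marker, treated as a packet header."""
    starts = [pos for pos, pat in hits if pat == marker]
    return [data[a:b] for a, b in zip(starts, starts[1:] + [len(data)])]

# Hypothetical stream with an assumed two-byte header marker 0xAA 0x55.
stream = b"\xaa\x55hello\xaa\x55hi\x7e!"
auto = build_automaton([b"\xaa\x55", b"\x7e"])
hits = find_all(stream, auto)
packets = split_packets(stream, hits, b"\xaa\x55")
print(hits, packets)
```

The automaton scans the stream once no matter how many candidate sequences are checked, which is why it suits testing many enumerated candidates at the same time.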