Automatic Network Protocol Automaton Extraction

Ming-Ming Xiao, Shun-Zheng Yu, Yu Wang
DOI: https://doi.org/10.1109/NSS.2009.71
2009-01-01
Abstract: Protocol reverse engineering is the process of reconstructing the protocol context of communication sessions produced by an implementation. It involves translating a sequence of packets into protocol messages, grouping the messages into sessions, and modeling state transitions in the protocol state machine. It is well known to be invaluable for many network security applications, including intrusion prevention and detection, traffic normalization, and penetration testing. However, current practice in deriving protocol specifications is either mostly manual or automates reverse engineering of the message format only, leaving inference of the protocol state machine undone. Although regular expressions offer superior expressive power and flexibility, regular expressions for application protocols are still written manually, which requires a thorough understanding of each protocol. At present there is no effective method for automatically classifying, recognizing, and controlling both known applications and unknown applications that may appear in the future. This paper presents a novel approach to modeling network application specifications. Automatic protocol reverse engineering is completed by inferring the protocol state machine, and the resulting FSMs are then translated into corresponding regular expressions to enrich and update the pattern database. The approach uses grammatical inference and is motivated by the observation that an implementation of a protocol is inherently a state transition process, so a state machine models its essence exactly. Its significance is that it describes various stateful protocols, both known and unknown, with a common method by modeling protocol state transitions. The approach has been implemented in a system and evaluated using real-world implementations of three protocols: HTTP, SMTP, and FTP; the extracted protocol descriptions were compared with those of other recent systems, such as l7-filter.
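To make the idea of inferring a state machine from observed sessions concrete, the following is a minimal sketch of a prefix tree acceptor, a common starting point for grammatical inference. It is not the authors' algorithm: the function names `build_pta` and `accepts`, the simplified SMTP command sequences, and the assumption that packets have already been translated into message types and grouped into sessions are all hypothetical and chosen only for illustration.

```python
def build_pta(sessions):
    """Build a prefix tree acceptor from message-type sequences.

    State 0 is the start state. Each distinct prefix of an observed
    session gets its own state; the state reached at the end of a
    session is marked as accepting.
    """
    transitions = {}   # (state, symbol) -> next state
    accepting = set()
    next_state = 1
    for session in sessions:
        state = 0
        for symbol in session:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting


def accepts(transitions, accepting, session):
    """Return True if the automaton accepts the given message sequence."""
    state = 0
    for symbol in session:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


# Hypothetical, simplified SMTP sessions (client commands only).
sessions = [
    ["HELO", "MAIL", "RCPT", "DATA", "QUIT"],
    ["HELO", "MAIL", "RCPT", "RCPT", "DATA", "QUIT"],
]
transitions, accepting = build_pta(sessions)
```

A full inference procedure would go on to merge compatible states so that the automaton generalizes beyond the observed sessions (for example, accepting any number of repeated `RCPT` messages); this sketch stops at the tree and accepts only the sessions it was built from.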