Communication protocols define how systems exchange information and coordinate actions. In security contexts, understanding these protocols is crucial for tasks such as finding vulnerabilities and analyzing malware. In practice, protocol specifications are often missing, outdated, or incomplete, which forces analysts to reverse engineer protocol behavior. Manual protocol reverse engineering is slow and requires significant expertise, while current automatic approaches struggle with real-world protocols because of state explosion, long execution times, or limited code coverage.
In this seminar, we present PALI, an automated system for learning protocols directly from target systems. PALI employs LLM-driven hybrid analysis to create and expand protocol state machines. It then validates these state machines by combining path-level testing with a minimal consistent DFA inference algorithm that operates on carefully selected positive and negative examples. This process yields a minimal, reliable state machine that is consistent with the target system. We evaluated PALI on a wide variety of real and novel protocols, including protocols used by benign systems (such as HTTP and SMTP) and by malware (GH0ST), and found that it produces accurate state machines while significantly reducing manual analysis effort.
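To make the idea of a "minimal consistent DFA" concrete, the sketch below finds the smallest DFA that accepts every positive example and rejects every negative one. It is not PALI's algorithm, which the abstract does not detail. It is a brute-force search that tries 1, 2, 3, ... states in turn. Because finding a minimum consistent DFA is NP-hard in general, this exhaustive approach only works for toy inputs. The message names `HELO` and `MAIL` and the example sessions are hypothetical and stand in for protocol message types.

```python
from itertools import product


def accepts(dfa, word):
    """Run the DFA from start state 0 and report whether it ends in an accepting state."""
    delta, accepting = dfa
    state = 0
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


def consistent(dfa, positives, negatives):
    """A DFA is consistent if it accepts all positives and rejects all negatives."""
    return all(accepts(dfa, w) for w in positives) and not any(
        accepts(dfa, w) for w in negatives
    )


def minimal_consistent_dfa(alphabet, positives, negatives, max_states=4):
    """Exhaustively search DFAs in order of increasing size (toy illustration only)."""
    for n in range(1, max_states + 1):
        keys = [(q, a) for q in range(n) for a in alphabet]
        # Enumerate every total transition function on n states.
        for targets in product(range(n), repeat=len(keys)):
            delta = dict(zip(keys, targets))
            # Enumerate every choice of accepting states.
            for bits in product([False, True], repeat=n):
                accepting = frozenset(q for q in range(n) if bits[q])
                dfa = (delta, accepting)
                if consistent(dfa, positives, negatives):
                    return n, dfa  # first hit is minimal because n increases
    return None


# Hypothetical SMTP-like rule: HELO must come before MAIL.
alphabet = ["HELO", "MAIL"]
positives = [(), ("HELO",), ("HELO", "MAIL"), ("HELO", "HELO")]
negatives = [("MAIL",), ("MAIL", "HELO")]

n_states, dfa = minimal_consistent_dfa(alphabet, positives, negatives)
print(n_states)
```

With these examples, no 1- or 2-state DFA can accept `HELO MAIL` while also rejecting a session that starts with `MAIL`, so the search returns a 3-state machine (start, after HELO, and an error state). The choice of examples matters: a poorly chosen set of negatives can make a smaller but wrong machine "consistent," which is why carefully selecting positive and negative examples is important.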