What is it about?

Accurate disassembly of stripped binaries is the first step in binary analysis, instrumentation, and reverse engineering. Complex instruction sets such as x86 pose major challenges here because it is very difficult to distinguish code from embedded data. In this paper, we present a new approach for accurately disassembling complex binaries without relying on metadata. Our approach achieves 3 to 4 times better accuracy than state-of-the-art disassemblers.

Why is it important?

To make progress, many recent approaches have either made optimistic assumptions (e.g., absence of embedded data) or relied on additional compiler-generated metadata (e.g., relocation info and/or exception handling metadata). Unfortunately, many complex binaries do contain embedded data, while lacking the additional metadata needed by these techniques. We therefore present a novel approach for accurate disassembly that uses statistical properties of data to detect code, and behavioral properties of code to flag data. We present new static analysis and data-driven probabilistic techniques that are then combined using a prioritized error correction algorithm to achieve results that are 3 to 4 times more accurate than the best previous results.
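
The idea of combining several analyses through prioritized correction can be illustrated with a toy sketch. This is not the paper's algorithm, whose details are not given on this page. It only assumes that each analysis emits hints labeling byte ranges as code or data, that each hint carries a trust priority, and that more trusted hints are applied first so less trusted ones cannot overwrite them. The names `Hint` and `resolve`, the priority scheme, and the default label are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    """A hypothetical piece of evidence about a byte range (not from the paper)."""
    start: int     # first byte offset covered (inclusive)
    end: int       # end of the range (exclusive)
    label: str     # "code" or "data"
    priority: int  # lower number = more trusted evidence


def resolve(hints, size, default="data"):
    """Label each of `size` byte offsets, letting trusted hints win conflicts."""
    labels = [None] * size
    # Apply hints from most to least trusted; a later (weaker) hint
    # may only fill offsets that no stronger hint has claimed yet.
    for hint in sorted(hints, key=lambda h: h.priority):
        for offset in range(max(hint.start, 0), min(hint.end, size)):
            if labels[offset] is None:
                labels[offset] = hint.label
    # Offsets with no evidence at all fall back to the assumed default.
    return [label if label is not None else default for label in labels]


if __name__ == "__main__":
    hints = [
        Hint(0, 8, "code", priority=2),  # e.g. a weaker statistical guess
        Hint(4, 6, "data", priority=1),  # e.g. a stronger behavioral flag
    ]
    print(resolve(hints, 10))
```

In this example the stronger "data" hint claims offsets 4 and 5 first, so the weaker "code" hint only labels the offsets around them. A real disassembler would need far richer evidence and conflict handling than this per-byte overwrite rule.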

Read the Original

This page is a summary of: Accurate Disassembly of Complex Binaries Without Use of Compiler Metadata, March 2023, ACM (Association for Computing Machinery), DOI: 10.1145/3623278.3624766.
