A common problem when presented with a new binary to audit is deciding where to focus attention. In a large piece of software the obvious approach of starting at an input point and auditing forwards often doesn't cut it. It's quite easy to end up in pages of boilerplate code and uninteresting libraries. A common workaround is to do some quick instrumentation to gather coverage information across uninteresting runs of the program, and then do the same across interesting ones. With some straightforward analysis you can extract an idea of where the code you care about is, where the utility functions are, and what is potentially boilerplate. While easy to implement and work with, this approach is rather coarse. For one, you still have no idea which input data influences which calculations, or which of those calculations are actually interesting. Fortunately, with a bit more effort we can solve this problem.
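For reference, the coarse coverage comparison described above might look something like the following. This is a minimal sketch, not anything from the paper: it assumes each run has already been reduced to a set of basic-block addresses by whatever instrumentation you use, and the function name `classify_blocks` and the three category names are made up for illustration.

```python
from collections import Counter


def classify_blocks(boring_runs, interesting_runs):
    """Roughly categorise basic blocks from two groups of coverage runs.

    Each run is a set of basic-block addresses hit during one execution
    (assumed input format; how you collect it is up to your tooling).
    """
    boring = set().union(*boring_runs)
    interesting = set().union(*interesting_runs)

    # Count in how many runs each block appears.
    all_runs = list(boring_runs) + list(interesting_runs)
    hits = Counter(addr for run in all_runs for addr in run)
    always = {addr for addr, n in hits.items() if n == len(all_runs)}

    return {
        # Only reached when the interesting functionality is exercised.
        "candidate": interesting - boring,
        # Hit by every single run: likely startup, teardown, and the like.
        "boilerplate": always,
        # Shared between both groups but not universal: likely helpers.
        "utility": (interesting & boring) - always,
    }


if __name__ == "__main__":
    boring = [{0x1000, 0x1010}, {0x1000, 0x1020}]
    interesting = [{0x1000, 0x1010, 0x3000}]
    for name, blocks in classify_blocks(boring, interesting).items():
        print(name, sorted(hex(b) for b in blocks))
```

The limitation is visible in the output: you get a list of addresses, but nothing that says which bytes of the input reached them.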
At ACSAC this week a paper by Agustin Gianni and me will be published under the above title. Our goal was to develop a system that could quickly identify the attack surface of an application and prioritise regions of code for manual analysis. As a further constraint, the system had to have runtimes that would make its usage acceptable in a typical auditing workflow.
The paper describes a system for performing taint analysis on binary programs using dynamic instrumentation and a stripped-down specification for the data-movement properties of each instruction. The system generates alerts when the data-flow information matches particular patterns, e.g. an attacker-controlled argument to malloc, and the results are automatically pulled into IDA for prioritisation and manual analysis. When auditing the highlighted code regions the data-flow information is also made available, making it easier to reason about control flow and data influenced by attacker input. In the evaluation, we used the tool on quite a few different binaries and found it made for a good way to start an audit and find some nice bugs.
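To give a feel for the idea, here is a toy model of taint propagation driven by a per-instruction data-movement table, with an alert when tainted data reaches a malloc argument. It is a sketch under loud assumptions and not the system from the paper: the trace format, `MOVEMENT_SPEC`, `SINKS`, and `run_taint` are all invented for illustration, only two mnemonics are modelled, and the sink assumes the first argument lives in `rdi` as in the x86-64 System V calling convention.

```python
# Hypothetical data-movement rules. For each mnemonic: does the destination
# keep its previous taint in addition to taint arriving from the sources?
MOVEMENT_SPEC = {
    "mov": False,  # dst = src, so the old taint on dst is overwritten
    "add": True,   # dst = dst + src, so old and new taint are merged
}

# Hypothetical sinks: function name -> location of the argument we care about.
SINKS = {"malloc": "rdi"}


def run_taint(trace, sources):
    """Propagate taint labels over a recorded trace and collect alerts.

    trace:   list of (mnemonic, dst, srcs) tuples; for "call", dst is the
             callee name and srcs is ignored (assumed format).
    sources: mapping of location -> label for attacker-controlled data.
    """
    taint = {loc: {label} for loc, label in sources.items()}
    alerts = []
    for index, (mnemonic, dst, srcs) in enumerate(trace):
        if mnemonic == "call":
            arg = SINKS.get(dst)
            if arg and taint.get(arg):
                alerts.append((index, dst, sorted(taint[arg])))
            continue
        incoming = set()
        for src in srcs:
            incoming |= taint.get(src, set())
        if MOVEMENT_SPEC[mnemonic]:
            incoming |= taint.get(dst, set())
        if incoming:
            taint[dst] = incoming
        else:
            taint.pop(dst, None)  # destination overwritten with clean data
    return alerts


if __name__ == "__main__":
    trace = [
        ("mov", "rax", ["buf[0]"]),  # load an attacker-controlled byte
        ("add", "rax", ["rbx"]),     # arithmetic keeps the taint on rax
        ("mov", "rdi", ["rax"]),     # set up the first argument
        ("call", "malloc", []),      # tainted size reaches the sink
        ("mov", "rdi", ["rcx"]),     # argument replaced with clean data
        ("call", "malloc", []),      # no alert this time
    ]
    print(run_taint(trace, {"buf[0]": "file:0"}))
```

Because each alert carries the labels of the input that reached the sink, the person auditing that call site can see which part of the input to start manipulating.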
You can find the paper here, and the abstract is as follows:
Discovering and understanding security vulnerabilities in complex, binary code can be a difficult and time consuming problem. While there has been notable progress in the development of automatic solutions for vulnerability detection, manual analysis remains a necessary component of any binary auditing task. In this paper we present an approach based on run time data tracking that works to narrow down the attack surface of an application and prioritize code regions for manual analysis. By supporting arbitrary data sources and sinks we can track the spread of direct and indirect attacker influence throughout a program. Alerts are generated once this influence reaches potentially sensitive code and the results are post-processed, prioritized, and integrated into common reverse engineering tools. The data recorded is used to inform the decisions of users, rather than replace them. By avoiding the processing required for semantic analysis and automated reasoning our approach is sufficiently fast to integrate into the normal work flow of manual vulnerability detection.