Static Disassembly

Load a binary for processing
Find the machine instructions in the binary
Disassemble them into human- or machine-readable form
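A minimal sketch of these three steps in Python. It assumes a tiny hypothetical decoder (decode_one, linear_disassemble) that knows only a few real x86-64 opcodes: push rbp (0x55), pop rbp (0x5d), nop (0x90), ret (0xc3), call rel32 (0xe8), jmp rel8 (0xeb) and je rel8 (0x74). The "binary" is a hard-coded byte string, and instructions are decoded back to back from the first byte. A real tool would parse the file format and use a complete decoder such as Capstone.

```python
import struct

# Hypothetical mini-decoder: only a handful of real x86-64 opcodes are known.
SIMPLE = {0x55: "push rbp", 0x5D: "pop rbp", 0x90: "nop", 0xC3: "ret"}


def decode_one(code, off, base):
    """Decode the instruction at code[off]; return (length, text)."""
    op = code[off]
    if op in SIMPLE:
        return 1, SIMPLE[op]
    if op == 0xE8 and off + 5 <= len(code):  # call rel32, relative to next insn
        rel = struct.unpack_from("<i", code, off + 1)[0]
        return 5, f"call {base + off + 5 + rel:#x}"
    if op in (0xEB, 0x74) and off + 2 <= len(code):  # jmp rel8 / je rel8
        rel = struct.unpack_from("<b", code, off + 1)[0]
        name = "jmp" if op == 0xEB else "je"
        return 2, f"{name} {base + off + 2 + rel:#x}"
    return 1, f"(bad) {op:#04x}"  # unknown to this sketch: skip one byte


def linear_disassemble(code, base):
    """Decode instructions back to back from the first byte to the last."""
    off, out = 0, []
    while off < len(code):
        length, text = decode_one(code, off, base)
        out.append((base + off, text))
        off += length
    return out


# Step 1, "load": a hard-coded byte string stands in for a section read from a file.
code = bytes([0x55, 0x90, 0x5D, 0xC3])
# Steps 2 and 3: find the instructions and print them in human-readable form.
for addr, text in linear_disassemble(code, 0x1000):
    print(f"{addr:#x}: {text}")
```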

Recursive Disassembly

Starts from known entry points (such as the program's entry point)
Recursively follows control flow (jumps and calls) to discover code
Used in many reverse-engineering applications
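The sketch below illustrates the recursive approach under similar assumptions: a hypothetical mini-decoder for a few x86-64 opcodes, and a worklist seeded with the entry points. Decoding along a path stops at ret or an unconditional jmp. Branch and call targets are queued, and calls are assumed to return, so decoding continues after them.

```python
import struct


def decode(code, off, base):
    """Hypothetical mini-decoder for a few x86-64 opcodes.
    Returns (length, mnemonic, branch_target_or_None), or None if unknown."""
    op = code[off]
    simple = {0x55: "push rbp", 0x5D: "pop rbp", 0x90: "nop", 0xC3: "ret"}
    if op in simple:
        return 1, simple[op], None
    if op == 0xE8 and off + 5 <= len(code):  # call rel32
        rel = struct.unpack_from("<i", code, off + 1)[0]
        return 5, "call", base + off + 5 + rel
    if op in (0xEB, 0x74) and off + 2 <= len(code):  # jmp rel8 / je rel8
        rel = struct.unpack_from("<b", code, off + 1)[0]
        return 2, "jmp" if op == 0xEB else "je", base + off + 2 + rel
    return None


def recursive_disassemble(code, base, entry_points):
    """Follow control flow from the entry points; return {addr: (mnemonic, target)}."""
    seen = {}
    worklist = list(entry_points)
    while worklist:
        addr = worklist.pop()
        while base <= addr < base + len(code) and addr not in seen:
            insn = decode(code, addr - base, base)
            if insn is None:  # undecodable byte: abandon this path
                break
            length, mnem, target = insn
            seen[addr] = (mnem, target)
            if target is not None:  # queue branch and call targets
                worklist.append(target)
            if mnem in ("ret", "jmp"):  # no fallthrough after these
                break
            addr += length  # je and call fall through (call assumed to return)
    return seen


# jmp 0x1003 ; <data byte 0xff> ; push rbp ; je 0x1007 ; nop ; ret
code = bytes([0xEB, 0x01, 0xFF, 0x55, 0x74, 0x01, 0x90, 0xC3])
result = recursive_disassemble(code, 0x1000, [0x1000])
for addr in sorted(result):
    print(f"{addr:#x}: {result[addr][0]}")
```

In this example the byte 0xff at 0x1002 sits right after an unconditional jmp and is never reached by control flow, so it is not disassembled; a decoder that sweeps through every byte would try to decode it as an instruction.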

Dynamic Disassembly

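Runtime information (concrete register and memory values) can resolve the targets of indirect calls and jumps and distinguish code from data
Execution tracers can dump each executed instruction along with memory and register contents
Drawback: the code coverage problem; only the code executed on the given inputs is observed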
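To make tracing and the coverage problem concrete, here is a toy tracer for a hypothetical mini instruction set (not x86). It logs each executed instruction together with the register values before it runs, then counts how many instructions the run actually reached.

```python
# Hypothetical toy program: (opcode, operands...). Register r0 holds the input.
PROGRAM = [
    ("cmpi", 0, 42),  # 0: flag = (r0 == 42)
    ("jeq", 4),       # 1: if flag, jump to 4
    ("movi", 1, 0),   # 2: r1 = 0
    ("jmp", 5),       # 3: jump to 5
    ("movi", 1, 1),   # 4: r1 = 1 (only reached when the input is 42)
    ("halt",),        # 5: stop
]


def trace(program, input_value):
    """Execute the program, logging (pc, instruction, registers before it runs)."""
    regs, flag, pc, log = [input_value, 0], False, 0, []
    while True:
        insn = program[pc]
        log.append((pc, insn, tuple(regs)))
        op = insn[0]
        if op == "halt":
            return log
        if op == "cmpi":
            flag = regs[insn[1]] == insn[2]
        elif op == "movi":
            regs[insn[1]] = insn[2]
        elif op == "jmp" or (op == "jeq" and flag):
            pc = insn[1]
            continue
        pc += 1


log = trace(PROGRAM, 7)
covered = {pc for pc, _, _ in log}
print(f"covered {len(covered)}/{len(PROGRAM)} instructions")  # → covered 5/6 instructions
```

With input 7 the instruction at index 4 never executes, so a dynamic disassembler observing this run would never see it; only input 42 reaches it.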

Code Coverage

Test Suites

Run the binary on known, manually developed test inputs to increase code coverage.
The goal is to cover as much of the program’s functionality as possible.
Ready-made test suites aren’t always available.
Test suites are application-specific.
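As a rough model of measuring a test suite's coverage, the sketch below uses a hypothetical target function that records labels for the code blocks it reaches (a stand-in for real coverage instrumentation) and takes the union over all test inputs.

```python
# Hypothetical target: records the labels of the code blocks it reaches in `hit`.
def parse_flag(arg, hit):
    hit.add("entry")
    if arg.startswith("-"):
        hit.add("option")
        if arg == "--help":
            hit.add("help")
        else:
            hit.add("unknown_option")
    else:
        hit.add("filename")


ALL_BLOCKS = {"entry", "option", "help", "unknown_option", "filename"}


def suite_coverage(target, test_inputs):
    """Run every test input and return the union of blocks reached."""
    covered = set()
    for test_input in test_inputs:
        target(test_input, covered)
    return covered


suite = ["input.txt", "--help"]  # manually written test inputs
missed = ALL_BLOCKS - suite_coverage(parse_flag, suite)
print(sorted(missed))  # → ['unknown_option']
```

Adding an input such as "-x" to the suite would cover the remaining block.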

Fuzzing

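Automatically generates inputs
Favors running many cheap tests over heavyweight analysis of each one
Generation-based fuzzer: generates inputs from scratch, often guided by a specification of the input format
Mutation-based fuzzer: mutates known valid inputs, such as those from a test suite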
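A minimal mutation-based fuzzer sketch, assuming a hypothetical target that "crashes" (raises RuntimeError) on inputs starting with b"FU". Each iteration overwrites a few random bytes of a known seed input and runs the target, favoring many cheap runs over analysis. A generation-based fuzzer would instead build each input from scratch, for example from a grammar of the input format; that variant is not shown.

```python
import random


def target(data: bytes) -> None:
    """Hypothetical program under test: 'crashes' on inputs starting with b'FU'."""
    if len(data) >= 2 and data[0] == ord("F") and data[1] == ord("U"):
        raise RuntimeError("crash")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one to three random bytes of a known (non-empty) input."""
    buf = bytearray(data)
    for _ in range(rng.randint(1, 3)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed_input: bytes, iterations: int, seed: int = 0) -> list:
    """Run many cheap tests and collect the inputs that crash the target."""
    rng = random.Random(seed)  # fixed seed makes runs reproducible
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except RuntimeError:
            crashes.append(candidate)
    return crashes


crashes = fuzz(b"FZ", 50000)
print(len(crashes), "crashing inputs found")
```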

Symbolic Execution

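Executes the program with symbolic values instead of concrete values
Each execution path produces a set of constraints (the path constraints) on the symbolic values; solving them with a constraint solver yields concrete inputs that drive execution down that path
Main drawback: path explosion, since the number of paths can grow exponentially with the number of branches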
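The toy below models only the constraint side of symbolic execution, under loud simplifications: the hypothetical program is a chain of three independent if-statements on one symbolic input x, each path's constraints are kept as text plus a Python predicate, and a brute-force search over a small range stands in for a real constraint solver such as Z3. It shows the path count doubling with every branch (2^3 = 8 paths here) and some paths being infeasible.

```python
from itertools import product

# Hypothetical program with three independent branches on one input x:
#   if x > 10: ...
#   if x < 5: ...
#   if x % 2 == 0: ...
BRANCHES = [
    ("x > 10", lambda x: x > 10),
    ("x < 5", lambda x: x < 5),
    ("x % 2 == 0", lambda x: x % 2 == 0),
]


def enumerate_paths(branches):
    """Each combination of branch outcomes is one path; record its constraints."""
    paths = []
    for outcomes in product([True, False], repeat=len(branches)):
        constraints = [
            (text if taken else f"not ({text})", pred, taken)
            for (text, pred), taken in zip(branches, outcomes)
        ]
        paths.append(constraints)
    return paths


def solve(constraints, domain=range(-100, 101)):
    """Stand-in for a real solver: brute-force search for a satisfying x."""
    for x in domain:
        if all(pred(x) == taken for _, pred, taken in constraints):
            return x
    return None  # nothing in the searched domain satisfies this path


paths = enumerate_paths(BRANCHES)
for constraints in paths:
    print(" and ".join(text for text, _, _ in constraints), "->", solve(constraints))
```

Paths that require both x > 10 and x < 5 have no solution, so an engine can prune them, but every additional branch still doubles the number of paths to consider. A bounded brute-force search cannot prove infeasibility in general; a real solver can.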

Structuring Disassembled Code and Data

Compartmentalizing

  • Breaking code into logically connected chunks makes each chunk easier to analyze and the relationships between chunks easier to understand.

Revealing Control Flow

  • Some structures explicitly reveal control flow. Especially in a visual representation, they make it easier to see how control flows through the code and to get a quick idea of what the code does.

Functions

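Functions group logically connected code.
Function detection is difficult because:

  • binaries may be stripped of symbol information, so function names and boundaries are unknown
  • a function's code might be scattered across non-contiguous regions
  • functions may share overlapping code blocks
  • many disassemblers simplify by assuming functions are contiguous and don’t share code, which does not always hold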

Function detection based on function signatures:

  • matches well-known patterns such as function prologues and epilogues
  • these patterns vary depending on the platform, compiler, and optimization level used
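A sketch of signature-based detection, assuming the signature is the classic unoptimized x86-64 prologue push rbp; mov rbp, rsp (bytes 55 48 89 e5). The function name find_function_starts is hypothetical; it simply reports every offset where the byte pattern occurs.

```python
# Assumed signature: the classic unoptimized x86-64 prologue
#   55        push rbp
#   48 89 e5  mov rbp, rsp
PROLOGUE = bytes([0x55, 0x48, 0x89, 0xE5])


def find_function_starts(code: bytes, base: int) -> list:
    """Hypothetical helper: report every address where the prologue occurs."""
    starts = []
    off = code.find(PROLOGUE)
    while off != -1:
        starts.append(base + off)
        off = code.find(PROLOGUE, off + 1)
    return starts


# Two tiny functions: prologue; pop rbp; ret, separated by two nops.
code = PROLOGUE + b"\x5d\xc3" + b"\x90\x90" + PROLOGUE + b"\x5d\xc3"
print([hex(a) for a in find_function_starts(code, 0x1000)])  # → ['0x1000', '0x1008']
```

Optimized code often omits this frame-pointer prologue, and the same bytes can occur inside data, so such a scan can both miss functions and report false ones.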

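Direct $call$ instructions make functions easy to locate: the target of a call is usually the start of a function.
Functions reached only through indirect calls or tail calls are harder to detect, because no direct call points to them.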
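A sketch of call-based detection, assuming the offsets of already-disassembled instructions are available (the hypothetical insn_starts argument), so that 0xe8 bytes inside data or other instructions are not misread. It collects the targets of direct call rel32 instructions only.

```python
import struct


def direct_call_targets(code: bytes, base: int, insn_starts: list) -> set:
    """Hypothetical helper: targets of direct `call rel32` (opcode 0xE8).
    insn_starts holds offsets of instructions found by a prior disassembly pass."""
    targets = set()
    for off in insn_starts:
        if code[off] == 0xE8 and off + 5 <= len(code):
            rel = struct.unpack_from("<i", code, off + 1)[0]
            targets.add(base + off + 5 + rel)  # relative to the next instruction
    return targets


# 0x1000: call 0x1008 ; 0x1005: call rax (ff d0) ; 0x1007: ret ; 0x1008: ret
code = bytes([0xE8, 0x03, 0x00, 0x00, 0x00, 0xFF, 0xD0, 0xC3, 0xC3])
print([hex(t) for t in sorted(direct_call_targets(code, 0x1000, [0, 5, 7, 8]))])  # → ['0x1008']
```

The indirect call rax at 0x1005 contributes nothing, and a function entered only through a tail call (a jmp) would be missed in the same way.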

Control Flow Graph

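A CFG organizes the internals of a function as basic blocks connected by control-flow edges
Useful for both automated and manual analysis
Provides a convenient graphical representation of the code
Basic block: a sequence of instructions in which the first instruction is the only entry point and the last instruction is the only exit point
Call edges are not part of the CFG, since they lead to code outside the function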
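The following sketch builds basic blocks with the standard "leaders" approach, assuming a hypothetical instruction format of (address, mnemonic, branch target) tuples for one function in address order. A new block starts at the function entry, at every branch target, and after every branch or ret; calls do not end a block and get no edge to the callee.

```python
# Hypothetical instruction format: (address, mnemonic, branch_target_or_None).
JUMPS = {"jmp", "je", "jne"}


def build_cfg(instrs):
    """Split one function's instructions into basic blocks and connect them."""
    addrs = [addr for addr, _, _ in instrs]
    leaders = {addrs[0]}  # the function entry starts a block
    for i, (addr, mnem, target) in enumerate(instrs):
        if mnem in JUMPS:
            leaders.add(target)  # a branch target starts a block
        if (mnem in JUMPS or mnem == "ret") and i + 1 < len(instrs):
            leaders.add(addrs[i + 1])  # so does the instruction after a branch/ret
    blocks, current = {}, None
    for insn in instrs:
        if insn[0] in leaders:
            current = insn[0]
            blocks[current] = []
        blocks[current].append(insn)
    starts = sorted(blocks)
    edges = set()
    for idx, start in enumerate(starts):
        _, mnem, target = blocks[start][-1]  # only the last instruction exits
        nxt = starts[idx + 1] if idx + 1 < len(starts) else None
        if mnem in JUMPS:
            edges.add((start, target))  # taken edge
        if mnem not in ("jmp", "ret") and nxt is not None:
            edges.add((start, nxt))  # fallthrough edge
    return blocks, edges


func = [
    (0, "cmp", None),
    (1, "je", 4),
    (2, "call", 100),  # no edge to 100: call edges are not part of the CFG
    (3, "jmp", 5),
    (4, "mov", None),
    (5, "ret", None),
]
blocks, edges = build_cfg(func)
print(sorted(blocks), sorted(edges))  # → [0, 2, 4, 5] [(0, 2), (0, 4), (2, 5), (4, 5)]
```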

Call Graph

Shows the relationships between call sites and functions rather than between basic blocks
Indirect calls are generally not shown in the call graph, because their targets cannot be resolved statically

Direct call: calls a specific function or address encoded in the instruction itself.
Indirect call: calls an address stored in a register (or in memory), typically known only at runtime.
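A sketch of call-graph construction, assuming a hypothetical input in which each function's instructions are already disassembled into (mnemonic, operand) pairs. A direct call yields an edge to a named function; a call through a register is recorded as unresolved, since its target is not known statically.

```python
# Hypothetical set of register operands treated as indirect call targets.
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", "r9"}


def build_call_graph(functions):
    """Add an edge for each direct call; record indirect calls separately."""
    graph = {name: set() for name in functions}
    unresolved = []
    for caller, instrs in functions.items():
        for mnem, operand in instrs:
            if mnem != "call":
                continue
            if operand in REGISTERS:  # indirect: target known only at runtime
                unresolved.append((caller, operand))
            else:  # direct: target encoded in the instruction
                graph[caller].add(operand)
    return graph, unresolved


functions = {
    "main": [("call", "init"), ("call", "rax"), ("ret", None)],
    "init": [("call", "log"), ("ret", None)],
    "log": [("ret", None)],
    "handler": [("ret", None)],  # perhaps reached only via `call rax`
}
graph, unresolved = build_call_graph(functions)
print(graph)
print(unresolved)  # → [('main', 'rax')]
```

Here handler has no incoming edge even though it might be the runtime target of call rax in main, which is exactly why indirect calls leave gaps in a statically built call graph.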