Binary Analysis - The Detective's Field Guide

Binary analysis is the process of examining and understanding compiled software (binary files) without access to the original source code. It's a critical discipline in cybersecurity, reverse engineering, and digital forensics, acting much like a detective investigating a crime scene.

Binary Files

A binary file is a sequence of bytes interpreted by a computer as something other than plain text. In binary analysis, we primarily deal with executable files (binaries) containing machine code—instructions directly executable by the CPU.

  • File Formats: Executable files adhere to specific formats that organize code, data, and metadata.
    • PE (Portable Executable): Used on Windows systems (.exe, .dll).
    • ELF (Executable and Linkable Format): Used on Linux and other Unix-like systems.

Reverse Engineering

Binary analysis is often synonymous with reverse engineering—the process of deconstructing a man-made object to determine how it was designed or how it operates. For software, this involves translating machine code back into a human-readable form.

  • Disassembly: Translating machine code into Assembly Language, a low-level, human-readable representation of CPU instructions.
  • Decompilation: Attempting to translate assembly code back into a high-level language (like C or C++), though this is often imprecise.

Today, we're not just going to tell you what to do - we're going to tell you what to look for and why it matters. Every step in binary analysis has a purpose. Let's create a field guide for what to look for at each step and why it's critical for security analysis.

The Binary to Analyze: "mystery_program"

// mystery_program.c
#include <stdio.h>
#include <string.h>

// Global variables in different sections
int global_initialized = 42;               // Will be in .data
int global_uninitialized;                  // Will be in .bss
const char secret[] = "S3cr3tP@ssw0rd!";   // Will be in .rodata

// A simple function
void print_message() {
    printf("Welcome to the mystery program!\n");
}

// A function with a buffer
void process_input() {
    char buffer[16];  // Small buffer
    printf("Enter your name: ");
    gets(buffer);     // UNSAFE - but good for analysis!
    printf("Hello, %s!\n", buffer);
}

// Main function
int main() {
    printf("Program starting...\n");
    print_message();

    printf("Global initialized: %d\n", global_initialized);
    printf("Global uninitialized: %d\n", global_uninitialized);

    process_input();

    printf("Program ending...\n");
    return 0;
}

First, let's compile it with the right flags. We'll build three versions:

# Version 1: With debug symbols (easier to analyze)
gcc -std=c99 -fno-stack-protector -z execstack -no-pie -g -o my_mystryapp_debug mystery_program.c

# Version 2: Stripped (harder, like real-world binaries)
gcc -std=c99 -fno-stack-protector -z execstack -no-pie -o my_mystryapp_stripped mystery_program.c
strip my_mystryapp_stripped

# Version 3: Vulnerable version (for exploit practice)
gcc -fno-stack-protector -z execstack -no-pie -o mystery_vuln mystery_program.c

We are going to analyze these three binaries:

my_mystryapp_debug     - Binary with debug symbols
my_mystryapp_stripped  - Stripped binary (no symbols)
mystery_vuln           - Vulnerable binary for exploit dev

The Analysis Process: What to Look For and Why

Phase 1: Initial Reconnaissance - The First Clues

Why we do this: To understand what we're dealing with before diving deep. Like checking the cover of a book before reading it.

Step 1: file command - Identifying the Target


file mystery_program

What to look for:

  • "ELF 64-bit" vs "ELF 32-bit" → Determines architecture and register sizes (64-bit uses rax, rbx, etc.; 32-bit uses eax, ebx, etc.)
  • "LSB" (Little Endian) → Confirms byte order (critical for interpreting memory correctly)
  • "stripped" vs "not stripped" → Tells us if we have function names or not (stripped = harder)
  • "dynamically linked" → Means it uses shared libraries (easier to trace)
  • "statically linked" → Contains all libraries inside (larger file, harder to analyze)
  • "executable" → Can be run (vs "shared object" which is a library)

Why it matters for security:

  • 32-bit vs 64-bit affects exploit development (different stack layouts, register names)
  • Stripped binaries are common in malware and commercial software
  • Dynamically linked binaries reveal what libraries are used (attack surface)

Step 2: checksec or manual protection checking


readelf -a mystery_program | grep -E "(GNU_STACK|GNU_RELRO)"

What to look for:

  • NX (Non-eXecutable stack) → GNU_STACK segment flags (R=Read, W=Write, E=Execute)
    • RW = NX enabled (stack cannot execute code)
    • RWE = NX disabled (stack CAN execute code - easier to exploit!)
  • PIE (Position Independent Executable) → Check if entry point address starts with 0x4...
    • Fixed address (0x400000) = No PIE
    • Random-looking address = PIE enabled (harder to exploit)
  • Stack Canary → Look for __stack_chk_fail in symbols (present = canary enabled)
  • RELRO → GNU_RELRO segment present
    • Partial RELRO = GOT (Global Offset Table) can be overwritten
    • Full RELRO = GOT is read-only (prevents GOT overwrite attacks)

Why it matters for security:

  • These are the defenses we need to bypass in exploit development
  • NX disabled = We can execute shellcode on the stack (classic buffer overflow)
  • No PIE = Addresses are predictable (easier to target specific functions)
  • No stack canary = Buffer overflows won't be detected
  • Partial RELRO = We can overwrite function pointers in GOT

Step 3: strings - The Human Readable Clues

strings mystery_program | grep -i "pass\|secret\|key\|admin\|root\|error\|fail"

What to look for:

  • Hardcoded credentials → Passwords, API keys, tokens
  • Error messages → Reveal function names and logic flow
  • URLs and paths → Network connections, file operations
  • Format strings → %s, %d, %x can indicate format string vulnerabilities
  • Command strings → system(), exec(), popen() calls

Why it matters for security:

  • Hardcoded credentials = Instant compromise
  • Error messages help understand program flow for reverse engineering
  • URLs might indicate C2 (Command & Control) servers in malware
  • Format strings in printf() without proper arguments = format string vulnerability
  • Command strings might indicate possible command injection points

Phase 2: Static Analysis - Reading the Blueprint

Step 1: nm - The Symbol Table (if not stripped)

nm mystery_program | grep -E " (T|t|U) "

What to look for:

  • "T" or "t" → Defined functions (T = global, t = local)
    • Look for: main, vuln, login, auth, encrypt, decrypt
  • "U" → Undefined functions (imported from libraries)
    • Look for dangerous functions: gets, strcpy, sprintf, system, exec
  • "D" or "d" → Initialized data variables (D = global, d = local); "B" or "b" → uninitialized data in .bss
    • Look for: flags, configuration variables, encryption keys

Why it matters for security:

  • Dangerous imported functions (gets, strcpy) = potential buffer overflows
  • system() or exec() calls = potential command injection
  • Custom encryption functions = might have weak implementation
  • Authentication functions = target for bypass attacks

Step 2: objdump -d - Reading the Assembly


objdump -d mystery_program | grep -B5 -A5 "call.*gets\|call.*strcpy"


What to look for in disassembly:

Function Prologues (Start of functions):

assembly

push   rbp            ; Save base pointer

mov    rbp, rsp       ; Set up new stack frame

sub    rsp, 0x20      ; Allocate 32 bytes on stack ← BUFFER SIZE!

  • sub rsp, 0xXX → Tells us stack buffer sizes
  • Small buffers (0x10, 0x20) = easier to overflow

Dangerous Function Calls:

assembly

call   0x400500 <gets@plt>    ; NO bounds checking!

call   0x400510 <strcpy@plt>  ; NO bounds checking!

call   0x400520 <strcat@plt>  ; NO bounds checking!

call   0x400530 <sprintf@plt> ; Format string vulnerability possible

Buffer Allocation Patterns:

assembly

lea    rax, [rbp-0x10]  ; Buffer starts at rbp-0x10 (16 bytes)

lea    rdx, [rbp-0x20]  ; Another buffer at rbp-0x20 (32 bytes)

Return Instruction Patterns:

assembly

leave                   ; Clean up stack frame

ret                     ; Return to caller ← WHERE WE HIJACK CONTROL!

Why it matters for security:

  • Buffer sizes tell us how much data needed to overflow
  • Dangerous functions are vulnerability indicators
  • Return instructions are where we hijack control flow
  • Leave/ret sequences are where we insert our exploit

Step 3: readelf -S - Memory Layout


readelf -S mystery_program | grep -E "(text|data|rodata|bss|plt|got)"

What to look for:

  • .text → Executable code section (where shellcode might go if executable)
  • .data → Writable data (where we might write exploit data)
  • .rodata → Read-only data (hardcoded strings, might contain secrets)
  • .plt → Procedure Linkage Table (function stubs for dynamic linking)
  • .got → Global Offset Table (actual addresses of imported functions)
  • Section flags: A (allocated in memory), X (executable), W (writable) — e.g. .text is AX, .data is WA

Why it matters for security:

  • .got is writable with Partial RELRO → We can overwrite function addresses!
  • .text executable → We can place shellcode there if we can write to it
  • .data writable → Good place to store exploit strings
  • Knowing memory layout helps in ROP (Return-Oriented Programming) chain building


Phase 3: Dynamic Analysis - Watching It Run

Step 1: strace - System Call Tracing

bash

strace -e trace=file,network ./mystery_program

What to look for:

  • File operations: open, read, write, close
    • Look for: config files, password files, log files
  • Network operations: socket, connect, send, recv
    • Look for: IP addresses, ports (malware C2)
  • Process operations: fork, execve, system
    • Look for: command execution (potential injection)
  • Memory operations: mprotect, mmap
    • Look for: memory protection changes

Why it matters for security:

  • File operations reveal sensitive data access
  • Network operations show communication patterns (data exfiltration?)
  • execve with user input = possible command injection
  • mprotect changing permissions = anti-debugging or self-modifying code

Step 2: ltrace - Library Call Tracing

bash

ltrace ./mystery_program 2>&1 | grep -E "(gets|strcpy|printf|system)"

What to look for:

  • Input functions: gets, fgets, scanf, read
  • String functions: strcpy, strcat, sprintf
  • Memory functions: malloc, free, memcpy
  • Format functions: printf, fprintf, snprintf

Why it matters for security:

  • See what data flows into dangerous functions
  • Track user input through the program
  • Identify format string vulnerabilities (printf with user-controlled format)
  • Spot heap operations (potential heap overflows)

Step 3: gdb - Interactive Debugging

3.1: Initial Setup

GNU gdb (Debian 13.1-3) 13.1

...

(gdb) help

List of classes of commands:

...

What's happening: GDB started successfully. Asking for help shows all command categories. This is good!

What to do next: We need to load a binary to analyze.

3.2: Loading the Debug Binary

gdb -q my_mystryapp_debug 

Reading symbols from my_mystryapp_debug...

Success!

  • "Reading symbols" means this binary has debug information (compiled with -g)
  • We can use function names like main, process_input, etc.
  • This will make our analysis easier

3.3: Setting Breakpoints - The Critical Issue

(gdb) break process_input

Breakpoint 1 at 0x401164

(gdb) break *strcpy

No symbol "strcpy" in current context.

What's happening here:

  1. break process_input worked → Breakpoint set at 0x401164
  2. break *strcpy failed → "No symbol "strcpy" in current context"

Why this matters:

  • Our program doesn't use strcpy()! We only use gets()
  • Check our source code:

gets(buffer);     // We have this

// No strcpy() anywhere!

  • The compiler warning suggested fgets() as a replacement for gets(); it never mentioned strcpy

What to do:

  • Only set breakpoints for functions that actually exist
  • Let's check what functions we have:

gdb

(gdb) info functions

3.4: Switching to Stripped Binary - The Confusion

(gdb) file my_mystryapp_stripped 

Load new symbol table from "my_mystryapp_stripped"? (y or n) y

Reading symbols from my_mystryapp_stripped...

(No debugging symbols found in my_mystryapp_stripped)

What's happening:

  1. GDB is asking: "You loaded symbols for debug, now switching to stripped. Reload symbols?"
  2. We say y (yes)
  3. "No debugging symbols found" → This is expected! The binary is stripped

The critical mistake happening here:

gdb

(gdb) break *strcpy

No symbol table is loaded.  Use the "file" command.

Why this error?

  • When you have no symbols (stripped binary), you cannot use function names
  • You must use addresses instead
  • Example: break *0x401050 (use the address we found earlier)

3.5: The Confusion Loop

(gdb) file my_mystryapp_stripped 

Reading symbols from my_mystryapp_stripped...

(No debugging symbols found in my_mystryapp_stripped)

(gdb) break *gets

Note: breakpoint 1 also set at pc 0x1050.

Breakpoint 2 at 0x1050

Interesting finding:

  • Even though the binary is stripped, break *gets works! strip removes the regular symbol table, but gets lives in the dynamic symbol table (.dynsym), which the dynamic linker still needs
  • But look at the address: 0x1050 (same as in the debug version)
  • And then later: Breakpoint 3 at 0x401050

Why two different addresses?

  • 0x1050 = File-relative address (the symbol's offset inside the binary)
  • 0x401050 = Absolute address (the 0x400000 load base plus the 0x1050 offset)
  • Because we compiled with -no-pie, the binary always loads at the fixed base 0x400000, so the two addresses differ by a constant

3.6: The Final Confusion

(gdb) info frame

No stack.

Why "No stack"?

  • The program isn't running yet!
  • info frame shows the current stack frame... but we haven't started execution
  • We need to run the program first

(gdb) break *strcpy

No symbol table is loaded.  Use the "file" command.


Why this keeps happening:

  • You're trying to use function names on a stripped binary
  • Stripped = No function names available
  • You must use addresses

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The Detective's Checklist: What to Document

For Every Binary Analysis, Document:

  • Basic Information
    • Architecture (32/64-bit)
    • Endianness
    • Stripped or not
    • Linked statically or dynamically
  • Protections (Defenses)
    • NX (Stack executable?): [ ] Yes [ ] No
    • ASLR/PIE: [ ] Enabled [ ] Disabled
    • Stack Canary: [ ] Present [ ] Absent
    • RELRO: [ ] Full [ ] Partial [ ] None
  • Vulnerability Indicators
    • Dangerous functions used: gets, strcpy, sprintf, system
    • Buffer sizes found: [ ] Small (<32) [ ] Medium (32-128) [ ] Large (>128)
    • Format strings with user input: [ ] Yes [ ] No
    • Command execution with user input: [ ] Yes [ ] No
  • Attack Surface
    • User input points identified: [ ] Network [ ] File [ ] Command line [ ] Environment
    • Authentication functions found: __________
    • Encryption functions found: __________
  • Exploit Development Notes
    • Crash confirmed at: ____ bytes
    • Return address offset: ____ bytes from buffer start
    • Available gadgets: [ ] pop rdi; ret [ ] pop rsi; ret [ ] execve available
    • Writable memory regions: __________
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Why This Systematic Approach Matters

In the real world:

  • Real Malware Analysis: Malware is almost always stripped - you must work from addresses
  • Exploit Development: You need exact addresses for payloads
  • Reverse Engineering: You gradually rebuild symbols as you analyze
  • CTF Challenges: Often give stripped binaries to make it harder

Remember: Every piece of information you gather tells a story:

  • Small buffer + gets() = Classic buffer overflow
  • system() with user input = Command injection
  • Partial RELRO + GOT entry = GOT overwrite attack possible
  • No PIE + known function address = Return-to-libc attack