Binary analysis is the process of examining and understanding compiled software (binary files) without access to the original source code. It's a critical discipline in cybersecurity, reverse engineering, and digital forensics, acting much like a detective investigating a crime scene.
Binary Files
A binary file is a sequence of bytes interpreted by a computer as something other than plain text. In binary analysis, we primarily deal with executable files (binaries) containing machine code—instructions directly executable by the CPU.
- File Formats: Executable files adhere to specific formats that organize code, data, and metadata.
- PE (Portable Executable): Used on Windows systems (.exe, .dll).
- ELF (Executable and Linkable Format): Used on Linux and other Unix-like systems.
Reverse Engineering
Binary analysis is often synonymous with reverse engineering—the process of deconstructing a man-made object to determine how it was designed or how it operates. For software, this involves translating machine code back into a human-readable form.
- Disassembly: Translating machine code into Assembly Language, a low-level, human-readable representation of CPU instructions.
- Decompilation: Attempting to translate assembly code back into a high-level language (like C or C++), though this is often imprecise.
Today, we're not just going to tell you what to do - we're going to tell you what to look for and why it matters. Every step in binary analysis has a purpose. Let's create a field guide for what to look for at each step and why it's critical for security analysis.
The Binary to Analyze: "mystery_program"
// mystery_program.c
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

// Global variables in different sections
int global_initialized = 42;             // Will be in .data
int global_uninitialized;                // Will be in .bss
const char secret[] = "S3cr3tP@ssw0rd!"; // Will be in .rodata

// A simple function
void print_message() {
    printf("Welcome to the mystery program!\n");
}

// A function with a buffer
void process_input() {
    char buffer[16];                     // Small buffer
    printf("Enter your name: ");
    gets(buffer);                        // UNSAFE - but good for analysis!
    printf("Hello, %s!\n", buffer);
}

// Main function
int main() {
    printf("Program starting...\n");
    print_message();
    printf("Global initialized: %d\n", global_initialized);
    printf("Global uninitialized: %d\n", global_uninitialized);
    process_input();
    printf("Program ending...\n");
    return 0;
}
First, compile the program with the right flags. We will build three versions:
# Version 1: With debug symbols (easier to analyze)
gcc -std=c99 -fno-stack-protector -z execstack -no-pie -g -o mystery_debug mystery_program.c
# Version 2: Stripped (harder, like real-world binaries)
gcc -std=c99 -fno-stack-protector -z execstack -no-pie -o mystery_stripped mystery_program.c
strip mystery_stripped
# Version 3: Vulnerable version (for exploit practice)
gcc -fno-stack-protector -z execstack -no-pie -o mystery_vuln mystery_program.c
We are going to analyze the three resulting binaries:
mystery_debug - Binary with debug symbols
mystery_stripped - Stripped binary (no symbols)
mystery_vuln - Vulnerable binary for exploit dev
The Analysis Process: What to Look For and Why
Phase 1: Initial Reconnaissance - The First Clues
Why we do this: To understand what we're dealing with before diving deep. Like checking the cover of a book before reading it.
Step 1: file command - Identifying the Target
file mystery_program
What to look for:
- "ELF 64-bit" vs "ELF 32-bit" → Determines architecture and register sizes (64-bit uses rax, rbx, etc.; 32-bit uses eax, ebx, etc.)
- "LSB" (Little Endian) → Confirms byte order (critical for interpreting memory correctly)
- "stripped" vs "not stripped" → Tells us if we have function names or not (stripped = harder)
- "dynamically linked" → Means it uses shared libraries (easier to trace)
- "statically linked" → Contains all libraries inside (larger file, harder to analyze)
- "executable" → Can be run (vs "shared object" which is a library)
Why it matters for security:
- 32-bit vs 64-bit affects exploit development (different stack layouts, register names)
- Stripped binaries are common in malware and commercial software
- Dynamically linked binaries reveal what libraries are used (attack surface)
Step 2: checksec or manual protection checking
readelf -a mystery_program | grep -E "(GNU_STACK|GNU_RELRO)"
What to look for:
- NX (Non-eXecutable stack) → GNU_STACK with RWE flags (R=Read, W=Write, E=Execute)
- RW = NX enabled (stack cannot execute code)
- RWE = NX disabled (stack CAN execute code - easier to exploit!)
- PIE (Position Independent Executable) → Check the entry point and symbol addresses
- Fixed low address (base 0x400000) = No PIE
- Random-looking address on each run = PIE enabled (harder to exploit)
- Stack Canary → Look for __stack_chk_fail in symbols (present = canary enabled)
- RELRO → GNU_RELRO section present
- Partial RELRO = GOT (Global Offset Table) can be overwritten
- Full RELRO = GOT is read-only (prevents GOT overwrite attacks)
Why it matters for security:
- These are the defenses we need to bypass in exploit development
- NX disabled = We can execute shellcode on the stack (classic buffer overflow)
- No PIE = Addresses are predictable (easier to target specific functions)
- No stack canary = Buffer overflows won't be detected
- Partial RELRO = We can overwrite function pointers in GOT
Step 3: strings - The Human Readable Clues
strings mystery_program | grep -i "pass\|secret\|key\|admin\|root\|error\|fail"
What to look for:
- Hardcoded credentials → Passwords, API keys, tokens
- Error messages → Reveal function names and logic flow
- URLs and paths → Network connections, file operations
- Format strings → %s, %d, %x can indicate format string vulnerabilities
- Command strings → system(), exec(), popen() calls
Why it matters for security:
- Hardcoded credentials = Instant compromise
- Error messages help understand program flow for reverse engineering
- URLs might indicate C2 (Command & Control) servers in malware
- Format strings in printf() without proper arguments = format string vulnerability
- Command strings might indicate possible command injection points
Phase 2: Static Analysis - Reading the Blueprint
Step 1: nm - The Symbol Table (if not stripped)
nm mystery_program | grep -E " (T|t|U) "
What to look for:
- "T" or "t" → Defined functions (T = global, t = local)
- Look for: main, vuln, login, auth, encrypt, decrypt
- "U" → Undefined functions (imported from libraries)
- Look for dangerous functions: gets, strcpy, sprintf, system, exec
- "D" or "d" → Global variables (D = initialized, d = uninitialized)
- Look for: flags, configuration variables, encryption keys
Why it matters for security:
- Dangerous imported functions (gets, strcpy) = potential buffer overflows
- system() or exec() calls = potential command injection
- Custom encryption functions = might have weak implementation
- Authentication functions = target for bypass attacks
Step 2: objdump -d - Reading the Assembly
objdump -d mystery_program | grep -B5 -A5 "call.*gets\|call.*strcpy"
What to look for in disassembly:
Function Prologues (Start of functions):
push rbp ; Save base pointer
mov rbp, rsp ; Set up new stack frame
sub rsp, 0x20 ; Allocate 32 bytes on stack ← BUFFER SIZE!
- sub rsp, 0xXX → Tells us stack buffer sizes
- Small buffers (0x10, 0x20) = easier to overflow
Dangerous Function Calls:
call 0x400500 <gets@plt> ; NO bounds checking!
call 0x400510 <strcpy@plt> ; NO bounds checking!
call 0x400520 <strcat@plt> ; NO bounds checking!
call 0x400530 <sprintf@plt> ; Format string vulnerability possible
Buffer Allocation Patterns:
lea rax, [rbp-0x10] ; Buffer starts at rbp-0x10 (16 bytes below rbp)
lea rdx, [rbp-0x20] ; Another buffer at rbp-0x20 (32 bytes below rbp)
Return Instruction Patterns:
leave ; Clean up stack frame
ret ; Return to caller ← WHERE WE HIJACK CONTROL!
Why it matters for security:
- Buffer sizes tell us how much data needed to overflow
- Dangerous functions are vulnerability indicators
- Return instructions are where we hijack control flow
- Leave/ret sequences are where we insert our exploit
Step 3: readelf -S - Memory Layout
readelf -S mystery_program | grep -E "(text|data|rodata|bss|plt|got)"
What to look for:
- .text → Executable code section (where shellcode might go if executable)
- .data → Writable data (where we might write exploit data)
- .rodata → Read-only data (hardcoded strings, might contain secrets)
- .plt → Procedure Linkage Table (function stubs for dynamic linking)
- .got → Global Offset Table (actual addresses of imported functions)
- Section flags: A (Allocated in memory), X (Executable), W (Writable)
Why it matters for security:
- .got is writable with Partial RELRO → We can overwrite function addresses!
- .text executable → We can place shellcode there if we can write to it
- .data writable → Good place to store exploit strings
- Knowing memory layout helps in ROP (Return-Oriented Programming) chain building
Phase 3: Dynamic Analysis - Watching It Run
Step 1: strace - System Call Tracing
strace -e trace=file,network ./mystery_program
What to look for:
- File operations: open, read, write, close
- Look for: config files, password files, log files
- Network operations: socket, connect, send, recv
- Look for: IP addresses, ports (malware C2)
- Process operations: fork, execve, system
- Look for: command execution (potential injection)
- Memory operations: mprotect, mmap
- Look for: memory protection changes
Why it matters for security:
- File operations reveal sensitive data access
- Network operations show communication patterns (data exfiltration?)
- execve with user input = possible command injection
- mprotect changing permissions = anti-debugging or self-modifying code
Step 2: ltrace - Library Call Tracing
ltrace ./mystery_program 2>&1 | grep -E "(gets|strcpy|printf|system)"
What to look for:
- Input functions: gets, fgets, scanf, read
- String functions: strcpy, strcat, sprintf
- Memory functions: malloc, free, memcpy
- Format functions: printf, fprintf, snprintf
Why it matters for security:
- See what data flows into dangerous functions
- Track user input through the program
- Identify format string vulnerabilities (printf with user-controlled format)
- Spot heap operations (potential heap overflows)
Step 3: gdb - Interactive Debugging
3.1: Initial Setup
GNU gdb (Debian 13.1-3) 13.1
...
(gdb) help
List of classes of commands:
...
What's happening: GDB started successfully, and help lists all command categories.
What to do next: Load a binary to analyze.
3.2: Loading the Debug Binary
gdb -q mystery_debug
Reading symbols from mystery_debug...
Success!
- "Reading symbols" means this binary has debug information (compiled with -g)
- We can use function names like main, process_input, etc.
- This will make our analysis easier
3.3: Setting Breakpoints - The Critical Issue
(gdb) break process_input
Breakpoint 1 at 0x401164
(gdb) break *gets
Breakpoint 2 at 0x401050
(gdb) break *strcpy
No symbol "strcpy" in current context.
What's happening here:
- ✅ break process_input and break *gets worked → Breakpoints set at 0x401164 and 0x401050
- ❌ break *strcpy failed → "No symbol in current context"
Why this matters:
- Our program doesn't use strcpy()! We only use gets()
- Check our source code:
gets(buffer); // We have this
// No strcpy() anywhere!
- The compiler warning was suggesting fgets() instead of gets(), not mentioning strcpy
What to do:
- Only set breakpoints for functions that actually exist
- Let's check what functions we have:
(gdb) info functions
3.4: Switching to Stripped Binary - The Confusion
(gdb) file mystery_stripped
Load new symbol table from "mystery_stripped"? (y or n) y
Reading symbols from mystery_stripped...
(No debugging symbols found in mystery_stripped)
What's happening:
- GDB is asking: "You loaded symbols for debug, now switching to stripped. Reload symbols?"
- We say y (yes)
- "No debugging symbols found" → This is expected! The binary is stripped
The critical mistake happening here:
(gdb) break *strcpy
No symbol table is loaded. Use the "file" command.
Why this error?
- When you have no symbols (stripped binary), you cannot use function names
- You must use addresses instead
- Example: break *0x401050 (use the address we found earlier)
3.5: The Confusion Loop
(gdb) file mystery_stripped
Reading symbols from mystery_stripped...
(No debugging symbols found in mystery_stripped)
(gdb) break *gets
Note: breakpoint 1 also set at pc 0x1050.
Breakpoint 2 at 0x1050
Interesting finding:
- Even though it's stripped, break *gets works! (strip removes the static symbol table, but the dynamic symbols the loader needs - like gets - survive)
- But wait, look at the address: 0x1050 (same as debug version)
- And then later: Breakpoint 3 at 0x401050
Why two different addresses?
- 0x1050 = the symbol's offset within the file (unrelocated)
- 0x401050 = the absolute address in memory
- Because we compiled with -no-pie, the binary always loads at base 0x400000, so offset 0x1050 resolves to the fixed address 0x401050; GDB may show the raw offset before it has resolved the load address
3.6: The Final Confusion
(gdb) info frame
No stack.
Why "No stack"?
- The program isn't running yet!
- info frame shows the current stack frame... but we haven't started execution
- We need to run the program first
(gdb) break *strcpy
No symbol table is loaded. Use the "file" command.
Why this keeps happening:
- You're trying to use function names on a stripped binary
- Stripped = No function names available
- You must use addresses
The Detective's Checklist: What to Document
For Every Binary Analysis, Document:
- Basic Information
- Architecture (32/64-bit)
- Endianness
- Stripped or not
- Linked statically or dynamically
- Protections (Defenses)
- NX (Stack executable?): [ ] Yes [ ] No
- ASLR/PIE: [ ] Enabled [ ] Disabled
- Stack Canary: [ ] Present [ ] Absent
- RELRO: [ ] Full [ ] Partial [ ] None
- Vulnerability Indicators
- Dangerous functions used: gets, strcpy, sprintf, system
- Buffer sizes found: [ ] Small (<32) [ ] Medium (32-128) [ ] Large (>128)
- Format strings with user input: [ ] Yes [ ] No
- Command execution with user input: [ ] Yes [ ] No
- Attack Surface
- User input points identified: [ ] Network [ ] File [ ] Command line [ ] Environment
- Authentication functions found: __________
- Encryption functions found: __________
- Exploit Development Notes
- Crash confirmed at: ____ bytes
- Return address offset: ____ bytes from buffer start
- Available gadgets: [ ] pop rdi; ret [ ] pop rsi; ret [ ] execve available
- Writable memory regions: __________
Why This Systematic Approach Matters
In the real world:
- Real Malware Analysis: Malware is almost always stripped - you must work from addresses
- Exploit Development: You need exact addresses for payloads
- Reverse Engineering: You gradually rebuild symbols as you analyze
- CTF Challenges: Often give stripped binaries to make it harder
Remember: Every piece of information you gather tells a story:
- Small buffer + gets() = Classic buffer overflow
- system() with user input = Command injection
- Partial RELRO + GOT entry = GOT overwrite attack possible
- No PIE + known function address = Return-to-libc attack