Symbols in binary files are essentially human-readable names (like variable and function names) that are mapped to machine-level addresses (memory locations or offsets). They are crucial for the software development process, especially for linking and debugging.
When a program is compiled or assembled, the functions and global variables you name in the source code (like main, printf, or a global variable such as user_data) become symbols stored in a dedicated data structure within the binary file called the symbol table. Local variables do not get symbol table entries; their names survive only in separate debug information.
The Furniture Warehouse Analogy
Imagine you're in a massive, well-organized furniture warehouse like IKEA. Every item has:
- A name (like "POÄNG" or "BILLY")
- An address (Aisle 15, Bin 42)
- The actual furniture in the box
When you shop here, you can:
- Look up the name in the catalog → find the address → go get it
- Or just know the address if you're familiar with the store
Now, imagine someone removes all the nametags and labels from the catalog and boxes. Everything still works - the furniture is still there at the same addresses - but you can no longer look things up by name. You have to figure out what each item is by:
- Its size and shape
- Where it's located relative to other items
- Any remaining markings
This is exactly what happens with symbols in binaries.
What Are Symbols?
| Symbol Type | What It Represents | Example |
|---|---|---|
| Function Names | Blocks of code that do specific tasks | main(), login_user(), calculate_total() |
| Variable Names | Data storage locations | username, user_age, is_admin |
| Class/Method Names | In object-oriented code | User.authenticate(), Database.connect() |
When you write code in C or C++:

    // Source code with symbols
    int calculate_sum(int a, int b) {
        int result = a + b;
        return result;
    }

The Compilation Process: From Names to Addresses
- The compiler translates it to machine code
- It creates a symbol table - a directory mapping names to addresses
- The binary contains BOTH the code AND this symbol table
Let me draw what happens:
Not Stripped Binary (Like our labeled warehouse):

    Symbol Table (The Catalog):
    -------------------------------
    Function "calculate_sum" → Address 0x401520
    Variable "result"        → Stack offset -0x4 (debug info only)

    Actual Code at 0x401520:
    mov eax, [ebp+8]     ; Get first parameter
    add eax, [ebp+12]    ; Add second parameter
    mov [ebp-4], eax     ; Store in "result"
    mov eax, [ebp-4]     ; Return the value

Stripped Binary (Labels removed):

    Symbol Table: [EMPTY]

    Actual Code at 0x401520:
    mov eax, [ebp+8]
    add eax, [ebp+12]
    mov [ebp-4], eax
    mov eax, [ebp-4]

The code is identical. The functionality is identical. But in the stripped version, you have no idea that this function is called calculate_sum or that the variable is called result.
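To make the comparison concrete, here is a minimal sketch in C of a symbol table as a plain name-to-address directory. The struct, the function names, and the address given to main are assumptions made up for illustration; a real ELF symbol table stores more per entry (size, type, binding). Passing a count of zero models the stripped binary: the code address is still valid, but neither lookup can answer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One catalog entry: a human-readable name and the address it labels. */
struct symbol {
    const char *name;
    uintptr_t   address;
};

/* Hypothetical table. calculate_sum uses the address from the text;
 * the address for main is invented for this example. */
static const struct symbol demo_symbols[] = {
    { "main",          0x401500 },
    { "calculate_sum", 0x401520 },
};
static const size_t demo_symbol_count =
    sizeof demo_symbols / sizeof demo_symbols[0];

/* Name -> address, the direction a linker or debugger needs.
 * Returns 0 when the name is not in the table. */
uintptr_t symbol_address(const struct symbol *table, size_t count,
                         const char *name) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return table[i].address;
        }
    }
    return 0;
}

/* Address -> name, the direction a disassembler needs to label code.
 * Returns NULL when no entry matches (for example, a stripped binary). */
const char *symbol_name(const struct symbol *table, size_t count,
                        uintptr_t address) {
    for (size_t i = 0; i < count; i++) {
        if (table[i].address == address) {
            return table[i].name;
        }
    }
    return NULL;
}
```

Calling symbol_name(demo_symbols, demo_symbol_count, 0x401520) yields "calculate_sum", while the same call with a count of 0 yields NULL even though the code at 0x401520 has not changed.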
Why Strip Binaries? The Practical Reasons
Companies strip binaries for several reasons:
- Intellectual Property Protection
  - Without function names, reverse engineering is much harder. You can't easily tell which function is encrypt_database() and which is check_license().
- Smaller File Size
  - Symbol tables can be large, especially in debug builds.
- Security Through Obscurity
  - Makes it slightly harder for attackers to find specific functions to exploit.
Real-World Example: The Exploit Developer's Challenge
Let's say you've discovered a vulnerability in a web server. You need to find the function that handles user authentication to write your exploit.
With symbols (Not Stripped):

    $ readelf -s webserver | grep -i auth
       101: 08048a20   150 FUNC    GLOBAL DEFAULT   14 authenticate_user
       203: 08048b10    89 FUNC    GLOBAL DEFAULT   14 check_auth_token

You immediately know where to look!
Without symbols (Stripped):

    $ readelf -s webserver | grep -i auth
    [No output]

Now you have to:
- Find all functions that take string inputs
- Trace where password checking happens
- Look for string comparisons or hash functions
- Deduce which is the authentication function
Types of Symbols: More Than Just Names
Let's explore the "levels" of symbol information:
Level 1: Full Debug Symbols (The Complete Blueprint)
Contains everything: function names, variable names, line numbers, data types.

    Function: calculate_sum (line 42 of math.c)
    Parameters: int a (at ebp+8), int b (at ebp+12)
    Local variable: int result (at ebp-4)
    Return type: int

Used during development and rarely shipped to customers; vendors often keep them in separate debug files instead.
Level 2: Export Symbols (Public Interface Only)
Only functions meant to be used by other programs.

    Public functions: main(), process_request()
    Private/internal functions: [HIDDEN]

Common in libraries.
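Whether a function ends up public or hidden is decided in the source. As a small sketch, assuming a hypothetical request-handling library: in C, a function marked static has internal linkage, so it is never part of the exported interface, while a non-static function is exported from a shared library by default with GCC and Clang on ELF platforms. The helper is_get_request and the status codes are invented for this example; only the name process_request comes from the listing above.

```c
#include <string.h>

/* Internal helper. 'static' gives it internal linkage, so other programs
 * cannot link against it and it never appears among the exported symbols.
 * (Hypothetical name, for illustration only.) */
static int is_get_request(const char *line) {
    return strncmp(line, "GET ", 4) == 0;
}

/* Public interface. Non-static, so it is visible to other programs and
 * keeps its name in the library's exported symbol list.
 * The return values are made-up status codes. */
int process_request(const char *line) {
    if (is_get_request(line)) {
        return 200;
    }
    return 405;
}
```

If this were built into a shared library, a caller could look up process_request by name but would have no way to ask for is_get_request.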
Level 3: Stripped (The Black Box)
No names, only addresses.

    Address 0x401520: [some function]
    Address 0x401580: [another function]

What you typically encounter in malware and commercial software.
Hands-On: Seeing the Difference
A simple C program:

    // calc.c
    #include <stdio.h>

    int add_numbers(int x, int y) {
        return x + y;
    }

    int main(void) {
        int a = 5;
        int b = 3;
        int result = add_numbers(a, b);
        printf("Result: %d\n", result);
        return 0;
    }

Compile with symbols (not stripped):

    gcc -o calc calc.c
    file calc

Compile stripped:

    gcc -o calc_strip calc.c
    strip calc_strip
    file calc_strip

Look at the symbols:

    # Not stripped version:
    nm calc | grep -E "add_numbers|main"

    # Stripped version:
    nm calc_strip | grep -E "add_numbers|main"

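At the file-format level, the difference between calc and calc_strip is that strip removes the .symtab section. The following is a rough sketch of how that could be checked by hand, assuming a 64-bit ELF image already loaded into memory, a Linux system that provides <elf.h>, and a file with the same byte order as the host. The function name elf64_has_symtab is made up for this example.

```c
#include <elf.h>      /* ELF structure definitions (Linux/glibc) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns 1 if the image has a section of type SHT_SYMTAB,
 * 0 if it has none (stripped), and -1 if the image is not a
 * usable 64-bit ELF file.
 */
int elf64_has_symtab(const unsigned char *image, size_t len) {
    Elf64_Ehdr eh;

    if (len < sizeof eh) {
        return -1;
    }
    memcpy(&eh, image, sizeof eh);

    /* Check the "\177ELF" magic and that this is the 64-bit class. */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) {
        return -1;
    }
    if (eh.e_ident[EI_CLASS] != ELFCLASS64) {
        return -1;
    }
    if (eh.e_shentsize != sizeof(Elf64_Shdr)) {
        return -1;
    }

    /* Walk the section header table looking for a symbol table. */
    for (size_t i = 0; i < eh.e_shnum; i++) {
        uint64_t off = eh.e_shoff + (uint64_t)i * sizeof(Elf64_Shdr);
        Elf64_Shdr sh;

        if (off > len || len - off < sizeof sh) {
            return -1;  /* header table runs past the end of the image */
        }
        memcpy(&sh, image + off, sizeof sh);
        if (sh.sh_type == SHT_SYMTAB) {
            return 1;
        }
    }
    return 0;
}
```

Reading calc and calc_strip into buffers and passing each to this function should give 1 and 0 respectively on a 64-bit Linux system. A dynamically linked binary still has a separate dynamic symbol table after stripping, which this sketch deliberately ignores.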
How Reverse Engineers Work With Stripped Binaries
Even without symbols, we have techniques:
- String References
  - If a function uses the string "Login successful", we can find where that string is referenced.
- Pattern Recognition
  - Authentication functions often follow patterns:
    - Take username and password parameters
    - Call hash functions
    - Compare results
    - Return true/false
- Cross-Reference Analysis
  - See what other functions call this function, and from where.
- Dynamic Analysis
  - Run the program and see what happens when we reach certain code.
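The string-reference technique starts with locating the strings themselves. Below is a minimal sketch of that first step, in the spirit of the strings utility but not a reimplementation of it: it scans a raw byte buffer for runs of printable ASCII characters. The function name and parameters are assumptions made for this example.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Finds the first run of at least min_len printable ASCII characters
 * at or after offset start. On success, stores the run length in
 * *run_len and returns the run's offset; returns -1 if there is none.
 */
long next_printable_run(const unsigned char *buf, size_t len, size_t start,
                        size_t min_len, size_t *run_len) {
    size_t i = start;

    while (i < len) {
        size_t j = i;

        /* Extend the run while the bytes stay printable. */
        while (j < len && isprint(buf[j])) {
            j++;
        }
        if (j > i && j - i >= min_len) {
            *run_len = j - i;
            return (long)i;
        }
        /* Skip past a too-short run, or past one non-printable byte. */
        i = (j == i) ? i + 1 : j;
    }
    return -1;
}
```

Once the offset of a string such as "Login successful" is known, the cross-references shown by a disassembler lead to the code that loads that address, which is usually close to the logic you are hunting for.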
The Security Implications
This matters for both attackers and defenders:
For Attackers (Exploit Development):
- Finding the "vulnerable function" is harder without symbols
- But not impossible - just requires more analysis
- Often, they look for known vulnerable patterns rather than specific names
For Defenders (Malware Analysis):
- Malware is almost always stripped
- You need to analyze behavior, not rely on names
- Suspicious function patterns stand out (e.g., encryption routines in a "calculator" app)
Your First Reverse Engineering Exercise
Here's what you'll do in our next lab:
- With a not-stripped binary:
  - Use objdump -t to see all symbols
  - Find the main() function
  - Trace function calls by name
- With a stripped binary:
  - Use strings to find interesting text
  - Look for the entry point (main is still there, but no longer labeled)
  - Identify functions by their structure
- The key insight: Both binaries do the same thing. One just has the labels removed.
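As a preview of "identify functions by their structure", here is a small sketch that guesses where functions start by searching for the classic 32-bit x86 prologue, push ebp followed by mov ebp, esp (bytes 55 89 E5). It assumes the code was compiled with frame pointers, as the ebp-based listings earlier suggest; optimized builds often omit this prologue, and the same bytes can occur by chance inside data, so the results are candidates, not certainties. The function name find_prologues is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* push ebp (0x55); mov ebp, esp (0x89 0xE5) */
static const unsigned char PROLOGUE[] = { 0x55, 0x89, 0xE5 };

/*
 * Scans code[0..len) for the prologue pattern. Stores up to max_hits
 * candidate function offsets in hits and returns how many were stored.
 */
size_t find_prologues(const unsigned char *code, size_t len,
                      size_t *hits, size_t max_hits) {
    size_t found = 0;

    for (size_t i = 0; i + sizeof PROLOGUE <= len && found < max_hits; i++) {
        if (memcmp(code + i, PROLOGUE, sizeof PROLOGUE) == 0) {
            hits[found++] = i;
        }
    }
    return found;
}
```

Adding each returned offset to the load address of the code section gives a list of likely function addresses, which you can then name yourself as you work out what each one does.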



