Symbols in Binaries - The "Nametag" Dilemma

Symbols in binary files are essentially human-readable names (like variable and function names) that are mapped to machine-level addresses (memory locations or offsets). They are crucial for the software development process, especially for linking and debugging.

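When a program is compiled or assembled, the identifiers with linkage in the source code (function names like main, global variables like user_data, and references to external functions like printf) are converted into symbols and stored in a dedicated data structure within the binary file called the Symbol Table. Local variables inside functions normally do not get symbol-table entries; their names survive only in debug information.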

The Furniture Warehouse Analogy

Imagine you're in a massive, well-organized furniture warehouse like IKEA. Every item has:

A name (like "POÄNG" or "BILLY")
An address (Aisle 15, Bin 42)
The actual furniture in the box

When you shop here, you can:

Look up the name in the catalog → find the address → go get it
Or just know the address if you're familiar with the store

Now, imagine someone removes all the nametags and labels from the catalog and boxes. Everything still works - the furniture is still there at the same addresses - but you can no longer look things up by name. You have to figure out what each item is by:

Its size and shape
Where it's located relative to other items
Any remaining markings

This is exactly what happens with symbols in binaries.

What Are Symbols?

In programming, a symbol is a human-readable name that represents something in your code:

Symbol Type	What It Represents	Example
Function Names	Blocks of code that do specific tasks	`main()`, `login_user()`, `calculate_total()`
Variable Names	Data storage locations	`username`, `user_age`, `is_admin`
Class/Method Names	Classes and the functions attached to them (object-oriented code)	`User.authenticate()`, `Database.connect()`

When you write code in C or C++:

// Source code with symbols
int calculate_sum(int a, int b) {
    int result = a + b;
    return result;
}

The Compilation Process: From Names to Addresses

When you compile this code:

The compiler translates it to machine code
The compiler and linker build a symbol table: a directory mapping names to addresses
The binary contains BOTH the code AND this symbol table

Let me draw what happens:

Not Stripped Binary (Like our labeled warehouse):

Symbol Table (The Catalog):
-------------------------------
Function "calculate_sum" → Address 0x401520
Debug Info (only when compiled with gcc -g):
Variable "result" → Stack offset -0x4
Actual Code at 0x401520 (prologue and ret omitted):
mov eax, [ebp+8]    ; Get first parameter
add eax, [ebp+12]   ; Add second parameter
mov [ebp-4], eax    ; Store in "result"
mov eax, [ebp-4]    ; Load "result" into eax as the return value

Stripped Binary (Labels removed):

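Symbol Table: [EMPTY] (only dynamic symbols needed at load time, such as imported library functions, remain)
Actual Code at 0x401520 (prologue and ret omitted):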
mov eax, [ebp+8]
add eax, [ebp+12]
mov [ebp-4], eax
mov eax, [ebp-4]

The code is identical. The functionality is identical. But in the stripped version, you have no idea that this function is called calculate_sum or that the variable is called result.
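To make the catalog concrete: on Linux, an unstripped executable keeps its names in an ELF section of type SHT_SYMTAB (usually called .symtab). Each entry is an Elf64_Sym record whose st_name field is an offset into a separate string table and whose st_value field holds the address. The sketch below does, in miniature, what `readelf -s` or `nm` do when they look up a name. It assumes a 64-bit ELF file in the host's byte order and the <elf.h> header shipped with glibc; find_symbol and read_at are hypothetical helpers written for this example, not a library API.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read `size` bytes starting at file offset `off` into a new heap buffer.
 * Returns NULL on any error. */
static void *read_at(FILE *f, Elf64_Off off, Elf64_Xword size)
{
    void *buf = malloc(size ? size : 1);
    if (buf && (fseek(f, (long)off, SEEK_SET) != 0 ||
                fread(buf, 1, size, f) != size)) {
        free(buf);
        buf = NULL;
    }
    return buf;
}

/* Look up `name` in the static symbol table (.symtab) of an ELF64 file.
 * Returns 1 and stores st_value in *value if found, 0 if the table exists
 * but has no such name, and -1 if the file is unreadable, not ELF64, or
 * has no .symtab at all (i.e. it was stripped). */
int find_symbol(const char *path, const char *name, Elf64_Addr *value)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_shnum == 0) {
        fclose(f);
        return -1;
    }

    /* The section header table lists every section, including .symtab. */
    Elf64_Shdr *sh = read_at(f, eh.e_shoff,
                             (Elf64_Xword)eh.e_shnum * sizeof(Elf64_Shdr));
    int result = -1;
    for (int i = 0; sh != NULL && i < eh.e_shnum; i++) {
        /* strip deletes SHT_SYMTAB; SHT_DYNSYM (.dynsym) is left in place. */
        if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh.e_shnum)
            continue;
        const Elf64_Shdr *str_sh = &sh[sh[i].sh_link]; /* linked string table */
        Elf64_Sym *syms = read_at(f, sh[i].sh_offset, sh[i].sh_size);
        char *strtab = read_at(f, str_sh->sh_offset, str_sh->sh_size);
        if (syms != NULL && strtab != NULL) {
            result = 0;
            size_t count = sh[i].sh_size / sizeof(Elf64_Sym);
            for (size_t j = 0; j < count; j++) {
                /* st_name is an offset into the string table, not a pointer. */
                if (syms[j].st_name < str_sh->sh_size &&
                    strcmp(strtab + syms[j].st_name, name) == 0) {
                    *value = syms[j].st_value; /* for PIE: relative to load base */
                    result = 1;
                    break;
                }
            }
        }
        free(syms);
        free(strtab);
        break; /* the ELF spec currently allows only one SHT_SYMTAB section */
    }
    free(sh);
    fclose(f);
    return result;
}
```

On an unstripped build, a call such as find_symbol("calc", "add_numbers", &addr) should return 1; after running strip on the same file it should return -1, because strip deletes the .symtab section. The .dynsym section is not touched, which is why imported names such as printf stay visible even in stripped binaries.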

Why Strip Binaries? The Practical Reasons

Companies strip binaries for several reasons:

Intellectual Property Protection

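Without function names, reverse engineering takes much longer. In a stripped binary, a function like encrypt_database() and one like check_license() are both just unnamed addresses, and you have to work out which is which from the code itself.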

Smaller File Size

Symbol tables, and especially the debug information included in debug builds, can add considerably to the file size.

Security Through Obscurity

Makes it slightly harder for attackers to find specific functions to exploit.

Real-World Example: The Exploit Developer's Challenge

Let's say you've discovered a vulnerability in a web server. You need to find the function that handles user authentication to write your exploit.

With symbols (Not Stripped):

$ readelf -s webserver | grep -i auth
   101: 08048a20   150    FUNC    GLOBAL DEFAULT   14 authenticate_user
   203: 08048b10    89    FUNC    GLOBAL DEFAULT   14 check_auth_token

You immediately know where to look!

Without symbols (Stripped):

$ readelf -s webserver | grep -i auth
[No output]

Now you have to:

Find all functions that take string inputs
Trace where password checking happens
Look for string comparisons or hash functions
Deduce which is the authentication function

Types of Symbols: More Than Just Names

Let's explore the "levels" of symbol information:

Level 1: Full Debug Symbols (The Complete Blueprint)

Contains everything: function names, variable names, line numbers, data types.

Function: calculate_sum (line 42 of math.c)
  Parameters: int a (at ebp+8), int b (at ebp+12)
  Local variable: int result (at ebp-4)
  Return type: int

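Used during development. Release builds usually omit it or move it into a separate file (such as a Windows .pdb or a Linux .debug file) that is not shipped to customers.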

Level 2: Export Symbols (Public Interface Only)

Only functions meant to be used by other programs.

Public functions: main(), process_request()
Private/internal functions: [HIDDEN]

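Common in shared libraries (.so and .dll files), which must keep the names of their exported functions so that other programs can link against them.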
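Whether a function ends up in the exported (dynamic) symbol table is decided at build time. A minimal sketch, assuming GCC or Clang on Linux: compiling with -fvisibility=hidden hides every function by default, and the visibility("default") attribute re-exports only the intended public interface. The library name libreq.so, the file name req.c and all three functions are hypothetical.

```c
/* req.c: a hypothetical shared library. Build (GCC or Clang, Linux):
 *   gcc -shared -fPIC -fvisibility=hidden -o libreq.so req.c        */
#include <stddef.h>
#include <string.h>

/* Mark a function as part of the exported interface despite -fvisibility=hidden. */
#define API __attribute__((visibility("default")))

/* Internal linkage: `static` means this name can appear at most as a LOCAL
 * entry in .symtab, never in the exported .dynsym. */
static size_t count_header_lines(const char *req)
{
    size_t lines = 0;
    for (const char *p = req; *p != '\0'; p++)
        if (*p == '\n')
            lines++;
    return lines;
}

/* External linkage, but hidden by -fvisibility=hidden, so other programs
 * cannot link against it. Without that flag it would be exported too. */
int is_get_request(const char *req)
{
    return strncmp(req, "GET ", 4) == 0;
}

/* The public interface: returns the number of newline-terminated lines for
 * GET requests and -1 for anything else (a toy rule for illustration). */
API int process_request(const char *req)
{
    if (!is_get_request(req))
        return -1;
    return (int)count_header_lines(req);
}
```

After building, `nm -D libreq.so` should list process_request as a defined (T) symbol, while is_get_request and count_header_lines stay out of the dynamic symbol table. Running strip on the library does not remove process_request, because stripping leaves the exports that other programs link against in place.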

Level 3: Stripped (The Black Box)

No names, only addresses.

Address 0x401520: [some function]
Address 0x401580: [another function]

What you typically encounter in malware and commercial software.

Hands-On: Seeing the Difference

A simple C program:

// calculator.c
#include <stdio.h>
int add_numbers(int x, int y) {
    return x + y;
}
int main(void) {
    int a = 5;
    int b = 3;
    int result = add_numbers(a, b);
    printf("Result: %d\n", result);
    return 0;
}

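Compile with symbols (not stripped):

gcc -o calc calculator.c
file calc          # the description includes "not stripped"

Compile stripped:

gcc -o calc_strip calculator.c
strip calc_strip
file calc_strip    # the description now includes "stripped"

Look at the symbols:

# Not stripped version: add_numbers and main are listed with type T (code/text section)
nm calc | grep -E "add_numbers|main"

# Stripped version: nm prints "nm: calc_strip: no symbols" and grep finds nothing
nm calc_strip | grep -E "add_numbers|main"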

How Reverse Engineers Work With Stripped Binaries

Even without symbols, we have techniques:

String References

If a function uses the string "Login successful", we can find where that string is referenced.
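Finding the string itself is the first half of this technique. A minimal sketch of that half, roughly what the `strings` tool does: report every run of at least min_len printable ASCII characters in a byte buffer, together with its offset. find_strings and string_cb are hypothetical names; GNU strings also treats tab as printable, which this version does not.

```c
#include <stddef.h>

/* Called once per string found: offset into the buffer, pointer, length. */
typedef void (*string_cb)(size_t offset, const unsigned char *s, size_t len,
                          void *ctx);

/* Scan buf for runs of printable ASCII (0x20..0x7E) of at least min_len
 * bytes. Calls cb (if non-NULL) for each run; returns the number found. */
size_t find_strings(const unsigned char *buf, size_t size, size_t min_len,
                    string_cb cb, void *ctx)
{
    size_t found = 0, start = 0, run = 0;
    if (min_len == 0)
        min_len = 1; /* an empty run is not a string */
    /* Loop one past the end so a string touching the end is still reported. */
    for (size_t i = 0; i <= size; i++) {
        int printable = i < size && buf[i] >= 0x20 && buf[i] <= 0x7E;
        if (printable) {
            if (run == 0)
                start = i;
            run++;
        } else {
            if (run >= min_len) {
                if (cb != NULL)
                    cb(start, buf + start, run, ctx);
                found++;
            }
            run = 0;
        }
    }
    return found;
}
```

The offsets are the starting point: once you know where "Login successful" sits in the file, a disassembler can show which instructions load its address, and those instructions belong to the code you are looking for.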

Pattern Recognition

Authentication functions often follow patterns:

Take username and password parameters
Call hash functions
Compare results
Return true/false
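It helps to know what this shape looks like in source before hunting for it in a disassembly. The routine below is a hypothetical example (check_login, fnv1a and struct account are invented for illustration) that uses the non-cryptographic FNV-1a hash purely to keep the sketch short; real systems should use a salted password-hashing function instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, 32-bit: a fast, NON-cryptographic hash, used here only as a stand-in. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;      /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;            /* FNV prime (0x01000193) */
    }
    return h;
}

struct account {
    const char *user;
    uint32_t pw_hash;
};

/* The pattern: take username and password, hash, compare, return true/false. */
bool check_login(const struct account *db, size_t n,
                 const char *user, const char *password)
{
    uint32_t h = fnv1a(password);                 /* 1. hash the input */
    for (size_t i = 0; i < n; i++)
        if (strcmp(db[i].user, user) == 0)       /* 2. find the user */
            return db[i].pw_hash == h;            /* 3. compare hashes */
    return false;                                 /* unknown user */
}
```

Once compiled and stripped, check_login loses its name, but its skeleton usually survives: string arguments, a hashing loop with distinctive constants (a multiply by 0x01000193 is easy to search for), and a call to strcmp, whose name stays visible in a dynamically linked binary because it is imported from libc.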

Cross-Reference Analysis

See what other functions call this function, and from where.
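Disassemblers build these cross-references by decoding instructions properly. A deliberately naive sketch of the idea for a single instruction form, the x86 near `CALL rel32` (opcode E8 followed by a signed 32-bit little-endian displacement), assuming `code` holds the raw bytes of a code section and all offsets are measured from its start; find_calls_to is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Record the offsets of every E8 rel32 call whose destination is `target`.
 * Stores up to max_hits offsets in hits and returns the total count.
 * A linear byte scan can report false positives (an 0xE8 byte inside some
 * other instruction); real disassemblers avoid this by decoding in order. */
size_t find_calls_to(const uint8_t *code, size_t size, size_t target,
                     size_t *hits, size_t max_hits)
{
    size_t n = 0;
    for (size_t i = 0; i + 5 <= size; i++) {
        if (code[i] != 0xE8)
            continue;
        /* Assemble the little-endian displacement byte by byte. */
        uint32_t raw = (uint32_t)code[i + 1] | (uint32_t)code[i + 2] << 8 |
                       (uint32_t)code[i + 3] << 16 | (uint32_t)code[i + 4] << 24;
        /* Sign-extend portably (no implementation-defined cast). */
        int64_t rel = raw >= 0x80000000u ? (int64_t)raw - 0x100000000LL
                                         : (int64_t)raw;
        /* The displacement is relative to the next instruction, at i + 5. */
        int64_t dest = (int64_t)i + 5 + rel;
        if (dest == (int64_t)target) {
            if (n < max_hits)
                hits[n] = i;
            n++;
        }
    }
    return n;
}
```

Counting the callers of each candidate function this way is a quick first filter: a routine called from the login handler and from the session-refresh code is a better authentication candidate than one called from everywhere.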

Dynamic Analysis

Run the program (for example, under a debugger such as gdb) and watch what it does when execution reaches the code you are interested in.

The Security Implications

This matters for both attackers and defenders:

For Attackers (Exploit Development):

Finding the "vulnerable function" is harder without symbols
But not impossible - just requires more analysis
Often, they look for known vulnerable patterns rather than specific names

For Defenders (Malware Analysis):

Malware is almost always stripped
You need to analyze behavior, not rely on names
Suspicious function patterns stand out (e.g., encryption routines in a "calculator" app)

Your First Reverse Engineering Exercise

Here's what you'll do in our next lab:

With a not-stripped binary:

Use objdump -t to see all symbols
Find the main() function
Trace function calls by name

With a stripped binary:

Use strings to find interesting text
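Find the entry point with readelf -h ("Entry point address"). It leads to startup code, not to main; in a stripped binary main has no name, but on Linux with glibc its address is typically passed as the first argument to __libc_start_main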
Identify functions by their structure
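Identifying functions by structure often starts with their prologues. A naive sketch, assuming unoptimized (-O0) x86-64 code from GCC or Clang, where most functions begin with `push rbp; mov rbp, rsp` (bytes 55 48 89 E5), optionally preceded by `endbr64` (F3 0F 1E FA) on toolchains with Intel CET enabled. find_prologues is a hypothetical helper; optimized builds usually omit the frame pointer, so real tools combine this with call targets and other heuristics.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Report likely function starts: offsets of `push rbp; mov rbp, rsp`,
 * moved back over a directly preceding endbr64 when there is one.
 * Stores up to max_starts offsets and returns the total count. */
size_t find_prologues(const uint8_t *code, size_t size,
                      size_t *starts, size_t max_starts)
{
    static const uint8_t prologue[] = {0x55, 0x48, 0x89, 0xE5};
    static const uint8_t endbr64[] = {0xF3, 0x0F, 0x1E, 0xFA};
    size_t n = 0;
    for (size_t i = 0; i + sizeof prologue <= size; i++) {
        if (memcmp(code + i, prologue, sizeof prologue) != 0)
            continue;
        size_t start = i;
        if (i >= sizeof endbr64 &&
            memcmp(code + i - sizeof endbr64, endbr64, sizeof endbr64) == 0)
            start = i - sizeof endbr64; /* the function really starts here */
        if (n < max_starts)
            starts[n] = start;
        n++;
    }
    return n;
}
```

Applied to the .text section of an unoptimized stripped build, each reported offset is a candidate function; the one whose address the startup code hands to __libc_start_main is main.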

The key insight: Both binaries do the same thing. One just has the labels removed.