Symbols in Binaries - The "Nametag" Dilemma

Symbols in binary files are essentially human-readable names (like variable and function names) that are mapped to machine-level addresses (memory locations or offsets). They are crucial for the software development process, especially for linking and debugging.

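When a program is compiled or assembled, the identifiers that need a fixed location, such as functions (main, printf) and global or static variables (for example, a global named user_data), are converted into symbols. These are stored in a dedicated data structure within the binary file called the Symbol Table. Local variables and parameters do not get symbol table entries; their names survive only in optional debug information.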
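To make the idea concrete, here is a minimal sketch of a symbol table as a plain C array with a lookup by name. The struct layout, the find_symbol helper, and all addresses and sizes are made up for illustration; real formats such as ELF store more fields per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One catalog entry: a name mapped to an address (hypothetical layout). */
typedef struct {
    const char *name;    /* human-readable identifier */
    uint64_t    address; /* where the code or data lives */
    uint64_t    size;    /* how many bytes it occupies */
} Symbol;

/* Made-up entries for illustration; real addresses are chosen by the linker. */
static const Symbol symbol_table[] = {
    { "main",      0x401000, 64 },
    { "printf",    0x401040, 16 },
    { "user_data", 0x404028,  8 },
};
static const size_t symbol_count = sizeof symbol_table / sizeof symbol_table[0];

/* Look a name up in the table; returns NULL when the name is unknown. */
const Symbol *find_symbol(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < symbol_count; i++) {
        if (strcmp(symbol_table[i].name, name) == 0) {
            return &symbol_table[i];
        }
    }
    return NULL;
}
```

Calling find_symbol("user_data") returns the entry whose address is 0x404028, while a name that is not in the table yields NULL. That second case is the situation every lookup is in once a binary has been stripped.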

The Furniture Warehouse Analogy

Imagine you're in a massive, well-organized furniture warehouse like IKEA. Every item has:

  1. A name (like "POÄNG" or "BILLY")
  2. An address (Aisle 15, Bin 42)
  3. The actual furniture in the box

When you shop here, you can:

  • Look up the name in the catalog → find the address → go get it
  • Or just know the address if you're familiar with the store

Now, imagine someone removes all the nametags and labels from the catalog and boxes. Everything still works - the furniture is still there at the same addresses - but you can no longer look things up by name. You have to figure out what each item is by:

  • Its size and shape
  • Where it's located relative to other items
  • Any remaining markings

This is exactly what happens with symbols in binaries.

What Are Symbols?

In programming, a symbol is a human-readable name that represents something in your code:

Symbol Type          What It Represents                      Example
-------------------  --------------------------------------  ----------------------------------------
Function names       Blocks of code that do specific tasks   main(), login_user(), calculate_total()
Variable names       Data storage locations                  username, user_age, is_admin
Class/method names   In object-oriented code                 User.authenticate(), Database.connect()

Here is how those names look when you write code in C or C++:

// Source code with symbols
int calculate_sum(int a, int b) {
    int result = a + b;
    return result;
}

The Compilation Process: From Names to Addresses

When you compile this code:
  1. The compiler translates it to machine code
  2. It creates a symbol table - a directory mapping names to addresses
  3. The binary contains BOTH the code AND this symbol table

Let me draw what happens:

Not Stripped Binary (Like our labeled warehouse):

Symbol Table (The Catalog):
-------------------------------
Function "calculate_sum" → Address 0x401520

Debug Info (only present in debug builds):
-------------------------------
Local variable "result" → Stack offset -0x4

Actual Code at 0x401520:
mov eax, [ebp+8]    ; Get first parameter
add eax, [ebp+12]   ; Add second parameter
mov [ebp-4], eax    ; Store in "result"
mov eax, [ebp-4]    ; Load "result" into eax as the return value

Stripped Binary (Labels removed):

Symbol Table: [EMPTY]

Actual Code at 0x401520:
mov eax, [ebp+8]
add eax, [ebp+12]
mov [ebp-4], eax
mov eax, [ebp-4]

The code is identical. The functionality is identical. But in the stripped version, nothing tells you that this function is called calculate_sum or that the stack slot at ebp-4 was named result.
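The difference shows up the moment a tool tries to label an address. The sketch below is a hypothetical helper (label_for_address is not from any real tool) that prints the real name when a symbol covers the address and otherwise falls back to a placeholder in the sub_<address> style that some disassemblers use. The 0x20 size in the usage note is an assumption for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char *name;
    uint32_t    address;
    uint32_t    size;
} Symbol;

/* Write a label for `addr` into `out`. With a symbol table we can print
 * the real name (plus an offset into the function); with an empty table
 * the best we can do is invent a placeholder from the address itself. */
void label_for_address(const Symbol *table, size_t count,
                       uint32_t addr, char *out, size_t out_len) {
    assert(out != NULL && out_len > 0);
    for (size_t i = 0; i < count; i++) {
        if (addr >= table[i].address &&
            addr - table[i].address < table[i].size) {
            uint32_t offset = addr - table[i].address;
            if (offset == 0) {
                snprintf(out, out_len, "%s", table[i].name);
            } else {
                snprintf(out, out_len, "%s+0x%x", table[i].name,
                         (unsigned)offset);
            }
            return;
        }
    }
    /* No symbol covers this address: the stripped case. */
    snprintf(out, out_len, "sub_%x", (unsigned)addr);
}
```

With a one-entry table { "calculate_sum", 0x401520, 0x20 }, the address 0x401520 is labeled calculate_sum. With an empty table, the same address becomes sub_401520, which is all an analyst gets to start from.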

Why Strip Binaries? The Practical Reasons

Companies strip binaries for several reasons:

  1. Intellectual Property Protection
    • Without function names, reverse engineering is much harder. Nothing in the binary tells you which routine was encrypt_database() and which was check_license().
  2. Smaller File Size
    • Symbol tables can be large, especially in debug builds.
  3. Security Through Obscurity
    • Makes it slightly harder for attackers to find specific functions to exploit.

Real-World Example: The Exploit Developer's Challenge

Let's say you've discovered a vulnerability in a web server. You need to find the function that handles user authentication to write your exploit.

With symbols (Not Stripped):

$ readelf -s webserver | grep -i auth
   101: 08048a20   150 FUNC    GLOBAL DEFAULT   14 authenticate_user
   203: 08048b10    89 FUNC    GLOBAL DEFAULT   14 check_auth_token

You immediately know where to look!

Without symbols (Stripped):

$ readelf -s webserver | grep -i auth
[No output]

Now you have to:

  1. Find all functions that take string inputs
  2. Trace where password checking happens
  3. Look for string comparisons or hash functions
  4. Deduce which is the authentication function
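Before starting that manual work, it helps to confirm whether the symbol table is really gone. The following is a rough sketch of the check that readelf -s relies on: walk the ELF section headers and look for a section of type SHT_SYMTAB. It assumes a Linux system that provides <elf.h>, handles only 64-bit ELF files with the same byte order as the host (so the 32-bit webserver from the example above would need the Elf32 variants), and count_symtab_entries is a hypothetical helper name.

```c
#include <elf.h>     /* ELF structure definitions; Linux/glibc assumed */
#include <stdio.h>
#include <string.h>

/* Return the number of entries in .symtab, 0 if no such section exists
 * (the file is stripped), or -1 if the file is not a readable ELF64. */
long count_symtab_entries(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }

    Elf64_Ehdr eh;
    long result = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        result = 0;
        /* Visit each section header and stop at the first symbol table. */
        for (unsigned i = 0; i < eh.e_shnum; i++) {
            Elf64_Shdr sh;
            long off = (long)(eh.e_shoff + (Elf64_Off)i * eh.e_shentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&sh, sizeof sh, 1, f) != 1) {
                result = -1;   /* truncated or corrupt section table */
                break;
            }
            if (sh.sh_type == SHT_SYMTAB && sh.sh_entsize != 0) {
                result = (long)(sh.sh_size / sh.sh_entsize);
                break;
            }
        }
    }
    fclose(f);
    return result;
}
```

A result of 0 means you are in the "without symbols" case and the four steps above are the way forward. A positive count means the catalog is still there to search.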

Types of Symbols: More Than Just Names

Let's explore the "levels" of symbol information:

Level 1: Full Debug Symbols (The Complete Blueprint)

Contains everything: function names, variable names, line numbers, data types.

Function: calculate_sum (line 42 of math.c)
Parameters: int a (at ebp+8), int b (at ebp+12)
Local variable: int result (at ebp-4)
Return type: int

Used during development. Release builds usually leave it out or keep it in separate debug files.

Level 2: Export Symbols (Public Interface Only)

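Only the functions and variables a binary exposes to other programs. In ELF files these names live in the dynamic symbol table (.dynsym), which strip leaves in place because the dynamic linker needs it at load time.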

Public functions: main(), process_request()
Private/internal functions: [HIDDEN]

Common in libraries.
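In C, the simplest way to keep a function out of the public interface is the static keyword. The sketch below uses made-up names (apply_discount, checkout_total) to show one internal helper and one exported function. Note that a static function can still appear as a local entry in the full symbol table until the binary is stripped; it just never becomes part of the exported set.

```c
/* Internal helper: `static` gives it internal linkage, so other object
 * files and programs cannot link against it by name. */
static int apply_discount(int cents, int percent) {
    return cents - (cents * percent) / 100;
}

/* Public interface: external linkage, so the name is visible to the
 * linker and can be exported from a library. */
int checkout_total(int cents) {
    return apply_discount(cents, 10);
}
```

If you compile this to an object file and run nm on it, checkout_total should be listed with an uppercase T (global), while apply_discount, if the compiler has not inlined it, shows up with a lowercase t (local).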

Level 3: Stripped (The Black Box)

No names, only addresses.

Address 0x401520: [some function]
Address 0x401580: [another function]

This is what you typically encounter in malware and commercial software.

Hands-On: Seeing the Difference

Start with a simple C program, saved as calc.c:

// calc.c
#include <stdio.h>

int add_numbers(int x, int y) {
    return x + y;
}

int main(void) {
    int a = 5;
    int b = 3;
    int result = add_numbers(a, b);
    printf("Result: %d\n", result);
    return 0;
}

Compile with symbols (not stripped):

gcc -o calc calc.c
file calc

Compile stripped:

gcc -o calc_strip calc.c
strip calc_strip
file calc_strip
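
Look at the symbols:

# Not stripped version: lists add_numbers and main
# (the pattern may also match __libc_start_main)
nm calc | grep -E "add_numbers|main"

# Stripped version: nm reports that there are no symbols
nm calc_strip | grep -E "add_numbers|main"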

How Reverse Engineers Work With Stripped Binaries

Even without symbols, we have techniques:

  1. String References
    • If a function uses the string "Login successful", we can find where that string is referenced.
  2. Pattern Recognition
    • Authentication functions often follow patterns:
      • Take username and password parameters
      • Call hash functions
      • Compare results
      • Return true/false
  3. Cross-Reference Analysis
    • See what other functions call this function, and from where.
  4. Dynamic Analysis
    • Run the program and see what happens when we reach certain code.
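The first technique, string references, starts with finding the strings at all. Here is a minimal sketch of what a strings-style scan does: walk a byte buffer and report every run of printable ASCII characters above a minimum length, together with its offset. The function name scan_strings and the treatment of tab as printable are choices made for this example, not the behavior of any specific tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Print every run of at least `min_len` printable ASCII characters in
 * `data`, with its offset, and return how many runs were found. */
size_t scan_strings(const unsigned char *data, size_t len, size_t min_len) {
    assert(data != NULL || len == 0);
    if (min_len == 0) {
        min_len = 1;               /* an empty run is never a string */
    }
    size_t found = 0;
    size_t run_start = 0;
    size_t run_len = 0;

    /* Iterate one step past the end so a run touching the last byte
     * is flushed like any other. */
    for (size_t i = 0; i <= len; i++) {
        int printable = (i < len) && (isprint(data[i]) || data[i] == '\t');
        if (printable) {
            if (run_len == 0) {
                run_start = i;
            }
            run_len++;
        } else {
            if (run_len >= min_len) {
                printf("0x%zx: %.*s\n", run_start, (int)run_len,
                       (const char *)(data + run_start));
                found++;
            }
            run_len = 0;
        }
    }
    return found;
}
```

Once you know the offset of a string such as "Login successful", the next step is to search the code for instructions that reference that location, which is what disassemblers automate with cross-references.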

The Security Implications

This matters for both attackers and defenders:

For Attackers (Exploit Development):

  • Finding the "vulnerable function" is harder without symbols
  • But not impossible - just requires more analysis
  • Often, they look for known vulnerable patterns rather than specific names

For Defenders (Malware Analysis):

  • Malware is almost always stripped
  • You need to analyze behavior, not rely on names
  • Suspicious function patterns stand out (e.g., encryption routines in a "calculator" app)

Your First Reverse Engineering Exercise

Here's what you'll do in our next lab:

  1. With a not-stripped binary:
    • Use objdump -t to see all symbols
    • Find the main() function
    • Trace function calls by name
  2. With a stripped binary:
    • Use strings to find interesting text
    • Look for the entry point (main is still there, but its name is gone)
    • Identify functions by their structure
  3. The key insight: Both binaries do the same thing. One just has the labels removed.
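
For the "identify functions by their structure" step, one common starting heuristic is to scan for the instruction bytes that many unoptimized x86-64 functions begin with: push rbp followed by mov rbp, rsp (bytes 55 48 89 E5). The sketch below assumes that prologue and uses a hypothetical helper name, find_prologues. Optimized builds often omit the frame pointer, and some compilers emit extra instructions first, so treat the hits as candidates rather than a complete list.

```c
#include <stddef.h>
#include <string.h>

/* Scan `code` for the x86-64 prologue `push rbp; mov rbp, rsp`.
 * Stores up to `max_hits` offsets in `hits` and returns the total
 * number of matches found (which may exceed `max_hits`). */
size_t find_prologues(const unsigned char *code, size_t len,
                      size_t *hits, size_t max_hits) {
    static const unsigned char prologue[] = { 0x55, 0x48, 0x89, 0xE5 };
    size_t found = 0;

    for (size_t i = 0; i + sizeof prologue <= len; i++) {
        if (memcmp(code + i, prologue, sizeof prologue) == 0) {
            if (found < max_hits) {
                hits[found] = i;   /* candidate function start */
            }
            found++;
        }
    }
    return found;
}
```

Each offset returned is a place where a function might begin. Combined with the strings you found earlier, this gives you a first rough map of a binary that has no names left.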