The Recipe Book Analogy
Think of your program like a recipe book:
- The source code (hello.c) is like the original recipe written in English
- The object file is like translating that recipe into the precise cooking shorthand your kitchen's chefs understand, with a few steps that still say "see another recipe"
- The executable is like a complete cookbook with all instructions, tools, and ingredients ready to go
Step 1: Let's Write the Simplest Program
Open a text editor and create a file called hello.c:
```c
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}
```
Done! We've written 5 lines of code. But here's what this actually means:
- #include <stdio.h>: We're saying "give me the manual for printing to the screen"
- int main(): This is the starting point of our program
- printf(...): This is a function that prints text
- return 0: We're saying "the program finished successfully"
Now, let's see what happens when we transform this human-readable code into something the computer understands.
Step 2: The Compilation Journey
When we compile this program, it goes through 4 stages:
hello.c (Source) → Preprocessor → Compiler → Assembler → Linker → hello (Executable)

Each step with actual commands:
2.1 The Preprocessor Stage
The preprocessor is like a copy-paste machine:
```
gcc -E hello.c -o hello.i
```
This creates a file called hello.i. Let's look at a small part of it:
```
# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 28 "/usr/include/features.h" 3 4
// ... hundreds of lines later ...
extern int printf (const char *__restrict __format, ...);
// ... more hundreds of lines ...
# 3 "hello.c" 2
int main() {
    printf("Hello, World!\n");
    return 0;
}
```
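The preprocessor found #include <stdio.h> and literally copied the entire stdio.h file into our code. It also copied every header that stdio.h includes in turn, such as features.h. That's why hello.i is hundreds of lines long. It also added line markers (like # 3 "hello.c" 2) that record which file and line each piece came from, so the compiler can still report errors against hello.c.

2.2 The Compiler Stage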
The compiler translates our C code into assembly language:
```
gcc -S hello.c -o hello.s
```
Let's look at hello.s:
Look closely! Notice something interesting? Our printf("Hello, World!\n") became call puts@PLT! The compiler is smart. We're only printing a fixed string that ends in a newline, so it replaced printf with the simpler puts, which prints a string and adds the newline itself.
2.3 The Assembler Stage
The assembler translates assembly code into machine code (0s and 1s):
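```
gcc -c hello.c -o hello.o
```
Starting from hello.c, gcc -c runs the preprocessor, compiler, and assembler in one go. To run just the assembler on the file from the previous step, use as hello.s -o hello.o. Either way, we now have an object file called hello.o. This isn't a complete program yet: it's like a chapter in a book that hasn't been bound into the full book.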
Let's examine this object file:
```
file hello.o
# hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
```
The key word here is "relocatable". This means:
- The code is ready, but the addresses aren't final
- It references external functions (like puts) but doesn't have them yet
- It needs to be linked with other parts
Let's look inside with objdump:
```
objdump -d hello.o
```
Output:
```
hello.o:     file format elf64-x86-64

Disassembly of section .text:
```
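Look at the addresses! main starts at 0000000000000000, and every instruction address is just an offset from the start of the .text section. The object file doesn't know where it will be loaded in memory. The call 10 instruction has zeros too. Its four target bytes are 00 00 00 00 because it doesn't know where the puts function is yet, so objdump just shows the address of the next instruction (0x10).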
Let's also check the symbols:
```
nm hello.o
```
Output:
```
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T main
                 U puts
```
This shows:
- main is defined (T means the text, or code, section)
- puts is undefined (U): we need to find it somewhere else
- The address of main is currently 0 (it will be filled in later)
2.4 The Linker Stage
The linker combines our object file with libraries to create a complete executable:
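```
gcc hello.o -o hello
```
Now we have a complete executable called hello. Let's examine it:
```
file hello
```
Notice the difference: it's now an "executable", not "relocatable". On systems that build position-independent executables by default, file calls it a "pie executable". Older versions of file say "shared object".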
Let's look at the disassembly again:
```
objdump -d hello | grep -A 20 "<main>:"
```
Output:
```
0000000000001139 <main>:
    1139:	55                   	push   %rbp
    113a:	48 89 e5             	mov    %rsp,%rbp
    113d:	48 8d 3d c0 0e 00 00 	lea    0xec0(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
    1144:	e8 e7 fe ff ff       	call   1030 <puts@plt>
    1149:	b8 00 00 00 00       	mov    $0x0,%eax
    114e:	5d                   	pop    %rbp
    114f:	c3                   	ret
```
Notice the differences:
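- main now has a real address: 0000000000001139
- The lea instruction has a real offset: 0xec0(%rip), which points at our string at 0x2004 (objdump's comment shows the resolved address)
- The call instruction has a real target: 1030, which is puts@plt, a small stub the linker generated for calling puts

Many modern Linux distributions build position-independent executables (PIE) by default. In a PIE, these addresses are offsets from wherever the OS loads the program. Rip-relative instructions like this lea keep working at any load address.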
The linker has:
- Assigned final addresses to everything
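- Connected our call to puts with the C library's puts: the call goes to the puts@plt stub, and the dynamic linker fills in the real address of puts in the C library when the program runs
- Added extra code for startup and shutdown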
Let's run it:
```
./hello
# Hello, World!
```
It works! But there's more inside...
Step 3: What's REALLY Inside the Executable?
An executable file is like a layered cake. Let's explore the layers:
```
readelf -S hello
```
This shows all the sections in our executable. Here are the important ones:
```
Section Headers:
  [Nr] Name       Type      Address           Offset    Size              EntSize           Flags  Link  Info  Align
  [13] .text      PROGBITS  0000000000001060  00001060  0000000000000116  0000000000000000  AX     0     0     16
  [15] .rodata    PROGBITS  0000000000002000  00002000  000000000000000f  0000000000000000  A      0     0     4
  [16] .eh_frame  PROGBITS  0000000000002010  00002010  0000000000000040  0000000000000000  A      0     0     8
  [24] .data      PROGBITS  0000000000004018  00003018  0000000000000010  0000000000000000  WA     0     0     8
```
What each section contains:
3.1 The .text Section
This contains our actual code (the machine instructions). This is where our main function lives.
Let's extract just the .text section:
3.2 The .rodata Section
This contains read-only data: things that shouldn't be modified, like our string "Hello, World!".
Let's look at it:
Output:
```
Contents of section .rodata:
 2000 01000200 48656c6c 6f2c2057 6f726c64  ....Hello, World
 2010 2100                                 !.
```
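See that? 48656c6c6f2c20576f726c642100 is "Hello, World!" in hexadecimal! It starts at address 0x2004, the same address the lea instruction in main pointed to. The four bytes before it, 01 00 02 00, belong to _IO_stdin_used.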
48 = H, 65 = e, 6c = l, 6c = l, 6f = o, 2c = ',', 20 = (space), 57 = W, 6f = o, 72 = r, 6c = l, 64 = d, 21 = !, 00 = (null terminator)
3.3 The .data Section
This contains global variables that can be modified. Our simple program doesn't have any, so it's small.
3.4 The .bss Section
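This contains global and static variables that start out as zero. An example is int x; written without an initializer (or explicitly set to 0). The file only records how much space they need, and the loader provides zero-filled memory when the program starts.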
Step 4: Let's See the Program in Action at Binary Level
Let's create a simple visualization of what's in memory when our program runs:
```
┌─────────────────────────────┐
│ Text Section                │ ← Our code (main function)
│ 0x0000000000001060          │
├─────────────────────────────┤
│ ROData Section              │ ← "Hello, World!" string
│ 0x0000000000002000          │
├─────────────────────────────┤
│ Data Section                │ ← Global variables
│ 0x0000000000004018          │
├─────────────────────────────┤
│ Heap                        │ ← Dynamic memory
│ (Grows upward)              │
├─────────────────────────────┤
│ Stack                       │ ← Local variables
│ (Grows downward)            │
└─────────────────────────────┘
```
When we run ./hello, here's what happens:
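- The OS loads the executable into memory, along with the dynamic linker, which loads the C library (hello is dynamically linked)
- It jumps to the entry point, which is not main but startup code (_start) from the C runtime
- The startup code (via __libc_start_main in glibc) calls our main function
- main calls puts with the address of our string
- puts finds the string at 0x2004 in the .rodata section (an offset from the program's load address)
- It prints "Hello, World!" to the screen
- main returns 0
- The startup code passes that value to exit, which returns it to the OS as the exit code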
Step 5: Let's Look at the Hex Dump
Finally, let's look at the raw bytes of our executable:
Output (truncated):
```
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0300 3e00 0100 0000 0000 0000 0000 0000  ..>.............
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 4000 0000 0000 4000 0d00 0c00  ....@.....@.....
00000040: 5548 89e5 488d 3d00 0000 00e8 0000 0000  UH..H.=.........
00000050: b800 0000 005d c300 0000 0000 0000 0000  .....]..........
```
The first few bytes are the ELF header:
- 7f 45 4c 46 = The ELF magic number: 0x7F followed by 'E', 'L', 'F'
- 02 = 64-bit architecture
- 01 = Little-endian (remember our lesson on endianness!)
- 01 = ELF version
- And much more metadata...
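If you look carefully around offset 0x40, you might recognize our main function. 55 is push %rbp, 48 89 e5 is mov %rsp,%rbp, 48 8d 3d starts the lea, and e8 starts the call. The lea offset and the call target are still 00 00 00 00 placeholders. That means these bytes are the unlinked code from the object file, hello.o, where the .text section directly follows the 64-byte ELF header. In the linked hello, offset 0x40 holds the program headers instead.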
What We Learned
- Source code is just the beginning - it gets transformed multiple times
- Object files are incomplete - they have placeholders for external functions
- The linker completes the puzzle - it connects everything together
- Executables have sections - code, data, strings all live in different places
- Even "Hello World" is complex - it needs startup code, libraries, and proper memory layout





