The Inner World of "Hello World"

The Recipe Book Analogy

Think of your program like a recipe book:

  1. The source code (hello.c) is like the original recipe written in English
  2. The object file is like translating that recipe into the shorthand your own kitchen's cooks understand, with a few steps that still say "see another book"
  3. The executable is like the finished cookbook, where every "see another book" note has been resolved and the instructions are ready to go

Step 1: Let's Write the Simplest Program

Open a text editor and create a file called hello.c:

#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}

Done! We've written 5 lines of code. But here's what this actually means:

  • #include <stdio.h>: We're saying "give me the declarations for the standard input/output functions, such as printf"
  • int main(): This is the starting point of our program
  • printf(...): This is a function that prints text
  • return 0: We're saying "the program finished successfully"

Now, let's see what happens when we transform this human-readable code into something the computer understands.


Step 2: The Compilation Journey

When we compile this program, it goes through 4 stages:

hello.c (Source) → Preprocessor → Compiler → Assembler → Linker → hello (Executable)

Here is each step, with the actual commands:

2.1 The Preprocessor Stage

The preprocessor is like a copy-paste machine:

gcc -E hello.c -o hello.i

This creates a file called hello.i. Let's look at a small part of it:

# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 28 "/usr/include/features.h" 3 4
// ... hundreds of lines later ...
extern int printf (const char *__restrict __format, ...);
// ... more hundreds of lines ...
# 3 "hello.c" 2
int main() {
    printf("Hello, World!\n");
    return 0;
}

What happened?
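The preprocessor took #include <stdio.h> and literally copied the entire stdio.h file into our code, along with every header that stdio.h includes in turn (such as features.h). That's why hello.i is hundreds of lines long. It also added line markers like # 3 "hello.c" 2, which record the file and line each chunk came from, so the compiler can report errors against the original source. The numbers after the file name are flags: 1 means entering a new file, 2 means returning to a file, and 3 means the text comes from a system header.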
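You can watch line information at work from inside C. The standard #line directive is the hand-written counterpart of the # 3 "hello.c" 2 markers above: it tells the compiler which line number and file name to report from that point on. A minimal sketch, where the file name recipe.c and both function names are made up for this illustration:

```c
#include <string.h>

/* Hypothetical helpers: they report what __LINE__ and __FILE__ expand to
   after the #line directive below renumbers the source. The line right
   after the directive becomes line 100 of "recipe.c". */
#line 100 "recipe.c"
int reported_line(void) { return __LINE__; }
int reported_file_is(const char *name) { return strcmp(__FILE__, name) == 0; }
```

Any compile error after the directive would likewise be reported against recipe.c, which is how errors found while compiling hello.i still point back to hello.c.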

2.2 The Compiler Stage

The compiler translates our C code into assembly language:

gcc -S hello.c -o hello.s

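Let's look at the key lines of hello.s (GCC also emits bookkeeping directives such as .cfi_startproc, left out here, and the exact output varies between GCC versions):

    .section .rodata
.LC0:
    .string "Hello, World!"
    .text
    .globl main
    .type main, @function
main:
    pushq %rbp
    movq %rsp, %rbp
    leaq .LC0(%rip), %rdi
    call puts@PLT
    movl $0, %eax
    popq %rbp
    ret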

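Look closely! Notice something interesting? Our printf("Hello, World!\n") became call puts@PLT, and the string lost its \n. GCC recognizes that printing a constant string with no % conversions and a trailing newline is exactly what puts does (puts appends the newline itself), so it swaps in the simpler function, even without optimization flags. The @PLT suffix means the call will go through the Procedure Linkage Table, a set of small stubs used to reach functions in shared libraries.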
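The core of that rule can be approximated in a few lines. This is a simplified model for illustration, not GCC's actual code (GCC's real rules cover more cases, such as turning a single-character printf into putchar); the function name printf_to_puts is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of the printf -> puts rewrite (not GCC's
   real implementation). It accepts only a format with no '%' conversions
   that ends in '\n', because puts always appends its own newline. On
   success it copies the text without that final '\n' into out and
   returns 1; otherwise it returns 0. */
int printf_to_puts(const char *fmt, char *out, size_t out_size) {
    size_t len = strlen(fmt);
    if (len == 0 || fmt[len - 1] != '\n') return 0;  /* puts would add a newline printf doesn't print */
    if (strchr(fmt, '%') != NULL) return 0;          /* real formatting is needed */
    if (out_size < len) return 0;                    /* room for len-1 chars + NUL */
    memcpy(out, fmt, len - 1);
    out[len - 1] = '\0';
    return 1;
}
```

For "Hello, World!\n" it produces "Hello, World!", the same newline-free string the compiler stores in .rodata.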

2.3 The Assembler Stage

The assembler translates assembly code into machine code (0s and 1s):

gcc -c hello.c -o hello.o

Now we have an object file called hello.o. This isn't a complete program yet: it's like a chapter that hasn't been bound into a book.

Let's examine this object file:

file hello.o
# hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

The key word here is "relocatable". This means:

  • The code is ready, but the addresses aren't final
  • It references external functions (like puts) but doesn't have them yet
  • It needs to be linked with other parts

Let's look inside with objdump:

objdump -d hello.o

Output:

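hello.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <main+0xb>
b: e8 00 00 00 00 call 10 <main+0x10>
10: b8 00 00 00 00 mov $0x0,%eax
15: 5d pop %rbp
16: c3 ret

Look at the addresses! main starts at 0000000000000000 and every instruction is numbered from there, because the object file doesn't know where it will be loaded in memory. The lea and call instructions hold zero placeholders too: lea 0x0(%rip) doesn't point at our string yet, and call 10 just targets the next instruction, because the assembler doesn't know where the puts function is. The object file also carries relocation entries telling the linker which of these bytes to patch; objdump -dr hello.o prints them alongside the code.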
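The call 10 is just arithmetic on those placeholder bytes. An x86-64 call rel32 instruction (opcode e8) stores a signed, little-endian 32-bit displacement that is added to the address of the next instruction. A small sketch that decodes it; call_target is a name made up for this example:

```c
#include <stdint.h>

/* Hypothetical helper: decode the target of an x86-64 "call rel32"
   (opcode 0xe8). Bytes 1..4 hold a signed little-endian displacement,
   counted from the END of the 5-byte instruction. */
uint64_t call_target(uint64_t insn_addr, const uint8_t insn[5]) {
    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;
    /* Sign-extend portably: values with the top bit set are negative. */
    int64_t rel = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                      : (int64_t)raw;
    return insn_addr + 5 + (uint64_t)rel;  /* unsigned wraparound handles rel < 0 */
}
```

With the placeholder bytes e8 00 00 00 00 at offset 0xb it returns 0x10, matching call 10. With the linked bytes e8 e7 fe ff ff at 0x1144 (from the final disassembly) it returns 0x1030, the puts@plt stub.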

Let's also check the symbols:

nm hello.o

Output:

U _GLOBAL_OFFSET_TABLE_
0000000000000000 T main
U puts

This shows:

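  • main is defined (T means it lives in the text, or code, section)
  • puts is undefined (U): we need to find it somewhere else
  • _GLOBAL_OFFSET_TABLE_ is undefined as well; the linker creates this table, which position-independent code uses to look up addresses at run time
  • The 0 next to main is its offset within this file's .text section; the linker assigns the final address later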
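The linker's job with those U entries can be pictured as a lookup: every undefined name must match something another input defines. Here is a toy model of that pass over plain string lists. count_unresolved and the sample symbol lists are hypothetical, and a real linker also deals with archives, link order, weak symbols, and shared libraries:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of symbol resolution: count how many undefined
   names (the "U" entries from nm) have no matching definition. The first
   unresolved name, if any, is reported through *first_missing. */
size_t count_unresolved(const char *undefined[], size_t n_undef,
                        const char *defined[], size_t n_def,
                        const char **first_missing) {
    size_t unresolved = 0;
    *first_missing = NULL;
    for (size_t i = 0; i < n_undef; i++) {
        int found = 0;
        for (size_t j = 0; j < n_def && !found; j++)
            found = strcmp(undefined[i], defined[j]) == 0;
        if (!found) {
            if (unresolved == 0) *first_missing = undefined[i];
            unresolved++;
        }
    }
    return unresolved;
}
```

If the defined list includes puts, the count is 0. Leave it out and puts is reported as missing, which is the situation a real linker reports as an "undefined reference" error.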

2.4 The Linker Stage

The linker combines our object file with libraries to create a complete executable:

gcc hello.o -o hello

Now we have a complete executable called hello. Let's examine it:

file hello

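Notice the difference: it's now an executable, not "relocatable". On many modern Linux systems GCC builds position-independent executables (PIE) by default, so file may describe it as a "pie executable" (older versions of file say "shared object"), and it also reports that the program is dynamically linked.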

Let's look at the disassembly again:

objdump -d hello | grep -A 20 "<main>:"

Output:

0000000000001139 <main>:
1139: 55 push %rbp
113a: 48 89 e5 mov %rsp,%rbp
113d: 48 8d 3d c0 0e 00 00 lea 0xec0(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1144: e8 e7 fe ff ff call 1030 <puts@plt>
1149: b8 00 00 00 00 mov $0x0,%eax
114e: 5d pop %rbp
114f: c3 ret

Notice the differences:

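  1. main now has a final address: 0000000000001139 (in a position-independent executable, this is an offset from wherever the loader places the program)
  2. The lea instruction has a real offset: 0xec0(%rip). Since %rip holds the address of the next instruction (0x1144), it refers to 0x1144 + 0xec0 = 0x2004, where our string lives
  3. The call instruction has a real target: 1030 (which is puts@plt)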
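RIP-relative addressing is what lets that lea work no matter where the program is loaded. The sketch below handles only the one encoding used here: REX.W, opcode 8d, and a ModRM byte with mod=00 and r/m=101, followed by a 32-bit displacement. decode_lea_rip is a hypothetical helper written for this example:

```c
#include <stdint.h>

/* Hypothetical helper: decode "lea disp32(%rip), reg" encoded as
   REX.W 8D ModRM disp32 (7 bytes), the form used in main. Returns 1 and
   stores the target address and destination register number (7 = %rdi)
   on success; returns 0 for any other encoding. */
int decode_lea_rip(uint64_t insn_addr, const uint8_t insn[7],
                   uint64_t *target, unsigned *reg) {
    uint8_t rex = insn[0], opcode = insn[1], modrm = insn[2];
    if ((rex & 0xf8) != 0x48 || opcode != 0x8d) return 0;  /* need REX.W + LEA */
    if ((modrm >> 6) != 0 || (modrm & 7) != 5) return 0;   /* mod=00, r/m=101: RIP-relative */
    uint32_t raw = (uint32_t)insn[3] | (uint32_t)insn[4] << 8
                 | (uint32_t)insn[5] << 16 | (uint32_t)insn[6] << 24;
    int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                       : (int64_t)raw;
    *reg = ((modrm >> 3) & 7) | ((rex & 0x04) ? 8u : 0u);  /* REX.R extends the reg field */
    *target = insn_addr + 7 + (uint64_t)disp;              /* relative to the next instruction */
    return 1;
}
```

Feeding it 48 8d 3d c0 0e 00 00 at 0x113d yields 0x2004 and register 7 (%rdi), the register that carries the first argument, our string pointer, into puts.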

The linker has:

  1. Assigned final addresses to everything
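  2. Routed our call to puts through a small stub, puts@plt, which jumps through a table entry that the dynamic linker fills in with the real address of puts from the shared C library, either at load time or on the first call
  3. Added extra startup and shutdown code from the C runtime, including _start, the program's real entry point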
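Filling in the placeholders follows a formula from the x86-64 System V ABI: for a 32-bit PC-relative relocation the linker writes S + A - P, where S is the target's address, A is an addend stored with the relocation, and P is the address of the bytes being patched. Below is a sketch of that single step. apply_pc32 is hypothetical, and the addend of -4 in the usage below is an assumption matching what x86-64 assemblers typically emit for these fields, because the CPU measures from the end of the 4-byte field:

```c
#include <stdint.h>

/* Hypothetical mini-linker step: patch one 32-bit PC-relative field the way
   an R_X86_64_PC32 relocation is resolved (R_X86_64_PLT32 is the same, with
   S being the PLT stub's address): value = S + A - P, stored little-endian.
   code points at a section whose first byte will live at section_addr.
   Returns 0 if the value doesn't fit in a signed 32-bit field. */
int apply_pc32(uint8_t *code, uint64_t section_addr, uint64_t offset,
               uint64_t sym_addr, int64_t addend) {
    int64_t place = (int64_t)(section_addr + offset);     /* P */
    int64_t value = (int64_t)sym_addr + addend - place;   /* S + A - P */
    if (value < INT32_MIN || value > INT32_MAX) return 0;
    uint32_t bits = (uint32_t)value;                      /* two's complement bits */
    for (int i = 0; i < 4; i++)
        code[offset + i] = (uint8_t)(bits >> (8 * i));    /* little-endian */
    return 1;
}
```

Starting from main's unlinked bytes in hello.o and placing main at 0x1139, patching offset 0x7 against the string at 0x2004 produces c0 0e 00 00, and patching offset 0xc against puts@plt at 0x1030 produces e7 fe ff ff: exactly the bytes in the linked disassembly.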

Let's run it:

./hello
# Hello, World!

It works! But there's more inside...


Step 3: What's REALLY Inside the Executable?

An executable file is like a layered cake. Let's explore the layers:

readelf -S hello

This shows all the sections in our executable. Here are the important ones:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [13] .text             PROGBITS         0000000000001060  00001060
       0000000000000116  0000000000000000  AX       0     0     16
  [15] .rodata           PROGBITS         0000000000002000  00002000
       000000000000000f  0000000000000000   A       0     0     4
  [16] .eh_frame         PROGBITS         0000000000002010  00002010
       0000000000000040  0000000000000000   A       0     0     8
  [24] .data             PROGBITS         0000000000004018  00003018
       0000000000000010  0000000000000000  WA       0     0     8

What each section contains:

3.1 The .text Section

This contains our actual code (the machine instructions). This is where our main function lives, alongside the startup code the linker added.

Let's extract just the .text section:

objdump -d -j .text hello | head -30

3.2 The .rodata Section

This contains read-only data, things that shouldn't be modified, like our string "Hello, World!".

Let's look at it:

objdump -s -j .rodata hello

Output:

Contents of section .rodata:
2000 01000200 48656c6c 6f2c2057 6f726c64 ....Hello, World
2010 2100 !.

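See that? 48656c6c6f2c20576f726c642100 is "Hello, World!" written as ASCII codes in hexadecimal, followed by a null terminator. There's no 0a (newline) at the end, because puts adds the newline itself. The four bytes before it (01000200) belong to _IO_stdin_used, a symbol from the C library's startup code, which is why our string starts at 0x2004 rather than 0x2000.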

  • 48 = H
  • 65 = e
  • 6c = l
  • 6c = l
  • 6f = o
  • 2c = ,
  • 20 = (space)
  • 57 = W
  • 6f = o
  • 72 = r
  • 6c = l
  • 64 = d
  • 21 = !
  • 00 = (null terminator)
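Turning those hex pairs back into characters takes only a few lines of C. A small sketch, where hex_decode is a hypothetical helper rather than a standard library function:

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c isn't one. */
static int hex_digit(char c) {
    if (isdigit((unsigned char)c)) return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode pairs of hex digits such as "4865" into bytes
   ('H', 'e'). Stops at the end of the string, at a non-hex character, or
   when out is full. Returns the number of bytes written. */
size_t hex_decode(const char *hex, unsigned char *out, size_t out_size) {
    size_t n = 0;
    while (n < out_size && hex[0] != '\0' && hex[1] != '\0') {
        int high = hex_digit(hex[0]), low = hex_digit(hex[1]);
        if (high < 0 || low < 0) break;
        out[n++] = (unsigned char)(high << 4 | low);
        hex += 2;
    }
    return n;
}
```

Decoding the string from the dump gives 14 bytes: the 13 characters of Hello, World! and the final 00, which is the terminator every C string ends with.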

3.3 The .data Section

This contains global variables that have an initial value and can be modified. Our program doesn't define any, so it's tiny: the 16 bytes (0x10) it does hold come from the C library's startup code.

3.4 The .bss Section

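This contains uninitialized global variables (like a file-scope int x; with no value given) and, usually, globals explicitly initialized to 0. The file only records how big .bss is; when the program loads, that memory is filled with zeros.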
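To see the three data sections side by side, you can give a program one variable of each kind. The names below are made up for this sketch, and the section noted for each one is what GCC typically does on Linux:

```c
/* Illustrative globals, one per section (hypothetical names). */
int greeting_count = 1;            /* initialized, writable  -> .data   */
int calls_so_far;                  /* no initializer         -> .bss    */
const char greeting[] = "Hello";   /* constant               -> .rodata */

/* Updates the .bss variable, which starts at 0 because .bss is zero-filled
   when the program is loaded. */
int next_call(void) {
    return ++calls_so_far;
}
```

Compiling it with gcc -c and running nm on the resulting object file should list greeting_count with D, calls_so_far with B, and greeting with R. GCC releases before 10 defaulted to -fcommon and may show calls_so_far as C (a "common" symbol) instead.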


Step 4: Let's See the Program in Action at Binary Level

Let's create a simple visualization of what's in memory when our program runs:

Memory Layout of Our Hello Program (addresses increase going down):
┌─────────────────────────────┐
│ Text Section                │ ← Our code (main function)
│ 0x0000000000001060          │
├─────────────────────────────┤
│ ROData Section              │ ← "Hello, World!" string
│ 0x0000000000002000          │
├─────────────────────────────┤
│ Data Section                │ ← Global variables
│ 0x0000000000004018          │
├─────────────────────────────┤
│ Heap                        │ ← Dynamic memory
│ (grows to higher addresses) │
├─────────────────────────────┤
│ Stack                       │ ← Local variables
│ (grows to lower addresses)  │
└─────────────────────────────┘
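You can sample a real address from each region with a short function. This is a sketch with hypothetical names. With GCC and the GNU linker on Linux, the code, .rodata, and .data addresses typically come out in the order shown. The heap and stack addresses depend on the system and change between runs when address-space layout randomization (ASLR) is enabled:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical globals used only to sample addresses. */
const char layout_message[] = "Hello, World!";  /* read-only   -> .rodata */
int layout_counter = 1;                         /* initialized -> .data   */

struct layout { uintptr_t text, rodata, data, heap, stack; };

/* Record one address from each region of the diagram. */
void sample_layout(struct layout *out) {
    int local = 0;                          /* lives on the stack */
    void *block = malloc(16);               /* small block from the heap */
    out->text = (uintptr_t)&sample_layout;  /* our own code */
    out->rodata = (uintptr_t)layout_message;
    out->data = (uintptr_t)&layout_counter;
    out->heap = (uintptr_t)block;
    out->stack = (uintptr_t)&local;
    free(block);                            /* only the address is kept */
}
```

Printing the fields (for example with printf and the PRIxPTR macro from <inttypes.h>) on a PIE build shows values near the program's load address for the first three and much higher values for the stack.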

When we run ./hello, here's what happens:

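  1. The OS loads the executable into memory, along with the dynamic linker, which then loads the C library
  2. Execution begins at the program's entry point, which is not main but the startup code (_start)
  3. The startup code calls our main function
  4. main calls puts with the address of our string
  5. puts reads the string at 0x2004 in the .rodata section (offset by the program's load address, since this is a position-independent executable)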
  6. It prints "Hello, World!" to the screen
  7. main returns 0
  8. The exit code is returned to the OS
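Steps 3, 7, and 8 fit together because the C standard says returning from main is equivalent to calling exit with the returned value. The sketch below imitates that hand-off on a POSIX system: a child process runs a stand-in for main and exits with its result, and the parent collects the status the way a shell does for ./hello. The function names are hypothetical:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for main: one succeeds, one reports failure. */
int pretend_main_ok(void) { return 0; }
int pretend_main_fail(void) { return 3; }

/* Run fn in a child process and return its exit code, or -1 on error.
   Real startup code passes main's return value to exit(); the child uses
   _exit() here so it doesn't flush stdio buffers copied from the parent. */
int exit_status_of(int (*fn)(void)) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(fn());              /* child: "return from main" */
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

exit_status_of(pretend_main_fail) returns 3, the same number echo $? would print after running a program whose main returns 3.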

Step 5: Let's Look at the Hex Dump

Finally, let's look at the raw bytes of our object file, hello.o:

xxd hello.o | head -30

Output (truncated):

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0300 3e00 0100 0000 0000 0000 0000 0000 ..>.............
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 4000 0000 0000 4000 0d00 0c00 ....@.....@.....
00000040: 5548 89e5 488d 3d00 0000 00e8 0000 0000 UH..H.=.........
00000050: b800 0000 005d c300 0000 0000 0000 0000 .....]..........

The first few bytes are the ELF header:

  • 7f 45 4c 46 = The ELF magic number: 0x7F followed by 'E', 'L', 'F'
  • 02 = 64-bit file format (ELFCLASS64)
  • 01 = Little-endian (remember our lesson on endianness!)
  • 01 = ELF version
  • And much more metadata...
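Those header bytes are easy to check in code. A minimal sketch that decodes the identification bytes plus the machine field at offset 18. read_elf_header and struct elf_info are hypothetical names; on Linux, the system header <elf.h> provides official definitions such as Elf64_Ehdr, ELFCLASS64, and EM_X86_64:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the first 20 bytes of an ELF file. */
struct elf_info {
    int is_elf;         /* magic 7f 'E' 'L' 'F' present */
    int bits;           /* 32 or 64, from EI_CLASS (byte 4) */
    int little_endian;  /* EI_DATA (byte 5): 1 = little, 2 = big */
    int version;        /* EI_VERSION (byte 6) */
    uint16_t machine;   /* e_machine (bytes 18-19); 0x3e = x86-64 */
};

struct elf_info read_elf_header(const uint8_t *buf, size_t len) {
    struct elf_info info = {0, 0, 0, 0, 0};
    if (len < 20 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return info;
    info.is_elf = 1;
    info.bits = buf[4] == 2 ? 64 : buf[4] == 1 ? 32 : 0;
    info.little_endian = buf[5] == 1;
    info.version = buf[6];
    /* Multi-byte fields use the byte order announced in byte 5. */
    info.machine = info.little_endian ? (uint16_t)(buf[18] | buf[19] << 8)
                                      : (uint16_t)(buf[18] << 8 | buf[19]);
    return info;
}
```

Run on the first 20 bytes of the dump, it reports a 64-bit, little-endian, version 1 file for machine 0x3e (x86-64).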

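If you look carefully around offset 0x40, you might recognize some of our code: 55 48 89 e5 is push %rbp followed by mov %rsp,%rbp, and the runs of 00 bytes after 48 8d 3d and after e8 are the placeholders the linker will fill in.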


What We Learned

  1. Source code is just the beginning - it gets transformed multiple times
  2. Object files are incomplete - they have placeholders for external functions
  3. The linker completes the puzzle - it connects everything together
  4. Executables have sections - code, data, strings all live in different places
  5. Even "Hello World" is complex - it needs startup code, libraries, and proper memory layout