How a Binary Loads and Executes

Every operating system uses slightly different mechanisms, but the overall flow is the same:

  1. You launch the program
  2. The OS loader reads the binary file format (EXE/PE on Windows, ELF on Linux)
  3. The loader maps code/data into memory
  4. The loader sets up the process environment
  5. The CPU's instruction pointer (IP/RIP) jumps to the entry point
  6. Execution begins

Let’s break it down carefully.

1. You launch the binary

When you:

  • type ./program on Linux, or
  • double-click program.exe on Windows, or
  • call the CreateProcess API from another program

…the OS takes over.

The OS:

  • Creates a new process control block (PCB)
  • Gives the process a unique process ID (PID)
  • Starts preparing virtual memory for the program

This is the “birth” of a process.
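As a small illustration of process creation, the sketch below launches a child process and reads the PID the OS assigned to it. It uses Python's standard subprocess module (which is built on CreateProcess on Windows and on fork/exec-style calls on POSIX systems); the helper name spawn_child is made up for this example.

```python
import os
import subprocess
import sys


def spawn_child():
    """Start a trivial child process and return (parent_pid, child_pid)."""
    # sys.executable is the running Python interpreter; using it avoids
    # assuming that any particular program exists on the machine.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child_pid = child.pid   # PID the OS assigned to the new process
    child.wait()            # wait for the child so it is cleaned up properly
    return os.getpid(), child_pid


parent_pid, child_pid = spawn_child()
print(f"parent PID {parent_pid} launched child PID {child_pid}")
```

The two PIDs differ because the child is a separate process with its own control block and address space.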

2. The OS loader identifies the binary format

Different OSes use different binary formats:

Windows → PE / PE32+ (Portable Executable)

Linux → ELF (Executable and Linkable Format)

macOS → Mach-O
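A loader can tell these formats apart from the first few bytes of the file (the "magic number"). The sketch below is a simplified heuristic, not what any real loader does in full: detect_format is a hypothetical helper, the "MZ" check only confirms the DOS stub that PE files begin with (the actual PE signature sits at the offset stored at 0x3C), and fat/universal Mach-O files are deliberately left out.

```python
# Thin (single-architecture) Mach-O magics, 32- and 64-bit, both byte orders.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",   # 32-bit
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",   # 64-bit
}


def detect_format(header: bytes) -> str:
    """Guess the executable format from the first bytes of a file."""
    if header.startswith(b"\x7fELF"):
        return "ELF"
    if header.startswith(b"MZ"):
        # Only the DOS stub is checked here; a full check would follow
        # the offset at 0x3C and look for b"PE\0\0".
        return "PE"
    if header[:4] in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"


print(detect_format(b"\x7fELF\x02\x01\x01"))   # ELF
print(detect_format(b"MZ\x90\x00"))            # PE
```

To try it on a real file, read the first four bytes in binary mode and pass them in.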

The loader reads the binary’s header, which includes:

  • CPU architecture (x86, x64, ARM)
  • Entry point address
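  • Section and segment tables (.text, .data, .bss, .rdata, etc.)
  • Dynamic linking info (libraries needed)
  • Memory layout requirements
  • Stack and heap hints (for example, SizeOfStackReserve and SizeOfHeapReserve in PE, or the PT_GNU_STACK permissions in ELF)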

The binary header is like a blueprint.
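To make the "blueprint" idea concrete, here is a minimal sketch that decodes a few fields of a 64-bit little-endian ELF header with Python's struct module. The field order follows the ELF64 header layout; the function names, the tiny machine-name table, and the synthetic sample header are assumptions made for this example, and 32-bit or big-endian files are rejected rather than handled.

```python
import struct

# ELF64 header: e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
# e_shstrndx (64 bytes in total).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def parse_elf64_header(data: bytes) -> dict:
    """Decode selected fields of a 64-bit little-endian ELF header."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("too short for an ELF64 header")
    (ident, _type, e_machine, _version, e_entry, e_phoff, _shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     _shstrndx) = ELF64_HEADER.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:   # EI_CLASS=64-bit, EI_DATA=little-endian
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    return {
        "machine": MACHINES.get(e_machine, f"unknown ({e_machine})"),
        "entry": e_entry,
        "program_header_offset": e_phoff,
        "program_header_count": e_phnum,
        "section_header_count": e_shnum,
    }


def make_sample_header(entry: int = 0x401000) -> bytes:
    """Build a synthetic header so the sketch runs without a real binary."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    return ELF64_HEADER.pack(ident, 2, 62, 1, entry, 64, 0, 0, 64, 56, 1, 64, 0, 0)


info = parse_elf64_header(make_sample_header())
print(info["machine"], hex(info["entry"]))
```

On a Linux machine, passing the first 64 bytes of a real 64-bit executable to parse_elf64_header should work the same way.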

3. Loader maps sections into memory (Virtual Memory Mapping)

Every program has sections like:

Section                 Meaning
.text                   machine code (instructions)
.data                   initialized global variables
.bss                    uninitialized global variables
.rodata                 read-only data (strings, constants)
.reloc / .plt / .got    relocation & linking info

The loader asks the kernel to set up page-table entries, which the MMU (Memory Management Unit) then enforces, so that each region is mapped with the right permissions:

  • .text → read + execute
  • .data → read + write
  • .bss → read + write
  • stack → read + write
  • heap → read + write, expandable

The binary is not "loaded" as one chunk; it is mapped in pages (usually 4 KB per page), and many of those pages are only read from disk when they are first accessed.
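The page-based mapping comes down to simple alignment arithmetic. The sketch below computes which pages a region occupies, assuming a 4 KB page size; pages_for_segment and the sample addresses are invented for illustration.

```python
PAGE_SIZE = 4096  # assumed page size; real systems may use other sizes


def pages_for_segment(vaddr: int, mem_size: int, page_size: int = PAGE_SIZE):
    """Return (first_page_address, page_count) covering [vaddr, vaddr + mem_size)."""
    mask = ~(page_size - 1)
    start = vaddr & mask                 # round the start down to a page boundary
    if mem_size <= 0:
        return start, 0
    end = (vaddr + mem_size + page_size - 1) & mask   # round the end up
    return start, (end - start) // page_size


# A 0x1234-byte region starting on a page boundary needs two pages.
print(pages_for_segment(0x401000, 0x1234))
# A 32-byte region that straddles a page boundary also needs two pages.
print(pages_for_segment(0x402FF0, 0x20))
```

Because permissions are applied per page, this is also why regions with different permissions are placed on separate pages.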

4. Loader resolves dynamic libraries (DLL / Shared libs)

Programs often depend on system libraries such as:

  • Windows: kernel32.dll, user32.dll, ntdll.dll
  • Linux: libc.so, libpthread.so

The loader:

  • Finds required shared libraries
  • Maps them into the process
  • Fixes up the import tables (IAT on Windows, GOT/PLT on Linux)
  • Applies relocations (adjusting address references)

This is why even a "Hello World" program in C still loads several libraries before main() runs.
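The two fix-up steps above can be pictured with a toy model. This is not how a real loader stores its tables: the dictionaries, the addresses, and the function names resolve_imports and apply_relocations are all made up to show the idea of filling an import table and shifting addresses by a load delta.

```python
def resolve_imports(imports, loaded_libraries):
    """Fill a toy import table: map each imported symbol to an address.

    imports: list of (library_name, symbol_name) pairs the program needs.
    loaded_libraries: {library_name: {symbol_name: address}} export tables.
    """
    table = {}
    for library, symbol in imports:
        exports = loaded_libraries.get(library)
        if exports is None:
            raise LookupError(f"library not found: {library}")
        if symbol not in exports:
            raise LookupError(f"{library} does not export {symbol}")
        table[symbol] = exports[symbol]
    return table


def apply_relocations(addresses, preferred_base, actual_base):
    """Shift absolute addresses by the difference between the two bases."""
    delta = actual_base - preferred_base
    return [address + delta for address in addresses]


# Hypothetical export table and load addresses, chosen only for illustration.
libs = {"libc.so": {"puts": 0x7F0000080E50, "exit": 0x7F0000045240}}
print(resolve_imports([("libc.so", "puts")], libs))
print([hex(a) for a in apply_relocations([0x401020], 0x400000, 0x500000)])
```

A missing library or symbol raises an error here, which loosely mirrors a real loader refusing to start the program.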

5. The loader sets up the initial process environment

Before the CPU runs your code, the OS prepares:

1. Stack

The stack is allocated and the stack pointer (ESP/RSP) is initialized.

Stack contains:

  • argc
  • argv[]
  • environment variables
  • auxiliary vectors (Linux)
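The order of these items on a Linux process's initial stack can be sketched as a list, from the lowest address upward. This is a simplified model: on a real stack the argv and envp slots hold pointers to strings rather than the strings themselves, and initial_stack is a hypothetical helper. The AT_* numbers are the usual Linux auxiliary vector type codes.

```python
AT_NULL = 0     # marks the end of the auxiliary vector
AT_PAGESZ = 6   # system page size
AT_ENTRY = 9    # entry point of the program


def initial_stack(argv, env, auxv):
    """Return labeled stack slots in the order they appear at process entry."""
    slots = [("argc", len(argv))]
    slots += [(f"argv[{i}]", arg) for i, arg in enumerate(argv)]
    slots.append(("argv end", None))            # NULL terminator
    slots += [(f"envp[{i}]", f"{key}={value}")
              for i, (key, value) in enumerate(env.items())]
    slots.append(("envp end", None))            # NULL terminator
    slots += [(f"auxv type {kind}", value) for kind, value in auxv]
    slots.append((f"auxv type {AT_NULL}", 0))   # AT_NULL terminator
    return slots


layout = initial_stack(
    ["./program", "--verbose"],
    {"HOME": "/home/user"},
    [(AT_PAGESZ, 4096), (AT_ENTRY, 0x401000)],
)
for label, value in layout:
    print(f"{label:14} {value}")
```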

2. Heap

Sets the base for dynamic memory (malloc, new).

3. Thread info

Creates the main thread and assigns it a TCB (Thread Control Block).

4. CPU State

  • Instruction pointer → entry point
  • Registers defaulted/reset
  • Flags cleared/set as required

6. Loader jumps to the program’s ENTRY POINT

Every binary has an entry address:

  • In ELF → e_entry
  • In Windows PE → AddressOfEntryPoint

This is not main().

In C/C++ binaries:

The entry point is the C runtime's initialization code (exact names vary by toolchain and runtime version):

Windows: mainCRTStartup → __tmainCRTStartup → main()

Linux: _start → __libc_start_main → main()

This startup code sets up:

  • global constructors
  • memory allocators
  • TLS (thread-local storage)
  • exception stack frames

Only after this does your main() begin.
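The ordering described above can be mimicked with a toy model: startup code runs its initializers first, then calls main and passes the return value along as the exit code. Nothing here corresponds to real C runtime internals; crt_startup and the sample functions are invented for the illustration.

```python
def crt_startup(main, argv, constructors=()):
    """Toy stand-in for runtime startup: run initializers, then call main."""
    for constructor in constructors:
        constructor()                 # e.g. global constructors run first
    return main(len(argv), argv)      # main's return value becomes the exit code


events = []


def init_globals():
    events.append("constructors")


def program_main(argc, argv):
    events.append("main")
    return 0


exit_code = crt_startup(program_main, ["./program"], [init_globals])
print(events, exit_code)
```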

7. The CPU begins execution

The loader hands control to the program by:

RIP = entry_point

The CPU now:

  • Fetches the instruction (from the .text section)
  • Decodes it
  • Executes it
  • Moves to the next instruction

This is the classic fetch–decode–execute cycle.
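A toy interpreter makes the cycle easy to follow. The four-instruction machine below (LOAD, ADD, JNZ, HALT) with a single accumulator is invented for this sketch and is far simpler than any real CPU, but the loop has the same shape: fetch at the instruction pointer, advance it, decode the opcode, execute.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs and return the accumulator."""
    acc = 0   # single general-purpose register
    ip = 0    # instruction pointer
    while True:
        opcode, operand = program[ip]   # fetch
        ip += 1                         # point at the next instruction
        if opcode == "LOAD":            # decode + execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JNZ":           # jump if the accumulator is not zero
            if acc != 0:
                ip = operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")


print(run([("LOAD", 2), ("ADD", 3), ("HALT", 0)]))   # prints 5
```

A jump simply overwrites the instruction pointer, which is the same thing the loader does once when it sets it to the entry point.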

The binary is now running like any other process on the CPU.

8. Program ends → cleanup and exit

When your program calls exit() or returns from main():

  • Exit code is stored
  • OS destroys process environment
  • Frees memory mappings
  • Closes file handles
  • Notifies parent process
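The last step, the parent learning the exit code, can be observed from Python. This sketch starts a child interpreter that exits with a chosen code and reads that code back in the parent; run_and_collect is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_collect(code: int) -> int:
    """Run a child process that exits with `code`; return what the parent sees."""
    # The child is another Python interpreter so that the example does not
    # depend on any particular program being installed.
    result = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return result.returncode


print(run_and_collect(7))   # prints 7
```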