🧠 What Actually Happens When You Run a Program?

By Mike

Running a program feels like a single action.

You type a command, press Enter, and something appears on the screen.

./hello

But that one command crosses a surprising number of boundaries: shell parsing, process creation, executable loading, virtual memory setup, dynamic linking, system calls, scheduling, and cleanup.

The goal of this article is not to explain every transistor-level detail. It is to build the mental model I wish I had earlier:

A program is a file sitting on disk. A process is that program brought to life by the operating system.

Think of the executable file as a script for a play. By itself, it is just text on a shelf. A process is an actual performance of that script: it has a stage, props, actors, and a stage manager keeping everything from colliding.

In this metaphor:

  • The program file is the script.
  • The process is one running performance.
  • The kernel is the stage manager.
  • The CPU is the actor reading instructions.
  • A system call is the actor asking the stage manager for something they cannot do alone.

Like all metaphors, this one eventually breaks. But it gets the first important distinction right: the file is passive, the process is alive.


🧪 The Program We Will Follow

Let's use the smallest useful C program:

#include <stdio.h>

int main(int argc, char **argv) {
  printf("hello, process\n");
  return 0;
}

Compile it:

gcc -o hello hello.c

Run it:

./hello

Output:

hello, process

That output is the final visible result. Now let's walk backward through what had to happen for those bytes to reach your terminal.


📦 Program vs Process

Before anything runs, hello is just a file.

You can inspect it:

file hello

On a typical Linux machine, you might see something like:

hello: ELF 64-bit LSB pie executable, x86-64, dynamically linked

That sentence already tells us a lot:

  • ELF is the executable file format Linux understands.
  • x86-64 means the machine code targets a 64-bit Intel/AMD-style CPU.
  • Dynamically linked means the program expects help from shared libraries like libc.

But none of that means the program is running yet.

A program is the recipe. A process is someone actively cooking from that recipe.

Open the same editor twice and you usually have one program file but two processes. Each process gets its own identity, its own memory, its own open files, and its own place in the scheduler's line.

The kernel tracks that running thing with a process data structure. On Linux, this includes things like:

  • A process ID, or PID.
  • A virtual address space.
  • CPU register state.
  • Open file descriptors.
  • Signal handlers.
  • Credentials and permissions.
  • Scheduling state.

The program file is not enough. The operating system has to create a whole environment around it.


🐚 Step 1: The Shell Reads Your Command

When you type this:

./hello

your shell reads the line first. The shell is not magic. It is just another program, like bash, zsh, or fish.

It parses the command and decides what to do.

If you type a shell builtin like cd, the shell handles it itself. There is no separate /usr/bin/cd process in the normal case because changing directories has to affect the shell's own state.

But ./hello is different. The ./ says: run the file named hello in the current directory.

At a high level, the shell does something shaped like this:

pid_t pid = fork();

if (pid == 0) {
  /* child: replace ourselves with the requested program */
  execve("./hello", argv, envp);
  _exit(127);  /* only reached if execve failed */
} else {
  /* parent: the shell waits for the child to finish */
  waitpid(pid, &status, 0);
}

That is not the full source code for a real shell, but it is the core idea.

There are three important calls here:

  • fork() creates a child process.
  • execve() replaces the child process with the program you asked to run.
  • waitpid() lets the shell wait until that child finishes.

This split between fork and exec is one of the cleanest ideas in Unix.

fork() does not run a new program. It clones the current process.

After fork(), there are two processes running the same code:

  • In the child, fork() returns 0.
  • In the parent, fork() returns the child's PID.

That is how the shell knows which side should call execve() and which side should wait.

Modern systems do not eagerly copy all of the shell's memory when this happens. They use copy-on-write, which means the parent and child can share memory pages until one of them tries to modify a page. Then the kernel makes a private copy.

So fork() is conceptually a clone, but not necessarily a giant memory copy.


🔁 Step 2: execve Replaces the Child

The child process now calls:

execve("./hello", argv, envp);

This is the real handoff.

execve() does not create a new process. That part already happened with fork().

Instead, execve() takes the current process and replaces its program image.

The child keeps some important process-level things:

  • Its PID.
  • Its parent process.
  • Its open file descriptors, unless marked close-on-exec.
  • Some credentials and kernel bookkeeping.

But its old memory layout is thrown away and replaced with a new one built from ./hello.

This detail explains why pipes and redirection work so cleanly.

If the shell sets up file descriptor 1 to point into a pipe before calling execve(), the new program inherits that file descriptor. The new program does not need to know whether its output is going to a terminal, a file, or another process.

It just writes to standard output.

That is Unix doing what Unix does best: small pieces connected by simple interfaces.


🧾 Step 3: The Kernel Reads the Executable

Once execve() crosses into the kernel, Linux has to answer some basic questions:

  • Does this file exist?
  • Is it executable?
  • What format is it?
  • What CPU architecture is it for?
  • Where should its code and data live in memory?
  • Does it need a dynamic linker?

For a normal Linux C program, the file is usually an ELF executable.

ELF files contain headers that describe how the program should be loaded. The kernel does not treat the file as a random bag of machine code. It reads the headers and maps the important pieces into the process's virtual address space.

That word matters: maps.

The operating system usually does not read the entire executable into RAM all at once. It creates mappings from parts of the file into memory. When the process actually touches a page, the CPU triggers a page fault, and the kernel brings in the needed page.

So the simple beginner version says:

The OS loads the program into memory.

The more accurate version is:

The kernel maps the executable into the process's address space and lets demand paging pull pieces into RAM as needed.

That second sentence is less cute, but it is much closer to what happens.


🧠 Step 4: Memory Gets Laid Out

Every process gets its own virtual address space: a private view of memory.

It is not just one big blob. It has regions with different jobs.

Conceptually, it looks something like this:

High addresses

+-------------------------------+
| stack                         |
| argc, argv, envp, local vars  |
| grows downward                |
+-------------------------------+
|                               |
| unused / mapped libraries     |
|                               |
+-------------------------------+
| heap                          |
| malloc/free memory            |
| grows upward                  |
+-------------------------------+
| bss                           |
| zero-initialized globals      |
+-------------------------------+
| data                          |
| initialized globals           |
+-------------------------------+
| text                          |
| executable machine code       |
+-------------------------------+

Low addresses

The exact addresses vary because of ASLR, shared libraries, kernel choices, and architecture details. But the shape is useful.

Some regions are read-only. Some are writable. Some are executable. The kernel and CPU cooperate to enforce those permissions.

For our tiny program, the printf call lives in code. The string literal "hello, process\n" lives in read-only data. Local variables like argc and argv are accessed through the stack. If we called malloc, the memory would come from the heap.

You can see a rough section breakdown with:

size hello

Example output:

   text    data     bss     dec     hex filename
   1421     584       8    2013     7dd hello

Those numbers are not the whole process, but they make the executable feel less mystical. It has code, data, and zero-initialized storage before it ever starts running.


🚪 Step 5: The Dynamic Linker Runs Before main

From the programmer's point of view, C starts here:

int main(int argc, char **argv) {
  printf("hello, process\n");
  return 0;
}

But main is not the first instruction.

The executable has an entry point. In a normal C program, that entry point is usually called _start, provided by the C runtime.

For dynamically linked programs, there is another important actor before that: the dynamic linker, often something like /lib64/ld-linux-x86-64.so.2.

The ELF file can contain a PT_INTERP entry that says, in effect:

Before running this program directly, run this interpreter so it can prepare the shared libraries.

The dynamic linker maps shared libraries like libc, resolves symbols, applies relocations, and then transfers control into the program's startup code.

Only after that setup does the C runtime call your main function.

The path is roughly:

kernel execve
  -> dynamic linker
  -> _start
  -> libc startup
  -> main

So when someone says "C starts at main," they are saying something useful for application programming, but not quite true for operating systems.

main is where your code starts participating. It is not where the process starts existing.


🖨 Step 6: printf Becomes a System Call

Now the program is finally running your code.

It reaches:

printf("hello, process\n");

printf is a libc function. It formats text and writes it to standard output.

But your program cannot directly push characters onto the terminal. Hardware access is controlled by the kernel. User programs live in user mode, where they cannot just scribble on devices or rewrite kernel memory.

So eventually, printing reaches a system call shaped like:

write(1, "hello, process\n", 15);

File descriptor 1 is standard output.

By default, standard output points at your terminal. But it might point somewhere else:

./hello > out.txt

Now file descriptor 1 points at a file.

Or:

./hello | wc -c

Now file descriptor 1 points into a pipe connected to another process.

The program does not need separate logic for each case. It writes bytes to file descriptor 1. The kernel handles what that descriptor means.

This is why file descriptors are such a powerful abstraction. They let programs stay simple while the shell wires them together in different ways.


🔍 Watching It With strace

On Linux, strace lets you watch system calls as a program runs:

strace ./hello

The real output is noisy, especially for dynamically linked programs. But you will see a shape like this:

execve("./hello", ["./hello"], ...) = 0
brk(NULL)                         = 0x55f3c6b1d000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = ...
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, ..., PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = ...
write(1, "hello, process\n", 15)  = 15
exit_group(0)                     = ?

Even if you do not understand every line yet, the important parts are visible:

  • execve starts the new program image.
  • mmap maps memory.
  • openat opens files needed by the loader.
  • write sends bytes to standard output.
  • exit_group ends the process.

The interesting thing is what you do not see: there is no line that says "run main." The kernel does not know about your C function. It knows about executable formats, virtual memory, file descriptors, and CPU state.

main is part of the language/runtime contract above that.


⏱ Step 7: The Scheduler Shares the CPU

While your program is running, it probably does not own the CPU continuously.

The kernel scheduler decides which process or thread gets to run next. Your process may run for a little while, get paused, and resume later as if nothing happened.

To make that work, the kernel saves and restores execution context:

  • Instruction pointer.
  • Stack pointer.
  • Registers.
  • Process state.
  • Memory mapping information.

This is why you can have dozens or hundreds of processes on a machine with only a handful of CPU cores.

It is not that every process is literally running at the same instant. It is that the operating system switches between them fast enough, and carefully enough, that each process can pretend it has its own little machine.

That illusion is one of the operating system's main jobs.


🧹 Step 8: The Process Exits

Our program returns from main:

return 0;

That return value becomes the process's exit status.

Returning from main does not instantly erase the process. Control goes back through the C runtime, which can flush buffered output and run cleanup handlers. Then it asks the kernel to end the process, usually through a syscall like exit_group.

The kernel then cleans up most of the process's resources:

  • Memory mappings are released.
  • File descriptors are closed.
  • Kernel bookkeeping is updated.
  • The exit status is saved for the parent.

But one tiny piece remains until the parent collects it: the process becomes a zombie.

A zombie is not a running process. It is just an exit record waiting to be read.

The shell, which has been waiting in waitpid(), collects that status. Then it prints your prompt again.

You can see the last exit code with:

echo $?

For our program, it should be:

0

In Unix tradition, 0 means success.


🧭 The Whole Trip in Order

Here is the full path from command to cleanup:

  1. You type ./hello into the shell.
  2. The shell parses the command.
  3. The shell calls fork().
  4. The child process calls execve("./hello", argv, envp).
  5. The kernel checks permissions and reads the ELF headers.
  6. The kernel builds a new virtual address space for the child.
  7. Executable segments are mapped into memory.
  8. Arguments, environment variables, and auxiliary data are placed on the stack.
  9. The dynamic linker runs if the program is dynamically linked.
  10. Runtime startup code runs before main.
  11. Your main function finally gets called.
  12. printf eventually becomes a write system call.
  13. The scheduler may pause and resume the process many times.
  14. main returns 0.
  15. The runtime calls into the kernel to exit.
  16. The kernel releases resources and stores the exit status.
  17. The shell collects the status with waitpid().
  18. The prompt comes back.

That is a lot of machinery behind one line of terminal input.

But the core model is simple:

The shell creates a process, the child replaces itself with your executable, the kernel maps the program into memory, startup code calls main, your program uses system calls to interact with the outside world, and the parent collects the exit status when it is done.


🧩 Things We Skipped

This article intentionally leaves some rabbit holes for later:

  • How virtual memory and page tables work.
  • What the CPU does during privilege transitions.
  • The exact ELF header layout.
  • Static vs dynamic linking in detail.
  • How interpreters like Python add another layer.
  • How graphical launchers differ from shells.
  • Security features like ASLR, stack canaries, and W^X.

Those are all worth learning. But they make more sense after this lifecycle is clear.

First understand the path. Then zoom into each piece.

The next time you type ./hello, it should feel less like magic and more like a well-orchestrated handoff between simple parts.