Understanding Memory: Stack, Heap, and Everything Between
Note: This is a test, not an actual blog
Every program you run is an illusion. The contiguous, flat address space your code operates in doesn’t exist in hardware — it’s a fiction the operating system maintains on your behalf. Understanding how memory actually works transforms the way you write code.
The Virtual Address Space
When a process starts, the OS gives it a private virtual address space — typically 128 TB on a 64-bit Linux system. None of this is real memory yet. It’s a range of possible addresses. Physical RAM is mapped in only when the process actually touches a page.
The address space has a structure:
High addresses ┌─────────────────────────────┐
│ Kernel (inaccessible) │
├─────────────────────────────┤
│ Stack (grows downward ↓) │
├─────────────────────────────┤
│ Memory-mapped files │
├─────────────────────────────┤
│ Heap (grows upward ↑) │
├─────────────────────────────┤
│ BSS (uninitialized globals) │
├─────────────────────────────┤
│ Data (initialized globals) │
├─────────────────────────────┤
Low addresses │ Text (executable code) │
└─────────────────────────────┘
The Stack
The stack is a LIFO (last in, first out) region of memory managed automatically through the CPU's stack pointer register. When you call a function, a stack frame is pushed: space for local variables, the return address, and saved registers. When the function returns, the frame is popped.
This is why stack allocation is essentially free — it’s just a single pointer decrement. And it’s why stack memory is automatically reclaimed — the pointer moves back when the frame exits.
The tradeoff is size. Stack space is limited (typically 8 MB on Linux). Recursive algorithms that go too deep exhaust it — a stack overflow. Large arrays on the stack cause the same problem.
#include <stdlib.h>

void bad_idea(void) {
    int huge_array[10000000]; // ~40 MB on the stack: blows past the 8 MB limit
    // ...
}

void better_idea(void) {
    int *huge_array = malloc(10000000 * sizeof(int)); // ~40 MB on the heap
    if (huge_array == NULL) return; // malloc can fail; check it
    // ...
    free(huge_array);
}
The Heap
The heap is unstructured memory managed explicitly (in C/C++) or by a garbage collector (in Go, Java, JavaScript). malloc carves out a region, free returns it. The allocator maintains its own bookkeeping structures to track what’s in use and what’s available.
Heap allocation is slower than stack allocation for two reasons:
- Bookkeeping overhead — the allocator must find a suitable free region
- Cache locality — heap objects can be scattered across memory, causing cache misses
This is why performance-critical code often avoids heap allocation in hot paths. Allocating a small struct on the stack and returning it by value is often faster than allocating it on the heap and returning a pointer.
Why This Matters for Modern Languages
In Rust, the distinction is explicit. Box<T> is a heap allocation. T without Box is stack-allocated. The borrow checker ensures stack memory is never accessed after the frame exits — at compile time.
In Go, the compiler’s escape analysis decides for you. If a variable is returned from a function or its address is stored somewhere that outlives the function, it escapes to the heap. Otherwise it stays on the stack.
func stackAlloc() int {
    x := 42  // stays on stack
    return x // copied out by value
}

func heapAlloc() *int {
    x := 42   // escapes to heap
    return &x // address is returned; x must outlive the frame
}
In JavaScript and most garbage-collected languages, you rarely think about this directly — but the GC is paying the tax on your behalf. The pauses you sometimes observe are the collector tracing and reclaiming heap objects.
Cache Lines and Spatial Locality
The CPU doesn’t read individual bytes from RAM. It reads cache lines — typically 64 bytes at a time. If your data is laid out sequentially in memory, iterating through it is fast: each cache line load populates the cache with the next several elements.
This is why arrays of structs (AoS) versus structs of arrays (SoA) is a meaningful performance choice:
// AoS: accessing one field of every element skips 12 bytes each step,
// so each 64-byte cache line holds only 4 useful x values
struct Point { float x, y, z, w; };
struct Point points[1000];

// SoA: accessing all x values is sequential and cache-friendly
struct Points { float x[1000], y[1000], z[1000], w[1000]; };
If your algorithm only ever reads x, the SoA layout is dramatically faster because each cache line is full of x values.
These aren’t academic details. They explain why your Redis server is fast, why columnar databases beat row-oriented ones for analytics, why Rust’s zero-cost abstractions are actually zero-cost, and why profilers report “cache misses” as a meaningful bottleneck. Memory shapes everything.