Structures in jemalloc

October 4, 20194 min read

Times have changed. In modern systems, spatial locality of memory is more important than minimizing the total number of pages. Data that is allocated together is often used together. It is also optimized to reduce lock contention when multiple threads try to allocate memory.

jemalloc like every other malloc implementation, needs to support two primary API calls:

Allocating memory
- malloc: returns memory chunk
- calloc: returns zeroed memory chunk
free: Freeing allocated memory

Does OS not do this for us? Short answer, No. OS gives us two ways of requesting memory:

mmap: maps device space memory (generally file system) to user space memory.
brk: is used to change the program break address (which is the first location after the end of the uninitialized data segment). Increasing the program break allocates more memory, decreasing deallocates memory.

Structures

jemalloc uses the above two ways to get a memory that it will then manage for the user. To do this it uses various structures. The key structures are:

Arena

This is the highest unit of memory management structure that jemalloc uses. If we imagine a pool of arenas, each thread is given an arena based on the hash of its thread-id or a similar mechanism. This is to prevent lock contention, in an ideal case, if each thread has it's own arena, there would be no contention.

Chunk

Chunks are virtual memory areas that available memory is conceptually divided into. They are typically of the same size --- 2MB. Each chunk is associated with an Arena. Chunks are the highest abstraction used in jemalloc's design, the rest of the structures described below are actually placed within a chunk somewhere.

Regions

A region is jemalloc term for the size of the allocation a user requests. Regions are divided into three classes according to their size, namely:

small/medium: these regions are smaller than the page size (typically 4KB)
large: these regions are between small/medium and huge.
huge: size of these is greater than the chunk size. These are dealt with separately and not managed by arenas, they have a global allocator tree.

Run

In essence, a chunk is broken into several runs. Each run is actually a set of one or more contiguous pages (but a run cannot be smaller than one page). Therefore, they are aligned to multiples of the page size.

The main responsibility of a run is to keep track of the state (i.e. free or used) of regions. Each run holds regions of a specific size and their state is tracked with a bitmask. Note: I simplified the bitmask to an array of booleans for clarity.

Bin

A bin is what keeps track of free regions of a specific size. All the runs in a Bin have the same region size, equal to the binRegionSize. Every Run is part of some Bin. A Bin can manage multiple Runs.

Picture Time

Example Allocation

Let's start with a single call to malloc(2). Assuming that the smallest RegionSize we support is 2 (this is a pretty common smallest value) which corresponds to SM_2.

The Arena would start with no chunks and one Bin for every RegionSize. All the bins would start empty. The bin at index 0 in Arena.bins, would then end up with one Run, with one of its first available Page.region allocated to the caller of malloc(2). The Bin's currentRun would be set to this Run.

If there was a follow up malloc(4) . We would to the same but for the bin at index 1 in Arena.bins. This would also have exactly one Run.

This hopefully gives you some insight into how jemalloc works. As I dug through this, i've noticed that theres a lot more subtleties to memory management thatn I first imagined.