Structures in jemalloc
Times have changed. In modern systems, spatial locality of memory is more important than minimizing the total number of pages. Data that is allocated together is often used together. It is also optimized to reduce lock contention when multiple threads try to allocate memory.
jemalloc like every other malloc implementation, needs to support two primary API calls:
malloc: returns memory chunk
calloc: returns zeroed memory chunk
free: Freeing allocated memory
Does OS not do this for us? Short answer, No. OS gives us two ways of requesting memory:
mmap: maps device space memory (generally file system) to user space memory.
brk: is used to change the program break address (which is the first location after the end of the uninitialized data segment). Increasing the program break allocates more memory, decreasing deallocates memory.
jemalloc uses the above two ways to get a memory that it will then manage for the user. To do this it uses various structures. The key structures are:
This is the highest unit of memory management structure that jemalloc uses. If we imagine a pool of arenas, each thread is given an arena based on the hash of its thread-id or a similar mechanism. This is to prevent lock contention, in an ideal case, if each thread has it's own arena, there would be no contention.
Chunks are virtual memory areas that available memory is conceptually divided into. They are typically of the same size --- 2MB. Each chunk is associated with an Arena. Chunks are the highest abstraction used in jemalloc's design, the rest of the structures described below are actually placed within a chunk somewhere.
A region is jemalloc term for the size of the allocation a user requests. Regions are divided into three classes according to their size, namely:
small/medium: these regions are smaller than the page size (typically 4KB)
large: these regions are between small/medium and huge.
huge: size of these is greater than the chunk size. These are dealt with separately and not managed by arenas, they have a global allocator tree.
In essence, a chunk is broken into several runs. Each run is actually a set of one or more contiguous pages (but a run cannot be smaller than one page). Therefore, they are aligned to multiples of the page size.
The main responsibility of a run is to keep track of the state (i.e. free or used) of regions. Each run holds regions of a specific size and their state is tracked with a bitmask. Note: I simplified the bitmask to an array of booleans for clarity.
A bin is what keeps track of free regions of a specific size. All the runs in a Bin have the same region size, equal to the binRegionSize. Every Run is part of some Bin. A Bin can manage multiple Runs.
Let's start with a single call to
malloc(2). Assuming that the smallest
RegionSize we support is
2 (this is a pretty common smallest value) which corresponds to
The Arena would start with no chunks and one
Bin for every
RegionSize. All the bins would start empty. The bin at index
Arena.bins, would then end up with one
Run, with one of its first available
Page.region allocated to the caller of
malloc(2). The Bin's
currentRun would be set to this
If there was a follow up
malloc(4) . We would to the same but for the bin at index
Arena.bins. This would also have exactly one
This hopefully gives you some insight into how jemalloc works. As I dug through this, i've noticed that theres a lot more subtleties to memory management thatn I first imagined.