Memory Profiling with Mesos and Jemalloc

On Linux systems, Mesos is able to leverage the memory-profiling capabilities of the jemalloc general-purpose allocator to provide powerful debugging tools for investigating memory-related issues.

These include detailed real-time statistics of the current memory usage, as well as information about the location and frequency of individual allocations.

This generally works by having libprocess detect at runtime whether the current process is using jemalloc as its memory allocator, and if so enable a number of HTTP endpoints described below that allow operators to generate the desired data at runtime.

Requirements

A prerequisite for memory profiling is a suitable allocator. Currently only jemalloc is supported, which can be linked in one of the following ways.

The recommended method is to specify the --enable-jemalloc-allocator compile-time flag, which causes the mesos-master and mesos-agent binaries to be statically linked against a bundled version of jemalloc that will be compiled with the correct compile-time flags.

Alternatively, and analogous to other bundled dependencies of Mesos, a suitable custom version of jemalloc can be used via the --with-jemalloc=</path-to-jemalloc> flag.

NOTE: Suitable here means that jemalloc should have been built with the --enable-stats and --enable-prof flags, and that the string prof:true,prof_active:false is part of the malloc configuration. The latter condition can be satisfied either at configuration time or at run-time; see the section on MALLOC_CONF below.

The third way is to use the LD_PRELOAD mechanism to preload a libjemalloc.so shared library that is present on the system at runtime. The MemoryProfiler class in libprocess will automatically detect this and enable its memory profiling support.
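A minimal sketch of the preload approach, assuming the shared library lives in one of a few common locations. The helper name find_jemalloc and the candidate directories are illustrative assumptions, not part of Mesos; adjust them for the system at hand.

```shell
# Print the first libjemalloc shared library found in the given directories.
# The directories passed in are an assumption; distributions differ.
find_jemalloc() {
  for fj_dir in "$@"; do
    for fj_lib in "$fj_dir"/libjemalloc.so*; do
      # An unmatched glob stays literal, so check that the file exists.
      if [ -f "$fj_lib" ]; then
        printf '%s\n' "$fj_lib"
        return 0
      fi
    done
  done
  return 1
}

# Example (not executed here): preload the library when starting the agent.
# LD_PRELOAD="$(find_jemalloc /usr/lib /usr/lib64 /usr/lib/x86_64-linux-gnu)" \
#   mesos-agent --memory_profiling=true <other agent flags>
```

If the preloaded library was not built with profiling enabled by default, the malloc configuration described in the note above still has to be supplied.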

The generated profile dumps are written to a random directory under TMPDIR if it is set, and otherwise to a subdirectory of /tmp.

Finally, note that since jemalloc was designed to be used in highly concurrent allocation scenarios, it can improve performance over the default system allocator. In this case, it can be beneficial to build Mesos with jemalloc even if there is no intention to use the memory profiling functionality.

Usage

There are two independent sets of data that can be collected from jemalloc: memory statistics and heap profiling information.

Using any of the endpoints described below requires the jemalloc allocator and starting the mesos-agent or mesos-master binary with the option --memory_profiling=true (or setting the environment variable LIBPROCESS_MEMORY_PROFILING=true for other binaries using libprocess).

Memory Statistics

The /statistics endpoint returns exact statistics about the memory usage in JSON format, for example the number of bytes currently allocated and the size distribution of these allocations.

It takes no parameters:

http://example.org:5050/memory-profiler/statistics

Be aware that the returned JSON is quite large, so when accessing this endpoint from a terminal, it is advisable to redirect the results into a file.
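As a sketch of that advice, the following helpers write the response to a file. The function names and the output file name are illustrative assumptions; only the endpoint path is taken from this document.

```shell
# Build the URL of the /statistics endpoint for a given host:port.
statistics_url() {
  printf 'http://%s/memory-profiler/statistics\n' "$1"
}

# Download the statistics into a file instead of printing them.
# Arguments: host:port, output file.
# --fail makes curl return a non-zero status on HTTP errors.
save_statistics() {
  curl --silent --show-error --fail --output "$2" "$(statistics_url "$1")"
}

# Example (not executed here):
# save_statistics example.org:5050 statistics.json
```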

Heap Profiling

The profiling done by jemalloc works by sampling from the calls to malloc() according to a configured probability distribution, and storing stack traces for the sampled calls in a separate memory area. These can then be dumped into files on the filesystem, so-called heap profiles.

To start a profiling run one would access the /start endpoint:

http://example.org:5050/memory-profiler/start?duration=5mins

followed by downloading one of the generated files described below after the duration has elapsed. The remaining time of the current profiling run can be verified via the /state endpoint:

http://example.org:5050/memory-profiler/state

Since jemalloc stores profiling information process-globally, only one profiling run can be active at a time. Additionally, only the results of the most recently finished run are stored on disk.

The profile collection can also be stopped early with the /stop endpoint:

http://example.org:5050/memory-profiler/stop
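The three control endpoints above could be wrapped in small shell helpers, as in this sketch. The function names are hypothetical, and plain GET requests via curl are assumed to be accepted, as the URLs above suggest.

```shell
# Build a memory-profiler URL from host:port, endpoint, and optional query.
profiler_url() {
  if [ -n "${3:-}" ]; then
    printf 'http://%s/memory-profiler/%s?%s\n' "$1" "$2" "$3"
  else
    printf 'http://%s/memory-profiler/%s\n' "$1" "$2"
  fi
}

# Start a run. Arguments: host:port, duration including a unit (e.g. 5mins).
start_profiling() {
  curl --silent --show-error --fail "$(profiler_url "$1" start "duration=$2")"
}

# Query the state of the current run. Argument: host:port.
profiling_state() {
  curl --silent --show-error --fail "$(profiler_url "$1" state)"
}

# Stop the current run early. Argument: host:port.
stop_profiling() {
  curl --silent --show-error --fail "$(profiler_url "$1" stop)"
}

# Example (not executed here):
# start_profiling example.org:5050 5mins
# profiling_state example.org:5050
# stop_profiling example.org:5050
```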

To analyze the generated profiling data, the results are offered in three different formats.

Raw profile

http://example.org:5050/memory-profiler/download/raw

This returns a file in a plain text format containing the raw backtraces collected, i.e., lists of memory addresses. It can be interactively analyzed and rendered using the jeprof tool provided by the jemalloc project. For more information on this file format, check out the official jemalloc documentation.

Symbolized profile

http://example.org:5050/memory-profiler/download/text

This is similar to the raw format above, except that jeprof is called on the host machine to attempt to read symbol information from the current binary and replace raw memory addresses in the profile by human-readable symbol names.

Usage of this endpoint requires that jeprof is present on the host machine and on the PATH, and no useful information will be generated unless the binary contains symbol information.

Call graph

http://example.org:5050/memory-profiler/download/graph

This endpoint returns an image in SVG format that shows a graphical representation of the sampled backtraces.

Usage of this endpoint requires that jeprof and dot are present on the host machine and on the PATH of the Mesos process, and no useful information will be generated unless the binary contains symbol information.

Overview

Which of these formats is needed depends on the circumstances of the application deployment and of the bug being investigated.

For example, the call graph presents information in a visual, immediately useful form, but is difficult to filter and post-process if non-default output options are desired.

On the other hand, in many Debian-like environments symbol information is by default stripped from binaries to save space and shipped in separate packages. In such an environment, if it is not permitted to install additional packages on the host running Mesos, one would store the raw profiles and enrich them with symbol information locally.
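Enriching a raw profile locally might look like the following sketch. It assumes that a copy of the profiled binary with symbol information is available on the local machine and that jeprof is on the PATH. The function name symbolize_profile is hypothetical.

```shell
# Symbolize a raw heap profile using a local binary that still has symbols.
# Arguments: path to the binary, path to the downloaded raw profile.
symbolize_profile() {
  if ! command -v jeprof >/dev/null 2>&1; then
    echo "jeprof not found on PATH" >&2
    return 127
  fi
  if [ ! -r "$1" ] || [ ! -r "$2" ]; then
    echo "binary or profile is not readable" >&2
    return 1
  fi
  # --text prints a plain-text report with symbol names.
  jeprof --text "$1" "$2"
}

# Example (not executed here); both paths are placeholders:
# symbolize_profile /path/to/mesos-agent /path/to/profile.raw
```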

Jeprof Installation

As described above, the /download/text and /download/graph endpoints require the jeprof program to be installed on the host system. Where possible, it is recommended to install jeprof through the system package manager, where it is usually packaged alongside jemalloc itself.

Alternatively, a copy of the script can be found under 3rdparty/jemalloc-5.0.1/bin/jeprof in the build directory, or can be downloaded directly from the internet using a command like:

$ curl https://raw.githubusercontent.com/jemalloc/jemalloc/dev/bin/jeprof.in | sed s/@jemalloc_version@/5.0.1/ >jeprof

Note that jeprof is just a Perl script that post-processes the raw profiles. It has no connection to the jemalloc library besides being distributed in the same package. In particular, it is generally not required to have matching versions of jemalloc and jeprof.

If jeprof is installed manually, one also needs to take care to install the necessary dependencies. In particular, this includes the perl interpreter to execute the script itself and the dot binary to generate graph files.
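A quick way to check for these dependencies is sketched below. The helper name missing_tools is an illustrative assumption. It only tests whether each program can be found on the PATH of the current shell, which may differ from the PATH of the Mesos process.

```shell
# Print the names of the given tools that cannot be found on PATH.
missing_tools() {
  for mt_tool in "$@"; do
    command -v "$mt_tool" >/dev/null 2>&1 || printf '%s\n' "$mt_tool"
  done
}

# Example (not executed here): prints nothing if everything is installed.
# missing_tools jeprof perl dot
```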

Command-line Usage

In some circumstances, it may be desirable to automate the downloading of heap profiles with a simple script. A minimal example might look like this:

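#!/bin/bash

# Length of the profiling run in seconds. SECONDS itself is a special
# bash variable that changes over time, so a different name is used.
DURATION_SECS=600
HOST=example.org:5050

# Quote the URLs so that the shell does not treat `?` as a glob character.
# The duration is passed with a unit, as in the /start example above.
curl "http://${HOST}/memory-profiler/start?duration=${DURATION_SECS}secs"
sleep $((DURATION_SECS + 1))
wget "http://${HOST}/memory-profiler/download/raw"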

A more sophisticated script would additionally store the id value returned by the call to /start and pass it as a parameter to /download, to ensure that a new run was not started in the meantime.
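The id handling could be sketched as follows. Note the assumptions: the response of /start is taken to be a JSON object with a top-level "id" field, and the query parameter of /download is taken to be named id. Neither is confirmed here, so check the actual response before relying on this. The helper name extract_id is hypothetical.

```shell
# Extract the value of the "id" field from a small JSON response.
# ASSUMPTION: the response is a JSON object with a top-level "id" field
# whose value is a number or a simple quoted string.
extract_id() {
  printf '%s\n' "$1" |
    sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\{0,1\}\([0-9A-Za-z_-]*\)"\{0,1\}.*/\1/p'
}

# Example (not executed here); the parameter name "id" is also assumed:
# response=$(curl --silent --fail \
#   "http://example.org:5050/memory-profiler/start?duration=10mins")
# run_id=$(extract_id "$response")
# sleep 601
# curl --silent --fail --output profile.raw \
#   "http://example.org:5050/memory-profiler/download/raw?id=${run_id}"
```

A production script would more likely parse the JSON with a dedicated tool; sed is used here only to keep the sketch free of extra dependencies.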

Using the MALLOC_CONF Interface

The jemalloc allocator provides a native interface to control the memory profiling behaviour. The usual way to provide settings through this interface is by setting the environment variable MALLOC_CONF.

NOTE: If libprocess detects that memory profiling was started through MALLOC_CONF, it will reject starting a profiling run of its own to avoid interference.

The MALLOC_CONF interface provides a number of options that are not exposed by libprocess, like generating heap profiles automatically after a certain amount of memory has been allocated, or whenever memory usage reaches a new high-water mark. The full list of settings is described on the jemalloc man page.

On the other hand, features like starting and stopping the profiling at runtime or getting the information provided by the /statistics endpoint cannot be achieved through the MALLOC_CONF interface.

For example, to create a dump automatically for every 1 GiB worth of recorded allocations, one might use the configuration:

MALLOC_CONF="prof:true,prof_prefix:/path/to/folder,lg_prof_interval:30"
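The value of lg_prof_interval is the base-2 logarithm of the average interval in bytes, so the exponent for a desired interval can be computed with a small helper. The function name lg2 is illustrative, and the sketch assumes the interval is a power of two.

```shell
# Print the base-2 logarithm of a power-of-two byte count.
lg2() {
  lg2_n=$1
  lg2_result=0
  while [ "$lg2_n" -gt 1 ]; do
    lg2_n=$((lg2_n / 2))
    lg2_result=$((lg2_result + 1))
  done
  printf '%s\n' "$lg2_result"
}

# 1 GiB = 1024 * 1024 * 1024 bytes
lg2 $((1024 * 1024 * 1024))
```

For 1 GiB this prints 30. A value of 20 would correspond to an interval of 1 MiB.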

To debug memory allocations during early startup, profiling can be activated before accessing the /start endpoint:

MALLOC_CONF="prof:true,prof_active:true"