03

PMR's monotonic_buffer_resource as architectural statelessness

PMR as the in-language realization of "request brings its own memory, all releases together." The request arena RAII pattern, the canonical layered monotonic + unsynchronized_pool recipe, container-type choices, the lifetime trap counterexample, C++23 additions.

Thesis

Doc 02 established the discipline: bundle per-request state into a RequestContext RAII type, destroy at scope exit, no manual cleanup. The arena member of that context — a std::pmr::monotonic_buffer_resource backed by an inline buffer — is the in-language realization of “request brings its own memory, all releases together, no allocation crosses the boundary.” Construct an arena at request entry, allocate everything from it via std::pmr::polymorphic_allocator, destroy at request exit. The arena is the statelessness boundary, mechanized in the type system and enforced by the C++ destruction model.

This document covers PMR for service handlers: how monotonic_buffer_resource works, the layered-resource pattern that’s become the canonical recipe, how the std::pmr::* container family integrates, the choice of std:: container types over PMR, and the lifetime traps that catch people the first few times. The performance angle from Doc 02 — the asymmetry between O(N) per-entry destruction and O(1) arena release — gets its concrete realization here.

Figure (diagram statelessness/03-pmr, not yet drawn): PMR monotonic_buffer_resource: bump-pointer per request, single bulk free at scope-end.

Why PMR

The Polymorphic Memory Resources facility, introduced in C++17 in <memory_resource>, separates two things that pre-PMR custom allocators conflated: which memory resource to use, and how to use it. A std::pmr::memory_resource is an abstract base class whose public allocate, deallocate, and is_equal functions forward to three private virtual hooks (do_allocate, do_deallocate, do_is_equal). A std::pmr::polymorphic_allocator<T> wraps a pointer to a memory resource and presents the standard allocator interface. The same vector type, std::pmr::vector<int>, can be backed by a stack-allocated arena, a heap-backed pool, a third-party allocator, or even a memory-mapped region, simply by handing it a different resource pointer.

The mechanical advantage over pre-PMR custom allocators is that allocator type does not appear in the container’s type. std::vector<int, MyArenaAllocator> and std::vector<int, MyPoolAllocator> are different types and don’t interoperate. std::pmr::vector<int> is one type regardless of which resource backs a particular instance. This makes PMR-based code composable: a function that takes a std::pmr::vector<int>& accepts vectors backed by any resource the caller chose.
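The composability point can be made concrete with a minimal sketch. The function names sum and sum_from_two_resources are illustrative only; the sketch assumes nothing beyond the standard library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <type_traits>
#include <vector>

// Accepts any std::pmr::vector<int>, whatever resource backs it.
inline long sum(const std::pmr::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

inline long sum_from_two_resources() {
    // Backed by a stack buffer through a monotonic resource.
    std::array<std::byte, 1024> buffer;
    std::pmr::monotonic_buffer_resource arena{buffer.data(), buffer.size()};
    std::pmr::vector<int> on_arena({1, 2, 3}, &arena);

    // Backed by the heap through new_delete_resource().
    std::pmr::vector<int> on_heap({10, 20, 30}, std::pmr::new_delete_resource());

    // Same static type, so one non-template function serves both.
    static_assert(std::is_same_v<decltype(on_arena), decltype(on_heap)>);
    return sum(on_arena) + sum(on_heap);
}
```

With a pre-PMR allocator template, sum would have to be a template over the allocator type, or the two vectors would need conversion before the call.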

Opinion. PMR is one of the most undersold features of C++17. For a service codebase that wants per-request allocation discipline, the alternative — custom allocator templates threaded through every container type — is almost always worse. Reach for PMR first; reach for custom allocators only when you have a profile that says PMR’s vtable overhead is the bottleneck.

The runtime cost of PMR is one virtual call per allocation through the memory_resource base. For arena-style use this is negligible; for high-frequency allocation of small objects in tight inner loops, it can be measurable, in which case the PMR-allocator can be re-wrapped as a concrete allocator type to elide the indirection. The compiler can devirtualize when the resource type is known statically, but only if the inlining context lets it see through the pointer.

monotonic_buffer_resource mechanics

std::pmr::monotonic_buffer_resource is a bump-pointer allocator. Allocation increments a pointer; deallocation does nothing. When the current buffer is exhausted, the resource asks its upstream resource for another buffer (geometrically growing in size by an implementation-defined factor). On destruction, all buffers are returned to the upstream resource in one go.

This shape gives three properties that matter to a handler.

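First, allocation is fast. A bump-pointer allocation is an alignment adjustment, a bounds check, and a pointer increment: a handful of instructions behind one virtual call. A general-purpose heap allocation does more work (size-class lookup, free-list management, and on the slow path lock acquisition, a page fault, or a system call), so the gap is often large, but how large depends on the allocator and the workload. Measure rather than assume a fixed ratio.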

Second, deallocation is a no-op. do_deallocate does nothing, so the per-allocation deallocation cost is a virtual call that returns immediately. This is the underpinning of the cheap-destruction property: the destructor of a pmr::vector over a monotonic arena runs the element destructors, and the single deallocate for its backing array goes nowhere. For trivially destructible values the element destructors compile out as well. Note that the bump pointer is never moved back: memory handed out by the arena stays consumed until the arena itself is released or destroyed.

Third, release-all on destruction. When the resource itself is destroyed, it returns every buffer it obtained from upstream. For a request-scoped arena, this happens at the end of the handler scope. For an arena that never outgrew its initial stack buffer, nothing was obtained from upstream, so there is nothing to return: the buffer simply goes away when the stack frame unwinds.
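The no-op deallocate and the release-all-at-destruction behaviour can be observed directly. The following is a sketch, not a benchmark: CountingResource, Observed, and observe_monotonic are hypothetical helpers written for this illustration, with the counting resource standing in for whatever upstream a service actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Hypothetical helper: forwards to new/delete and counts the calls it sees.
class CountingResource : public std::pmr::memory_resource {
public:
    std::size_t allocations = 0;
    std::size_t deallocations = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++allocations;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        ++deallocations;
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

struct Observed {
    bool address_reused = false;
    std::size_t upstream_frees_while_alive = 0;
    bool all_returned_at_destruction = false;
};

inline Observed observe_monotonic() {
    CountingResource upstream;
    Observed out;
    {
        std::pmr::monotonic_buffer_resource arena{&upstream};
        void* first = arena.allocate(64);
        arena.deallocate(first, 64);          // no-op: the block is not recycled
        void* second = arena.allocate(64);
        out.address_reused = (first == second);

        for (int i = 0; i < 10'000; ++i) {
            arena.deallocate(arena.allocate(64), 64);
        }
        // Nothing has gone back upstream while the arena is alive.
        out.upstream_frees_while_alive = upstream.deallocations;
    }  // arena destroyed here: every upstream buffer is returned
    out.all_returned_at_destruction =
        upstream.allocations > 0 && upstream.allocations == upstream.deallocations;
    return out;
}
```

The upstream sees a small number of large requests rather than one per allocation, and sees no frees at all until the arena is destroyed.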

The constructor variants matter for the patterns below:

// The four variants are alternatives, named differently here only so the
// snippet compiles as a unit; in real code each arena gets its own buffer.

// Variant 1: no initial buffer, default upstream: std::pmr::get_default_resource(),
// which is new_delete_resource() unless the program has replaced it.
// Geometrically grows as needed.
std::pmr::monotonic_buffer_resource heap_arena;

// Variant 2: initial buffer (stack-allocated here), default upstream.
// Zero heap allocation as long as the buffer holds.
std::array<std::byte, 64 * 1024> buffer;
std::pmr::monotonic_buffer_resource stack_first_arena{buffer.data(), buffer.size()};

// Variant 3: initial buffer + null upstream. Stack-only, never heap.
// Throws std::bad_alloc if the buffer is exhausted.
std::pmr::monotonic_buffer_resource capped_arena{
    buffer.data(), buffer.size(),
    std::pmr::null_memory_resource()};

// Variant 4: initial buffer + explicit upstream (e.g., a process-wide pool).
// Falls through to the pool when the stack buffer is full.
// process_upstream is a std::pmr::memory_resource assumed to be defined elsewhere.
std::pmr::monotonic_buffer_resource pooled_arena{
    buffer.data(), buffer.size(), &process_upstream};

The third variant is worth knowing for latency-critical services where heap allocation under load is unacceptable. Combined with a sized buffer it gives a hard cap: either the handler fits in the budget or it fails fast with std::bad_alloc, which a top-level handler frame can translate into a RESOURCE_EXHAUSTED gRPC status. No silent fallback to malloc, no tail-latency surprise from a stray allocation.
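A minimal sketch of the hard-cap behaviour, with gRPC left out so it stands alone. Outcome is a hypothetical stand-in for the status a top-level handler frame would return, and the 4 KB budget is chosen only to keep the example small.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical stand-in for a gRPC status code.
enum class Outcome { Ok, ResourceExhausted };

inline Outcome run_with_budget(std::size_t element_count) {
    std::array<std::byte, 4096> buffer;
    // Null upstream: once the buffer is exhausted the arena throws
    // std::bad_alloc instead of falling back to the heap.
    std::pmr::monotonic_buffer_resource arena{
        buffer.data(), buffer.size(), std::pmr::null_memory_resource()};
    try {
        std::pmr::vector<int> scratch(&arena);
        scratch.resize(element_count);
        return Outcome::Ok;
    } catch (const std::bad_alloc&) {
        // Translate at the boundary; nothing below needs to know about it.
        return Outcome::ResourceExhausted;
    }
}
```

A request for 100 integers fits in the budget; a request for 100,000 does not and is rejected without touching the heap.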

The request arena RAII pattern

A handler-scoped arena is a few lines:

#include <array>
#include <cstddef>
#include <memory_resource>

class RequestArena {
public:
    RequestArena()
        : arena_{buffer_.data(), buffer_.size()} {}

    RequestArena(const RequestArena&)            = delete;
    RequestArena& operator=(const RequestArena&) = delete;
    RequestArena(RequestArena&&)                 = delete;
    RequestArena& operator=(RequestArena&&)      = delete;

    std::pmr::memory_resource* resource() noexcept { return &arena_; }

private:
    static constexpr std::size_t kInitial = 64 * 1024;
    std::array<std::byte, kInitial>      buffer_;
    std::pmr::monotonic_buffer_resource  arena_;
};

The RequestContext from Doc 02 already has this as a member; the standalone class above is for cases where the arena is wanted alone. Sizing the inline buffer to 64 KB is a reasonable starting point — large enough that most handlers stay heap-free, small enough that 1,000 in-flight requests fit in 64 MB. Tune from a profile.

A handler that uses it through the standard library:

grpc::ServerUnaryReactor* MyService::TokenizeAndCount(
    grpc::CallbackServerContext* ctx,
    const text::Request* req,
    text::Response* resp) {

    RequestArena arena;
    auto* mr = arena.resource();

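    // Tokenize input into a pmr::vector of pmr::string, all over the arena.
    std::pmr::vector<std::pmr::string> tokens{mr};
    tokens.reserve(req->input_size());
    for (const auto& chunk : req->input()) {
        // No resource argument here: uses-allocator construction hands the
        // vector's resource to each new element. emplace_back(chunk, mr)
        // does not compile, because the allocator is appended a second time.
        tokens.emplace_back(chunk);
    }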

    // Build a frequency map, also over the arena.
    std::pmr::unordered_map<std::pmr::string, std::size_t> freq{mr};
    for (const auto& tok : tokens) {
        ++freq[tok];
    }

    // Populate response. The response message is owned by gRPC, not the
    // arena — populating it copies out where needed.
    for (const auto& [word, count] : freq) {
        auto* entry = resp->add_counts();
        entry->set_word(std::string{word});  // copy out of arena
        entry->set_count(static_cast<int64_t>(count));
    }

    auto* reactor = ctx->DefaultReactor();
    reactor->Finish(grpc::Status::OK);
    return reactor;
    // arena destructs here. The tokens vector, the strings inside it, the
    // freq map and its node storage — all reclaimed in one buffer release.
}

Two things to notice. First, every top-level container takes the resource pointer as a constructor argument, and elements that are themselves allocator-aware (the pmr::string tokens) inherit it from their container. Passing the resource is the trade for not having to template the container type on the allocator. Omitting it is not a compile error: the container silently falls back to std::pmr::get_default_resource(), which is the global heap unless the program has changed it. Second, the boundary between arena-owned and gRPC-owned memory is explicit: the response protobuf message is owned by gRPC and its strings are heap-allocated by protobuf’s own allocator. Copying out of the arena into the response is intentional, because the response outlives the arena.
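How the resource reaches nested elements is worth checking once, because it determines where the strings inside a vector actually live. A small sketch, independent of gRPC; the function name is illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

inline bool elements_share_the_vectors_resource() {
    std::array<std::byte, 4096> buffer;
    std::pmr::monotonic_buffer_resource arena{buffer.data(), buffer.size()};

    std::pmr::vector<std::pmr::string> tokens(&arena);

    // A std::string source, as protobuf would hand over.
    const std::string chunk =
        "a token long enough to defeat the small-string optimization";
    tokens.emplace_back(chunk);    // the vector supplies its own resource
    tokens.emplace_back("short");

    bool all_on_arena = true;
    for (const auto& t : tokens) {
        all_on_arena = all_on_arena && (t.get_allocator().resource() == &arena);
    }
    return all_on_arena;
}
```

polymorphic_allocator performs uses-allocator construction, so each element is built with the container's allocator appended to the constructor arguments. That holds across the reallocation triggered by the second emplace_back as well.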

Opinion. Make the arena/protobuf boundary explicit in code review. The mistake is to hold an arena-allocated std::pmr::string reference inside a structure that outlives the arena — the response message, a process-scoped cache, a closure passed to an async continuation. The counterexample below shows the failure mode.

Layered resources: monotonic + pool

The single-resource pattern above is great for one-shot scratch — build it, use it, destroy it. For handlers that build and modify container contents (insertions, deletions, reallocations), a layered resource works better:

class RequestArena {
public:
    RequestArena()
        : monotonic_{buffer_.data(), buffer_.size()},
          pool_{pool_options(), &monotonic_} {}

    std::pmr::memory_resource* resource() noexcept { return &pool_; }

private:
    static constexpr std::size_t kInitial = 64 * 1024;

    static std::pmr::pool_options pool_options() noexcept {
        return {.max_blocks_per_chunk = 0,
                .largest_required_pool_block = 512};
    }

    std::array<std::byte, kInitial>           buffer_;
    std::pmr::monotonic_buffer_resource       monotonic_;
    std::pmr::unsynchronized_pool_resource    pool_;
};

The pool resource carves the monotonic slab into size-class freelists. Allocations under 512 bytes go through the pool’s per-size-class freelist; allocations above the threshold pass through to the monotonic resource directly. Deallocations to the pool return the block to its freelist for reuse within the same request. Deallocations to the monotonic resource are still no-ops.

The win is that handlers which repeatedly allocate and deallocate small objects of the same size (node churn in an unordered_map or list, push/pop on a deque) reuse memory within the arena rather than monotonically growing the underlying slab. Vector reallocations benefit less, because each regrowth asks for a larger block than the one it frees. The arena footprint stays closer to the high-water mark of live allocations than to the total of all allocations.
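The difference in footprint under churn can be sketched by counting the bytes each arrangement pulls from the heap. ByteCountingResource and churn are hypothetical helpers for this illustration, and the absolute numbers depend on the standard library, so only the relationship between the two results is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Hypothetical helper: counts the bytes requested from the heap.
class ByteCountingResource : public std::pmr::memory_resource {
public:
    std::size_t bytes_requested = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        bytes_requested += bytes;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Allocate and free one 64-byte block, many times over.
inline void churn(std::pmr::memory_resource& mr) {
    for (int i = 0; i < 100'000; ++i) {
        void* p = mr.allocate(64);
        mr.deallocate(p, 64);
    }
}

inline std::size_t heap_bytes_monotonic_only() {
    ByteCountingResource heap;
    std::pmr::monotonic_buffer_resource monotonic{&heap};
    churn(monotonic);              // every allocation consumes fresh arena space
    return heap.bytes_requested;
}

inline std::size_t heap_bytes_layered() {
    ByteCountingResource heap;
    std::pmr::monotonic_buffer_resource monotonic{&heap};
    std::pmr::unsynchronized_pool_resource pool{&monotonic};
    churn(pool);                   // the freed block is handed out again
    return heap.bytes_requested;
}
```

The monotonic-only arena consumes at least 6.4 MB for a workload whose live set is a single 64-byte block; the layered arena stays at whatever the pool's first chunk and bookkeeping cost.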

Choose unsynchronized_pool_resource for single-threaded handler use; the request handler runs on one thread at a time. Use synchronized_pool_resource only if the arena is shared across threads. Sharing a request arena across threads is usually a design smell — if a handler dispatches work to another thread, that work either copies what it needs out, or gets its own arena.

Choice of std:: containers over PMR

PMR is the gateway to making std:: container choices that actually serve a handler well. The reflexive choice is rarely the best one, especially std::map and std::unordered_map, which are reached for too often.

std::pmr::vector<T> is the workhorse. Bump-allocated on push_back until the arena is exhausted; geometric reallocation moves still happen, but the freed blocks stay in the arena (no-op deallocate). Call reserve() to the expected size on construction to avoid the reallocations entirely. The reallocation cost in an arena is small but not zero, and the reallocated old buffer is dead weight in the arena until it dies.

std::pmr::string keeps the small-string optimization. Strings up to ~15 bytes (libstdc++) or ~22 bytes (libc++) live in the string’s own bytes and bypass the resource entirely. Only longer strings hit the arena. For handlers that copy many short identifiers — user IDs, type tags, short keys — SBO does more work than the arena.
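One way to see the small-string optimization at work is to give a string a resource that refuses every allocation. This is a sketch under an assumption: the standard does not mandate SSO, so the behaviour shown is what libstdc++ and libc++ do in release configurations, and some debug modes allocate bookkeeping on construction. The function name is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <string>

// Returns true if a string of the given length can be built without
// asking its memory resource for anything.
inline bool fits_without_allocating(std::size_t length) {
    try {
        // null_memory_resource() throws std::bad_alloc on any allocation.
        std::pmr::string s(length, 'x', std::pmr::null_memory_resource());
        return s.size() == length;
    } catch (const std::bad_alloc&) {
        return false;
    }
}
```

An 8-character string is built entirely inside the string object; a 100-character string needs the resource and fails here. For handlers dominated by short keys, that is why the arena sees less string traffic than the container types suggest.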

std::pmr::unordered_map<K, V> is where the destruction-asymmetry payoff is largest. Each node allocates from the arena; lookup and insert costs are unchanged from the default-allocator version. Destruction of a 100,000-entry map goes from 100,000 destructor calls with 100,000 real deallocations (default unordered_map) to 100,000 destructor calls with 100,000 no-op deallocations plus one buffer release (PMR). The map destructor still walks every node, and each no-op deallocate is still a virtual call, so teardown remains O(N); what disappears is the heap allocator’s work per node. Skipping the walk entirely requires never running the container’s destructor at all, which is only safe when nothing in it owns resources outside the arena.

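std::flat_map<K, V> (C++23) is the alternative for small N. It is a container adaptor over two sorted contiguous sequences, one for keys and one for values (std::vector by default), and has much better cache behaviour than unordered_map for N below roughly 64. Insertions cost O(N) because the backing sequences are kept sorted, but for handler-scoped maps that are built once and queried a few times, the cache wins dominate. The standard defines no std::pmr::flat_map alias; to put one on the arena, name PMR containers as the underlying sequences, as in std::flat_map<K, V, std::less<K>, std::pmr::vector<K>, std::pmr::vector<V>>, and pass the resource through the allocator-taking constructors. Reach for this when the map is small and built-then-queried; reach for unordered_map when it is larger or has heavy churn.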

std::array<T, N> and std::span<T> are zero-allocation, trivially destructible, and ideal for fixed-size scratch buffers. If the handler knows the result size at compile time or has a known upper bound, prefer these over a pmr::vector that just happens to never exceed the bound.

std::pmr::deque and std::pmr::list are rarely the right answer. deque has good amortized push-back/push-front but allocates chunks rather than a single contiguous buffer; this fragments the arena. list allocates per-node and chases pointers on iteration. Both lose to pmr::vector (or an arena-backed std::flat_map) in nearly every handler workload.

A decision rule of thumb: fixed size → std::array/std::span; growing sequence → pmr::vector with reserve(); small map → std::flat_map over pmr::vector storage (C++23); larger map → pmr::unordered_map; set semantics → std::flat_set over pmr::vector storage or pmr::unordered_set on the same rule. The std::pmr:: containers accept the resource pointer as a constructor argument; the flat adaptors accept it through their allocator-taking constructors.

The lifetime trap

The single most common mistake with arena allocation is letting an arena-allocated value escape the arena’s scope. The compiler does not catch this; the runtime symptom is anything from corrupted reads to crashes to subtly wrong answers.

A small counterexample:

// Anti-pattern: arena-allocated string captured into a longer-lived structure.

struct GlobalCache {
    std::unordered_map<std::string_view, int> entries;  // process-scoped
    std::mutex mtx;

    void remember(std::string_view key, int value) {
        std::lock_guard lock{mtx};
        entries.emplace(key, value);
        // key now points into arena memory that's about to die
    }
};

GlobalCache g_cache;

grpc::ServerUnaryReactor* MyService::Cache(
    grpc::CallbackServerContext* ctx,
    const cache::Request* req,
    cache::Response* resp) {

    RequestArena arena;
    auto* mr = arena.resource();

    std::pmr::string normalized{req->key(), mr};
    normalize_in_place(normalized);

    // BUG: storing a string_view into arena-allocated memory globally.
    g_cache.remember(normalized, req->value());

    auto* r = ctx->DefaultReactor();
    r->Finish(grpc::Status::OK);
    return r;
    // arena destructs here. g_cache.entries now contains a dangling view.
    // Next reader of g_cache reads freed memory. Undefined behaviour.
}

The compiler is happy. AddressSanitizer can catch this, but not reliably: the inline buffer is stack memory that has gone out of scope, so detection depends on stack-use-after-return checking being enabled, while a string that spilled into an upstream heap buffer is reported as a heap use-after-free. Without ASan, the bug manifests as inconsistent cache reads under load. The mental rule: anything stored in process-scoped state must own its memory or have a documented lifetime at least as long as the process-scoped state. An arena-allocated string is owned by the arena; storing a view into it past the arena’s death is a bug.

The fix is either to copy out — allocate a regular std::string into the cache — or to design the cache around arena ownership (rarely worth it for cross-request state). Doc 07 covers the externalized-cache pattern that sidesteps the problem entirely.
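The copy-out fix can be sketched as follows, with the gRPC plumbing removed so the example stands alone. OwningCache, lookup, and handle are hypothetical names, and the lowercasing loop stands in for normalize_in_place, whose real behaviour the counterexample does not specify.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <mutex>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Process-scoped cache whose keys own their bytes on the global heap.
class OwningCache {
public:
    void remember(std::string_view key, int value) {
        std::lock_guard lock{mtx_};
        // Deep copy: the cache no longer depends on the caller's storage.
        entries_.insert_or_assign(std::string{key}, value);
    }

    std::optional<int> lookup(const std::string& key) {
        std::lock_guard lock{mtx_};
        auto it = entries_.find(key);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, int> entries_;
    std::mutex mtx_;
};

inline void handle(OwningCache& cache, std::string_view raw_key, int value) {
    std::array<std::byte, 1024> buffer;
    std::pmr::monotonic_buffer_resource arena{buffer.data(), buffer.size()};

    // Request-scoped scratch copy, normalized in place (ASCII lowercase here).
    std::pmr::string normalized(raw_key, &arena);
    for (char& c : normalized) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    }

    cache.remember(normalized, value);  // copied out before the arena dies
}  // arena and normalized are destroyed here; the cache keeps its own copy
```

The interface still takes a std::string_view, so callers do not change; only the stored key type does. The cost is one heap allocation per cached key, paid once at insertion rather than on every request.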

Other pitfalls

A handful of other PMR gotchas worth knowing.

Individual deallocate is a no-op on monotonic_buffer_resource. A std::pmr::vector that grows by push_back from size 1 to size 1024 under a doubling growth policy (as in libstdc++ and libc++) reallocates its backing array ten times. All ten earlier buffers, 1,023 element slots in total, remain in the arena until the arena dies, so the vector consumes roughly twice its final footprint. reserve() avoids this; without it, the arena grows faster than intuition suggests.
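The effect of reserve() on arena consumption can be measured by placing a counting layer between the vector and the arena. TallyResource and arena_bytes_for_1024_ints are hypothetical helpers for this sketch; exact byte counts depend on the library's growth factor, so the comparison is what matters.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical helper: counts bytes requested, then forwards to the wrapped resource.
class TallyResource : public std::pmr::memory_resource {
public:
    explicit TallyResource(std::pmr::memory_resource* next) : next_{next} {}
    std::size_t bytes_requested = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        bytes_requested += bytes;
        return next_->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        next_->deallocate(p, bytes, align);  // a no-op once it reaches the arena
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
    std::pmr::memory_resource* next_;
};

// Bytes the vector asked the arena for while growing to 1024 ints.
inline std::size_t arena_bytes_for_1024_ints(bool reserve_first) {
    std::pmr::monotonic_buffer_resource arena;
    TallyResource tally{&arena};
    std::pmr::vector<int> v(&tally);
    if (reserve_first) v.reserve(1024);
    for (int i = 0; i < 1024; ++i) v.push_back(i);
    return tally.bytes_requested;
}
```

Because the arena never recycles the abandoned buffers, every byte counted without reserve() is a byte of arena actually consumed.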

Falling off the inline buffer into the upstream resource negates the heap-free property. A handler whose typical case fits in 64 KB but whose occasional case spikes to 1 MB will hit the upstream — for the bottom-only monotonic pattern, that’s the global allocator. Size the inline buffer to comfortably cover the typical case, accept the occasional fallthrough, and instrument to detect when the fallthrough rate climbs.
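One way to instrument fallthrough is to make the arena's upstream a thin wrapper that counts how often it is asked for memory. FallthroughCounter and upstream_hits are hypothetical names for this sketch; a real service would export the count as a metric rather than return it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical helper: counts each time the arena spills past its inline buffer.
class FallthroughCounter : public std::pmr::memory_resource {
public:
    std::size_t hits = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++hits;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Runs a scratch workload of the given size and reports upstream requests.
inline std::size_t upstream_hits(std::size_t element_count) {
    FallthroughCounter upstream;
    std::array<std::byte, 64 * 1024> buffer;
    std::pmr::monotonic_buffer_resource arena{
        buffer.data(), buffer.size(), &upstream};

    std::pmr::vector<int> scratch(&arena);
    scratch.resize(element_count);
    return upstream.hits;
}
```

A typical-size request never reaches the counter; an oversized one does. A rising ratio of nonzero counts to total requests is the signal that the inline buffer is undersized for the current traffic.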

Arena release() is callable on a monotonic_buffer_resource to reclaim all buffers without destroying the resource. This lets a single resource be reused across iterations of a loop — useful for batch processing but rarely for one-shot RPC handlers. If you find yourself reaching for release() mid-handler, the design probably wants two arenas.

std::pmr::polymorphic_allocator carries the resource by raw pointer. The allocator does not own the resource. Propagation is surprising the first time: copy-constructing a PMR container does not share the source’s resource. The copy gets a default-constructed allocator, which means std::pmr::get_default_resource(), normally the global heap. Move construction does carry the source’s resource over, while copy and move assignment leave the destination’s resource unchanged and transfer elements instead. Pass the destination resource explicitly whenever a copy is meant to live in an arena.
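The propagation rules the standard specifies for polymorphic_allocator are easy to check directly. A sketch, with Propagation and observe_propagation as illustrative names; it assumes the default resource has not been replaced elsewhere in the program.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <utility>
#include <vector>

struct Propagation {
    bool copy_uses_default = false;
    bool explicit_copy_uses_destination = false;
    bool move_keeps_source = false;
};

inline Propagation observe_propagation() {
    std::array<std::byte, 1024> buf_a;
    std::array<std::byte, 1024> buf_b;
    std::pmr::monotonic_buffer_resource arena_a{buf_a.data(), buf_a.size()};
    std::pmr::monotonic_buffer_resource arena_b{buf_b.data(), buf_b.size()};

    std::pmr::vector<int> original({1, 2, 3}, &arena_a);
    Propagation out;

    // Plain copy construction: the copy lands on the default resource.
    std::pmr::vector<int> plain_copy(original);
    out.copy_uses_default =
        plain_copy.get_allocator().resource() == std::pmr::get_default_resource();

    // Allocator-extended copy: the destination is stated explicitly.
    std::pmr::vector<int> into_b(original, &arena_b);
    out.explicit_copy_uses_destination =
        into_b.get_allocator().resource() == &arena_b;

    // Move construction: the new vector takes over the source's resource.
    std::pmr::vector<int> moved(std::move(original));
    out.move_keeps_source = moved.get_allocator().resource() == &arena_a;

    return out;
}
```

The practical consequence for handlers is that an innocent-looking copy of an arena-backed container quietly allocates from the heap, which is safe for lifetimes but defeats the arena.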

gRPC integration: per-RPC allocators

gRPC’s callback API includes an experimental hook for per-method memory allocation. The generated callback service exposes SetMessageAllocatorFor_<Method>, and ServerBuilder exposes SetContextAllocator. A user-supplied allocator can satisfy the request, response, and CallbackServerContext allocations from a chosen resource.

This is the direct hook for per-RPC PMR. Implementing it well requires a per-method allocator that constructs a fresh arena per call and tears it down when the call finishes; the gRPC proposal L67-cpp-callback-api.md covers the interface in detail. Doc 10 walks through wiring a per-RPC arena into a complete service.

The lighter-touch approach — and the one this document leans on — is to leave the protobuf request/response objects on the default allocator and use the request arena only for handler-internal scratch. The overhead of letting protobuf allocate via the global heap is small compared to the wins from arena-allocating intermediate computations. Reach for the per-RPC allocator hook only when profiling shows protobuf allocation as a measurable fraction of handler time.

C++23: std::pmr::stacktrace and std::flat_map

C++23 added two PMR-related types worth noting.

std::pmr::stacktrace is an alias for std::basic_stacktrace<std::pmr::polymorphic_allocator<std::stacktrace_entry>>. Captured in a handler with the request arena as the resource, a stacktrace at the point of an error is arena-allocated — no extra cleanup, no extra heap traffic. For services that capture stacks for telemetry (sending them to Tempo or attaching as span attributes), this keeps the diagnostic path arena-local:

void log_error(const RequestContext& rc, std::string_view what) {
    auto trace = std::pmr::stacktrace::current(rc.arena());
    // Format and attach to span. The trace dies with the arena.
}

std::flat_map, std::flat_set, std::flat_multimap, and std::flat_multiset bring sorted-contiguous-storage container adaptors into the standard. There are no std::pmr:: aliases for them; arena backing comes from choosing std::pmr::vector as the underlying container type and passing the resource to the adaptor’s allocator-taking constructor. For small handler-scoped maps these are usually the right default in C++23 code.

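Library support lags the language standard and differs between the two facilities, so check the standard-library version rather than the compiler version alone. libstdc++ has shipped <stacktrace> since GCC 12 (behind an extra link library) and <flat_map> since GCC 15; libc++ added <flat_map> in LLVM 20 and, at the time of writing, does not implement <stacktrace>. Doc 11 covers the build-tooling specifics for enabling them under Conan and CMake.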

A note on benchmarks

The cppreference example for monotonic_buffer_resource shows about a 2–3× speedup on a 200,000-element std::list<int> build compared to default std::list<int>. That’s a real number but it’s specific to that workload — high-churn small-object allocation in a single-threaded build phase. Real handlers see anywhere from no improvement (when allocation isn’t the bottleneck) to 5–10× on P99 (when default-allocator lock contention is the dominant tail-latency source).

The honest answer is: measure. PMR’s biggest practical win is usually the predictability — tail-latency variance shrinks dramatically when per-request heap traffic is constrained to a bounded arena. Mean throughput improvements are smaller and more workload-dependent. The right reason to adopt PMR per-request is the architectural one (request-scope memory bound to request-scope lifetime), with the performance win as confirmation.

Recommendation summary

Bundle a request arena into the RequestContext from Doc 02. Pass its resource pointer through the call graph to any handler-scoped container.

Use the layered monotonic + unsynchronized_pool pattern unless the handler is purely additive, in which case the bottom monotonic alone is fine. Size the inline buffer for the typical case.

Reach for std::pmr::vector with reserve() for growing sequences. Use std::flat_map over std::pmr::vector storage (C++23) for small maps, std::pmr::unordered_map for larger ones, and std::flat_set or std::pmr::unordered_set similarly for sets. Avoid std::pmr::deque and std::pmr::list in handlers.

Make the arena/external boundary explicit. Anything stored in process-scoped state (Doc 04), in a protobuf response, or captured into an async continuation must not hold a view into arena memory. Copy out, or use ownership types from the start.

Profile before reaching for the gRPC per-RPC allocator hook. Handler-internal PMR is the high-leverage refactor; protobuf-level PMR is finer-grained tuning that’s rarely the bottleneck.

Adopt PMR for the architectural shape — request-scope memory tied to request-scope lifetime — and treat the performance numbers as confirmation rather than the primary motivation.

Cross-references

Doc 02 establishes the RAII discipline that the arena participates in, and the performance angle on construction and destruction that PMR concretely realizes.

Doc 04 covers process-scoped memory — the PMR upstream resource, prepared-statement caches, connection pools — and how it interacts with OS container memory limits.

Doc 05 covers threading; the unsynchronized_pool_resource choice above assumes single-threaded handler use. Coroutines that hop threads across co_await need to be careful about which thread’s arena they’re operating on.

Doc 07 covers state externalization, including the cache fix for the lifetime-trap counterexample above.

Doc 10 (gRPC microservices) shows the per-RPC allocator hook wired into a complete service, including the SetMessageAllocatorFor_<Method> plumbing.

Doc 11 (build tooling appendix) covers the GCC/Clang versions and standard-library flags needed for C++23 library features like std::pmr::stacktrace and std::flat_map.

Annotated bibliography

“C++ High Performance” (2nd edition). The memory chapter explicitly covers PMR and is the closest the book gets to direct guidance on the patterns in this document. Worth re-reading before applying PMR to a new codebase. The chapter on cache-friendly data structures is the entry point for understanding why flat_map beats unordered_map on small N.

Iglberger, C++ Software Design. The chapter on the Strategy pattern and the chapter on value semantics frame the PMR design choice well: polymorphic_allocator is the type-erased strategy for “how do I allocate,” and the resource pointer is the concrete strategy at runtime. The dependency-injection chapter applies to how arenas get into the call graph.

Enberg, Latency. The argument that tail latency is dominated by predictability of allocation cost (rather than mean allocation cost) is core to why PMR pays off in services. The book frames the wins in latency terms, which matches the practical experience of adopting PMR.

“Building Low Latency Applications with C++”. The allocator chapter covers PMR alongside other patterns; useful background for the layered-resource discussion. The book leans more on custom allocators than PMR specifically, so read it for the constraints rather than the conclusions.

Yonts, 100 C++ Mistakes and How to Avoid Them. The cluster of mistakes around allocator type erasure, container template parameters, and lifetime confusion is directly relevant — particularly the entries on dangling references and on container/allocator mismatches.

cppreference (en.cppreference.com/w/cpp/memory/monotonic_buffer_resource). The canonical reference for the type’s semantics and the canonical benchmark example. Worth reading once end-to-end before designing an arena.