Modern C++ performance, in containers.
Fifteen tutorial sections, seven runnable Podman demos, an instrumented Grafana stack, and diagrams that explain where C++20/23 performance work actually lives once your code is shipping in an OCI image.
Tutorial
Read in order the first time. Each section ends with a prev/next pager. Per-section duration is reading time for the site; the PPTX deck has its own talk-time pacing — see the plan.
00 Outline & reading order
How this tutorial is organised, what each section covers, the 3-hour presentation budget, and what's deliberately out of scope.
01 Prerequisites
Fedora 44, Podman 5.x rootless, the C++ toolchain (GCC 14 / Clang 18, Conan 2, CMake, Ninja), supporting tools (hey, jq, libabigail, bpftrace), and the host-check script that confirms everything is wired correctly before you touch the demos.
02 Introduction & Mental Model
Why container constraints change C++ performance reasoning, the four-layer model the rest of the tutorial hangs off, and the cross-cutting concepts (LTO, PGO, PIE/ASLR, threading models) every later section references.
03 RAII & Container Resource Discipline
Deterministic cleanup is a vibe on a fat host and a survival skill in a 256 MB cgroup.
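As a minimal sketch of what deterministic cleanup means in practice, the wrapper below closes a POSIX file descriptor the moment its owner leaves scope. `UniqueFd` and `fd_is_open` are hypothetical names chosen for illustration, not types taken from the tutorial's demos.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Hypothetical RAII owner for a POSIX file descriptor (illustrative only).
class UniqueFd {
public:
    explicit UniqueFd(int fd = -1) noexcept : fd_(fd) {}
    ~UniqueFd() { reset(); }

    // Non-copyable: exactly one owner is responsible for close().
    UniqueFd(const UniqueFd&) = delete;
    UniqueFd& operator=(const UniqueFd&) = delete;

    // Movable: ownership transfers and the source is left empty.
    UniqueFd(UniqueFd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
    UniqueFd& operator=(UniqueFd&& other) noexcept {
        if (this != &other) {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }

    int get() const noexcept { return fd_; }
    bool valid() const noexcept { return fd_ >= 0; }

    // Closes the descriptor now instead of waiting for the destructor.
    void reset() noexcept {
        if (fd_ >= 0) {
            ::close(fd_);
            fd_ = -1;
        }
    }

private:
    int fd_;
};

// Returns true while the descriptor is still open in this process.
inline bool fd_is_open(int fd) { return ::fcntl(fd, F_GETFD) != -1; }
```

Under a tight cgroup limit, the point is that the descriptor is released at a known line of code rather than whenever someone remembers to call `close`.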
04 Container Strategy: UBI, ubi-micro, multi-stage
How a multi-stage Containerfile drops the same C++ service from 689 MB to 26.4 MB without sacrificing the toolchain you needed at compile time, and how to pick between UBI's runtime tiers (ubi, ubi-minimal, ubi-micro).
05 Compile-Time Wins: LTO, PGO, constexpr
Three compiler-side levers that move runtime performance — link-time optimization, profile-guided optimization, and constexpr — what each costs in build time, and a worked PGO pipeline that doesn't skip the workload step.
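To illustrate the constexpr lever only (LTO and PGO are build-system concerns), here is a small sketch that moves a hash computation from run time to compile time. The FNV-1a hash and the `/healthz` route are assumptions picked for the example, not code from the tutorial.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// 32-bit FNV-1a hash, usable in constant expressions (C++17 or later).
constexpr std::uint32_t fnv1a(std::string_view text) noexcept {
    std::uint32_t hash = 2166136261u;              // FNV offset basis
    for (char c : text) {
        hash ^= static_cast<std::uint8_t>(c);
        hash *= 16777619u;                         // FNV prime
    }
    return hash;
}

// Hypothetical route key: evaluated by the compiler, so the binary carries
// only the resulting integer, not the loop above.
inline constexpr std::uint32_t kHealthRoute = fnv1a("/healthz");

// These checks cost nothing at run time; a wrong value fails the build.
static_assert(fnv1a("") == 2166136261u);
static_assert(fnv1a("a") == 0xE40C292Cu);
```

The same function still works on run-time strings, so a request path can be hashed once and compared against precomputed keys.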
06 STL, Layout, and C++20/23 Containers
Why `boost::container::flat_map` is 2.5× faster than `std::unordered_map` and 35× faster than `std::map` on a real iteration workload, where the gap comes from, and the silent-overhead choices that betray "obvious" container picks.
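The layout difference behind that kind of gap can be sketched without Boost: a flat map is essentially a sorted, contiguous array of key/value pairs, while `std::map` is a tree of separately allocated nodes. The helper names below (`to_flat`, `flat_find`, `sum_values`) are hypothetical, and this sketch makes no attempt to reproduce the measured numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::int64_t>;

// Copies a node-based std::map into one contiguous, already-sorted vector.
inline std::vector<Entry> to_flat(const std::map<int, std::int64_t>& m) {
    return std::vector<Entry>(m.begin(), m.end());
}

// Binary search over the sorted vector; returns nullptr when the key is absent.
inline const std::int64_t* flat_find(const std::vector<Entry>& flat, int key) {
    auto it = std::lower_bound(
        flat.begin(), flat.end(), key,
        [](const Entry& e, int k) { return e.first < k; });
    return (it != flat.end() && it->first == key) ? &it->second : nullptr;
}

// Same loop for both containers: on the vector it walks adjacent memory,
// on std::map it chases a pointer for every element.
template <typename Range>
std::int64_t sum_values(const Range& range) {
    std::int64_t total = 0;
    for (const auto& kv : range) total += kv.second;
    return total;
}
```

Both containers give the same answers; only the memory access pattern during iteration differs, which is what an iteration-heavy workload ends up measuring.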
07 Memory Management: Allocators, Huge Pages, cgroups v2, OOM
Where allocation cost actually lives, what PMR buys you, when transparent huge pages help, why standard allocators don't return memory to the OS, and how cgroups v2 + the OOM killer change everything above them.
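As a rough sketch of what PMR buys you, the function below serves every allocation of a `std::pmr::vector` from a fixed stack buffer. The 4 KiB buffer size and the name `sum_with_arena` are assumptions for illustration; using `std::pmr::null_memory_resource()` as the upstream makes any spill to the heap fail loudly instead of happening silently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <numeric>
#include <vector>

// Sums 1..n using a vector whose storage comes entirely from a stack arena.
// Throws std::bad_alloc if the request does not fit in the arena.
inline int sum_with_arena(int n) {
    std::array<std::byte, 4096> buffer;  // assumed arena size for the sketch

    // Bump-pointer allocator over the buffer; the null upstream resource
    // throws instead of falling back to the global heap.
    std::pmr::monotonic_buffer_resource arena(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    std::pmr::vector<int> values(&arena);
    values.reserve(static_cast<std::size_t>(n));  // single allocation up front
    for (int i = 1; i <= n; ++i) values.push_back(i);

    return std::accumulate(values.begin(), values.end(), 0);
    // The arena is released wholesale when it goes out of scope.
}
```

In a memory-limited container, a bounded arena like this turns an unexpected allocation spike into an exception you can handle rather than an OOM kill.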
08 I/O Latency: io_uring, Async gRPC, SO_REUSEPORT
Why direct liburing achieves 274K req/s at 181 µs p99 while the same workload through sync gRPC manages 4.85K req/s at 30.92 ms p99: a roughly 56× throughput gap (274K / 4.85K) that comes down to where syscalls happen. Plus the container-security gates that block io_uring by default.
09 Networking & Kernel Parameters
What a veth pair actually costs, when `--network=host` is the right escape hatch, the small set of sysctls that move tail latency for C++ services, and the eBPF tooling for diagnosing network plumbing itself — bcc-tools, bpftrace, and bpftool.
10 Observability & Profiling: OTel, Grafana Stack, perf, eBPF
The single biggest performance knob in OpenTelemetry-cpp is the choice between SimpleSpanProcessor and BatchSpanProcessor: picking the wrong one causes a verified 8.5× throughput collapse. Plus the LGTM stack, and perf and eBPF against containerized processes.
11 Noisy Neighbor Isolation: cgroups, CPU pinning, NUMA
A noisy neighbor turns a 2 ms p99 into a 25 ms p99 with no malice and no bug. cgroup v2 `cpu.weight` recovers most of that; `cpuset.cpus` recovers all of it, then beats baseline. Real numbers from demo-05, plus the mechanism for each result.
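`cpuset.cpus` is set on the cgroup from outside the process, but the underlying idea (restricting where the scheduler may run you) can be sketched from inside a process with the Linux `sched_setaffinity` call. This is a process-level analogue under that assumption, not the cgroup mechanism demo-05 uses, and the function names are hypothetical.

```cpp
#include <cassert>
#include <sched.h>

// Pins the calling thread to the first CPU it is currently allowed to use.
// Returns the chosen CPU index, or -1 on failure. Linux-specific.
inline int pin_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            // pid 0 means "the calling thread".
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}

// Number of CPUs the calling thread may run on, or -1 on failure.
inline int allowed_cpu_count() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    return CPU_COUNT(&allowed);
}
```

Starting from the current affinity mask rather than assuming CPU 0 matters inside a container, where the cpuset may not include CPU 0 at all.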
12 Static Analysis & Debugging in Containers
A static-analysis pipeline that catches bugs at build time, runtime sanitizers (ASan, UBSan, MSan, TSan) in containers, Valgrind for what sanitizers can't catch, Meta's Object Introspection for memory mysteries, and the ephemeral gdb sidecar pattern for the bugs that escape anyway.
13 Reproducibility & ABI: Conan, CMake Presets, Hermetic Builds, Coverage
Conan lockfiles + CMake presets + ABI labels + abidiff give you binary-identical builds across time and machines; Konflux and Cachi2 give you those builds without network access at build time; gcov/lcov and clang source-based coverage give you the test-quality signal that hermetic builds preserve across regenerations.
14 Pitfalls
AVX-512 mismatches that SIGILL in production, abstraction overhead invisible in the type system, container builds that take seven minutes for thirty seconds of compile, and the EPERM/EACCES rubric that tells you which security layer is denying you.
15 Where to Go Next
What to read next, and the broader ecosystem this tutorial only scratched.
16 Appendix A — Conan, autotools, and UBI 9's minimal perl
A survival guide for building autotools-based C++ deps (libcurl, c-ares, openssl, etc.) on UBI 9 via Conan, learned the hard way during demo-04.
Reference
Out-of-band documents that support the tutorial. Read the plan if something doesn't work as written — chances are it's tracked there.