Hands-on tutorial

Modern C++ performance, in containers.

Fifteen tutorial sections, seven runnable Podman demos, an instrumented Grafana stack, and diagrams that explain where C++20/23 performance work actually lives once your code is shipping in an OCI image.

Fedora 44 · Podman 5.x rootless · GCC 14 / Clang 18 · CMake · Conan 2 · Ninja · io_uring · gRPC · OTel · cgroups v2 · NUMA
15 Tutorial sections
7 Runnable Podman demos
1.5–3h PPTX talk-time range
4 Reference books cited

Tutorial

Read in order the first time. Each section ends with a prev/next pager. Per-section duration is reading time for the site; the PPTX deck has its own talk-time pacing — see the plan.

Map

00 Outline & reading order

How this tutorial is organised, what each section covers, the 3-hour presentation budget, and what's deliberately out of scope.

⏱ 10 minutes
Setup

01 Prerequisites

Fedora 44, Podman 5.x rootless, the C++ toolchain (GCC 14 / Clang 18, Conan 2, CMake, Ninja), supporting tools (hey, jq, libabigail, bpftrace), and the host-check script that confirms everything is wired correctly before you touch the demos.

⏱ 15 minutes (+ install)
Concepts

02 Introduction & Mental Model

Why container constraints change C++ performance reasoning, the four-layer model the rest of the tutorial hangs off, and the cross-cutting concepts (LTO, PGO, PIE/ASLR, threading models) every later section references.

⏱ 20 minutes
Build

03 RAII & Container Resource Discipline

Deterministic cleanup is a vibe on a fat host and a survival skill in a 256 MB cgroup.

⏱ 10 minutes
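As a taste of what section 03 means by deterministic cleanup, here is a minimal RAII sketch. `SlotPool`, `SlotLease`, and `handle_request` are hypothetical names invented for this illustration, not code from the demos; the pool stands in for any scarce resource (file descriptors, buffers) that a tight cgroup makes expensive to leak.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical fixed-size pool standing in for a scarce resource.
class SlotPool {
public:
    explicit SlotPool(std::size_t capacity) : free_(capacity) {}
    bool try_acquire() {
        if (free_ == 0) return false;
        --free_;
        return true;
    }
    void release() { ++free_; }
    std::size_t available() const { return free_; }

private:
    std::size_t free_;
};

// RAII lease: acquires in the constructor, releases in the destructor.
class SlotLease {
public:
    explicit SlotLease(SlotPool& pool)
        : pool_(pool.try_acquire() ? &pool : nullptr) {}
    ~SlotLease() {
        if (pool_) pool_->release();
    }
    SlotLease(const SlotLease&) = delete;
    SlotLease& operator=(const SlotLease&) = delete;
    SlotLease(SlotLease&& other) noexcept
        : pool_(std::exchange(other.pool_, nullptr)) {}
    SlotLease& operator=(SlotLease&&) = delete;
    explicit operator bool() const { return pool_ != nullptr; }

private:
    SlotPool* pool_;  // nullptr means "no slot held"
};

// Returns true on success. Every return path gives the slot back,
// because the lease's destructor runs when the function exits.
bool handle_request(SlotPool& pool, bool fail_midway) {
    SlotLease lease(pool);
    if (!lease) return false;       // pool exhausted, nothing to release
    if (fail_midway) return false;  // early exit, destructor still releases
    return true;
}
```

The point of the sketch is that no call site has to remember to call `release()`: the early-return path and the success path leave the pool in the same state.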
Build

04 Container Strategy: UBI, ubi-micro, multi-stage

How a multi-stage Containerfile drops the same C++ service from 689 MB to 26.4 MB without sacrificing the toolchain you needed at compile time, and how to pick between UBI's runtime tiers (ubi, ubi-minimal, ubi-micro).

⏱ 10 minutes
Memory

05 Compile-Time Wins: LTO, PGO, constexpr

Three compiler-side levers that move runtime performance — link-time optimization, profile-guided optimization, and constexpr — what each costs in build time, and a worked PGO pipeline that doesn't skip the workload step.

⏱ 15 minutes
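To make the constexpr lever concrete, the sketch below builds a CRC-32 lookup table at compile time so the running service never pays for table construction. It is an illustration chosen for this page, not code from section 05; the names `make_crc32_table`, `kCrc32Table`, and `crc32` are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Build the 256-entry CRC-32 table (reflected polynomial 0xEDB88320)
// entirely at compile time.
constexpr std::array<std::uint32_t, 256> make_crc32_table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t c = i;
        for (int k = 0; k < 8; ++k) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        table[i] = c;
    }
    return table;
}

// The table is a constant in the binary; no startup code runs to fill it.
inline constexpr auto kCrc32Table = make_crc32_table();

// Usable at compile time and at run time.
constexpr std::uint32_t crc32(std::string_view data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (char ch : data) {
        const auto byte = static_cast<unsigned char>(ch);
        crc = kCrc32Table[(crc ^ byte) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;
}

// The standard CRC-32 check value, verified by the compiler itself.
static_assert(crc32("123456789") == 0xCBF43926u, "CRC-32 check value");
```

The `static_assert` doubles as a unit test that costs nothing at run time: if the table or the loop were wrong, the build would fail.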
Memory

06 STL, Layout, and C++20/23 Containers

Why `boost::container::flat_map` is 2.5× faster than `std::unordered_map` and 35× faster than `std::map` on a real iteration workload, where the gap comes from, and the silent overheads that undermine "obvious" container picks.

⏱ 15 minutes
I/O

07 Memory Management: Allocators, Huge Pages, cgroups v2, OOM

Where allocation cost actually lives, what PMR buys you, when transparent huge pages help, why standard allocators don't return memory to the OS, and how cgroups v2 + the OOM killer change everything above them.

⏱ 10 minutes
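For a sense of what PMR buys you, here is a small sketch using `std::pmr::monotonic_buffer_resource` over a stack buffer. The function name `sum_with_arena` and the 4096-byte buffer size are assumptions made for this illustration. Passing `std::pmr::null_memory_resource()` as the upstream means the arena throws `std::bad_alloc` when the buffer runs out instead of quietly falling back to the heap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Sums 0..n-1 using a vector whose storage comes from a stack buffer.
// No global operator new is involved as long as the data fits.
long sum_with_arena(std::size_t n) {
    alignas(std::max_align_t) std::array<std::byte, 4096> buffer;

    // Bump allocator over the buffer; upstream refuses all requests.
    std::pmr::monotonic_buffer_resource arena(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    std::pmr::vector<int> values(&arena);
    values.reserve(n);  // one allocation from the arena
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(static_cast<int>(i));
    }

    long total = 0;
    for (int v : values) total += v;
    return total;
}
```

A hard failure on overflow is a deliberate choice here: in a memory-limited container it is often better to learn that an arena was undersized than to discover the fallback allocations later as an OOM kill.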
I/O

08 I/O Latency: io_uring, Async gRPC, SO_REUSEPORT

Why direct liburing reaches 274K req/s at 181 µs p99 while the same workload through sync gRPC manages 4.85K req/s at 30.92 ms p99: a roughly 56× throughput gap that comes down to where syscalls happen. Also covers the container-security gates that block io_uring by default.

⏱ 15 minutes
Observe

09 Networking & Kernel Parameters

What a veth pair actually costs, when `--network=host` is the right escape hatch, the small set of sysctls that move tail latency for C++ services, and the eBPF tooling for diagnosing network plumbing itself — bcc-tools, bpftrace, and bpftool.

⏱ 15 minutes
Isolate

10 Observability & Profiling: OTel, Grafana Stack, perf, eBPF

The single biggest performance knob in OpenTelemetry-cpp is the choice between SimpleSpanProcessor and BatchSpanProcessor, with a verified 8.5× throughput collapse if you pick the wrong one. Also covers the LGTM stack, plus perf and eBPF against containerized processes.

⏱ 15 minutes
Quality

11 Noisy Neighbor Isolation: cgroups, CPU pinning, NUMA

A noisy neighbor turns a 2 ms p99 into a 25 ms p99 with no malice and no bug. cgroup v2 `cpu.weight` recovers most of that; `cpuset.cpus` recovers all of it, then beats baseline. Real numbers from demo-05, plus the mechanism for each result.

⏱ 15 minutes
Quality

12 Static Analysis & Debugging in Containers

A static-analysis pipeline that catches bugs at build time, runtime sanitizers (ASan, UBSan, MSan, TSan) in containers, Valgrind for what sanitizers can't catch, Meta's Object Introspection for memory mysteries, and the ephemeral gdb sidecar pattern for the bugs that escape anyway.

⏱ 25 minutes
Pitfalls

13 Reproducibility & ABI: Conan, CMake Presets, Hermetic Builds, Coverage

Conan lockfiles + CMake presets + ABI labels + abidiff give you binary-identical builds across time and machines; Konflux and Cachi2 give you those builds without network access at build time; gcov/lcov and clang source-based coverage give you the test-quality signal that hermetic builds preserve across regenerations.

⏱ 25 minutes
Next

14 Pitfalls

AVX-512 mismatches that raise SIGILL in production, abstraction overhead invisible in the type system, container builds that take seven minutes for thirty seconds of compile, and the EPERM/EACCES rubric that tells you which security layer is denying you.

⏱ 15 minutes
Section

15 Where to Go Next

What to read next, and the broader ecosystem this tutorial only scratched.

⏱ 3 minutes
Section

16 Appendix A — Conan, autotools, and UBI 9's minimal perl

A survival guide for building autotools-based C++ deps (libcurl, c-ares, openssl, etc.) on UBI 9 via Conan, learned the hard way during demo-04.

⏱ 10 minutes