Container Strategy: UBI, ubi-micro, multi-stage
How a multi-stage Containerfile drops the same C++ service from 689 MB to 26.4 MB without sacrificing the toolchain you needed at compile time, and how to pick between UBI's runtime tiers (ubi, ubi-minimal, ubi-micro).
Learning objectives
By the end of this section you can:
- Explain why a naïve single-stage build produces a 689 MB image even when the binary is 4 MB, and what each of the other 685 MB is doing in there.
- Write a multi-stage
Containerfilethat uses one base for building (full toolchain) and a different one for running (no compiler, nodnf), and explain whatCOPY --from=builddoes at the layer level. - Choose between
ubi,ubi-minimal, andubi-microfor a runtime base — and predict what you give up at each tier (no shell, noldd, nostraceto attach). - Order
COPYandRUNlines so dependency installs survive a source-only change and the build cache stays useful. - Write
LABELlines that tell future-you (and future CVE-triagers) what libc, libstdc++, micro-architecture, and PGO state the binary inside was built against.
Diagram
The 689 MB problem
A first-pass Containerfile for a C++ service usually looks like this:
FROM registry.access.redhat.com/ubi9:latest
RUN dnf install -y gcc-toolset-14 cmake ninja-build python3-pip && \
pip3 install conan
COPY . /src
WORKDIR /src
RUN conan install . --build=missing && \
cmake --preset conan-release && \
cmake --build --preset conan-release
ENTRYPOINT ["/src/build/Release/myservice"]
It works. It also produces, in demo-01, an image of 689 MB, of which the binary itself is roughly 4 MB. The remaining 685 MB is doing one of four things, none of which need to be present at runtime:
| What | How much | Why it’s in there |
|---|---|---|
gcc-toolset-14 (compiler, headers, linker) |
~400 MB | needed to compile, not to run |
| Conan cache (sources + variants + binary cache) | ~200 MB | needed to resolve dependencies, not to use them |
CMake configure cache, .o files, .a archives |
~80 MB | intermediates from the build step |
Source tree (/src) |
~5 MB | needed to be compiled, not to be executed |
The container has to pull all 685 MB from the registry every time the image is pulled. The container has to surface all 685 MB in its attack surface — every dnf-installed package, every Conan- cached transitive dependency, every CVE-scannable component travels with the binary forever. The container has to walk all 685 MB of overlay layers on cold start. The actual program is 0.6% of the image.
The fix isn’t to shrink the toolchain. The fix is to leave the toolchain behind.
Multi-stage builds — the mechanism
A multi-stage Containerfile declares two (or more) FROM
lines. Each FROM starts a new image; you give it an alias with
AS; only the final stage is what podman build tags. The
intermediate stages exist long enough to compile, then they’re
discarded:
# Stage 1 — heavy: compile here
FROM registry.access.redhat.com/ubi9:latest AS build
RUN dnf install -y gcc-toolset-14 cmake ninja-build python3-pip && \
pip3 install conan
COPY . /src
WORKDIR /src
RUN conan install . --build=missing && \
cmake --preset conan-release && \
cmake --build --preset conan-release && \
cp build/Release/myservice /usr/local/bin/myservice
# Stage 2 — lean: only the binary travels
FROM registry.access.redhat.com/ubi9-micro:latest
COPY --from=build /usr/local/bin/myservice /usr/local/bin/myservice
ENTRYPOINT ["/usr/local/bin/myservice"]
The COPY --from=build line is the entire trick. It says: “from
the image we just finished building under the alias build,
copy this one file into our new image.” None of stage 1’s other
contents — not the compiler, not the Conan cache, not the
intermediate .o files, not the source tree — make it into the
final image. They die with stage 1.
Demo-01’s Containerfile.ubi-multistage produces a 114 MB
image with this pattern (UBI 9 base + your binary + dynamic
libs). Switching the runtime base from ubi9 to ubi9-micro
(see the next section) gets it to 26.4 MB.
The build time is roughly the same as the single-stage version — you’re still compiling the same code. What changes is what ships.
Choosing your runtime base
Red Hat’s UBI (“Universal Base Image”) ships in three runtime tiers, with very different trade-offs:
| Tier | Size | Shell | Package manager | When to pick it |
|---|---|---|---|---|
ubi9 |
~210 MB | bash |
dnf |
development bases, debug images |
ubi9-minimal |
~100 MB | bash |
microdnf |
most production C++ services |
ubi9-micro |
~14 MB | none | none | static-ish C++ services, security-sensitive |
ubi9 is what you build with. It has the full
gcc-toolset-14, dnf, every utility your build script might
need. You almost never want this as your runtime base.
ubi9-minimal is the comfortable default for C++ runtime. You
get a working bash (so podman exec -it ... bash works for
diagnosis), a working microdnf (so you can install missing
runtime libraries if you must), and a consistent glibc /
libstdc++ from the UBI release stream (so security updates
flow). The trade-off is ~85 MB extra over ubi9-micro for
that comfort.
ubi9-micro is what you ship when you’ve measured everything.
Roughly 14 MB on disk before you copy your binary in. No shell.
No dnf. No ldd, no strace, no bash — you can’t even
podman exec -it ... bash into it because there’s no bash to
exec. Your binary plus its dynamic libraries plus a working
glibc is what’s there. If your binary needs libssl or
libstdc++ you have to either statically link them or
explicitly COPY them from the build stage.
The decision is about your incident-response posture, not
about the binary. If the on-call playbook for “service is
misbehaving” starts with podman exec -it ... bash, ship
ubi9-minimal. If the playbook is “start an ephemeral
debug sidecar with --pid=container:main
that has gdb and the debug tools”, ship ubi9-micro. The
sidecar approach is what production deployments converge to and
it’s what §12 walks through.
Distroless (Google) and Wolfi (Chainguard) are out of scope for
this tutorial but exist in the same space as ubi9-micro: very
small bases that assume you’ve moved diagnosis into a separate
sidecar.
Layer caching — order COPY and RUN deliberately
podman build caches each Containerfile instruction. A cached
instruction is reused if its inputs (the previous layer plus the
files this instruction touches) haven’t changed. The
implication: put the things that change rarely above the
things that change often.
A bad ordering forces a full rebuild on every source change:
# BAD: source change invalidates conan install
FROM ubi9:latest AS build
COPY . /src # ← changes every commit
WORKDIR /src
RUN dnf install -y gcc-toolset-14 ... # ← re-runs every commit
RUN pip3 install conan # ← re-runs every commit
RUN conan install . --build=missing # ← re-runs every commit
RUN cmake --preset conan-release && cmake --build --preset conan-release
A good ordering keeps the expensive dependency steps cached:
# GOOD: source change only invalidates the cmake build
FROM ubi9:latest AS build
RUN dnf install -y gcc-toolset-14 cmake ninja-build python3-pip
RUN pip3 install conan
COPY conanfile.txt conan.lock /src/ # ← changes when deps change
WORKDIR /src
RUN conan install . --lockfile=conan.lock --build=missing
COPY CMakeLists.txt CMakePresets.json /src/ # ← changes when build config changes
COPY src/ /src/src/ # ← changes every commit
COPY include/ /src/include/
RUN cmake --preset conan-release && cmake --build --preset conan-release
The second form rebuilds in ~5 seconds when only a .cpp file
changed. The first rebuilds in ~3-5 minutes every time. On a
busy team this is the difference between fast feedback and the
team turning the build cache off because it “doesn’t help”.
The Conan lockfile reference ties into §13’s reproducibility
story — pinning the lockfile is what
makes the cached conan install layer trustworthy across CI runs.
ABI labels — tell future-you what’s inside
A small ubi9-micro image is opaque. You can’t dnf list
installed. You can’t ldd your binary. Six months from now,
when a CVE drops against libstdc++ in some version range,
nobody can easily tell whether your shipped image is affected.
The fix is to write the answer into the image metadata at build time:
FROM ubi9-micro:latest
COPY --from=build /usr/local/bin/myservice /usr/local/bin/myservice
COPY --from=build /usr/lib64/libstdc++.so.6 /usr/lib64/
LABEL org.opencontainers.image.title="myservice"
LABEL org.opencontainers.image.version="1.4.2"
LABEL org.opencontainers.image.revision="a3f29b1"
LABEL ai.cpp-tutorial.libc="glibc-2.34-100.el9_4"
LABEL ai.cpp-tutorial.libstdcxx="libstdc++.so.6.0.32"
LABEL ai.cpp-tutorial.march="x86-64-v3"
LABEL ai.cpp-tutorial.pgo="enabled"
LABEL ai.cpp-tutorial.lto="thin"
LABEL ai.cpp-tutorial.sanitizers="none"
ENTRYPOINT ["/usr/local/bin/myservice"]
podman inspect myservice:1.4.2 | jq '.[0].Labels' reads them
all back at incident time. The org.opencontainers.image.*
labels are the standard set; the ai.cpp-tutorial.* labels are
ours and you should adapt the prefix to your org. The point is
that a 26 MB image is too small to hold “the toolchain that
built me” — so write the toolchain identity in as metadata
instead.
The glibc-mismatch story
ubi9-micro is glibc-2.34 in current releases. If you build
your binary against a different glibc — say, by using a
fedora:41 build base instead of ubi9 — your binary may
reference symbols (memcpy@GLIBC_2.39, etc.) that don’t exist
in the runtime image. The container will start, the dynamic
linker will fail, and you’ll see this:
./myservice: /lib64/libc.so.6: version `GLIBC_2.39' not found
This isn’t a ubi9-micro problem — it’s a build/runtime base
mismatch problem. Use the same UBI release for the build
stage and the runtime stage. Demo-01 ships a deliberately-
broken ubi-micro-glibc-mismatch variant (a 25.2 MB image built
against newer glibc) so you can see this exact failure
firsthand. The general “build host vs runtime host CPU”
version of this problem — AVX-512 in the binary, an older CPU
in production — is what §14 covers; the
toolchain mismatch story is its sibling.
Production diagnostic — what’s actually inside this image?
When triage starts with “what version of libstdc++ is in
that image?” — here’s the recipe:
# 1. inspect labels you wrote at build time
podman inspect myservice:1.4.2 | jq '.[0].Config.Labels'
# 2. enumerate the layers (sizes tell you where bloat lives)
podman history myservice:1.4.2
# 3. list files in the image without running it
podman create --name tmp myservice:1.4.2
podman export tmp | tar -tvf - | sort -k3 -n | tail -50
podman rm tmp
# 4. for ubi-minimal: dynamic library check
podman run --rm --entrypoint=/usr/bin/ldd myservice:1.4.2 \
/usr/local/bin/myservice
# 5. for ubi-micro: do the ldd in a sidecar
podman run --rm \
--pid=container:myservice-prod \
--entrypoint=ldd ubi9:latest /proc/1/root/usr/local/bin/myservice
Steps 1-2 work on any image and are usually enough. Step 4 only
works if you have a shell + ldd in the runtime base (so
ubi9-minimal yes, ubi9-micro no). Step 5 is the
debug-sidecar pattern in miniature:
join the prod container’s PID namespace so you can see its
filesystem at /proc/1/root, and run ldd from a heavier image
that has the tool.
Why this is a C++ concern
JVMs and Go runtimes carry their runtime in the binary. A
Go binary statically linked is the entire dependency tree; an
Uberjar contains the JVM’s class libraries. C++ doesn’t work
that way: a typical C++ binary depends on a specific
libstdc++.so.6, a specific libc.so.6, often a specific
libssl.so.3 and libcrypto.so.3. These dependencies are
implicit until they aren’t. The image strategy you choose
is, at heart, a decision about which dynamic libraries
travel with the binary and which ones are assumed to be at
the runtime end.
The micro-architecture decision is the same shape. C++ binaries
are not portable across CPU feature sets: a binary built with
-march=native on a build host with AVX-512 will SIGILL on a
runtime host without it. The labels above (ai.cpp-tutorial.march)
make that decision visible. §14 walks through the AVX-512
pitfall in detail — it’s where this section’s
“build base vs runtime base” theme meets “build host CPU vs
runtime host CPU”.
Demo
examples/demo-01-image-strategy/
builds the same C++ service three ways and prints the
verified sizes:
| Variant | Size | What’s gone |
|---|---|---|
single-stage-naive |
689 MB | (baseline) |
ubi-multistage |
114 MB | toolchain, Conan cache, intermediates, source |
ubi-micro |
26.4 MB | + bash, dnf, every utility |
ubi-micro-glibc-mismatch |
25.2 MB | broken — dies on GLIBC_2.39 not found |
Run ./demo.sh to build and tag all four; ./demo.sh inspect
walks through the diagnostic recipe above on each variant.
For deeper coverage
- Andrist & Sehr, C++ High Performance, ch. 3 (build pipeline)
- Iglberger, C++ Software Design, ch. 1 (the cost of decisions made early)
- Red Hat, “Universal Base Images” (UBI) documentation
- OpenContainers, image-spec annotations
What’s next
§5 turns the next knob over: now that you’ve decided what runtime base ships, decide what compiler output goes into it. LTO and PGO are the two big build-time levers; both produce smaller, faster binaries; both add non-trivial build time. The PGO pipeline has a specific workload-collection step that’s where most attempts go wrong.