
60-Minute Deep Dive

TAMING THE JVM

Optimizing Java Workloads
on OpenShift & Kubernetes

Quarkus 3.33.1 LTS · Java 21 · G1GC / ZGC / Shenandoah · AppCDS · Virtual Threads

Based on: Optimizing Cloud Native Java  |  SRE with Java Microservices  |  Quarkus 3.33.1 LTS

github.com/patterncatalyst/quarkus-optimization

Agenda

01  Container-Native JVM Fundamentals
02  Right-Sizing Java Workloads
03  Garbage Collection Optimization
04  Startup Time Reduction (AppCDS)
05  Observability & Instrumentation
06  Autoscaling Integration
07  Systematic Tuning & Cost ROI
Bonus  Leyden · gRPC · Latency · Panama · Valhalla

Why Java + Kubernetes = Complexity

60% of Java apps overprovision memory
4–8s typical JVM cold start on Kubernetes
2–3× infrastructure waste from poor bin-packing
$$$ unnecessary cloud spend each month
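A JVM without container support (pre-JDK 10 / 8u191, or -XX:-UseContainerSupport) sizes its heap from the NODE's full RAM, not the cgroup limit: on a 64 GB node the default max heap is 16 GB (1/4 of RAM) inside a 512 MB container → OOMKill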

Container-Native JVM Fundamentals

❌ Before

# Hardcoded — breaks with resize / VPA
-Xms512m -Xmx2048m

Without container support the JVM sizes from host RAM
Default max heap is 1/4 of node RAM: 16 GB on a 64 GB node, inside a 512MB container

✅ Java 21

-XX:MaxRAMPercentage=75.0
-XX:InitialRAMPercentage=50.0
-XX:MinRAMPercentage=25.0
-XX:NativeMemoryTracking=summary
UseContainerSupport is ON by default in Java 21. Reads cgroup limits correctly. cgroup v2 (RHEL 9 / OCP 4.14+): reads /sys/fs/cgroup/memory.max
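A quick way to confirm what the JVM actually sees inside a container is to print its own view of memory and CPUs. The sketch below is illustrative only: the class name ContainerHeapCheck and the helper expectedMaxHeapBytes are hypothetical, and the real heap ceiling may be rounded slightly by the collector's alignment rules.

```java
final class ContainerHeapCheck {

    /** Heap ceiling implied by a container limit and a MaxRAMPercentage value (assumption: no rounding). */
    static long expectedMaxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        Runtime rt = Runtime.getRuntime();
        // What this JVM decided at startup, based on the cgroup limits it detected
        System.out.println("max heap (MiB): " + rt.maxMemory() / mib);
        System.out.println("CPUs visible  : " + rt.availableProcessors());
        // What a 512 MiB container with MaxRAMPercentage=75 should yield
        System.out.println("expected (MiB): " + expectedMaxHeapBytes(512 * mib, 75.0) / mib);
    }
}
```

Run it in the target image with the same flags as production; if "max heap" is far above the expected value, the JVM is not reading the container limit.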

JVM Memory Regions — Six Buckets, Not One

Region | Typical Size | Controlled By
Heap (Old + Young Gen) | 50–75% | MaxRAMPercentage
Metaspace | 50–200 MB | -XX:MaxMetaspaceSize=256m
Platform Thread Stacks | 1 MB/thread | -Xss or Virtual Threads
Native Memory (JIT, GC) | 100–300 MB |
Direct ByteBuffers | Varies | Netty / NIO config
GC Bookkeeping | 50–100 MB |
Java 21 Virtual Threads: stacks live in heap as tiny continuations — eliminates 1MB/thread platform thread stack budget for I/O-bound workloads
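The six buckets can be turned into a rough budget check. This is a sketch under stated assumptions: it uses the low end of each range from the table, treats direct buffers as zero because the table lists them as "Varies", and the class and method names are hypothetical.

```java
final class MemoryBudget {

    /** Off-heap estimate in MiB: metaspace + platform thread stacks + native (JIT, GC) + GC bookkeeping. */
    static long offHeapMiB(long metaspaceMiB, int platformThreads, long stackMiBPerThread,
                           long nativeMiB, long gcBookkeepingMiB) {
        return metaspaceMiB + platformThreads * stackMiBPerThread + nativeMiB + gcBookkeepingMiB;
    }

    /** Smallest container limit (MiB) whose non-heap share still covers the off-heap estimate. */
    static long requiredLimitMiB(long offHeapMiB, double maxRamPercentage) {
        double offHeapShare = 1.0 - maxRamPercentage / 100.0;
        return (long) Math.ceil(offHeapMiB / offHeapShare);
    }

    public static void main(String[] args) {
        // Assumed workload: 50 MiB metaspace, 50 platform threads at 1 MiB, 100 MiB native, 50 MiB GC bookkeeping
        long offHeap = offHeapMiB(50, 50, 1, 100, 50);
        System.out.println("off-heap estimate (MiB): " + offHeap);
        System.out.println("limit needed at 75% heap (MiB): " + requiredLimitMiB(offHeap, 75.0));
    }
}
```

With these inputs the off-heap estimate alone needs a limit of about 1000 MiB at MaxRAMPercentage=75, which is why a 512 MB container usually calls for fewer platform threads or a lower heap percentage.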

Right-Sizing Java Workloads

requests — Scheduling Guarantee

  • Scheduler uses this to find a node
  • Set to P50 steady-state RSS
  • Too high → pods can't schedule
  • Too low → node overcommitted: CPU starvation or eviction on a full node

limits — Hard Ceiling

  • Memory exceeded → OOMKill (exit 137)
  • CPU exceeded → throttled (not killed)
  • Set memory limit 25-30% above P99 RSS
  • Set CPU limit 2-4× request to absorb GC spikes
Demo 07: 7-workload analysis · 4 nodes → 2 nodes · +67% pod density · $6,720/year saving ($560/month) · 17× ROI
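The request and limit rules above (request at P50 RSS, memory limit 25-30% above P99 RSS) can be sketched as a small calculation over RSS samples. The sample values, the nearest-rank percentile method, and the class name RightSizer are assumptions for illustration; real samples would come from your metrics backend.

```java
import java.util.Arrays;

final class RightSizer {

    /** Nearest-rank percentile over a sorted copy of the samples; p is in (0, 100]. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Memory request = P50 steady-state RSS. */
    static long memoryRequestMiB(long[] rssMiB) {
        return percentile(rssMiB, 50);
    }

    /** Memory limit = P99 RSS plus headroom (0.25 to 0.30 per the slide). */
    static long memoryLimitMiB(long[] rssMiB, double headroom) {
        return (long) Math.ceil(percentile(rssMiB, 99) * (1.0 + headroom));
    }

    public static void main(String[] args) {
        long[] rssMiB = {300, 310, 320, 330, 340, 350, 360, 370, 380, 400}; // hypothetical samples
        System.out.println("request (MiB): " + memoryRequestMiB(rssMiB));
        System.out.println("limit   (MiB): " + memoryLimitMiB(rssMiB, 0.25));
    }
}
```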

GC in Containers: Four Challenges

CPU Throttling Extends GC

CPU limits throttle GC threads mid-pause. 100ms G1GC → 400ms under throttle. Set limit ≥ 2× request.

ParallelGCThreads Default

With no CPU limit set (or container support off), the JVM sizes GC threads from the host CPU count. 64-core node + 4 CPU request = 43 parallel GC threads competing for 4 CPUs.

GC-Induced HPA Thrash

GC pause → CPU spike → HPA fires → new pods GC → repeat. Scale on RPS, not CPU.

Heap Sizing vs GC Pressure

Small heap = frequent GC. Too large = infrequent but long GC. Start at MaxRAMPercentage=75.
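For the ParallelGCThreads challenge, it helps to see how the default grows with the CPU count the JVM detects. The sketch below reproduces HotSpot's heuristic as I understand it (all CPUs up to 8, then 5/8 of each additional CPU); treat the formula as an assumption and confirm on your JDK with -XX:+PrintFlagsFinal.

```java
final class GcThreadDefaults {

    /** Default ParallelGCThreads for the CPU count the JVM sees (assumed HotSpot heuristic). */
    static int defaultParallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 4, 8, 16, 64}) {
            System.out.println(cpus + " CPUs -> " + defaultParallelGcThreads(cpus) + " GC threads");
        }
        // Compare against what this JVM actually detected
        System.out.println("this JVM sees " + Runtime.getRuntime().availableProcessors() + " CPUs");
    }
}
```

A pod that sees a 64-core host but is only entitled to 4 CPUs ends up with roughly ten times more GC threads than it can run, so pinning -XX:ParallelGCThreads to the CPU allocation is the safer default.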

GC Selection Guide

Collector | Pause | Best For | Key Flags
G1GC | 50–300ms | General purpose, Temurin/Corretto default | -XX:+UseG1GC -XX:MaxGCPauseMillis=200
Shenandoah | 1–20ms | UBI9 default (Red Hat images ship this) | -XX:+UseShenandoahGC
ZGC (Gen) | <1ms | Low-latency APIs, any heap size, HPA stability | -XX:+UseZGC -XX:+ZGenerational
Serial GC | STW | CLI tools, batch, <256MB heap only | -XX:+UseSerialGC
Note: UBI9 ships Shenandoah. Demos 02 and 06 override to -XX:+UseG1GC / -XX:+UseZGC for clean comparison.
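Because the collector in effect depends on both the base image and any overriding flags, it is worth checking from inside the running JVM. A minimal sketch using the standard java.lang.management API; the class name ActiveCollector is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollector {

    /** Names of the GC beans registered in this JVM, e.g. "G1 Young Generation". */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
    }
}
```

The bean names identify the collector family, so running this once per image and flag combination shows whether an override such as -XX:+UseG1GC actually took effect.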

Startup Time Reduction

Spring Boot vs Quarkus Baseline

Spring Boot 4.0.5: ~4–8s
Quarkus 3.33.1 JVM: ~0.3–0.8s
Quarkus + AppCDS (JDK 21): ~0.15–0.4s
Quarkus + Leyden (JDK 25): ~148ms (Demo 04)

One Property

# application.properties
quarkus.package.jar.aot.enabled=true
# Build + train
mvn verify   # (not package)
# → runs @QuarkusIntegrationTest
# → writes target/quarkus-app/app.aot

Virtual Threads — @RunOnVirtualThread

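import io.smallrye.common.annotation.RunOnVirtualThread;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

// AllocResponse and doHeavyWork(...) are defined elsewhere in the demo
@Path("/allocate")
@ApplicationScoped
public class GcResource {
    @GET
    @RunOnVirtualThread   // ← One annotation. Done.
    public AllocResponse allocate(
            @QueryParam("mb") int mb) {
        return doHeavyWork(mb);
    }
}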

Container sizing impact

  • Platform thread stacks: 1MB each
  • 200 threads = 200MB off-heap
  • Virtual thread stacks → in heap
  • 10,000 concurrent I/O tasks, same memory
resources:
  requests:
    memory: "256Mi"  # Was 512Mi
  limits:
    memory: "512Mi"
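Outside Quarkus, the same sizing effect can be seen with plain JDK 21 APIs. The sketch below starts many blocking tasks on virtual threads, so no 1MB platform stack is reserved per task; the task count, sleep duration, and class name VirtualThreadDemo are assumptions chosen for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class VirtualThreadDemo {

    /** Runs the given number of blocking tasks, one virtual thread each; returns how many completed. */
    static int runBlockingTasks(int tasks, Duration blockFor) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(blockFor);          // stands in for blocking I/O
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        int done = runBlockingTasks(10_000, Duration.ofMillis(100));
        System.out.println("completed tasks: " + done);
    }
}
```

This only pays off for I/O-bound work; CPU-bound tasks still compete for the same carrier threads.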

Observability — You Can't Tune What You Can't See

JFR (JDK Flight Recorder)

Built-in, <1% overhead. GC events, allocations, IO. jcmd <pid> JFR.start

Cryostat (OpenShift)

OpenShift-native JFR management via Kubernetes operator. Auto-discover pods via annotation.

OTel → Grafana LGTM

quarkus-micrometer-opentelemetry — single extension, all telemetry via OTLP

Essential Metrics

jvm_gc_pause_seconds P99 >500ms → switch GC
jvm_memory_used_bytes heap + off-heap

Required: quarkus.micrometer.distribution.percentiles-histogram.jvm.gc.pause=true — without this, Grafana GC panels show no data
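JFR can also be driven from code, which is handy for capturing a GC baseline in a test before changing flags. A minimal sketch using the jdk.jfr API: it records jdk.GarbageCollection events around a workload and reports the longest pause. The class name GcPauseRecorder and the synthetic workload are hypothetical, and the jdk.jfr module is assumed to be present in the runtime image.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class GcPauseRecorder {

    /** Records GC events while the workload runs and returns the longest single pause observed. */
    static Duration longestGcPause(Runnable workload) {
        try {
            Path file = Files.createTempFile("gc-pauses", ".jfr");
            try (Recording recording = new Recording()) {
                recording.enable("jdk.GarbageCollection");
                recording.start();
                workload.run();
                recording.stop();
                recording.dump(file);
            }
            Duration longest = Duration.ZERO;
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                if (!event.getEventType().getName().equals("jdk.GarbageCollection")) continue;
                Duration pause = event.getDuration("longestPause");
                if (pause.compareTo(longest) > 0) longest = pause;
            }
            Files.deleteIfExists(file);
            return longest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Duration longest = longestGcPause(() -> {
            byte[][] garbage = new byte[256][];
            for (int i = 0; i < 20_000; i++) {
                garbage[i % garbage.length] = new byte[64 * 1024]; // churn to provoke young collections
            }
            System.gc();
        });
        System.out.println("longest GC pause: " + longest.toMillis() + " ms");
    }
}
```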

Autoscaling — HPA with JVM-Aware Metrics

spec:
  minReplicas: 2     # NEVER 1 — single pod + GC STW = 100% downtime
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120  # Absorb GC CPU spikes up to 2min
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
  metrics:
  - type: External
    external:
      metric: { name: http_requests_per_second }
      target: { type: AverageValue, averageValue: "50" }
  - type: External
    external:
      metric: { name: jvm_memory_used_ratio }
      target: { type: AverageValue, averageValue: "0.80" }
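For an AverageValue target, the HPA's proposal is the total metric divided by the per-pod target, rounded up and clamped to the replica bounds. The sketch below shows that arithmetic for the RPS metric above; the maxReplicas value is an assumption because the manifest fragment does not show one, and the class name HpaMath is hypothetical.

```java
final class HpaMath {

    /** Replicas proposed for an AverageValue target: ceil(total / per-pod target), clamped to the bounds. */
    static int desiredReplicas(double totalMetric, double targetPerPod, int minReplicas, int maxReplicas) {
        int raw = (int) Math.ceil(totalMetric / targetPerPod);
        return Math.max(minReplicas, Math.min(maxReplicas, raw));
    }

    public static void main(String[] args) {
        int min = 2;   // from the manifest
        int max = 10;  // assumption: not shown in the manifest
        System.out.println("420 rps -> " + desiredReplicas(420, 50, min, max) + " replicas");
        System.out.println(" 30 rps -> " + desiredReplicas(30, 50, min, max) + " replicas (floor at minReplicas)");
    }
}
```

When several metrics are configured, the HPA takes the largest proposal, so the memory ratio metric can still force a scale-out when RPS is low.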

Systematic Tuning Workflow

  1. Instrument
  2. Baseline
  3. Diagnose
  4. Tune
  5. Validate

40-60% Memory reduction after right-sizing
2-3× Pod density per node
55% Startup reduction with AppCDS
$$$ Node savings from bin-packing

DEMO 01

Container-Aware
Heap Sizing

  • Run WITHOUT UseContainerSupport → JVM claims host RAM
  • Run WITH UseContainerSupport + MaxRAMPercentage=75 → respects 512MB
  • Live jcmd output showing heap sizes before and after
  • OOMKill simulation when JVM ignores container limits
cd demo-01-heap-sizing
./demo.sh

DEMO 02

GC Monitoring
with Prometheus

  • Quarkus 3.33.1 + quarkus-micrometer-opentelemetry + Grafana LGTM
  • Live GC pause histograms at /q/metrics
  • Generate GC pressure — watch metrics AND traces simultaneously
  • G1GC vs Generational ZGC side-by-side pause comparison
  • Virtual threads: 500 concurrent tasks, minimal platform thread count
cd quarkus-demo-02-gc-monitoring
./demo.sh   # starts podman-compose stack

DEMO 03

AppCDS Startup
Acceleration

  • Quarkus baseline: ~0.3-0.8s (already 10× faster than Spring Boot)
  • quarkus.package.jar.aot.enabled=true — one property
  • Maven plugin handles training on @QuarkusIntegrationTest suite
  • Quarkus + AOT Cache: ~0.15-0.4s (30-50% additional gain)
  • Progression: AppCDS (JDK 21) → Leyden -XX:AOTCache (JDK 25)
cd quarkus-demo-03-appcds
./demo.sh

Key Takeaways

  1. Keep UseContainerSupport on (the Java 21 default) and size the heap with MaxRAMPercentage: hardcoded -Xmx is a container anti-pattern
  2. Right-size first, then tune — measure RSS + off-heap before setting requests/limits
  3. Match GC to workload — G1GC general, ZGC/Shenandoah for latency-sensitive APIs
  4. Quarkus AppCDS: one property, quarkus.package.jar.aot.enabled=true. Already 5-10× faster than Spring Boot
  5. Observe before you tune — JFR + Cryostat + Prometheus validates every change
  6. Autoscale on RPS not CPU — GC pauses lie to HPA. Use @RunOnVirtualThread
  7. Quantify savings — track cost per namespace to show business value from engineering work

Resources & Q&A

📗 Optimizing Cloud Native Java — Benjamin Evans et al. · O'Reilly

📗 SRE with Java Microservices — Jonathan Schneider · O'Reilly

🔗 Demo Repo: github.com/patterncatalyst/quarkus-optimization

Grafana JVM: Dashboard 4701
JVM Tuning: Red Hat JVM Guide

Questions?

Project Leyden — JVM AOT Cache

Release | JEP | What it adds | Gain
JDK 24 | 483 | AOT class loading & linking | ~40% startup
JDK 25 LTS | 514+515 | Ergonomics + JIT method profiles | ~75% startup (Demo 04: 609→148ms)
JDK 26 | 516 | ZGC support: no longer have to choose | +ZGC compat
Future | | Pre-compiled native code in cache | Instant peak perf
Quarkus 3.33.1: quarkus.package.jar.aot.enabled=true — one property, all JDK versions, cache automatically richer on each upgrade

DEMO 04 · JDK 25 LTS

Quarkus + Project Leyden
AOT Cache

  • One property: quarkus.package.jar.aot.enabled=true
  • Build trains on @QuarkusIntegrationTest suite
  • Output: app.aot alongside quarkus-run.jar
  • 609ms → 148ms startup (−75%)
cd quarkus-demo-04-leyden
./demo.sh   # JDK 25 required (in container)

REST vs gRPC — Inside the Cluster

REST / JSON

  • HTTP/1.1 (or 2)
  • JSON text (~400 bytes)
  • One request in flight per HTTP/1.1 connection (no multiplexing)
  • SSE / WebSocket only for streaming
  • curl friendly ✅
  • Browser native ✅

gRPC / Protobuf

  • HTTP/2 always
  • Binary Protobuf (~40 bytes)
  • Multiplexed, persistent
  • Built-in streaming (4 modes) ✅
  • Needs grpcurl / Postman
  • Generated stubs
⚠️ Localhost caveat: gRPC unary is SLOWER than REST on localhost — network cost is zero. gRPC wins streaming and high concurrency (c=500) even locally.

Why the JVM Breaks Latency SLAs

G1GC — Default

  • Young GC pause: 10–200ms
  • Mixed GC pause: 50–500ms
  • Full GC (worst): 1–10s
  • Pauses SCALE with heap size
  • CPU spike → HPA false scale-out

ZGC Generational — JDK 21+

  • All pauses: <1ms
  • Scales with thread count, not heap
  • Load barrier overhead: ~5-15%
  • Smooth CPU profile → no HPA thrash
-XX:+UseZGC
-XX:+ZGenerational

Cost Impact Analysis & Business Case

$6,720

annual saving · 2 nodes × $0.384/hr × 8,760 hrs ≈ $6,728 · this cluster alone

💰 Direct savings

2 nodes eliminated · $1,120 → $560/month

⏱ Engineering cost

~4 hours · rolling restarts · 17× ROI

📈 Indirect benefits

HPA stability · VPA trustworthy · correct thresholds

🏢 At scale

10 clusters = $67,200/year · OpenShift Cost Management
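The node-cost arithmetic is simple enough to check directly from the slide's inputs (2 nodes, $0.384/hr, 8,760 hours per year). In the sketch below, the 730-hour month is an assumption (8,760 / 12) and the class name NodeSavings is hypothetical.

```java
final class NodeSavings {

    static final int HOURS_PER_YEAR = 8_760;
    static final int HOURS_PER_MONTH = 730; // assumption: 8,760 / 12

    /** Yearly cost of the given number of nodes at an hourly rate. */
    static double annualCost(int nodes, double hourlyRate) {
        return nodes * hourlyRate * HOURS_PER_YEAR;
    }

    /** Monthly cost of the given number of nodes at an hourly rate. */
    static double monthlyCost(int nodes, double hourlyRate) {
        return nodes * hourlyRate * HOURS_PER_MONTH;
    }

    public static void main(String[] args) {
        double rate = 0.384;
        System.out.printf("4 nodes: $%.2f/month%n", monthlyCost(4, rate));
        System.out.printf("2 nodes: $%.2f/month%n", monthlyCost(2, rate));
        System.out.printf("saving from removing 2 nodes: $%.2f/year%n", annualCost(2, rate));
        System.out.printf("same saving across 10 clusters: $%.2f/year%n", 10 * annualCost(2, rate));
    }
}
```

Removing two nodes saves about $560 per month, or roughly $6,700 per year per cluster, which matches the $67,200 per year quoted for ten clusters.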

Project Panama — The End of JNI

JNI (1996)

  • Write Java + C header + C wrapper
  • Compile C per platform/arch
  • Manual native memory — leaks kill JVM
  • JNI crash = no Java stack trace
  • sun.misc.Unsafe: private API, breaks each JDK

Panama FFM (JDK 22 — finalized)

try (Arena arena = Arena.ofConfined()) {
  MemorySegment data =
    arena.allocateFrom(JAVA_DOUBLE, arr);
  int result = (int) methodHandle
    .invoke(data, arr.length, outP99);
} // freed here — zero leaks possible

Project Valhalla — Closing the 30-Year Gap

Today: Identity Class

// Heap object, pointer in array
// 12–16-byte header per element (default 64-bit HotSpot)
// GC-tracked, every allocation
record Point(double x, double y) {}

Valhalla (JEP 401, early-access builds)

// Stored inline — no header
// x0,y0,x1,y1 densely packed
// GC never sees it
value class Point {
  double x; double y;
}

📉 Memory

List<double>: 1× vs List<Double>: 3×. Pod requests cut up to 50%.

♻️ GC Pressure

Value types: zero heap allocation, zero GC tracking. HPA stays quiet.

⚡ Cache Performance

Sequential memory access. L1/L2 cache-friendly. SIMD-friendly layout.

📅 Timeline

Value classes (JEP 401) in Valhalla early-access builds, not yet in mainline JDK 25. Universal generics (List<int>) after primitive classes. Stable ~JDK 27-29.
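Until value classes ship, the dense layout they promise can be approximated by hand with a primitive array, and the memory gap can be estimated with simple arithmetic. This is a sketch under assumptions: the 12-byte header and 4-byte compressed reference are typical 64-bit HotSpot defaults rather than guarantees, array headers are ignored, and the class name FlatPoints is hypothetical.

```java
final class FlatPoints {

    private final double[] xy; // x0, y0, x1, y1, ... stored contiguously

    FlatPoints(int count) {
        xy = new double[2 * count];
    }

    void set(int i, double x, double y) {
        xy[2 * i] = x;
        xy[2 * i + 1] = y;
    }

    double x(int i) { return xy[2 * i]; }
    double y(int i) { return xy[2 * i + 1]; }

    /** Estimated bytes for n heap-allocated points: aligned (header + two doubles) plus one array reference each. */
    static long boxedBytes(int n, int headerBytes, int refBytes) {
        long objectSize = (headerBytes + 16 + 7) / 8 * 8; // round up to 8-byte alignment
        return n * (objectSize + refBytes);
    }

    /** Estimated bytes for n points stored inline: two doubles each. */
    static long flatBytes(int n) {
        return n * 16L;
    }

    public static void main(String[] args) {
        FlatPoints points = new FlatPoints(3);
        points.set(1, 2.5, 4.0);
        System.out.println("point 1 = (" + points.x(1) + ", " + points.y(1) + ")");
        System.out.println("1M boxed points: " + boxedBytes(1_000_000, 12, 4) + " bytes");
        System.out.println("1M flat points : " + flatBytes(1_000_000) + " bytes");
    }
}
```

Under these assumptions the flat layout needs less than half the memory of the object-per-point layout, in line with the slide's claim that pod requests can drop by up to 50%.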

Common JVM Anti-Patterns on Kubernetes

🧠 Memory

  • ❌ Hardcoded -Xmx/-Xms
  • ❌ MaxRAMPercentage=90 — starves off-heap
  • ❌ No -XX:MaxMetaspaceSize

⚙️ GC & CPU

  • ❌ Default ParallelGCThreads on large node
  • ❌ CPU-based HPA with Java workloads
  • ❌ minReplicas: 1 in HPA
  • ❌ No stabilizationWindowSeconds

🚀 AOT / Startup

  • ❌ @QuarkusTest for AOT training
  • ❌ Manual -XX:AOTCache in Dockerfile
  • ❌ mvn package instead of mvn verify
  • ❌ Ignoring JDK version on rebuild

👁 Observability

  • ❌ No GC pause histogram
  • ❌ Separate prometheus + otel extensions
  • ❌ No PrometheusRule on jvm_gc_pause
  • ❌ Tuning JVM flags without baseline

Anti-Pattern Remediation

✅ Memory Fixes

  • -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0
  • → Use 75%, not 90% — reserve 25% for off-heap
  • → Add -XX:MaxMetaspaceSize=256m

✅ GC & CPU Fixes

  • -XX:ParallelGCThreads=N (= CPU request)
  • → HPA on RPS, not CPU (KEDA or Prometheus Adapter)
  • minReplicas: 2 minimum
  • stabilizationWindowSeconds: 120

✅ AOT / Startup Fixes

  • → Use @QuarkusIntegrationTest, not @QuarkusTest
  • → Don't add -XX:AOTCache — Quarkus sets it automatically
  • → Run mvn verify, not mvn package
  • → Pin JDK minor version in Dockerfile FROM

✅ Observability Fixes

  • percentiles-histogram.jvm.gc.pause=true
  • → Use quarkus-micrometer-opentelemetry (unified)
  • → PrometheusRule: GC P99 >500ms for 2m → alert
  • → Baseline first. Change one flag. Measure again.