60-Minute Deep Dive

TAMING THE JVM

Optimizing Java Workloads
on OpenShift & Kubernetes

Quarkus 3.33.1 LTS · Java 21 · G1GC / ZGC / Shenandoah · AppCDS · Virtual Threads

Based on: Optimizing Cloud Native Java | SRE with Java Microservices | Quarkus 3.33.1 LTS

github.com/patterncatalyst/quarkus-optimization

Agenda

01

Container-Native JVM Fundamentals

02

Right-Sizing Java Workloads

03

Garbage Collection Optimization

04

Startup Time Reduction (AppCDS)

05

Observability & Instrumentation

06

Autoscaling Integration

07

Systematic Tuning & Cost ROI

Bonus

Leyden · gRPC · Latency · Panama · Valhalla

THE PROBLEM

Why Java + Kubernetes = Complexity

60% of Java apps
overprovision memory

4–8s typical JVM cold
start on Kubernetes

2–3× infrastructure waste
from poor bin-packing

$$$ unnecessary cloud
spend each month

A JVM without container support (older JDK 8 builds, or -XX:-UseContainerSupport) sees the NODE's full RAM and sizes its heap against 64 GB inside a 512 MB container → OOMKill

SECTION 01

Container-Native JVM Fundamentals

❌ Before

# Hardcoded — breaks with resize / VPA
-Xms512m -Xmx2048m

Without container support: JVM sees host RAM
Sizes heap against 64GB inside 512MB container

✅ Java 21

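-XX:MaxRAMPercentage=75.0         # max heap = 75% of container limit
-XX:InitialRAMPercentage=50.0     # initial heap = 50% of container limit
-XX:MinRAMPercentage=25.0         # max heap on very small containers (not a minimum heap)
-XX:NativeMemoryTracking=summary  # enables jcmd <pid> VM.native_memory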

UseContainerSupport is ON by default in Java 21. Reads cgroup limits correctly. cgroup v2 (RHEL 9 / OCP 4.14+): reads /sys/fs/cgroup/memory.max
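One way to confirm the flags took effect is to ask the running JVM what it actually sees. The sketch below is illustrative only: `HeapCheck` and `expectedMaxHeapBytes` are hypothetical names, not taken from the demo repo, and the 512 MiB limit is just the example figure from this slide.

```java
// Hypothetical helper, not part of the demo repo.
class HeapCheck {

    // Heap ceiling implied by a container memory limit and a MaxRAMPercentage value.
    static long expectedMaxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        // What this JVM detected; inside a container these reflect the cgroup limits.
        System.out.println("max heap (MiB):       " + rt.maxMemory() / mib);
        System.out.println("available processors: " + rt.availableProcessors());
        // Assumed example: 512 MiB limit with -XX:MaxRAMPercentage=75.0
        System.out.println("expected heap (MiB):  " + expectedMaxHeapBytes(512 * mib, 75.0) / mib);
    }
}
```

Run inside the container, the reported max heap should land close to the expected figure; small differences are normal because the JVM aligns heap sizes internally.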

SECTION 01

JVM Memory Regions — Six Buckets, Not One

Region	Typical Size	Controlled By
Heap (Old + Young Gen)	50–75%	`MaxRAMPercentage`
Metaspace	50–200 MB	`-XX:MaxMetaspaceSize=256m`
Platform Thread Stacks	1 MB/thread	`-Xss` or Virtual Threads
Native Memory (JIT, GC)	100–300 MB	—
Direct ByteBuffers	Varies	Netty / NIO config
GC Bookkeeping	50–100 MB	—

Java 21 Virtual Threads: stacks live in heap as tiny continuations — eliminates 1MB/thread platform thread stack budget for I/O-bound workloads
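The six buckets can be added up as a rough budget before picking a memory limit. This is a minimal sketch with assumed numbers (a 2048 MiB container and mid-range values from the table); `MemoryBudget` and its parameters are hypothetical, and real figures should come from Native Memory Tracking output.

```java
// Hypothetical budget calculator; all sizes are in MiB.
class MemoryBudget {

    // Sums the six regions from the table above into one estimated footprint.
    static long totalMiB(long heap, long metaspace, int platformThreads,
                         long stackPerThread, long nativeMem, long directBuffers,
                         long gcBookkeeping) {
        long threadStacks = (long) platformThreads * stackPerThread;
        return heap + metaspace + threadStacks + nativeMem + directBuffers + gcBookkeeping;
    }

    public static void main(String[] args) {
        long containerLimit = 2048;                 // assumed container limit
        long heap = containerLimit * 75 / 100;      // MaxRAMPercentage=75 -> 1536 MiB
        long total = totalMiB(heap, 100, 50, 1, 150, 64, 64);
        System.out.println("estimated footprint (MiB): " + total);
        System.out.println("headroom (MiB):            " + (containerLimit - total));
    }
}
```

With these assumed inputs the estimate leaves only a small headroom, which is why a heap share above 75% tends to starve the off-heap regions.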

SECTION 02

Right-Sizing Java Workloads

requests — Scheduling Guarantee

Scheduler uses this to find a node
Set to P50 steady-state RSS
Too high → pods can't schedule
Too low → CPU throttle on full node

limits — Hard Ceiling

Memory exceeded → OOMKill (exit 137)
CPU exceeded → throttled (not killed)
Set memory limit 25-30% above P99 RSS
Set CPU limit 2-4× request to absorb GC spikes

Demo 07: 7-workload analysis · 4 nodes → 2 nodes · +67% pod density · $6,720/year saving ($560/month) · 17× ROI
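The request and limit rules above reduce to two percentiles over observed RSS. A minimal sketch, assuming RSS samples in MiB collected from your own monitoring; `RightSizer` and the sample values are hypothetical, and the nearest-rank percentile is one reasonable choice among several.

```java
import java.util.Arrays;

// Hypothetical helper: derives memory request/limit from RSS samples (MiB).
class RightSizer {

    // Nearest-rank percentile: smallest sample with at least p% of samples at or below it.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Request = P50 steady-state RSS.
    static long memoryRequestMiB(long[] rss) {
        return percentile(rss, 50);
    }

    // Limit = P99 RSS plus headroom (0.25 to 0.30 per the slide).
    static long memoryLimitMiB(long[] rss, double headroom) {
        return Math.round(percentile(rss, 99) * (1.0 + headroom));
    }

    public static void main(String[] args) {
        long[] rss = {300, 310, 320, 330, 340, 350, 360, 370, 380, 400}; // made-up samples
        System.out.println("request (MiB): " + memoryRequestMiB(rss));
        System.out.println("limit   (MiB): " + memoryLimitMiB(rss, 0.25));
    }
}
```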

SECTION 03

GC in Containers: Four Challenges

CPU Throttling Extends GC

CPU limits throttle GC threads mid-pause. 100ms G1GC → 400ms under throttle. Set limit ≥ 2× request.

ParallelGCThreads Default

GC thread counts follow the CPU count the JVM detects. With no CPU limit on a 64-core node, the JVM sees 64 cores and starts dozens of GC threads that compete for a 4-CPU request.

GC-Induced HPA Thrash

GC pause → CPU spike → HPA fires → new pods GC → repeat. Scale on RPS, not CPU.

Heap Sizing vs GC Pressure

Small heap = frequent GC. Too large = infrequent but long GC. Start at MaxRAMPercentage=75.
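To see why detected CPU count matters, the sketch below applies the commonly documented HotSpot default for parallel GC worker threads (all CPUs up to 8, then 5/8 of each additional CPU). Treat it as an approximation: the exact value depends on the collector and JDK version, and `GcThreads` is a hypothetical name.

```java
// Hypothetical helper illustrating the usual HotSpot ParallelGCThreads heuristic.
class GcThreads {

    // cpus <= 8 -> one thread per CPU; above 8 -> 8 + 5/8 of the remainder (integer math).
    static int defaultParallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + ((cpus - 8) * 5) / 8;
    }

    public static void main(String[] args) {
        int detected = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs detected by this JVM: " + detected);
        System.out.println("estimated GC threads:      " + defaultParallelGcThreads(detected));
        System.out.println("on a 64-core host view:    " + defaultParallelGcThreads(64));
        System.out.println("on a 4-CPU container view: " + defaultParallelGcThreads(4));
    }
}
```

Comparing the detected CPU count with the pod's CPU request is a quick check for whether an explicit -XX:ParallelGCThreads value is worth setting.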

SECTION 03

GC Selection Guide

Collector	Pause	Best For	Key Flags
G1GC	50–300ms	General purpose, Temurin/Corretto default	`-XX:+UseG1GC -XX:MaxGCPauseMillis=200`
Shenandoah	1–20ms	UBI9 default — Red Hat images ship this	`-XX:+UseShenandoahGC`
ZGC (Gen)	<1ms	Low-latency APIs, any heap size, HPA stability	`-XX:+UseZGC -XX:+ZGenerational`
Serial GC	STW	CLI tools, batch, <256MB heap only	`-XX:+UseSerialGC`

Note: UBI9 ships Shenandoah. Demos 02 and 06 override to -XX:+UseG1GC / -XX:+UseZGC for clean comparison.
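Because the default collector differs between base images, it helps to check which one a pod is actually running. A small sketch using the standard `java.lang.management` API; `ActiveCollector` is a hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: reports the collectors registered in the running JVM.
class ActiveCollector {

    // Bean names typically start with the collector family, e.g. "G1", "ZGC", "Shenandoah".
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated time since JVM start.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
    }
}
```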

SECTION 04

Startup Time Reduction

Spring Boot vs Quarkus Baseline

Spring Boot 4.0.5	~4–8s
Quarkus 3.33.1 JVM	~0.3–0.8s
Quarkus + AppCDS (JDK 21)	~0.15–0.4s
Quarkus + Leyden (JDK 25)	~148ms (Demo 04)

One Property

# application.properties
quarkus.package.jar.aot.enabled=true

# Build + train
mvn verify   # (not package)
# → runs @QuarkusIntegrationTest
# → writes target/quarkus-app/app.aot

SECTION 04

Virtual Threads — @RunOnVirtualThread

@Path("/allocate")
@ApplicationScoped
public class GcResource {
    @GET
    @RunOnVirtualThread   // ← One annotation. Done.
    public AllocResponse allocate(
            @QueryParam("mb") int mb) {
        return doHeavyWork(mb);
    }
}

Container sizing impact

Platform thread stacks: 1MB each
200 threads = 200MB off-heap
Virtual thread stacks → in heap
10,000 concurrent I/O tasks, same memory

resources:
  requests:
    memory: "256Mi"  # Was 512Mi
  limits:
    memory: "512Mi"
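Outside Quarkus, the same sizing effect can be observed with the plain Java 21 API. The sketch below fans out 10,000 blocking tasks, one virtual thread each; `VirtualThreadFanOut` is a hypothetical name and `Thread.sleep` stands in for real I/O.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; requires Java 21 or later.
class VirtualThreadFanOut {

    // Runs `tasks` blocking jobs on virtual threads and returns how many completed.
    static int runBlockingTasks(int tasks, Duration simulatedIo) {
        AtomicInteger completed = new AtomicInteger();
        // close() at the end of try-with-resources waits for all submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(simulatedIo); // parks the virtual thread, frees the carrier
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.incrementAndGet();
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int done = runBlockingTasks(10_000, Duration.ofMillis(50));
        System.out.println("completed tasks: " + done);
    }
}
```

The same workload on platform threads would reserve stack space per thread outside the heap, which is the budget the lower memory request above assumes is no longer needed.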

SECTION 05

Observability — You Can't Tune What You Can't See

JFR (JDK Flight Recorder)

Built-in, <1% overhead. GC events, allocations, IO. jcmd <pid> JFR.start

Cryostat (OpenShift)

OpenShift-native JFR management via Kubernetes operator. Auto-discover pods via annotation.

OTel → Grafana LGTM

quarkus-micrometer-opentelemetry — single extension, all telemetry via OTLP

Essential Metrics

jvm_gc_pause_seconds P99 >500ms → switch GC
jvm_memory_used_bytes heap + off-heap

Required: quarkus.micrometer.distribution.percentiles-histogram.jvm.gc.pause=true — without this, Grafana GC panels show no data
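For a quick look at heap versus non-heap usage without a metrics stack, the platform MXBeans expose the raw numbers directly. A minimal sketch; `MemorySnapshot` is a hypothetical name. Note that "non-heap" here covers areas such as Metaspace and the code cache, not direct buffers or thread stacks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: point-in-time heap and non-heap usage of this JVM.
class MemorySnapshot {

    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    static long nonHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    // Used heap divided by max heap, or -1.0 when the max is undefined.
    static double heapUsedRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("heap:     " + memory.getHeapMemoryUsage());
        System.out.println("non-heap: " + memory.getNonHeapMemoryUsage());
        System.out.printf("heap used ratio: %.2f%n", heapUsedRatio());
    }
}
```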

SECTION 06

Autoscaling — HPA with JVM-Aware Metrics

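apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: quarkus-app-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quarkus-app          # hypothetical Deployment
  minReplicas: 2     # NEVER 1: single pod + GC STW = 100% downtime
  maxReplicas: 10    # assumed ceiling, size to your cluster
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120  # Absorb GC CPU spikes up to 2min
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
  metrics:
  - type: External
    external:
      metric: { name: http_requests_per_second }
      target: { type: AverageValue, averageValue: "50" }
  - type: External
    external:
      metric: { name: jvm_memory_used_ratio }
      target: { type: AverageValue, averageValue: "0.80" }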
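The replica count HPA derives from these targets follows the documented formula desired = ceil(current × metric / target), skipped when the ratio is within the default 10% tolerance. The sketch below reproduces that arithmetic; `HpaMath` is a hypothetical name and the max of 10 replicas is an assumption.

```java
// Hypothetical helper mirroring the Kubernetes HPA scaling formula.
class HpaMath {

    static final double TOLERANCE = 0.10; // Kubernetes default tolerance

    // Returns the replica count HPA would target for one metric.
    static int desiredReplicas(int current, double currentMetric, double target,
                               int minReplicas, int maxReplicas) {
        double ratio = currentMetric / target;
        if (Math.abs(ratio - 1.0) <= TOLERANCE) {
            return current; // close enough to target: no scaling
        }
        int desired = (int) Math.ceil(current * ratio);
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // 2 pods averaging 120 RPS each against a 50 RPS target
        System.out.println(desiredReplicas(2, 120.0, 50.0, 2, 10));
        // 4 pods at 52 RPS: within tolerance, stays at 4
        System.out.println(desiredReplicas(4, 52.0, 50.0, 2, 10));
    }
}
```

When several metrics are configured, HPA evaluates each one and uses the largest resulting replica count.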

SECTION 07

Systematic Tuning Workflow

Instrument

→

Baseline

→

Diagnose

→

Tune

→

Validate

40-60% memory reduction
after right-sizing

2-3× pod density
per node

55% startup reduction
with AppCDS

$$$ node savings
from bin-packing

DEMO 01

Container-Aware
Heap Sizing

Run WITHOUT UseContainerSupport → JVM claims host RAM
Run WITH UseContainerSupport + MaxRAMPercentage=75 → respects 512MB
Live jcmd output showing heap sizes before and after
OOMKill simulation when JVM ignores container limits

cd demo-01-heap-sizing
./demo.sh

DEMO 02

GC Monitoring
with Prometheus

Quarkus 3.33.1 + quarkus-micrometer-opentelemetry + Grafana LGTM
Live GC pause histograms at /q/metrics
Generate GC pressure — watch metrics AND traces simultaneously
G1GC vs Generational ZGC side-by-side pause comparison
Virtual threads: 500 concurrent tasks, minimal platform thread count

cd quarkus-demo-02-gc-monitoring
./demo.sh   # starts podman-compose stack

DEMO 03

AppCDS Startup
Acceleration

Quarkus baseline: ~0.3-0.8s (already 5-10× faster than Spring Boot)
quarkus.package.jar.aot.enabled=true — one property
Maven plugin handles training on @QuarkusIntegrationTest suite
Quarkus + AOT Cache: ~0.15-0.4s (30-50% additional gain)
Progression: AppCDS (JDK 21) → Leyden -XX:AOTCache (JDK 25)

cd quarkus-demo-03-appcds
./demo.sh

Key Takeaways

Keep UseContainerSupport on (the default) and set MaxRAMPercentage. Hardcoded -Xmx is a container anti-pattern
Right-size first, then tune — measure RSS + off-heap before setting requests/limits
Match GC to workload — G1GC general, ZGC/Shenandoah for latency-sensitive APIs
Quarkus AppCDS: one property — quarkus.package.jar.aot.enabled=true. Already 5-10× faster than Spring Boot
Observe before you tune — JFR + Cryostat + Prometheus validates every change
Autoscale on RPS not CPU — GC pauses lie to HPA. Use @RunOnVirtualThread
Quantify savings — track cost per namespace to show business value from engineering work

Resources & Q&A

📗 Optimizing Cloud Native Java — Benjamin Evans et al. · O'Reilly

📗 SRE with Java Microservices — Jonathan Schneider · O'Reilly

🔗 Demo Repo: github.com/patterncatalyst/quarkus-optimization

Quarkus AOT: quarkus.io/guides/aot

Grafana JVM: Dashboard 4701

Virtual Threads: quarkus.io/guides/virtual-threads

JVM Tuning: Red Hat JVM Guide

Questions?

LEYDEN

Project Leyden — JVM AOT Cache

Release	JEP	What it adds	Gain
JDK 24	483	AOT class loading & linking	~40% startup
JDK 25 LTS	514+515	Ergonomics + JIT method profiles	~75% startup (Demo 04: 609→148ms)
JDK 26	516	ZGC support — no longer have to choose	+ZGC compat
Future	—	Pre-compiled native code in cache	Instant peak perf

Quarkus 3.33.1: quarkus.package.jar.aot.enabled=true — one property, all JDK versions, cache automatically richer on each upgrade

DEMO 04 · JDK 25 LTS

Quarkus + Project Leyden
AOT Cache

One property: quarkus.package.jar.aot.enabled=true
Build trains on @QuarkusIntegrationTest suite
Output: app.aot alongside quarkus-run.jar
609ms → 148ms startup (−75%)

cd quarkus-demo-04-leyden
./demo.sh   # JDK 25 required (in container)
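The startup gain quoted for this demo is simple arithmetic over the two measurements. A small sketch for checking your own before/after numbers; `StartupGain` is a hypothetical name, and the JVM uptime printed in `main` is only a rough proxy for application startup time.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper for comparing startup measurements.
class StartupGain {

    // Percentage reduction going from beforeMs to afterMs.
    static double reductionPercent(double beforeMs, double afterMs) {
        return (beforeMs - afterMs) / beforeMs * 100.0;
    }

    public static void main(String[] args) {
        // Demo 04 figures from the slide: 609 ms without the cache, 148 ms with it.
        System.out.printf("reduction: %.1f%%%n", reductionPercent(609, 148));
        // Milliseconds since this JVM started, as a rough self-measurement.
        System.out.println("JVM uptime (ms): "
                + ManagementFactory.getRuntimeMXBean().getUptime());
    }
}
```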

gRPC

REST vs gRPC — Inside the Cluster

REST / JSON

HTTP/1.1 (or 2)
JSON text (~400 bytes)
Connection reuse depends on keep-alive / client pooling
SSE / WebSocket only for streaming
curl friendly ✅
Browser native ✅

gRPC / Protobuf

HTTP/2 always
Binary Protobuf (~40 bytes)
Multiplexed, persistent
Built-in streaming (4 modes) ✅
Needs grpcurl / Postman
Generated stubs

⚠️ Localhost caveat: gRPC unary is SLOWER than REST on localhost — network cost is zero. gRPC wins streaming and high concurrency (c=500) even locally.

LOW LATENCY

Why the JVM Breaks Latency SLAs

G1GC — Default

Young GC pause: 10–200ms
Mixed GC pause: 50–500ms
Full GC (worst): 1–10s
Pauses SCALE with heap size
CPU spike → HPA false scale-out

ZGC Generational — JDK 21+

All pauses: <1ms
Scales with thread count, not heap
Load barrier overhead: ~5-15%
Smooth CPU profile → no HPA thrash

-XX:+UseZGC
-XX:+ZGenerational

RIGHT-SIZING

Cost Impact Analysis & Business Case

$6,720

annual saving · 2 nodes × $0.384/hr × 8,760 hrs (≈ $560/month) · this cluster alone

💰 Direct savings

2 nodes eliminated · $1,120 → $560/month

⏱ Engineering cost

~4 hours · rolling restarts · 17× ROI

📈 Indirect benefits

HPA stability · VPA trustworthy · correct thresholds

🏢 At scale

10 clusters = $67,200/year · OpenShift Cost Management
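The business case can be recomputed from the slide's own inputs (2 nodes, $0.384/hr, 8,760 hrs, ~4 engineering hours). In the sketch below `CostModel` is a hypothetical name and the $100/hr engineering rate is an assumption, chosen because it reproduces roughly the 17× ROI quoted here.

```java
// Hypothetical cost model; rates are examples, not quotes from any provider.
class CostModel {

    static final int HOURS_PER_YEAR = 8_760;

    // Yearly saving from removing nodes billed at a flat hourly rate.
    static double annualSaving(int nodesRemoved, double hourlyRate) {
        return nodesRemoved * hourlyRate * HOURS_PER_YEAR;
    }

    // First-year return per dollar of engineering time spent.
    static double roi(double annualSaving, double engineeringHours, double hourlyCost) {
        return annualSaving / (engineeringHours * hourlyCost);
    }

    public static void main(String[] args) {
        double saving = annualSaving(2, 0.384);
        System.out.printf("annual saving:  $%.2f%n", saving);
        System.out.printf("monthly saving: $%.2f%n", saving / 12);
        System.out.printf("ROI (assumed $100/hr, 4 hrs): %.1fx%n", roi(saving, 4, 100));
        System.out.printf("10 clusters:    $%.2f/year%n", saving * 10);
    }
}
```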

PANAMA

Project Panama — The End of JNI

JNI (1996)

Write Java + C header + C wrapper
Compile C per platform/arch
Manual native memory — leaks kill JVM
JNI crash = no Java stack trace
sun.misc.Unsafe: private API, breaks each JDK

Panama FFM (JDK 22 — finalized)

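// import java.lang.foreign.*; import static java.lang.foreign.ValueLayout.JAVA_DOUBLE;
try (Arena arena = Arena.ofConfined()) {
  MemorySegment data =
    arena.allocateFrom(JAVA_DOUBLE, arr);   // copy double[] into native memory
  MemorySegment outP99 =
    arena.allocate(JAVA_DOUBLE);            // native out-parameter for the result
  int result = (int) methodHandle           // downcall handle obtained from Linker
    .invoke(data, arr.length, outP99);      // invoke declares throws Throwable
} // arena closed here: both segments freed deterministically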

VALHALLA

Project Valhalla — Closing the 30-Year Gap

Today — Identity Class

// Heap object — pointer in array
// 12-16 byte header per element (8 with compact object headers)
// GC-tracked — every allocation
record Point(double x, double y) {}

Valhalla (JEP 401, early-access builds)

// Stored inline — no header
// x0,y0,x1,y1 densely packed
// GC never sees it
value class Point {
  double x; double y;
}

📉 Memory

List<double>: 1× vs List<Double>: 3×. Pod requests cut up to 50%.

♻️ GC Pressure

Value types: zero heap allocation, zero GC tracking. HPA stays quiet.

⚡ Cache Performance

Sequential memory access. L1/L2 cache-friendly. SIMD-friendly layout.

📅 Timeline

JEP 401 (value classes) is in early-access builds, not yet a preview in a GA JDK. Universal generics (List<int>) after primitive classes. Stable estimate: ~JDK 27-29.

ANTI-PATTERNS

Common JVM Anti-Patterns on Kubernetes

🧠 Memory

❌ Hardcoded -Xmx/-Xms
❌ MaxRAMPercentage=90 — starves off-heap
❌ No -XX:MaxMetaspaceSize

⚙️ GC & CPU

❌ Default ParallelGCThreads on large node
❌ CPU-based HPA with Java workloads
❌ minReplicas: 1 in HPA
❌ No stabilizationWindowSeconds

🚀 AOT / Startup

❌ @QuarkusTest for AOT training
❌ Manual -XX:AOTCache in Dockerfile
❌ mvn package instead of mvn verify
❌ Ignoring JDK version on rebuild

👁 Observability

❌ No GC pause histogram
❌ Separate prometheus + otel extensions
❌ No PrometheusRule on jvm_gc_pause
❌ Tuning JVM flags without baseline

ANTI-PATTERNS

Anti-Pattern Remediation

✅ Memory Fixes

→ -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0
→ Use 75%, not 90% — reserve 25% for off-heap
→ Add -XX:MaxMetaspaceSize=256m

✅ GC & CPU Fixes

→ -XX:ParallelGCThreads=N (= CPU request)
→ HPA on RPS, not CPU (KEDA or Prometheus Adapter)
→ minReplicas: 2 minimum
→ stabilizationWindowSeconds: 120

✅ AOT / Startup Fixes

→ Use @QuarkusIntegrationTest, not @QuarkusTest
→ Don't add -XX:AOTCache — Quarkus sets it automatically
→ Run mvn verify, not mvn package
→ Pin JDK minor version in Dockerfile FROM

✅ Observability Fixes

→ percentiles-histogram.jvm.gc.pause=true
→ Use quarkus-micrometer-opentelemetry (unified)
→ PrometheusRule: GC P99 >500ms for 2m → alert
→ Baseline first. Change one flag. Measure again.
