27.1 JIT Compilation Fundamentals
Just-In-Time (JIT) compilation is the cornerstone of Java's performance, dynamically optimizing bytecode into native machine code at runtime.
What is JIT Compilation?
// Understanding JIT Compilation
public class JITFundamentals {

    public static void printJITConcepts() {
        System.out.println("=== JIT COMPILATION OVERVIEW ===");
        System.out.println("\n--- EXECUTION MODES ---");
        System.out.println("\n1. INTERPRETATION");
        System.out.println(" - Bytecode executed directly by interpreter");
        System.out.println(" - Slow but starts immediately");
        System.out.println(" - No compilation overhead");
        System.out.println(" - Used for cold code paths");
        System.out.println("\n2. JIT COMPILATION");
        System.out.println(" - Bytecode compiled to native machine code");
        System.out.println(" - Fast execution after compilation");
        System.out.println(" - Compilation overhead upfront");
        System.out.println(" - Used for hot code paths");
        System.out.println("\n3. AHEAD-OF-TIME (AOT)");
        System.out.println(" - Pre-compiled to native code");
        System.out.println(" - Fast startup, no warmup");
        System.out.println(" - Fewer optimization opportunities");
        System.out.println(" - GraalVM Native Image");
        System.out.println("\n--- WHY JIT? ---");
        System.out.println("✓ Profile-guided optimization");
        System.out.println(" - Compiler sees actual runtime behavior");
        System.out.println(" - Optimizes for real usage patterns");
        System.out.println(" - Adapts to changing workloads");
        System.out.println("\n✓ Aggressive optimizations");
        System.out.println(" - Inlining based on actual call sites");
        System.out.println(" - Escape analysis for stack allocation");
        System.out.println(" - Speculative optimizations");
        System.out.println("\n✓ Platform-specific code");
        System.out.println(" - Generates code for actual CPU");
        System.out.println(" - Uses CPU-specific instructions (SSE, AVX)");
        System.out.println(" - Optimal register allocation");
    }

    public static void printJVMArchitecture() {
        System.out.println("\n=== JVM EXECUTION ARCHITECTURE ===");
        System.out.println("\n--- EXECUTION FLOW ---");
        System.out.println("1. Java source → javac → Bytecode (.class)");
        System.out.println("2. JVM loads bytecode");
        System.out.println("3. Interpreter executes bytecode");
        System.out.println("4. Profiler monitors execution");
        System.out.println("5. Hot methods identified");
        System.out.println("6. JIT compiler generates native code");
        System.out.println("7. Subsequent calls use native code");
        System.out.println("\n--- CODE CACHE ---");
        System.out.println("Purpose: Stores compiled native code");
        System.out.println("Size: Default ~240MB (varies by JVM/platform)");
        System.out.println("Segments:");
        System.out.println(" - Non-method code (JVM internals)");
        System.out.println(" - Profiled code (C1 compiled)");
        System.out.println(" - Non-profiled code (C2 compiled)");
        System.out.println("\nWhen full:");
        System.out.println(" ⚠ JIT compilation stops");
        System.out.println(" ⚠ Methods stay interpreted");
        System.out.println(" ⚠ Performance degrades");
        System.out.println("\nMonitoring:");
        System.out.println(" jstat -compiler <pid>");
        System.out.println(" -XX:+PrintCodeCache");
    }
}
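The concepts above can also be checked against a live JVM. A minimal sketch using the standard `CompilationMXBean` from `java.lang.management` (the class name `JITProbe` is ours):

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Query the JVM's own view of its JIT compiler at runtime
public class JITProbe {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        // Typically something like "HotSpot 64-Bit Tiered Compilers" on HotSpot
        System.out.println("JIT compiler: " + jit.getName());
        if (jit.isCompilationTimeMonitoringSupported()) {
            // Cumulative time the JVM has spent in JIT compilation so far
            System.out.println("Total JIT time: "
                    + jit.getTotalCompilationTime() + " ms");
        }
    }
}
```

Running this early vs. late in an application's life shows the compilation time growing as warmup proceeds.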
Bytecode to Native Code
// Bytecode and Native Code Concepts
public class BytecodeToNative {

    // Example method
    public static int add(int a, int b) {
        return a + b;
    }

    public static void demonstrateBytecode() {
        System.out.println("=== BYTECODE REPRESENTATION ===");
        System.out.println("\nJava source:");
        System.out.println(" public static int add(int a, int b) {");
        System.out.println(" return a + b;");
        System.out.println(" }");
        System.out.println("\nBytecode (javap -c):");
        System.out.println(" 0: iload_0 // Load 'a' from local variable 0");
        System.out.println(" 1: iload_1 // Load 'b' from local variable 1");
        System.out.println(" 2: iadd // Integer add");
        System.out.println(" 3: ireturn // Return integer");
        System.out.println("\n--- INTERPRETER EXECUTION ---");
        System.out.println("Interpreter reads each bytecode:");
        System.out.println(" 1. Fetch bytecode instruction");
        System.out.println(" 2. Decode instruction");
        System.out.println(" 3. Execute operation");
        System.out.println(" 4. Move to next instruction");
        System.out.println(" 5. Repeat");
        System.out.println("\nOverhead:");
        System.out.println(" • Instruction dispatch");
        System.out.println(" • Stack operations");
        System.out.println(" • Operands kept on the stack, not in CPU registers");
        System.out.println(" • ~10-100x slower than native");
        System.out.println("\n--- JIT COMPILED (x86-64 assembly pseudocode) ---");
        System.out.println("Native machine code:");
        System.out.println(" mov eax, edi ; Move 'a' to register");
        System.out.println(" add eax, esi ; Add 'b' to register");
        System.out.println(" ret ; Return");
        System.out.println("\nBenefits:");
        System.out.println(" ✓ Direct CPU execution");
        System.out.println(" ✓ Register allocation");
        System.out.println(" ✓ No interpretation overhead");
        System.out.println(" ✓ ~10-100x faster than interpreter");
    }
}
C1 and C2 Compilers
// JIT Compilers Overview
public class JITCompilers {

    public static void printCompilerDifferences() {
        System.out.println("=== C1 VS C2 COMPILERS ===");
        System.out.println("\n--- C1 COMPILER (CLIENT) ---");
        System.out.println("Purpose: Fast compilation for quick warmup");
        System.out.println("\nCharacteristics:");
        System.out.println(" ✓ Fast compilation (roughly an order of magnitude faster than C2)");
        System.out.println(" ✓ Moderate optimizations");
        System.out.println(" ✓ Includes profiling instrumentation");
        System.out.println(" ✗ Less aggressive optimizations");
        System.out.println("\nOptimizations:");
        System.out.println(" • Basic inlining");
        System.out.println(" • Constant folding");
        System.out.println(" • Dead code elimination");
        System.out.println(" • Local value numbering");
        System.out.println(" • Profiling for C2");
        System.out.println("\nUse case:");
        System.out.println(" - Quick startup");
        System.out.println(" - Client applications");
        System.out.println(" - Warm-up tier for C2");
        System.out.println("\n--- C2 COMPILER (SERVER) ---");
        System.out.println("Purpose: Maximum performance for hot code");
        System.out.println("\nCharacteristics:");
        System.out.println(" ✓ Aggressive optimizations");
        System.out.println(" ✓ Peak performance");
        System.out.println(" ✗ Slower compilation (roughly 10x the compile time of C1)");
        System.out.println(" ✗ Higher memory usage");
        System.out.println("\nOptimizations:");
        System.out.println(" • Aggressive inlining (multiple levels)");
        System.out.println(" • Escape analysis");
        System.out.println(" • Loop optimizations (unrolling, vectorization)");
        System.out.println(" • Global value numbering");
        System.out.println(" • Range check elimination");
        System.out.println(" • Lock coarsening/elision");
        System.out.println(" • Intrinsic methods");
        System.out.println(" • Speculative optimizations");
        System.out.println("\nUse case:");
        System.out.println(" - Long-running applications");
        System.out.println(" - Server workloads");
        System.out.println(" - Maximum throughput");
        System.out.println("\n--- COMPARISON ---");
        System.out.println();
        System.out.println("| Aspect          | C1              | C2                |");
        System.out.println("|-----------------|-----------------|-------------------|");
        System.out.println("| Compilation     | Fast            | ~10x slower       |");
        System.out.println("| Code quality    | Moderate        | Excellent         |");
        System.out.println("| Memory usage    | Low             | High              |");
        System.out.println("| Inlining depth  | 2-3 levels      | 8+ levels         |");
        System.out.println("| Profiling       | Yes (adds data) | No (uses data)    |");
        System.out.println("| Use case        | Warmup          | Peak performance  |");
    }
}
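Which tiers are active can be inspected from inside the process. A hedged sketch using the HotSpot-specific `HotSpotDiagnosticMXBean` (part of the JDK's `com.sun.management` package, not the Java SE standard; class name `TieredFlags` is ours):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Read the tiered-compilation -XX flags of the running HotSpot JVM
public class TieredFlags {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hs = ManagementFactory
                .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // "true" by default on modern HotSpot; -XX:-TieredCompilation disables it
        System.out.println("TieredCompilation = "
                + hs.getVMOption("TieredCompilation").getValue());
        // 4 = full C1+C2 pipeline; -XX:TieredStopAtLevel=1 stops at C1
        System.out.println("TieredStopAtLevel = "
                + hs.getVMOption("TieredStopAtLevel").getValue());
    }
}
```

Running with `-XX:TieredStopAtLevel=1` restricts compilation to C1, which can be a reasonable trade for short-lived, startup-sensitive tools.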
Compilation Thresholds
// Compilation Triggers
public class CompilationThresholds {

    public static void printThresholdConcepts() {
        System.out.println("=== COMPILATION THRESHOLDS ===");
        System.out.println("\n--- INVOCATION COUNTER ---");
        System.out.println("Tracks method invocations");
        System.out.println("Threshold: Method compiled after N invocations");
        System.out.println("\nApproximate defaults:");
        System.out.println(" C1 (Tier 3): ~2,000 invocations");
        System.out.println(" C2 (Tier 4): ~10,000 invocations");
        System.out.println("\n--- BACK-EDGE COUNTER ---");
        System.out.println("Tracks loop iterations");
        System.out.println("Purpose: Compile hot loops even without many method calls");
        System.out.println("\nExample:");
        System.out.println(" void processData() {");
        System.out.println(" // Called once");
        System.out.println(" for (int i = 0; i < 1_000_000; i++) {");
        System.out.println(" // Loop body executes 1M times");
        System.out.println(" // Back-edge counter triggers compilation");
        System.out.println(" }");
        System.out.println(" }");
        System.out.println("\n--- ON-STACK REPLACEMENT (OSR) ---");
        System.out.println("Definition: Replace interpreted code while method is running");
        System.out.println("\nScenario:");
        System.out.println(" 1. Long-running loop in interpreter");
        System.out.println(" 2. Back-edge counter reaches threshold");
        System.out.println(" 3. JIT compiles loop");
        System.out.println(" 4. Execution jumps from interpreter to compiled code");
        System.out.println(" 5. Loop continues in compiled code");
        System.out.println("\nBenefit:");
        System.out.println(" ✓ Don't wait for method to complete");
        System.out.println(" ✓ Long-running methods benefit immediately");
        System.out.println("\n--- TUNING THRESHOLDS ---");
        System.out.println("\nCompileThreshold (only honored when tiered compilation is disabled):");
        System.out.println(" -XX:CompileThreshold=10000");
        System.out.println(" Default: 10,000");
        System.out.println(" Lower: Earlier compilation");
        System.out.println(" Higher: More profiling, later compilation");
        System.out.println("\nTiered compilation thresholds:");
        System.out.println(" -XX:Tier3InvocationThreshold=2000");
        System.out.println(" -XX:Tier4InvocationThreshold=15000");
        System.out.println("\n⚠ CAUTION");
        System.out.println(" • Lower thresholds = faster warmup");
        System.out.println(" • But less profiling data");
        System.out.println(" • May hurt peak performance");
        System.out.println(" • Usually leave at defaults");
    }

    // Example: Method that will be compiled
    public static long hotMethod(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void demonstrateCompilation() {
        System.out.println("\n=== COMPILATION DEMONSTRATION ===");
        System.out.println("\nRun with: -XX:+PrintCompilation");
        System.out.println("Output format:");
        System.out.println(" timestamp compile_id tier method_name size");
        System.out.println("\nExample output:");
        System.out.println(" 100 1 3 java.lang.String::hashCode (55 bytes)");
        System.out.println(" 150 2 4 java.util.HashMap::get (23 bytes)");
        System.out.println(" ^^^ ^^^ ^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^");
        System.out.println(" time id tier method (size in bytes)");

        // Trigger compilation by calling the method many times; consuming
        // the result keeps the JIT from eliminating the calls as dead code
        System.out.println("\nCalling hotMethod() repeatedly to trigger compilation...");
        long sink = 0;
        for (int i = 0; i < 20_000; i++) {
            sink += hotMethod(1000);
        }
        System.out.println("Method should be compiled by now (check -XX:+PrintCompilation output; sink=" + sink + ")");
    }
}
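The OSR scenario above is easy to reproduce: OSR compilations show up in `-XX:+PrintCompilation` output with a `%` marker. A minimal sketch (class name `OSRDemo` is ours):

```java
// Run with: java -XX:+PrintCompilation OSRDemo
// main() is invoked only once, so its invocation counter stays low;
// the back-edge counter of the loop triggers an OSR compilation instead,
// which -XX:+PrintCompilation marks with '%'.
public class OSRDemo {
    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 10_000_000; i++) {
            sum += i;
        }
        // Printing the result also keeps the loop from being eliminated
        System.out.println(sum); // 49999995000000
    }
}
```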
Code Cache
// Code Cache Management
public class CodeCacheManagement {

    public static void printCodeCacheDetails() {
        System.out.println("=== CODE CACHE ===");
        System.out.println("\n--- WHAT IS CODE CACHE? ---");
        System.out.println("Memory region for compiled native code");
        System.out.println("Stores:");
        System.out.println(" • JIT-compiled methods");
        System.out.println(" • JVM internal code");
        System.out.println(" • Adapter stubs");
        System.out.println(" • Native wrappers");
        System.out.println("\n--- CODE CACHE SEGMENTS ---");
        System.out.println("\n1. Non-nmethods segment");
        System.out.println(" - JVM internal code");
        System.out.println(" - Adapters, stubs");
        System.out.println(" - Size: ~8MB");
        System.out.println("\n2. Profiled code segment");
        System.out.println(" - C1-compiled code with profiling");
        System.out.println(" - Size: ~120MB");
        System.out.println("\n3. Non-profiled code segment");
        System.out.println(" - C2-compiled code (peak optimization)");
        System.out.println(" - Size: ~120MB");
        System.out.println("\nTotal default: ~240-250MB");
        System.out.println("\n--- CODE CACHE FULL ---");
        System.out.println("When code cache fills up:");
        System.out.println(" ⚠ JIT compilation stops");
        System.out.println(" ⚠ New methods stay interpreted");
        System.out.println(" ⚠ Performance degrades significantly");
        System.out.println(" ⚠ Warning: 'CodeCache is full. Compiler has been disabled.'");
        System.out.println("\nCauses:");
        System.out.println(" • Too many compiled methods");
        System.out.println(" • Large methods");
        System.out.println(" • Aggressive inlining");
        System.out.println("\n--- CONFIGURATION ---");
        System.out.println("\nSet total code cache size:");
        System.out.println(" -XX:ReservedCodeCacheSize=512m");
        System.out.println(" Default: ~240MB");
        System.out.println(" Range: 2MB - 2GB");
        System.out.println("\nSet initial size:");
        System.out.println(" -XX:InitialCodeCacheSize=256m");
        System.out.println("\nSegment sizing (advanced):");
        System.out.println(" -XX:NonNMethodCodeHeapSize=8m");
        System.out.println(" -XX:ProfiledCodeHeapSize=120m");
        System.out.println(" -XX:NonProfiledCodeHeapSize=120m");
        System.out.println("\n--- MONITORING ---");
        System.out.println("\n1. jstat:");
        System.out.println(" jstat -compiler <pid>");
        System.out.println(" Shows compilation count and failed compilations");
        System.out.println("\n2. PrintCodeCache flag:");
        System.out.println(" -XX:+PrintCodeCache");
        System.out.println(" Prints code cache statistics on exit");
        System.out.println("\n3. JMX MBean:");
        System.out.println(" java.lang:type=MemoryPool,name=Code Cache (pre-JDK 9)");
        System.out.println(" JDK 9+: one pool per segment, e.g. name=CodeHeap 'profiled nmethods'");
        System.out.println("\n4. JFR events:");
        System.out.println(" jdk.CodeCacheFull");
        System.out.println(" jdk.CodeCacheStatistics");
    }
}
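The monitoring options above can be complemented from inside the process. A sketch using the standard `MemoryPoolMXBean` (pool names vary by JDK version, as noted above):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// List code-cache memory pools with their current usage.
// JDK 9+ exposes one pool per segment ("CodeHeap 'non-nmethods'",
// "CodeHeap 'profiled nmethods'", "CodeHeap 'non-profiled nmethods'");
// older JVMs expose a single "Code Cache" pool.
public class CodeCachePools {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Code")) {
                System.out.printf("%-36s used=%7d KB  max=%7d KB%n",
                        name,
                        pool.getUsage().getUsed() / 1024,
                        pool.getUsage().getMax() / 1024);
            }
        }
    }
}
```

Polling this periodically (or alerting when `used` approaches `max`) catches a filling code cache long before the 'CodeCache is full' warning appears.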
Interpretation vs Compilation
// Performance Comparison
public class InterpretationVsCompilation {

    public static void printPerformanceComparison() {
        System.out.println("=== INTERPRETATION VS COMPILATION ===");
        System.out.println("\n--- INTERPRETER ---");
        System.out.println("Advantages:");
        System.out.println(" ✓ Instant startup");
        System.out.println(" ✓ No compilation overhead");
        System.out.println(" ✓ Low memory usage");
        System.out.println(" ✓ Portable (same bytecode everywhere)");
        System.out.println("\nDisadvantages:");
        System.out.println(" ✗ Slow execution (~10-100x slower)");
        System.out.println(" ✗ Instruction dispatch overhead");
        System.out.println(" ✗ Operands kept on stack, not in registers");
        System.out.println(" ✗ Stack-based operations");
        System.out.println("\n--- JIT COMPILATION ---");
        System.out.println("Advantages:");
        System.out.println(" ✓ Fast execution (native speed)");
        System.out.println(" ✓ CPU registers");
        System.out.println(" ✓ Platform-specific optimizations");
        System.out.println(" ✓ Profile-guided optimization");
        System.out.println("\nDisadvantages:");
        System.out.println(" ✗ Compilation overhead");
        System.out.println(" ✗ Warmup time required");
        System.out.println(" ✗ Memory for compiled code");
        System.out.println(" ✗ Code cache management");
        System.out.println("\n--- PERFORMANCE NUMBERS ---");
        System.out.println("Typical speedup (interpreted → C2):");
        System.out.println(" • Simple methods: 10-20x");
        System.out.println(" • Loop-heavy code: 50-100x");
        System.out.println(" • Numeric computations: 100x+");
        System.out.println(" • Allocation-heavy code: much larger when escape analysis removes allocations");
        System.out.println("\n--- WHEN TO USE EACH ---");
        System.out.println("\nUse interpreter:");
        System.out.println(" • Cold code (rarely executed)");
        System.out.println(" • One-time initialization");
        System.out.println(" • Short-lived applications");
        System.out.println(" • Testing/debugging");
        System.out.println("\nUse JIT compilation:");
        System.out.println(" • Hot code (frequently executed)");
        System.out.println(" • Long-running applications");
        System.out.println(" • Server workloads");
        System.out.println(" • Performance-critical code");
    }

    // Benchmark example (illustrative only; use JMH for real measurements)
    public static long sumArray(int[] array) {
        long sum = 0;
        for (int value : array) {
            sum += value;
        }
        return sum;
    }

    public static void demonstrateWarmup() {
        System.out.println("\n=== WARMUP DEMONSTRATION ===");
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }

        // First runs: mostly interpreted (slow); consuming the results
        // keeps the JIT from eliminating the calls as dead code
        long sink = 0;
        System.out.println("Initial runs (interpreted):");
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sink += sumArray(data);
        }
        long interpretedTime = System.nanoTime() - start;
        System.out.println("Time: " + interpretedTime / 1_000_000 + "ms");

        // More runs to trigger compilation
        for (int i = 0; i < 10_000; i++) {
            sink += sumArray(data);
        }

        // After compilation: compiled (fast)
        System.out.println("\nAfter warmup (compiled):");
        start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sink += sumArray(data);
        }
        long compiledTime = System.nanoTime() - start;
        System.out.println("Time: " + compiledTime / 1_000_000 + "ms");

        System.out.println("\nSpeedup: "
                + String.format("%.1fx", (double) interpretedTime / compiledTime)
                + " (sink=" + sink + ")");
    }
}
Best Practices
- Enable tiered compilation: Default in modern JVMs, provides best balance.
- Allow warmup time: Let JIT compilers optimize hot code naturally.
- Monitor code cache: Ensure it doesn't fill up in production.
- Use PrintCompilation for analysis: Understand what's being compiled.
- Don't tune thresholds prematurely: Defaults are well-tuned.
- Profile before optimizing: Use JFR to identify actual bottlenecks.
- Keep hot methods small: Easier to inline and optimize.
- Avoid megamorphic call sites: Limit polymorphism in hot loops.
- Test with realistic workloads: Synthetic benchmarks don't reflect production.
- Consider AOT for startup: GraalVM Native Image when warmup is unacceptable.
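The megamorphic call-site caution deserves a concrete illustration. A sketch with a hypothetical `Shape` hierarchy (all names ours): a hot call site that only ever observes one or two receiver types can typically be inlined by HotSpot, while one that sees three or more falls back to a virtual dispatch the JIT does not inline.

```java
interface Shape { double area(); }
record Circle(double r) implements Shape { public double area() { return Math.PI * r * r; } }
record Square(double s) implements Shape { public double area() { return s * s; } }
record Triangle(double b, double h) implements Shape { public double area() { return 0.5 * b * h; } }

public class CallSites {
    // The s.area() call site is monomorphic if 'shapes' only ever holds one
    // concrete type, bimorphic with two, and megamorphic with three or more.
    // HotSpot typically inlines mono-/bimorphic sites; megamorphic ones
    // dispatch through a vtable/itable and lose the inlining win.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] mono = { new Square(2), new Square(3) };                     // one type
        Shape[] mega = { new Circle(1), new Square(2), new Triangle(3, 4) }; // three types
        System.out.println(totalArea(mono)); // 13.0
        System.out.println(totalArea(mega));
    }
}
```

If a hot loop genuinely needs many shape types, splitting it into per-type loops can restore inlining, but measure first: this only matters on proven hot paths.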