27.1 JIT Compilation Fundamentals

Just-In-Time (JIT) compilation is the cornerstone of Java's performance: the JVM dynamically compiles frequently executed bytecode into optimized native machine code at runtime.

What is JIT Compilation?

// Understanding JIT Compilation
public class JITFundamentals {

    public static void printJITConcepts() {
        System.out.println("=== JIT COMPILATION OVERVIEW ===");

        System.out.println("\n--- EXECUTION MODES ---");

        System.out.println("\n1. INTERPRETATION");
        System.out.println("   - Bytecode executed directly by interpreter");
        System.out.println("   - Slow but starts immediately");
        System.out.println("   - No compilation overhead");
        System.out.println("   - Used for cold code paths");

        System.out.println("\n2. JIT COMPILATION");
        System.out.println("   - Bytecode compiled to native machine code");
        System.out.println("   - Fast execution after compilation");
        System.out.println("   - Compilation overhead upfront");
        System.out.println("   - Used for hot code paths");

        System.out.println("\n3. AHEAD-OF-TIME (AOT)");
        System.out.println("   - Pre-compiled to native code");
        System.out.println("   - Fast startup, no warmup");
        System.out.println("   - Fewer optimization opportunities (no runtime profile)");
        System.out.println("   - GraalVM Native Image");

        System.out.println("\n--- WHY JIT? ---");
        System.out.println("✓ Profile-guided optimization");
        System.out.println("  - Compiler sees actual runtime behavior");
        System.out.println("  - Optimizes for real usage patterns");
        System.out.println("  - Adapts to changing workloads");

        System.out.println("\n✓ Aggressive optimizations");
        System.out.println("  - Inlining based on actual call sites");
        System.out.println("  - Escape analysis for stack allocation");
        System.out.println("  - Speculative optimizations");

        System.out.println("\n✓ Platform-specific code");
        System.out.println("  - Generates code for actual CPU");
        System.out.println("  - Uses CPU-specific instructions (SSE, AVX)");
        System.out.println("  - Optimal register allocation");
    }

    public static void printJVMArchitecture() {
        System.out.println("\n=== JVM EXECUTION ARCHITECTURE ===");

        System.out.println("\n--- EXECUTION FLOW ---");
        System.out.println("1. Java source → javac → Bytecode (.class)");
        System.out.println("2. JVM loads bytecode");
        System.out.println("3. Interpreter executes bytecode");
        System.out.println("4. Profiler monitors execution");
        System.out.println("5. Hot methods identified");
        System.out.println("6. JIT compiler generates native code");
        System.out.println("7. Subsequent calls use native code");

        System.out.println("\n--- CODE CACHE ---");
        System.out.println("Purpose: Stores compiled native code");
        System.out.println("Size: Default ~240MB (varies by JVM/platform)");
        System.out.println("Segments:");
        System.out.println("  - Non-method code (JVM internals)");
        System.out.println("  - Profiled code (C1 compiled)");
        System.out.println("  - Non-profiled code (C2 compiled)");

        System.out.println("\nWhen full:");
        System.out.println("  ⚠ JIT compilation stops");
        System.out.println("  ⚠ Methods stay interpreted");
        System.out.println("  ⚠ Performance degrades");

        System.out.println("\nMonitoring:");
        System.out.println("  jstat -compiler <pid>");
        System.out.println("  -XX:+PrintCodeCache");
    }
}
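The monitoring options above can also be reached from inside the application: the standard java.lang.management API exposes the JIT compiler's name and cumulative compilation time. A minimal sketch (the class name JITInfo is mine):

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JITInfo {

    public static String compilerName() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        // getCompilationMXBean() returns null on a JVM with no JIT (e.g. -Xint)
        return (jit == null) ? "none (interpreter-only JVM)" : jit.getName();
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            System.out.println("No JIT compiler (e.g. running with -Xint)");
            return;
        }
        System.out.println("JIT compiler: " + jit.getName());
        if (jit.isCompilationTimeMonitoringSupported()) {
            // Cumulative time spent in JIT compilation so far, in milliseconds
            System.out.println("Total compilation time: "
                    + jit.getTotalCompilationTime() + " ms");
        }
    }
}
```

On a HotSpot JVM with tiered compilation the name typically mentions both tiers, e.g. a "Tiered Compilers" string.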

Bytecode to Native Code

// Bytecode and Native Code Concepts
public class BytecodeToNative {

    // Example method
    public static int add(int a, int b) {
        return a + b;
    }

    public static void demonstrateBytecode() {
        System.out.println("=== BYTECODE REPRESENTATION ===");

        System.out.println("\nJava source:");
        System.out.println("  public static int add(int a, int b) {");
        System.out.println("      return a + b;");
        System.out.println("  }");

        System.out.println("\nBytecode (javap -c):");
        System.out.println("  0: iload_0        // Load 'a' from local variable 0");
        System.out.println("  1: iload_1        // Load 'b' from local variable 1");
        System.out.println("  2: iadd           // Integer add");
        System.out.println("  3: ireturn        // Return integer");

        System.out.println("\n--- INTERPRETER EXECUTION ---");
        System.out.println("Interpreter reads each bytecode:");
        System.out.println("  1. Fetch bytecode instruction");
        System.out.println("  2. Decode instruction");
        System.out.println("  3. Execute operation");
        System.out.println("  4. Move to next instruction");
        System.out.println("  5. Repeat");

        System.out.println("\nOverhead:");
        System.out.println("  • Instruction dispatch");
        System.out.println("  • Stack operations");
        System.out.println("   • Operands held in memory, not CPU registers");
        System.out.println("  • ~10-100x slower than native");

        System.out.println("\n--- JIT COMPILED (x86-64 assembly pseudocode) ---");
        System.out.println("Native machine code:");
        System.out.println("  mov eax, edi     ; Move 'a' to register");
        System.out.println("  add eax, esi     ; Add 'b' to register");
        System.out.println("  ret              ; Return");

        System.out.println("\nBenefits:");
        System.out.println("  ✓ Direct CPU execution");
        System.out.println("  ✓ Register allocation");
        System.out.println("  ✓ No interpretation overhead");
        System.out.println("  ✓ ~10-100x faster than interpreter");
    }
}
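To watch this translation happen for a method like add, a small experiment is enough: run a hot loop under the standard -XX:+PrintCompilation flag and look for the method's name in the log. The class below (HotAdd is my naming) accumulates the results so the JIT cannot discard the calls as dead code:

```java
// Run with: java -XX:+PrintCompilation HotAdd
// and look for a log line mentioning HotAdd::add once the loop gets hot.
public class HotAdd {

    public static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        long sink = 0;                     // consuming the results keeps the
        for (int i = 0; i < 1_000_000; i++) {
            sink += add(i, i + 1);         // calls from being eliminated
        }
        System.out.println("sink = " + sink);
    }
}
```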

C1 and C2 Compilers

// JIT Compilers Overview
public class JITCompilers {

    public static void printCompilerDifferences() {
        System.out.println("=== C1 VS C2 COMPILERS ===");

        System.out.println("\n--- C1 COMPILER (CLIENT) ---");
        System.out.println("Purpose: Fast compilation for quick warmup");

        System.out.println("\nCharacteristics:");
        System.out.println("  ✓ Fast compilation (a fraction of C2's compile time)");
        System.out.println("  ✓ Moderate optimizations");
        System.out.println("  ✓ Includes profiling instrumentation");
        System.out.println("  ✗ Less aggressive optimizations");

        System.out.println("\nOptimizations:");
        System.out.println("  • Basic inlining");
        System.out.println("  • Constant folding");
        System.out.println("  • Dead code elimination");
        System.out.println("  • Local value numbering");
        System.out.println("  • Profiling for C2");

        System.out.println("\nUse case:");
        System.out.println("  - Quick startup");
        System.out.println("  - Client applications");
        System.out.println("  - Warm-up tier for C2");

        System.out.println("\n--- C2 COMPILER (SERVER) ---");
        System.out.println("Purpose: Maximum performance for hot code");

        System.out.println("\nCharacteristics:");
        System.out.println("  ✓ Aggressive optimizations");
        System.out.println("  ✓ Peak performance");
        System.out.println("  ✗ Slower compilation (roughly 10x C1's time per method)");
        System.out.println("  ✗ Higher memory usage");

        System.out.println("\nOptimizations:");
        System.out.println("  • Aggressive inlining (multiple levels)");
        System.out.println("  • Escape analysis");
        System.out.println("  • Loop optimizations (unrolling, vectorization)");
        System.out.println("  • Global value numbering");
        System.out.println("  • Range check elimination");
        System.out.println("  • Lock coarsening/elision");
        System.out.println("  • Intrinsic methods");
        System.out.println("  • Speculative optimizations");

        System.out.println("\nUse case:");
        System.out.println("  - Long-running applications");
        System.out.println("  - Server workloads");
        System.out.println("  - Maximum throughput");

        System.out.println("\n--- COMPARISON ---");
        System.out.println();
        System.out.println("| Aspect          | C1              | C2                |");
        System.out.println("|-----------------|-----------------|-------------------|");
        System.out.println("| Compilation     | Fast            | ~10x slower       |");
        System.out.println("| Code quality    | Moderate        | Excellent         |");
        System.out.println("| Memory usage    | Low             | High              |");
        System.out.println("| Inlining depth  | 2-3 levels      | 8+ levels         |");
        System.out.println("| Profiling       | Yes (adds data) | No (uses data)    |");
        System.out.println("| Use case        | Warmup          | Peak performance  |");
    }
}
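The C1/C2 split can be felt directly with the standard -XX:TieredStopAtLevel=1 flag, which caps compilation at C1. The harness below (TierDemo is my naming; absolute timings vary by machine and JDK) runs the same numeric loop either way:

```java
// Run twice and compare wall times:
//   java -XX:TieredStopAtLevel=1 TierDemo   (C1 only)
//   java TierDemo                           (default tiered: C1, then C2)
public class TierDemo {

    // Numeric loop of the kind C2 optimizes well (unrolling, intrinsics)
    public static double work(int n) {
        double acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        double sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 2_000; i++) {
            sink += work(10_000);      // consume results to defeat dead-code elimination
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("elapsed = " + elapsedMs + " ms, sink = " + sink);
    }
}
```

On typical hardware the C2-enabled run ends up noticeably faster once warm, at the cost of more compilation work up front.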

Compilation Thresholds

// Compilation Triggers
public class CompilationThresholds {

    public static void printThresholdConcepts() {
        System.out.println("=== COMPILATION THRESHOLDS ===");

        System.out.println("\n--- INVOCATION COUNTER ---");
        System.out.println("Tracks method invocations");
        System.out.println("Threshold: Method compiled after N invocations");

        System.out.println("\nDefault thresholds (tiered compilation):");
        System.out.println("  C1 (Tier 3): ~2,000 invocations");
        System.out.println("  C2 (Tier 4): ~15,000 invocations");

        System.out.println("\n--- BACK-EDGE COUNTER ---");
        System.out.println("Tracks loop iterations");
        System.out.println("Purpose: Compile hot loops even without many method calls");

        System.out.println("\nExample:");
        System.out.println("  void processData() {");
        System.out.println("    // Called once");
        System.out.println("    for (int i = 0; i < 1_000_000; i++) {");
        System.out.println("      // Loop body executes 1M times");
        System.out.println("      // Back-edge counter triggers compilation");
        System.out.println("    }");
        System.out.println("  }");

        System.out.println("\n--- ON-STACK REPLACEMENT (OSR) ---");
        System.out.println("Definition: Replace interpreted code while method is running");

        System.out.println("\nScenario:");
        System.out.println("  1. Long-running loop in interpreter");
        System.out.println("  2. Back-edge counter reaches threshold");
        System.out.println("  3. JIT compiles loop");
        System.out.println("  4. Execution jumps from interpreter to compiled code");
        System.out.println("  5. Loop continues in compiled code");

        System.out.println("\nBenefit:");
        System.out.println("  ✓ Don't wait for method to complete");
        System.out.println("  ✓ Long-running methods benefit immediately");

        System.out.println("\n--- TUNING THRESHOLDS ---");

        System.out.println("\nCompileThreshold (only with -XX:-TieredCompilation):");
        System.out.println("  -XX:CompileThreshold=10000");
        System.out.println("  Default: 10,000");
        System.out.println("  Lower: Earlier compilation");
        System.out.println("  Higher: More profiling, later compilation");

        System.out.println("\nTiered compilation thresholds:");
        System.out.println("  -XX:Tier3CompileThreshold=2000");
        System.out.println("  -XX:Tier4CompileThreshold=15000");

        System.out.println("\n⚠ CAUTION");
        System.out.println("  • Lower thresholds = faster warmup");
        System.out.println("  • But less profiling data");
        System.out.println("  • May hurt peak performance");
        System.out.println("  • Usually leave at defaults");
    }

    // Example: Method that will be compiled
    public static long hotMethod(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void demonstrateCompilation() {
        System.out.println("\n=== COMPILATION DEMONSTRATION ===");

        System.out.println("\nRun with: -XX:+PrintCompilation");
        System.out.println("Output format:");
        System.out.println("  timestamp compile_id tier method_name size");

        System.out.println("\nExample output:");
        System.out.println("  100   1       3       java.lang.String::hashCode (55 bytes)");
        System.out.println("  150   2       4       java.util.HashMap::get (23 bytes)");
        System.out.println("  ^^^   ^^^     ^^^     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^");
        System.out.println("  time  id      tier    method (size in bytes)");

        // Trigger compilation by calling the method many times; accumulate
        // the results so the JIT cannot discard the calls as dead code
        System.out.println("\nCalling hotMethod() repeatedly to trigger compilation...");
        long sink = 0;
        for (int i = 0; i < 20_000; i++) {
            sink += hotMethod(1000);
        }
        System.out.println("Method should be compiled by now (sink=" + sink
                + "); check -XX:+PrintCompilation output");
    }
}
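OSR is visible in -XX:+PrintCompilation output, where OSR compilations are flagged with a '%' and show the bytecode index of the loop entry after an '@'. A minimal sketch (OSRDemo is my naming):

```java
// Run with: java -XX:+PrintCompilation OSRDemo
// main is called only once, so the invocation counter never gets hot;
// the loop's back-edge counter triggers an OSR compilation instead,
// marked with '%' in the log.
public class OSRDemo {

    public static long sumTo(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {   // hot back-edge inside a single call
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100_000_000L));
    }
}
```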

Code Cache

// Code Cache Management
public class CodeCacheManagement {

    public static void printCodeCacheDetails() {
        System.out.println("=== CODE CACHE ===");

        System.out.println("\n--- WHAT IS CODE CACHE? ---");
        System.out.println("Memory region for compiled native code");
        System.out.println("Stores:");
        System.out.println("  • JIT-compiled methods");
        System.out.println("  • JVM internal code");
        System.out.println("  • Adapter stubs");
        System.out.println("  • Native wrappers");

        System.out.println("\n--- CODE CACHE SEGMENTS ---");

        System.out.println("\n1. Non-nmethods segment");
        System.out.println("   - JVM internal code");
        System.out.println("   - Adapters, stubs");
        System.out.println("   - Size: ~8MB");

        System.out.println("\n2. Profiled code segment");
        System.out.println("   - C1-compiled code with profiling");
        System.out.println("   - Size: ~120MB");

        System.out.println("\n3. Non-profiled code segment");
        System.out.println("   - C2-compiled code (peak optimization)");
        System.out.println("   - Size: ~120MB");

        System.out.println("\nTotal default: ~240-250MB");

        System.out.println("\n--- CODE CACHE FULL ---");
        System.out.println("When code cache fills up:");
        System.out.println("  ⚠ JIT compilation stops");
        System.out.println("  ⚠ New methods stay interpreted");
        System.out.println("  ⚠ Performance degrades significantly");
        System.out.println("  ⚠ Warning: 'CodeCache is full'");

        System.out.println("\nCauses:");
        System.out.println("  • Too many compiled methods");
        System.out.println("  • Large methods");
        System.out.println("  • Aggressive inlining");

        System.out.println("\n--- CONFIGURATION ---");

        System.out.println("\nSet total code cache size:");
        System.out.println("  -XX:ReservedCodeCacheSize=512m");
        System.out.println("  Default: ~240MB");
        System.out.println("  Range: 2MB - 2GB");

        System.out.println("\nSet initial size:");
        System.out.println("  -XX:InitialCodeCacheSize=256m");

        System.out.println("\nSegment sizing (advanced):");
        System.out.println("  -XX:NonNMethodCodeHeapSize=8m");
        System.out.println("  -XX:ProfiledCodeHeapSize=120m");
        System.out.println("  -XX:NonProfiledCodeHeapSize=120m");

        System.out.println("\n--- MONITORING ---");

        System.out.println("\n1. jstat:");
        System.out.println("   jstat -compiler <pid>");
        System.out.println("   Shows compilation count and failed compilations");

        System.out.println("\n2. PrintCodeCache flag:");
        System.out.println("   -XX:+PrintCodeCache");
        System.out.println("   Prints code cache statistics on exit");

        System.out.println("\n3. JMX memory pool MXBeans:");
        System.out.println("   CodeHeap 'non-nmethods', 'profiled nmethods',");
        System.out.println("   'non-profiled nmethods' (a single 'Code Cache' pool on older JVMs)");

        System.out.println("\n4. JFR events:");
        System.out.println("   jdk.CodeCacheFull");
        System.out.println("   jdk.CodeCacheStatistics");
    }
}
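The JMX route works from inside the process as well: with a segmented code cache, HotSpot exposes the code heaps as memory pools whose names contain "CodeHeap" (or "Code Cache" on older JVMs). A sketch (CodeCachePools is my naming; the pool names are HotSpot-specific):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.stream.Collectors;

public class CodeCachePools {

    // Names of the code-cache memory pools (HotSpot-specific naming)
    public static List<String> codePoolNames() {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .map(MemoryPoolMXBean::getName)
                .filter(n -> n.contains("CodeHeap") || n.contains("Code Cache"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("CodeHeap") || name.contains("Code Cache")) {
                MemoryUsage u = pool.getUsage();
                // used: currently occupied by compiled code; max: the pool's
                // share of -XX:ReservedCodeCacheSize
                System.out.printf("%-36s used=%,d  max=%,d%n",
                        name, u.getUsed(), u.getMax());
            }
        }
    }
}
```

Watching the used/max ratio over time is a cheap way to catch a code cache drifting toward full before compilation stops.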

Interpretation vs Compilation

// Performance Comparison
public class InterpretationVsCompilation {

    public static void printPerformanceComparison() {
        System.out.println("=== INTERPRETATION VS COMPILATION ===");

        System.out.println("\n--- INTERPRETER ---");
        System.out.println("Advantages:");
        System.out.println("  ✓ Instant startup");
        System.out.println("  ✓ No compilation overhead");
        System.out.println("  ✓ Low memory usage");
        System.out.println("  ✓ Portable (same bytecode everywhere)");

        System.out.println("\nDisadvantages:");
        System.out.println("  ✗ Slow execution (~10-100x slower)");
        System.out.println("  ✗ Instruction dispatch overhead");
        System.out.println("  ✗ Operands held in memory, not CPU registers");
        System.out.println("  ✗ Stack-based operations");

        System.out.println("\n--- JIT COMPILATION ---");
        System.out.println("Advantages:");
        System.out.println("  ✓ Fast execution (native speed)");
        System.out.println("  ✓ CPU registers");
        System.out.println("  ✓ Platform-specific optimizations");
        System.out.println("  ✓ Profile-guided optimization");

        System.out.println("\nDisadvantages:");
        System.out.println("  ✗ Compilation overhead");
        System.out.println("  ✗ Warmup time required");
        System.out.println("  ✗ Memory for compiled code");
        System.out.println("  ✗ Code cache management");

        System.out.println("\n--- PERFORMANCE NUMBERS ---");
        System.out.println("Typical speedup (interpreted → C2):");
        System.out.println("  • Simple methods: 10-20x");
        System.out.println("  • Loop-heavy code: 50-100x");
        System.out.println("  • Numeric computations: 100x+");
        System.out.println("  • Allocation-heavy code: even more when escape analysis removes allocations");

        System.out.println("\n--- WHEN TO USE EACH ---");

        System.out.println("\nUse interpreter:");
        System.out.println("  • Cold code (rarely executed)");
        System.out.println("  • One-time initialization");
        System.out.println("  • Short-lived applications");
        System.out.println("  • Testing/debugging");

        System.out.println("\nUse JIT compilation:");
        System.out.println("  • Hot code (frequently executed)");
        System.out.println("  • Long-running applications");
        System.out.println("  • Server workloads");
        System.out.println("  • Performance-critical code");
    }

    // Benchmark example
    public static long sumArray(int[] array) {
        long sum = 0;
        for (int value : array) {
            sum += value;
        }
        return sum;
    }

    public static void demonstrateWarmup() {
        System.out.println("\n=== WARMUP DEMONSTRATION ===");

        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }

        // First runs: mostly interpreted (the back-edge counter may already
        // trigger compilation partway through these calls)
        System.out.println("Initial runs (mostly interpreted):");
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sink += sumArray(data);    // consume results so calls aren't dead code
        }
        long interpretedTime = System.nanoTime() - start;
        System.out.println("Time: " + interpretedTime / 1_000_000 + "ms");

        // More runs to ensure C2 compilation
        for (int i = 0; i < 10_000; i++) {
            sink += sumArray(data);
        }

        // After compilation: compiled (fast)
        System.out.println("\nAfter warmup (compiled):");
        start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sink += sumArray(data);
        }
        long compiledTime = System.nanoTime() - start;
        System.out.println("Time: " + compiledTime / 1_000_000 + "ms");

        System.out.println("\nSpeedup: " +
            (double) interpretedTime / compiledTime + "x (sink=" + sink + ")");
    }
}
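The trade-off can also be bracketed without waiting for warmup: the standard -Xint flag forces interpreter-only execution, so running the same program with and without it shows the interpreter/JIT gap directly (XintDemo is my naming; exact ratios vary by machine and JDK):

```java
// Run twice and compare the printed times:
//   java -Xint XintDemo    (interpreter only, JIT disabled)
//   java XintDemo          (normal mixed mode: interpreter + JIT)
public class XintDemo {

    public static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;      // cheap arithmetic whose result is consumed
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = work(50_000_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + " in " + ms + " ms");
    }
}
```

The -Xint run lands in the 10-100x-slower range quoted earlier; there is also a rarely used opposite extreme, -Xcomp, which compiles everything eagerly.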

Best Practices

  • Enable tiered compilation: Default in modern JVMs, provides best balance.
  • Allow warmup time: Let JIT compilers optimize hot code naturally.
  • Monitor code cache: Ensure it doesn't fill up in production.
  • Use PrintCompilation for analysis: Understand what's being compiled.
  • Don't tune thresholds prematurely: Defaults are well-tuned.
  • Profile before optimizing: Use JFR to identify actual bottlenecks.
  • Keep hot methods small: Easier to inline and optimize.
  • Avoid megamorphic call sites: Limit polymorphism in hot loops.
  • Test with realistic workloads: Synthetic benchmarks don't reflect production.
  • Consider AOT for startup: GraalVM Native Image when warmup is unacceptable.
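"Avoid megamorphic call sites" deserves a concrete picture. HotSpot's inline caches handle one or two receiver classes at a call site well; once a hot site sees three or more, dispatch goes megamorphic and inlining through it is blocked. A sketch (type names are mine; records require Java 16+):

```java
import java.util.List;

// One source-level call site, two very different JIT outcomes depending
// on how many concrete receiver classes flow through it.
interface Shape { double area(); }
record Circle(double r) implements Shape { public double area() { return Math.PI * r * r; } }
record Square(double s) implements Shape { public double area() { return s * s; } }
record Tri(double b, double h) implements Shape { public double area() { return 0.5 * b * h; } }

public class Megamorphic {

    public static double totalArea(List<Shape> shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();   // the interesting (virtual) call site
        }
        return total;
    }

    public static void main(String[] args) {
        // Monomorphic: one receiver class -> inline cache hit, easy inlining
        List<Shape> mono = List.of(new Square(2), new Square(3), new Square(4));
        // Megamorphic: three receiver classes -> real virtual dispatch
        List<Shape> mega = List.of(new Circle(1), new Square(2), new Tri(3, 4));
        System.out.println("monomorphic total: " + totalArea(mono));
        System.out.println("megamorphic total: " + totalArea(mega));
    }
}
```

If a loop like this is truly hot, splitting it by concrete type (or sorting the list by class) can restore monomorphic dispatch at each resulting call site.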