27.2 Tiered Compilation and Optimization Levels
Tiered compilation combines the fast startup of the interpreter and C1 with the peak performance of C2 by promoting hot methods through progressively more aggressive optimization levels.
Tiered Compilation Overview
// Tiered Compilation Concepts
public class TieredCompilation {
public static void printTieredCompilationLevels() {
System.out.println("=== TIERED COMPILATION LEVELS ===");
System.out.println("\n--- TIER 0: INTERPRETER ---");
System.out.println("Initial execution mode");
System.out.println(" • Executes bytecode directly");
System.out.println(" • Minimal profiling");
System.out.println(" • Slowest execution");
System.out.println(" • Zero compilation overhead");
System.out.println("\n--- TIER 1: C1 WITHOUT PROFILING ---");
System.out.println("Simple client compilation");
System.out.println(" • Fast compilation");
System.out.println(" • Basic optimizations");
System.out.println(" • No profiling instrumentation");
System.out.println(" • Used for trivial methods");
System.out.println("\n--- TIER 2: C1 WITH LIMITED PROFILING ---");
System.out.println("Client compilation with some profiling");
System.out.println(" • Fast compilation");
System.out.println(" • Basic optimizations");
System.out.println(" • Limited profiling (invocation counters)");
System.out.println(" • Rare in practice (often skipped)");
System.out.println("\n--- TIER 3: C1 WITH FULL PROFILING ---");
System.out.println("Client compilation with comprehensive profiling");
System.out.println(" • Fast compilation (~100ms)");
System.out.println(" • Moderate optimizations");
System.out.println(" • Full profiling instrumentation");
System.out.println(" • Collects data for C2");
System.out.println("\nProfiled data:");
System.out.println(" - Branch taken/not-taken frequencies");
System.out.println(" - Type profiles (receiver types at call sites)");
System.out.println(" - Exception throwing frequency");
System.out.println(" - Loop iteration counts");
System.out.println("\n--- TIER 4: C2 (PEAK OPTIMIZATION) ---");
System.out.println("Server compilation for maximum performance");
System.out.println(" • Slow compilation (seconds)");
System.out.println(" • Aggressive optimizations");
System.out.println(" • Uses profiling data from Tier 3");
System.out.println(" • No profiling overhead in generated code");
System.out.println("\n--- TYPICAL PROGRESSION ---");
System.out.println();
System.out.println("Method execution:");
System.out.println(" Tier 0 (Interpreter)");
System.out.println(" ↓ (after ~200 invocations)");
System.out.println(" Tier 3 (C1 with profiling)");
System.out.println(" ↓ (collect profile data)");
System.out.println(" ↓ (after ~15,000 invocations)");
System.out.println(" Tier 4 (C2 peak optimization)");
System.out.println("\nTrivial methods:");
System.out.println(" Tier 0 → Tier 1 (C1 without profiling)");
System.out.println(" (Skip Tier 3, not worth C2 compilation)");
}
}
// Tiered Compilation Configuration
class TieredCompilationConfig {
/*
# Enable tiered compilation (default in modern JVMs)
java -XX:+TieredCompilation MyApp
# Disable tiered compilation (use C2 only)
java -XX:-TieredCompilation MyApp
# Control compilation tiers
java -XX:TieredStopAtLevel=3 MyApp # Stop at C1 (no C2)
java -XX:TieredStopAtLevel=1 MyApp # Only simple C1
# Adjust tier thresholds
java -XX:Tier3InvocationThreshold=2000 MyApp
java -XX:Tier4InvocationThreshold=15000 MyApp
# Observe tier transitions
java -XX:+PrintCompilation MyApp
*/
}
Profiling and Profile-Guided Optimization
// Profiling Concepts
public class ProfilingConcepts {
public static void printProfilingTypes() {
System.out.println("=== PROFILING IN JIT COMPILATION ===");
System.out.println("\n--- TYPE PROFILING ---");
System.out.println("Tracks receiver types at call sites");
System.out.println("\nExample:");
System.out.println(" interface Animal { void speak(); }");
System.out.println(" class Dog implements Animal { void speak() { ... } }");
System.out.println(" class Cat implements Animal { void speak() { ... } }");
System.out.println(" ");
System.out.println(" Animal animal = ...;");
System.out.println(" animal.speak(); // Profile: Which type is 'animal'?");
System.out.println("\nProfile scenarios:");
System.out.println(" 1. Monomorphic: 100% Dog");
System.out.println(" → Inline Dog.speak() with guard");
System.out.println(" 2. Bimorphic: 50% Dog, 50% Cat");
System.out.println(" → Inline both with type checks");
System.out.println(" 3. Polymorphic: Many types");
System.out.println(" → Virtual call (no inlining)");
System.out.println(" 4. Megamorphic: >3 types");
System.out.println(" → Virtual call (no optimization)");
System.out.println("\n--- BRANCH PROFILING ---");
System.out.println("Tracks which branches are taken");
System.out.println("\nExample:");
System.out.println(" if (condition) {");
System.out.println(" fastPath(); // Taken 99% of time");
System.out.println(" } else {");
System.out.println(" slowPath(); // Taken 1% of time");
System.out.println(" }");
System.out.println("\nOptimization:");
System.out.println(" • Optimize for hot branch (fastPath)");
System.out.println(" • Inline fastPath()");
System.out.println(" • Cold branch becomes uncommon trap");
System.out.println(" • CPU branch predictor benefits");
System.out.println("\n--- EXCEPTION PROFILING ---");
System.out.println("Tracks exception throwing frequency");
System.out.println("\nExample:");
System.out.println(" try {");
System.out.println(" riskyOperation();");
System.out.println(" } catch (Exception e) {");
System.out.println(" handleError(e);");
System.out.println(" }");
System.out.println("\nProfile:");
System.out.println(" • If exceptions are rare (<1%):");
System.out.println(" → Optimize for no-exception path");
System.out.println(" → Exception handling becomes uncommon trap");
System.out.println(" • If exceptions are frequent:");
System.out.println(" → Less aggressive optimization");
System.out.println("\n--- LOOP PROFILING ---");
System.out.println("Tracks loop iteration counts");
System.out.println("\nExample:");
System.out.println(" for (int i = 0; i < n; i++) {");
System.out.println(" process(array[i]);");
System.out.println(" }");
System.out.println("\nProfile data:");
System.out.println(" • Average iteration count");
System.out.println(" • Loop trip count");
System.out.println(" • Back-edge frequency");
System.out.println("\nOptimizations enabled:");
System.out.println(" • Loop unrolling");
System.out.println(" • Vectorization (SIMD)");
System.out.println(" • Range check elimination");
}
}
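Type profiling only pays off if a call site actually stays monomorphic or bimorphic, which is easiest to see in code. The sketch below restates the Animal example as real classes (the helper methods are illustrative): whether each call site can be inlined depends entirely on which receiver types callers actually pass through it.
// Call-Site Shapes (illustrative sketch)
interface Animal { void speak(); }
class Dog implements Animal { public void speak() { } }
class Cat implements Animal { public void speak() { } }
class Cow implements Animal { public void speak() { } }
class CallSiteShapes {
    // Monomorphic if callers only ever pass Dog instances: the profile records
    // one receiver type and speak() can be inlined behind a cheap type guard
    static void feedTheDog(Animal a) {
        a.speak();
    }
    // Megamorphic if Dog, Cat and Cow all flow through this call site: the
    // profile overflows and the JIT keeps the plain virtual call (no inlining)
    static void feedEveryone(Animal[] animals) {
        for (Animal a : animals) {
            a.speak();
        }
    }
}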
Method Inlining
// Method Inlining
public class MethodInlining {
public static void printInliningConcepts() {
System.out.println("=== METHOD INLINING ===");
System.out.println("\n--- WHAT IS INLINING? ---");
System.out.println("Replace method call with method body");
System.out.println("\nBefore inlining:");
System.out.println(" int result = add(a, b);");
System.out.println(" // Call overhead");
System.out.println("\nAfter inlining:");
System.out.println(" int result = a + b;");
System.out.println(" // No call overhead");
System.out.println("\n--- WHY INLINE? ---");
System.out.println("✓ Eliminates call overhead");
System.out.println(" - No stack frame creation");
System.out.println(" - No parameter passing");
System.out.println(" - No return handling");
System.out.println("\n✓ Enables further optimizations");
System.out.println(" - Constant folding across calls");
System.out.println(" - Dead code elimination");
System.out.println(" - Register allocation across calls");
System.out.println("\n✓ Better CPU cache utilization");
System.out.println(" - More code in instruction cache");
System.out.println(" - Better branch prediction");
System.out.println("\n--- INLINING HEURISTICS ---");
System.out.println("\nMethod size:");
System.out.println(" • Trivial: <35 bytecodes → Always inline");
System.out.println(" • Small: <325 bytecodes → Likely inline");
System.out.println(" • Large: >325 bytecodes → Rarely inline");
System.out.println(" • Huge: >8000 bytecodes → Never inline");
System.out.println("\nInlining depth:");
System.out.println(" • C1: 2-3 levels deep");
System.out.println(" • C2: 8-9 levels deep");
System.out.println("\nFrequency:");
System.out.println(" • Hot call sites: Prioritize for inlining");
System.out.println(" • Cold call sites: Skip inlining");
System.out.println("\n--- VIRTUAL METHOD INLINING ---");
System.out.println("\nChallenge:");
System.out.println(" interface Shape { double area(); }");
System.out.println(" Shape shape = ...; // Unknown type");
System.out.println(" double a = shape.area(); // Which implementation?");
System.out.println("\nSolution: Speculative inlining");
System.out.println(" 1. Profile: 99% calls are Circle.area()");
System.out.println(" 2. Generate code:");
System.out.println(" if (shape instanceof Circle) {");
System.out.println(" return <inlined Circle.area()>");
System.out.println(" } else {");
System.out.println(" return shape.area(); // Uncommon trap");
System.out.println(" }");
System.out.println("\n--- INLINING CONTROLS ---");
System.out.println("\nMaxInlineSize (default 35 bytecodes):");
System.out.println(" -XX:MaxInlineSize=50");
System.out.println("\nFreqInlineSize (hot methods, default 325):");
System.out.println(" -XX:FreqInlineSize=400");
System.out.println("\nMaxInlineLevel (depth, default 9):");
System.out.println(" -XX:MaxInlineLevel=12");
System.out.println("\nInlineSmallCode (default 2000):");
System.out.println(" -XX:InlineSmallCode=2500");
System.out.println("\n--- OBSERVING INLINING ---");
System.out.println("\nPrintInlining:");
System.out.println(" -XX:+UnlockDiagnosticVMOptions");
System.out.println(" -XX:+PrintInlining");
System.out.println("\nOutput:");
System.out.println(" @ 10 MyClass::method (20 bytes) inline");
System.out.println(" @ 15 OtherClass::helper (200 bytes) too big");
}
// Example: Inlining candidates
public static int add(int a, int b) {
// Trivial method - always inlined
return a + b;
}
public static int calculate(int x) {
// Small method - likely inlined
int result = add(x, 10); // add() will be inlined here
return result * 2;
}
public static void demonstrateInlining() {
System.out.println("\n=== INLINING EXAMPLE ===");
System.out.println("\nOriginal code:");
System.out.println(" int result = calculate(5);");
System.out.println("\nAfter inlining calculate():");
System.out.println(" int result = add(5, 10) * 2;");
System.out.println("\nAfter inlining add():");
System.out.println(" int result = (5 + 10) * 2;");
System.out.println("\nAfter constant folding:");
System.out.println(" int result = 30;");
System.out.println("\n✓ Inlining enabled further optimization!");
}
}
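The speculative inlining described for the Shape example above can be written out as ordinary code. The classes below are an illustrative sketch: if profiling shows that virtually every Shape reaching totalArea() is a Circle, C2 can inline Circle.area() behind a type guard and fall back to an uncommon trap (deoptimization) if another implementation ever appears.
// Speculative Inlining Candidate (illustrative sketch)
interface Shape { double area(); }
final class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}
class SpeculativeInlining {
    // Hot call site: with a monomorphic profile the virtual call to area()
    // is effectively replaced by the inlined body of Circle.area()
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }
}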
Loop Optimizations
// Loop Optimization Techniques
public class LoopOptimizations {
public static void printLoopOptimizations() {
System.out.println("=== LOOP OPTIMIZATIONS ===");
System.out.println("\n--- 1. LOOP UNROLLING ---");
System.out.println("Reduce loop overhead by expanding iterations");
System.out.println("\nOriginal:");
System.out.println(" for (int i = 0; i < 8; i++) {");
System.out.println(" sum += array[i];");
System.out.println(" }");
System.out.println("\nUnrolled (factor 4):");
System.out.println(" for (int i = 0; i < 8; i += 4) {");
System.out.println(" sum += array[i];");
System.out.println(" sum += array[i+1];");
System.out.println(" sum += array[i+2];");
System.out.println(" sum += array[i+3];");
System.out.println(" }");
System.out.println("\nBenefits:");
System.out.println(" ✓ Fewer loop iterations (less overhead)");
System.out.println(" ✓ Better instruction-level parallelism");
System.out.println(" ✓ More efficient CPU pipeline usage");
System.out.println("\n--- 2. LOOP VECTORIZATION (SIMD) ---");
System.out.println("Use CPU vector instructions (SSE, AVX)");
System.out.println("\nScalar (one element at a time):");
System.out.println(" for (int i = 0; i < n; i++) {");
System.out.println(" c[i] = a[i] + b[i];");
System.out.println(" }");
System.out.println("\nVectorized (8 elements at once with AVX):");
System.out.println(" for (int i = 0; i < n; i += 8) {");
System.out.println(" // Single CPU instruction processes 8 ints");
System.out.println(" v_c = v_a + v_b; // 256-bit vector add");
System.out.println(" }");
System.out.println("\nRequirements:");
System.out.println(" • Simple loop body");
System.out.println(" • No dependencies between iterations");
System.out.println(" • Contiguous memory access");
System.out.println(" • CPU supports SIMD (SSE2, AVX, AVX2, AVX-512)");
System.out.println("\nSpeedup: 2-8x for numeric operations");
System.out.println("\n--- 3. RANGE CHECK ELIMINATION ---");
System.out.println("Remove redundant array bounds checks");
System.out.println("\nWith bounds checks:");
System.out.println(" for (int i = 0; i < array.length; i++) {");
System.out.println(" sum += array[i]; // Bounds check on every access");
System.out.println(" }");
System.out.println("\nOptimized (bounds check eliminated):");
System.out.println(" // JIT proves i is always in bounds");
System.out.println(" for (int i = 0; i < array.length; i++) {");
System.out.println(" sum += array[i]; // No bounds check!");
System.out.println(" }");
System.out.println("\nConditions:");
System.out.println(" • Loop variable starts at 0");
System.out.println(" • Loop condition uses array.length");
System.out.println(" • No mutation of loop variable inside loop");
System.out.println("\n--- 4. LOOP INVARIANT CODE MOTION ---");
System.out.println("Move constant calculations out of loop");
System.out.println("\nOriginal:");
System.out.println(" for (int i = 0; i < n; i++) {");
System.out.println(" result[i] = array[i] * (x + y); // x+y computed every iteration");
System.out.println(" }");
System.out.println("\nOptimized:");
System.out.println(" int temp = x + y; // Moved out of loop");
System.out.println(" for (int i = 0; i < n; i++) {");
System.out.println(" result[i] = array[i] * temp;");
System.out.println(" }");
System.out.println("\n--- 5. LOOP STRENGTH REDUCTION ---");
System.out.println("Replace expensive operations with cheaper ones");
System.out.println("\nOriginal:");
System.out.println(" for (int i = 0; i < n; i++) {");
System.out.println(" result[i] = i * stride; // Multiplication every iteration");
System.out.println(" }");
System.out.println("\nOptimized:");
System.out.println(" int index = 0;");
System.out.println(" for (int i = 0; i < n; i++) {");
System.out.println(" result[i] = index;");
System.out.println(" index += stride; // Addition instead of multiplication");
System.out.println(" }");
}
// Example: Vectorizable loop
public static void addArrays(int[] a, int[] b, int[] c) {
// A single precomputed bound keeps the loop simple enough for the JIT
// to eliminate range checks and auto-vectorize (e.g. with AVX)
int n = Math.min(a.length, Math.min(b.length, c.length));
for (int i = 0; i < n; i++) {
c[i] = a[i] + b[i];
}
}
// Example: Range check elimination
public static long sumArray(int[] array) {
long sum = 0;
// JIT eliminates bounds checks
for (int i = 0; i < array.length; i++) {
sum += array[i];
}
return sum;
}
}
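Loop-invariant code motion and strength reduction are easiest to picture as source-level rewrites, even though the JIT performs them on the compiled code rather than on your source. The sketch below (method names are illustrative) shows the loop as written and roughly what the optimized version computes.
// Loop-Invariant Code Motion and Strength Reduction (illustrative sketch)
class LoopRewriteSketch {
    // As written: x + y is re-evaluated and i * stride re-multiplied every iteration
    static void naive(int[] result, int[] array, int x, int y, int stride) {
        for (int i = 0; i < array.length; i++) {
            result[i] = array[i] * (x + y) + i * stride;
        }
    }
    // Roughly what the optimized code does: the invariant is hoisted and the
    // multiplication is replaced by a running addition
    static void optimized(int[] result, int[] array, int x, int y, int stride) {
        int temp = x + y;   // loop-invariant code motion
        int offset = 0;     // strength reduction: i * stride becomes offset += stride
        for (int i = 0; i < array.length; i++) {
            result[i] = array[i] * temp + offset;
            offset += stride;
        }
    }
}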
Escape Analysis
// Escape Analysis
public class EscapeAnalysis {
public static void printEscapeAnalysisConcepts() {
System.out.println("=== ESCAPE ANALYSIS ===");
System.out.println("\n--- WHAT IS ESCAPE ANALYSIS? ---");
System.out.println("Determines if object escapes method scope");
System.out.println("\nObject escapes if:");
System.out.println(" • Returned from method");
System.out.println(" • Stored in static field");
System.out.println(" • Stored in heap object");
System.out.println(" • Passed to another thread");
System.out.println("\nObject doesn't escape if:");
System.out.println(" • Only used locally in method");
System.out.println(" • Not returned or stored");
System.out.println(" • Not shared with other methods");
System.out.println("\n--- OPTIMIZATIONS ENABLED ---");
System.out.println("\n1. STACK ALLOCATION");
System.out.println(" Allocate non-escaping objects on stack");
System.out.println("\n Benefits:");
System.out.println(" ✓ No GC pressure");
System.out.println(" ✓ Automatic deallocation (stack pop)");
System.out.println(" ✓ Better cache locality");
System.out.println(" ✓ Faster allocation");
System.out.println("\n2. SCALAR REPLACEMENT");
System.out.println(" Replace object with its fields");
System.out.println("\n Example:");
System.out.println(" Point p = new Point(x, y);");
System.out.println(" return p.x + p.y;");
System.out.println("\n Optimized:");
System.out.println(" // No object allocation!");
System.out.println(" int p_x = x;");
System.out.println(" int p_y = y;");
System.out.println(" return p_x + p_y;");
System.out.println("\n3. LOCK ELISION");
System.out.println(" Remove synchronization on non-escaping objects");
System.out.println("\n Example:");
System.out.println(" StringBuffer sb = new StringBuffer();");
System.out.println(" sb.append(\"hello\"); // synchronized");
System.out.println(" sb.append(\"world\"); // synchronized");
System.out.println(" return sb.toString();");
System.out.println("\n Optimized:");
System.out.println(" // sb doesn't escape, locks removed");
System.out.println(" StringBuffer sb = new StringBuffer();");
System.out.println(" sb.append(\"hello\"); // no lock");
System.out.println(" sb.append(\"world\"); // no lock");
System.out.println(" return sb.toString();");
}
// Example: Non-escaping object (eligible for optimization)
public static int nonEscapingExample(int x, int y) {
Point p = new Point(x, y); // Doesn't escape
return p.x + p.y; // JIT can scalar replace
}
// Example: Escaping object (not eligible)
public static Point escapingExample(int x, int y) {
Point p = new Point(x, y);
return p; // Escapes via return
}
static class Point {
int x, y;
Point(int x, int y) {
this.x = x;
this.y = y;
}
}
public static void demonstrateEscapeAnalysis() {
System.out.println("\n=== ESCAPE ANALYSIS DEMONSTRATION ===");
System.out.println("\nNon-escaping object:");
System.out.println(" Point p = new Point(5, 10);");
System.out.println(" int sum = p.x + p.y;");
System.out.println(" ✓ Can be stack-allocated or scalar-replaced");
System.out.println(" ✓ No heap allocation");
System.out.println(" ✓ No GC overhead");
System.out.println("\nEscaping object:");
System.out.println(" Point p = new Point(5, 10);");
System.out.println(" return p;");
System.out.println(" ✗ Must be heap-allocated");
System.out.println(" ✗ Subject to GC");
System.out.println("\nPerformance impact:");
System.out.println(" Scalar replacement: 100-1000x faster");
System.out.println(" (Avoids allocation and GC entirely)");
}
}
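The lock-elision case described above can also be written as a concrete method. In the sketch below (the method name is illustrative) the StringBuffer never escapes concat(), so after escape analysis its synchronized append() calls need no actual locking.
// Lock Elision Candidate (illustrative sketch)
class LockElisionExample {
    static String concat(String a, String b) {
        StringBuffer sb = new StringBuffer(); // never escapes this method
        sb.append(a);                         // synchronization can be elided
        sb.append(b);                         // synchronization can be elided
        return sb.toString();                 // only the resulting String escapes
    }
}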
// Escape Analysis Configuration
class EscapeAnalysisConfig {
/*
# Enable escape analysis (default)
java -XX:+DoEscapeAnalysis MyApp
# Disable for testing
java -XX:-DoEscapeAnalysis MyApp
# Observe escape analysis decisions
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintEscapeAnalysis MyApp
# (PrintEscapeAnalysis may require a debug build of the JVM)
*/
}
Best Practices
- Enable tiered compilation: Default provides best warmup and peak performance.
- Allow adequate warmup: Run a representative workload before measuring performance (see the warmup sketch after this list).
- Keep hot methods small: Easier to inline and optimize.
- Use final and private: Helps JIT prove call targets.
- Avoid megamorphic call sites: Limit interface implementations in hot loops.
- Write simple loops: Enable vectorization and other optimizations.
- Monitor code cache: Ensure sufficient space for compiled code.
- Profile with realistic data: Synthetic benchmarks can mislead the JIT's profile-guided optimizations.
- Use PrintCompilation sparingly: Only for analysis, not production.
- Trust the JIT: Micro-optimizations often unnecessary or counterproductive.
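As a rough illustration of the warmup advice, the sketch below (names are illustrative; a harness such as JMH handles warmup, dead-code elimination, and statistics far more reliably) exercises the workload long enough for tiered compilation to reach peak Tier 4 code before any timing is taken.
// Warmup Before Measuring (illustrative sketch; prefer JMH for real benchmarks)
class WarmupSketch {
    static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }
    public static void main(String[] args) {
        long sink = 0;
        // Warmup: enough invocations for the JIT to promote workload() to Tier 4
        for (int i = 0; i < 20_000; i++) {
            sink += workload(1_000);
        }
        // Measure only after warmup
        long start = System.nanoTime();
        int runs = 1_000;
        for (int i = 0; i < runs; i++) {
            sink += workload(1_000);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("avg ns/call: " + (elapsed / runs) + " (sink=" + sink + ")");
    }
}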