41.1 Vector API Overview

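The Vector API lets Java code express data-parallel computations explicitly, so that the JIT compiler can map them to SIMD (Single Instruction, Multiple Data) instructions for high-performance numeric work.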

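Status note: The API’s module and stability level can vary by JDK release. Many versions expose it as the incubator module jdk.incubator.vector, which is not resolved by default and must be enabled at both compile time and run time with --add-modules jdk.incubator.vector. Check your JDK documentation for the status in your release.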

Vector Species

// Module may be jdk.incubator.vector in some JDKs
import jdk.incubator.vector.*;

static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

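A species pairs an element type with a vector shape (its bit width); together these fix the lane count, for example 8 float lanes in a 256-bit vector. SPECIES_PREFERRED selects the shape best suited to the current platform.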
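To see what a species resolves to on a given machine, a small program can print its properties. The following is a minimal sketch; the class name SpeciesInfo and the helper expectedLanes are illustrative, not part of the API, and it assumes the module is available as jdk.incubator.vector.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Illustrative class name; compile and run with: --add-modules jdk.incubator.vector
final class SpeciesInfo {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Lane count implied by a species: vector bit width / element bit width.
    static int expectedLanes(VectorSpecies<Float> species) {
        return species.vectorBitSize() / species.elementSize();
    }

    public static void main(String[] args) {
        System.out.println("element type : " + SPECIES.elementType());
        System.out.println("vector bits  : " + SPECIES.vectorBitSize());
        System.out.println("lane count   : " + SPECIES.length());
        // A fixed-shape species, for comparison with the preferred one.
        System.out.println("128-bit lanes: " + FloatVector.SPECIES_128.length());
    }
}
```

The preferred species differs between machines, so code should derive strides and bounds from SPECIES.length() rather than hard-coding a lane count.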

Vectorized Loop

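// Computes c[i] = a[i] + b[i]; assumes a, b, and c have the same length.
void addArrays(float[] a, float[] b, float[] c) {
  int i = 0;
  // Largest multiple of the lane count that is <= a.length
  int upperBound = SPECIES.loopBound(a.length);

  // Main loop: process SPECIES.length() elements per iteration
  for (; i < upperBound; i += SPECIES.length()) {
    FloatVector va = FloatVector.fromArray(SPECIES, a, i);
    FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
    va.add(vb).intoArray(c, i);
  }

  // Scalar tail: the remaining a.length - upperBound elements
  for (; i < a.length; i++) {
    c[i] = a[i] + b[i];
  }
}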
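To make the split between the vector loop and the scalar tail concrete, the arithmetic can be reproduced in plain Java. This is only a sketch: the loopBound helper here is a hypothetical stand-in that mirrors the documented behavior of SPECIES.loopBound (the largest multiple of the lane count that does not exceed the length), and the lane count of 8 is an assumed example value.

```java
// Plain-Java illustration; no Vector API needed. All names are hypothetical.
final class LoopBoundDemo {
    // Largest multiple of `lanes` that is <= `length`.
    static int loopBound(int length, int lanes) {
        return length - (length % lanes);
    }

    public static void main(String[] args) {
        int lanes = 8;  // assumed lane count, e.g. float lanes in a 256-bit vector
        for (int length : new int[] {0, 5, 8, 19}) {
            int bound = loopBound(length, lanes);
            System.out.printf("length=%d: vector loop covers [0, %d), tail handles %d element(s)%n",
                    length, bound, length - bound);
        }
    }
}
```

For arrays shorter than one vector the bound is 0, so the vector loop is skipped entirely and the tail loop does all the work.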

Masking

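// A mask enables only the lanes whose index i + lane is below a.length,
// so a load at offset i near the end of the array never reads out of bounds.
VectorMask<Float> mask = SPECIES.indexInRange(i, a.length);
FloatVector va = FloatVector.fromArray(SPECIES, a, i, mask);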
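The fragment above can replace the scalar tail altogether. The following sketch applies a mask on every iteration so that a single loop covers the whole array; the class and method names are illustrative, and it assumes all three arrays have the same length.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Illustrative class name; compile and run with: --add-modules jdk.incubator.vector
final class MaskedAdd {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // c[i] = a[i] + b[i]; assumes all three arrays have the same length.
    static void addArraysMasked(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i += SPECIES.length()) {
            // All lanes are set except on the last, partial iteration.
            VectorMask<Float> mask = SPECIES.indexInRange(i, a.length);
            FloatVector va = FloatVector.fromArray(SPECIES, a, i, mask);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i, mask);
            // Masked store: lanes past the end of c are left untouched.
            va.add(vb).intoArray(c, i, mask);
        }
    }
}
```

This form is shorter, but masked loads and stores may cost more than an unmasked main loop with a scalar tail on hardware without native masking support, so it is worth measuring both variants.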

Reductions

Sum an array by accumulating lane-wise in a vector, reducing the lanes to a scalar once after the loop, and then adding the scalar tail:

float sum(float[] a) {
  int i = 0;
  int ub = SPECIES.loopBound(a.length);
  FloatVector acc = FloatVector.zero(SPECIES);
  for (; i < ub; i += SPECIES.length()) {
    acc = acc.add(FloatVector.fromArray(SPECIES, a, i));
  }
  float total = acc.reduceLanes(VectorOperators.ADD);
  for (; i < a.length; i++) total += a[i];
  return total;
}
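The vector accumulator adds elements in a different order than a sequential loop, and floating-point addition is not associative, so the two results can differ in the low-order bits. The sketch below emulates lane-wise accumulation in plain Java to make that reordering visible. The names sequentialSum and laneWiseSum and the lane count are hypothetical, and the left-to-right reduction of the partial sums is an assumption, since the order reduceLanes uses for floating-point ADD is not specified.

```java
// Plain-Java emulation of the accumulation order; all names are hypothetical.
final class SumOrderDemo {
    // Sequential sum: ((a0 + a1) + a2) + ...
    static float sequentialSum(float[] a) {
        float total = 0f;
        for (float x : a) total += x;
        return total;
    }

    // Emulates `lanes` independent accumulators, as the vector loop keeps,
    // then reduces them and adds the tail, mirroring sum(float[]).
    static float laneWiseSum(float[] a, int lanes) {
        float[] acc = new float[lanes];
        int ub = a.length - (a.length % lanes);
        for (int i = 0; i < ub; i += lanes) {
            for (int lane = 0; lane < lanes; lane++) acc[lane] += a[i + lane];
        }
        float total = 0f;
        for (float partial : acc) total += partial;   // assumed reduction order
        for (int i = ub; i < a.length; i++) total += a[i];
        return total;
    }

    public static void main(String[] args) {
        float[] a = new float[1_000_003];
        java.util.Arrays.fill(a, 0.1f);
        System.out.println("sequential: " + sequentialSum(a));
        System.out.println("lane-wise : " + laneWiseSum(a, 8));
    }
}
```

When testing a vectorized reduction against a scalar reference, compare the results within a tolerance rather than for exact equality.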

Performance

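  • Can deliver speedups on the order of 2–8x on hardware with SIMD support (AVX2, AVX-512, NEON); actual gains depend on the workload, lane width, and memory bandwidth.
  • Requires JIT compiler intrinsics and hardware support to reach that performance.
  • On unsupported platforms the API still produces correct results but falls back to a non-SIMD implementation, which can be slower than a plain scalar loop.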

Verification

Use JMH or JVM diagnostic flags to confirm vector intrinsics (flags vary by JDK and may change):

java --add-modules jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining MyVectorApp

In the inlining output, look for Vector API calls reported as intrinsics in your hot methods, then confirm the actual performance gain with microbenchmarks.

Guidance

  • Use Vector API for compute-intensive numeric loops (image processing, ML, simulations).
  • Profile to confirm speedup (depends on hardware, JVM, and data layout).
  • Keep vector loop bodies simple; complex control flow limits vectorization, so express per-element conditions with masks rather than branches where possible.
  • Use SPECIES.loopBound() and masks to handle array tails correctly.
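As an illustration of keeping control flow out of the vector loop, a per-element branch can often be rewritten as a lane-wise comparison followed by a blend. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and the input and output arrays are assumed to have the same length.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Illustrative class name; compile and run with: --add-modules jdk.incubator.vector
final class ClampNegatives {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Scalar meaning: out[i] = (in[i] < 0) ? 0 : in[i]; assumes equal lengths.
    static void zeroNegatives(float[] in, float[] out) {
        int i = 0;
        int ub = SPECIES.loopBound(in.length);
        for (; i < ub; i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, in, i);
            // Per-lane test instead of a per-element branch.
            VectorMask<Float> negative = v.compare(VectorOperators.LT, 0f);
            // Where the mask is set, take 0; elsewhere keep the original lane.
            v.blend(0f, negative).intoArray(out, i);
        }
        // Scalar tail with the same semantics as the vector loop.
        for (; i < in.length; i++) {
            out[i] = in[i] < 0f ? 0f : in[i];
        }
    }
}
```

Both the vector loop and the tail compute every lane unconditionally and then select a result, which keeps the loop body free of data-dependent branches.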