41.3 Memory Ops: Load/Store, Gather/Scatter

Memory access often dominates the cost of vectorized code. The Vector API supports contiguous loads and stores as well as indexed gather and scatter.


41.3.1 Contiguous Load/Store

import jdk.incubator.vector.*;
static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

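// Load S.length() consecutive floats starting at a[i]
FloatVector v = FloatVector.fromArray(S, a, i);
// Store all lanes to consecutive elements starting at out[i]
v.intoArray(out, i);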

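Use S.loopBound(n) as the upper bound of the main loop, which then processes only full vectors, and handle the remaining tail elements with a mask such as S.indexInRange(i, n).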
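The main-loop-plus-masked-tail pattern can be sketched as follows. The class name TailMaskSketch and the helper scale are hypothetical, chosen only for illustration, and the sketch assumes out is at least as long as a. Because the API is an incubator module, compiling and running it needs --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

final class TailMaskSketch {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Hypothetical helper: out[i] = a[i] * factor for every i in a.
    // Assumes out.length >= a.length.
    static void scale(float[] a, float[] out, float factor) {
        int n = a.length;
        int i = 0;
        // Main loop: loopBound rounds n down to a multiple of S.length(),
        // so every iteration works on a full vector.
        for (int ub = S.loopBound(n); i < ub; i += S.length()) {
            FloatVector.fromArray(S, a, i).mul(factor).intoArray(out, i);
        }
        // Tail: at most one partial vector. The mask enables only the
        // lanes whose index i + lane is still below n.
        if (i < n) {
            VectorMask<Float> m = S.indexInRange(i, n);
            FloatVector.fromArray(S, a, i, m).mul(factor).intoArray(out, i, m);
        }
    }
}
```

A scalar loop over the last few elements is an equally valid tail strategy; the masked form simply keeps the tail in the same vector code path.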


41.3.2 Slicing and Reuse

int lanes = S.length();
for (int i = 0, ub = S.loopBound(a.length); i < ub; i += lanes) {
  FloatVector va = FloatVector.fromArray(S, a, i);
  // reuse va for multiple operations
}
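
As one illustration of reusing a loaded vector, the sketch below computes a sum and a sum of squares from a single load per iteration. The class ReuseSketch and the method sumAndSumSq are hypothetical names, and the scalar tail loop is just one of several ways to finish the remainder.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class ReuseSketch {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Hypothetical helper: returns {sum, sumOfSquares} over all elements of a.
    static float[] sumAndSumSq(float[] a) {
        FloatVector vSum = FloatVector.zero(S);
        FloatVector vSq = FloatVector.zero(S);
        int i = 0;
        for (int ub = S.loopBound(a.length); i < ub; i += S.length()) {
            FloatVector va = FloatVector.fromArray(S, a, i); // one load
            vSum = vSum.add(va);          // first use of va
            vSq = vSq.add(va.mul(va));    // second use of va, no reload
        }
        // Collapse the per-lane partial results into scalars.
        float sum = vSum.reduceLanes(VectorOperators.ADD);
        float sq = vSq.reduceLanes(VectorOperators.ADD);
        // Scalar tail for the elements past the last full vector.
        for (; i < a.length; i++) {
            sum += a[i];
            sq += a[i] * a[i];
        }
        return new float[] { sum, sq };
    }
}
```

Note that the lane-wise accumulation adds elements in a different order than a plain scalar loop, so floating-point results can differ slightly in the last bits for general inputs.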

41.3.3 Gather (Indexed Load)

// indices[j .. j + S.length() - 1] holds the element positions to load
// Gather is a fromArray overload that takes an index map:
// lane k loads a[0 + indices[j + k]]
FloatVector gathered = FloatVector.fromArray(S, a, 0, indices, j);

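Gather loads each lane from an arbitrary position. Performance depends on memory locality: indices that land in the same or neighboring cache lines are much cheaper than indices spread across memory.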
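
A complete gather loop might look like the sketch below. It relies on the index-map overload FloatVector.fromArray(species, array, offset, indexMap, mapOffset), which is how the Vector API expresses gather. The class GatherSketch and the method gather are hypothetical names, and the sketch assumes every index is valid for a and that out is at least as long as indices.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

final class GatherSketch {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Hypothetical helper: out[k] = a[indices[k]] for every k.
    // Assumes out.length >= indices.length and all indices are in range.
    static void gather(float[] a, int[] indices, float[] out) {
        int n = indices.length;
        int j = 0;
        for (int ub = S.loopBound(n); j < ub; j += S.length()) {
            // Lane k loads a[0 + indices[j + k]].
            FloatVector g = FloatVector.fromArray(S, a, 0, indices, j);
            // The gathered lanes are stored contiguously.
            g.intoArray(out, j);
        }
        // Scalar tail: fewer than S.length() indices remain.
        for (; j < n; j++) {
            out[j] = a[indices[j]];
        }
    }
}
```

The tail is handled in scalar code here so that the index map is never read past its end.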


41.3.4 Scatter (Indexed Store)

FloatVector data = FloatVector.fromArray(S, src, i);
// Scatter is an intoArray overload that takes an index map:
// lane k is stored to out[0 + indices[j + k]]
data.intoArray(out, 0, indices, j);

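Scatter writes each lane to an arbitrary position. If indices repeat within one vector, several lanes target the same element and only one value survives, so resolve such conflicts first whenever every write must count (for example, when accumulating into out).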
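
A scatter loop can be sketched in the same way, using the index-map overload of intoArray. The class ScatterSketch and the method scatter are hypothetical names. The sketch assumes the indices are unique and valid for out, which sidesteps the repeated-index problem rather than solving it, and that src has at least as many elements as indices.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

final class ScatterSketch {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Hypothetical helper: out[indices[k]] = src[k] for every k.
    // Assumes indices are unique, in range for out, and
    // src.length >= indices.length.
    static void scatter(float[] src, int[] indices, float[] out) {
        int n = indices.length;
        int j = 0;
        for (int ub = S.loopBound(n); j < ub; j += S.length()) {
            // Contiguous load of the values to write.
            FloatVector data = FloatVector.fromArray(S, src, j);
            // Lane k is stored to out[0 + indices[j + k]].
            data.intoArray(out, 0, indices, j);
        }
        // Scalar tail: fewer than S.length() indices remain.
        for (; j < n; j++) {
            out[indices[j]] = src[j];
        }
    }
}
```

When duplicates are possible and all contributions matter, a scalar loop or a pre-pass that combines values per target index is a safer starting point than a raw scatter.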


41.3.5 Alignment and Stride

  • Aim for contiguous access aligned to cache lines
  • Strided access (every k‑th element) can be slower than contiguous access; consider transposing or packing the data first
  • Gather/scatter is powerful but may be memory‑bound
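
When repacking is not practical, one way to vectorize a strided read is to reuse a small index map with the gather form of fromArray. The sketch below does this for a strided sum. The class StridedSketch and the method stridedSum are hypothetical names, stride is assumed to be at least 1, and whether this beats a scalar loop or a packed copy depends on the hardware and the stride, so it should be measured rather than assumed.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class StridedSketch {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Hypothetical helper: sums a[0], a[stride], a[2 * stride], ...
    // Assumes stride >= 1.
    static float stridedSum(float[] a, int stride) {
        int lanes = S.length();
        // Index map built once: lane k reads at relative position k * stride.
        int[] map = new int[lanes];
        for (int k = 0; k < lanes; k++) {
            map[k] = k * stride;
        }
        // Number of strided elements in a.
        int count = (a.length + stride - 1) / stride;
        FloatVector acc = FloatVector.zero(S);
        int e = 0; // position in strided-element space
        for (int ub = S.loopBound(count); e < ub; e += lanes) {
            // Lane k loads a[e * stride + map[k]].
            acc = acc.add(FloatVector.fromArray(S, a, e * stride, map, 0));
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for the remaining strided elements.
        for (; e < count; e++) {
            sum += a[e * stride];
        }
        return sum;
    }
}
```

If the same strided column is read many times, packing it once into a contiguous array and using plain fromArray loads afterwards is usually the simpler choice.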