41.3 Memory Ops: Load/Store, Gather/Scatter
Memory access often dominates the cost of vectorized code. The Vector API supports contiguous loads and stores as well as indexed gather and scatter.
41.3.1 Contiguous Load/Store
import jdk.incubator.vector.*;
static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;
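// Load S.length() consecutive floats starting at a[i]
FloatVector v = FloatVector.fromArray(S, a, i);
// Store the lanes to out[i], out[i + 1], ...
v.intoArray(out, i);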
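Use S.loopBound(n) as the upper bound of the main loop, which then only ever touches full vectors. Handle the remaining tail (fewer than S.length() elements) with a masked load/store or a scalar loop.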
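The main-loop-plus-masked-tail pattern can be sketched as follows. This is a minimal illustration, not a tuned kernel: the class name ContiguousScale and the method scale are hypothetical, and out is assumed to be at least as long as a. Because the Vector API is an incubator module, compiling and running it needs --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; not part of the Vector API.
final class ContiguousScale {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Writes out[i] = a[i] * factor for every i in [0, a.length).
    // Assumes out.length >= a.length.
    static void scale(float[] a, float factor, float[] out) {
        int i = 0;
        int ub = S.loopBound(a.length); // largest multiple of S.length() <= a.length
        for (; i < ub; i += S.length()) {
            FloatVector v = FloatVector.fromArray(S, a, i); // full-width load
            v.mul(factor).intoArray(out, i);                // full-width store
        }
        if (i < a.length) {
            // Tail: the mask enables only the lanes whose index is below a.length,
            // so disabled lanes are neither read nor written.
            VectorMask<Float> m = S.indexInRange(i, a.length);
            FloatVector v = FloatVector.fromArray(S, a, i, m);
            v.mul(factor).intoArray(out, i, m);
        }
    }
}
```

A scalar loop over the tail would work equally well here; the masked form keeps the tail on the same code path as the main loop.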
41.3.2 Slicing and Reuse
int lanes = S.length();
for (int i = 0, ub = S.loopBound(a.length); i < ub; i += lanes) {
    FloatVector va = FloatVector.fromArray(S, a, i);
    // reuse va for multiple operations
}
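As one illustration of reusing a loaded vector, the sketch below computes the sum and the sum of squares of an array in a single pass, so each element is read from memory once. The class LoadOnceReuse and the method sumAndSumSq are hypothetical names chosen for this example; the tail is handled with a scalar loop for simplicity.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; not part of the Vector API.
final class LoadOnceReuse {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Returns {sum, sumOfSquares} of a.
    static float[] sumAndSumSq(float[] a) {
        FloatVector sum = FloatVector.zero(S);
        FloatVector sumSq = FloatVector.zero(S);
        int lanes = S.length();
        int i = 0;
        for (int ub = S.loopBound(a.length); i < ub; i += lanes) {
            FloatVector va = FloatVector.fromArray(S, a, i); // one load...
            sum = sum.add(va);                               // ...used here
            sumSq = va.fma(va, sumSq);                       // ...and here: va * va + sumSq
        }
        // Collapse the per-lane partial results into scalars.
        float s = sum.reduceLanes(VectorOperators.ADD);
        float q = sumSq.reduceLanes(VectorOperators.ADD);
        // Scalar tail for the elements past the last full vector.
        for (; i < a.length; i++) {
            s += a[i];
            q += a[i] * a[i];
        }
        return new float[] { s, q };
    }
}
```

Note that the lane-wise accumulation adds elements in a different order than a plain scalar loop, so floating-point results may differ slightly in the last bits for general inputs.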
41.3.3 Gather (Indexed Load)
// indices[] holds element positions to load; lane n receives a[0 + indices[j + n]]
// fromArray(species, array, offset, indexMap, mapOffset) is the gather overload
FloatVector gathered = FloatVector.fromArray(S, a, 0, indices, j);
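A gather loads each lane from its own index, so the source elements need not be adjacent. Performance depends on memory locality: indices that fall into a few cache lines are much cheaper than indices spread across the array.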
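A complete gather loop might look like the sketch below. It uses the FloatVector.fromArray overload that takes an int[] index map and a map offset. The class GatherExample and its method gather are hypothetical names, and the tail is handled with a scalar loop, which sidesteps any question about how a masked gather bounds-checks lanes near the end of the index array.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; not part of the Vector API.
final class GatherExample {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Writes out[j] = a[indices[j]] for every j in [0, indices.length).
    // Assumes out.length >= indices.length and every index is valid for a.
    static void gather(float[] a, int[] indices, float[] out) {
        int j = 0;
        for (int ub = S.loopBound(indices.length); j < ub; j += S.length()) {
            // Lane n receives a[0 + indices[j + n]].
            FloatVector g = FloatVector.fromArray(S, a, 0, indices, j);
            g.intoArray(out, j); // results are stored contiguously
        }
        // Scalar tail for the remaining indices.
        for (; j < indices.length; j++) {
            out[j] = a[indices[j]];
        }
    }
}
```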
41.3.4 Scatter (Indexed Store)
// Lane n of data (src[i + n]) is stored to out[0 + indices[j + n]]
FloatVector data = FloatVector.fromArray(S, src, i);
// intoArray(array, offset, indexMap, mapOffset) is the scatter overload
data.intoArray(out, 0, indices, j);
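A scatter stores each lane to its own index. If indices repeat within one vector, several lanes write to the same element and only one value survives, so deduplicate the indices or fall back to scalar accumulation when every contribution matters.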
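The scatter counterpart can be sketched the same way, using the intoArray overload that takes an int[] index map. The class ScatterExample and its method scatter are hypothetical, and the sketch assumes the indices are unique, so no two lanes compete for the same destination.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; not part of the Vector API.
final class ScatterExample {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Writes out[indices[j]] = src[j] for every j in [0, indices.length).
    // Assumes src.length >= indices.length, every index is valid for out,
    // and the indices are unique (no write conflicts).
    static void scatter(float[] src, int[] indices, float[] out) {
        int j = 0;
        for (int ub = S.loopBound(indices.length); j < ub; j += S.length()) {
            FloatVector data = FloatVector.fromArray(S, src, j); // contiguous load
            // Lane n is stored to out[0 + indices[j + n]].
            data.intoArray(out, 0, indices, j);
        }
        // Scalar tail for the remaining elements.
        for (; j < indices.length; j++) {
            out[indices[j]] = src[j];
        }
    }
}
```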
41.3.5 Alignment and Stride
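- Prefer contiguous access, ideally aligned to cache lines, so that every fetched line is fully used
- Strided access (every k‑th element) can be slower than contiguous access; consider transposing or packing the data first
- Gather/scatter is flexible, but it is often limited by memory throughput rather than by arithmetic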
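The packing suggestion can be illustrated with a small sketch: copy the strided elements into a contiguous buffer once, then run contiguous vector loops over that buffer. The class StridedPack and its methods are hypothetical names, and the approach only pays off under the assumption that the packed data is processed more than once, since the packing pass itself is a strided scalar read.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; not part of the Vector API.
final class StridedPack {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // Copies a[start], a[start + stride], ... into a new contiguous array.
    // Assumes 0 <= start < a.length and stride > 0.
    static float[] pack(float[] a, int start, int stride) {
        int n = (a.length - start + stride - 1) / stride; // number of strided elements
        float[] packed = new float[n];
        for (int k = 0, i = start; k < n; k++, i += stride) {
            packed[k] = a[i];
        }
        return packed;
    }

    // Sums a contiguous array with full-width loads and a scalar tail.
    static float sum(float[] x) {
        FloatVector acc = FloatVector.zero(S);
        int i = 0;
        for (int ub = S.loopBound(x.length); i < ub; i += S.length()) {
            acc = acc.add(FloatVector.fromArray(S, x, i));
        }
        float s = acc.reduceLanes(VectorOperators.ADD);
        for (; i < x.length; i++) {
            s += x[i];
        }
        return s;
    }
}
```

For example, StridedPack.sum(StridedPack.pack(a, 1, 3)) sums a[1], a[4], a[7], and so on, and the packed array can then be reused by any further contiguous kernels.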