Lane Crossing Issues in Wide SIMD Instruction Sets
Pack and Merge: Key Operations in Transposition
- The key operation in Parabix transposition (s2p) is the Horizontal SIMD pack operation: pack \(2n\)-bit fields into \(n\) bits.
- The key operation in Parabix reverse transposition (p2s) is the IDISA Expansion merge operation.
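As a rough illustration of what these two operations compute, the following scalar C sketch models them on one 128-bit register's worth of fields. The function names `pack_low_16` and `merge_low_8` are hypothetical, and the choice of which operand supplies the first half of the packed result is an assumption; the actual IDISA operations act on whole SIMD registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FIELDS16 = 8, FIELDS8 = 16 }; /* field counts in one 128-bit register */

/* Pack (n = 8): keep the low 8 bits of each 16-bit field.
   Assumption: fields of a fill the first half of out, fields of b the second. */
static void pack_low_16(const uint16_t a[FIELDS16], const uint16_t b[FIELDS16],
                        uint8_t out[FIELDS8]) {
    for (size_t i = 0; i < FIELDS16; i++) {
        out[i] = (uint8_t)(a[i] & 0xFF);
        out[FIELDS16 + i] = (uint8_t)(b[i] & 0xFF);
    }
}

/* Merge: interleave the first 8 bytes of a with the first 8 bytes of b,
   so each output pair is (a[i], b[i]). */
static void merge_low_8(const uint8_t a[FIELDS8], const uint8_t b[FIELDS8],
                        uint8_t out[FIELDS8]) {
    for (size_t i = 0; i < FIELDS8 / 2; i++) {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
}
```

Pack halves the field width while keeping the register size, and merge doubles it, which is why they appear as the core steps of transposition and reverse transposition respectively.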
Pure Merge in SSE2
The SSE2 instruction set has an operation, punpcklwd, which is a pure merge operation. The Intel Intrinsics Guide gives the following definition.
__m128i _mm_unpacklo_epi16 (__m128i a, __m128i b)
#include "emmintrin.h"
Instruction: punpcklwd xmm, xmm
CPUID Flags: SSE2

INTERLEAVE_WORDS(src1[127:0], src2[127:0]){
    dst[15:0] := src1[15:0]
    dst[31:16] := src2[15:0]
    dst[47:32] := src1[31:16]
    dst[63:48] := src2[31:16]
    dst[79:64] := src1[47:32]
    dst[95:80] := src2[47:32]
    dst[111:96] := src1[63:48]
    dst[127:112] := src2[63:48]
    RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
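The definition above can be exercised with a small C sketch. The wrapper names `merge_low_words_sse2` and `merge_high_words_sse2` are hypothetical; the intrinsics themselves are the standard SSE2 ones, and the code assumes an x86 target with SSE2 available.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Interleave the four low 16-bit words of a and b (punpcklwd). */
static void merge_low_words_sse2(const uint16_t a[8], const uint16_t b[8],
                                 uint16_t out[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi16(va, vb));
}

/* Interleave the four high 16-bit words of a and b (punpckhwd). */
static void merge_high_words_sse2(const uint16_t a[8], const uint16_t b[8],
                                  uint16_t out[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_unpackhi_epi16(va, vb));
}
```

Because a 128-bit register is a single lane, the low and high results together form the complete interleave of a and b with no further shuffling.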
AVX2: Merge/Unpack Within Two 128-bit Lanes
With AVX2, the unpack instruction was extended to 256 bits, but was defined to operate within separate lanes.
__m256i _mm256_unpacklo_epi16 (__m256i a, __m256i b)
#include "immintrin.h"
Instruction: vpunpcklwd ymm, ymm, ymm
CPUID Flags: AVX2

dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
Because it is defined within lanes, it cannot be used to directly implement the simd_merge operation. In order to implement the simd_merge operation, additional work is necessary to move 128-bit halves across lanes: with the in-lane low and high unpack results held in two registers, the high 128 bits of the first must be swapped with the low 128 bits of the second.
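One way to do this extra work is sketched below, assuming GCC or Clang on x86: the in-lane low and high unpack results are recombined with `_mm256_permute2x128_si256`. The function names are hypothetical, this is not necessarily how Parabix implements simd_merge, and the scalar fallback exists only so the sketch also runs on CPUs without AVX2.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Full-width merge of 16-bit words from in-lane unpacks plus a
   cross-lane permute. lo_out interleaves a[0..7] with b[0..7];
   hi_out interleaves a[8..15] with b[8..15]. */
__attribute__((target("avx2")))
static void full_merge_words_avx2(const uint16_t a[16], const uint16_t b[16],
                                  uint16_t lo_out[16], uint16_t hi_out[16]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i lo = _mm256_unpacklo_epi16(va, vb); /* low half of each lane */
    __m256i hi = _mm256_unpackhi_epi16(va, vb); /* high half of each lane */
    /* 0x20 selects lane 0 of lo, then lane 0 of hi. */
    _mm256_storeu_si256((__m256i *)lo_out,
                        _mm256_permute2x128_si256(lo, hi, 0x20));
    /* 0x31 selects lane 1 of lo, then lane 1 of hi. */
    _mm256_storeu_si256((__m256i *)hi_out,
                        _mm256_permute2x128_si256(lo, hi, 0x31));
}

/* Scalar reference with the same result, used when AVX2 is absent. */
static void full_merge_words_scalar(const uint16_t a[16], const uint16_t b[16],
                                    uint16_t lo_out[16], uint16_t hi_out[16]) {
    for (int i = 0; i < 8; i++) {
        lo_out[2 * i] = a[i];
        lo_out[2 * i + 1] = b[i];
        hi_out[2 * i] = a[8 + i];
        hi_out[2 * i + 1] = b[8 + i];
    }
}

/* Runtime dispatch (GCC/Clang builtin). */
static void full_merge_words(const uint16_t a[16], const uint16_t b[16],
                             uint16_t lo_out[16], uint16_t hi_out[16]) {
    if (__builtin_cpu_supports("avx2"))
        full_merge_words_avx2(a, b, lo_out, hi_out);
    else
        full_merge_words_scalar(a, b, lo_out, hi_out);
}
```

In this sketch the two permutes are the added cost relative to SSE2, where the unpack results are already in final position.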
AVX-512BW: Merge/Unpack Within Four 128-bit Lanes
AVX-512 continues the extension of SIMD operations to 512 bits, but now has four 128-bit lanes. This adds further cost to the implementation of simd_merge.
__m512i _mm512_unpacklo_epi16 (__m512i a, __m512i b)
#include "immintrin.h"
Instruction: vpunpcklwd
CPUID Flags: AVX512BW

dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384])
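To make the four-lane rearrangement concrete, here is a portable scalar C model rather than real AVX-512 code, so it runs on any machine. The names `unpack_in_lanes` and `full_merge_model` are hypothetical. On actual hardware the lane rearrangement would presumably fall to a two-source cross-lane permute such as `_mm512_permutex2var_epi64`, though that choice is an assumption and not taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 4, WPL = 8, WORDS = LANES * WPL }; /* WPL: words per lane */

/* Model of vpunpcklwd (high = 0) or vpunpckhwd (high = 1) on 512 bits:
   each 128-bit lane is interleaved independently. */
static void unpack_in_lanes(const uint16_t a[WORDS], const uint16_t b[WORDS],
                            int high, uint16_t out[WORDS]) {
    for (size_t lane = 0; lane < LANES; lane++) {
        size_t base = lane * WPL;
        size_t src = base + (high ? WPL / 2 : 0);
        for (size_t i = 0; i < WPL / 2; i++) {
            out[base + 2 * i] = a[src + i];
            out[base + 2 * i + 1] = b[src + i];
        }
    }
}

/* Build the full-width merge by rearranging lanes of the in-lane results:
   merge_lo = lo.lane0, hi.lane0, lo.lane1, hi.lane1
   merge_hi = lo.lane2, hi.lane2, lo.lane3, hi.lane3 */
static void full_merge_model(const uint16_t a[WORDS], const uint16_t b[WORDS],
                             uint16_t merge_lo[WORDS],
                             uint16_t merge_hi[WORDS]) {
    uint16_t lo[WORDS], hi[WORDS];
    unpack_in_lanes(a, b, 0, lo);
    unpack_in_lanes(a, b, 1, hi);
    for (size_t k = 0; k < LANES; k++) {
        const uint16_t *src = (k % 2 == 0) ? lo : hi;
        memcpy(merge_lo + k * WPL, src + (k / 2) * WPL,
               WPL * sizeof(uint16_t));
        memcpy(merge_hi + k * WPL, src + ((LANES + k) / 2) * WPL,
               WPL * sizeof(uint16_t));
    }
}
```

In this model every output register draws lanes from both in-lane results, and three of the four lanes in each output change position, compared with one of two in the 256-bit case.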
Updated Wed March 07 2018, 08:13 by cameron.