Hoisting
Intrinsic Hoisting - Introductory Examples
Notes and Resources
- This is a good introduction to the hoisting issue and some problems.
- I think that pmuludq should essentially be hoisted to the following purely vertical operation on <2 x i64>:
define <2 x i64> @muludq(<2 x i64> %a, <2 x i64> %b) {
entry:
  %0 = and <2 x i64> %a, <i64 4294967295, i64 4294967295>
  %1 = and <2 x i64> %b, <i64 4294967295, i64 4294967295>
  %2 = mul <2 x i64> %0, %1
  ret <2 x i64> %2
}
This is then translated by llc into the familiar sequence containing three pmuludq instructions.
LCPI0_0:
    .quad 4294967295 ## 0xffffffff
    .quad 4294967295 ## 0xffffffff
    .section __TEXT,__text,regular,pure_instructions
    .globl _muludq5
    .align 4, 0x90
_muludq5: ## @muludq5
    .cfi_startproc
## BB#0: ## %entry
    movdqa LCPI0_0(%rip), %xmm2 ## xmm2 = [4294967295,4294967295]
    pand %xmm2, %xmm0
    pand %xmm2, %xmm1
    movdqa %xmm0, %xmm2
    pmuludq %xmm1, %xmm2
    movdqa %xmm1, %xmm3
    psrlq $32, %xmm3
    pmuludq %xmm0, %xmm3
    psllq $32, %xmm3
    paddq %xmm3, %xmm2
    psrlq $32, %xmm0
    pmuludq %xmm1, %xmm0
    psllq $32, %xmm0
    paddq %xmm2, %xmm0
    retq
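The three pmuludq instructions are not a scalarization, but the generic way of using pmuludq to implement <2 x i64> multiplication: one multiply for the low halves and two for the cross terms, which are shifted left by 32 and added in. The problem here is that the LLVM backend cannot figure out that the psrlq $32 operations generate an all-zero value in each case. Both operands have just been masked with 0xffffffff, so the upper 32 bits of every lane are already zero, both cross-term products are zero, and a single pmuludq would suffice.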
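The arithmetic behind this can be checked with a small scalar model. The following is only a sketch in Python of a single 64-bit lane, not anything produced by LLVM; the helper names (pmuludq_lane, mul64_generic, muludq_lane) are made up for illustration and simply mirror the instructions in the listing above.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def pmuludq_lane(x, y):
    """Model one 64-bit lane of pmuludq: multiply the low 32 bits of x and y."""
    return (x & MASK32) * (y & MASK32)  # a 32x32 product always fits in 64 bits


def psrlq(x, n):
    """Logical right shift of a 64-bit lane."""
    return (x & MASK64) >> n


def psllq(x, n):
    """Left shift of a 64-bit lane, discarding bits shifted out of the top."""
    return (x << n) & MASK64


def paddq(x, y):
    """Wrapping 64-bit addition."""
    return (x + y) & MASK64


def mul64_generic(a, b):
    """Generic three-pmuludq expansion of a 64-bit multiply (one lane)."""
    lo_lo = pmuludq_lane(a, b)                          # lo(a) * lo(b)
    cross1 = psllq(pmuludq_lane(psrlq(b, 32), a), 32)   # (hi(b) * lo(a)) << 32
    cross2 = psllq(pmuludq_lane(psrlq(a, 32), b), 32)   # (hi(a) * lo(b)) << 32
    return paddq(paddq(lo_lo, cross1), cross2)


def muludq_lane(a, b):
    """One lane of the masked IR function: and, and, then the generic multiply."""
    return mul64_generic(a & MASK32, b & MASK32)
```

For any 64-bit inputs, mul64_generic(a, b) agrees with (a * b) & MASK64, which is why the expansion is correct in general. Once the operands are masked, psrlq(a & MASK32, 32) is 0, so both cross terms vanish and muludq_lane(a, b) equals pmuludq_lane(a, b).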
Updated Tue Feb. 09 2016, 12:09 by cameron.