Hoisting
Intrinsic Hoisting - Introductory Examples
Notes and Resources
- This is a good introduction to the hoisting issue and some problems.
- I think that pmuludq should essentially be hoisted to the following purely vertical operation on <2 x i64>:
    define <2 x i64> @muludq(<2 x i64> %a, <2 x i64> %b) {
    entry:
      %0 = and <2 x i64> %a, <i64 4294967295, i64 4294967295>
      %1 = and <2 x i64> %b, <i64 4294967295, i64 4294967295>
      %2 = mul <2 x i64> %0, %1
      ret <2 x i64> %2
    }
llc then translates this into the familiar sequence containing three pmuludq instructions.
    LCPI0_0:
            .quad   4294967295              ## 0xffffffff
            .quad   4294967295              ## 0xffffffff
            .section        __TEXT,__text,regular,pure_instructions
            .globl  _muludq5
            .align  4, 0x90
    _muludq5:                               ## @muludq5
            .cfi_startproc
    ## BB#0:                                ## %entry
            movdqa  LCPI0_0(%rip), %xmm2    ## xmm2 = [4294967295,4294967295]
            pand    %xmm2, %xmm0
            pand    %xmm2, %xmm1
            movdqa  %xmm0, %xmm2
            pmuludq %xmm1, %xmm2
            movdqa  %xmm1, %xmm3
            psrlq   $32, %xmm3
            pmuludq %xmm0, %xmm3
            psllq   $32, %xmm3
            paddq   %xmm3, %xmm2
            psrlq   $32, %xmm0
            pmuludq %xmm1, %xmm0
            psllq   $32, %xmm0
            paddq   %xmm2, %xmm0
            retq
The three pmuludq instructions are not a scalarization; they are the generic way of using pmuludq to implement <2 x i64> multiplication. The problem here is that the LLVM backend cannot figure out that, after the pand masking, each psrlq $32 produces an all-zero value. Both cross-term products are therefore zero, and the first pmuludq alone already computes the full result.
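The arithmetic behind this can be checked with a scalar model of a single 64-bit lane. The following is only a sketch in C, not LLVM's actual lowering code, and every function name in it is hypothetical. It models pmuludq as a 32 x 32 -> 64 multiply of the low halves, rebuilds the three-multiply expansion seen in the llc output, and shows that the expansion collapses to one pmuludq once both inputs are masked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit lane of pmuludq: multiply the low
   32 bits of each operand, giving a full 64-bit product. */
static uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
    return (a & 0xffffffffu) * (b & 0xffffffffu);
}

/* Generic 64-bit lane multiply built from three pmuludq steps, in the
   same order as the llc output: lo*lo, then the two cross terms, each
   shifted left by 32 and added in. The result wraps modulo 2^64. */
static uint64_t mul64_generic_lane(uint64_t a, uint64_t b) {
    uint64_t lo_lo = pmuludq_lane(a, b);
    uint64_t a_bhi = pmuludq_lane(a, b >> 32) << 32; /* psrlq on b, pmuludq, psllq */
    uint64_t ahi_b = pmuludq_lane(a >> 32, b) << 32; /* psrlq on a, pmuludq, psllq */
    return lo_lo + a_bhi + ahi_b;
}

/* The hoisted form from the IR: mask both inputs, then multiply. */
static uint64_t muludq_hoisted_lane(uint64_t a, uint64_t b) {
    return mul64_generic_lane(a & 0xffffffffu, b & 0xffffffffu);
}

/* Self-check for one pair of lane values. */
static void check_lane(uint64_t a, uint64_t b) {
    /* The three-multiply expansion is an ordinary wrapping 64-bit multiply. */
    assert(mul64_generic_lane(a, b) == a * b);
    /* With masked inputs, both shifted operands are zero, so the two
       cross terms vanish and a single pmuludq gives the same value. */
    assert(((a & 0xffffffffu) >> 32) == 0);
    assert(((b & 0xffffffffu) >> 32) == 0);
    assert(muludq_hoisted_lane(a, b) == pmuludq_lane(a, b));
}
```

Calling check_lane with arbitrary values, for example check_lane(0x123456789abcdef0ULL, 0xfedcba9876543210ULL), should pass all of its assertions. This is the identity the backend would have to recognize in order to emit a single pmuludq.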
Updated Tue Feb. 09 2016, 12:09 by cameron.