Hoisting

Intrinsic Hoisting - Introductory Examples

Notes and Resources

The following example is a good introduction to the hoisting issue and some of the problems it raises.
I think that pmuludq should essentially be hoisted to the following pure vertical operation on <2 x i64>.

define <2 x i64> @muludq(<2 x i64> %a, <2 x i64> %b)  {
entry:
  %0 = and <2 x i64> %a, <i64 4294967295, i64 4294967295>
  %1 = and <2 x i64> %b, <i64 4294967295, i64 4294967295>
  %2 = mul <2 x i64> %0, %1
  ret <2 x i64> %2
}

llc then translates this to the familiar sequence containing three pmuludq instructions.

LCPI0_0:
        .quad   4294967295              ## 0xffffffff
        .quad   4294967295              ## 0xffffffff
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _muludq5
        .align  4, 0x90
_muludq5:                               ## @muludq5
        .cfi_startproc
## BB#0:                                ## %entry
        movdqa  LCPI0_0(%rip), %xmm2    ## xmm2 = [4294967295,4294967295]
        pand    %xmm2, %xmm0
        pand    %xmm2, %xmm1
        movdqa  %xmm0, %xmm2
        pmuludq %xmm1, %xmm2
        movdqa  %xmm1, %xmm3
        psrlq   $32, %xmm3
        pmuludq %xmm0, %xmm3
        psllq   $32, %xmm3
        paddq   %xmm3, %xmm2
        psrlq   $32, %xmm0
        pmuludq %xmm1, %xmm0
        psllq   $32, %xmm0
        paddq   %xmm2, %xmm0
        retq

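The three pmuludq instructions are not a scalarization; they are the generic way of using pmuludq to build a full <2 x i64> multiplication out of 32-bit partial products. The problem here is that the LLVM backend cannot figure out that both psrlq $32 operations produce an all-zeroes value: the preceding pand instructions have already cleared the high 32 bits of every lane of %xmm0 and %xmm1. The two cross-product pmuludq results are therefore always zero, and a single pmuludq would suffice.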
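The arithmetic behind this can be checked with a scalar model of a single 64-bit lane. The following is only a sketch in C, not LLVM's actual lowering code; the helper names (pmuludq_lane, mul64_generic, muludq_generic, muludq_single, check_lane) are hypothetical and chosen for illustration. It assumes that one lane of pmuludq multiplies the low 32 bits of its two operands into a 64-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 64-bit lane of pmuludq (assumed semantics):
   multiply the low 32 bits of each operand, giving a 64-bit product. */
static uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
    return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
}

/* Generic 64-bit lane multiply built from three 32x32 partial products,
   mirroring the three-pmuludq sequence in the listing above:
   a*b mod 2^64 = lo(a)*lo(b) + ((lo(a)*hi(b) + hi(a)*lo(b)) << 32). */
static uint64_t mul64_generic(uint64_t a, uint64_t b) {
    uint64_t lo_lo = pmuludq_lane(a, b);             /* pmuludq %xmm1, %xmm2 */
    uint64_t lo_hi = pmuludq_lane(a, b >> 32) << 32; /* psrlq, pmuludq, psllq */
    uint64_t hi_lo = pmuludq_lane(a >> 32, b) << 32; /* psrlq, pmuludq, psllq */
    return lo_lo + lo_hi + hi_lo;
}

/* What llc currently emits for the hoisted IR: mask both operands,
   then run the generic three-multiply sequence. */
static uint64_t muludq_generic(uint64_t a, uint64_t b) {
    a &= 0xffffffffULL; /* pand */
    b &= 0xffffffffULL; /* pand */
    return mul64_generic(a, b);
}

/* What we would like to get: a single pmuludq per lane. */
static uint64_t muludq_single(uint64_t a, uint64_t b) {
    return pmuludq_lane(a, b);
}

/* After masking, both shifted operands are zero, so the two
   cross products vanish and the two forms must agree. */
static void check_lane(uint64_t a, uint64_t b) {
    assert(((a & 0xffffffffULL) >> 32) == 0);
    assert(((b & 0xffffffffULL) >> 32) == 0);
    assert(muludq_generic(a, b) == muludq_single(a, b));
}
```

Calling check_lane on any pair of 64-bit values, for example check_lane(0x123456789abcdef0ULL, 0xfedcba9876543210ULL), exercises the claim that the masked three-multiply form and the single-multiply form agree. In the model the two extra partial products contribute nothing once the operands are masked, which is exactly the fact the backend fails to exploit.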

Updated Tue Feb. 09 2016, 12:09 by cameron.
