LLVM Back-end

Parabix/LLVM Back-end issues for AVX-512 - Introduction

Notes and Resources

Long-stream addition is documented in the paper: Robert D. Cameron, Thomas C. Shermer, Arrvindh Shriraman, Kenneth S. Herdy, Dan Lin, Benjamin R. Hull, Meng Lin. “Bitwise data parallelism in regular expression matching,” PACT 2014, Edmonton, Canada, August 25-27, 2014.

The icgrep code that performs long-stream addition on 256-bit (AVX2) blocks begins at line 160 of the CarryManager::addCarryInCarryOut function. Note that it uses a slight simplification of the MatchStar logic, relying on the properties of carry generation.

AVX-512 has some facilities that appear to support long-stream addition with even fewer instructions than AVX2.

  • AVX-512 has a set of 8 opmask/writemask registers to control SIMD operations on 32-bit or 64-bit fields.
  • The results of full SIMD compare operations (e.g., icmp ne <8 x i64>) are written directly to opmask registers and do not require a separate "signmask" operation.
  • Bitwise logic and addition can be applied on opmask registers.
  • The step of broadcasting "increment" bits to be added to each 64-bit field can be simplified to a single VPADDQ that uses the increment mask as a merging (blending) writemask, together with an embedded broadcast operand of 1.
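
The steps above can be sketched with a portable scalar model. This is an illustration only, not the icgrep code: it uses the unsimplified MatchStar formula from the paper, models the opmask registers as small Python integers, and the names long_add, match_star, split_fields and join_fields are made up for this sketch. Comments note which AVX-512 facility each step is assumed to correspond to.

```python
FIELD_BITS = 64
FIELDS = 8                      # 8 x 64 bits = one 512-bit block
FIELD_MASK = (1 << FIELD_BITS) - 1


def split_fields(value):
    """Split a 512-bit integer into 8 little-endian 64-bit fields."""
    return [(value >> (FIELD_BITS * i)) & FIELD_MASK for i in range(FIELDS)]


def join_fields(fields):
    """Inverse of split_fields."""
    return sum(f << (FIELD_BITS * i) for i, f in enumerate(fields))


def match_star(m, c):
    """MatchStar(M, C) = (((M & C) + C) ^ C) | M, as given in the paper."""
    return (((m & c) + c) ^ c) | m


def long_add(x, y, carry_in=0):
    """Add two 512-bit values field by field; return (sum mod 2**512, carry_out)."""
    xs, ys = split_fields(x), split_fields(y)

    # Step 1: independent 64-bit sums (one SIMD add over 8 fields).
    sums = [(a + b) & FIELD_MASK for a, b in zip(xs, ys)]

    # Step 2: one mask bit per field. On AVX-512 a SIMD compare would write
    # these straight into opmask registers, with no separate signmask step.
    carry_mask = 0    # field i overflowed
    bubble_mask = 0   # field i is all ones, so an incoming carry passes through
    for i, (a, s) in enumerate(zip(xs, sums)):
        if s < a:                      # unsigned wrap-around means carry out
            carry_mask |= 1 << i
        if s == FIELD_MASK:
            bubble_mask |= 1 << i

    # Step 3: mask arithmetic. Shift carries to the fields that receive them,
    # then let MatchStar propagate them through runs of bubble fields.
    increments = match_star((carry_mask << 1) | carry_in, bubble_mask)

    # Step 4: add 1 to every selected field. This models a single masked add
    # of a broadcast constant 1 with the increment mask as writemask.
    result = [(s + ((increments >> i) & 1)) & FIELD_MASK
              for i, s in enumerate(sums)]

    carry_out = (increments >> FIELDS) & 1
    return join_fields(result), carry_out
```

For example, long_add((1 << 64) - 1, 1) carries out of field 0 into field 1 and returns (1 << 64, 0), while adding a carry-in of 1 to a block of all ones ripples through every bubble field and returns (0, 1).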

The AVX-512 manual can be found on-line by searching for the Intel® Architecture Instruction Set Extensions Programming Reference.

Updated Tue Feb. 16 2016, 16:03 by cameron.