LLVM Back-End Improvements for Parabix
IDISA Overrides
The IDISA builders of the Parabix technology include many processor-specific overrides of operations that could otherwise be specified as LLVM IR. These overrides improve Parabix performance.
An example is esimd_mergeh in the IDISA AVX builder (class IDISA_AVX2_Builder):
    Value * IDISA_AVX2_Builder::esimd_mergeh(unsigned fw, Value * a, Value * b) {
    #if LLVM_VERSION_INTEGER < LLVM_VERSION_CODE(6, 0, 0)
        if ((fw == 128) && (mBitBlockWidth == 256)) {
            Value * vperm2i128func = Intrinsic::getDeclaration(getModule(), Intrinsic::x86_avx2_vperm2i128);
            return CreateCall(vperm2i128func, {fwCast(64, a), fwCast(64, b), getInt8(0x31)});
        }
    #endif
        // Otherwise use default SSE logic.
        return IDISA_SSE_Builder::esimd_mergeh(fw, a, b);
    }
The IDISA SSE builder has no special logic for esimd_mergeh, so it defaults to the generic IDISA builder logic instead:
    Value * IDISA_Builder::esimd_mergeh(unsigned fw, Value * a, Value * b) {
        if (fw < 8) report_fatal_error("Unsupported field width: mergeh " + std::to_string(fw));
        const auto field_count = mBitBlockWidth / fw;
        Constant * Idxs[field_count];
        for (unsigned i = 0; i < field_count / 2; i++) {
            Idxs[2 * i] = getInt32(i + field_count / 2); // selects elements from first reg.
            Idxs[2 * i + 1] = getInt32(i + field_count / 2 + field_count); // selects elements from second reg.
        }
        return CreateShuffleVector(fwCast(fw, a), fwCast(fw, b), ConstantVector::get({Idxs, field_count}));
    }
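To make the index arithmetic in the generic builder easier to follow, here is a minimal standalone sketch that reproduces only the mask computation, with no LLVM dependency. The function name mergeh_indices is hypothetical and is not part of Parabix; it mirrors the loop above using plain integers in place of LLVM constants.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Parabix): returns the shufflevector mask
// that the generic esimd_mergeh loop builds, as plain integers.
// Indices 0 .. field_count-1 name fields of a;
// indices field_count .. 2*field_count-1 name fields of b.
std::vector<unsigned> mergeh_indices(unsigned blockWidth, unsigned fw) {
    assert(fw >= 8 && blockWidth % fw == 0);
    const unsigned field_count = blockWidth / fw;
    std::vector<unsigned> idxs(field_count);
    for (unsigned i = 0; i < field_count / 2; i++) {
        idxs[2 * i] = i + field_count / 2;                    // field from the high half of a
        idxs[2 * i + 1] = i + field_count / 2 + field_count;  // field from the high half of b
    }
    return idxs;
}
```

For a 256-bit block with fw = 128 this yields the mask <1, 3>, and for a 128-bit block with fw = 32 it yields <2, 6, 3, 7>: the high-half fields of a and b, interleaved.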
This creates a single shufflevector for the esimd_mergeh operation, but LLVM 3.8 does not recognize that this operation can be implemented by the vperm2i128 instruction for the case of <2 x i128> vectors. LLVM 6.0.0 has removed support for the Intrinsic::x86_avx2_vperm2i128 intrinsic; it may be that LLVM can now correctly recognize the shufflevector pattern. (icgrep/Parabix compiles, but does not correctly create an execution engine for LLVM 6.0.0 at present.)
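The correspondence between the shuffle mask and the 0x31 immediate in the AVX override can be checked by hand. For fw = 128 and a 256-bit block, the generic loop produces the mask <1, 3>. In the vperm2i128 encoding, bits 1:0 of the immediate pick the source of the low 128-bit lane and bits 5:4 pick the source of the high lane, with selector values 0 and 1 naming the low and high lanes of the first operand and 2 and 3 naming those of the second. The sketch below encodes a two-element mask as such an immediate. The helper name vperm2i128_imm is hypothetical, and the sketch ignores the zeroing bits (3 and 7) of the immediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Parabix or LLVM): encodes a two-element
// shuffle mask over two <2 x i128> operands as a vperm2i128 immediate.
// lowIdx selects the lane written to bits 127:0 of the result,
// highIdx selects the lane written to bits 255:128.
// The zeroing bits (3 and 7) are left clear.
std::uint8_t vperm2i128_imm(unsigned lowIdx, unsigned highIdx) {
    assert(lowIdx < 4 && highIdx < 4);
    return static_cast<std::uint8_t>(lowIdx | (highIdx << 4));
}
```

With the mergeh mask, vperm2i128_imm(1, 3) evaluates to 0x31, the constant passed in the override. This is the pattern a back-end would need to match in order to select vperm2i128 directly from the shufflevector.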
Override Elimination
The goal of this project is to identify the overrides used in Parabix technology, determine whether recent versions of LLVM have improved support for the underlying "pure" IR solutions (ones not using processor-specific intrinsics), and identify the cases that still need to be addressed. For each remaining case, the project then considers the modifications to LLVM back-ends needed to directly recognize the IR patterns corresponding to the default IDISA library code and to generate optimized code.