LLVM Back-End Improvements for Parabix

IDISA Overrides

The IDISA builders of the Parabix technology include many processor-specific overrides of operations that could otherwise be specified as LLVM IR. These overrides improve Parabix performance.

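An example is esimd_mergeh in the IDISA AVX2 builder, which for LLVM versions before 6.0.0 maps a merge of 128-bit fields in 256-bit blocks directly to the vperm2i128 intrinsic.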

Value * IDISA_AVX2_Builder::esimd_mergeh(unsigned fw, Value * a, Value * b) {
#if LLVM_VERSION_INTEGER < LLVM_VERSION_CODE(6, 0, 0)
    if ((fw == 128) && (mBitBlockWidth == 256)) {
        Value * vperm2i128func = Intrinsic::getDeclaration(getModule(), Intrinsic::x86_avx2_vperm2i128);
        return CreateCall(vperm2i128func, {fwCast(64, a), fwCast(64, b), getInt8(0x31)});
    }
#endif
    // Otherwise use default SSE logic.
    return IDISA_SSE_Builder::esimd_mergeh(fw, a, b);
}
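
To make the 0x31 immediate in the override above concrete, here is a minimal sketch of how vperm2i128 selects 128-bit lanes. It is a plain C++ model, not Parabix or LLVM code: the Lanes type and the names select_lane, model_vperm2i128 and check_mergeh_immediate are hypothetical, and each 128-bit lane is stood in for by a single 64-bit tag so the selection is easy to follow.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model type: a 256-bit register as two 128-bit lanes,
// each lane represented by one 64-bit tag. [0] = low lane, [1] = high lane.
using Lanes = std::array<std::uint64_t, 2>;

// Pick one source lane from a 4-bit control field of the immediate.
// Bits 1:0 choose a.low, a.high, b.low or b.high; bit 3 zeroes the lane.
inline std::uint64_t select_lane(const Lanes & a, const Lanes & b, unsigned ctrl) {
    if (ctrl & 0x8) return 0;
    switch (ctrl & 0x3) {
        case 0: return a[0];
        case 1: return a[1];
        case 2: return b[0];
        default: return b[1];
    }
}

// The low nibble of imm controls the low result lane, the high nibble the high one.
inline Lanes model_vperm2i128(const Lanes & a, const Lanes & b, std::uint8_t imm) {
    return { select_lane(a, b, imm & 0xF), select_lane(a, b, imm >> 4) };
}

// Usage: with imm = 0x31 the result is {a.high, b.high}, i.e. mergeh for fw = 128.
inline void check_mergeh_immediate() {
    const Lanes a = {10, 11};
    const Lanes b = {20, 21};
    assert(model_vperm2i128(a, b, 0x31) == (Lanes{11, 21}));
}
```

Under this model, the low nibble 1 picks the high lane of a and the high nibble 3 picks the high lane of b, which is the merge of the two high 128-bit fields.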

The IDISA SSE builder has no special logic for esimd_mergeh, so the call falls through to the generic IDISA builder logic instead.

Value * IDISA_Builder::esimd_mergeh(unsigned fw, Value * a, Value * b) {
    if (fw < 8) report_fatal_error("Unsupported field width: mergeh " + std::to_string(fw));
    const auto field_count = mBitBlockWidth / fw;
    Constant * Idxs[field_count];
    for (unsigned i = 0; i < field_count / 2; i++) {
        Idxs[2 * i] = getInt32(i + field_count / 2); // selects elements from first reg.
        Idxs[2 * i + 1] = getInt32(i + field_count / 2 + field_count); // selects elements from second reg.
    }
    return CreateShuffleVector(fwCast(fw, a), fwCast(fw, b), ConstantVector::get({Idxs, field_count}));
}

This creates a single shufflevector for the esimd_mergeh operation, but LLVM 3.8 does not recognize that this operation can be implemented by the vperm2i128 instruction in the case of <2 x i128> vectors.
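
The shuffle mask built by the generic esimd_mergeh can be reproduced outside LLVM to see which pattern the back-end has to recognize. The following is a minimal sketch that copies only the index arithmetic from the loop above; the function name mergeh_indices and its block_width parameter (standing in for mBitBlockWidth) are assumptions made for illustration, and plain unsigned values replace the LLVM constants.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the shufflevector mask that the generic
// esimd_mergeh would build for a given block width and field width.
// Indices below field_count select from the first operand, the rest from the second.
inline std::vector<unsigned> mergeh_indices(unsigned block_width, unsigned fw) {
    const unsigned field_count = block_width / fw;
    std::vector<unsigned> idxs(field_count);
    for (unsigned i = 0; i < field_count / 2; i++) {
        idxs[2 * i] = i + field_count / 2;                    // high half of first operand
        idxs[2 * i + 1] = i + field_count / 2 + field_count;  // high half of second operand
    }
    return idxs;
}

// Usage: 256-bit blocks with 128-bit fields give the mask <1, 3>.
inline void check_mergeh_mask() {
    assert(mergeh_indices(256, 128) == (std::vector<unsigned>{1, 3}));
}
```

For fw = 128 and a 256-bit block the mask is <1, 3>, that is, the high 128-bit field of each operand. This is the same selection that the override requests from vperm2i128 with the immediate 0x31.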

LLVM 6.0.0 has removed support for the Intrinsic::x86_avx2_vperm2i128 intrinsic; it may be that LLVM can now correctly recognize the shufflevector pattern. (At present, icgrep/Parabix compiles against LLVM 6.0.0 but does not correctly create an execution engine.)

Override Elimination

The goal of this project is to identify the overrides used in Parabix technology, to determine whether recent versions of LLVM have improved support for the underlying "pure" IR solutions (ones not using processor-specific intrinsics), and to identify the cases that still need to be addressed. For each remaining case, the project should then consider the modifications to LLVM back-ends needed to directly recognize the IR patterns corresponding to the default IDISA library code and to generate optimized code for them.

Updated Mon Jan. 29 2018, 07:48 by cameron.