
Compilers of the Parabix Framework

The Parabix Framework contains several compilers that are used to generate code for various applications. The most important of these is the Pablo language compiler, which is useful for all kinds of applications involving bitwise data parallelism.

In addition to the Pablo compiler, there are several compilers that generate Pablo code. These include:

  • The Character Code compiler.
  • The Unicode Transformation Format (UTF) Compiler.
  • The Unicode Property Compiler.
  • The Bixnum Compiler.
  • The Regular Expression Compiler.

The Pablo Language and Compiler

Pablo is a language for expressing transformations on stream sets. In general, a Pablo kernel takes one or more stream sets as input and produces stream sets (or possibly scalar values) as output. Pablo kernels are purely functional. That is, the only effect of a Pablo kernel is the output it produces in response to its inputs. There are no side effects that are visible to the pipeline or other kernels in any way.

Here is a simple Pablo program that counts the number of lines in a Unix text file, given the basis bits representation of the file content.

type BasisBits = <i1>[8]

kernel LineCount :: [BasisBits bb] -> [i64 lc] {
    or = bb[0] | bb[2] | bb[4] | bb[5] | bb[6] | bb[7]
    and = bb[1] & bb[3]
    lf = ~or & and

    lc = Count(lf)
}

In Unix, lines are terminated by the LF character (hex 0A). To recognize line feed characters, both bit 1 and bit 3 at a character position must be 1 bits, and the other bits must all be 0 bits, where bit 0 is the least significant bit of each byte. This is achieved by the bitwise logic in the program computing the variable lf. The Pablo Count operation produces the population count of a given bit stream. By counting the number of LF characters, we determine the number of lines.
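The same logic can be sketched in plain C++ on a single block of up to 64 bytes. This is only an illustration of the bitwise reasoning, not code produced by the Pablo compiler: the names BasisBits, transposeBlock, lineFeedStream and countLines are hypothetical, and each bit stream is modelled as one 64-bit word whose bit i corresponds to byte position i.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Eight basis bit streams for one block of up to 64 bytes.
// bb[k] collects bit k of every byte (k = 0 is the least significant bit);
// bit i of bb[k] belongs to byte position i of the block.
using BasisBits = std::array<std::uint64_t, 8>;

// Transpose the bytes of a block into the eight basis bit streams.
BasisBits transposeBlock(const std::string & block) {
    assert(block.size() <= 64);
    BasisBits bb{};
    for (std::size_t i = 0; i < block.size(); ++i) {
        const auto byte = static_cast<unsigned char>(block[i]);
        for (unsigned k = 0; k < 8; ++k) {
            bb[k] |= static_cast<std::uint64_t>((byte >> k) & 1u) << i;
        }
    }
    return bb;
}

// Same logic as the LineCount kernel: bits 1 and 3 set, all others clear.
std::uint64_t lineFeedStream(const BasisBits & bb) {
    const std::uint64_t anyOther = bb[0] | bb[2] | bb[4] | bb[5] | bb[6] | bb[7];
    const std::uint64_t both = bb[1] & bb[3];
    return ~anyOther & both;
}

// Population count of the LF stream gives the number of lines in the block.
std::size_t countLines(const std::string & block) {
    return std::bitset<64>(lineFeedStream(transposeBlock(block))).count();
}
```

For example, countLines("a\nb\n") evaluates to 2, because the LF stream has 1 bits at positions 1 and 3. Positions past the end of the block have all basis bits equal to 0, so they never satisfy the bb[1] & bb[3] condition.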

Most Pablo programs are built dynamically using a Pablo builder. An instance of the PabloKernel class declares a new function that can create a Pablo kernel. The constructor for a Pablo kernel declares the inputs and outputs of the kernel and also gives it a unique name. The name must be unique so that the generated code can be saved and reused without recompiling if it is needed again. The logic of the Pablo kernel is created by its generatePabloMethod(), which also dynamically calls the Pablo compiler to generate LLVM code.
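Why the name must be unique can be seen in a toy model of a name-keyed code cache. This is a sketch under stated assumptions, not the Parabix caching mechanism: KernelCache, CompiledCode and shiftKernelName are hypothetical names, and the "compiled code" is just a string.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy stand-in for compiled kernel code (hypothetical): a text description.
using CompiledCode = std::string;

// Hypothetical cache keyed by kernel name.
class KernelCache {
public:
    // Returns the cached code for `name`; calls `compile` only on first request.
    const CompiledCode & getOrCompile(const std::string & name,
                                      const std::function<CompiledCode()> & compile) {
        auto it = cache_.find(name);
        if (it == cache_.end()) {
            ++compilations_;
            it = cache_.emplace(name, compile()).first;
        }
        return it->second;
    }
    unsigned compilations() const { return compilations_; }
private:
    std::map<std::string, CompiledCode> cache_;
    unsigned compilations_ = 0;
};

// Hypothetical naming helper: every parameter that changes the generated
// logic is encoded in the name, so distinct logic never shares a cache key.
std::string shiftKernelName(unsigned amount) {
    return "Shift" + std::to_string(amount);
}
```

In this model, two kernels with different logic but the same name would collide: the second request would silently receive the code compiled for the first. Encoding the distinguishing parameters in the name avoids that.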

A simple example of a Pablo kernel is Invert, which takes a single bit stream as input and produces the logical inverse of that stream. This can be generated using the Pablo builder from within the generatePabloMethod.

class Invert : public PabloKernel {
public:
    Invert(KernelBuilder & b, StreamSet * mask, StreamSet * inverted)
        : PabloKernel(b, "Invert",
                      {Binding{"mask", mask}},
                      {Binding{"inverted", inverted}}) {}
protected:
    void generatePabloMethod() override;
};

void Invert::generatePabloMethod() {
    pablo::PabloBuilder pb(getEntryScope());
    PabloAST * mask = getInputStreamSet("mask")[0];
    PabloAST * inverted = pb.createInFile(pb.createNot(mask));
    Var * outVar = getOutputStreamVar("inverted");
    pb.createAssign(pb.createExtract(outVar, pb.getInteger(0)), inverted);
}
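The effect of this kernel on a single block can be sketched in plain C++. The sketch assumes that createInFile restricts its argument to positions that lie inside the input, so that inverted bits past the end of the data stay 0. The helper names inFileMask and invertInFile are hypothetical and are not part of the Parabix API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a word with 1 bits at the first `validBits` positions.
std::uint64_t inFileMask(unsigned validBits) {
    assert(validBits <= 64);
    return validBits == 64 ? ~std::uint64_t{0}
                           : (std::uint64_t{1} << validBits) - 1;
}

// Invert one 64-position block, keeping only positions inside the input
// (assumed to correspond to createInFile(createNot(mask))).
std::uint64_t invertInFile(std::uint64_t mask, unsigned validBits) {
    return ~mask & inFileMask(validBits);
}
```

Without the in-file restriction, inverting a block whose data ends at position 4 would set all 60 remaining positions to 1, and a later Count or scan over the stream would see spurious marks.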

The actual compilation of a Pablo kernel takes place when a call to the kernel is added to a Parabix pipeline.

    StreamSet * inverted = P->CreateStreamSet(1);
    P->CreateKernelCall<Invert>(mask, inverted);

Character Code Compilers

See the Character Code Compilers wiki page.

The Unicode Property Compiler

See the Unicode Property Database and Compilers wiki page.

The Unicode Transformation Format (UTF) Compiler

This compiler is used to compile full Unicode character classes represented in standard Unicode formats such as UTF-8 or UTF-16.
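As a rough illustration of what compiling a Unicode class for UTF-8 involves, the class [U+00E0-U+00FF] corresponds to the two-byte sequences C3 A0 through C3 BF. The sketch below marks the final byte of each such sequence in a block of up to 64 bytes by combining two byte-class streams with a one-position shift. It is a hand-written example, not output of the UTF compiler; byteRangeStream and latin1HighStream are hypothetical names, and marking the last byte of the sequence is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bit i of the result is 1 when byte i of the block lies in [lo, hi].
std::uint64_t byteRangeStream(const std::string & block,
                              unsigned char lo, unsigned char hi) {
    assert(block.size() <= 64);
    std::uint64_t stream = 0;
    for (std::size_t i = 0; i < block.size(); ++i) {
        const auto byte = static_cast<unsigned char>(block[i]);
        if (byte >= lo && byte <= hi) {
            stream |= std::uint64_t{1} << i;
        }
    }
    return stream;
}

// Marks the second byte of every UTF-8 sequence encoding U+00E0..U+00FF.
std::uint64_t latin1HighStream(const std::string & block) {
    const std::uint64_t prefix = byteRangeStream(block, 0xC3, 0xC3);
    const std::uint64_t suffix = byteRangeStream(block, 0xA0, 0xBF);
    // Advance the prefix marks by one position, then require a matching suffix.
    return (prefix << 1) & suffix;
}
```

For the input "a\xC3\xA9z" (the letter é, U+00E9, between two ASCII letters), the result has a single 1 bit at position 2. The bytes C2 A9 (U+00A9) produce no mark, because the prefix byte is not C3.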

The Bixnum Compiler

This is a compiler for manipulating representations of integers in parallel bit stream format.

The nfd.cpp program illustrates the use of bixnums in the HangulNFD kernel.
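The idea of an integer in parallel bit stream format can be sketched as follows: stream k holds bit k of the integer at every position, so one bitwise operation on a stream acts on all positions at once. The code below is a minimal model on one 64-position block, not the Parabix bixnum API; Bixnum, addBixnums and valueAt are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// A bixnum over one 64-position block: streams[k] holds bit k of the
// integer at each position (bit i of streams[k] belongs to position i).
using Bixnum = std::vector<std::uint64_t>;

// Ripple-carry addition across the bit streams. All 64 positions are added
// simultaneously; the result has one extra stream for the final carry.
Bixnum addBixnums(const Bixnum & a, const Bixnum & b) {
    const std::size_t n = std::max(a.size(), b.size());
    Bixnum sum(n + 1, 0);
    std::uint64_t carry = 0;
    for (std::size_t k = 0; k < n; ++k) {
        const std::uint64_t x = k < a.size() ? a[k] : 0;
        const std::uint64_t y = k < b.size() ? b[k] : 0;
        sum[k] = x ^ y ^ carry;
        carry = (x & y) | (carry & (x ^ y));
    }
    sum[n] = carry;
    return sum;
}

// Reads back the integer stored at one position of a bixnum.
std::uint64_t valueAt(const Bixnum & n, unsigned pos) {
    std::uint64_t value = 0;
    for (std::size_t k = 0; k < n.size(); ++k) {
        value |= ((n[k] >> pos) & 1u) << k;
    }
    return value;
}
```

With a = {0b011, 0b001} (values 3, 1, 0 at positions 0, 1, 2) and b = {0b001, 0b010} (values 1, 2, 0), addBixnums(a, b) holds 4, 3 and 0 at those positions. The loop runs once per bit of the integers, not once per position, which is where the data parallelism comes from.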

The Regular Expression Compiler

The regular expression compiler is the heart of the icgrep program.

Updated Tue Dec. 17 2024, 05:25 by cameron.