Compilers of the Parabix Framework
The Parabix Framework contains several compilers that are used to generate code for various applications. The most important of these is the Pablo language compiler, which is useful for all kinds of applications involving bitwise data parallelism.
In addition to the Pablo compiler, there are several compilers that generate Pablo code. These include:
- The Character Code compiler.
- The Unicode Transformation Format (UTF) Compiler.
- The Unicode Property Compiler.
- The Bixnum Compiler.
- The Regular Expression Compiler.
The Pablo Language and Compiler
Pablo is a language for expressing transformations on stream sets. In general, a Pablo kernel takes one or more stream sets as input and produces stream sets (or possibly scalar values) as output. Pablo kernels are purely functional. That is, the only effect of a Pablo kernel is the output it produces in response to its inputs. There are no side effects that are visible to the pipeline or other kernels in any way.
Here is a simple Pablo program that counts the lines in a Unix text file, given the basis bits representation of the file content.
    type BasisBits = <i1>[8]

    kernel LineCount :: [BasisBits bb] -> [i64 lc] {
        or = bb[0] | bb[2] | bb[4] | bb[5] | bb[6] | bb[7]
        and = bb[1] & bb[3]
        lf = ~or & and
        lc = Count(lf)
    }
In Unix, lines are terminated by the LF character, hex 0A. To recognize line feed characters, both bit 1 and bit 3 at a character position must be 1 bits (counting the least significant bit as bit 0), and the other six bits must all be 0 bits. This is achieved by the bitwise logic in the program that computes the variable lf.
The Pablo Count operation produces the population count of a given bit stream. By counting the number of LF characters, we determine the number of lines.
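The bitwise logic above can be tried outside of Parabix. The following is a minimal standalone C++ sketch, not the code the Pablo compiler generates: it assumes 64-bit blocks, and the helper names transposeBlock, popcount64 and countLineFeeds are hypothetical. It builds the eight basis bit streams for each block of up to 64 characters and then applies the same or/and/lf logic as the LineCount kernel.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Eight basis bit streams for one block of up to 64 characters.
// Bit p of basis[k] is bit k (0 = least significant) of the character at position p.
using BasisBlock = std::array<std::uint64_t, 8>;

// Hypothetical helper: transpose up to 64 bytes into basis bit form.
BasisBlock transposeBlock(const std::string & text, std::size_t offset) {
    BasisBlock basis{};
    for (std::size_t p = 0; p < 64 && offset + p < text.size(); ++p) {
        const auto byte = static_cast<unsigned char>(text[offset + p]);
        for (unsigned k = 0; k < 8; ++k) {
            basis[k] |= static_cast<std::uint64_t>((byte >> k) & 1u) << p;
        }
    }
    return basis;
}

// Portable population count (clears the lowest set bit on each iteration).
unsigned popcount64(std::uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// Count LF (hex 0A) characters using only bitwise logic on the basis bits.
std::uint64_t countLineFeeds(const std::string & text) {
    std::uint64_t lc = 0;
    for (std::size_t offset = 0; offset < text.size(); offset += 64) {
        const BasisBlock bb = transposeBlock(text, offset);
        // Bits that must all be 0 for LF.
        const std::uint64_t anyOther = bb[0] | bb[2] | bb[4] | bb[5] | bb[6] | bb[7];
        // Bits that must both be 1 for LF.
        const std::uint64_t both = bb[1] & bb[3];
        // Positions past the end of the text have all basis bits 0, so they never match.
        const std::uint64_t lf = ~anyOther & both;
        lc += popcount64(lf);
    }
    return lc;
}
```

For example, countLineFeeds("one\ntwo\n") evaluates to 2. Each pass through the loop classifies up to 64 character positions with a handful of bitwise operations, which is the kind of bitwise data parallelism that Pablo is designed to express.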
Most Pablo programs are built dynamically using a Pablo builder. A new Pablo kernel is defined as a subclass of the PabloKernel class. The constructor for a Pablo kernel declares the inputs and outputs of the kernel and also gives it a unique name. The name must be unique so that the compiled code can be saved and reused without recompiling if it is needed again. The logic of the Pablo kernel is created by its generatePabloMethod(), which also dynamically calls the Pablo compiler to generate LLVM code.
A simple example of a Pablo kernel is Invert, which takes a single bit stream as input and produces the logical inverse of that stream. Its logic is generated using the Pablo builder from within generatePabloMethod().
    class Invert : public PabloKernel {
    public:
        Invert(KernelBuilder & b, StreamSet * mask, StreamSet * inverted)
        : PabloKernel(b, "Invert",
                      {Binding{"mask", mask}},
                      {Binding{"inverted", inverted}}) {}
    protected:
        void generatePabloMethod() override;
    };

    void Invert::generatePabloMethod() {
        pablo::PabloBuilder pb(getEntryScope());
        PabloAST * mask = getInputStreamSet("mask")[0];
        PabloAST * inverted = pb.createInFile(pb.createNot(mask));
        Var * outVar = getOutputStreamVar("inverted");
        pb.createAssign(pb.createExtract(outVar, pb.getInteger(0)), inverted);
    }
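The Invert kernel wraps its negation in createInFile. A plausible reading, assuming createInFile restricts a stream to positions inside the input, is that a plain bitwise NOT would otherwise set bits at positions past the end of the data in the final partial block. The standalone C++ sketch below illustrates that idea with 64-bit blocks. The names inFileMask and invertInFile are hypothetical and are not part of the Parabix API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: 1 bits at every position of block `blockIndex`
// that lies inside a stream of `streamLength` positions.
std::uint64_t inFileMask(std::size_t blockIndex, std::size_t streamLength) {
    const std::size_t start = blockIndex * 64;
    if (streamLength <= start) {
        return 0;  // the whole block lies past the end of the stream
    }
    const std::size_t remaining = streamLength - start;
    if (remaining >= 64) {
        return ~std::uint64_t{0};  // the whole block lies inside the stream
    }
    return (std::uint64_t{1} << remaining) - 1;  // partial final block
}

// Invert a bit stream block by block, keeping only in-stream positions.
std::vector<std::uint64_t> invertInFile(const std::vector<std::uint64_t> & mask,
                                        std::size_t streamLength) {
    std::vector<std::uint64_t> inverted(mask.size());
    for (std::size_t i = 0; i < mask.size(); ++i) {
        inverted[i] = ~mask[i] & inFileMask(i, streamLength);
    }
    return inverted;
}
```

With a 4-position stream whose only block is hex 5, invertInFile yields hex A. Without the mask, the 60 unused high bits of the block would also be set and would be miscounted by any later operation such as Count.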
A Pablo kernel is actually compiled when a call to the kernel is integrated into a Parabix pipeline.
    StreamSet * inverted = P->CreateStreamSet(1);
    P->CreateKernelCall<Invert>(mask, inverted);
Character Code Compilers
See the Character Code Compilers wiki page.
The Unicode Property Compiler
See Unicode Property Database and Compilers.
The Unicode Transformation Format (UTF) Compiler
This compiler is used to compile full Unicode character classes represented in standard Unicode formats such as UTF-8 or UTF-16.
The Bixnum Compiler
This is a compiler for manipulating representations of integers in parallel bit stream format.
The nfd.cpp program illustrates the use of bixnums in the HangulNFD kernel.
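To make the idea of integers in parallel bit stream format concrete, here is a minimal standalone C++ sketch. It is an illustration under stated assumptions, not the Parabix bixnum API: the type BixNumSketch and the functions toBixNum, valueAt and addModular are hypothetical names, and a single 64-bit word stands in for each bit stream. Stream k holds bit k of every integer, so one pass of ripple-carry logic adds up to 64 pairs of integers at once.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Element k holds bit k (0 = least significant) of up to 64 integers;
// bit p of element k belongs to the integer at position p.
using BixNumSketch = std::vector<std::uint64_t>;

// Hypothetical helper: pack up to 64 values into `bits` parallel bit streams (bits <= 64).
BixNumSketch toBixNum(const std::vector<std::uint64_t> & values, std::size_t bits) {
    BixNumSketch out(bits, 0);
    for (std::size_t p = 0; p < values.size() && p < 64; ++p) {
        for (std::size_t k = 0; k < bits; ++k) {
            out[k] |= ((values[p] >> k) & 1u) << p;
        }
    }
    return out;
}

// Hypothetical helper: read back the integer stored at position p.
std::uint64_t valueAt(const BixNumSketch & n, std::size_t p) {
    std::uint64_t value = 0;
    for (std::size_t k = 0; k < n.size(); ++k) {
        value |= ((n[k] >> p) & 1u) << k;
    }
    return value;
}

// Add two equal-width numbers at every position simultaneously,
// modulo 2^width, using a ripple-carry full adder on whole streams.
BixNumSketch addModular(const BixNumSketch & a, const BixNumSketch & b) {
    BixNumSketch sum(a.size());
    std::uint64_t carry = 0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const std::uint64_t x = a[k];
        const std::uint64_t y = b[k];
        sum[k] = x ^ y ^ carry;
        carry = (x & y) | (carry & (x ^ y));
    }
    return sum;
}
```

Adding the 4-bit values {1, 2, 3, 7} and {1, 5, 9, 9} position by position gives {2, 7, 12, 0}, with the last sum wrapping around modulo 16. The cost depends on the bit width of the numbers rather than on how many of them are processed per block.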
The Regular Expression Compiler
The regular expression compiler is the heart of the icgrep program.