Not logged in. Login

Parabix Kernels

Kernels + Stream Sets = Parabix Programs

Parabix programs are composed of kernels operating on stream sets of parallel streams.

A stream is an arbitrary-length vector of iN fields, where N = 1 in the case of bit streams and N = 8 in the case of byte streams. The Parabix framework is generally concerned with streams in which the field widths are powers of 2.

A stream set is a set of streams all of the same field width and same length, designed to be processed together in one-to-one correspondence.

Kernels are computational abstractions that take one or more stream sets as inputs and produce one or more stream sets as outputs.

Source Kernels and Sink Kernels

Source Kernels

Parabix pipeline programs always begin with a kernel that produces an output stream set from an external source. Such a kernel is called a source kernel to reflect its role in the Parabix pipeline: it has no input stream set, but generates an output stream set as the first step in processing.

  • MMapSourceKernel: file input using mmap
  • ReadSourceKernel: file input using read (including stdin)
  • FDSourceKernel: unified kernel using either mmap or read
  • MemorySourceKernel: input from a memory buffer

Sink Kernels

The final kernel in a Parabix pipeline program is always a sink kernel, one that takes a stream set as input, but produce no stream sets as output. Sink kernels direct their output externally.

  • StdOutKernel
  • FileSinkKernel

Fixed-Rate Bitstream Kernels: Pablo

Pablo is designed to automatically handle all buffering and boundary-crossing when kernels are restricted as follows:

  • There is a one-to-one correspondence between input stream sets and output stream sets.
  • The fundamental operations are bitwise logic, stream shift (Advance) and addition (MatchStar, ScanThru).
  • Well-suited to search problems: 1 bit per position marks locations of found items.
    • regular expression matching
    • edit distance
    • parsing
  • Pablo Extensions (work in progress)
  • Handle arbitrary iN streams.
    • s2p in Pablo
    • Direct CC builder

Sequential Bitscan Kernels

After bitstreams have been computed in parallel, sequential processing of identified positions may be required.

If bit stream density is low, sequential bit scanning is very fast.

  • bsf instruction identifies the next one bit in a 64-bit word.
  • Additional BMI (bit manipulation instructions) introduced by Intel in 2013.

ScanMatch Kernel

For example, icgrep has a parallel regular expression matching phase followed by a sequential output phase.

  1. Find matches within lines and mark the ends of matched lines with a bitstream (bitwise data parallel)
  2. Scan through line breaks and matched lines bitstreams to produce output.

Early Termination: The UntilN Kernel

  • Shortens a bit stream to include only those positions up to and including the Nth one bit.

To Do:

  • Modify ScanMatch to print lines surrounding a match (grep context parameters -A, -B, -C)
  • Create a programming model/language for generalized sequential scanning.

Parabix/LLVM Kernel Programming

Processing Rates

When stream sets are not processed in strict one-to-one correspondence, processing rate attributes must be declared.

Example: base64 application.

  • This application produces an ASCII-encoded representation of arbitrary binary files.
  • Four output bytes for every three input bytes.
  • The application is build with three kernels with rate declarations:
    • expand3_4 input: FixedRate(3), output FixedRate(4).
    • radix64 default (FixedRate(1)) for input and output
    • base64 input: FixedRate(1), output RoundUpTo(4)

Example: u8u16 application

  • Input: UTF-8 code units (bytes)
  • Output: UTF-16 code units (doublebytes)
  • BoundedRate(0, 1) used in P2SKernelWithCompressedOutput

Output rate attribute: Add1()

  • Used when final carry-out bit is significant.
  • Example: icgrep

Block-Oriented Kernel Interface

  • Simplest interface for kernel programming at LLVM level.
  • Programmer writes the logic for processing a single block of input.
  • Programmer must explicitly store any state information that must be carried from block to block.

Multiblock Kernel Interface

  • General efficient interface for kernel programming.
  • Always called with either
    • a whole multiple number of blocks of data, or
    • a final partial block.
  • Pipeline makes sure that there is sufficient data on all input buffers and sufficient space on all output buffers according to declared processing rates.
  • Programmer must updated processed and produced item counts for any streams with inexact processing rates.

To Do: Generalized Filtering/Merging Kernels

Deletion/Extraction

  • A deletion mask marks positions that are to be deleted from one or more input streams.
  • Output stream sets are compressed, including only those bits at non-deleted positions.
  • An extraction mask is the logical complement of a deletion mask.
    • Define the popcount(extraction_stream) processing rate for output.
    • Note Intel BMI instructions include PEXT.
  • Parabix contains several implementations involving deletion.
    • Different algorithmic choices.
    • NEEDS unification/simplification

Expansion/Deposit

  • An deposit mask is the used to spread bits of a stream out.
  • Can be used to merge two stream sets:
    • 0 bits select items from stream set 1
    • 1 bits select items from stream set 2
  • To do: use the popcount(deposit_stream) processing rate for input.
  • Note Intel BMI instructions include PDEP.
  • Partially implemented, but needs work.
Updated Fri Feb. 02 2018, 07:26 by cameron.