Not logged in. Login

Symbolic Computing and the Parabix Project

Parabix: Parallel Bit Stream Technology

  • Parabix is a concept for high-performance streaming text processing.
  • It is designed to exploit the wide SIMD registers that exist on all modern processors.
    • SIMD = Single Instruction Multiple Data
    • SSE2: 128-bit registers on all Intel x86-64 processors
    • AVX2: 256-bit registers on Intel Haswell (2013) and later processors
    • AVX-512: 512-bit registers on the latest Intel processors
    • Neon: 128-bit registers on ARM processors
  • Parabix transforms byte-oriented text to 8 parallel bit streams.
    • Allocate one bit position per input byte.
    • Use wide SIMD registers to perform bitwise logic, shifting, addition.
    • Process 128 bytes at a time using 128-bit registers (SSE2).
    • Performance scales up with SIMD register width.
  • Parabix has been used to accelerate streaming text applications.
    • UTF-8 to UTF-16 transcoding
    • XML parsing
    • RNA search
    • Regular expression search
    • JSON parsing

Bitwise Data Parallelism

  • Parabix algorithms are based on the concept of bitwise data parallelism.
    • All bit positions in a block (e.g., 128 bits) are processed at the same time.
    • Information can flow between bit positions using internal carry propagation (addition operations) as well as using explicit shift operations.
    • Information can flow between blocks using explicit carry bits.

Parabix Programming Challenge

  • Traditional programming languages are not well suited to programming Parabix applications.
    • Little support for bit-stream programming.
    • Little support for SIMD programming.
    • Programmers must manage carry processing.
  • Parallel programming model is difficult: how do you handle all positions in a block simultaneously?
    • How can sequential dependencies be translated into carry propagation?

Symbolic Computing for Parabix

  • Goal: to create new languages and notations to allow streaming text processing applications to take advantage of Parabix techniques.
  • Ongoing work: student participation invited.
    • CMPT 384 term projects.
    • CMPT 415/6 projects.
    • NSERC Undergraduate Student Research Assistantships (USRA).
    • MSc and PhD projects.

Pablo

  • Pablo: a simple bit-stream language for Parabix applications.
    • Supports the concept of unbounded bit streams.
    • Supports the stream set model: sets of bit streams that are processed in one-to-one correspondence.
    • Programmers do not write buffer allocation or buffer-boundary code.
    • All block-to-block information flow is handled automatically by the Pablo compiler.
    • Programs written using the unbounded stream model can easily scale to different block widths.
  • Pablo was first used for programming the Parabix XML parser.
    • Python syntax was used and bit streams were represented as Python unbounded integers.
    • Python-Pablo compiler translated Pablo programs to C++ with SIMD intrinsics.
  • Parabix framework now uses Pablo as an internal representation.
    • Pablo code is generated by Parabix compilers.
    • The Parabix Character Class compiler can translate character classes into Pablo code.
    • The Unicode Property Compiler can translate Unicode class definitions into Pablo Code.
    • The regular expression compiler also generates Pablo code.
  • Parabix framework compiles Pablo programs to LLVM IR.
    • Pablo compiler automatically handles block-by-block processing and carry management.
  • Pablo Projects
    • Define concrete syntax, parsers, unparsers.
    • Incorporate Parabix Pablo extensions.
    • Support arbitrary iN streams, streams of N-bit fields, where N is a power of 2.

Regular Expressions

  • Regular expressions are specifications for sets of strings.
    • Programmers do not need to write low-level string matching functions.
    • Regular-expression tools and libraries can speed up application development.
    • Current regular expression syntax such as icgrep syntax is somewhat ugly.
    • Can more elegant and/or powerful notations be devised?
  • Parabix Grammar Support Projects
    • Design improved regular expression syntax/support.
    • Design a system for libraries of named regular expressions.
    • Design and build Parabix parser generators for BNF/EBNF.
    • Design extended grammar languages for transducers.
Updated Thu Oct. 11 2018, 11:58 by cameron.