Symbolic Computing and the Parabix Project
Parabix: Parallel Bit Stream Technology
- Parabix is a concept for high-performance streaming text processing.
- It is designed to exploit the wide SIMD registers that exist on all modern processors.
- SIMD = Single Instruction Multiple Data
- SSE2: 128-bit registers on all Intel x86-64 processors
- AVX2: 256-bit registers on Intel Haswell (2013) and later processors
- AVX-512: 512-bit registers on the latest Intel processors
- Neon: 128-bit registers on ARM processors
- Parabix transforms byte-oriented text to 8 parallel bit streams.
- Allocate one bit position per input byte.
- Use wide SIMD registers to perform bitwise logic, shifting, addition.
- Process 128 bytes at a time using 128-bit registers (SSE2).
- Performance scales up with SIMD register width.
- Parabix has been used to accelerate streaming text applications.
- UTF-8 to UTF-16 transcoding
- XML parsing
- RNA search
- Regular expression search
- JSON parsing
Bitwise Data Parallelism
- Parabix algorithms are based on the concept of bitwise data parallelism.
- All bit positions in a block (e.g., 128 bits) are processed at the same time.
- Information can flow between bit positions using internal carry propagation (addition operations) as well as using explicit shift operations.
- Information can flow between blocks using explicit carry bits.
Parabix Programming Challenge
- Traditional programming languages are not well suited to programming Parabix applications.
- Little support for bit-stream programming.
- Little support for SIMD programming.
- Programmers must manage carry processing.
- Parallel programming model is difficult: how do you handle all positions in a block simultaneously?
- How can sequential dependencies be translated into carry propagation?
Symbolic Computing for Parabix
- Goal: to create new languages and notations to allow streaming text processing applications to take advantage of Parabix techniques.
- Ongoing work: student participation invited.
- CMPT 384 term projects.
- CMPT 415/6 projects.
- NSERC Undergraduate Student Research Assistantships (USRA).
- MSc and PhD projects.
Pablo
- Pablo: a simple bit-stream language for Parabix applications.
- Supports the concept of unbounded bit streams.
- Supports the stream set model: sets of bit streams that are processed in one-to-one correspondence.
- Programmers do not write buffer allocation or buffer-boundary code.
- All block-to-block information flow is handled automatically by the Pablo compiler.
- Programs written using the unbounded stream model can easily scale to different block widths.
- Pablo was first used for programming the Parabix XML parser.
- Python syntax was used and bit streams were represented as Python unbounded integers.
- Python-Pablo compiler translated Pablo programs to C++ with SIMD intrinsics.
- Parabix framework now uses Pablo as an internal representation.
- Pablo code is generated by Parabix compilers.
- The Parabix Character Class compiler can translate character classes into Pablo code.
- The Unicode Property Compiler can translate Unicode class definitions into Pablo Code.
- The regular expression compiler also generates Pablo code.
- Parabix framework compiles Pablo programs to LLVM IR.
- Pablo compiler automatically handles block-by-block processing and carry management.
- A Haskell Haskell prototype for Pablo implements basic Pablo features.
- Pablo Projects
- Define concrete syntax, parsers, unparsers.
- Incorporate Parabix Pablo extensions.
- Support arbitrary
iN
streams, streams ofN
-bit fields, whereN
is a power of 2.
Regular Expressions
- Regular expressions are specifications for sets of strings.
- Programmers do not need to write low-level string matching functions.
- Regular-expression tools and libraries can speed up application development.
- Current regular expression syntax such as icgrep syntax is somewhat ugly.
- Can more elegant and/or powerful notations be devised?
- Parabix Grammar Support Projects
- Design improved regular expression syntax/support.
- Design a system for libraries of named regular expressions.
- Design and build Parabix parser generators for BNF/EBNF.
- Design extended grammar languages for transducers.
Updated Thu Oct. 11 2018, 11:58 by cameron.