Regula Language Design: Motivation
Ultra-fast regular expression matching technology has been developed based on the Parabix framework.
The icgrep search program demonstrates this technology in a high-performance, Unicode-compliant grep replacement.
The performance of regular expression matching compiled with the Parabix-LLVM framework is better than is possible with hand-written byte-at-a-time code in C or even in assembly. One reason is that Parabix code exploits the wide SIMD registers of modern processors. Because each bit of a register stands for one byte position of the input, a 128-bit register covers 128 bytes at a time and a 256-bit AVX2 register covers 256 bytes at a time.
High Performance, Robust Performance
- icgrep offers both high-speed regular expression matching and robust performance.
- The performance of other regular expression technologies can deteriorate badly on some kinds of complicated regular expressions (catastrophic backtracking).
- The parallel algorithms of icgrep naturally handle ambiguous, nondeterministic matching, efficiently producing bit streams of all possible matches at each step.
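The bit-stream idea in the last point can be illustrated with a small sketch. This is not icgrep's implementation: it uses Python's arbitrary-precision integers as a stand-in for SIMD registers, with bit i of each integer standing for byte position i of the input. The helper names (`class_stream`, `advance`, `match_star`, `match_ends`) and the pattern `a[0-9]*z` are assumptions chosen for illustration; the `match_star` formula follows the MatchStar operation described in the Parabix publications.

```python
def class_stream(data: bytes, predicate) -> int:
    """Build a bit stream: bit i is set when byte i of data satisfies predicate."""
    stream = 0
    for i, byte in enumerate(data):
        if predicate(byte):
            stream |= 1 << i
    return stream


def advance(markers: int) -> int:
    """Move every marker one position forward in the text."""
    return markers << 1


def match_star(markers: int, char_class: int) -> int:
    """Extend each marker through any run of char_class positions.

    One addition carries every marker through its run at once, so all
    candidate positions for C* are produced in a single step.
    """
    return (((markers & char_class) + char_class) ^ char_class) | markers


def match_ends(data: bytes) -> int:
    """Markers just past each match of a[0-9]*z (illustrative pattern)."""
    is_a = class_stream(data, lambda b: b == ord("a"))
    is_digit = class_stream(data, lambda b: ord("0") <= b <= ord("9"))
    is_z = class_stream(data, lambda b: b == ord("z"))

    after_a = advance(is_a)                        # positions following an 'a'
    after_digits = match_star(after_a, is_digit)   # every position reachable via [0-9]*
    return advance(after_digits & is_z)            # positions following the final 'z'


def positions(stream: int) -> list:
    """List the indices of the set bits in a stream."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]
```

For the input `b"a12z az a1b a999z"`, `positions(match_ends(...))` gives `[4, 7, 17]`, the positions just past `a12z`, `az` and `a999z`. No backtracking occurs: every candidate match is tracked simultaneously as a bit in the marker stream.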
Language Design Opportunity
The performance of Parabix regular expression technology offers an interesting new possibility: designing new regular expression application frameworks that address the software engineering challenges associated with regular expression libraries in present-day languages.
- Better syntax without backslash madness
- Libraries of named regular expressions representing standard concepts, such as ISO date formats, IETF email syntax, Unicode properties and so on.
- Regular expression matching combined with data manipulation features, so that whole applications can be built entirely with declarative code.
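The idea of a library of named regular expressions can be approximated in an existing language. The sketch below uses Python's standard `re` module, not Regula syntax, which is not shown here. The `PATTERNS` table, its entry names and the `find_all` helper are hypothetical, and the date pattern is deliberately simplified (it does not check the day against the month).

```python
import re

# Hypothetical library of named patterns; names and definitions are illustrative only.
PATTERNS = {
    "year": r"[0-9]{4}",
    "month": r"(?:0[1-9]|1[0-2])",
    "day": r"(?:0[1-9]|[12][0-9]|3[01])",
}

# A larger concept is composed from smaller named ones instead of
# being written out as a single opaque expression.
PATTERNS["iso_date"] = "{year}-{month}-{day}".format(**PATTERNS)


def find_all(name: str, text: str) -> list:
    """Return every substring of text that matches the named pattern."""
    return re.findall(PATTERNS[name], text)
```

A call such as `find_all("iso_date", "Released 2015-03-27, updated 2015-13-01.")` returns `['2015-03-27']`, since month 13 is rejected. A language designed around named expressions could offer this kind of composition directly, without string templating.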