Not logged in. Login

Character Class Multiplexing

The process of character class multiplexing is to find a minimal set of character class basis streams from which all the character classes used in a particular regular expression can be computed. For example, the regular expression [a-z]+ingly! has seven nominal character classes: [a-z], [i], [n], [g], [l], [y], and [!]. By applying the multiplexing concept, however, a set of three character class bit streams can be computed which represent all the possibilities.

Exclusive Character Class Set

The first step in character class multiplexing is to determine a set of mutually-exclusive and collectively-exhaustive character classes that represent the nominal character classes as well as input characters that are not in any class. In this case, the class of all characters not present in the expression is [^a-z!], while the minimal set of mutually exclusive classes that include all the characters in the expression comprises [!], [a-fhjkmo-xz], [g], [i], [l], [n], and [y].

Multiplexed Encoding

Given a set of mutually exclusive and collectively exhaustive character classes of size \(N\), a \(\log_2 N\) bit encoding can be computed to represent the classes. Given the 8 classes of our running example, we can assign a 3-bit code to each class as follows.

  • 000: [^a-z!]
  • 001: [!]
  • 010: [a-fhjkmo-xz]
  • 011: [g]
  • 100: [i]
  • 101: [l]
  • 110: [n]
  • 111: [y]

Multiplexed Classes

In order to compute the final set of multiplexed basis streams, we can determine the appropriate character classes for each bit of the multiplexed encoding. Our running example then reduces to the following 3 character classes for the multiplexed encoding (little-endian bit numbering).

  • bit 2: [inly]
  • bit 1: [a-hjkm-z]
  • bit 0: [!gly]

Why Multiplex?

  • A reduced number of streams represent all the possible character classes.
  • More expensive operations can be implemented on the reduced stream set.
    • Advance and MatchStar are more expensive than bitwise logic.
    • Example: Advance on 7 streams: multiplex to 3 streams first.
Updated Fri March 09 2018, 09:30 by cameron.