Not logged in. Login

The Inside-Out Simplification Method

In many symbolic computing applications, a simplifier is a program that transforms one instance of a symbolic representation into a simpler, but semantically equivalent form.

For example, in the domain of algebraic expressions, simplification of the symbolic form \(2 \times x \times 3 \times power(x, 2)\) could yield \(6 \times power(x, 3)\).

Simplifiers typically work with two types of rules.

  1. Simplification rules.
  2. Rules for conversion to canonical form.

Simplification rules generally replace one form of symbolic structure with an equivalent, but smaller structure. Conversions to canonical form typically do not simplify a structure, but make it possible to more easily recognize simplifications in later steps.

Simplification of Regular Expressions

As an illustration of simplification and conversion to canonical form, consider the domain of regular expressions and the following set of rules.

data RE = Epsilon | Ch Char | Seq RE RE | Alt RE RE | Star RE 
          | Group RE deriving Show

Simplification Rules

  • Seq Epsilon r \(\rightarrow\) r
  • Seq r Epsilon \(\rightarrow\) r
  • Alt r r \(\rightarrow\) r
  • (Star (Star r)) \(\rightarrow\) Star r
  • (Star (Seq (Star r1) (Star r2))) \(\rightarrow\) Star (Alt r1 r2)
  • Group r \(\rightarrow\) r

Inside-Out Simplifier for Regular Expressions

Given a set of simplification rules, how do we construct a simplifier that systematically applies these rules to a given symbolic data object. The goal of systematic simplfication is to apply these rules as many times as necessary to yield a fully simplified symbolic form: one in which no further application of the simplification rules is possible.

In general, systematic simplification of a symbolic data domain can be achieved with the inside-out method. This method consists of two basic concepts.

  1. A recursive simplification step in which the subexpressions of an expression are first simplified (the "inside" of inside-out).
  2. A recombination step in which the expression is rebuilt using the simplified subexpressions and applying any relevant simplification rules (the "out" of inside-out).

A general organization of an inside-out simplifier is the following. First, there is one makeXXX function for each different type XXX of the symbolic domain. This makeXXX is responsible for building a symbolic object using the components of an XXX, but applying simplification rules if necessary.

Based on the structure of the RE domain, make- functions are defined with the following signatures.

makeChar :: Char -> RE
makeSeq :: RE -> RE -> RE
makeAlt :: RE -> RE -> RE
makeStar :: RE -> RE
makeGroup :: RE -> RE

Given these functions, the inside-out simplifier for regular expressions is a straightforward recursive function.

simplifyRE :: RE -> RE

simplifyRE Epsilon = Epsilon
simplifyRE (Ch c) = makeChar c
simplifyRE (Seq r1 r2) = makeSeq (simplifyRE r1) (simplifyRE r2)
simplifyRE (Alt r1 r2) = makeAlt (simplifyRE r1) (simplifyRE r2)
simplifyRE (Star r) = makeStar (simplifyRE r)
simplifyRE (Group r) = makeGroup (simplifyRE r)

How can we be sure that this simplifier systematically simplifies a regular expression. The key is structural induction and the correct implementation of the makeXXX functions. Each makeXXX function is giving the job of making a regular expression equivalent to an XXX of the given components, assuming that those components are in simplified form. That is, we assume that each component is thoroughly simplified, and simply build a new structure that satisfies the given simplification rules.

Following this approach, we can implement the regular expression simplifier using the following makeXXX functions.

First the makeChar function has a single equation, because we have no simplification rules defined in this case.

makeChar c = Ch c

Note that defining the makeChar function does allow us to modify our simplifier to introduce simplification rules for Char later on, if we wish.

Next, the makeSeq function has three equations as follows.

makeSeq Epsilon r = r
makeSeq r Epsilon = r
makeSeq r1 r2 = Seq r1 r2

The first two equations correspond to the two simplification rules we defined for Seq objects. The final equation is the default rule when there is no special simplification that applies for construction of a Seq from a given pair of simplified subobjects.

The makeAlt function follows a similar strategy, but with guarded equations. The first guarded equation corresponds to our defined simplification rule, while the second equation is the default (otherwise) case.

makeAlt r1 r2 
   | r1 == r2      = r1
   | otherwise     = Alt r1 r2

However, what exactly does == mean when applied to REs? We have not defined the equality operator REs, but we can by adding one more deriving declaration to our RE algebraic data definition.

data RE = Epsilon | Ch Char | Seq RE RE | Alt RE RE | Star RE 
          | Group RE deriving (Show, Eq)

This will give us an automatically defined (==) operator which checks for structural equality. (However, it does not implement a more complex notion of the equivalence of two REs, which means that the REs describe the same set of strings.)

For the Star forms, we again have three equations. The first two correspond to our two simplification rules and the third is the default case when no further simplifications apply.

makeStar (Star r) = Star r
makeStar (Seq (Star r1) (Star r2)) = makeStar (makeAlt r1 r2)
makeStar r = Star r

An important point here is the use of the makeStar and makeAlt functions on the right hand side of the second equation. If we had simply used Star (Alt r1 r2))), we would have a correct structure, but not necessarily one that is in fully simplified form.

Finally, Group structures can always be simplified out, so a single equation is all that is necessary.

makeGroup r = r

This completes the inside-out simplifier for regular expressions!

Updated Tue Sept. 25 2018, 12:52 by cameron.