The Inside-Out Simplification Method
In many symbolic computing applications, a simplifier is a program that transforms one instance of a symbolic representation into a simpler, but semantically equivalent form.
For example, in the domain of algebraic expressions, simplification of the symbolic form \(2 \times x \times 3 \times power(x, 2)\) could yield \(6 \times power(x, 3)\).
Simplifiers typically work with two types of rules.
- Simplification rules.
- Rules for conversion to canonical form.
Simplification rules generally replace one form of symbolic structure with an equivalent, but smaller structure. Conversions to canonical form typically do not simplify a structure, but make it possible to more easily recognize simplifications in later steps.
Simplification of Regular Expressions
As an illustration of simplification and conversion to canonical form, consider the domain of regular expressions and the following set of rules.
data RE = Epsilon | Ch Char | Seq RE RE | Alt RE RE | Star RE | Group RE deriving Show
Simplification Rules
Seq Epsilon r
\(\rightarrow\)r
Seq r Epsilon
\(\rightarrow\)r
Alt r r
\(\rightarrow\)r
(Star (Star r))
\(\rightarrow\)Star r
(Star (Seq (Star r1) (Star r2)))
\(\rightarrow\)Star (Alt r1 r2)
Group r
\(\rightarrow\)r
Inside-Out Simplifier for Regular Expressions
Given a set of simplification rules, how do we construct a simplifier that systematically applies these rules to a given symbolic data object. The goal of systematic simplfication is to apply these rules as many times as necessary to yield a fully simplified symbolic form: one in which no further application of the simplification rules is possible.
In general, systematic simplification of a symbolic data domain can be achieved with the inside-out method. This method consists of two basic concepts.
- A recursive simplification step in which the subexpressions of an expression are first simplified (the "inside" of inside-out).
- A recombination step in which the expression is rebuilt using the simplified subexpressions and applying any relevant simplification rules (the "out" of inside-out).
A general organization of an inside-out simplifier is the following.
First, there is one makeXXX
function for each different type XXX
of the symbolic domain. This makeXXX
is responsible for building a symbolic object using the components of an XXX
, but applying simplification rules if necessary.
Based on the structure of the RE domain, make-
functions are defined with the following signatures.
makeChar :: Char -> RE makeSeq :: RE -> RE -> RE makeAlt :: RE -> RE -> RE makeStar :: RE -> RE makeGroup :: RE -> RE
Given these functions, the inside-out simplifier for regular expressions is a straightforward recursive function.
simplifyRE :: RE -> RE simplifyRE Epsilon = Epsilon simplifyRE (Ch c) = makeChar c simplifyRE (Seq r1 r2) = makeSeq (simplifyRE r1) (simplifyRE r2) simplifyRE (Alt r1 r2) = makeAlt (simplifyRE r1) (simplifyRE r2) simplifyRE (Star r) = makeStar (simplifyRE r) simplifyRE (Group r) = makeGroup (simplifyRE r)
How can we be sure that this simplifier systematically simplifies a regular expression. The key is structural induction and the correct implementation of the makeXXX
functions. Each makeXXX
function is giving the job of making a regular expression equivalent to an XXX
of the given components, assuming that those components are in simplified form. That is, we assume that
each component is thoroughly simplified, and simply build a new structure that satisfies the given simplification rules.
Following this approach, we can implement the regular expression simplifier using the following makeXXX
functions.
First the makeChar
function has a single equation,
because we have no simplification rules defined in this case.
makeChar c = Ch c
Note that defining the makeChar
function does allow us
to modify our simplifier to introduce simplification rules for Char later on, if we wish.
Next, the makeSeq
function has three equations as follows.
makeSeq Epsilon r = r makeSeq r Epsilon = r makeSeq r1 r2 = Seq r1 r2
The first two equations correspond to the two simplification rules we defined for Seq
objects. The final equation is the default rule when there is no special simplification that applies for construction of a Seq
from a given pair of simplified subobjects.
The makeAlt
function follows a similar strategy, but with guarded equations. The first guarded equation corresponds to our defined simplification rule, while the second equation is the default
(otherwise
) case.
makeAlt r1 r2 | r1 == r2 = r1 | otherwise = Alt r1 r2
However, what exactly does ==
mean when applied to REs?
We have not defined the equality operator REs, but we can by
adding one more deriving
declaration to our RE algebraic data definition.
data RE = Epsilon | Ch Char | Seq RE RE | Alt RE RE | Star RE | Group RE deriving (Show, Eq)
This will give us an automatically defined (==) operator which checks for structural equality. (However, it does not implement a more complex notion of the equivalence of two REs, which means that the REs describe the same set of strings.)
For the Star
forms, we again have three equations. The first two correspond to our two simplification rules and the third is the
default case when no further simplifications apply.
makeStar (Star r) = Star r makeStar (Seq (Star r1) (Star r2)) = makeStar (makeAlt r1 r2) makeStar r = Star r
An important point here is the use of the makeStar
and makeAlt
functions on the right hand side of the second equation.
If we had simply used Star (Alt r1 r2)))
, we would have a correct structure, but not necessarily one that is in fully simplified form.
Finally, Group
structures can always be simplified out, so a single equation is all that is necessary.
makeGroup r = r
This completes the inside-out simplifier for regular expressions!