Canonical Forms
Canonical Forms
In symbolic computing, canonical forms are particular representations that are chosen among a set of equivalent representations.
For example, in the regular expression domain, the following structures may all be considered equivalent.
Seq (Seq (Seq (Ch 'a') (Ch 'b')) (Ch 'c')) (Ch 'd')) Seq (Seq (Ch 'a') (Seq (Ch 'b') (Ch 'c')) (Ch 'd')) Seq (Seq (Ch 'a') (Ch 'b')) (Seq (Ch 'c') (Ch 'd')) Seq (Ch 'a') (Seq (Ch 'b') (Seq (Ch 'c') (Ch 'd')))
These forms correspond to different parenthesizations:
((ab)c)d
((a(bc))d
(ab)(cd)
a(b(cd))
Given a set of equivalent representations, one is often chosen as a canonical form
to simplify subsequent processing. For example, we might choose the form
that always groups Seq
operators on the left. This can be
expressed by the following systematic rule for conversion to canonical forms.
\(\mathtt{Seq}\, e\, (\mathtt{Seq}\, f\, g)\) becomes \(\mathtt{Seq} \,(\mathtt{Seq}\, e \,f)\, g\)
The purpose of canonical forms is to reduce the number of cases we need to consider when defining other types of operations, such as simplifications.
Canonical Forms with Lists
Consider an alternative Haskell RE
definition.
data RE = Epsilon | Ch Char | Seq [RE] | Alt [RE] | Star RE | Group RE deriving Show
In this case, we have even more possible alternative forms.
Seq [Seq [Seq [Ch 'a', Ch 'b'], Ch 'c'], Ch 'd'] Seq [Seq [Ch 'a', Seq [Ch 'b', Ch 'c']], Ch 'd']] Seq [Seq [Ch 'a', Ch 'b'], Seq [Ch 'c', Ch 'd']] ... Seq [Ch 'a', Ch 'b', Ch 'c', Ch 'd']
However, when we uses lists, we normally prefer the flattened form so that there are no Seqs directly embedded within Seqs, for example.
Sorting
In some cases, the order of elements within a structure does not matter.
For example,
Alt [Ch 'a', Ch 'b', Ch 'c', Ch 'd']
and Alt [Ch 'd', Ch 'b', Ch 'a', Ch 'c']
are equivalent.
When this is true, it is often useful to define a sorted canonical form. That is, we specify a sort order for the elements of a structure.
- For example, we may sort a list of terms in a
Sum
expression, so that numeric terms always appear at the end. - We may sort a list of terms in a
Product
expression, so that numeric terms always appear at the beginning.
In general, a good approach is to sort terms by overall complexity so that terms of similar complexity are adjacent. This can often substantially reduce the work in applying other simplification rules.