Not logged in. Login

Canonical Forms

Canonical Forms

In symbolic computing, canonical forms are particular representations that are chosen among a set of equivalent representations.

For example, in the regular expression domain, the following structures may all be considered equivalent.

    Seq (Seq (Seq (Ch 'a') (Ch 'b')) (Ch 'c')) (Ch 'd'))
    Seq (Seq (Ch 'a') (Seq (Ch 'b') (Ch 'c')) (Ch 'd'))
    Seq (Seq (Ch 'a') (Ch 'b')) (Seq (Ch 'c') (Ch 'd'))
    Seq (Ch 'a') (Seq (Ch 'b') (Seq (Ch 'c') (Ch 'd')))

These forms correspond to different parenthesizations:

((ab)c)d
((a(bc))d
(ab)(cd)
a(b(cd))

Given a set of equivalent representations, one is often chosen as a canonical form to simplify subsequent processing. For example, we might choose the form that always groups Seq operators on the left. This can be expressed by the following systematic rule for conversion to canonical forms.

\(\mathtt{Seq}\, e\, (\mathtt{Seq}\, f\, g)\) becomes \(\mathtt{Seq} \,(\mathtt{Seq}\, e \,f)\, g\)

The purpose of canonical forms is to reduce the number of cases we need to consider when defining other types of operations, such as simplifications.

Canonical Forms with Lists

Consider an alternative Haskell RE definition.

data RE = Epsilon | Ch Char | Seq [RE] | Alt [RE] | Star RE | Group RE deriving Show

In this case, we have even more possible alternative forms.

    Seq [Seq [Seq [Ch 'a', Ch 'b'], Ch 'c'], Ch 'd']
    Seq [Seq [Ch 'a', Seq [Ch 'b', Ch 'c']], Ch 'd']]
    Seq [Seq [Ch 'a', Ch 'b'], Seq [Ch 'c', Ch 'd']]
...
    Seq [Ch 'a', Ch 'b', Ch 'c', Ch 'd']

However, when we uses lists, we normally prefer the flattened form so that there are no Seqs directly embedded within Seqs, for example.

Sorting

In some cases, the order of elements within a structure does not matter.

For example, Alt [Ch 'a', Ch 'b', Ch 'c', Ch 'd'] and Alt [Ch 'd', Ch 'b', Ch 'a', Ch 'c'] are equivalent.

When this is true, it is often useful to define a sorted canonical form. That is, we specify a sort order for the elements of a structure.

  • For example, we may sort a list of terms in a Sum expression, so that numeric terms always appear at the end.
  • We may sort a list of terms in a Product expression, so that numeric terms always appear at the beginning.

In general, a good approach is to sort terms by overall complexity so that terms of similar complexity are adjacent. This can often substantially reduce the work in applying other simplification rules.

Updated Tue Oct. 09 2018, 13:30 by cameron.