Assignment 1 Solutions - Matching Semantics
Extending the Matching Semantics
Now that we have decided how to represent our extensions and updated our parser to be able to parse the concrete syntax, we now just have to update the semantics of matching RE
s.
Escaped Metacharacters
Because the processing of backslash escapes is handled by the parser and there are no changes to our abstract syntax tree structure, we do not need any changes to the matching functions to handle escaped characters.
Any Metacharacter
To represent the "." metacharacter, we added a new alternative Any
to our RE
data type.
The changes to the match
function are straightforward, simply add anew equation for matching Any
. We can use a structure very similar to the equation for matching single characters.
match (Ch a) "" = False match (Ch a) (c : more_chars) = a == c && more_chars == []
However, we don't need to worry about which specific character
occurs as the first element of the string, so we just match using the
anonymous pattern variable _
.
match Any "" = False match Any (_ : more_chars) = more_chars == []
Option Metacharacter
To represent the "?" metacharacter, we translate occurrences
of R?
to
Alt R Epsilon
Since we already have definitions for the semantics of both
Alt
and Epsilon
, no further work is needed.
If instead, we had created a form Option RE
, we would
instead require new equations.
match (Option r) "" = True match (Option r) s = match r s
One-or-More Metacharacter
match (Plus r) "" = match r "" match (Plus r) s = match_any_split r (Star r) s
Character Class Expressions
Most of the work of the work for character class expressions is handled by the parser, which gives us a new representation for Ch
regular expressions, using the form Ch Bool [Char]
.
How do our equations for matching a Char need to be updated? Here are the originals.
match (Ch a) "" = False match (Ch a) (c : more_chars) = a == c && more_chars == []
Now we need to change the patterns so that we have a first field that represents whether the character class is negated or not, and a second field which is a list of characters.
match (Ch _ _) "" = False match (Ch True chars) (c : more_chars) = c in chars && more_chars == [] match (Ch False chars) (c : more_chars) = !(c in chars) && more_chars == []