Not logged in. Login

Assignment 2

Recursive Descent Parsing in Haskell

This is a group assignment worth 10% of the course mark. In addition, there will be a quiz based on the assignment worth 5% of the course mark.

Pablo Grammar

Consider the following grammar for the Pablo language.

<stmt> ::= <assign> | <if> | <while>
<assign> ::= <var> "=" <expr>
<if> ::= "if" <expr> ":" {<stmt>} ["else:" {<stmt>}] "."
<while> ::= "while" <expr> ":" {<stmt>} "."

<expr> ::= <term> | <expr> "|" <term> | <expr> "^" <term>
<term> ::= <factor> | <term> "&" <factor>
<factor> ::= "000..." | "111..." | <var> | "(" <expr> ")" | "~" <expr> 
             | "Advance" <expr> <int> | "MatchStar" <expr> <expr>

Example Pablo Code

Three assignments:

x = 000...
y = v | w & z
q = MatchStar x_1 cc_2

An if-statement:

if stream_7:
   x = stream_7 ^ y
else:
  x = 000...
.

Example Pablo File

Pablo Tokens

Variables and integer numerals are defined by the following regular expressions.

<var> ::= [A-Za-z_][A-Za-z_0-9]*
<int> ::= [0-9]+

The other tokens are the fixed operators and keywords shown in quote marks in the grammar above.

Assignment

Your task is to implement a recursive descent parser for this language, parsing Pablo programs into the following Haskell algebraic data types.

data PabloE = All(Int) | Var(String) | And(PabloE, PabloE) | Or(PabloE, PabloE) | Xor(PabloE, PabloE)
               | Not(PabloE) | Advance(PabloE, Int) | MatchStar(PabloE, PabloE)
   deriving Show

data PabloS = Assign(String, PabloE) |  If (PabloE, [PabloS], [PabloS])| While (PabloE, [PabloS])
   deriving Show

Your main parsing function parsePablo should accept an input stream as a single character string and produce a list of Pablo statements as its output.

parsePablo :: [Char] -> Maybe [PabloS]

You should implement your parser using two components, a tokenizer that returns a list of tokens extracted from the input string, and the recursive parsing procedures that each operate on token lists returning trees.

Your program together with the above definitions should be placed in a file named "pablo.hs".

Stay Tuned

More details about this assignment will be presented in class on Thursday, October 1, 2015:

  1. Left-Recursive Rules: this grammar has two left-recursive rules which require special treatment using the recursive descent parsing method.
  2. Abstract Syntax Trees versus Parse Trees: while full parse trees follow the grammar exactly, your Haskell program will produce flattened versions that eliminate unit productions.
  3. Careful: square brackets in the grammar mean optional phrases according to EBNF. Square brackets in the Haskell algebraic data type are for list types.
  4. Structure your parser as two components: a lexical analyzer which converts a character stream [Char] into a token stream [Token], and the recursive parsing procedures themselves which process the token stream. Define your own algebraic data type for tokens.
  5. Parsing procedures must do two things: consume the input tokens and return the constructed PabloS and PabloE abstract syntax trees. parse_stmt :: [Token] -> Maybe ([Token], PabloS)
  6. Do not use any string processing, tokenizing or parsing libraries for this assignment. You must program your lexical analyzer and your parsing functions by hand using Haskell pattern matching and recursion.
  7. You can follow the example of hgrep.hs for the main program.

Requirements

  1. Put all your code in a single file named pablo.hs.
  2. A main program reading from a file is not required.
  3. Include the PabloS and PabloE data declarations (algebraic data types) exactly as given above.
  4. Include your own Token algebraic data type.
  5. Include your lexical analyzer function pabloLexer and parsePablo functions which have exactly the following signatures.
pabloLexer:: [Char] -> [Token]

parsePablo :: [Char] -> Maybe [PabloS]

Due Date

Assignment 2 is due at Saturday October 10 2015, 23:59.

Notes

The Pablo language here is similar to that defined in research prototypes for regular expression matching on the Parabix website.

Updated Tue Oct. 06 2015, 08:18 by cameron.