Not logged in. Login

Parabix Parser Generator

Introductory Project Example

Notes and Resources

  • The introductory project example presents a well-conceived and feasible course project for a recursive descent parser generator.
  • Line and column numbers.
    • One of the practical issues in parser generation is keeping track of line numbers and column positions for error messages.
    • Using standard byte-at-a-time processing, a separate low-level routine is inserted before the lexical analyzer to keep track of line numbers and column positions.
    • With Parabix-style methods, we want to avoid the high-cost of sequentially examining each byte just to calculate line numbers and columns. One way of doing this is shown in the LineColTracker of xmlwf.
  • Unicode.
    • Newer programming languages, such as Go and Swift, allow full Unicode (UTF-8) input.
    • Bitstream code for handling UTF-8 is illustrated in u8check, for example.
  • Comment processing.
    • It is good to define a separate layer for comment processing, that quickly masks off the interior regions for comments.
    • The masks can be applied to eliminate lexical items for consideration. Once all remaining items in a block are masked off, the rest of the block can be skipped.
    • Defining a separate layer for comment processing ahead of lexical analysis also has the benefit that recursive comment syntax can be supported.
  • Bit scan primitives.
    • The operations to sequentially scan bit streams are an important target for optimized code generation.
    • There is recent work in icgrep to improve the sequential scanning for matched lines given the (line breaks, match results) pair of bitstreams: http://parabix.costar.sfu.ca/browser/icGREP/icgrep-devel/icgrep/kernels/scanmatchgen.cpp
    • To improve the efficiency of bit mask and scanning operations, Intel has quite recently introduced BMI instructions. There could be interesting back-end compiler work to take advantage of such instructions.
  • Parabix 2 Operations
    • A parser-generator for Parabix 1 style parsing (parallelization of lexical processing, sequential bit scanning for parsing) is an excellent scope for a course project.
    • Finding a way to incorporate Parabix 2 style operations for some parsing (such as parallel parsing of multiple XML start tags in xmlwf) would be a great extension to the project.
    • Going even further, it may be possible to parallelize some recursive parsing operations, such as the parallel parenthesis matching of the matchparens prototype.
  • Parallelization of Symbol Resolution.
    • A separate project can look at parallelization of symbol resolution. Using the results of that project within the parser generator system would be another interesting direction for extended work.
Updated Mon Feb. 08 2016, 12:49 by cameron.