Not logged in. Login

Prettyprinting

While unparsing of symbolic data can produce a textual representation that is faithful to the internal form, the output may not be easy to read. If the unparser always produces output as a single line of text, the output will often be many times larger than the available display width. If output lines are simply "wrapped around", the result may be particularly difficult to read, with tokens often split in two by line breaks.

Consider the following example Tree datatype and its unparseTree function.

data Tree = Node String [Tree]

unparseTree (Node s ts) = s ++ unparseBracket ts
unparseBracket [] = ""
unparseBracket ts = "[" ++ unparseTrees ts ++ "]"
unparseTrees [t] = unparseTree t
unparseTrees (t:ts) = unparseTree t ++ ", " ++ unparseTrees ts

Now suppose we have the following tree instance

t1 = Node "aaa" [Node "bbbbb" [Node "cc" [], Node "dd" []],
                 Node "eee" [],
                 Node "ffff" [Node "gg" [], Node "hhh" [], Node "ii" []]]

A simple putStrLn (unparse t1) can fit a sufficiently long line length.

aaa[bbbbb[cc, dd], eee, ffff[gg, hhh, ii]]

However, suppose our maximum width is 20. We might define a function for wrapping lines to this width.

linewrap width text 
   | length text <= width  = text ++ "\n"
   | otherwise             = take width text ++ "\n" ++ linewrap width (drop width text)

Now putStr (linewrap 20 (unparseTree t))) produces a rather ugly result.

aaa[bbbbb[cc, dd], e
ee, ffff[gg, hhh, ii
]]

The task of producing readable representations of symbolic data is known as prettyprinting or code beautifying.

A nice prettyprinting approach in Haskell is described in Wadler's paper A prettier printer. Examples and basic ideas in these notes are taken from that paper.

Unparsing to Token Streams

Rather than unparsing directly to streams of characters, unparsing to token streams can easily allow the creation of multiline output that avoids line break inserted into tokens. The idea is that each atomic unit (identifier, numeral, operator, etc.) of the symbolic representation is produced as a token. A stream of tokens can then be displayed in a multiline representation using the rule that a line break is inserted whenever the next token will not fit on the current line.

A Simple Formatter

More interesting than a printer using token streams is a simple formatter that will break the display of a Tree into multiple lines, making sure that lists of trees are broken over lines with appropriate indentation.

showTree (Node s ts) = s ++ nest (length s) (showBracket ts)
showBracket [] = ""
showBracket ts = "[" ++ nest 1 (showTrees ts) ++ "]"
showTrees [t] = showTree t
showTrees (t:ts) = showTree t ++ "," ++ "\n" ++ showTrees ts

nest indent "" = ""
nest indent ('\n':cs) = "\n" ++ replicate indent ' ' ++ nest indent cs
nest indent (c:cs) = c : (nest indent cs)

Using this formatter gives a decently pretty output for short lines.

*Main> putStrLn (showTree t1)
aaa[bbbbb[cc,
          dd],
    eee,
    ffff[gg,
         hhh,
         ii]]

A Simple Prettyprinter

An even better approach, however, is to base the formatting on the available linewidth. We first define a function to determine whether a given tree can be printed within a given width. It will return a nonnegative integer indicating how much space remains after printing the tree, if it fits. A negative value is returned as soon as it determines that the tree is too big to fit.

fitTree width (Node s []) = width - length s
fitTree width (Node s ts) = fitTrees (width - length s - 2) ts

fitTrees width [] = width
fitTrees width (t : ts)
    | width < 0   = width
    | null ts     = fitTree width t
    | otherwise   = fitTrees ((fitTree width t) - 2) ts

Using this function we can now produce a prettyprinter that can work with different output widths.

prettyTree width (Node s ts) 
    | fitTree width (Node s ts) >= 0  = unparseTree (Node s ts)
    | otherwise = s ++ nest (length s) (prettyBracket (width - length s) ts)
prettyBracket width [] = ""
prettyBracket width ts = "[" ++ nest 1 (prettyTrees (width - 1) ts) ++ "]"

prettyTrees width [t] = prettyTree width t
prettyTrees width (t:ts) = prettyTree width t ++ "," ++ "\n" ++ prettyTrees width ts
*Main> putStrLn (prettyTree 15 t1)
aaa[bbbbb[cc,
          dd],
    eee,
    ffff[gg,
         hhh,
         ii]]
*Main> putStrLn (prettyTree 30 t1)
aaa[bbbbb[cc, dd],
    eee,
    ffff[gg, hhh, ii]]
*Main> putStrLn (prettyTree 45 t1)
aaa[bbbbb[cc, dd], eee, ffff[gg, hhh, ii]]

More Elaborate Prettyprinting

See Wadler's paper for more prettyprinting ideas, efficiently choosing the best among many formatting options.

Updated Thu Nov. 01 2018, 12:23 by cameron.