Playing With Alex Again (Compiler Series Part IV)

Now that we have a scanner and a parser, I want to go back and add error reporting to both. This post covers error reporting for the scanner only. The goal is simple: abort on the first error the scanner encounters and print the line and column number of the error.

In my previous post on the scanner I mentioned that we used Alex’s “basic” wrapper, which provides a simple interface to the scanner: it takes a String and returns a list of tokens.

For error reporting we need something more sophisticated, and the “posn” wrapper is almost perfect for our purposes. To use it we annotate each constructor of our Token type with a field of type AlexPosn. The “posn” wrapper defines AlexPosn to hold three numbers: the absolute character offset of the token, its line number, and its column number. Specifically

data AlexPosn = AlexPn !Int  -- absolute character offset
                       !Int  -- line number
                       !Int  -- column number

For now you can ignore the “!” in “!Int”. It is a strictness annotation, which forces the field to be evaluated when the value is constructed.
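
To get a feel for how the three numbers evolve, here is a standalone sketch of position tracking. It re-creates AlexPosn outside of Alex, and its “alexStartPos” and “alexMove” follow my understanding of what the wrapper does (start at offset 0, line 1, column 1; tab stops every 8 columns), so treat those details as assumptions. “posnAfter” is a hypothetical helper that exists only for this example.

```haskell
-- Standalone re-creation for illustration only. In a real scanner these
-- definitions come from Alex's "posn" wrapper, not from your own code.
data AlexPosn = AlexPn !Int !Int !Int  -- offset, line, column
  deriving (Eq, Show)

-- Assumed starting position: offset 0, line 1, column 1.
alexStartPos :: AlexPosn
alexStartPos = AlexPn 0 1 1

-- Advance a position past one character (assumes 8-column tab stops).
alexMove :: AlexPosn -> Char -> AlexPosn
alexMove (AlexPn a l c) '\t' = AlexPn (a + 1) l (((c + 7) `div` 8) * 8 + 1)
alexMove (AlexPn a l _) '\n' = AlexPn (a + 1) (l + 1) 1
alexMove (AlexPn a l c) _    = AlexPn (a + 1) l (c + 1)

-- Hypothetical helper: the position reached after consuming a whole string.
posnAfter :: String -> AlexPosn
posnAfter = foldl alexMove alexStartPos
```

For instance, “posnAfter "ab\ncd"” evaluates to “AlexPn 5 2 3”: five characters consumed, now on line 2, about to read column 3.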

This is exactly what we need for error reporting. Each constructor of Token gains an AlexPosn field that records the token position. For example

data Token =
     	TLeftBrace       |

becomes

data Token =
     	TLeftBrace AlexPosn     |

We also have to modify the scanner actions to include the passed AlexPosn value on each action. I’ve included the modified actions below.

tokens :-

  $white+				;
  "class"				{ \p s -> TClass p }
  "new"					{ \p s -> TNew p }
  "String"				{ \p s -> TString p }
  "static"				{ \p s -> TStatic p }
  "void"				{ \p s -> TVoid p }
  "main"				{ \p s -> TMain p }
  "return"              { \p s -> TReturn p }
  "public"				{ \p s -> TPublic p }
  "extends"				{ \p s -> TExtend p }
  "int"					{ \p s -> TInt p }
  "boolean"				{ \p s -> TBool p }
  "if"					{ \p s -> TIf p }
  "else"				{ \p s -> TElse p }
  "true"				{ \p s -> TTrue p }
  "false"				{ \p s -> TFalse p }
  "this"				{ \p s -> TThis p }
  "length"				{ \p s -> TLength p }
  "while"				{ \p s -> TWhile p }
  $digit+				{ \p s -> TIntLiteral p (read s) }
  "."                   { \p s -> TPeriod p }
  "&&"			        { \p s -> TOp p (head s) }
  "!"					{ \p s -> TNot p }
  [\+\-\*\/]            { \p s -> TOp p (head s) }
  "<"                   { \p s -> TComOp p (head s) }
  "="					{ \p s -> TEquals p }
  ";" 					{ \p s -> TSemiColon p }
  "("					{ \p s -> TLeftParen p }
  ")"					{ \p s -> TRightParen p }
  $alpha[$alpha $digit \_ \']*	{ \p s -> TIdent p s }
  @string 	       	    { \p s -> TStringLiteral p (init (tail s)) -- remove the leading and trailing double quotes }
  "{"	 	 	   		{ \p s -> TLeftBrace p }
  "}"					{ \p s -> TRightBrace p }
  ","					{ \p s -> TComma p }
  "["					{ \p s -> TLeftBrack p }
  "]"					{ \p s -> TRightBrack p }
  "System.out.println"  { \p s -> TPrint p }
-- Each action has type :: AlexPosn -> String -> Token

Take one specific example:

$digit+				{ \p s -> TIntLiteral p (read s) }

The “\p s -> TIntLiteral p (read s)” is an anonymous function that takes two arguments: an AlexPosn, “p”, and a String, “s”. It returns the token “TIntLiteral” with p as the token position and (read s) as the integer value of the matched text.
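
Because an action is just an ordinary Haskell function, you can try one by hand outside of Alex. The sketch below re-creates AlexPosn and a one-constructor Token so that it is self-contained. The name “intLiteralAction” is made up for this example.

```haskell
-- Re-created here so the example is self-contained.
data AlexPosn = AlexPn !Int !Int !Int  -- offset, line, column
  deriving (Eq, Show)

-- One-constructor stand-in for the full Token type.
data Token = TIntLiteral AlexPosn Int
  deriving (Eq, Show)

-- The same lambda as in the scanner rule, given a (hypothetical) name.
intLiteralAction :: AlexPosn -> String -> Token
intLiteralAction = \p s -> TIntLiteral p (read s)
```

Applying it as “intLiteralAction (AlexPn 4 2 2) "42"” yields “TIntLiteral (AlexPn 4 2 2) 42”. Note that “read” would fail on a string that is not a number, but the $digit+ pattern guarantees the action only ever sees digits.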

The only issue with the “posn” wrapper is that its “alexScanTokens” function reports nothing useful when an error is encountered; it simply aborts with the message “lexical error”, with no line or column number. To fix this we define a new function, “alexScanTokens2”, with better error reporting. The code below defines two helper functions and “alexScanTokens2” itself.

getLineNum :: AlexPosn -> Int
getLineNum (AlexPn _ lineNum _) = lineNum

getColumnNum :: AlexPosn -> Int
getColumnNum (AlexPn _ _ colNum) = colNum

alexScanTokens2 :: String -> [Token]
alexScanTokens2 str = go (alexStartPos,'\n',str)
  where go inp@(pos,_,str) =
          case alexScan inp 0 of
                AlexEOF -> []
                AlexError _ -> error ("lexical error @ line " ++ show (getLineNum(pos)) ++ " and column " ++ show (getColumnNum(pos)))
                AlexSkip  inp' len     -> go inp'
                AlexToken inp' len act -> act pos (take len str) : go inp'

main = do
  s <- getContents
  print (alexScanTokens2 s)

The definition of “alexScanTokens2” may look scary at first if you’re a new Haskell programmer like me, but once we break it down it’s not so complicated.
First, the two lines

alexScanTokens2 str = go (alexStartPos,'\n',str)
  where go inp@(pos,_,str) =

define “alexScanTokens2” as a function that takes a String, “str”, and returns the value of “go (alexStartPos, '\n', str)”.
The where clause defines the local function “go”, whose left-hand side is

go inp@(pos,_,str)

Note that “inp@(pos, _, str)” is an as-pattern: it matches the argument against the three-tuple “(pos, _, str)” and also binds the name “inp” to the whole tuple. In other words, “inp” is the full three-tuple, while “pos” and “str” name its first and third components. To understand the case expression I recommend looking up the definition of “alexScan” in the “Basic Interface” section of the Alex documentation.
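
If the “@” syntax is new to you, this tiny standalone function shows it in isolation. The name “splitInput” and the use of a plain Int in place of AlexPosn are just for illustration.

```haskell
-- "inp" names the whole tuple; "pos" and "str" name two of its parts.
-- The middle component is ignored with "_", as in alexScanTokens2.
splitInput :: (Int, Char, String) -> ((Int, Char, String), Int, String)
splitInput inp@(pos, _, str) = (inp, pos, str)
```

So “splitInput (0, '\n', "abc")” gives back “((0,'\n',"abc"),0,"abc")”: the untouched tuple first, then the two pieces pulled out of it.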

The only change I made to the default “alexScanTokens” function provided by the “posn” wrapper was to replace the AlexError case with

AlexError _ -> error ("lexical error @ line " ++ show (getLineNum(pos)) ++ " and column " ++ show (getColumnNum(pos)))

As the code shows, the call to “error” aborts the program with a simple message giving the line and column number of the error. Here “pos” is the position at which the scanner started looking for the next token.
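
Aborting with “error” is fine for our purposes, but the same loop could return the message as a value instead. The sketch below is not the scanner from this post: it replaces “alexScan” with a hand-written stand-in (“scanOne”, with a made-up “ScanResult” type) that only knows about whitespace and integer literals, so that the example runs without Alex. “scanEither” and “advance” are likewise hypothetical names, and “advance” ignores tab stops for brevity.

```haskell
import Data.Char (isDigit, isSpace)

data AlexPosn = AlexPn !Int !Int !Int  -- offset, line, column
  deriving (Eq, Show)

data Token = TIntLiteral AlexPosn Int
  deriving (Eq, Show)

-- Hypothetical stand-in for alexScan's result type; not Alex's real API.
data ScanResult
  = ScanEOF
  | ScanError
  | ScanSkip Int                                  -- characters to skip
  | ScanToken Int (AlexPosn -> String -> Token)   -- match length and action

-- Toy scanner step: skips whitespace, recognises runs of digits.
scanOne :: String -> ScanResult
scanOne [] = ScanEOF
scanOne s@(c : _)
  | isSpace c = ScanSkip (length (takeWhile isSpace s))
  | isDigit c = ScanToken (length (takeWhile isDigit s))
                          (\p t -> TIntLiteral p (read t))
  | otherwise = ScanError

-- Move a position past the consumed text (tabs treated as one column).
advance :: AlexPosn -> String -> AlexPosn
advance = foldl move
  where
    move (AlexPn a l _) '\n' = AlexPn (a + 1) (l + 1) 1
    move (AlexPn a l c) _    = AlexPn (a + 1) l (c + 1)

-- Same shape as alexScanTokens2, but errors come back as Left.
scanEither :: String -> Either String [Token]
scanEither = go (AlexPn 0 1 1)
  where
    go pos@(AlexPn _ line col) str =
      case scanOne str of
        ScanEOF   -> Right []
        ScanError -> Left ("lexical error @ line " ++ show line
                           ++ " and column " ++ show col)
        ScanSkip n      -> go (advance pos (take n str)) (drop n str)
        ScanToken n act ->
          (act pos (take n str) :) <$> go (advance pos (take n str)) (drop n str)
```

With this toy scanner, “scanEither "12\n 7"” returns Right with two tokens positioned at “AlexPn 0 1 1” and “AlexPn 4 2 2”, while “scanEither "1\n  x"” returns “Left "lexical error @ line 2 and column 3"”. The trade-off is that callers must handle the Either, whereas “error” keeps the scanner’s type as a plain list of tokens.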

To try out our new scanner, download it from http://github.com/bjwbell/NewL-Compiler; it lives in the “scanner_with_error_reporting” directory. There’s a small readme file with instructions on how to test the scanner.