Having Fun with Happy (Compiler Series Part III)

The next step to writing a compiler is creating the parser. For the parser generator I’m using Happy which is the Haskell version of yacc.

To make following along with me easier I’ve uploaded the parser and other files for NewL to github.com at http://github.com/bjwbell/NewL-Compiler. The parser is in the parser directory. Please note that the parser contains no error checking. I’ll explain how to add error checking in a later blog post.

To play with the parser and test it go to http://github.com/bjwbell/NewL-Compiler/archives/master. It’ll ask you to download the source. Once you’ve downloaded the source and unzipped it, open a terminal and go to the parser directory.

To compile the parser you’ll need to have the Glasgow Haskell Compiler collection installed. You also need to have Happy and Alex installed. If you’re running Ubuntu all of them are available via the Ubuntu repositories. To compile the parser run

make

and to test the parser run

cat test1.newl | ./newl

It will output the parse tree representation of the test1.newl program.

Before going into how I wrote the parser I want to mention that to get the scanner and parser to play together you have to delete

main = do
  s <- getContents
  print (alexScanTokens s)

from scanner.x. Otherwise Alex would generate a standalone scanner.

For an explanation of how the parts to the parser fit together I strongly encourage you to read http://www.haskell.org/happy/doc/html/sec-using.html. It has a very well written explanation of writing a parser using Happy.

The first part of the parser is to declare the name and what modules we import.

{
module Main where
import Scanner
}

The important part is “import Scanner”. This imports the scanner that was declared in scanner.x. In particular notice that the name “Scanner” matches the module name declared in scanner.x. The second part is to declare the name of our parser, you can choose any name you like. We also declare the token type which must match the token type declared in scanner.x, finally we give the name of the error function.

%name newl
%tokentype { Token }
%error { parseError }

Now onto to the meat of the parser. We declare a list of tokens as I’ve done below.

%token
  "class"                      { TClass }
  "new"                        { TNew }
  "String"                     { TString }
  "static"                     { TStatic }
  "void"                       { TVoid }
  "main"                       { TMain }
  "public"                     { TPublic }
  "return"                     { TReturn }
  "extends"                    { TExtend }
  "int"	                       { TInt }
  "boolean"                    { TBool }
  "if"                         { TIf }
  "else"                       { TElse }
  "true"                       { TTrue }
  "false"                      { TFalse }
  "this"                       { TThis }
  "length"                     { TLength }
  "while"                      { TWhile }
  integer_literal              { TIntLiteral $$ }
  ident                        { TIdent $$ }
  "{"                          { TLeftBrace }
  "}"                          { TRightBrace }
  ","                          { TComma }
  "["                          { TLeftBrack }
  "]"				           { TRightBrack }
  op                           { TOp $$}
  comop                        { TComOp $$ }
  "("                          { TLeftParen }
  ")"                          { TRightParen }
  ";"                          { TSemiColon }
  "."                          { TPeriod }
  "!"                          { TNot }
  "="                          { TEquals }
  "System.out.println"         { TPrint }
%%

“The symbols on the left are the tokens as they will be referred to in the rest of the grammar, and to the right of each token enclosed in braces is a Haskell pattern that matches the token. The parser will expect to receive a stream of tokens, each of which will match one of the given patterns .” [1] (the definition of the Token datatype is given in scanner.x)

Now we declare the production rules to the grammar (to view the grammar in BNF notation see https://bjbell.wordpress.com/2010/01/20/writing-a-compiler-in-haskell-compiler-series-part-i/).

Program : 
        MainClass ClassDeclList { Program $1 $2 }
MainClass : 
          "class" ident "{" "public" "static" "void" "main" "(" "String" "[" "]" ident ")" "{" Statement "}" "}" { MClass $2 $12 $15 }


ClassDeclList :
          ClassDecl     { ClassDeclList $1 CEmpty }
          | ClassDecl ClassDeclList { ClassDeclList $1 $2 }
          |             { CEmpty }

ClassDecl : 
          "class" ident "{" VarDeclList MethodDeclList "}"                     { ClassDecl $2 "void" $4 $5 }
          | "class" ident "extends" ident "{" VarDeclList MethodDeclList "}"   { ClassDecl $2 $4 $6 $7 }


MethodDeclList :
     MethodDecl                   { MethodDeclList $1 MEmpty }
     | MethodDecl MethodDeclList  { MethodDeclList $1 $2 }
     |                            { MEmpty }

MethodDecl : 
     "public" Type ident "(" FormalList ")" "{" VarDeclList StatementList "return" Exp ";" "}" { MethodDecl $2 $3 $5 $8 $9 $11 }

VarDeclList :
     Type ident ";" { VarDeclList $1 $2 VEmpty }
     | Type ident ";" VarDeclList { VarDeclList $1 $2 $4 }
     |              { VEmpty }

FormalList :
     Type ident       { FormalList $1 $2 FEmpty }
     | Type ident FormalList { FormalList $1 $2 $3 }

Type :
     "int" "[" "]"    { TypeIntArray }
     | "boolean"      { TypeBoolean }
     | "int"          { TypeInt }
     | ident          { TypeIdent $1 }

Statement :
    "{" StatementList "}"                            { SList $2 }
    | "if" "(" Exp ")" Statement "else" Statement  { SIfElse $3 $5 $7 }
    | "while" "(" Exp ")" Statement                { SWhile $3 $5 }
    | "System.out.println" "(" Exp ")" ";"         { SPrint $3 }
    | ident "=" Exp ";"                              { SEqual $1 $3 }
    | ident "[" Exp "]" "=" Exp ";"                  { SArrayEqual $1 $3 $6 }

StatementList :
    Statement               { StatementList Empty $1 }
    | StatementList Statement   { StatementList $1 $2 }

Exp : 
    Exp op Exp                        { ExpOp $1 $2 $3}
    | Exp comop Exp                   { ExpComOp $1 $2 $3}
    | Exp "[" Exp "]"                 { ExpArray $1 $3}
    | Exp "." "length"                { ExpLength $1}
    | Exp "." ident "(" ExpList ")"   { ExpFCall $1 $3 $5}
    | integer_literal                 { ExpInt $1}
    | "true"                          { ExpBool True}
    | "false"                         { ExpBool False}
    | ident                           { ExpIdent $1}
    | "this"                          { ExpThis }
    | "new" "int" "[" Exp "]"         { ExpNewInt $4 }  
    | "new" ident "(" ")"             { ExpNewIdent $2}
    | "!" Exp                         { ExpNot $2}
    | "(" Exp ")"                     { ExpExp $2}

ExpList :
        Exp            { ExpListExp $1 }
        | Exp ExpRest  { ExpList $1 $2 }
        |              { ExpListEmpty }

ExpRest : 
     "," Exp      { ExpRest $2 }

“Each production consists of a non-terminal symbol on the left, followed by a colon, followed by one or more expansions on the right, separated by |. Each expansion has some Haskell code associated with it, enclosed in braces as usual.” [1]

“The way to think about a parser is with each symbol having a `value’: we defined the values of the tokens above, and the grammar defines the values of non-terminal symbols in terms of sequences of other symbols (either tokens or non-terminals).” [1]

With production rules declared we now declare the data types for them. Which I’ve listed below.

{
parseError :: [Token] -> a
parseError _ = error "Parse error"


data Program 
    = Program MainClass ClassDeclList
      deriving (Show, Eq)



data MainClass
    = MClass String String Statement
      deriving (Show, Eq)

data ClassDeclList
    = ClassDeclList ClassDecl ClassDeclList
    | CEmpty
  deriving (Show, Eq)

data ClassDecl = ClassDecl Ident Ident VarDeclList MethodDeclList
  deriving (Show, Eq)


data MethodDeclList
    = MethodDeclList MethodDecl MethodDeclList
    | MEmpty
    deriving (Show, Eq)
data MethodDecl
    = MethodDecl Type Ident FormalList VarDeclList StatementList Exp
    deriving (Show, Eq)

data VarDeclList =
    VarDeclList Type Ident VarDeclList
    | VEmpty
    deriving (Show, Eq)

data FormalList = 
    FormalList Type Ident FormalList
    | FEmpty
  deriving (Show, Eq)

data Type =
    TypeIntArray
    | TypeBoolean
    | TypeInt
    | TypeIdent Ident
    deriving (Show, Eq)

data Statement
    = Statement String
    | SList StatementList
    | SIfElse Exp Statement Statement
    | SWhile Exp Statement
    | SPrint Exp
    | SEqual Ident Exp
    | SArrayEqual Ident Exp Exp
    | StatementError
    deriving (Show, Eq)

data StatementList
    = StatementList StatementList Statement 
    | Empty
    deriving (Show, Eq)


data Exp
    = Exp String
    | ExpOp Exp Char Exp
    | ExpComOp Exp Char Exp
    | ExpArray Exp Exp -- "Exp [ Exp ]"
    | ExpFCall Exp Ident ExpList -- Exp . Ident ( ExpList )
    | ExpInt Int
    | ExpNewInt Exp
    | ExpBool Bool -- True or False
    | ExpIdent Ident
    | ExpNewIdent Ident -- new Ident ()
    | ExpExp Exp -- Exp ( Exp )
    | ExpThis
    | ExpNot Exp
    | ExpLength Exp
    | ExpError
    deriving (Show, Eq)

data Op
     = And
     | LessThan
     | Plus
     | Minus
     | Times
     deriving (Show, Eq)

type Ident = String
type Integer_Literal = Int
data ExpList
    = ExpList Exp ExpRest
    | ExpListEmpty
    | ExpListExp Exp
    deriving (Show, Eq)
data ExpRest
    = ExpRest Exp
    deriving (Show, Eq)



main = do 
  inStr <- getContents
  let parseTree = newl (alexScanTokens inStr)  
  putStrLn ("parseTree:" ++ show(parseTree))
  print "done"
}

I’ve also defined the “parseError” function and the main function.
The main function uses “getContents” to retrieve the input from stdin and then creates the parse tree using “newl (alexScanTokens inStr)”. Note the “newl” name is the same one we declared using “%name newl” at the beginning of the program.

With the above code we now have a working parser for the NewL language. At this point if there’s an error parsing it barfs out “parse error” and doesn’t tell us anything about the error. Also no type or semantic analysis is done. I’ll expand on both of these points in later posts.

References
[1] Chapter 2. Using Happy, http://www.haskell.org/happy/doc/html/sec-using.html

Advertisements

8 thoughts on “Having Fun with Happy (Compiler Series Part III)

  1. Pingback: Writing A Compiler In Haskell (Compiler Series Part I) « A blog on math

  2. I had seen this tutorials, when they were starting and I’m glad it has been advancing.
    I was using Parsec + Alex to do some things but I would like to use Alex + Happy. What I think that it is not obvious in this tutorial is how we can use the tokens from Alex, directly in Happy.

    When you say:
    “Now onto to the meat of the parser. We declare a list of tokens as I’ve done below.”
    I find this a repetition of the work that has already been done in Alex. Furthermore, in Yacc/Byson it was possible to use the list of tokens generated by lex/flex, by importing them. I see that you import the scanner but it has to be done that %%token thing.

    Could you please clarify these points for me?

    • Thanks for reading the series Eduardo. I don’t know of any way to use the token directly, I tried using one of them directly in the production rules and it didn’t work.
      The purpose of %token is for added clarity in the production rules. For example in

      %token
        "class"                      { TClass }
      

      on the left is “class” and on the right is the actual token TClass. This enables us to use “class” in our production rules.
      For example we use “class” in

      MainClass :
                "class" ident "{" "public" "static" "void" "main" "(" "String" "[" "]" ident ")" "{" Statement "}" "}" { MClass $2 $12 $15 }
      

      .

      Which is slightly clearer than using TClass would be since “class” is what the original source code contained.

      • Well, at least I am not the only one who tried to do that and was not able to sucessfully do it…!

        I understand perfectly that that is for added clarity. What I find, may I say “disturbing”, is that we are duplicating work unecessarily. I do believe that there has to be a better way of achieving this. Even flex and byson allow to simplify this so I do not find any reason why we shouldn’t be able to do this with Alex and Happy since, in a way, they are supposed to work together…

        Well, sorry for my rant but you are one of the few people who I’ve found on the web that is trying to use Alex with Happy and is giving examples of it. I think I will ask people from the Happy community how to do this.

  3. Pingback: 2010 in review « Math and More

  4. Hi I’m a beginner in this field. And I needed to learn it for some requirements in one of my course.
    Could you help me how to make “cat test1.newl | ./newl” execute? I can’t seem to do it.. Sorry, I still don’t know my way around… I’d really appreciate it… thanks

    • Hi, Jai. Yes, I can try and help you. First can you tell me a few details on your computer setup.

      What do you have for operating system? Ubuntu? Windows?. It’s easiest to get everything working under Ubuntu. Do you have Haskell, Git, and LLVM installed?

      You can check that Haskell is installed by opening the command line and executing “ghci”. To check that git is installed open a command line and execute “git –version” it should say something like “git version 1.7.6”.

      • Hi Bryan,
        I’m currently working on building a compiler for a custom language using Alex + Happy but I’m really new to these tools and Haskell more generally. I’m having trouble running the test file with your example scanner and parser on a Mac. Would be great if you could help! Thanks 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s