Playing with Alex (Compiler Series Part II)

We’re going to use Alex to write the scanner for NewL. Alex is the Haskell equivalent of Lex. The Alex manual is reasonably thorough and has several excellent examples of simple scanners written using it.

The basic idea for Alex is to pass it your Haskell code written up in the special Alex syntax and it spits out a plain Haskell source code file. There are several wrapper options available for using Alex as (1) a standalone scanner or (2) a module that gets compiled in with your parser. For this post we’ll be using Alex as a standalone scanner that takes input from standard in and outputs a list of tokens.

Alex like all other scanner generators uses regular expressions for specifying tokens. The basic format of an Alex file is

regexp { haskell-code }  

where haskell-code is an anonymous function that takes a string (the string that matched the regexp) and returns a token. A typically example is

 [0-9]+ { \s -> TIntLiteral s }

The \s is Haskell’s way of defining an anonymous function with one parameter named s. The “->” signals the start of the function body. Thus “\s -> TIntLiteral s” is a function that takes a String s as the argument and returns TIntLiteral s.

To define our tokens we make a new data type called Token with a list of the possible tokens.

 data Token = TIntLiteral String |
              TIdentifier String 

The above code defines two token types one is a literal int such as “10” and the other is an identifier such as “myVariable”.

Alex has a nice option %wrapper "basic" that defines the function alexScanTokens which takes a string and returns the list of tokens. Using this we define our main function as

main = do 
           s <- getContents
           print (alexScanTokens s) 

The main function gets the string from the standard input and puts it in s and then uses print (alexScanTokens s) to generate the list of tokens and print them out.
To test the scanner run

echo "10 \"StringLiteral\"" | ./a.out

the above command echos 10 “StringLiteral” to the standard output and then pipes it into the scanner, a.out (I assume a.out is the scanner executable). The output should be [TIntLiteral 10, TStringLiteral “StringLiteral”].

I’ve included a listing of the NewL Alex scanner below. I strongly recommend reading the Alex manual introduction. For compiling the listing use “alex Scanner.x” to generate the Haskell code and then run “ghc Scanner.hs” to generate the executable.


{
module Main (main) where
}

%wrapper "basic"

$digit = 0-9			-- digits
$alpha = [a-zA-Z]		-- alphabetic characters
$graphic    = $printable # $white

@string     = \" ($graphic # \")* \"



tokens :-

  $white+				;
  "class"				{ \s -> TClass }
  "new"				    { \s -> TNew }
  "String"				{ \s -> TString }
  "static"				{ \s -> TStatic }
  "void"				{ \s -> TVoid }
  "main"				{ \s -> TMain }
  "return"              { \s -> TReturn }
  "public"				{ \s -> TPublic }
  "extends"				{ \s -> TExtend }
  "int"					{ \s -> TInt }
  "boolean"				{ \s -> TBool }
  "if"					{ \s -> TIf }
  "else"				{ \s -> TElse }
  "true"				{ \s -> TTrue }
  "false"				{ \s -> TFalse }
  "this"				{ \s -> TThis }
  "length"				{ \s -> TLength }
  "while"				{ \s -> TWhile }
  $digit+				{ \s -> TIntLiteral (read s) }
  "."                   { \s -> TPeriod }
  "&&"				    { \s -> TOp (head s) }
  "!"					{ \s -> TNot }
  [\+\-\*\/]            { \s -> TOp (head s) }
  "<"                   { \s -> TComOp (head s) }
  "="					{ \s -> TEquals }
  ";" 					{ \s -> TSemiColon }
  "("					{ \s -> TLeftParen }
  ")"					{ \s -> TRightParen }
  $alpha[$alpha $digit \_ \']*	{ \s -> TIdent s }
  @string 	       	  		{ \s -> TStringLiteral (init (tail s)) -- remove the leading " and trailing " }
  "{"	 	 	   		{ \s -> TLeftBrace }
  "}"					{ \s -> TRightBrace }
  ","					{ \s -> TComma }
  "["					{ \s -> TLeftBrack }
  "]"					{ \s -> TRightBrack }
  "System.out.println"  { \s -> TPrint }
{
-- Each action has type :: String -> Token

-- The token type:
data Token =
    TLeftBrace  |
	TRightBrace |
	TComma |
	TLeftBrack |
	TRightBrack |
	TClass |
	TPublic |
	TString |
	TStatic |
	TVoid |
	TMain	 |
	TExtend |
	TInt |
	TBool |
	TIf |
	TElse |
	TTrue |
	TFalse |
	TThis |
	TLength |
	TWhile |
	TNew |
	TOp Char |
	TComOp Char |
    TNot |
	TEquals |
	TPeriod |
	TSemiColon |
	TLeftParen |
	TRightParen |
	TIdent String |
    TPrint |
	TIntLiteral Int |
	TStringLiteral String |
    TReturn
	deriving (Eq,Show)

main = do
  s <- getContents
  print (alexScanTokens s)

}
Advertisements

5 thoughts on “Playing with Alex (Compiler Series Part II)

  1. Pingback: Writing A Compiler In Haskell (Compiler Series Part I) « A blog on math

  2. Pingback: Playing With Alex Again (Compiler Series Part IV) « A blog on math

  3. Pingback: 2010 in review « Math and More

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s