Code Generation With LLVM (Compiler Series Part IX)

With type checking done, it’s time to generate code from the parse tree. Code generation typically consists of the following steps:
1. Generate intermediate code.
2. Generate the machine or byte code from the intermediate code.

The following are options for doing the above two tasks:

  • Generate intermediate and machine code for x86 or RISC using our own methods (tedious and error prone).
  • Generate the intermediate code and then byte code for either the JVM or CLI.
  • Outsource the problem to LLVM.

I’m a lazy fellow so I pick choice #3 of outsourcing the code generation to LLVM. This is a particularly apt choice given the exciting intersection of Haskell with LLVM recently via David Terei’s honors thesis on using LLVM as the backend for the GHC compiler. This work is ongoing at http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM and for an explanation of why LLVM is used as the backend see http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html.

Bryan O’Sullivan put together bindings to LLVM at http://code.haskell.org/llvm/. Bryan and the other author, Lennart Augustsson, also posted examples of using the bindings at http://www.serpentine.com/blog/2008/01/03/llvm-bindings-for-haskell/, http://augustss.blogspot.com/2009/01/llvm-llvm-low-level-virtual-machine-is.html, and http://augustss.blogspot.com/2009/01/llvm-arithmetic-so-we-want-to-compute-x.html.

I’m reading up on LLVM and will post more as I learn more about LLVM in general and the Haskell LLVM bindings in particular.

Setting a Cell Color with Open XML (SpreadsheetML)

As you may know Office 2007 or later uses “Office Open XML” as the default storage format. Working with Open XML can be annoying. Doing simple things such as setting a cell color is complicated. In this post I give the code for creating a speadsheet with the Open XML SDK 2.0.

For learning the structure of Open XML see “Open XML Explained“. The easiest way to create C# code for Open XML is to use the “Open XML SDK 2.0 Productivity Tool” included in the OpenXMLSDKTool.msi package. With this tool you open a xlsx document and view the structure of the document. It additionally has an option to generate C# code that will create a duplicate of the xlsx document.

To set the background color of a cell I created a xlsx document with the color of one cell set to red. I then opened the document in the Open XML Productivity tool and viewed the generated C# code. The code was very lengthy and I edited it down to the following:


        // Creates a SpreadsheetDocument.
        public void CreatePackage(string filePath)
        {
            using(SpreadsheetDocument package = SpreadsheetDocument.Create(filePath, SpreadsheetDocumentType.Workbook))
            {
                CreateParts(package);
            }
        }

        // Adds child parts and generates content of the specified part.
        private void CreateParts(SpreadsheetDocument document)
        {
            
            WorkbookPart workbookPart1 = document.AddWorkbookPart();
            GenerateWorkbookPart1Content(workbookPart1);

            WorksheetPart worksheetPart3 = workbookPart1.AddNewPart<WorksheetPart>("rId1");
            GenerateWorksheetPart3Content(worksheetPart3);

            WorkbookStylesPart workbookStylesPart1 = workbookPart1.AddNewPart<WorkbookStylesPart>("rId2");
            GenerateWorkbookStylesPart1Content(workbookStylesPart1);

        }
      

        // Generates content of workbookPart1.
        private void GenerateWorkbookPart1Content(WorkbookPart workbookPart1)
        {
            Workbook workbook1 = new Workbook();
            workbook1.AddNamespaceDeclaration("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
            
            
            
            Sheets sheets1 = new Sheets();
            Sheet sheet1 = new Sheet(){ Name = "Sheet1", SheetId = (UInt32Value)1U, Id = "rId1" };
            
            sheets1.Append(sheet1);
            
            workbook1.Append(sheets1);
            
            workbookPart1.Workbook = workbook1;
        }

        
        // Generates content of worksheetPart3.
        private void GenerateWorksheetPart3Content(WorksheetPart worksheetPart3)
        {
            Worksheet worksheet3 = new Worksheet();
            worksheet3.AddNamespaceDeclaration("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
            
            SheetData sheetData3 = new SheetData();

            Row row1 = new Row(){};

            Cell cell1 = new Cell() { };// StyleIndex = (UInt32Value)1U };
            CellValue cellValue1 = new CellValue();
            cell1.DataType = CellValues.String;
            cellValue1.Text = "A1";
            cell1.StyleIndex = 1U;
            cell1.Append(cellValue1);

            row1.Append(cell1);

            sheetData3.Append(row1);
            worksheet3.Append(sheetData3);
            
            worksheetPart3.Worksheet = worksheet3;
        }
        

        // Generates content of workbookStylesPart1.
        private void GenerateWorkbookStylesPart1Content(WorkbookStylesPart workbookStylesPart1)
        {
            Stylesheet stylesheet1 = new Stylesheet();

            Fonts fonts1 = new Fonts() { Count = (UInt32Value)2U };

            Font font1 = new Font();
            FontSize fontSize1 = new FontSize() { Val = 11D };
            Color color1 = new Color() { Theme = (UInt32Value)1U };
            FontName fontName1 = new FontName() { Val = "Calibri" };
            FontFamilyNumbering fontFamilyNumbering1 = new FontFamilyNumbering() { Val = 2 };
            FontScheme fontScheme1 = new FontScheme() { Val = FontSchemeValues.Minor };

            font1.Append(fontSize1);
            font1.Append(color1);
            font1.Append(fontName1);
            font1.Append(fontFamilyNumbering1);
            font1.Append(fontScheme1);

            Font font2 = new Font();
            FontSize fontSize2 = new FontSize() { Val = 11D };
            Color color2 = new Color() { Rgb = "FF006100" };
            FontName fontName2 = new FontName() { Val = "Calibri" };
            FontFamilyNumbering fontFamilyNumbering2 = new FontFamilyNumbering() { Val = 2 };
            FontScheme fontScheme2 = new FontScheme() { Val = FontSchemeValues.Minor };

            font2.Append(fontSize2);
            font2.Append(color2);
            font2.Append(fontName2);
            font2.Append(fontFamilyNumbering2);
            font2.Append(fontScheme2);

            fonts1.Append(font1);
            fonts1.Append(font2);

            Fills fills1 = new Fills() { Count = (UInt32Value)3U };

            Fill fill1 = new Fill();
            PatternFill patternFill1 = new PatternFill() { PatternType = PatternValues.None };

            fill1.Append(patternFill1);

            Fill fill2 = new Fill();
            PatternFill patternFill2 = new PatternFill() { PatternType = PatternValues.Gray125 };

            fill2.Append(patternFill2);

            Fill fill3 = new Fill();

            PatternFill patternFill3 = new PatternFill() { PatternType = PatternValues.Solid };
            ForegroundColor foregroundColor1 = new ForegroundColor() { Rgb = "FFFF0000" };            

            patternFill3.Append(foregroundColor1);

            fill3.Append(patternFill3);

            fills1.Append(fill1);
            fills1.Append(fill2);
            fills1.Append(fill3);

            Borders borders1 = new Borders() { Count = (UInt32Value)1U };

            Border border1 = new Border();
            LeftBorder leftBorder1 = new LeftBorder();
            RightBorder rightBorder1 = new RightBorder();
            TopBorder topBorder1 = new TopBorder();
            BottomBorder bottomBorder1 = new BottomBorder();
            DiagonalBorder diagonalBorder1 = new DiagonalBorder();

            border1.Append(leftBorder1);
            border1.Append(rightBorder1);
            border1.Append(topBorder1);
            border1.Append(bottomBorder1);
            border1.Append(diagonalBorder1);

            borders1.Append(border1);

            CellStyleFormats cellStyleFormats1 = new CellStyleFormats() { Count = (UInt32Value)2U };
            CellFormat cellFormat1 = new CellFormat() { NumberFormatId = (UInt32Value)0U, FontId = (UInt32Value)0U, FillId = (UInt32Value)0U, BorderId = (UInt32Value)0U };
            CellFormat cellFormat2 = new CellFormat() { NumberFormatId = (UInt32Value)0U, FontId = (UInt32Value)1U, FillId = (UInt32Value)2U, BorderId = (UInt32Value)0U, ApplyNumberFormat = false, ApplyBorder = false, ApplyAlignment = false, ApplyProtection = false };

            cellStyleFormats1.Append(cellFormat1);
            cellStyleFormats1.Append(cellFormat2);

            CellFormats cellFormats1 = new CellFormats() { Count = (UInt32Value)2U };
            CellFormat cellFormat3 = new CellFormat() { NumberFormatId = (UInt32Value)0U, FontId = (UInt32Value)0U, FillId = (UInt32Value)0U, BorderId = (UInt32Value)0U, FormatId = (UInt32Value)0U };
            CellFormat cellFormat4 = new CellFormat() { NumberFormatId = (UInt32Value)0U, FontId = (UInt32Value)1U, FillId = (UInt32Value)2U, BorderId = (UInt32Value)0U, FormatId = (UInt32Value)1U };

            cellFormats1.Append(cellFormat3);
            cellFormats1.Append(cellFormat4);

            CellStyles cellStyles1 = new CellStyles() { Count = (UInt32Value)2U };
            CellStyle cellStyle1 = new CellStyle() { Name = "Good", FormatId = (UInt32Value)1U, BuiltinId = (UInt32Value)26U };
            CellStyle cellStyle2 = new CellStyle() { Name = "Normal", FormatId = (UInt32Value)0U, BuiltinId = (UInt32Value)0U };

            cellStyles1.Append(cellStyle1);
            cellStyles1.Append(cellStyle2);
            DifferentialFormats differentialFormats1 = new DifferentialFormats() { Count = (UInt32Value)0U };
            TableStyles tableStyles1 = new TableStyles() { Count = (UInt32Value)0U, DefaultTableStyle = "TableStyleMedium9", DefaultPivotStyle = "PivotStyleLight16" };

            stylesheet1.Append(fonts1);
            stylesheet1.Append(fills1);
            stylesheet1.Append(borders1);
            stylesheet1.Append(cellStyleFormats1);
            stylesheet1.Append(cellFormats1);
            stylesheet1.Append(cellStyles1);
            stylesheet1.Append(differentialFormats1);
            stylesheet1.Append(tableStyles1);            
            workbookStylesPart1.Stylesheet = stylesheet1;
        }

Type Checking (Compiler Series Part VIII)

In the last post we finished adding the functions to create the symbol tables needed to start the type checking. With the symbol table functions completed it’s time to use them.

To type check a program we traverse the parse tree and check the type of each variable reference. Sounds simple and it is for our language, once we have the right framework setup.

Our framework consists of (1) creating the class symbol table to easily look up class definitions and (2) keeping track of the current context (to track the current context we use an association list called context). We also need helper functions to outsource some tasks such as checking that a class doesn’t declare duplicate variables.

On a high level we (1) create the parse tree, (2) create the class symbol table, and (3) call the type check function on the parse tree. I’ve listed the source code below.

main = do 
  inStr <- getContents
  let parseTree = newl (alexScanTokens2 inStr)  
  let defaultClasses = [("int", ClassSymbol "int" [] []), ("string", ClassSymbol "string" [] []), ("bool", ClassSymbol "bool" [] [])]
  let classes = defaultClasses ++ classSymbols parseTree
  let typeCheckingResult = typeCheck parseTree classes
  if typeCheckingResult == "Ok"
     then putStrLn "Semantic Analysis Results: Passed"
     else putStrLn ("Semantic Analysis Results: Failed, " ++ typeCheckingResult)
  print "done"
}

For type checking the parse tree we do the following:

  • Type check the main class which consists of type checking the single statement in
    public static void main(string[] params) { statement}

    Please note that the only context in the statement is the name of the argument parameters.

  • Type check each class in the program which consists of type checking the class variables and the class methods. For checking the methods we track which class the method is declared in via creating the context association list with the entry (“this”, className) where className is the name of the current class. I’ve listed the source code below.
    typeCheckClassDecl (ClassDecl className varDeclList methodDeclList) classes =
        if typeCheckVarDeclList varDeclList classes [("this", className)] == "Ok" 
        then typeCheckMethodDeclList methodDeclList classes [("this", className)]
        else "Fail2"
    

Doing the above type checking involves traversing down the parse tree and type checking more basic productions such as expressions. I’ll let you read the code for that since the code itself is reasonably self explanatory. The code listing is shown below. You can view the complete type checking program at github.

typeCheck (Program mainClass classDeclList) classes =  
 if typeCheckMainClass mainClass classes == "Ok" 
 then typeCheckClassDeclList classDeclList classes
 else "Fail1" ++ " " ++ (typeCheckMainClass mainClass classes)

typeCheckClassDeclList CEmpty classes = "Ok"
typeCheckClassDeclList (ClassDeclList classDecl classDeclList) classes = 
  if typeCheckClassDecl classDecl classes  == "Ok"
  then "Ok"
  else typeCheckClassDeclList classDeclList classes 

typeCheckClassDecl (ClassDecl className varDeclList methodDeclList) classes =
    if typeCheckVarDeclList varDeclList classes [("this", className)] == "Ok" 
    then typeCheckMethodDeclList methodDeclList classes [("this", className)]
    else "Fail2"

typeCheckVarDeclList VEmpty _ _ = "Ok" -- empty variable declaration list so automatically successful

typeCheckVarDeclList (VarDeclList theType ident varDeclList) classes context =
    if checkForDuplicateVarDeclarations (VarDeclList theType ident varDeclList) [] context == "Ok" && typeCheckVarDecl theType ident classes context == "Ok" 
    then typeCheckVarDeclList varDeclList classes context
    else "Fail3"

checkForDuplicateVarDeclarations VEmpty vars _ = "Ok"
checkForDuplicateVarDeclarations (VarDeclList theType ident varDeclList) [] context =
  checkForDuplicateVarDeclarations varDeclList [ident] context
checkForDuplicateVarDeclarartions (VarDeclList theType ident varDeclList) vars context =
  case elem ident vars of
    True -> error ("Double declaration of " ++ ident ++ " in " ++ (show (lookup "this" context)))
    False -> checkForDuplicateVarDeclarations varDeclList (ident : vars) context

getTypeName TypeBool = "bool"
getTypeName TypeInt = "int"
getTypeName TypeString = "string"
getTypeName (TypeIdent ident) = ident
getTypeName (TypeIdentArray ident) = ident


typeCheckVarDecl theType ident classes context = 
  case theType of
    TypeBool -> "Ok"
    TypeInt -> "Ok"
    TypeString -> "Ok"
    (TypeIdent typeName) -> case lookup typeName classes of
      Just classSym -> "Ok"
      Nothing -> error ("Unknown type " ++ typeName ++ " in the var declaration list of class " ++ (show (lookup "this" context)))
    (TypeIdentArray typeNameArray) -> let tName = reverse (drop 2 (reverse typeNameArray)) in -- drops the [] at the end of the typeNameArray.
      case lookup tName classes of
        Just classSym -> "Ok"
        Nothing -> error ("Unknown type " ++ typeNameArray ++ " in the var declaration list of class " ++ (show (lookup "this" context)))

typeCheckMethodDeclList MEmpty classes context = "Ok" -- no methods so automatically successful
typeCheckMethodDeclList (MethodDeclList methodDecl methodDeclList) classes context =
    if checkForDuplicateMethodDeclarations (MethodDeclList methodDecl methodDeclList) [] context == "Ok" && typeCheckMethodDecl methodDecl classes context == "Ok" 
    then typeCheckMethodDeclList methodDeclList classes context
    else "Fail4"

checkForDuplicateMethodDeclarations MEmpty _ _ = "Ok"
checkForDuplicateMethodDeclarations (MethodDeclList methodDecl methodDeclList) methods context =
  case elem (methodName methodDecl) methods of
    False -> checkForDuplicateMethodDeclarations methodDeclList ((methodName methodDecl) : methods) context
    True -> error("Duplicate method " ++ (methodName methodDecl))
  

typeCheckMethodDecl (MethodDecl theType methodName formalList varDeclList statementList exp) classes context =    
    let context2 = (getThisClassVariables classes context) ++ (getVarDeclListVariables varDeclList classes) ++ (getFormalListVariables formalList classes) ++ context
        typeName = getTypeName theType
    in
      if typeCheckStatementList statementList classes context2 == "Ok" 
      then 
        if typeName /=  (typeCheckExp exp classes context2)
        then error ("the type of method " ++ methodName ++ " does not match the return type of " ++ (show exp))
        else "Ok"
      else 
        "Fail5"          
                

getThisClassVariables classes context =
    case lookup "this" context of
      Nothing -> error("undeclared this in context: " ++ show context)
      Just thisTypeName -> cVarTypes thisTypeName classes

cVarTypes typeName classes = 
    case lookup typeName classes of
      Nothing -> error("undeclared type: " ++ typeName)
      Just classSym -> publicVariables classSym

getVarDeclListVariables VEmpty _ = [] -- empty variable declaration list

getVarDeclListVariables (VarDeclList theType ident varDeclList) classes = 
    let typeName = getTypeName theType 
    in 
      if lookup typeName classes == Nothing || checkForDuplicateVariableDecls varDeclList [(theType, ident)] /= "Ok"
      then error("unknown type " ++ typeName ++ " for variable " ++ ident)
      else (ident, getTypeName theType) : getVarDeclListVariables varDeclList classes

checkForDuplicateVariableDecls :: VarDeclList -> [(Type, Ident)] -> String
checkForDuplicateVariableDecls VEmpty _ = "Ok"
checkForDuplicateVariableDecls (VarDeclList theType ident varDeclList) varList =
  case elem (theType, ident) varList of
    False -> checkForDuplicateVariableDecls varDeclList ((theType, ident) : varList)
    True -> error("duplicate declaration of " ++ ident ++ " in " ++ show(varDeclList)) 



getFormalListVariables FEmpty classes = []
getFormalListVariables (FormalList theType ident formalList) classes = 
    let typeName = getTypeName theType
    in
      if lookup typeName classes == Nothing || checkForDuplicateFormalListVariables formalList [(theType, ident)] /= "Ok"
      then error("unknown type " ++ typeName ++ " for variable " ++ ident)
      else (ident, getTypeName theType) : getFormalListVariables formalList classes
    
checkForDuplicateFormalListVariables :: FormalList -> [(Type, Ident)] -> String
checkForDuplicateFormalListVariables FEmpty _ = "Ok"
checkForDuplicateFormalListVariables (FormalList theType ident formalList) fList = case elem (theType, ident) fList of
  False -> checkForDuplicateFormalListVariables formalList ([(theType, ident)] ++ fList)
  True -> error("duplicate declaration of " ++ ident ++ " in " ++ show(fList))
      

typeCheckMainClass (MClass className paramName statement) classes = if (lookup className classes == Nothing)
                                                                    then error("Error " ++ className ++ " is not a declared class")
                                                                    else typeCheckStatement statement classes [("this", className), (paramName, "string[]")]



typeCheckStatementList Empty classes context = "Ok"
typeCheckStatementList (StatementList statementList statement) classes context =
    if typeCheckStatement statement classes context == "Ok" && typeCheckStatementList statementList classes context == "Ok"
    then "Ok"
    else "Fail6"

typeCheckStatement (SList statementList) classes context = 
    typeCheckStatementList statementList classes context
typeCheckStatement (SIfElse exp1 statement1 statement2) classes context =
    if (typeCheckExp exp1 classes context) == "bool" && typeCheckStatement statement1 classes context == "Ok" && typeCheckStatement statement2 classes context == "Ok"
    then "Ok"
    else error ("Error in if else statement")

typeCheckStatement (SWhile exp statement) classes context = 
      if typeCheckExp exp classes context == "bool" 
      then typeCheckStatement statement classes context
      else error("Error type of " ++ show(exp) ++ " is not bool in while statement")


typeCheckStatement (SEqual ident exp1) classes context = 
    let identType = lookup ident context in
    case identType of 
         Nothing -> error("Error undeclared identifier " ++ ident ++ " in equal statement")
         Just iType -> if iType == typeCheckExp exp1 classes context
                     then "Ok"
                     else error("Error types do not match in equals statements, type1 " ++ iType ++ " type2 " ++ (typeCheckExp exp1 classes context))


typeCheckStatement (SPrint exp) classes context = if typeCheckExp exp classes context /= "" then "Ok" else "Fail"
typeCheckStatement (SArrayEqual ident exp1 exp2) classes context = 
  case lookup ident context of
    Nothing -> error("Error undeclared identifier " ++ ident ++ " in equal statement")
    Just iType -> if take 2 (reverse iType) /= "[]" then
                    error(ident ++ " is not an array")
                  else 
                    let baseTypeName = reverse (take 2 (reverse iType)) in
                    if baseTypeName /= typeCheckExp exp2 classes context || typeCheckExp exp1 classes context /= "bool" then
                      error("Error, can't assign to array")
                    else
                      "Ok"

-- some helper functions for typeCheckExp

expTypes ExpListEmpty classes context = []
expTypes (ExpList exp expRest) classes context = (typeCheckExp exp classes context) : expRestTypes expRest classes context
expTypes (ExpListExp exp) classes context = [typeCheckExp exp classes context]
    
expRestTypes (ExpRest exp) classes context = [typeCheckExp exp classes context]
        

checkFunctionCall (ClassSymbol cName vars []) methodName [] = 
    error (methodName ++ " is not a method of class " ++ cName)

checkFunctionCall (ClassSymbol cName var methods) methodName methodTypes = 
    let method = lookup methodName methods in
    case method of 
         Just theMethod -> checkMethodTypes theMethod methodTypes
         Nothing -> error (methodName ++ " is not a method of " ++ cName)

checkMethodTypes (MethodSymbol returnType name []) [] = returnType

checkMethodTypes (MethodSymbol returnType name []) methodTypes = error("method " ++ name ++ " doesn't take any arguments but arguments of type " ++ show(methodTypes) ++ " provided")

checkMethodTypes (MethodSymbol returnType name args) [] = error("method " ++ name ++ " takes arguments but no arguments provided")
 
checkMethodTypes (MethodSymbol returnType name ((argName, argType) : args)) (type1 : types)  = 
    if argType == type1 
    then checkMethodTypes (MethodSymbol returnType name args) types 
    else error ("method " ++ name ++ " argument type mismatch " ++ " expected type " ++ argType ++ " but got type " ++ type1)




-- the typeCheckExp function returns the type name of the expression
typeCheckExp (ExpOp exp1 char exp2) classes context = 
    if typeCheckExp exp1 classes context == "int" && typeCheckExp exp2 classes context == "int"
       then "int"
       else error ("one of the expressions exp1:" ++ show(exp1) ++ " exp2:" ++ show(exp2) ++ " is not an integer \n exp1 type: " ++ (typeCheckExp exp1 classes context) ++ "\n exp2 type: " ++ (typeCheckExp exp2 classes context))


typeCheckExp (ExpComOp exp1 char exp2) classes context = 
    if typeCheckExp exp1 classes context == "int" && typeCheckExp exp2 classes context == "int"
       then "bool"
       else error ("one of the expressions exp1:" ++ show(exp1) ++ " exp2:" ++ show(exp2) ++ " is not an integer \n exp1 type: " ++ (typeCheckExp exp1 classes context) ++ "\n exp2 type: " ++ (typeCheckExp exp2 classes context))


typeCheckExp (ExpArray exp1 exp2) classes context  =  -- "Exp [ Exp ]"
  if typeCheckExp exp2 classes context /= "int" then
    error("Error in ExpArray")
  else
    if take 2 (reverse (typeCheckExp exp1 classes context)) /= "[]" then
      error("Error in ExpArray")
    else
      reverse(drop 2 (reverse (typeCheckExp exp1 classes context)))
    
  
typeCheckExp (ExpFCall exp ident expList) classes context =   
      let className = typeCheckExp exp classes context -- Exp . Ident ( ExpList )
          classSym = lookup className classes
          expListTypes = expTypes expList classes context 
      in case classSym of
           Just x -> (checkFunctionCall x ident expListTypes)
           Nothing -> error ("Undeclared class " ++ className ++ " in function call")

typeCheckExp (ExpInt int) classes context = "int"

typeCheckExp (ExpNewIntArray exp) classes context = 
    if typeCheckExp exp classes context == "int"
       then "int[]"
       else error ("Error new int[exp] the expression type is not an integer")


typeCheckExp (ExpNewBoolArray exp) classes context = 
    if typeCheckExp exp classes context == "int"
       then "bool[]"
       else error ("Error new bool[exp] the expression type is not an integer")


typeCheckExp (ExpNewStringArray exp) classes context = 
    if typeCheckExp exp classes context == "int"
       then "string[]"
       else error ("Error new string[exp] the expression type is not an integer")

typeCheckExp (ExpBool bool) classes context  = "bool" -- True or False


typeCheckExp (ExpIdent ident) classes context = 
    case lookup ident context of
      Just x -> x
      Nothing -> error ("Error " ++ ident ++ " is not a declared variable, context " ++ show context)
                                                       
typeCheckExp (ExpNewIdent ident) classes context = 
    if lookup ident classes == Nothing
    then error ("Error " ++ ident ++ " is not a declared class" ++ ", context " ++ show context)
    else ident

typeCheckExp (ExpNewIdentArray ident exp) classes context = 
    if lookup ident classes == Nothing || typeCheckExp exp classes context /= "int"
    then error ("Error " ++ ident ++ " is not a declared class or " ++ show(exp) ++ " is not an int" ++ ", context " ++ show context)
    else ident ++ "[]"

typeCheckExp (ExpExp exp) classes context  = typeCheckExp exp classes context -- Exp ( Exp )

typeCheckExp (ExpThis) classes context =
    let thisSym = lookup "this" context in
    case thisSym of 
      Just sym -> sym
      Nothing -> error ("this symbol undeclared")

typeCheckExp (ExpNot exp) classes context = 
    if typeCheckExp exp classes context == "bool"
       then "bool"
       else error "wrong type for !exp, expecting bool type"

typeCheckExp (ExpLength exp) classes context =
    if typeCheckExp exp classes context == "int[]"
           then "int"
           else error "Error in " ++ show(exp) ++ ".length the expression is not of type int[] "


main = do 
  inStr <- getContents
  let parseTree = newl (alexScanTokens2 inStr)  
  putStrLn ("parseTree: " ++ show(parseTree))
  let defaultClasses = [("int", ClassSymbol "int" [] []), ("string", ClassSymbol "string" [] []), ("bool", ClassSymbol "bool" [] [])]
  let classes = defaultClasses ++ classSymbols parseTree
  putStrLn "classes " 
  print classes
  let typeCheckingResult = typeCheck parseTree classes
  if typeCheckingResult == "Ok"
     then putStrLn "Semantic Analysis Results: Passed"
     else putStrLn ("Semantic Analysis Results: Failed, " ++ typeCheckingResult)
  
  putStrLn ("parseTree: " ++ show(parseTree))
  print "done"
}

Cleaning up NewL and Adding a Symbol Table Part II (Compiler Series Part VII)

Type checking has several stages, I’m breaking it up into a few posts to cover it more fully. To type check a program we need to:

  • 1. Create the symbol table for the classes. This symbol table is an association list where the key is the class name and the value is a “ClassSymbol”.
  • 2. For each class create the symbol table for the class methods which is again an association list where the key is the method name and the value is a “MethodSymbol”
  • 3. For each class create the symbol table of the class instance variables, this symbol table is an association list where the key is the variable name and the value is the variable type name. For example a class with one instance variable named “count” of type “int” has [(“count”, “int”)] as the symbol table.
  • 4. Type check the statements and expressions within the methods. (future post)

In looking over our language NewL I realized that arrays and strings are not handled consistently. In particular a string was declared with a capital “S” in the “String” versus using all lowercase for “int” and “boolean”. I didn’t like this since it make the declaration syntax inconsistent for inbuilt types. The second issue is that it is only possible to declare arrays of integers. I want to make arrays an integral feature of the language. Arrays should be of any type. With those two changes in mind I’ve updated the grammar of NewL. The updated grammar is shown below. I’ve modified the lexer and parser where needed to reflect the update. You can get both here.

Terminal symbols are bolded and single character ones are in double quotes.

NewL Grammar

1. Program ::= MainClass { ClassDecl }
2. MainClass :== class Ident “{public static void main “(” string[] Ident “)” “{” Statement “}” “}
3. ClassDecl ::= class Ident “{” {VarDecl} {MethodDecl} “}
4. VarDecl ::= Type Ident “;
5. MethodDecl ::= public Type Ident “(” FormalList “)” “{” {VarDecl} {Statement} return Exp “;” “}
6. FormalList :== Type Ident { FormalRest } | ε
7. FormalRest :== “,” Type Ident
8. Type :== int | bool | int | string | Ident | IdentArray
9. Statement :== “{” { Statement } “}” | if “(“ Exp “)” Statement else Statement | while(” Exp “)” Statement | System.out.println(” Exp “)” “;” | Ident = Exp “;” | Ident “[” Exp “]” “=” Exp “;
10. Exp ::= Exp Op Exp | Exp “[” Exp “]” | Exp “.length | Exp “.” Ident “(” ExpList “)” |
Integer_Literal | String_Literal | true | false | Ident | this | new Ident “[” Exp “]” | new Ident “(” “)” | “!” Exp | “(” Exp “)
11. ExpList :== Exp { ExpRest } | ε
12. ExpRest ::= “,” Exp
13. Ident ::= Letter { Letter | Digit | “_” }
13. IdentArray ::= Letter { Letter | Digit | “_” }”[“”]
14. Integer_Literal ::= Digit { Digit }
15. String_Literal ::= “” { Character } “
16. Op ::= && | < | +| | *

Revised Hello World for NewL.

class HelloWorld {
    public static void main(string[] a) {
        System.out.println(new Hello().World());
    }
}
class Hello {
    public String World() {
        return "Hello World";
    }
}

With the revised NewL language. We can create the symbol table for the classes. Please note that each ClassSymbol structure itself contains a symbol table for its instance variables and its methods. The below code contains the functions for creating the symbol table of classes. There are several helper functions for creating the symbol table for the methods of each class and the symbol table of the instance variables of each class. Please refer back to the post on “Adding a Symbol Table” for the definition of ClassSymbol and MethodSymbol. With the symbol tables created we will be able to start the type checking.

classSymbols (Program mainClass classDeclList) = classSymbolMainClass mainClass : classSymbolsClassDeclList classDeclList

classSymbolMainClass (MClass className paramName statement) =
                      (className, (ClassSymbol className [] [
                                                            ("main", 
                                                                    (MethodSymbol {returnType = "void", name = "main", args = [("string[]", paramName)]})
                                                            )]
                                  )
                      )
classSymbolClassDecl (ClassDecl className parentClassName varDecls methodDecls) = (className, (ClassSymbol className (varSymbols varDecls) (methodSymbols methodDecls)))
classSymbolsClassDeclList (ClassDeclList classDecl classDeclList) = classSymbolClassDecl classDecl : classSymbolsClassDeclList classDeclList
classSymbolsClassDeclList (CEmpty) = []

varSymbols VEmpty = []
varSymbols (VarDeclList theType ident varDeclList) = varSymbol theType ident : varSymbols varDeclList

varSymbol (TypeString) ident = (ident, "string")
varSymbol (TypeBool) ident = (ident, "bool")
varSymbol (TypeInt) ident = (ident, "int")
varSymbol (TypeIdent identType) ident = (ident, identType)
varSymbol (TypeIdentArray identType) ident = (ident, identType)

methodSymbols MEmpty = []
methodSymbols (MethodDeclList methodDecl methodDeclList) = methodSymbol methodDecl : methodSymbols methodDeclList

methodSymbol (MethodDecl theType ident formalList varDeclList statementList exp)
             = case theType of
                    TypeInt -> (ident, MethodSymbol {returnType = "int", name = ident, args = (argSymbols formalList)})
                    TypeBool -> (ident, MethodSymbol {returnType = "bool", name = ident, args = (argSymbols formalList)})
                    TypeString -> (ident, MethodSymbol {returnType = "string", name = ident, args = (argSymbols formalList)})
                    TypeIdent classType -> (ident, MethodSymbol {returnType = classType, name = ident, args = (argSymbols formalList)})
                    TypeIdentArray classType -> (ident, MethodSymbol {returnType = classType, name = ident, args = (argSymbols formalList)})

argSymbols FEmpty = []
argSymbols (FormalList theType ident formalList) =
           case theType of
                TypeInt -> (ident, "int") : argSymbols formalList
                TypeBool -> (ident, "bool") : argSymbols formalList
                TypeString -> (ident, "string") : argSymbols formalList
                TypeIdent classType -> (ident, classType) : argSymbols formalList
                TypeIdentArray classType -> (ident, classType) : argSymbols formalList

You can download the source code here. Look in the Newl.y file for the parser with the symbol table added.

Adding a Symbol Table (Compiler Series Part VI)

Now that we have a lexer and parser it’s time to add a symbol table for doing type checking. Type checking/Semantic analysis is a huge subject so I’ll be splitting it into several posts.

Our compiler so far consists of the lexer and parser. For a complete compiler we need the following stages (the diagram is taken from figure 1.6 of the second edition of the Dragon Book [1]).

Compiler Stages

We need the beginnings of a symbol table to start the semantic analysis. The symbol table is what it says it is, a table of symbols. For example the following fragment of pseudo code would have the associated symbol tables

                               symbol_table1 = {}
int x;                         symbol_table1 = {"x" : int_type }
int y;                         symbol_table1 = {"x" : int_type, "y" : int_type}
{
    int z                      symbol_table2 = {"z": int_type}, symbol_table1 = {"x" : int_type, "y" : int_type
}

The above symbol tables only track the type of the variables. The code reveals two things, (1) it is necessary to have a stack of symbol tables to track the scope of a variable’s declaration and (2) to get the type of a variable a search of the symbol table stack must be done with the type determined by the first symbol table in the stack to contain a declaration of that variable (this is so that if a variable is shadowed the correct type is returned). Wikipedia has a good article on symbol tables here [2].

For the NewL compiler the construction of the symbol table and the type checking are very entwined. I determined what to put in the symbol table via what was required for type checking. I ended up with the following definition for two symbol types, one for methods and one for classes.

data ClassSymbol 
    = ClassSymbol 
      {
          className       :: String -- class name
        , publicVariables :: [(String, String)] -- (variable name, variable type name) list of variable
        , methods         :: [(String, MethodSymbol)] -- (method name, method symbol) list of methods
      }
      deriving (Eq)

instance Show ClassSymbol where 
    show (ClassSymbol cName vars methods) = show cName
    
data MethodSymbol = MethodSymbol {
      returnType :: String
    , name       :: String
    , args       :: [(String, String)] -- list of arguments
    } 
    deriving (Show, Eq)

The symbol table itself is an association list of variable names and their type names, for instance the following would be the symbol table upon entering the type checking for a class.

[("this", className)]

In my next post I’ll go into how to use the symbol table for some basic type checking.

References
[1] Compilers : Principles, Techniques, and Tools http://dragonbook.stanford.edu/
[2] Wikipedia : Symbol Table

Not Happy, Happy Documentation (Compiler Series Part V)

After reading about monads in chapter 14 of Real World Haskell I thought I understood them to some extant. Then I read the section of the Happy user guide on threaded lexers and I was still confused on how to create a threaded lexer.

After looking at the threaded lexers section and not understanding it, I looked at the Haskell code that Happy generates when it runs. It turns out that the regular parser generator without a threaded lexer passes the token that caused the parse error as the first token in the token list to the “parseError” function. They DO NOT mention this in the documentation.

To grab the token, line and, column# that caused the parse error use the following function

parseError :: [Token] -> a
parseError tokenList = let pos = tokenPosn(head(tokenList)) 
  in 
  error ("parse error at " ++ show (head(tokenList))++ show(getLineNum(pos)) ++ ":" ++ show(getColumnNum(pos)))

The key point is the call to “head(tokenList)” this grabs the first token in the tokenList which is the token that caused the parse error.

You can download the source code and check it out here. To test the error reporting run “make” and then

bbell@bbell-desktop:~/NewL-Compiler/parser_with_error_reporting$ cat ../test_files/test1.txt  | ./newl 
newl: parse error at line 3 and column 9