The Basics of LLVM (Compiler Series Part X)

For the first pass at using LLVM for code generation we’re only generating code for simple expression of the form “1 + 2” etc. This frees us from having to consider many many issues such as how to deal with the memory layout of objects. In future passes we relax the conditions on code generation.

I advise you to read the LLVM Haskell documentation at http://hackage.haskell.org/package/llvm (it’s under the modules section) to familiarize yourself with the API. There are two parts to the LLVM Haskell bindings, a low level interface to the C API, and a high level wrapper. I tried using the high level wrapper but the type declarations confused me. I’m not that fluent in Haskell and the high level wrapper makes extensive use of type classes, monads, and a few other constructs. I decided to use the low level wrapper (Core/Util.hs) around the C API, which uses less of Haskell’s advanced type features.

With the above said, I want to dive in and create some LLVM IR code. A simple NewL program is:

class Main {
    public static void main(string[] args) {
        System.out.println("");
    }
}
class Foo 
{
   public int foo()
   { 
      return 1 + 2;
   }
}

For now I’m ignoring the “Main” class and concentrating only on:

   public int foo()
   { 
      return 1 + 2;
   }

Our plan is to traverse the parse tree and generate code at each node. The parse tree for “foo” is shown below:

To use LLVM to generate code we need to know some LLVM concepts. To use LLVM we need to create a module by calling “Util.createModule.” Util is a wrapper that provides some syntactic sugar for calling the C API of LLVM. One note is that almost any call to the LLVM returns “IO returnType” that is all the return types are in the IO monad. The reason is that most LLVM calls are not pure functions.

To create function a function LLVM two things are needed the module, m, that the function will reside in and the function type definition. The below code creates the function type defintion.

getLLVMType :: TypeNames.Type -> FFI.TypeRef
getLLVMType TypeInt = FFI.int64Type
getLLVMType TypeBool = FFI.int1Type
getLLVMType _ = error("Unimplemented type")

getArgTypes :: FormalList -> [FFI.TypeRef]
getArgTypes FEmpty = []
getArgTypes (FormalList theType ident FEmpty) = [(getLLVMType theType)]
getArgTypes (FormalList theType ident rest) =
    [(getLLVMType theType)] ++ (getArgTypes rest) 

methodType = Util.functionType False (getLLVMType theType) (getArgTypes formalList)

With “methodType” defined we add the function (which is so far empty) to the module, m, by calling

  method <- Util.addFunction m FFI.ExternalLinkage methodName methodType

We now have an empty function called “method.” We create a builder to emit instructions and add a basic block to the method using the below code:

  bld <- U.createBuilder
  entry <- U.appendBasicBlock method "entry"
  U.positionAtEnd bld entry

With the basic block added it is time to emit the code to compute the expression “return 1+2;” We do this with the following code.

  withBuilder bld $ \bldRef ->
      do
        expVal <- codeGenExp exp bld context
        FFI.buildRet bldRef expVal 

The call to “withBuilder” takes some explaining. It takes a “Builder” object and a function of type “BuilderRef -> IO a” and returns “IO a” where “a” is any type. We need “withBuilder” because the function “FFI.buildRet” has type “BuilderRef -> ValueRef -> IO ValueRef.” That is we can’t pass a “Builder” object to “FFI.buildRet” directly we have to pass a “BuilderRef” to it.

If you’re paying attention you notice that only the line “FFI.buildRet bldRef expVal ” actually emits any code. Emitting the code to generate the value of the expression “exp” is left the to the “codeGenExp” function. That code is not interesting and you can view it in the code listing at the end of this post. You are probably interested in what kind of LLVM IR code we generate from the function. See the below listing for what it looks like.

define i64 @foo()
{
entry:
  ret i64 3
}

The “i64” symbol means that the function returns a 64 bit integer. And “ret i64 3” means return “3” as a 64 bit integer.

In my next post I’ll continue explore using LLVM to generate code. Specifically I want to show how to represent strings and other composite objects in LLVM.

The code listing for the entire code generation module is shown below. If you want to compile the code check it out of the github repository at http://github.com/bjwbell/NewL-Compiler and read the README file in the llvm_code_generation1 directory.

module CodeGen where
import TypeNames
import Data.Word
import Data.Int(Int32)
import Data.Typeable as T
import Foreign.C.String as CS
import System.IO.Unsafe
import LLVM.FFI.Core as FFI
import LLVM.Core.Util as U

codeGen :: Program -> IO String
codeGen (Program mainClass classDeclList) = codeGenClassDeclList classDeclList

codeGenClassDeclList :: ClassDeclList -> IO String
codeGenClassDeclList CEmpty = return "Ok"
codeGenClassDeclList (ClassDeclList classDecl classDeclList)  = 
    do
      result <- codeGenClassDecl classDecl
      if result /= "Ok"
        then return "fail: codeGenClassDeclList"
        else codeGenClassDeclList classDeclList
      return "Ok"

codeGenClassDecl :: ClassDecl -> IO String
codeGenClassDecl (ClassDecl className varDeclList methodDeclList) = 
    do
      m <- U.createModule className
      codeGenMethodDeclList methodDeclList m [("this", className)]

codeGenMethodDeclList :: MethodDeclList -> U.Module -> [(String, String)] -> IO String 
codeGenMethodDeclList MEmpty m context = return "Ok" -- no methods so automatically successful
codeGenMethodDeclList (MethodDeclList methodDecl methodDeclList) m context =
    do
      result <- codeGenMethodDecl methodDecl m context
      if result /= "Ok"
        then return "fail: codeGenMethodDeclList"
        else codeGenMethodDeclList methodDeclList m context


getLLVMType :: TypeNames.Type -> FFI.TypeRef
getLLVMType TypeInt = FFI.int64Type
getLLVMType TypeBool = int1Type
getLLVMType _ = error("Unimplemented type")

getArgTypes :: FormalList -> [FFI.TypeRef]
getArgTypes FEmpty = []
getArgTypes (FormalList theType ident FEmpty) = [(getLLVMType theType)]
getArgTypes (FormalList theType ident rest) =
    [(getLLVMType theType)] ++ (getArgTypes rest) 

codeGenMethodDecl :: MethodDecl -> U.Module -> [(String, String)] -> IO String
codeGenMethodDecl (MethodDecl theType methodName formalList varDeclList statementList exp) m context = do
  let methodType = U.functionType False (getLLVMType theType) (getArgTypes formalList)
  method <- U.addFunction m FFI.ExternalLinkage methodName methodType
  bld <- U.createBuilder
  entry <- U.appendBasicBlock method "entry"
  U.positionAtEnd bld entry
  withBuilder bld $ \bldRef ->
      do
        expVal <- codeGenExp exp bld context
        FFI.buildRet bldRef expVal 
  FFI.dumpValue method
  return "Ok"

codeGenExp :: Exp -> U.Builder -> [(String, String)] -> IO FFI.ValueRef
codeGenExp (ExpOp exp1 char exp2) bld context = 
    do
      val1 <- codeGenExp exp1 bld context
      val2 <- codeGenExp exp2 bld context
      case char of 
        '+' -> U.withBuilder bld $ \ bldRef ->
               withCString "" $ \ cString -> buildAdd bldRef val1 val2 cString
        '-' -> U.withBuilder bld $ \ bldRef ->
               withCString "" $ \ cString -> buildAdd bldRef val1 val2 cString
        '*' -> U.withBuilder bld $ \ bldRef ->
               withCString "" $ \ cString -> buildAdd bldRef val1 val2 cString
        '/' -> U.withBuilder bld $ \ bldRef ->
               withCString "" $ \ cString -> buildAdd bldRef val1 val2 cString
        _ -> error ("Unrecognized operator, " ++ (show char) ++ " in ExpOp")

codeGenExp (ExpInt value) bld context = 
    do
      return (constInt int64Type (fromIntegral value) (fromIntegral 64)) 
codeGenExp (ExpBool True) bld context = 
    do
      return (constInt int1Type (fromIntegral 1) (fromIntegral 1)) 
codeGenExp (ExpBool False) bld context = 
    do
      return (constInt int1Type (fromIntegral 0) (fromIntegral 1)) 
Advertisements

6 thoughts on “The Basics of LLVM (Compiler Series Part X)

  1. Pingback: Writing A Compiler In Haskell (Compiler Series Part I) « Math and More

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s