
Introduction to LLVM

LLVM Overview

LLVM is a compiler infrastructure that can be used with many different programming languages and target architectures.

LLVM IR: A Common Intermediate Representation

  • A separate front-end is built for each programming language to parse programs in that language and translate them to LLVM IR.
    • (Clang) C/C++ to LLVM IR.
    • D to LLVM IR
    • Haskell to LLVM IR
    • Swift to LLVM IR
  • Separate back-ends for each target architecture translate the platform-independent IR into platform-specific assembly code.
    • LLVM IR to Intel Architecture 32-bit (x86)
    • LLVM IR to Intel/AMD 64-bit architecture (x86-64)
    • LLVM IR to PowerPC
    • LLVM IR to Cell Broadband Engine (PlayStation)
    • LLVM IR to ARM (mobile CPUs)
    • LLVM IR to Nvidia GPUs
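  • Reduces the problem of creating \(m \times n\) compilers (one for each pairing of \(m\) source languages with \(n\) targets) to creating \(m + n\) components (\(m\) front-ends, \(n\) back-ends). For the 4 front-ends and 6 back-ends listed above, that is 10 components instead of 24 compilers.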

LLVM IR: Not Just Internal

Rich, Strongly-Typed IR

The first-class types represent values that can be stored in registers and computed by instructions.

  • Integers of any width in bits.
    • i1 - 1 bit integers, used for Boolean values.
    • i8, i16, i32, i64 : common integer types
    • i128, i256: wide integers (the same widths as 128-bit and 256-bit SIMD registers)
    • i43: 43-bit integers are syntactically valid, but not directly supported by typical back-ends.
  • Floating point types
    • half: 16 bits
    • float: 32 bits
    • double: 64 bits
    • fp128: 128 bits
  • Pointer Types
    • i8*: byte pointers
    • double*: pointers to doubles
  • Vectors of Integers, Floats, Pointers
    • abstraction of SIMD registers, e.g., 128-bit SSE2 registers on Intel
    • treated as first class types
    • <16 x i8>: 16 8-bit integers
    • <8 x i16>: 8 16-bit integers
    • <2 x i64>: 2 64-bit integers
    • <4 x float>: 4 32-bit floats
    • <2 x i8*>: 2 byte pointers
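
As a rough illustration of how these types look from the source-language side, here is a minimal C sketch. It assumes a GCC- or Clang-compatible compiler, because vector_size is a compiler extension rather than standard C, and the names v4i32, add_v4i32 and add_i8 are hypothetical, chosen only for this example. The exact IR a front-end emits for this code may differ.

```c
#include <stdint.h>

/* Four 32-bit lanes packed into 16 bytes: a C-level analogue of <4 x i32>.
   vector_size is a GCC/Clang extension (assumption: such a compiler is used). */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Element-wise addition of two vectors, one lane at a time. */
v4i32 add_v4i32(v4i32 x, v4i32 y) {
    return x + y;
}

/* 8-bit addition in the style of i8: the result keeps only the low 8 bits,
   so it wraps around modulo 256. */
uint8_t add_i8(uint8_t x, uint8_t y) {
    return (uint8_t)(x + y);
}
```

The vector typedef is sized in bytes, so 16 bytes holds 4 lanes of int32_t, matching the 128-bit SIMD registers mentioned above.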

Instructions

  • The statements of the LLVM language are instructions operating on the first-class types.
  • Most instructions are in 3-operand form: an operation on two register values produces a single result value.
    • %result = add i32 %x, %y ... add of 32-bit integers
    • %result = add <4 x i32> %x, %y ... SIMD add of 128-bit vectors
  • Boolean instructions produce an i1 value.
    • %flag = icmp ult i16 %a, %b
  • Control flow instructions are limited:
    • branch instructions to a single label
    • call instructions
    • ret instructions
    • conditional branch instructions to one of two labels depending on an i1 value
Test:
  %cond = icmp eq i32 %a, %b
  br i1 %cond, label %IfEqual, label %IfUnequal
IfEqual:
  ret i32 1
IfUnequal:
  ret i32 0
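
For comparison, the following C sketch shows source code that has the same shape as the Test/IfEqual/IfUnequal blocks, plus a pair of comparisons illustrating why icmp carries a predicate such as ult. LLVM integer types do not record signedness, so the predicate decides whether the bits are compared as unsigned (ult) or signed (slt). The function names are hypothetical, and this is not a claim about the exact IR any particular compiler produces.

```c
#include <stdint.h>

/* Source-level counterpart of the three blocks above. */
int32_t test_equal(int32_t a, int32_t b) {
    if (a == b) {   /* %cond = icmp eq i32 %a, %b, then the conditional br */
        return 1;   /* IfEqual:   ret i32 1 */
    }
    return 0;       /* IfUnequal: ret i32 0 */
}

/* Unsigned 16-bit "less than": the comparison that icmp ult i16 expresses. */
int less_unsigned16(uint16_t a, uint16_t b) {
    return a < b;
}

/* Signed 16-bit "less than": the comparison that icmp slt i16 expresses. */
int less_signed16(int16_t a, int16_t b) {
    return a < b;
}
```

The bit pattern 0xFFFF is 65535 when read as unsigned and -1 when read as signed, so the two comparisons against 1 give opposite answers.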
  • Well-formed LLVM IR is in static single assignment (SSA) form.
    • Each variable has exactly one assignment (definition) in the program text.
    • Phi nodes produce a single value from multiple paths: the value chosen depends on which predecessor block control arrived from.
LoopHeader: ...
Loop:       ; Infinite loop that counts from 0 on up...
  %loopvar = phi i32 [ 0, %LoopHeader ], [ %nextloopvar, %Loop ]
  %nextloopvar = add i32 %loopvar, 1
  br label %Loop
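
To relate the phi node to ordinary source code, here is a small C sketch of the same counting loop. C lets a variable be reassigned, which SSA form does not, so the comments mark which assignment corresponds to which incoming value of the phi. The function name count_up and the limit parameter are assumptions added for this example so that the loop terminates and can be run; the loop above is infinite.

```c
#include <stdint.h>

/* Hypothetical helper: counts up from 0 like the loop above, but stops
   once the counter reaches `limit` (an added assumption). */
uint32_t count_up(uint32_t limit) {
    uint32_t loopvar = 0;                    /* phi input [ 0, %LoopHeader ] */
    while (loopvar != limit) {
        uint32_t nextloopvar = loopvar + 1;  /* %nextloopvar = add i32 %loopvar, 1 */
        loopvar = nextloopvar;               /* phi input [ %nextloopvar, %Loop ] */
    }
    return loopvar;
}
```

In the IR, the two assignments to loopvar disappear: %loopvar is defined once, by the phi, which takes 0 on entry from LoopHeader and %nextloopvar on every trip around Loop.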
Updated Fri Jan. 05 2018, 08:04 by cameron.