Introduction to LLVM
LLVM Notes and Resources
This is the CMPT 489/886 page for LLVM (January 2016).
LLVM is a compiler infrastructure that can be used with many different programming languages and target architectures.
LLVM IR: A Common Intermediate Representation
- A separate front-end is built for each programming language to parse programs in that language and translate them to LLVM IR.
  - C/C++ to LLVM IR (Clang)
  - D to LLVM IR
  - Haskell to LLVM IR
  - Swift to LLVM IR
- Separate back-ends for each target architecture translate the platform-independent IR into platform-specific assembly code.
  - LLVM IR to Intel Architecture 32-bit (x86)
  - LLVM IR to Intel/AMD 64-bit architecture (x86-64)
  - LLVM IR to PowerPC
  - LLVM IR to Cell Broadband Engine (PlayStation 3)
  - LLVM IR to ARM (mobile CPUs)
  - LLVM IR to Nvidia GPUs
- A common IR reduces the problem of building \(m \times n\) compilers (one for each pairing of \(m\) languages with \(n\) targets) to building \(m + n\) components (\(m\) front-ends and \(n\) back-ends).
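As a quick check of this arithmetic, the sketch below counts the translators needed for the front-ends and back-ends listed above. The lists and variable names are illustrative only and are not part of LLVM.

```python
# Front-ends and back-ends taken from the lists above (illustrative only).
front_ends = ["C/C++ (Clang)", "D", "Haskell", "Swift"]
back_ends = ["x86 (32-bit)", "Intel/AMD 64-bit", "PowerPC", "Cell", "ARM", "Nvidia GPU"]

m, n = len(front_ends), len(back_ends)

# Without a shared IR: one complete compiler per (language, target) pair.
direct_compilers = m * n

# With a shared IR: one front-end per language plus one back-end per target.
ir_components = m + n

print(f"{m} languages x {n} targets: {direct_compilers} compilers without a common IR")
print(f"{m} front-ends + {n} back-ends: {ir_components} components with LLVM IR")
```

For these 4 languages and 6 targets, the count drops from 24 compilers to 10 components, and the gap widens as either list grows.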
LLVM IR: Not Just Internal
- Many compiler systems have intermediate representations that are used only internally.
- LLVM IR has a defined printable syntax and semantics as given by the LLVM Language Reference Manual.
- LLVM IR can actually be used to directly write programs!
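To make this concrete, here is a minimal sketch of a hand-written LLVM IR module, wrapped in a short Python script that saves it to a file. The file name answer.ll and the helper write_module are hypothetical choices for this example. The IR itself defines a main function that adds two i32 constants and returns the result.

```python
from pathlib import Path

# A complete, hand-written LLVM IR module: main returns 40 + 2.
IR_TEXT = """\
define i32 @main() {
entry:
  %sum = add i32 40, 2
  ret i32 %sum
}
"""


def write_module(path="answer.ll"):
    """Save the IR text to a file (the file name is an arbitrary choice)."""
    Path(path).write_text(IR_TEXT)
    return path


if __name__ == "__main__":
    print("wrote", write_module())
```

Assuming an LLVM installation is on the path, the saved file can be run with the lli interpreter (lli answer.ll) or compiled with Clang (clang answer.ll -o answer). The value returned from main should then appear as the program's exit status.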
Rich, Strongly-Typed IR
The first-class types represent values that can be stored in registers and computed by instructions.
- Integers of any width in bits.
  - i1: 1-bit integers, used for Boolean values.
  - i8, i16, i32, i64: common integer types
  - i128, i256: wide integers using SIMD registers
  - i43: 43-bit integers are syntactically valid, but not directly supported by typical back-ends.
- Floating point types
  - half: 16 bits
  - float: 32 bits
  - double: 64 bits
  - fp128: 128 bits
- Pointer Types
  - i8*: byte pointers
  - double*
- Vectors of Integers, Floats
  - abstraction of SIMD registers, e.g., 128-bit SSE2 registers on Intel
  - treated as first class types
  - <16 x i8>: 16 8-bit integers
  - <8 x i16>: 8 16-bit integers
  - <2 x i64>
  - <4 x float>
  - <2 x i8*>
- More info on the LLVM Type System
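The vector types above line up with 128-bit SIMD registers because the element count times the element width comes to 128 bits. The sketch below checks this for the integer and floating-point examples. The helpers scalar_bits and vector_bits are hypothetical names written for this note, not LLVM APIs, and pointer vectors such as <2 x i8*> are left out because pointer width depends on the target.

```python
import re

# Bit widths of the floating-point types named in the list above.
FLOAT_BITS = {"half": 16, "float": 32, "double": 64, "fp128": 128}


def scalar_bits(type_name):
    """Width in bits of an integer (iN) or floating-point type name."""
    if re.fullmatch(r"i[1-9][0-9]*", type_name):
        return int(type_name[1:])
    return FLOAT_BITS[type_name]


def vector_bits(vector_type):
    """Total width in bits of a vector type written as '<N x T>'."""
    match = re.fullmatch(r"<\s*(\d+)\s*x\s*(\w+)\s*>", vector_type)
    if match is None:
        raise ValueError(f"not a supported vector type: {vector_type!r}")
    count, element = int(match.group(1)), match.group(2)
    return count * scalar_bits(element)


for ty in ["<16 x i8>", "<8 x i16>", "<2 x i64>", "<4 x float>"]:
    print(ty, "->", vector_bits(ty), "bits")
```

Each of the four examples comes to 128 bits, which is why they map naturally onto a single SSE2 register.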
Instructions
- The statements of the LLVM language are instructions operating on the first-class types.
- Most instructions are in 3-operand form: an operation on two register values produces a single result value.
- Control flow instructions are limited:
  - branch instructions to a single label
  - conditional branch instructions to one of two labels depending on an i1 value
  - indirect branch instructions through address calculation
  - call instructions
- Well-formed LLVM IR is in static single assignment (SSA) form.
  - There is only one assignment (definition) of any given variable.
  - Phi-nodes are used to produce a single value from multiple paths.
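The ideas in this section (3-operand instructions, branches between labelled blocks, single definitions, and phi-nodes) can be illustrated with a toy interpreter. This is only a sketch: the tuple encoding, the opcode names, and the functions defined_names and run are invented for this note and do not mirror LLVM's real data structures. The sample program computes the absolute difference of a and b.

```python
# A toy model of SSA-form code, not LLVM's real data structures.
BLOCKS = {
    "entry": [
        ("icmp_sgt", "cond", "a", "b"),       # cond = a > b (an i1 value)
        ("br_cond", "cond", "then", "else"),  # branch to one of two labels
    ],
    "then": [
        ("sub", "x1", "a", "b"),              # 3-operand form: x1 = a - b
        ("br", "join"),                       # branch to a single label
    ],
    "else": [
        ("sub", "x2", "b", "a"),              # x2 = b - a
        ("br", "join"),
    ],
    "join": [
        ("phi", "x", {"then": "x1", "else": "x2"}),  # merge the two paths
        ("ret", "x"),
    ],
}

DEFINING_OPS = {"icmp_sgt", "sub", "phi"}


def defined_names(blocks):
    """List every name defined by an instruction, in program order."""
    return [inst[1] for insts in blocks.values()
            for inst in insts if inst[0] in DEFINING_OPS]


def run(blocks, args, entry="entry"):
    """Interpret the toy program; a phi picks its value by predecessor block."""
    env = dict(args)
    prev, cur = None, entry
    while True:
        for inst in blocks[cur]:
            op = inst[0]
            if op == "icmp_sgt":
                env[inst[1]] = env[inst[2]] > env[inst[3]]
            elif op == "sub":
                env[inst[1]] = env[inst[2]] - env[inst[3]]
            elif op == "phi":
                env[inst[1]] = env[inst[2][prev]]
            elif op == "br":
                prev, cur = cur, inst[1]
                break
            elif op == "br_cond":
                target = inst[2] if env[inst[1]] else inst[3]
                prev, cur = cur, target
                break
            elif op == "ret":
                return env[inst[1]]
        else:
            raise ValueError(f"block {cur!r} has no terminator")


names = defined_names(BLOCKS)
assert len(names) == len(set(names))  # SSA: each name is defined exactly once
print(run(BLOCKS, {"a": 7, "b": 3}))
print(run(BLOCKS, {"a": 3, "b": 7}))
```

Because x1 and x2 are each defined once on separate paths, the phi in the join block is what lets the result x also have a single definition. The interpreter resolves it by remembering which block control came from.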
IR Representations of High-Level Languages
Writing Passes
Building
Updated Mon Feb. 15 2016, 13:15 by cameron.