Introduction to LLVM Compiler Infrastructure
LLVM Overview
- LLVM is a compiler infrastructure that can be used with many different programming languages and target architectures.
- Extensive documentation may be found online: https://llvm.org/docs/.
- The key to LLVM is its Intermediate Representation (LLVM IR), which acts like a high-level, machine-independent assembly language.
LLVM IR: A Common Intermediate Representation
- A separate front-end is built for each programming language; it parses programs in that language and translates them to LLVM IR.
  - (Clang) C/C++ to LLVM IR
  - D to LLVM IR
  - Haskell to LLVM IR
  - Swift to LLVM IR
- A separate back-end for each target architecture translates the platform-independent IR into platform-specific assembly code.
  - LLVM IR to Intel Architecture 32-bit (x86)
  - LLVM IR to Intel/AMD 64-bit architecture (x86-64)
  - LLVM IR to PowerPC
  - LLVM IR to Cell Broadband Engine (PlayStation)
  - LLVM IR to ARM (mobile CPUs)
  - LLVM IR to Nvidia GPUs
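- A common IR reduces the problem of building \(m \times n\) separate compilers (one for each pairing of \(m\) languages with \(n\) targets) to building \(m + n\) components (\(m\) front-ends and \(n\) back-ends). For the four front-ends and six back-ends listed above, that is 10 components instead of 24 compilers.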
LLVM IR: Not Just Internal
- Many compiler systems have intermediate representations that are purely internal.
- LLVM IR, by contrast, has a defined printable syntax and semantics, as given by the LLVM Language Reference Manual.
- LLVM IR can actually be used to directly write programs!
Rich, Strongly-Typed IR
The first-class types represent values that can be stored in registers and computed by instructions.
- Integers of any width in bits.
  - i1: 1-bit integers, used for Boolean values.
  - i8, i16, i32, i64: common integer types
  - i128, i256: wide integers (using SIMD registers)
  - i43: 43-bit integers are syntactically valid, but not directly supported by typical back-ends.
- Floating point types
  - half: 16 bits
  - float: 32 bits
  - double: 64 bits
  - fp128: 128 bits
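- Pointer Types
  - i8*: byte pointers
  - double*: pointers to doubles
  - Note: LLVM 15 and later use a single opaque pointer type, ptr, by default; the typed forms shown here are the older syntax.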
- Vectors of Integers, Floats
  - abstraction of SIMD registers, e.g., 128-bit SSE2 registers on Intel
  - treated as first-class types
  - <16 x i8>: 16 8-bit integers
  - <8 x i16>: 8 16-bit integers
  - <2 x i64>
  - <4 x float>
  - <2 x i8*>
- More information: see the Type System section of the LLVM Language Reference Manual.
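As a rough illustration of what "any width" means for integer arithmetic, the C sketch below models addition at two widths. The function names add_i8 and add_i43 are hypothetical. The masking in add_i43 is only an assumption about how an odd width such as i43 could be emulated in a 64-bit register; it is not a description of the code any particular back-end emits.

```c
#include <stdint.h>

/* Models `add i8 %x, %y`: the result keeps only the low 8 bits. */
uint8_t add_i8(uint8_t x, uint8_t y) {
    /* The operands promote to int; the cast truncates back to 8 bits. */
    return (uint8_t)(x + y);
}

/* Hypothetical model of `add i43 %x, %y` on a 64-bit machine:
   compute in 64 bits, then keep only the low 43 bits. */
uint64_t add_i43(uint64_t x, uint64_t y) {
    const uint64_t mask43 = (UINT64_C(1) << 43) - 1;
    return (x + y) & mask43;
}
```

In both cases the sum wraps around at the width of the type, so adding 1 to the largest 43-bit value gives 0.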
Instructions
- The statements of the LLVM language are instructions operating on the first-class types.
- Most instructions are in 3-operand form: an operation on two register values produces a single result value.
    %result = add i32 %x, %y          ; add of 32-bit integers
    %result = add <4 x i32> %x, %y    ; SIMD add of 128-bit vectors
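The vector form of add can be approximated from C with the vector_size attribute, which is a GCC/Clang extension rather than standard C. The sketch below assumes such a compiler; the names v4i32, add_v4i32 and lane_v4i32 are hypothetical. A compiler that supports the extension would be expected to treat the + below as a single four-lane add, much like the <4 x i32> instruction above.

```c
#include <stdint.h>

/* Four 32-bit lanes in 16 bytes; a GCC/Clang extension, not standard C. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* One source-level add operates on all four lanes at once. */
v4i32 add_v4i32(v4i32 x, v4i32 y) {
    return x + y;
}

/* Reads a single lane (index 0..3) so results can be inspected. */
int32_t lane_v4i32(v4i32 v, int i) {
    return v[i];
}
```

For example, adding the vectors {1, 2, 3, 4} and {10, 20, 30, 40} yields lanes 11, 22, 33 and 44.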
- Boolean instructions produce an i1 value.
%flag = icmp ult i16 %a, %b
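LLVM integer types carry no sign of their own; the icmp predicate (ult for unsigned less-than, slt for signed less-than) decides how the bits are read. In C the same choice is made through the operand types, as this small sketch shows. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models `icmp ult i16 %a, %b`: the operands are read as unsigned. */
bool ult_i16(uint16_t a, uint16_t b) {
    return a < b;
}

/* Models `icmp slt i16 %a, %b`: same width, operands read as signed. */
bool slt_i16(int16_t a, int16_t b) {
    return a < b;
}
```

The 16-bit pattern 0xFFFF is 65535 when read as unsigned and -1 when read as signed, so 1 compares below it under ult but not under slt.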
- Control flow instructions are limited:
  - branch instructions to a single label
  - call instructions
  - ret instructions
  - conditional branch instructions to one of two labels depending on an i1 value
    Test:
      %cond = icmp eq i32 %a, %b
      br i1 %cond, label %IfEqual, label %IfUnequal
    IfEqual:
      ret i32 1
    IfUnequal:
      ret i32 0
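The conditional-branch example can be mirrored almost line for line in C by using goto, with each C label standing in for an IR basic-block label. This is only a sketch for reading the IR; the function name test_equal is hypothetical, and ordinary C code would simply use if/else.

```c
#include <stdbool.h>
#include <stdint.h>

int32_t test_equal(int32_t a, int32_t b) {
    bool cond = (a == b);   /* %cond = icmp eq i32 %a, %b */
    if (cond)               /* br i1 %cond, label %IfEqual, label %IfUnequal */
        goto IfEqual;
    else
        goto IfUnequal;
IfEqual:
    return 1;               /* ret i32 1 */
IfUnequal:
    return 0;               /* ret i32 0 */
}
```

Every block ends in exactly one control transfer (a branch or a return), which is the shape the IR requires of its basic blocks.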
- Well-formed LLVM IR is in static single assignment (SSA) form.
  - Each variable has exactly one assignment (definition) in the program text.
  - Phi nodes produce a single value from multiple paths: the value chosen depends on which predecessor block control arrived from.
    LoopHeader:
      ...
    Loop:       ; Infinite loop that counts from 0 on up...
      %loopvar = phi i32 [ 0, %LoopHeader ], [ %nextloopvar, %Loop ]
      %nextloopvar = add i32 %loopvar, 1
      br label %Loop
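C has no phi instruction because a C variable may be assigned more than once. One way to read the phi in the loop above is as a pair of assignments, one on each edge entering the Loop block. The sketch below follows that reading. The exit test against limit is a hypothetical addition (the original loop never terminates), and the name count_up is likewise an assumption.

```c
#include <stdint.h>

/* Hypothetical bounded variant of the loop: counts up from 0 and
   returns once the counter reaches `limit`. */
int32_t count_up(int32_t limit) {
    int32_t loopvar;
    int32_t nextloopvar;

    /* Edge from LoopHeader: the phi would select the constant 0. */
    loopvar = 0;
Loop:
    nextloopvar = loopvar + 1;      /* %nextloopvar = add i32 %loopvar, 1 */
    if (nextloopvar < limit) {
        /* Back edge from Loop: the phi would select %nextloopvar. */
        loopvar = nextloopvar;
        goto Loop;
    }
    return nextloopvar;
}
```

Here loopvar is assigned in two places, which SSA form forbids; the phi node is what lets the IR express the same merge while keeping a single definition of %loopvar.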
Updated Tue Dec. 17 2024, 05:25 by cameron.