CMPT 479/980 Compiler Technology - SIMD Parallel Processing
Course Synopsis
The most significant change to modern instruction set architectures is the deployment of additional single-instruction, multiple-data (SIMD) parallel programming capabilities. For example, recent Intel chips incorporate AVX-512 technology for processing 512 bits of data at a time, arranged as 8 64-bit doubles or long long integers, 16 32-bit floats or integers, 32 16-bit integers, 64 bytes, or 512 individual bits. General-purpose programming on GPUs can similarly exploit the SIMT (single-instruction, multiple-thread) model. Parallel programming using such capabilities can dramatically increase software performance, but at a significant cost in programmer productivity. To increase both performance and productivity, new SIMD and SIMT programming facilities are needed to automate the generation of high-quality software. In this course, students will study the development of new parallel programming facilities implemented using the LLVM compiler infrastructure as well as the Parabix parallel bit stream technology developed at Simon Fraser University. Students will implement at least one significant compiler component using the LLVM and/or Parabix framework.
Course Structure
CMPT 479/980 is a combined undergraduate/graduate research seminar course. Grading is based on in-class participation including quizzes and exercises (50%) and a final project (50%). Graduate student projects have higher expectations, including both a 30-minute in-class presentation and a formal literature review as part of their final project submission.
Contact Info
- Professor
- Rob Cameron
- email @sfu.ca: cameron
- Teaching Assistant
- Nigel Medforth
- email @sfu.ca: nmedfort
- Zoom Office Hours
- Sunday 4:00pm-5:30pm
- Tuesday 4:00pm-5:30pm
- Thursday 4:00pm-5:30pm
- https://sfu.zoom.us/s/3114144524
Readings and Weekly Quizzes
- There will be a lightweight quiz each week based on the assigned readings. Your best grades on these quizzes (after dropping the two lowest) count for 40% of your class participation mark (that is, 20% of the total mark).
SIMD Resources
Several reading assignments will be given based on the following text:
Christopher J. Hughes, "Single-Instruction Multiple-Data Execution," in Margaret Martonosi (ed.), Synthesis Lectures on Computer Architecture, Springer, 2015. Available online through the SFU Library.
SIMD Instruction Reference for Intel: Intel Intrinsics Guide
LLVM Resources
- LLVM Documentation
- LLVM for Grad Students
- Yosi Ben Asher and Nadav Rotem, "Hybrid Type Legalization for a Sparse SIMD Instruction Set," ACM Transactions on Architecture and Code Optimization, 10(3), September 2013, Article 11. Available online through the SFU Library.
- Implementing a JIT Compiled Language with Haskell and LLVM
Reading/Quiz Schedule
- Q1, Hughes, Chapter 2 up to and including section 2.3 - Quiz Monday January 13 2025
- Q2, Sampson, LLVM for Grad Students - Quiz Monday January 20 2025
- Q3, Hughes, Chapter 2 sections 2.4 and 2.5 - Quiz Monday January 27 2025
- Q4, Hughes, Chapter 5 Horizontal Operations, up to and including section 5.3 plus notes on Horizontal SIMD - Quiz Monday February 03 2025
- Q5, Hughes, Chapter 3 Computation and Control Flow - Quiz Monday February 10 2025
- Q6, Review Quiz - Quizzes 1 through 5 Monday February 24 2025
- Q7, Hughes, Chapter 4 Memory Operations - Quiz Monday March 03 2025
- Q8, Notes on SIMD Type Systems and SWAR - Quiz Monday March 10 2025
- Q9, Parabix Notes, Assignments 1 and 2 - Quiz Monday March 17 2025
- Q10, Review Quiz - Quizzes 1 through 5, 7 through 9 Monday March 24 2025
Assignments
- HexLines Assignment - due Sunday January 26 2025, 19:00
- Multiblock Kernel Assignment - due Wednesday March 12 2025, 22:00
Course Project Overview
Project Definition
All students are expected to participate in a course project involving SIMD Compiler technology within an existing open source framework. Students will be expected to make a significant, high-quality source-code contribution to the overall framework together with a set of test cases that demonstrate the contribution.
Possible projects include:
- Target-independent SIMD abstractions implemented as LLVM IR transformation passes (LLVM framework).
- Target-dependent SIMD code-generation for a particular back-end instruction set architecture (LLVM framework).
- Contributions or novel applications of the Parabix-LLVM framework based on bitwise data parallelism.
- Contributions to the Intel SPMD Compiler.
- Specific Project Ideas
- Byte-Oriented SpreadByMask and FilterByMask
- Project: Transliterator Compiler
- UTF Compiler
- Parabix on NVPTX
- Parabix on ARM
Project Requirements
- Project Proposal and Updates
- Groups present updates on their projects
- Prepare, present and submit a 2-page PDF.
- Grades based on participation.
- Final Project Group Report (Wednesday April 09 2025)
- Introduction
- Design
- Implementation Details
- Implementation Evaluation
- Conclusion: Lessons Learned and Further Work
- Appendixes: Source Code, Scripts, Sample Data
- Marking Rubric for Final Report
- Individual Write-up (Sunday April 13 2025)
- Description of Role in Overall Project
- Graduate students: Related Literature and Research Contribution
- Marking Rubric
- Leadership (4 pts) 4 = Strong group leadership
- Contribution (4 pts) 4 = Strong overall contribution
- Insight (2 pts) 2 = Insightful individual comments
- Final Group Poster Presentation (Monday April 07 2025, 17:30)
- 5-8 page PDF presentation/slide show.
- Poster Review (Monday April 07 2025, 20:30)
- Your individual comments/feedback on each project/poster.
AVX-512 Machines
Course Notes
- May 8
- May 15
- May 22
- May 29
- June 5
- IDISA: Inductive Doubling Instruction Set Architecture
- Defines vertical and horizontal SIMD operations for power-of-2 field widths.
- A generic IDISA builder implements operations independent of any particular SIMD instructions. idisa_builder.cpp
- Specific builders using SSE, SSE2, and SSSE3 instructions override generic implementations. idisa_sse_builder.cpp
- Other builders to take advantage of AVX2 and AVX-512 instructions are also defined.
- NVPTX and ARM builders are also defined, but have not been maintained.
- Transducer Example: CSV to JSON
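The generic-versus-specific builder split described above can be sketched in plain C++. This is only an illustration, not the Parabix API: the names `Word`, `high_bits`, `GenericBuilder`, `NativeBuilder` and `simd_add` are hypothetical, and a single 64-bit word stands in for a SIMD register. The generic builder implements a vertical add for any power-of-2 field width using only whole-word operations; the derived builder overrides the one width for which the (assumed) target has a direct instruction.

```cpp
#include <cstdint>

// Hypothetical stand-in for a SIMD register: one 64-bit word viewed as
// 64/fw fields of fw bits each, where fw is a power of 2 (1, 2, ..., 64).
using Word = std::uint64_t;

// Build a mask with the high bit of every fw-bit field set.
inline Word high_bits(unsigned fw) {
    const Word field_high = Word{1} << (fw - 1);
    Word mask = 0;
    for (unsigned pos = 0; pos < 64; pos += fw) {
        mask |= field_high << pos;
    }
    return mask;
}

// Generic builder: works for every field width without relying on any
// width-specific instruction.
class GenericBuilder {
public:
    virtual ~GenericBuilder() = default;

    // Vertical add: each fw-bit field of a is added to the matching field
    // of b, wrapping around within the field.
    virtual Word simd_add(unsigned fw, Word a, Word b) const {
        const Word H = high_bits(fw);
        // Add the fields with their high bits cleared, so a carry can reach
        // a field's own high bit but never the neighbouring field.
        const Word low = (a & ~H) + (b & ~H);
        // Fold the original high bits back in with XOR (carry-free add).
        return low ^ ((a ^ b) & H);
    }
};

// Target-specific builder: overrides only what the target does better and
// falls back to the generic implementation otherwise.
class NativeBuilder : public GenericBuilder {
public:
    Word simd_add(unsigned fw, Word a, Word b) const override {
        if (fw == 64) {
            return a + b;  // stand-in for a native full-width add
        }
        return GenericBuilder::simd_add(fw, a, b);
    }
};
```

With 8-bit fields, `GenericBuilder().simd_add(8, 0x01FF, 0x0101)` wraps the low byte and yields `0x0200`, while the same inputs with 16-bit fields yield `0x0300`. Keeping the fallback in the base class means a new target only needs to override the widths it can accelerate.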
- June 12
- LLVM Basic Blocks
- Basic blocks are straight-line sequences of instructions.
- Basic blocks consist of:
- A labelled entry point.
- Zero or more PHI nodes (see below).
- Zero or more instructions that compute values.
- A final terminator instruction to transfer control, usually an unconditional branch, conditional branch, or return.
- Instructions in Basic Blocks are in Static Single Assignment form.
- Every variable (register number or name) is defined only once.
- PHI nodes allow selection of values depending on control flow.
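To make the block structure and the role of PHI nodes concrete, here is a small loop hand-written in C++ so that each label plays the part of one basic block. This is a teaching sketch, not LLVM IR and not LLVM API code: the function `sum_below`, the `Block` enum and the `pred` variable are hypothetical, with `pred` modelling "which block did control arrive from", the information a PHI node uses to pick its value. Each of `i`, `acc`, `i_next`, `acc_next` and `done` is assigned at exactly one place in the text, as SSA requires.

```cpp
#include <cstdint>

// Computes 0 + 1 + ... + (n - 1) in a basic-block, SSA-like style.
std::uint64_t sum_below(std::uint64_t n) {
    enum Block { Entry, Body };
    Block pred = Entry;  // bookkeeping only: the block we just came from

    // SSA values. The zero initialisers exist only to satisfy C++; each
    // value is assigned at a single program point below.
    std::uint64_t i = 0, acc = 0, i_next = 0, acc_next = 0;
    bool done = false;

    // entry: nothing to compute, terminator is an unconditional branch.
    goto header;

header:
    // PHI nodes: select a value according to the predecessor block.
    i   = (pred == Entry) ? 0 : i_next;
    acc = (pred == Entry) ? 0 : acc_next;
    // An ordinary value-computing instruction.
    done = (i == n);
    // Terminator: conditional branch.
    if (done) goto exit;
    goto body;

body:
    acc_next = acc + i;
    i_next = i + 1;
    pred = Body;
    // Terminator: unconditional branch back to the loop header.
    goto header;

exit:
    // Terminator: return.
    return acc;
}
```

The loop-carried values `i` and `acc` cannot simply be reassigned in SSA form, so the header block uses PHI nodes to merge the initial value from the entry edge with the updated value from the loop-back edge.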
- Programming LLVM IR in Parabix
- Bitwise logic kernels are most conveniently programmed in the Pablo language.
- Other kernels are programmed directly in terms of LLVM IR.
- For example, the Expand3_4 Kernel illustrates programming a loop in terms of LLVM Basic Blocks and instructions.
- June 19
- June 26