Lab Exercise 12
Provided code: lab12.zip.
This week, we will revisit some C code we have seen before. It's provided as
lab12c.c in the ZIP file.
Use the Compiler Explorer to examine the way the compiler/optimizer deals with this code, specifically the following…
hailstone_length is a recursive implementation of the "number of hailstone steps to get to one" problem from earlier in the semester.
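For orientation while reading the assembly, a recursive hailstone-length function could look roughly like the sketch below. It is an assumption about the general shape only: the real code is in lab12c.c, and apart from the names hailstone_length and is_even, the types and the exact form of the recursion here are guesses.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical sketch, not the code from lab12c.c: counts the hailstone
// steps needed to get from n down to 1 (assumes n >= 1).
uint64_t hailstone_length(uint64_t n) {
    if (n == 1) {
        return 0;  // already at one: no steps needed
    }
    bool is_even = (n % 2 == 0);
    uint64_t next = is_even ? n / 2 : 3 * n + 1;
    return 1 + hailstone_length(next);  // this step, plus the rest recursively
}
```

For example, starting from 6 the sequence is 6, 3, 10, 5, 16, 8, 4, 2, 1, so this sketch returns 8.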
Have a look at that function in Compiler Explorer with GCC and compiler options
-std=c17 -march=haswell and see how it compiles. Compare that with the compilation at the different levels of compiler optimization:
Last week, one of the questions proposed that the C "map polynomial" implementation was vectorized but the "dot product" wasn't. Let's check.
Compare the implementations of
-O3. If the compiler is going to vectorize as we did last week, we'd hope to see it at -O3. ❓
In each case, the first step will be: find the main loop and ignore the rest. If you can identify the code that does most of the calculation, that's what needs your attention: it's usually the first loop. The code after it, which cleans up the leftover elements when the array size isn't divisible by the SIMD register size, isn't going to be what takes the time.
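The "main loop, then cleanup" shape can be written out by hand in scalar C to show what to look for. This is only a sketch: the function names, the polynomial coefficients, and the block size of 8 (eight floats in one 256-bit register) are assumptions for illustration, not taken from the lab code or from any particular compiler output.

```c
#include <stddef.h>

// Hypothetical cubic; the coefficients are placeholders, not the lab's values.
static float poly(float x) {
    return ((0.5f * x + 2.0f) * x - 3.0f) * x + 1.0f;
}

// Hypothetical sketch of the loop structure a vectorizer tends to produce,
// spelled out with ordinary scalar code.
void map_poly_sketch(const float *input, float *output, size_t n) {
    const size_t lanes = 8;              // 8 floats per 256-bit register
    size_t main_end = n - (n % lanes);   // largest multiple of 8 that fits
    size_t i = 0;

    // Main loop: whole blocks of 8 elements. In compiled code this is the
    // part that would use wide SIMD instructions and do most of the work.
    for (; i < main_end; i += lanes) {
        for (size_t j = 0; j < lanes; j++) {
            output[i + j] = poly(input[i + j]);
        }
    }

    // Cleanup loop: at most 7 leftover elements, handled one at a time.
    for (; i < n; i++) {
        output[i] = poly(input[i]);
    }
}
```

With n = 19, the main loop covers elements 0 to 15 and the cleanup loop covers the last three. For a large array, nearly all of the time is spent in the first loop.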
The compilation of
dot_single_c is deceptive. It seems to be using SIMD instructions, but very differently than we did. Remember that the compiler can't reorder the floating point operations, and make a reasonable guess about what it's doing (without trying to unravel every instruction, which would be somewhere between painful and impossible).
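The reason reordering is off limits by default is that floating point addition is not associative: regrouping the same additions can change the result. A minimal sketch, with hypothetical function names:

```c
// Adds three floats left to right: (a + b) + c.
float sum_left_to_right(float a, float b, float c) {
    float partial = a + b;
    return partial + c;
}

// Adds the same three floats with a different grouping: a + (b + c).
float sum_right_first(float a, float b, float c) {
    float partial = b + c;
    return a + partial;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, the first function returns 1.0f but the second returns 0.0f, because -1e8f + 1.0f rounds back to -1e8f in single precision. A compiler that must preserve the source's results has to keep the additions in the order the code specifies.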
You can't spell -funsafe-math-optimizations without "fun"
When we wrote our vectorized assembly implementations, we didn't care about reordering floating point operations. Let's give the compiler the same benefit: add the
Now do those functions use SIMD instructions the way you did last week? ❓
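To see what "reordering" means for a dot product, compare a strictly in-order sum with one that keeps several independent partial sums, as the lanes of a SIMD register would. Both functions below are hypothetical sketches in scalar C (they are not the lab's dot_single_c), and the choice of 8 partial sums assumes 256-bit registers holding single-precision values.

```c
#include <stddef.h>

// Source order: one accumulator, so every add depends on the previous one.
float dot_in_order(const float *a, const float *b, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += a[i] * b[i];
    }
    return total;
}

// Reordered: 8 independent partial sums (one per SIMD lane), combined at
// the end. The additions happen in a different order than in the source.
float dot_reordered(const float *a, const float *b, size_t n) {
    float lane[8] = {0};
    size_t main_end = n - (n % 8);
    size_t i = 0;
    for (; i < main_end; i += 8) {
        for (size_t j = 0; j < 8; j++) {
            lane[j] += a[i + j] * b[i + j];
        }
    }
    float total = 0.0f;
    for (size_t j = 0; j < 8; j++) {
        total += lane[j];          // combine the partial sums
    }
    for (; i < n; i++) {
        total += a[i] * b[i];      // leftover elements
    }
    return total;
}
```

The two agree exactly when every intermediate value is exactly representable (small integers, for instance), but in general they can differ in the last few bits, which is why the second form is only allowed once the compiler has permission to reorder.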
vfmadd* instructions are fused multiply-add instructions: they do both a multiplication and an addition as one operation.
How much fun was it?
timing.c test from last week with
-O3 -funsafe-math-optimizations. ❓
- Without any compiler optimization, where are the local variables (
is_even) stored in
hailstone_length? How does that change at
-O2, how is
hailstone_length, a very surprising optimization occurs between
-O2. What? (Hint: look for the recursive
-O3? How can you tell?
- How did that change with
- How did the lab 11 performance change with
-funsafe-math-optimizations? How did the C compare to the hand-written assembly or vectorclass implementations now?
Submit your work to Lab 12 in CourSys.