Lab Exercise 1
Provided code: lab1.zip.
Setup
Get yourself into a Linux environment.
- In CSIL, that might require rebooting from Windows to Linux.
- On your own Windows computer, you might need to install Windows Subsystem for Linux or another virtual machine that can run a Linux OS.
- Or with Windows, you could dual boot Linux.
- On a Mac, you will need some kind of virtual machine software (technically x86 emulation) to create a VM and install Linux.
- Use VS Code's SSH functionality to edit on your computer but compile/run on one of the CSIL Linux workstations (but not one of the csil-cpu servers: their processors are too old).
Get the basic development tools installed (they are already installed in CSIL): gcc, as, ld. If you're on a Debian/Ubuntu/Mint or similar Linux distribution, you can install the build-essential package to get them.
Compiling and Assembling
Recall from lecture, when I say "compile", what I mean is:
gcc -Wall -Wpedantic -std=c17 -march=haswell
Or possibly:
clang -Wall -Wpedantic -std=c17 -march=haswell
In the provided lab1.zip (link at the top of this page), there is a file array_sum.S that contains assembly code implementing a function array_sum that sums the elements of an array of int32_t values. The C signature for this function can be found in array_sum.h.
The provided array_sum_speed.c uses this function: it creates a random array, gets its sum with the array_sum function, and times it.
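To make the "create a random array, call the function, time it" pattern concrete, here is a minimal sketch in C. It is not the provided array_sum_speed.c, which may be structured differently and may use a different timer. All names here (array_fn, count_negative, make_random_array, time_call) are hypothetical, and the function-pointer signature is an assumption: the real array_sum signature is in array_sum.h. The timed workload is deliberately something other than array_sum.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical type for the function being timed. The real array_sum
   signature lives in array_sum.h and may differ from this assumption. */
typedef int64_t (*array_fn)(const int32_t *arr, size_t n);

/* Stand-in workload (deliberately not array_sum): count negative elements. */
int64_t count_negative(const int32_t *arr, size_t n) {
    int64_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (arr[i] < 0) {
            count++;
        }
    }
    return count;
}

/* Allocate n pseudo-random values in [-100, 100]. The caller frees the
   result. Returns NULL if allocation fails. */
int32_t *make_random_array(size_t n, unsigned seed) {
    int32_t *arr = malloc(n * sizeof *arr);
    if (arr == NULL) {
        return NULL;
    }
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        arr[i] = (int32_t)(rand() % 201) - 100;
    }
    return arr;
}

/* Call f once, store its return value in *result, and return the elapsed
   CPU time in seconds as measured by the standard clock() function. */
double time_call(array_fn f, const int32_t *arr, size_t n, int64_t *result) {
    clock_t start = clock();
    *result = f(arr, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A main function would typically read the array size from argv[1], call make_random_array, pass the function under test to time_call, print the result and the elapsed time, and free the array. Printing the result matters: it keeps the compiler from discarding the call as unused work when optimization is turned on.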
- Assemble array_sum.S to produce array_sum.o.
- Compile array_sum_speed.c to produce array_sum_speed.o.
- Link the two .o files to produce an executable (possibly named a.out, or possibly something else).
- Run the executable, giving a command line argument "1000000" for the array size.
Do not do unnecessary steps (like running the preprocessor by hand) as demonstrated in lecture. That was a demo of the tools, not a practical method to create an executable. Each of the four steps above should be a single command.
❓ [← that's a signal that you should look at the "Questions" section below, because there's something there to answer about this part.]
Replace Assembly with C
Create an array_sum.c file that contains a C implementation of the array_sum function: it should have a signature that matches array_sum.h and return the sum of elements 0 to n-1 of the array. It should return the same result as the assembly implementation in all cases.
Repeat the above, but replace the first step ("assemble") with a gcc or clang command that compiles your C implementation to create array_sum.o.
Compare the relative speed of the two array_sum implementations, using the time taken that the main function prints. ❓
You may get more accurate timing results if you prevent your CPU from slowing down to save power: see the Timing Code page for info.
Compile it again, but better
Add the -O3 switch to the command that compiles array_sum.c and repeat the timing. ❓
Replace C with Assembly
See the provided multiply_add.c, multiply_add.h, and multiply_add_test.c files.
The files multiply_add.c and multiply_add.h describe a simple C function. Code to test it can be found in multiply_add_test.c.
Write an assembly replacement multiply_add.S that can be used in place of multiply_add.c. It should define a multiply_add function that behaves exactly the same as the C implementation.
Hint: the instructions that you need to do the integer arithmetic are add and imul. You can find the three function arguments in %rdi, %rsi, and %rdx (in that order). The result is returned by putting it in %rax.
Suggestion: first create a function that always returns 0. Test. Then modify so it always returns c, then b*c, then a+b*c.
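One way to "test" at each stage of that suggestion is to compare the function under test against a small C function that describes what the current stage should return. The sketch below is not the provided multiply_add_test.c. The names (madd_fn, stage_zero, stage_c, reference_multiply_add, count_mismatches) are hypothetical, and the int64_t argument and return types are an assumption based on the 64-bit registers named in the hint; check multiply_add.h for the real signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed signature: three 64-bit integers in, one out. Check
   multiply_add.h for the real declaration. */
typedef int64_t (*madd_fn)(int64_t a, int64_t b, int64_t c);

/* Stage 1 expectation: always return 0. */
int64_t stage_zero(int64_t a, int64_t b, int64_t c) {
    (void)a; (void)b; (void)c;
    return 0;
}

/* Stage 2 expectation: always return c. */
int64_t stage_c(int64_t a, int64_t b, int64_t c) {
    (void)a; (void)b;
    return c;
}

/* Final expectation: a + b*c, as in the suggestion above. */
int64_t reference_multiply_add(int64_t a, int64_t b, int64_t c) {
    return a + b * c;
}

/* Run both functions on a few fixed inputs and count how many results
   differ. Zero means the candidate matches the expectation on these cases. */
size_t count_mismatches(madd_fn candidate, madd_fn expected) {
    static const int64_t cases[][3] = {
        {0, 0, 0}, {1, 2, 3}, {-4, 5, 6}, {7, -8, 9}, {10, 0, -11},
    };
    size_t mismatches = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int64_t a = cases[i][0], b = cases[i][1], c = cases[i][2];
        if (candidate(a, b, c) != expected(a, b, c)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

In use, the candidate would be the assembly multiply_add linked in from multiply_add.S, and the expected function would change as you move through the stages: stage_zero first, then stage_c, and finally reference_multiply_add.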
Questions
Answer these questions in a text file answers.txt. [Generally, these questions should be answered in a few sentences each.]
- What were the four commands you used to assemble, compile, link, and run in the "Compiling and Assembling" section?
- What was the first command you used to compile array_sum.c? How much faster/slower was your C implementation compared to the provided assembly?
- How did the speed of the C and assembly implementations of array_sum compare after you added -O3 to the compilation command? How did C speed compare with and without -O3?
Submit
Submit your work to Lab 1 in CourSys.