Department of Electrical Engineering and Computer Science #### MASSACHUSETTS INSTITUTE OF TECHNOLOGY # $^{6.035~{ m Fall}}$ ${f Test~III}$ You have 50 minutes to finish this quiz. Write your name and athena username on this cover sheet. Some questions may be harder than others. Read them all through first and attack them in the order that allows you to make the most progress. If you find a question ambiguous, be sure to write down any assumptions you make. Be neat. If we can't understand your answer, we can't give you credit! This exam is open book and open laptop. Additionally, you may access the course website, but aside from that you may NOT USE THE NETWORK. Please do not write in the boxes below. | I (xx/??) | II $(xx/??)$ | III $(xx/??)$ | IV (xx/??) | Total $(xx/??)$ | |-----------|--------------|---------------|------------|-----------------| | | | | | | | | | | | | | | | | | | | 17 | פוע | m | $\boldsymbol{\circ}$ | |----|-----|-----|----------------------| | Τ, | ١a | LLL | ıe: | | | | | | #### Athena username: ## I Register Allocation In this problem you will perform register allocation for the following Decaf program code. Note that each instruction is numbered. The functions get\_int() and printf() are callout functions. ``` void main() { int x, y, z; 1: x = get_int() y = get_int() while (x > 0) { x = x - y; z = x * y; 5: 6: if (z > -32) { 7: z = z - y; } else { 8: z = z + y; printf (z); 9: } ``` 1. [6 points]: Write the set of def-use chains for each variable in the program. Write each def-use chain as number pair (d, u) where d is the number of an instruction that defines the variable and u is the number of an instruction that uses that definition. x: у: z : | | - | - | Write the chains that | | | | _ | _ | | | | |-----|----|----------------|-----------------------|--------------|-----------|----------|----------|----------|------|-----------|----| | x : | | | | | | | | | | | | | у: | | | | | | | | | | | | | z : | | | | | | | | | | | | | | | | | | | | | | | | | | 3. | [6 | $_{ m points}$ | : Draw | the interfer | rence gra | ph for t | the webs | that you | have | identifie | d. | Specifically, each node in your graph should represent one web and there should be an edge between two nodes if the two webs interfere. Label each node with the name of the web it represents. | 4. [6 points]: What is the minimum number of registers that we can use to store all variables without spilling? Justify your answer. | | |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--| | | | | | | | | | | | | | | | | | | | 5. [3 points]: Assuming that you compile this Decaf program to the x86 architecture, which register would you select for the variable z to minimize the number of the generated assembly instructions? | | | | | | | | | | | | 6. [3 points]: Assuming that you compile this Decaf program to the x86 architecture, which register would you select for the variable y to minimize the number of the generated assembly instructions? | | | | | | | | ## II Loop Identification and Induction Variables Consider the following snippet of code (n is a constant, A is a global array; assume that the local variables are not used after the loops): ``` i = 0; t0 = 0; t1 = 0; t2 = 0; 2: while ( t0 < n ) { 3: t0 = 2 * i + 1; for j = 0 to i { 4: t1 = 2 * t0 + j; 5: 6: t2 = 7 * t1; 7: A[i, j] = (j * t1) + t2; if (t1 < 0) { 8: 9: t0 = t0 * t1; 10: i = i + 1; } ``` 7. [8 points]: Draw the control-flow graph and the dominator tree of this code side-by-side below. Mark the nodes that contain the statements with the corresponding numbers. | 8. [2 points]: On the previous CFG, mark the header node of the outer loop with HO and the header node of the inner loop with HI. Explain below why these nodes represent headers | |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | | | | | | | | | 9. [5 points]: Identify the basic and the derived induction variables for the inner loop. | | | | | | | | 10 [E maintal. Is to an industian unniable in the outen lean? Euplain why an why not | | 10. [5 points]: Is to an induction variable in the outer loop? Explain why or why not If not, is there a way to transform the program to make to an induction variable? | | | | | 11. [5 points]: Find a loop invariant code in either the inner or the outer loop. For each such piece of code, specify its line. Write below the optimized loop, after applying the loop-invariant code movement transformation. 12. [5 points]: Can we use the result of the induction variable analysis to perform strength reduction in the inner loop? If yes, write the optimized loop body. If no, explain why. ## III Instruction Scheduling 8: Consider a simple machine that executes the instructions as follows: - ALU instructions are executed within one clock cycle - $\bullet\,$ Memory load instructions are executed within two clock cycles - Memory store instructions are executed within one clock cycle Consider the following snippet of assembly code (we use local variables to represent stack locations): ``` 1: mov 0(%rbp), %r0 2: mov -8(%rbp), %r1 3: mul $3, %r0 4: add %r0, %r1 5: sub $1, %r0 6: mov -16(%rbp), %r2 7: add %r0, %r2 ``` mov %r1, -16(%rbp) 13. [7 points]: Draw the dependence DAG for this code. Label each edge with the latency between the instructions. 14. [4 points]: Consider the machine with a single memory and a single ALU unit. Assume that the CPU executes without instruction scheduling. Fill in the table with the labels L1 to L8 for the instructions from the previous page that execute in each cycle. In how many cycles does the base execution complete? | | | Clock | | | | | | | | | | | | | |-------------|---|-------|---|---|---|---|---|---|---|----|----|----|----|----| | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | | Memory Unit | | | | | | | | | | | | | | | | ALU | | | | | | | | | | | | | | | 15. [7 points]: Consider the machine with a single memory and a single ALU unit. Use list scheduling algorithm to schedule the instructions and fill in the table below (use labels L1 to L8 for the corresponding instructions). What is the speedup compared to the base execution? | | | Clock | | | | | | | | | | | |-------------|---|-------|---|---|---|---|---|---|---|----|--|--| | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | | | | Memory Unit | | | | | | | | | | | | | | ALU | | | | | | | | | | | | | #### Speedup: 16. [7 points]: Consider now the machine with *two* stages memory unit (which can issue one memory operation in each cycle) and a single ALU unit. Use list scheduling algorithm to schedule the instructions and fill in the table below (use labels L1 to L8 for the corresponding instructions). What is the speedup compared to the base execution? | | | Clock | | | | | | | | | | | |---------------|---|-------|---|---|---|---|---|---|---|----|--|--| | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | | | | Memory Unit 1 | | | | | | | | | | | | | | Memory Unit 2 | | | | | | | | | | | | | | ALU | | | | | | | | | | | | | #### Speedup: #### IV Loop Optimization Consider the following loop (A is a global array, y and t are local variables, % is the modulo operator): ``` for i = 1 to n { t = A[i] / A[i-1]; if ( i%2 == 1 ) { t = -t; } y = y + t; } ``` 17. [10 points]: Unroll the loop body two times. Write down the semantically equivalent unrolled loop (use BODY(J) to denote the three statements in the body of the loop for an induction variable J). 18. [5 points]: Outline the analysis and the transformation that would remove unnecessary if conditionals from the body of the unrolled loop.