fbpx
Wikipedia

Strength reduction

In compiler construction, strength reduction is a compiler optimization where expensive operations are replaced with equivalent but less expensive operations.[1] The classic example of strength reduction converts strong multiplications inside a loop into weaker additions – something that frequently occurs in array addressing.(Cooper, Simpson & Vick 1995, p. 1)

Examples of strength reduction include replacing a multiplication within a loop with an addition and replacing exponentiation within a loop with a multiplication.

Code analysis edit

Most of a program's execution time is typically spent in a small section of code (called a hot spot), and that code is often inside a loop that is executed over and over.

A compiler uses methods to identify loops and recognize the characteristics of register values within those loops. For strength reduction, the compiler is interested in:

  • Loop invariants: the values which do not change within the body of a loop.
  • Induction variables: the values which are being iterated each time through the loop.

Loop invariants are essentially constants within a loop, but their value may change outside of the loop. Induction variables are changing by known amounts. The terms are relative to a particular loop. When loops are nested, an induction variable in the outer loop can be a loop invariant in the inner loop.

Strength reduction looks for expressions involving a loop invariant and an induction variable. Some of those expressions can be simplified. For example, the multiplication of loop invariant c and induction variable i

c = 7; for (i = 0; i < N; i++) {  y[i] = c * i; } 

can be replaced with successive weaker additions

c = 7; k = 0; for (i = 0; i < N; i++) {  y[i] = k;  k = k + c; } 

Strength reduction example edit

Below is an example that will strength-reduce all the loop multiplications that arose from array indexing address calculations.

Imagine a simple loop that sets an array to the identity matrix.

for (i = 0; i < n; i++) {  for (j = 0; j < n; j++)  {  A[i,j] = 0.0;  }  A[i,i] = 1.0; } 

Intermediate code edit

The compiler will view this code as

0010 ; for (i = 0, i < n; i++) 0020 ; { 0030 r1 = #0  ; i = 0 0040 G0000: 0050 load r2, n ; i < n 0060 cmp r1, r2 0070 bge G0001 0080 0090 ; for (j = 0; j < n; j++) 0100 ; { 0110 r3 = #0  ; j = 0 0120 G0002: 0130 load r4, n ; j < n 0140 cmp r3, r4 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0180 load r7, n 0190 r8 = r1 * r7 ; calculate subscript i * n + j 0200 r9 = r8 + r3 0210 r10 = r9 * #8  ; calculate byte address 0220 fr3 = #0.0 0230 fstore fr3, A[r10] 0240 0250 r3 = r3 + #1  ; j++ 0260 br G0002 0270 ; } 0280 G0003: 0290 ; A[i,i] = 1.0; 0300 load r12, n ; calculate subscript i * n + i 0310 r13 = r1 * r12 0320 r14 = r13 + r1 0330 r15 = r14 * #8  ; calculate byte address 0340 fr4 = #1.0 0350 fstore fr4, A[r15] 0360 0370 ; i++ 0380 r1 = r1 + #1 0390 br G0000 0400 ; } 0410 G0001: 

This expresses 2-dimensional array A as a 1-dimensional array of n*n size, so that whenever the high-level code expresses A[x, y] it will internally be A[(x*n)+y] for any given valid indices x and y.

Many optimizations edit

The compiler will start doing many optimizations – not just strength reduction. Expressions that are constant (invariant) within a loop will be hoisted out of the loop. Constants can be loaded outside of both loops—such as floating point registers fr3 and fr4. Recognition that some variables don't change allows registers to be merged; n is constant, so r2, r4, r7, r12 can be hoisted and collapsed. The common value i*n is computed in (the hoisted) r8 and r13, so they collapse. The innermost loop (0120-0260) has been reduced from 11 to 7 intermediate instructions. The only multiply that remains in the innermost loop is line 0210's multiply by 8.

0010 ; for (i = 0, i < n; i++) 0020 { 0030 r1 = #0  ; i = 0 0050 load r2, n 0130 ; load r4, n killed; use r2 0180 ; load r7, n killed; use r2 0300 ; load r12, n killed; use r2 0220 fr3 = #0.0 0340 fr4 = #1.0 0040 G0000: 0060 cmp r1, r2 ; i < n 0070 bge G0001 0080 0190 r8 = r1 * r2 ; calculate subscript i * n 0310 ; r13 = r1 * r2 killed; use r8  ; calculate subscript i * n 0090 ; for (j = 0; j < n; j++) 0100 { 0110 r3 = #0  ; j = 0 0120 G0002: 0140 cmp r3, r2 ; j < n 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0200 r9 = r8 + r3 ; calculate subscript i * n + j 0210 r10 = r9 * #8  ; calculate byte address 0230 fstore fr3, A[r10] 0240 0250 r3 = r3 + #1  ; j++ 0260 br G0002 0270 } 0280 G0003: 0290 ; A[i,i] = 1.0; 0320 r14 = r8 + r1 ; calculate subscript i * n + i 0330 r15 = r14 * #8  ; calculate byte address 0350 fstore fr4, A[r15] 0360 0370 ;i++ 0380 r1 = r1 + #1 0390 br G0000 0400 } 0410 G0001: 

There are more optimizations to do. Register r3 is the main variable in the innermost loop (0140-0260); it gets incremented by 1 each time through the loop. Register r8 (which is invariant in the innermost loop) is added to r3. Instead of using r3, the compiler can eliminate r3 and use r9. The loop, instead of being controlled by r3 = 0 to n-1, can be controlled by r9=r8+0 to r8+n-1. That adds four instructions and kills four instructions, but there's one fewer instruction inside the loop.

0110 ; r3 = #0 killed  ; j = 0 0115 r9 = r8 ; new assignment 0117 r20 = r8 + r2 ; new limit 0120 G0002: 0140 ; cmp r3, r2 killed  ; j < n 0145 cmp r9, r20 ; r8 + j < r8 + n = r20 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0200 ; r9 = r8 + r3 killed  ; calculate subscript i * n + j 0210 r10 = r9 * #8  ; calculate byte address 0230 fstore fr3, A[r10] 0240 0250 ; r3 = r3 + #1 killed  ; j++ 0255 r9 = r9 + #1  ; new loop variable 0260 br G0002 

Now r9 is the loop variable, but it interacts with the multiply by 8. Here we get to do some strength reduction. The multiply by 8 can be reduced to some successive additions of 8. Now there are no multiplications inside the loop.

0115 r9 = r8 ; new assignment 0117 r20 = r8 + r2 ; new limit 0118 r10 = r8 * #8  ; initial value of r10 0120 G0002: 0145 cmp r9, r20 ; r8 + j < r8 + n = r20 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0210 ; r10 = r9 * #8 killed  ; calculate byte address 0230 fstore fr3, A[r10] 0240 0245 r10 = r10 + #8  ; strength reduced multiply 0255 r9 = r9 + #1  ; loop variable 0260 br G0002 

Registers r9 and r10 (= 8*r9) aren't both needed; r9 can be eliminated in the loop. The loop is now 5 instructions.

0115 ; r9 = r8 killed 0117 r20 = r8 + r2 ; limit 0118 r10 = r8 * #8  ; initial value of r10 0119 r22 = r20 * #8  ; new limit 0120 G0002: 0145 ; cmp r9, r20 killed  ; r8 + j < r8 + n = r20 0147 cmp r10, r22 ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0230 fstore fr3, A[r10] 0240 0245 r10 = r10 + #8  ; strength reduced multiply 0255 ; r9 = r9 + #1 killed  ; loop variable 0260 br G0002 

Outer loop edit

Back to the whole picture:

0010 ; for (i = 0, i < n; i++) 0020 { 0030 r1 = #0  ; i = 0 0050 load r2, n 0220 fr3 = #0.0 0340 fr4 = #1.0 0040 G0000: 0060 cmp r1, r2 ; i < n 0070 bge G0001 0080 0190 r8 = r1 * r2 ; calculate subscript i * n 0117 r20 = r8 + r2 ; limit 0118 r10 = r8 * #8  ; initial value of r10 0119 r22 = r20 * #8  ; new limit 0090 ; for (j = 0; j < n; j++) 0100 { 0120 G0002: 0147 cmp r10, r22 ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0230 fstore fr3, A[r10] 0240 0245 r10 = r10 + #8  ; strength reduced multiply 0260 br G0002 0270 } 0280 G0003: 0290 ; A[i,i] = 1.0; 0320 r14 = r8 + r1 ; calculate subscript i * n + i 0330 r15 = r14 * #8  ; calculate byte address 0350 fstore fr4, A[r15] 0360 0370 ;i++ 0380 r1 = r1 + #1 0390 br G0000 0400 } 0410 G0001: 

There are now four multiplications within the outer loop that increments r1. Register r8 = r1*r2 at 0190 can be strength reduced by setting it before entering the loop (0055) and incrementing it by r2 at the bottom of the loop (0385).

The value r8*8 (at 0118) can be strength reduced by initializing it (0056) and adding 8*r2 to it when r8 gets incremented (0386).

Register r20 is being incremented by the invariant/constant r2 each time through the loop at 0117. After being incremented, it is multiplied by 8 to create r22 at 0119. That multiplication can be strength reduced by adding 8*r2 each time through the loop.

0010 ; for (i = 0, i < n; i++) 0020 { 0030 r1 = #0  ; i = 0 0050 load r2, n 0220 fr3 = #0.0 0340 fr4 = #1.0 0055 r8 = r1 * r2 ; set initial value for r8 0056 r40 = r8 * #8  ; initial value for r8 * 8 0057 r30 = r2 * #8  ; increment for r40 0058 r20 = r8 + r2 ; copied from 0117 0058 r22 = r20 * #8  ; initial value of r22 0040 G0000: 0060 cmp r1, r2 ; i < n 0070 bge G0001 0080 0190 ; r8 = r1 * r2 killed  ; calculate subscript i * n 0117 ; r20 = r8 + r2 killed - dead code 0118 r10 = r40 ; strength reduced expression to r40 0119 ; r22 = r20 * #8 killed  ; strength reduced 0090 ; for (j = 0; j < n; j++) 0100 { 0120 G0002: 0147 cmp r10, r22 ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0230 fstore fr3, A[r10] 0240 0245 r10 = r10 + #8  ; strength reduced multiply 0260 br G0002 0270 } 0280 G0003: 0290 ; A[i,i] = 1.0; 0320 r14 = r8 + r1 ; calculate subscript i * n + i 0330 r15 = r14 * #8  ; calculate byte address 0350 fstore fr4, A[r15] 0360 0370 ;i++ 0380 r1 = r1 + #1 0385 r8 = r8 + r2 ; strength reduce r8 = r1 * r2 0386 r40 = r40 + r30 ; strength reduce expression r8 * 8 0388 r22 = r22 + r30 ; strength reduce r22 = r20 * 8 0390 br G0000 0400 } 0410 G0001: 

The last multiply edit

That leaves the two loops with only one multiplication operation (at 0330) within the outer loop and no multiplications within the inner loop.

0010 ; for (i = 0, i < n; i++) 0020 { 0030 r1 = #0  ; i = 0 0050 load r2, n 0220 fr3 = #0.0 0340 fr4 = #1.0 0055 r8 = r1 * r2 ; set initial value for r8 0056 r40 = r8 * #8  ; initial value for r8 * 8 0057 r30 = r2 * #8  ; increment for r40 0058 r20 = r8 + r2 ; copied from 0117 0058 r22 = r20 * #8  ; initial value of r22 0040 G0000: 0060 cmp r1, r2 ; i < n 0070 bge G0001 0080 0118 r10 = r40 ; strength reduced expression to r40 0090 ; for (j = 0; j < n; j++) 0100 { 0120 G0002: 0147 cmp r10, r22 ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0230 fstore fr3, A[r10] 0240 0245 r10 = r10 + #8  ; strength reduced multiply 0260 br G0002 0270 } 0280 G0003: 0290 ; A[i,i] = 1.0; 0320 r14 = r8 + r1 ; calculate subscript i * n + i 0330 r15 = r14 * #8  ; calculate byte address 0350 fstore fr4, A[r15] 0360 0370 ;i++ 0380 r1 = r1 + #1 0385 r8 = r8 + r2 ; strength reduce r8 = r1 * r2 0386 r40 = r40 + r30 ; strength reduce expression r8 * 8 0388 r22 = r22 + r30 ; strength reduce r22 = r20 * 8 0390 br G0000 0400 } 0410 G0001: 

At line 0320, r14 is the sum of r8 and r1, and r8 and r1 are being incremented in the loop. Register r8 is being bumped by r2 (=n) and r1 is being bumped by 1. Consequently, r14 is being bumped by n+1 each time through the loop. The last loop multiply at 0330 can be strength reduced by adding (r2+1)*8 each time through the loop.

0010 ; for (i = 0, i < n; i++) 0020 { 0030 r1 = #0  ; i = 0 0050 load r2, n 0220 fr3 = #0.0 0340 fr4 = #1.0 0055 r8 = r1 * r2 ; set initial value for r8 0056 r40 = r8 * #8  ; initial value for r8 * 8 0057 r30 = r2 * #8  ; increment for r40 0058 r20 = r8 + r2 ; copied from 0117 0058 r22 = r20 * #8  ; initial value of r22 005A r14 = r8 + r1 ; copied from 0320 005B r15 = r14 * #8  ; initial value of r15 (0330) 005C r49 = r2 + #1 005D r50 = r49 * #8  ; strength reduced increment 0040 G0000: 0060 cmp r1, r2 ; i < n 0070 bge G0001 0080 0118 r10 = r40 ; strength reduced expression to r40 0090 ; for (j = 0; j < n; j++) 0100 { 0120 G0002: 0147 cmp r10, r22 ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0230 fstore fr3, A[r10] 0240 0245 r10 = r10 + #8  ; strength reduced multiply 0260 br G0002 0270 } 0280 G0003: 0290 ; A[i,i] = 1.0; 0320 ; r14 = r8 + r1 killed  ; dead code 0330 ; r15 = r14 * #8 killed  ; strength reduced 0350 fstore fr4, A[r15] 0360 0370 ;i++ 0380 r1 = r1 + #1 0385 r8 = r8 + r2 ; strength reduce r8 = r1 * r2 0386 r40 = r40 + r30 ; strength reduce expression r8 * 8 0388 r22 = r22 + r30 ; strength reduce r22 = r20 * 8 0389 r15 = r15 + r50 ; strength reduce r15 = r14 * 8 0390 br G0000 0400 } 0410 G0001: 

There's still more to go. Constant folding will recognize that r1=0 in the preamble, so several instructions will clean up. Register r8 isn't used in the loop, so it can disappear.

Furthermore, r1 is only being used to control the loop, so r1 can be replaced by a different induction variable such as r40. Where i went 0 <= i < n, register r40 goes 0 <= r40 < 8 * n * n.

0010 ; for (i = 0, i < n; i++) 0020 { 0030 ; r1 = #0  ; i = 0, becomes dead code 0050 load r2, n 0220 fr3 = #0.0 0340 fr4 = #1.0 0055 ; r8 = #0 killed  ; r8 no longer used 0056 r40 = #0  ; initial value for r8 * 8 0057 r30 = r2 * #8  ; increment for r40 0058 ; r20 = r2 killed  ; r8 = 0, becomes dead code 0058 r22 = r2 * #8  ; r20 = r2 005A ; r14 = #0 killed  ; r8 = 0, becomes dead code 005B r15 = #0  ; r14 = 0 005C r49 = r2 + #1 005D r50 = r49 * #8  ; strength reduced increment 005D r60 = r2 * r30 ; new limit for r40 0040 G0000: 0060 ; cmp r1, r2 killed  ; i < n; induction variable replaced 0065 cmp r40, r60 ; i * 8 * n < 8 * n * n 0070 bge G0001 0080 0118 r10 = r40 ; strength reduced expression to r40 0090 ; for (j = 0; j < n; j++) 0100 { 0120 G0002: 0147 cmp r10, r22 ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0230 fstore fr3, A[r10] 0240 0245 r10 = r10 + #8  ; strength reduced multiply 0260 br G0002 0270 } 0280 G0003: 0290 ; A[i,i] = 1.0; 0350 fstore fr4, A[r15] 0360 0370 ;i++ 0380 ; r1 = r1 + #1 killed  ; dead code (r40 controls loop) 0385 ; r8 = r8 + r2 killed  ; dead code 0386 r40 = r40 + r30 ; strength reduce expression r8 * 8 0388 r22 = r22 + r30 ; strength reduce r22 = r20 * 8 0389 r15 = r15 + r50 ; strength reduce r15 = r14 * 8 0390 br G0000 0400 } 0410 G0001: 

Other strength reduction operations edit

Operator strength reduction uses mathematical identities to replace slow math operations with faster operations. The benefits depend on the target CPU and sometimes on the surrounding code (which can affect the availability of other functional units within the CPU).

  • replacing integer division or multiplication by a power of 2 with an arithmetic shift or logical shift[2]
  • replacing integer multiplication by a constant with a combination of shifts, adds or subtracts
  • replacing integer division by a constant with a multiplication, taking advantage of the limited range of machine integers.[3] This method also works if divisor is a non-integer sufficiently greater than 1, e.g. √2 or π.[4]
Original calculation Replacement calculation
y+= 1 y++
y%2 != 0 y & 1
y = x * 2 y = x << 1
y = x / 2 y = x >> 1
y = x % 2 y = x & 1
y = x * 15 y = (x << 4) - x
y = (uint16_t)x / 10 y = ((uint32_t)x * (uint32_t)0xCCCD) >> 19)
y = (uint16_t)x / π y = (((uint32_t)x * (uint32_t)0x45F3) >> 16) + x) >> 2)

Induction variable (orphan) edit

Induction variable or recursive strength reduction replaces a function of some systematically changing variable with a simpler calculation using previous values of the function. In a procedural programming language this would apply to an expression involving a loop variable and in a declarative language it would apply to the argument of a recursive function. For example,

 f x = ... (3 ** x) ... (f (x + 1)) ... 

becomes

 f x = f' x 1  where  f' x z = ... z ... (f' (x + 1) (3 * z)) ... 

Here modified recursive function f takes a second parameter z = 3 ** x, allowing the expensive computation (3 ** x) to be replaced by the cheaper (3 * z).

See also edit

Notes edit

  1. ^ Steven Muchnick; Muchnick and Associates (15 August 1997). Advanced Compiler Design Implementation. Morgan Kaufmann. ISBN 978-1-55860-320-2. Strength reduction.
  2. ^ In languages such as C and Java, integer division has round-towards-zero semantics, whereas a bit-shift always rounds down, requiring special treatment for negative numbers. For example, in Java, -3 / 2 evaluates to -1, whereas -3 >> 1 evaluates to -2. So in this case, the compiler cannot optimize division by two by replacing it by a bit shift.
  3. ^ Granlund, Torbjörn; Peter L. Montgomery. "Division by Invariant Integers Using Multiplication" (PDF).
  4. ^ Jones, Nigel. "Division of integers by constants".

References edit

  • Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey D. (1986), Compilers: Principles, Techniques, and Tools (2nd ed.), ISBN 978-0-201-10088-4
  • Allen, Francis E.; Cocke, John; Kennedy, Ken (1981), "Reduction of Operator Strength", in Munchnik, Steven S.; Jones, Neil D. (eds.), Program Flow Analysis: Theory and Applications, Prentice-Hall, ISBN 978-0-13-729681-1
  • Cocke, John; Kennedy, Ken (November 1977), "An algorithm for reduction of operator strength", Communications of the ACM, 20 (11): 850–856, doi:10.1145/359863.359888, S2CID 1092505
  • Cooper, Keith; Simpson, Taylor; Vick, Christopher (October 1995), Operator Strength Reduction (PDF), Rice University, retrieved April 22, 2010

strength, reduction, compiler, construction, strength, reduction, compiler, optimization, where, expensive, operations, replaced, with, equivalent, less, expensive, operations, classic, example, strength, reduction, converts, strong, multiplications, inside, l. In compiler construction strength reduction is a compiler optimization where expensive operations are replaced with equivalent but less expensive operations 1 The classic example of strength reduction converts strong multiplications inside a loop into weaker additions something that frequently occurs in array addressing Cooper Simpson amp Vick 1995 p 1 Examples of strength reduction include replacing a multiplication within a loop with an addition and replacing exponentiation within a loop with a multiplication Contents 1 Code analysis 2 Strength reduction example 2 1 Intermediate code 2 2 Many optimizations 2 3 Outer loop 2 4 The last multiply 3 Other strength reduction operations 4 Induction variable orphan 5 See also 6 Notes 7 ReferencesCode analysis editMost of a program s execution time is typically spent in a small section of code called a hot spot and that code is often inside a loop that is executed over and over A compiler uses methods to identify loops and recognize the characteristics of register values within those loops For strength reduction the compiler is interested in Loop invariants the values which do not change within the body of a loop Induction variables the values which are being iterated each time through the loop Loop invariants are essentially constants within a loop but their value may change outside of the loop Induction variables are changing by known amounts The terms are relative to a particular loop When loops are nested an induction variable in the outer loop can be a loop invariant in the inner loop Strength reduction looks for expressions involving a loop invariant and an induction variable Some of those expressions can be simplified For example the multiplication of loop invariant c and induction variable i c 7 for i 0 i lt N i y i c i can be replaced with successive weaker additions c 7 k 0 for i 0 i lt N i y i k k k c Strength reduction example editBelow is an example that will strength reduce all the loop multiplications that arose from array indexing address calculations Imagine a simple loop that sets an array to the identity matrix for i 0 i lt n i for j 0 j lt n j A i j 0 0 A i i 1 0 Intermediate code edit The compiler will view this code as 0010 for i 0 i lt n i 0020 0030 r1 0 i 0 0040 G0000 0050 load r2 n i lt n 0060 cmp r1 r2 0070 bge G0001 0080 0090 for j 0 j lt n j 0100 0110 r3 0 j 0 0120 G0002 0130 load r4 n j lt n 0140 cmp r3 r4 0150 bge G0003 0160 0170 A i j 0 0 0180 load r7 n 0190 r8 r1 r7 calculate subscript i n j 0200 r9 r8 r3 0210 r10 r9 8 calculate byte address 0220 fr3 0 0 0230 fstore fr3 A r10 0240 0250 r3 r3 1 j 0260 br G0002 0270 0280 G0003 0290 A i i 1 0 0300 load r12 n calculate subscript i n i 0310 r13 r1 r12 0320 r14 r13 r1 0330 r15 r14 8 calculate byte address 0340 fr4 1 0 0350 fstore fr4 A r15 0360 0370 i 0380 r1 r1 1 0390 br G0000 0400 0410 G0001 This expresses 2 dimensional array A as a 1 dimensional array of n n size so that whenever the high level code expresses A x y it will internally be A x n y for any given valid indices x and y Many optimizations edit The compiler will start doing many optimizations not just strength reduction Expressions that are constant invariant within a loop will be hoisted out of the loop Constants can be loaded outside of both loops such as floating point registers fr3 and fr4 Recognition that some variables don t change allows registers to be merged n is constant so r2 r4 r7 r12 can be hoisted and collapsed The common value i n is computed in the hoisted r8 and r13 so they collapse The innermost loop 0120 0260 has been reduced from 11 to 7 intermediate instructions The only multiply that remains in the innermost loop is line 0210 s multiply by 8 0010 for i 0 i lt n i 0020 0030 r1 0 i 0 0050 load r2 n 0130 load r4 n killed use r2 0180 load r7 n killed use r2 0300 load r12 n killed use r2 0220 fr3 0 0 0340 fr4 1 0 0040 G0000 0060 cmp r1 r2 i lt n 0070 bge G0001 0080 0190 r8 r1 r2 calculate subscript i n 0310 r13 r1 r2 killed use r8 calculate subscript i n 0090 for j 0 j lt n j 0100 0110 r3 0 j 0 0120 G0002 0140 cmp r3 r2 j lt n 0150 bge G0003 0160 0170 A i j 0 0 0200 r9 r8 r3 calculate subscript i n j 0210 r10 r9 8 calculate byte address 0230 fstore fr3 A r10 0240 0250 r3 r3 1 j 0260 br G0002 0270 0280 G0003 0290 A i i 1 0 0320 r14 r8 r1 calculate subscript i n i 0330 r15 r14 8 calculate byte address 0350 fstore fr4 A r15 0360 0370 i 0380 r1 r1 1 0390 br G0000 0400 0410 G0001 There are more optimizations to do Register r3 is the main variable in the innermost loop 0140 0260 it gets incremented by 1 each time through the loop Register r8 which is invariant in the innermost loop is added to r3 Instead of using r3 the compiler can eliminate r3 and use r9 The loop instead of being controlled by r3 0 to n 1 can be controlled by r9 r8 0 to r8 n 1 That adds four instructions and kills four instructions but there s one fewer instruction inside the loop 0110 r3 0 killed j 0 0115 r9 r8 new assignment 0117 r20 r8 r2 new limit 0120 G0002 0140 cmp r3 r2 killed j lt n 0145 cmp r9 r20 r8 j lt r8 n r20 0150 bge G0003 0160 0170 A i j 0 0 0200 r9 r8 r3 killed calculate subscript i n j 0210 r10 r9 8 calculate byte address 0230 fstore fr3 A r10 0240 0250 r3 r3 1 killed j 0255 r9 r9 1 new loop variable 0260 br G0002 Now r9 is the loop variable but it interacts with the multiply by 8 Here we get to do some strength reduction The multiply by 8 can be reduced to some successive additions of 8 Now there are no multiplications inside the loop 0115 r9 r8 new assignment 0117 r20 r8 r2 new limit 0118 r10 r8 8 initial value of r10 0120 G0002 0145 cmp r9 r20 r8 j lt r8 n r20 0150 bge G0003 0160 0170 A i j 0 0 0210 r10 r9 8 killed calculate byte address 0230 fstore fr3 A r10 0240 0245 r10 r10 8 strength reduced multiply 0255 r9 r9 1 loop variable 0260 br G0002 Registers r9 and r10 8 r9 aren t both needed r9 can be eliminated in the loop The loop is now 5 instructions 0115 r9 r8 killed 0117 r20 r8 r2 limit 0118 r10 r8 8 initial value of r10 0119 r22 r20 8 new limit 0120 G0002 0145 cmp r9 r20 killed r8 j lt r8 n r20 0147 cmp r10 r22 r10 8 r8 j lt 8 r8 n r22 0150 bge G0003 0160 0170 A i j 0 0 0230 fstore fr3 A r10 0240 0245 r10 r10 8 strength reduced multiply 0255 r9 r9 1 killed loop variable 0260 br G0002 Outer loop edit Back to the whole picture 0010 for i 0 i lt n i 0020 0030 r1 0 i 0 0050 load r2 n 0220 fr3 0 0 0340 fr4 1 0 0040 G0000 0060 cmp r1 r2 i lt n 0070 bge G0001 0080 0190 r8 r1 r2 calculate subscript i n 0117 r20 r8 r2 limit 0118 r10 r8 8 initial value of r10 0119 r22 r20 8 new limit 0090 for j 0 j lt n j 0100 0120 G0002 0147 cmp r10 r22 r10 8 r8 j lt 8 r8 n r22 0150 bge G0003 0160 0170 A i j 0 0 0230 fstore fr3 A r10 0240 0245 r10 r10 8 strength reduced multiply 0260 br G0002 0270 0280 G0003 0290 A i i 1 0 0320 r14 r8 r1 calculate subscript i n i 0330 r15 r14 8 calculate byte address 0350 fstore fr4 A r15 0360 0370 i 0380 r1 r1 1 0390 br G0000 0400 0410 G0001 There are now four multiplications within the outer loop that increments r1 Register r8 r1 r2 at 0190 can be strength reduced by setting it before entering the loop 0055 and incrementing it by r2 at the bottom of the loop 0385 The value r8 8 at 0118 can be strength reduced by initializing it 0056 and adding 8 r2 to it when r8 gets incremented 0386 Register r20 is being incremented by the invariant constant r2 each time through the loop at 0117 After being incremented it is multiplied by 8 to create r22 at 0119 That multiplication can be strength reduced by adding 8 r2 each time through the loop 0010 for i 0 i lt n i 0020 0030 r1 0 i 0 0050 load r2 n 0220 fr3 0 0 0340 fr4 1 0 0055 r8 r1 r2 set initial value for r8 0056 r40 r8 8 initial value for r8 8 0057 r30 r2 8 increment for r40 0058 r20 r8 r2 copied from 0117 0058 r22 r20 8 initial value of r22 0040 G0000 0060 cmp r1 r2 i lt n 0070 bge G0001 0080 0190 r8 r1 r2 killed calculate subscript i n 0117 r20 r8 r2 killed dead code 0118 r10 r40 strength reduced expression to r40 0119 r22 r20 8 killed strength reduced 0090 for j 0 j lt n j 0100 0120 G0002 0147 cmp r10 r22 r10 8 r8 j lt 8 r8 n r22 0150 bge G0003 0160 0170 A i j 0 0 0230 fstore fr3 A r10 0240 0245 r10 r10 8 strength reduced multiply 0260 br G0002 0270 0280 G0003 0290 A i i 1 0 0320 r14 r8 r1 calculate subscript i n i 0330 r15 r14 8 calculate byte address 0350 fstore fr4 A r15 0360 0370 i 0380 r1 r1 1 0385 r8 r8 r2 strength reduce r8 r1 r2 0386 r40 r40 r30 strength reduce expression r8 8 0388 r22 r22 r30 strength reduce r22 r20 8 0390 br G0000 0400 0410 G0001 The last multiply edit That leaves the two loops with only one multiplication operation at 0330 within the outer loop and no multiplications within the inner loop 0010 for i 0 i lt n i 0020 0030 r1 0 i 0 0050 load r2 n 0220 fr3 0 0 0340 fr4 1 0 0055 r8 r1 r2 set initial value for r8 0056 r40 r8 8 initial value for r8 8 0057 r30 r2 8 increment for r40 0058 r20 r8 r2 copied from 0117 0058 r22 r20 8 initial value of r22 0040 G0000 0060 cmp r1 r2 i lt n 0070 bge G0001 0080 0118 r10 r40 strength reduced expression to r40 0090 for j 0 j lt n j 0100 0120 G0002 0147 cmp r10 r22 r10 8 r8 j lt 8 r8 n r22 0150 bge G0003 0160 0170 A i j 0 0 0230 fstore fr3 A r10 0240 0245 r10 r10 8 strength reduced multiply 0260 br G0002 0270 0280 G0003 0290 A i i 1 0 0320 r14 r8 r1 calculate subscript i n i 0330 r15 r14 8 calculate byte address 0350 fstore fr4 A r15 0360 0370 i 0380 r1 r1 1 0385 r8 r8 r2 strength reduce r8 r1 r2 0386 r40 r40 r30 strength reduce expression r8 8 0388 r22 r22 r30 strength reduce r22 r20 8 0390 br G0000 0400 0410 G0001 At line 0320 r14 is the sum of r8 and r1 and r8 and r1 are being incremented in the loop Register r8 is being bumped by r2 n and r1 is being bumped by 1 Consequently r14 is being bumped by n 1 each time through the loop The last loop multiply at 0330 can be strength reduced by adding r2 1 8 each time through the loop 0010 for i 0 i lt n i 0020 0030 r1 0 i 0 0050 load r2 n 0220 fr3 0 0 0340 fr4 1 0 0055 r8 r1 r2 set initial value for r8 0056 r40 r8 8 initial value for r8 8 0057 r30 r2 8 increment for r40 0058 r20 r8 r2 copied from 0117 0058 r22 r20 8 initial value of r22 005 A r14 r8 r1 copied from 0320 005 B r15 r14 8 initial value of r15 0330 005 C r49 r2 1 005 D r50 r49 8 strength reduced increment 0040 G0000 0060 cmp r1 r2 i lt n 0070 bge G0001 0080 0118 r10 r40 strength reduced expression to r40 0090 for j 0 j lt n j 0100 0120 G0002 0147 cmp r10 r22 r10 8 r8 j lt 8 r8 n r22 0150 bge G0003 0160 0170 A i j 0 0 0230 fstore fr3 A r10 0240 0245 r10 r10 8 strength reduced multiply 0260 br G0002 0270 0280 G0003 0290 A i i 1 0 0320 r14 r8 r1 killed dead code 0330 r15 r14 8 killed strength reduced 0350 fstore fr4 A r15 0360 0370 i 0380 r1 r1 1 0385 r8 r8 r2 strength reduce r8 r1 r2 0386 r40 r40 r30 strength reduce expression r8 8 0388 r22 r22 r30 strength reduce r22 r20 8 0389 r15 r15 r50 strength reduce r15 r14 8 0390 br G0000 0400 0410 G0001 There s still more to go Constant folding will recognize that r1 0 in the preamble so several instructions will clean up Register r8 isn t used in the loop so it can disappear Furthermore r1 is only being used to control the loop so r1 can be replaced by a different induction variable such as r40 Where i went 0 lt i lt n register r40 goes 0 lt r40 lt 8 n n 0010 for i 0 i lt n i 0020 0030 r1 0 i 0 becomes dead code 0050 load r2 n 0220 fr3 0 0 0340 fr4 1 0 0055 r8 0 killed r8 no longer used 0056 r40 0 initial value for r8 8 0057 r30 r2 8 increment for r40 0058 r20 r2 killed r8 0 becomes dead code 0058 r22 r2 8 r20 r2 005 A r14 0 killed r8 0 becomes dead code 005 B r15 0 r14 0 005 C r49 r2 1 005 D r50 r49 8 strength reduced increment 005 D r60 r2 r30 new limit for r40 0040 G0000 0060 cmp r1 r2 killed i lt n induction variable replaced 0065 cmp r40 r60 i 8 n lt 8 n n 0070 bge G0001 0080 0118 r10 r40 strength reduced expression to r40 0090 for j 0 j lt n j 0100 0120 G0002 0147 cmp r10 r22 r10 8 r8 j lt 8 r8 n r22 0150 bge G0003 0160 0170 A i j 0 0 0230 fstore fr3 A r10 0240 0245 r10 r10 8 strength reduced multiply 0260 br G0002 0270 0280 G0003 0290 A i i 1 0 0350 fstore fr4 A r15 0360 0370 i 0380 r1 r1 1 killed dead code r40 controls loop 0385 r8 r8 r2 killed dead code 0386 r40 r40 r30 strength reduce expression r8 8 0388 r22 r22 r30 strength reduce r22 r20 8 0389 r15 r15 r50 strength reduce r15 r14 8 0390 br G0000 0400 0410 G0001 Other strength reduction operations editThis material is disputed It is better described as peephole optimization and instruction assignment Operator strength reduction uses mathematical identities to replace slow math operations with faster operations The benefits depend on the target CPU and sometimes on the surrounding code which can affect the availability of other functional units within the CPU replacing integer division or multiplication by a power of 2 with an arithmetic shift or logical shift 2 replacing integer multiplication by a constant with a combination of shifts adds or subtracts replacing integer division by a constant with a multiplication taking advantage of the limited range of machine integers 3 This method also works if divisor is a non integer sufficiently greater than 1 e g 2 or p 4 Original calculation Replacement calculationy 1 y y 2 0 y amp 1y x 2 y x lt lt 1y x 2 y x gt gt 1y x 2 y x amp 1y x 15 y x lt lt 4 xy uint16 t x 10 y uint32 t x uint32 t 0xCCCD gt gt 19 y uint16 t x p y uint32 t x uint32 t 0x45F3 gt gt 16 x gt gt 2 Induction variable orphan editInduction variable or recursive strength reduction replaces a function of some systematically changing variable with a simpler calculation using previous values of the function In a procedural programming language this would apply to an expression involving a loop variable and in a declarative language it would apply to the argument of a recursive function For example f x 3 x f x 1 becomes f x f x 1 where f x z z f x 1 3 z Here modified recursive function f takes a second parameter z 3 x allowing the expensive computation 3 x to be replaced by the cheaper 3 z See also editFast inverse square root Hacker s Delight Partial evaluationNotes edit Steven Muchnick Muchnick and Associates 15 August 1997 Advanced Compiler Design Implementation Morgan Kaufmann ISBN 978 1 55860 320 2 Strength reduction In languages such as C and Java integer division has round towards zero semantics whereas a bit shift always rounds down requiring special treatment for negative numbers For example in Java 3 2 evaluates to 1 whereas 3 gt gt 1 evaluates to 2 So in this case the compiler cannot optimize division by two by replacing it by a bit shift Granlund Torbjorn Peter L Montgomery Division by Invariant Integers Using Multiplication PDF Jones Nigel Division of integers by constants References editAho Alfred V Sethi Ravi Ullman Jeffrey D 1986 Compilers Principles Techniques and Tools 2nd ed ISBN 978 0 201 10088 4 Allen Francis E Cocke John Kennedy Ken 1981 Reduction of Operator Strength in Munchnik Steven S Jones Neil D eds Program Flow Analysis Theory and Applications Prentice Hall ISBN 978 0 13 729681 1 Cocke John Kennedy Ken November 1977 An algorithm for reduction of operator strength Communications of the ACM 20 11 850 856 doi 10 1145 359863 359888 S2CID 1092505 Cooper Keith Simpson Taylor Vick Christopher October 1995 Operator Strength Reduction PDF Rice University retrieved April 22 2010 Retrieved from https en wikipedia org w index php title Strength reduction amp oldid 1195830351, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.