
Schönhage–Strassen algorithm

The Schönhage–Strassen algorithm is an asymptotically fast multiplication algorithm for large integers. It was developed by Arnold Schönhage and Volker Strassen in 1971.[1] The run-time bit complexity is, in big O notation, O(n · log n · log log n) for two n-digit numbers. The algorithm uses recursive fast Fourier transforms in rings with 2^n + 1 elements, a specific type of number-theoretic transform.

The Schönhage–Strassen algorithm is based on the fast Fourier transform (FFT) method of integer multiplication. This figure demonstrates multiplying 1234 × 5678 = 7006652 using the simple FFT method. Number-theoretic transforms in the integers modulo 337 are used, selecting 85 as an 8th root of unity. Base 10 is used in place of base 2^w for illustrative purposes. Schönhage–Strassen improves on this by using negacyclic convolutions.
Schönhage (r) and Strassen (l) playing chess in Oberwolfach, 1979

The Schönhage–Strassen algorithm was the asymptotically fastest multiplication method known from 1971 until 2007, when a new method, Fürer's algorithm, was announced with lower asymptotic complexity;[2] Fürer's algorithm is used in the Basic Polynomial Algebra Subprograms (BPAS) open-source library,[3][4] though it does not adapt to polynomials over finite fields. The current best multiplication algorithm in terms of asymptotic complexity, by David Harvey and Joris van der Hoeven, achieves O(n log n) complexity.[5] However, the improvement is too small to be observed at the integer sizes arising in practice (see galactic algorithm), and no implementations exist.

In practice the Schönhage–Strassen algorithm starts to outperform older methods such as Karatsuba and Toom–Cook multiplication for numbers beyond 2^(2^15) to 2^(2^17) (10,000 to 40,000 decimal digits).[6][7][8] The GNU Multi-Precision Library uses it[9] for values of at least 1728 to 7808 64-bit words (33,000 to 150,000 decimal digits), depending on architecture.[10][11] There is a Java implementation of Schönhage–Strassen which uses it above 74,000 decimal digits.[12]

Applications of the Schönhage–Strassen algorithm include mathematical empiricism, such as the Great Internet Mersenne Prime Search and computing approximations of π, as well as practical applications such as Kronecker substitution, in which multiplication of polynomials with integer coefficients can be efficiently reduced to large integer multiplication; this is used in practice by GMP-ECM for Lenstra elliptic curve factorization.[13]

Implementation details

This section is based primarily on an overview of the method by Crandall and Pomerance in their Prime Numbers: A Computational Perspective.[14] This variant differs somewhat from Schönhage's original method in that it exploits the discrete weighted transform to perform negacyclic convolutions more efficiently. Another source for detailed information is Knuth's The Art of Computer Programming.[15] The section begins by discussing the building blocks and concludes with a step-by-step description of the algorithm itself.

Convolutions

Suppose we are multiplying two numbers like 123 and 456 using long multiplication with base B digits, but without performing any carrying. The result might look something like this:

           1   2   3
×          4   5   6

           6  12  18
       5  10  15
   4   8  12

   4  13  28  27  18

This sequence (4, 13, 28, 27, 18) is called the acyclic or linear convolution of the two original sequences (1,2,3) and (4,5,6). Once we have the acyclic convolution of two sequences, computing the product of the original numbers is easy: we just perform the carrying (for example, in the rightmost column, we keep the 8 and add the 1 to the column containing 27). In the example this yields the correct product 56088.
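The two steps just described, convolve without carrying and then propagate carries, can be sketched in Python (the function names are illustrative, not from any particular library):

```python
def acyclic_convolution(x, y):
    # schoolbook product of digit sequences, most significant digit first,
    # without performing any carrying
    result = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            result[i + j] += xi * yj
    return result

def carry(coeffs, base=10):
    # interpret the convolution entries as digits in the given base and
    # propagate carries, yielding the ordinary integer product
    total = 0
    for c in coeffs:
        total = total * base + c
    return total

conv = acyclic_convolution([1, 2, 3], [4, 5, 6])   # → [4, 13, 28, 27, 18]
product = carry(conv)                              # → 56088
```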

There are two other types of convolutions that will be useful. Suppose the input sequences have n elements (here 3). Then the acyclic convolution has 2n − 1 elements; if we take the rightmost n elements and add the leftmost n − 1 elements, this produces the cyclic convolution:

  28  27  18
+      4  13

  28  31  31

If we perform carrying on the cyclic convolution, the result is equivalent to the product of the inputs mod B^n − 1. In the example, 10^3 − 1 = 999; performing carrying on (28, 31, 31) yields 3141, and 3141 ≡ 56088 (mod 999).

Conversely, if we take the rightmost n elements and subtract the leftmost n−1 elements, this produces the negacyclic convolution:

  28  27  18
−      4  13

  28  23   5

If we perform carrying on the negacyclic convolution, the result is equivalent to the product of the inputs mod B^n + 1. In the example, 10^3 + 1 = 1001; performing carrying on (28, 23, 5) yields 3035, and 3035 ≡ 56088 (mod 1001). The negacyclic convolution can contain negative numbers, which can be eliminated during carrying using borrowing, as is done in long subtraction.
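Both folds can be written as one short sketch (`fold` is an illustrative name), reusing the acyclic convolution (4, 13, 28, 27, 18) from the example:

```python
def fold(acyclic, n, sign):
    # fold the 2n-1 acyclic coefficients back onto n positions:
    # sign = +1 gives the cyclic convolution (product mod B^n - 1),
    # sign = -1 the negacyclic convolution (product mod B^n + 1)
    low = list(acyclic[-n:])    # rightmost n elements
    high = acyclic[:-n]         # leftmost n-1 elements
    for i, h in enumerate(high):
        low[i + 1] += sign * h  # wrap each high coefficient around
    return low

acyclic = [4, 13, 28, 27, 18]
cyclic = fold(acyclic, 3, +1)       # → [28, 31, 31]
negacyclic = fold(acyclic, 3, -1)   # → [28, 23, 5]
```

Carrying the two results in base 10 gives 3141 ≡ 56088 (mod 999) and 3035 ≡ 56088 (mod 1001), matching the worked example.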

Convolution theorem

Like other multiplication methods based on the fast Fourier transform, Schönhage–Strassen depends fundamentally on the convolution theorem, which provides an efficient way to compute the cyclic convolution of two sequences. It states that:

The cyclic convolution of two vectors can be found by taking the discrete Fourier transform (DFT) of each of them, multiplying the resulting vectors element by element, and then taking the inverse discrete Fourier transform (IDFT).

Or in symbols:

CyclicConvolution(X, Y) = IDFT(DFT(X) · DFT(Y))

If we compute the DFT and IDFT using a fast Fourier transform algorithm, and invoke our multiplication algorithm recursively to multiply the entries of the transformed vectors DFT(X) and DFT(Y), this yields an efficient algorithm for computing the cyclic convolution.
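As a concrete sketch, the convolution theorem can be applied with a number-theoretic transform in the integers mod 337, using 85 as an 8th root of unity, the same parameters as the 1234 × 5678 figure above. The naive O(n²) transform here stands in for the FFT a real implementation would use:

```python
M, root, n = 337, 85, 8   # modulus and 8th root of unity from the figure

def ntt(v, r):
    # naive number-theoretic DFT; Schönhage-Strassen uses a fast (FFT) version
    return [sum(v[j] * pow(r, i * j, M) for j in range(n)) % M
            for i in range(n)]

def cyclic_convolution(x, y):
    X, Y = ntt(x, root), ntt(y, root)
    pointwise = [a * b % M for a, b in zip(X, Y)]
    inv_n = pow(n, -1, M)                 # divide by n after the inverse transform
    return [c * inv_n % M for c in ntt(pointwise, pow(root, -1, M))]

# digits of 1234 and 5678, least significant first, zero-padded so that
# the cyclic convolution coincides with the acyclic one
x = [4, 3, 2, 1, 0, 0, 0, 0]
y = [8, 7, 6, 5, 0, 0, 0, 0]
conv = cyclic_convolution(x, y)           # → [32, 52, 61, 60, 34, 16, 5, 0]
print(sum(c * 10**i for i, c in enumerate(conv)))   # → 7006652
```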

In this algorithm, it will be more useful to compute the negacyclic convolution; as it turns out, a slightly modified version of the convolution theorem (see discrete weighted transform) can enable this as well. Suppose the vectors X and Y have length n, and a is a primitive root of unity of order 2n (that is, a^(2n) = 1 and no smaller power of a is 1). Then we can define a third vector A, called the weight vector, and its inverse as:

A = (a^j), 0 ≤ j < n
A^(−1) = (a^(−j)), 0 ≤ j < n

Now, we can state:

NegacyclicConvolution(X, Y) = A^(−1) · IDFT(DFT(A · X) · DFT(A · Y))

In other words, it's the same as before except that the inputs are first multiplied by A, and the result is multiplied by A^(−1).
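A sketch of this weighted-transform identity, again in the integers mod 337, with a = 85 serving as a primitive 2n-th root of unity for vectors of length n = 4 (helper names are illustrative; a direct computation from the definition is included as a cross-check):

```python
M, n, a = 337, 4, 85    # modulus, vector length, primitive 2n-th root of unity
w = pow(a, 2, M)        # a^2 is then an n-th root of unity for the DFT itself

def ntt(v, root):
    # naive O(n^2) number-theoretic DFT, for clarity only
    return [sum(v[j] * pow(root, i * j, M) for j in range(n)) % M
            for i in range(n)]

def negacyclic_dwt(x, y):
    # weight, transform, multiply pointwise, untransform, unweight
    A = [pow(a, j, M) for j in range(n)]
    X = ntt([xi * Aj % M for xi, Aj in zip(x, A)], w)
    Y = ntt([yi * Aj % M for yi, Aj in zip(y, A)], w)
    Z = ntt([p * q % M for p, q in zip(X, Y)], pow(w, -1, M))
    inv_n = pow(n, -1, M)
    return [z * inv_n % M * pow(Aj, -1, M) % M for z, Aj in zip(Z, A)]

def negacyclic_direct(x, y):
    # the definition of the negacyclic convolution, as a cross-check
    out = [0] * n
    for j in range(n):
        for k in range(n):
            if j + k < n:
                out[j + k] += x[j] * y[k]
            else:
                out[j + k - n] -= x[j] * y[k]
    return [v % M for v in out]

x = [4, 3, 2, 1]        # digits of 1234, least significant first
y = [8, 7, 6, 5]        # digits of 5678
result = negacyclic_dwt(x, y)   # equals negacyclic_direct(x, y)
```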

Choice of ring

The discrete Fourier transform is an abstract operation that can be performed in any algebraic ring; typically it's performed in the complex numbers, but actually performing complex arithmetic to sufficient precision to ensure accurate results for multiplication is slow and error-prone. Instead, we will use the approach of the number theoretic transform, which is to perform the transform in the integers mod N for some integer N.

Just like there are primitive roots of unity of every order in the complex plane, given any order n we can choose a suitable N such that b is a primitive root of unity of order n in the integers mod N (in other words, b^n ≡ 1 (mod N), and no smaller power of b is equivalent to 1 mod N).

The algorithm will spend most of its time performing recursive multiplications of smaller numbers; with a naive algorithm, these occur in a number of places:

  1. Inside the fast Fourier transform algorithm, where the primitive root of unity b is repeatedly powered, squared, and multiplied by other values.
  2. When taking powers of the primitive root of unity a to form the weight vector A and when multiplying A or A−1 by other vectors.
  3. When performing element-by-element multiplication of the transformed vectors.

The key insight to Schönhage–Strassen is to choose N, the modulus, to be equal to 2^n + 1 for some integer n that is a multiple of the number of pieces P = 2^p being transformed. This has a number of benefits in standard systems that represent large integers in binary form:

  • Any value can be rapidly reduced modulo 2^n + 1 using only shifts and adds, as explained in the next section.
  • All roots of unity used by the transform can be written in the form 2^k; consequently we can multiply or divide any number by a root of unity using a shift, and power or square a root of unity by operating only on its exponent.
  • The element-by-element recursive multiplications of the transformed vectors can be performed using a negacyclic convolution, which is faster than an acyclic convolution and already has "for free" the effect of reducing its result mod 2^n + 1.

To make the recursive multiplications convenient, we will frame Schönhage–Strassen as being a specialized multiplication algorithm for computing not just the product of two numbers, but the product of two numbers mod 2^n + 1 for some given n. This is not a loss of generality, since one can always choose n large enough so that the product mod 2^n + 1 is simply the product.

Shift optimizations

In the course of the algorithm, there are many cases in which multiplication or division by a power of two (including all roots of unity) can be profitably replaced by a small number of shifts and adds. This makes use of the observation that:

(2^n)^k ≡ (−1)^k (mod 2^n + 1)

Note that a k-digit number in base 2^n written in positional notation can be expressed as (d_(k−1), ..., d_1, d_0). It represents the number Σ_(i=0..k−1) d_i · (2^n)^i. Also note that for each d_i we have 0 ≤ d_i < 2^n.

This makes it simple to reduce a number represented in binary mod 2^n + 1: take the rightmost (least significant) n bits, subtract the next n bits, add the next n bits, and so on until the bits are exhausted. If the resulting value is still not between 0 and 2^n, normalize it by adding or subtracting a multiple of the modulus 2^n + 1. For example, if n = 3 (and so the modulus is 2^3 + 1 = 9) and the number being reduced is 656, we have:

656 = 1010010000₂ ≡ 000₂ − 010₂ + 010₂ − 1₂ = 0 − 2 + 2 − 1 = −1 ≡ 8 (mod 2^3 + 1).
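The alternating-chunk reduction can be sketched as follows (`reduce_mod` is an illustrative name; for brevity the final normalization falls back on Python's `%`):

```python
def reduce_mod(x, n):
    # reduce x modulo 2^n + 1 using only masks, shifts, and adds/subtracts
    mask = (1 << n) - 1
    total, sign = 0, 1
    while x:
        total += sign * (x & mask)   # take the next n bits...
        x >>= n
        sign = -sign                 # ...with alternating sign
    return total % ((1 << n) + 1)    # final normalization

print(reduce_mod(656, 3))   # → 8, matching the worked example
```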

Moreover, it's possible to effect very large shifts without ever constructing the shifted result. Suppose we have a number A between 0 and 2^n, and wish to multiply it by 2^k. Dividing k by n we find k = qn + r with r < n. It follows that:

A · 2^k = A · 2^(qn + r) = A · (2^n)^q · 2^r ≡ (−1)^q ShiftLeft(A, r) (mod 2^n + 1).

Since A ≤ 2^n and r < n, ShiftLeft(A, r) has at most 2n bits, and so only one shift and subtraction (followed by normalization) is needed.

Finally, to divide by 2^k, observe that squaring the first equivalence above yields:

2^(2n) ≡ 1 (mod 2^n + 1)

Hence,

A / 2^k = A · 2^(−k) ≡ A · 2^(2n − k) = ShiftLeft(A, 2n − k) (mod 2^n + 1).
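Both identities can be sketched together; the function names are illustrative, and for brevity the final normalization falls back on Python's `%` rather than the single shift-and-subtract described above:

```python
def shift_mul(A, k, n):
    # A * 2^k mod 2^n + 1, using (2^n)^q ≡ (-1)^q to keep the shift below n bits
    q, r = divmod(k, n)
    v = (A << r) if q % 2 == 0 else -(A << r)
    return v % ((1 << n) + 1)        # normalization

def shift_div(A, k, n):
    # A / 2^k mod 2^n + 1, rewritten as multiplication by 2^(2n - k)
    return shift_mul(A, (2 * n - k) % (2 * n), n)
```

For example, with n = 5 (modulus 33), `shift_mul(7, 13, 5)` agrees with `7 * 2**13 % 33`, and `shift_div` inverts the corresponding multiplication.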

Algorithm

The algorithm proceeds through split, evaluate (forward FFT), pointwise multiply, interpolate (inverse FFT), and combine phases, similar to the Karatsuba and Toom–Cook methods.

Given input numbers x and y, and an integer N, the following algorithm computes the product xy mod 2^N + 1. Provided N is sufficiently large, this is simply the product.

  1. Split each input number into vectors X and Y of 2^k parts each, where 2^k divides N (e.g. 12345678 → (12, 34, 56, 78) for 2^k = 4).
  2. In order to make progress, it's necessary to use a smaller N for recursive multiplications (denoted n below). For this purpose choose n as the smallest integer at least 2N/2^k + k and divisible by 2^k.
  3. Compute the product of X and Y mod 2^N + 1 using the negacyclic convolution:
    1. Multiply X and Y each by the weight vector A using shifts (shift the jth entry left by jn/2^k).
    2. Compute the DFT of X and Y using the number-theoretic FFT (perform all multiplications using shifts; for the 2^k-th root of unity, use 2^(2n/2^k)).
    3. Recursively apply this algorithm to multiply corresponding elements of the transformed X and Y.
    4. Compute the IDFT of the resulting vector to get the result vector C (perform all multiplications using shifts). This corresponds to the interpolation phase.
    5. Multiply the result vector C by A^(−1) using shifts.
    6. Adjust signs: some elements of the result may be negative. We compute the largest possible positive value for the jth element of C, (j + 1) · 2^(2N/2^k), and if an element exceeds this we subtract the modulus 2^n + 1.
  4. Finally, perform carrying mod 2^N + 1 to get the final result.
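The whole recipe can be modeled end to end in a toy sketch. It follows the steps above, but replaces the recursive pointwise multiplications with Python's built-in `*`, uses naive O(n²) transforms, and uses modular `pow` where a real implementation would use shifts; the function structure is illustrative only:

```python
def ssa_mulmod(x, y, N, k):
    """Toy model of the steps above: returns x*y mod 2^N + 1."""
    K = 1 << k                              # number of pieces, 2^k
    assert N % K == 0
    n = -(-(2 * N // K + k) // K) * K       # smallest multiple of K >= 2N/K + k
    M = (1 << n) + 1                        # coefficient modulus 2^n + 1
    piece = N // K                          # bits per piece
    X = [(x >> (piece * j)) & ((1 << piece) - 1) for j in range(K)]
    Y = [(y >> (piece * j)) & ((1 << piece) - 1) for j in range(K)]
    a = pow(2, n // K, M)                   # 2K-th root of unity (weights)
    w = pow(a, 2, M)                        # K-th root of unity for the DFT

    def dft(v, root):                       # naive transform, for clarity
        return [sum(v[j] * pow(root, i * j, M) for j in range(K)) % M
                for i in range(K)]

    Xw = dft([xi * pow(a, j, M) % M for j, xi in enumerate(X)], w)
    Yw = dft([yi * pow(a, j, M) % M for j, yi in enumerate(Y)], w)
    C = dft([p * q % M for p, q in zip(Xw, Yw)], pow(w, -1, M))
    invK = pow(K, -1, M)
    C = [c * invK % M * pow(a, -j, M) % M for j, c in enumerate(C)]
    # sign adjustment: element j cannot exceed (j + 1) * 2^(2N/2^k)
    C = [c - M if c > (j + 1) << (2 * N // K) else c for j, c in enumerate(C)]
    total = sum(c << (piece * j) for j, c in enumerate(C))
    return total % ((1 << N) + 1)           # final carrying/normalization
```

For instance, `ssa_mulmod(100, 200, 8, 2)` agrees with `100 * 200 % 257`.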

The optimal number of pieces to divide the input into is proportional to √N, where N is the number of input bits, and this setting achieves the running time of O(N log N log log N),[1][14] so the parameter k should be set accordingly. In practice, it is set empirically based on the input sizes and the architecture, typically to a value between 4 and 16.[13]

In step 2, the observation is used that:

  • Each element of the input vectors has at most N/2^k bits;
  • The product of any two input vector elements has at most 2N/2^k bits;
  • Each element of the convolution is the sum of at most 2^k such products, and so cannot exceed 2N/2^k + k bits;
  • n must be divisible by 2^k to ensure that in step 3.1 the 2·2^k-th root of unity exists for the weight vector A.

Optimizations

This section explains a number of important practical optimizations that have been considered when implementing Schönhage–Strassen in real systems. It is based primarily on a 2007 work by Gaudry, Kruppa, and Zimmermann describing enhancements to the GNU Multi-Precision Library.[13]

Below a certain cutoff point, it's more efficient to perform the recursive multiplications using other algorithms, such as Toom–Cook multiplication. The results must be reduced mod 2^n + 1, which can be done efficiently as explained above in Shift optimizations with shifts and adds/subtracts.

Computing the IDFT involves dividing each entry by the primitive root of unity 2^(2n/2^k), an operation that is frequently combined with multiplying the vector by A^(−1) afterwards, since both involve division by a power of two.

In a system where a large number is represented as an array of 2^w-bit words, it's useful to ensure that the vector size 2^k is also a multiple of the bits per word by choosing k ≥ w (e.g. choose k ≥ 5 on a 32-bit computer and k ≥ 6 on a 64-bit computer); this allows the inputs to be broken up into pieces without bit shifts, and provides a uniform representation for values mod 2^n + 1 where the high word can only be zero or one.

Normalization involves adding or subtracting the modulus 2^n + 1; this value has only two bits set, which means this can be done in constant time on average with a specialized operation.

Iterative FFT algorithms such as the Cooley–Tukey FFT algorithm, although frequently used for FFTs on vectors of complex numbers, tend to exhibit very poor cache locality with the large vector entries used in Schönhage–Strassen. The straightforward recursive, not in-place implementation of FFT is more successful, with all operations fitting in the cache beyond a certain point in the call depth, but still makes suboptimal use of the cache at higher call depths. Gaudry, Kruppa, and Zimmermann used a technique combining Bailey's 4-step algorithm with higher-radix transforms that combine multiple recursive steps. They also mix phases, going as far into the algorithm as possible on each element of the vector before moving on to the next one.

The "square root of 2 trick", first described by Schönhage, is to note that, provided k ≥ 2, 2^(3n/4) − 2^(n/4) is a square root of 2 mod 2^n + 1, and so a 4n-th root of unity (since 2^(2n) ≡ 1). This allows the transform length to be extended from 2^k to 2^(k+1).
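The claim is easy to check numerically; here for n = 8, so that the modulus is 257:

```python
n = 8
M = (1 << n) + 1                                       # 257
s = (pow(2, 3 * n // 4, M) - pow(2, n // 4, M)) % M    # 2^6 - 2^2 = 60
print(s * s % M)    # → 2, so s is indeed a square root of 2 mod 257
```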

Finally, the authors are careful to choose the right value of k for different ranges of input numbers, noting that the optimal value of k may go back and forth between the same values several times as the input size increases.

References

  1. ^ a b A. Schönhage and V. Strassen, "Schnelle Multiplikation großer Zahlen", Computing 7 (1971), pp. 281–292.
  2. ^ Martin Fürer, "Faster integer multiplication" (archived 2013-04-25 at the Wayback Machine), STOC 2007 Proceedings, pp. 57–66.
  3. ^ Covanov, Sviatoslav; Mohajerani, Davood; Moreno-Maza, Marc; Wang, Lin-Xiao (2018-11-04). "Putting Fürer Algorithm into Practice with the BPAS Library". arXiv:1811.01490 [cs.SC].
  4. ^ "FFT over large vectors (> 2^16): FFT-test.exe test1 error in index · Issue #1 · orcca-uwo/BPAS". GitHub. 2021-09-04. Archived from the original on 2021-09-04. Retrieved 2022-08-30.
  5. ^ Harvey, David; van der Hoeven, Joris (2021). "Integer multiplication in time O(n log n)". Annals of Mathematics. Second Series. 193 (2): 563–617. doi:10.4007/annals.2021.193.2.4. MR 4224716. S2CID 109934776.
  6. ^ Van Meter, Rodney; Itoh, Kohei M. (2005). "Fast Quantum Modular Exponentiation". Physical Review. A. 71 (5): 052320. arXiv:quant-ph/0408006. Bibcode:2005PhRvA..71e2320V. doi:10.1103/PhysRevA.71.052320. S2CID 14983569.
  7. ^ Overview of Magma V2.9 Features, arithmetic section (archived 2006-08-20 at the Wayback Machine): discusses practical crossover points between various algorithms.
  8. ^ Luis Carlos Coronado García, "Can Schönhage multiplication speed up the RSA encryption or decryption?", University of Technology, Darmstadt (2005).
  9. ^ "FFT Multiplication (GNU MP 6.2.1)". gmplib.org. Retrieved 2021-07-20.
  10. ^ "MUL_FFT_THRESHOLD". GMP developers' corner. Archived from the original on 24 November 2010. Retrieved 3 November 2011.
  11. ^ "MUL_FFT_THRESHOLD". gmplib.org. Retrieved 2021-07-20.
  12. ^ "An improved BigInteger class which uses efficient algorithms, including Schönhage–Strassen". Oracle. Retrieved 2014-01-10.
  13. ^ a b c Pierrick Gaudry, Alexander Kruppa, and Paul Zimmermann. A GMP-based Implementation of Schönhage–Strassen’s Large Integer Multiplication Algorithm. Proceedings of the 2007 International Symposium on Symbolic and Algebraic Computation, pp.167–174.
  14. ^ a b R. Crandall & C. Pomerance. Prime Numbers – A Computational Perspective. Second Edition, Springer, 2005. Section 9.5.6: Schönhage method, p. 502. ISBN 0-387-94777-9
  15. ^ Donald E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd Edition), 1997. Addison-Wesley Professional, ISBN 0-201-89684-2. Section 4.3.3.C: Discrete Fourier transforms, pg.305.
