fbpx
Wikipedia

Iterative Stencil Loops

Iterative Stencil Loops (ISLs) or Stencil computations are a class of numerical data processing solution[1] which update array elements according to some fixed pattern, called a stencil.[2] They are most commonly found in computer simulations, e.g. for computational fluid dynamics in the context of scientific and engineering applications. Other notable examples include solving partial differential equations,[1] the Jacobi kernel, the Gauss–Seidel method,[2] image processing[1] and cellular automata.[3] The regular structure of the arrays sets stencil techniques apart from other modeling methods such as the Finite element method. Most finite difference codes which operate on regular grids can be formulated as ISLs.

The shape of a 7-point 3D von Neumann style stencil.

Definition edit

ISLs perform a sequence of sweeps (called timesteps) through a given array.[2] Generally this is a 2- or 3-dimensional regular grid.[3] The elements of the arrays are often referred to as cells. In each timestep, all array elements are updated.[2] Using neighboring array elements in a fixed pattern (the stencil), each cell's new value is computed. In most cases boundary values are left unchanged, but in some cases (e.g. LBM codes) those need to be adjusted during the computation as well. Since the stencil is the same for each element, the pattern of data accesses is repeated.[4]

More formally, we may define ISLs as a 5-tuple   with the following meaning:[3]

  •   is the index set. It defines the topology of the array.
  •   is the (not necessarily finite) set of states, one of which each cell may take on on any given timestep.
  •   defines the initial state of the system at time 0.
  •   is the stencil itself and describes the actual shape of the neighborhood. There are   elements in the stencil.
  •   is the transition function which is used to determine a cell's new state, depending on its neighbors.

Since I is a k-dimensional integer interval, the array will always have the topology of a finite regular grid. The array is also called simulation space and individual cells are identified by their index  . The stencil is an ordered set of   relative coordinates. We can now obtain for each cell   the tuple of its neighbors indices  

 

Their states are given by mapping the tuple   to the corresponding tuple of states  , where   is defined as follows:

 

This is all we need to define the system's state for the following time steps   with  :

 

Note that   is defined on   and not just on   since the boundary conditions need to be set, too. Sometimes the elements of   may be defined by a vector addition modulo the simulation space's dimension to realize toroidal topologies:

 

This may be useful for implementing periodic boundary conditions, which simplifies certain physical models.

Example: 2D Jacobi iteration edit

 
Data dependencies of a selected cell in the 2D array.

To illustrate the formal definition, we'll have a look at how a two dimensional Jacobi iteration can be defined. The update function computes the arithmetic mean of a cell's four neighbors. In this case we set off with an initial solution of 0. The left and right boundary are fixed at 1, while the upper and lower boundaries are set to 0. After a sufficient number of iterations, the system converges against a saddle-shape.

 
 
 
 
 
 
 
 
 
 
 
 
 
2D Jacobi Iteration on a   Array

Stencils edit

The shape of the neighborhood used during the updates depends on the application itself. The most common stencils are the 2D or 3D versions of the von Neumann neighborhood and Moore neighborhood. The example above uses a 2D von Neumann stencil while LBM codes generally use its 3D variant. Conway's Game of Life uses the 2D Moore neighborhood. That said, other stencils such as a 25-point stencil for seismic wave propagation[5] can be found, too.

 
9-point 2D stencil
 
5-point 2D stencil
 
7-point 3D stencil
 
25-point 3D stencil
A selection of stencils used in various scientific applications.

Implementation issues edit

Many simulation codes may be formulated naturally as ISLs. Since computing time and memory consumption grow linearly with the number of array elements, parallel implementations of ISLs are of paramount importance to research.[6] This is challenging since the computations are tightly coupled (because of the cell updates depending on neighboring cells) and most ISLs are memory bound (i.e. the ratio of memory accesses to calculations is high).[7] Virtually all current parallel architectures have been explored for executing ISLs efficiently;[8] at the moment GPGPUs have proven to be most efficient.[9]

Libraries edit

Due to both the importance of ISLs to computer simulations and their high computational requirements, there are a number of efforts which aim at creating reusable libraries to support scientists in performing stencil-based computations. The libraries are mostly concerned with the parallelization, but may also tackle other challenges, such as IO, steering and checkpointing. They may be classified by their API.

Patch-based libraries edit

This is a traditional design. The library manages a set of n-dimensional scalar arrays, which the user program may access to perform updates. The library handles the synchronization of the boundaries (dubbed ghost zone or halo). The advantage of this interface is that the user program may loop over the arrays, which makes it easy to integrate legacy code [10] . The disadvantage is that the library can not handle cache blocking (as this has to be done within the loops[11]) or wrapping of the API-calls for accelerators (e.g. via CUDA or OpenCL). Implementations include Cactus, a physics problem solving environment, and waLBerla.

Cell-based libraries edit

These libraries move the interface to updating single simulation cells: only the current cell and its neighbors are exposed, e.g. via getter/setter methods. The advantage of this approach is that the library can control tightly which cells are updated in which order, which is useful not only to implement cache blocking,[9] but also to run the same code on multi-cores and GPUs.[12] This approach requires the user to recompile the source code together with the library. Otherwise a function call for every cell update would be required, which would seriously impair performance. This is only feasible with techniques such as class templates or metaprogramming, which is also the reason why this design is only found in newer libraries. Examples are Physis and LibGeoDecomp.

See also edit

References edit

  1. ^ a b c Roth, Gerald et al. (1997) Proceedings of SC'97: High Performance Networking and Computing. Compiling Stencils in High Performance Fortran.
  2. ^ a b c d Sloot, Peter M.A. et al. (May 28, 2002) Computational Science – ICCS 2002: International Conference, Amsterdam, the Netherlands, April 21–24, 2002. Proceedings, Part I. Page 843. Publisher: Springer. ISBN 3-540-43591-3.
  3. ^ a b c Fey, Dietmar et al. (2010) Grid-Computing: Eine Basistechnologie für Computational Science. Page 439. Publisher: Springer. ISBN 3-540-79746-7
  4. ^ Yang, Laurence T.; Guo, Minyi. (August 12, 2005) High-Performance Computing : Paradigm and Infrastructure. Page 221. Publisher: Wiley-Interscience. ISBN 0-471-65471-X
  5. ^ Micikevicius, Paulius et al. (2009) 3D finite difference computation on GPUs using CUDA Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units ISBN 978-1-60558-517-8
  6. ^ Datta, Kaushik (2009) Auto-tuning Stencil Codes for Cache-Based Multicore Platforms 2012-10-08 at the Wayback Machine, Ph.D. Thesis
  7. ^ Wellein, G et al. (2009) Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization, 33rd Annual IEEE International Computer Software and Applications Conference, COMPSAC 2009
  8. ^ Datta, Kaushik et al. (2008) Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, SC '08 Proceedings of the 2008 ACM/IEEE conference on Supercomputing
  9. ^ a b Schäfer, Andreas and Fey, Dietmar (2011) High Performance Stencil Code Algorithms for GPGPUs, Proceedings of the International Conference on Computational Science, ICCS 2011
  10. ^ S. Donath, J. Götz, C. Feichtinger, K. Iglberger and U. Rüde (2010) waLBerla: Optimization for Itanium-based Systems with Thousands of Processors, High Performance Computing in Science and Engineering, Garching/Munich 2009
  11. ^ Nguyen, Anthony et al. (2010) 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs, SC '10 Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  12. ^ Naoya Maruyama, Tatsuo Nomura, Kento Sato, and Satoshi Matsuoka (2011) Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers, SC '11 Proceedings of the 2011 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

External links edit

  • Physis
  • LibGeoDecomp
  • waLBerla

iterative, stencil, loops, isls, stencil, computations, class, numerical, data, processing, solution, which, update, array, elements, according, some, fixed, pattern, called, stencil, they, most, commonly, found, computer, simulations, computational, fluid, dy. Iterative Stencil Loops ISLs or Stencil computations are a class of numerical data processing solution 1 which update array elements according to some fixed pattern called a stencil 2 They are most commonly found in computer simulations e g for computational fluid dynamics in the context of scientific and engineering applications Other notable examples include solving partial differential equations 1 the Jacobi kernel the Gauss Seidel method 2 image processing 1 and cellular automata 3 The regular structure of the arrays sets stencil techniques apart from other modeling methods such as the Finite element method Most finite difference codes which operate on regular grids can be formulated as ISLs The shape of a 7 point 3D von Neumann style stencil Contents 1 Definition 1 1 Example 2D Jacobi iteration 2 Stencils 3 Implementation issues 4 Libraries 4 1 Patch based libraries 4 2 Cell based libraries 5 See also 6 References 7 External linksDefinition editISLs perform a sequence of sweeps called timesteps through a given array 2 Generally this is a 2 or 3 dimensional regular grid 3 The elements of the arrays are often referred to as cells In each timestep all array elements are updated 2 Using neighboring array elements in a fixed pattern the stencil each cell s new value is computed In most cases boundary values are left unchanged but in some cases e g LBM codes those need to be adjusted during the computation as well Since the stencil is the same for each element the pattern of data accesses is repeated 4 More formally we may define ISLs as a 5 tuple I S S0 s T displaystyle I S S 0 s T nbsp with the following meaning 3 I i 1k 0 ni displaystyle I prod i 1 k 0 ldots n i nbsp is the index set It defines the topology of the array S displaystyle S nbsp is the not necessarily finite set of states one of which each cell may take on on any given timestep S0 Zk S displaystyle S 0 colon mathbb Z k to S nbsp defines the initial state of the system at time 0 s i 1lZk displaystyle s in prod i 1 l mathbb Z k nbsp is the stencil itself and describes the actual shape of the neighborhood There are l displaystyle l nbsp elements in the stencil T Sl S displaystyle T colon S l to S nbsp is the transition function which is used to determine a cell s new state depending on its neighbors Since I is a k dimensional integer interval the array will always have the topology of a finite regular grid The array is also called simulation space and individual cells are identified by their index c I displaystyle c in I nbsp The stencil is an ordered set of l displaystyle l nbsp relative coordinates We can now obtain for each cell c displaystyle c nbsp the tuple of its neighbors indices Ic displaystyle I c nbsp Ic j x s j c x displaystyle I c j mid exists x in s j c x nbsp Their states are given by mapping the tuple Ic displaystyle I c nbsp to the corresponding tuple of states Ni c displaystyle N i c nbsp where Ni I Sl displaystyle N i colon I to S l nbsp is defined as follows Ni c s1 sl with sj Si Ic j displaystyle N i c s 1 ldots s l text with s j S i I c j nbsp This is all we need to define the system s state for the following time steps Si 1 Zk S displaystyle S i 1 colon mathbb Z k to S nbsp with i N displaystyle i in mathbb N nbsp Si 1 c T Ni c c ISi c c Zk I displaystyle S i 1 c begin cases T N i c amp c in I S i c amp c in mathbb Z k setminus I end cases nbsp Note that Si displaystyle S i nbsp is defined on Zk displaystyle mathbb Z k nbsp and not just on I displaystyle I nbsp since the boundary conditions need to be set too Sometimes the elements of Ic displaystyle I c nbsp may be defined by a vector addition modulo the simulation space s dimension to realize toroidal topologies Ic j x s j c x mod n1 nk displaystyle I c j mid exists x in s j c x mod n 1 ldots n k nbsp This may be useful for implementing periodic boundary conditions which simplifies certain physical models Example 2D Jacobi iteration edit nbsp Data dependencies of a selected cell in the 2D array To illustrate the formal definition we ll have a look at how a two dimensional Jacobi iteration can be defined The update function computes the arithmetic mean of a cell s four neighbors In this case we set off with an initial solution of 0 The left and right boundary are fixed at 1 while the upper and lower boundaries are set to 0 After a sufficient number of iterations the system converges against a saddle shape I 0 99 2S RS0 Z2 RS0 x y 1 x lt 00 0 x lt 1001 x 100s 0 1 1 0 1 0 0 1 T R4 RT x1 x2 x3 x4 0 25 x1 x2 x3 x4 displaystyle begin aligned I amp 0 ldots 99 2 S amp mathbb R S 0 amp mathbb Z 2 to mathbb R S 0 x y amp begin cases 1 amp x lt 0 0 amp 0 leq x lt 100 1 amp x geq 100 end cases s amp 0 1 1 0 1 0 0 1 T amp colon mathbb R 4 to mathbb R T x 1 x 2 x 3 x 4 amp 0 25 cdot x 1 x 2 x 3 x 4 end aligned nbsp nbsp S0 displaystyle S 0 nbsp nbsp S200 displaystyle S 200 nbsp nbsp S400 displaystyle S 400 nbsp nbsp S600 displaystyle S 600 nbsp nbsp S800 displaystyle S 800 nbsp nbsp S1000 displaystyle S 1000 nbsp 2D Jacobi Iteration on a 1002 displaystyle 100 2 nbsp ArrayStencils editThe shape of the neighborhood used during the updates depends on the application itself The most common stencils are the 2D or 3D versions of the von Neumann neighborhood and Moore neighborhood The example above uses a 2D von Neumann stencil while LBM codes generally use its 3D variant Conway s Game of Life uses the 2D Moore neighborhood That said other stencils such as a 25 point stencil for seismic wave propagation 5 can be found too nbsp 9 point 2D stencil nbsp 5 point 2D stencil nbsp 7 point 3D stencil nbsp 25 point 3D stencilA selection of stencils used in various scientific applications Implementation issues editMany simulation codes may be formulated naturally as ISLs Since computing time and memory consumption grow linearly with the number of array elements parallel implementations of ISLs are of paramount importance to research 6 This is challenging since the computations are tightly coupled because of the cell updates depending on neighboring cells and most ISLs are memory bound i e the ratio of memory accesses to calculations is high 7 Virtually all current parallel architectures have been explored for executing ISLs efficiently 8 at the moment GPGPUs have proven to be most efficient 9 Libraries editDue to both the importance of ISLs to computer simulations and their high computational requirements there are a number of efforts which aim at creating reusable libraries to support scientists in performing stencil based computations The libraries are mostly concerned with the parallelization but may also tackle other challenges such as IO steering and checkpointing They may be classified by their API Patch based libraries edit This is a traditional design The library manages a set of n dimensional scalar arrays which the user program may access to perform updates The library handles the synchronization of the boundaries dubbed ghost zone or halo The advantage of this interface is that the user program may loop over the arrays which makes it easy to integrate legacy code 10 The disadvantage is that the library can not handle cache blocking as this has to be done within the loops 11 or wrapping of the API calls for accelerators e g via CUDA or OpenCL Implementations include Cactus a physics problem solving environment and waLBerla Cell based libraries edit These libraries move the interface to updating single simulation cells only the current cell and its neighbors are exposed e g via getter setter methods The advantage of this approach is that the library can control tightly which cells are updated in which order which is useful not only to implement cache blocking 9 but also to run the same code on multi cores and GPUs 12 This approach requires the user to recompile the source code together with the library Otherwise a function call for every cell update would be required which would seriously impair performance This is only feasible with techniques such as class templates or metaprogramming which is also the reason why this design is only found in newer libraries Examples are Physis and LibGeoDecomp See also editAdvanced Simulation Library Finite difference method Computer simulation Five point stencil Compact stencil Non compact stencil Stencil jumping Stencil numerical analysis References edit a b c Roth Gerald et al 1997 Proceedings of SC 97 High Performance Networking and Computing Compiling Stencils in High Performance Fortran a b c d Sloot Peter M A et al May 28 2002 Computational Science ICCS 2002 International Conference Amsterdam the Netherlands April 21 24 2002 Proceedings Part I Page 843 Publisher Springer ISBN 3 540 43591 3 a b c Fey Dietmar et al 2010 Grid Computing Eine Basistechnologie fur Computational Science Page 439 Publisher Springer ISBN 3 540 79746 7 Yang Laurence T Guo Minyi August 12 2005 High Performance Computing Paradigm and Infrastructure Page 221 Publisher Wiley Interscience ISBN 0 471 65471 X Micikevicius Paulius et al 2009 3D finite difference computation on GPUs using CUDA Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units ISBN 978 1 60558 517 8 Datta Kaushik 2009 Auto tuning Stencil Codes for Cache Based Multicore Platforms Archived 2012 10 08 at the Wayback Machine Ph D Thesis Wellein G et al 2009 Efficient temporal blocking for stencil computations by multicore aware wavefront parallelization 33rd Annual IEEE International Computer Software and Applications Conference COMPSAC 2009 Datta Kaushik et al 2008 Stencil computation optimization and auto tuning on state of the art multicore architectures SC 08 Proceedings of the 2008 ACM IEEE conference on Supercomputing a b Schafer Andreas and Fey Dietmar 2011 High Performance Stencil Code Algorithms for GPGPUs Proceedings of the International Conference on Computational Science ICCS 2011 S Donath J Gotz C Feichtinger K Iglberger and U Rude 2010 waLBerla Optimization for Itanium based Systems with Thousands of Processors High Performance Computing in Science and Engineering Garching Munich 2009 Nguyen Anthony et al 2010 3 5 D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs SC 10 Proceedings of the 2010 ACM IEEE International Conference for High Performance Computing Networking Storage and Analysis Naoya Maruyama Tatsuo Nomura Kento Sato and Satoshi Matsuoka 2011 Physis An Implicitly Parallel Programming Model for Stencil Computations on Large Scale GPU Accelerated Supercomputers SC 11 Proceedings of the 2011 ACM IEEE International Conference for High Performance Computing Networking Storage and AnalysisExternal links editPhysis LibGeoDecomp waLBerla Retrieved from https en wikipedia org w index php title Iterative Stencil Loops amp oldid 1210183906, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.