
Granularity (parallel computing)

In parallel computing, granularity (or grain size) of a task is a measure of the amount of work (or computation) which is performed by that task.[1]

Another definition of granularity takes into account the communication overhead between multiple processors or processing elements. It defines granularity as the ratio of computation time to communication time, where computation time is the time required to perform the computation of a task and communication time is the time required to exchange data between processors.[2]

If T_comp is the computation time and T_comm denotes the communication time, then the granularity G of a task can be calculated as:[2]

    G = T_comp / T_comm
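
For illustration (the numbers here are hypothetical), a task that computes for 100 ms and then spends 10 ms exchanging its results has granularity G = 100/10 = 10. The larger G is, the more computation the task performs per unit of communication, i.e. the coarser it is.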

Granularity is usually measured in terms of the number of instructions executed in a particular task.[1] Alternatively, granularity can also be specified in terms of the execution time of a program, combining the computation time and communication time.[1]

Types of parallelism

Depending on the amount of work which is performed by a parallel task, parallelism can be classified into three categories: fine-grained, medium-grained and coarse-grained parallelism.

Fine-grained parallelism

In fine-grained parallelism, a program is broken down into a large number of small tasks. These tasks are assigned individually to many processors. The amount of work associated with a parallel task is low and the work is evenly distributed among the processors. Hence, fine-grained parallelism facilitates load balancing.[3]

As each task processes less data, the number of processors required to perform the complete processing is high. This, in turn, increases the communication and synchronization overhead.

Fine-grained parallelism is best exploited in architectures which support fast communication. Shared memory architecture, which has a low communication overhead, is most suitable for fine-grained parallelism.

It is difficult for programmers to detect parallelism in a program, therefore, it is usually the compilers' responsibility to detect fine-grained parallelism.[1]

An example of a fine-grained system (from outside the parallel computing domain) is the system of neurons in our brain.[4]

Connection Machine (CM-2) and J-Machine are examples of fine-grain parallel computers that have grain size in the range of 4-5 μs.[1]

Coarse-grained parallelism

In coarse-grained parallelism, a program is split into large tasks. Due to this, a large amount of computation takes place in each processor. This might result in load imbalance, wherein certain tasks process the bulk of the data while others might be idle. Further, coarse-grained parallelism fails to exploit the parallelism in the program as most of the computation is performed sequentially on a single processor. The advantage of this type of parallelism is low communication and synchronization overhead.

Message-passing architecture takes a long time to communicate data among processes, which makes it suitable for coarse-grained parallelism.[1]

Cray Y-MP is an example of a coarse-grained parallel computer which has a grain size of about 20 s.[1]

Medium-grained parallelism

Medium-grained parallelism is defined relative to fine-grained and coarse-grained parallelism: it is a compromise between the two, with a task size and communication time greater than in fine-grained parallelism but lower than in coarse-grained parallelism. Most general-purpose parallel computers fall in this category.[4]

Intel iPSC is an example of a medium-grained parallel computer which has a grain size of about 10 ms.[1]

Example

Consider a 10*10 image that needs to be processed, where the processing of each of its 100 pixels is independent of the others.

Fine-grained parallelism: Assume there are 100 processors that are responsible for processing the 10*10 image. Ignoring the communication overhead, the 100 processors can process the 10*10 image in 1 clock cycle. Each processor works on 1 pixel of the image and then communicates its output to the other processors. This is an example of fine-grained parallelism.

Medium-grained parallelism: Consider that there are 25 processors processing the 10*10 image. The processing of the image will now take 4 clock cycles. This is an example of medium-grained parallelism.

Coarse-grained parallelism: Further, if we reduce the number of processors to 2, then the processing will take 50 clock cycles. Each processor needs to process 50 elements, which increases the computation time, but the communication overhead decreases as the number of processors which share data decreases. This case illustrates coarse-grained parallelism.

Fine-grain: Pseudocode for 100 processors (computation time: 1 clock cycle)

    void main() {
        switch (Processor_ID) {
            case 1: Compute element 1; break;
            case 2: Compute element 2; break;
            case 3: Compute element 3; break;
            ...
            case 100: Compute element 100; break;
        }
    }

Medium-grain: Pseudocode for 25 processors (computation time: 4 clock cycles)

    void main() {
        switch (Processor_ID) {
            case 1: Compute elements 1-4; break;
            case 2: Compute elements 5-8; break;
            case 3: Compute elements 9-12; break;
            ...
            case 25: Compute elements 97-100; break;
        }
    }

Coarse-grain: Pseudocode for 2 processors (computation time: 50 clock cycles)

    void main() {
        switch (Processor_ID) {
            case 1: Compute elements 1-50; break;
            case 2: Compute elements 51-100; break;
        }
    }
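
The switch-based pseudocode above can be expressed more compactly by deriving each processor's block of pixels from its ID. The following C sketch illustrates that partitioning; Processor_ID() and process_pixel() are hypothetical placeholders for the processor index and the per-pixel work, so this is only an illustration of the grain-size idea, not a complete parallel program.

    #define NUM_PIXELS 100

    /* Hypothetical helpers: Processor_ID() returns this processor's 0-based
       index and process_pixel(i) performs the work for one pixel. */
    int Processor_ID(void);
    void process_pixel(int i);

    /* Each of num_procs processors handles a contiguous block of pixels.
       Fewer processors means a larger block per processor, i.e. a coarser grain. */
    void process_my_block(int num_procs)
    {
        int per_proc = NUM_PIXELS / num_procs;   /* grain size in pixels */
        int first = Processor_ID() * per_proc;
        for (int i = first; i < first + per_proc; i++)
            process_pixel(i);
    }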

Levels of parallelism

Granularity is closely tied to the level of processing. A program can be broken down into four levels of parallelism:

  1. Instruction level
  2. Loop level
  3. Sub-routine level
  4. Program level

The highest amount of parallelism is achieved at the instruction level, followed by loop-level parallelism. At the instruction and loop levels, fine-grained parallelism is achieved. Typical grain size at the instruction level is 20 instructions, while the grain size at the loop level is 500 instructions.[1]
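
As a concrete sketch of loop-level parallelism, the following C fragment uses OpenMP to distribute the iterations of a loop across processors; the array and its size are arbitrary choices made only to illustrate the fine grain of the per-iteration work.

    #include <omp.h>

    #define N 1000

    /* Loop-level (fine-grained) parallelism: each iteration is a small,
       independent unit of work, and OpenMP splits the iterations among
       the available processors. */
    void scale(double a[N], double factor)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] *= factor;
    }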

At the sub-routine (or procedure) level the grain size is typically a few thousand instructions. Medium-grained parallelism is achieved at sub-routine level.[1]

At program-level, parallel execution of programs takes place. Granularity can be in the range of tens of thousands of instructions.[1] Coarse-grained parallelism is used at this level.

The table below shows the relationship between the levels of parallelism, grain size, and degree of parallelism.

Levels Grain Size Parallelism
Instruction level Fine Highest
Loop level Fine Moderate
Sub-routine level Medium Moderate
Program level Coarse Least

Impact of granularity on performance

Granularity affects the performance of parallel computers. Using fine grains or small tasks results in more parallelism and hence increases the speedup. However, overheads such as synchronization and scheduling can negatively impact the performance of fine-grained tasks. Increasing parallelism alone cannot give the best performance.[5]

In order to reduce the communication overhead, granularity can be increased. Coarse-grained tasks have less communication overhead but they often cause load imbalance. Hence, optimal performance is achieved between the two extremes of fine-grained and coarse-grained parallelism.[6]
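
A minimal cost model (a sketch with made-up constants, not drawn from the cited studies) makes this trade-off concrete: dividing the work among more tasks shrinks the per-task computation but adds communication overhead for every task, so the estimated total time first falls and then rises again.

    #include <stdio.h>

    int main(void)
    {
        double total_work = 1000.0;    /* total computation time, arbitrary units */
        double comm_per_task = 2.0;    /* communication overhead per task, arbitrary */

        /* Estimate the parallel time for different task counts: computation is
           divided among the tasks, while communication grows with their number. */
        for (int tasks = 1; tasks <= 256; tasks *= 2) {
            double time = total_work / tasks + comm_per_task * tasks;
            printf("%3d tasks: estimated time %.1f\n", tasks, time);
        }
        return 0;
    }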

Various studies[5][7][8] have proposed solutions to help determine the best granularity for parallel processing. Finding the best grain size depends on a number of factors and varies greatly from problem to problem.

See also

  Instruction-level parallelism
  Data parallelism

Citations

  1. ^ a b c d e f g h i j k Hwang, Kai (1992). Advanced Computer Architecture: Parallelism, Scalability, Programmability (1st ed.). McGraw-Hill Higher Education. ISBN 978-0070316225.
  2. ^ a b Kwiatkowski, Jan (9 September 2001). "Evaluation of Parallel Programs by Measurement of Its Granularity". Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science. Vol. 2328. pp. 145–153. doi:10.1007/3-540-48086-2_16. ISBN 9783540437925.
  3. ^ Barney, Blaise. Introduction to Parallel Computing.
  4. ^ a b Miller, Russ; Stout, Quentin F. (1996). Parallel Algorithms for Regular Architectures: Meshes and Pyramids. Cambridge, Mass.: MIT Press. pp. 5–6. ISBN 9780262132336.
  5. ^ a b Chen, Ding-Kai; Su, Hong-Men; Yew, Pen-Chung (1 January 1990). "The Impact of Synchronization and Granularity on Parallel Systems". Proceedings of the 17th Annual International Symposium on Computer Architecture. 18 (2SI): 239–248. CiteSeerX 10.1.1.51.3389. doi:10.1145/325164.325150. S2CID 16193537.
  6. ^ Yeung, Donald; Dally, William J.; Agarwal, Anant. "How to Choose the Grain Size of a Parallel Computer". CiteSeerX 10.1.1.66.3298.
  7. ^ McCreary, Carolyn; Gill, Helen (1 September 1989). "Automatic Determination of Grain Size for Efficient Parallel Processing". Commun. ACM. 32 (9): 1073–1078. doi:10.1145/66451.66454. ISSN 0001-0782. S2CID 14807217.
  8. ^ Kruatrachue, Boontee; Lewis, Ted (1 January 1988). "Grain Size Determination for Parallel Processing". IEEE Softw. 5 (1): 23–32. doi:10.1109/52.1991. ISSN 0740-7459. S2CID 2034255.
