Parallel RAM

In computer science, a parallel random-access machine (parallel RAM or PRAM) is a shared-memory abstract machine. As its name indicates, the PRAM is intended as the parallel-computing analogue of the random-access machine (RAM), which is not to be confused with random-access memory. Just as the RAM is used by sequential-algorithm designers to model algorithmic performance (such as time complexity), the PRAM is used by parallel-algorithm designers to model parallel algorithmic performance (such as time complexity, where the number of processors assumed is typically also stated). In the same way that the RAM model neglects practical issues such as the access time of cache memory versus main memory, the PRAM model neglects issues such as synchronization and communication, but provides any (problem-size-dependent) number of processors. Algorithm cost, for instance, is estimated using two parameters: O(time) and O(time × processor_number).
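As a rough illustration of the two cost parameters, the sketch below simulates a synchronous tree-style sum and reports the number of parallel steps (time) together with the product of time and processor count (cost). The function name `pram_sum`, the choice of n/2 processors, and the lockstep simulation are assumptions made for this example; they are not part of the model's definition.

```python
def pram_sum(values):
    """Simulate a tree-style parallel sum; return (total, steps, cost).

    Hypothetical helper: each pass of the while loop stands for one
    synchronous PRAM step in which every active processor adds one pair
    of cells. No cell is touched by two processors in the same step.
    """
    cells = list(values)
    n = len(cells)
    if n == 0:
        return 0, 0, 0
    processors = max(1, n // 2)  # processors assumed to be available
    steps = 0
    stride = 1
    while stride < n:
        # The additions below are independent of one another, so a PRAM
        # would perform all of them in a single time step.
        for i in range(0, n - stride, 2 * stride):
            cells[i] += cells[i + stride]
        stride *= 2
        steps += 1
    return cells[0], steps, steps * processors


if __name__ == "__main__":
    total, steps, cost = pram_sum(range(1, 9))
    print(total, steps, cost)  # 36 3 12
```

For eight inputs the simulation needs 3 steps on 4 processors, giving a cost of 12, whereas a sequential RAM needs 7 additions.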

Read/write conflicts

Read/write conflicts, commonly termed interlocking, arise when several processors access the same shared memory location simultaneously. They are resolved by one of the following strategies:

  1. Exclusive read exclusive write (EREW)—every memory cell can be read or written to by only one processor at a time
  2. Concurrent read exclusive write (CREW)—multiple processors can read a memory cell but only one can write at a time
  3. Exclusive read concurrent write (ERCW)—never considered[citation needed]
  4. Concurrent read concurrent write (CRCW)—multiple processors can read and write. A CRCW PRAM is sometimes called a concurrent random-access machine.[1]
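The four strategies above can be told apart by looking at the addresses touched in a single step. The sketch below is a minimal classifier under one assumption: reads and writes happen in separate phases of a step, so only read/read and write/write collisions on the same address count as concurrent. The function name `weakest_model` and the list-of-addresses encoding are hypothetical.

```python
from collections import Counter


def weakest_model(reads, writes):
    """Return the most restrictive PRAM variant that permits one step.

    reads and writes are lists of memory addresses, one entry per
    processor access in a single synchronous step (hypothetical encoding).
    """
    concurrent_read = any(count > 1 for count in Counter(reads).values())
    concurrent_write = any(count > 1 for count in Counter(writes).values())
    if concurrent_write:
        return "CRCW" if concurrent_read else "ERCW"
    return "CREW" if concurrent_read else "EREW"


if __name__ == "__main__":
    print(weakest_model([0, 1], [2, 3]))  # EREW
    print(weakest_model([0, 0], [1, 1]))  # CRCW
```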

Here, E and C stand for 'exclusive' and 'concurrent' respectively. Concurrent reads cause no discrepancies, while concurrent writes are further defined as one of the following:

  • Common: all processors write the same value; otherwise the write is illegal
  • Arbitrary: only one arbitrary attempt is successful; the others retire
  • Priority: processor rank indicates who gets to write
  • Reduction: the written values are combined by an array reduction operation such as SUM, logical AND, or MAX
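The concurrent-write rules listed above can be sketched as a single resolution function for one memory cell. This is an illustrative model only: the name `resolve_writes`, the `(rank, value)` encoding, the choice that the lowest rank wins under the priority rule, and picking the first request under the arbitrary rule are all assumptions made for the example.

```python
import operator
from functools import reduce


def resolve_writes(requests, rule, combine=operator.add):
    """Resolve concurrent writes to ONE cell and return the stored value.

    requests: non-empty list of (processor_rank, value) pairs.
    rule: "common", "arbitrary", "priority" or "reduction".
    combine: binary operation used by the "reduction" rule.
    """
    values = [value for _, value in requests]
    if rule == "common":
        if any(value != values[0] for value in values):
            raise ValueError("common CRCW: processors wrote different values")
        return values[0]
    if rule == "arbitrary":
        # Any single request may win; this sketch simply takes the first.
        return values[0]
    if rule == "priority":
        # Assumption: the lowest rank has the highest priority.
        return min(requests, key=lambda request: request[0])[1]
    if rule == "reduction":
        return reduce(combine, values)
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    requests = [(2, 5), (0, 7), (1, 3)]
    print(resolve_writes(requests, "priority"))        # 7
    print(resolve_writes(requests, "reduction"))       # 15
    print(resolve_writes(requests, "reduction", max))  # 7
```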

Several simplifying assumptions are made when developing algorithms for the PRAM:

  1. There is no limit on the number of processors in the machine.
  2. Any memory location is uniformly accessible from any processor.
  3. There is no limit on the amount of shared memory in the system.
  4. Resource contention is absent.
  5. The programs written on these machines are, in general, of type SIMD.

Algorithms of this kind are useful for understanding how to exploit concurrency by dividing the original problem into similar sub-problems and solving them in parallel. The formal 'P-RAM' model was introduced in Wyllie's 1979 thesis[2] with the aim of quantifying the analysis of parallel algorithms in a way analogous to the Turing machine. The analysis focused on a MIMD model of programming using a CREW model, but showed that many variants, including implementing a CRCW model and implementing on an SIMD machine, were possible with only constant overhead.

Implementation

PRAM algorithms cannot be parallelized on a combination of CPU and dynamic random-access memory (DRAM), because DRAM does not allow concurrent access to a single bank (not even to different addresses within the bank). They can, however, be implemented in hardware, for example by reading and writing the internal static random-access memory (SRAM) blocks of a field-programmable gate array (FPGA), where a CRCW algorithm can be used.

However, the test for practical relevance of PRAM (or RAM) algorithms depends on whether their cost model provides an effective abstraction of some computer; the structure of that computer can be quite different from the abstract model. The layers of software and hardware that need to be inserted are beyond the scope of this article. Nevertheless, Vishkin (2011) demonstrates how a PRAM-like abstraction can be supported by the explicit multi-threading (XMT) paradigm, and Caragea & Vishkin (2011) demonstrate that a PRAM algorithm for the maximum flow problem can provide strong speedups relative to the fastest serial program for the same problem. Ghanim, Vishkin & Barua (2018) demonstrated that PRAM algorithms as-is can achieve competitive performance even without any additional effort to cast them as multi-threaded programs on XMT.

Example code

This is an example of SystemVerilog code that finds the maximum value in an array in only 2 clock cycles. It compares all pairs of elements in the array at the first clock, and merges the results at the second clock. It uses CRCW memory: the assignments m[i] <= 1 and maxNo <= data[i] are written concurrently. The concurrency causes no conflicts because the algorithm guarantees that the same value is written to the same memory location. This code can be run on FPGA hardware.

    module FindMax #(parameter int len = 8)
                    (input bit clock, resetN, input bit[7:0] data[len], output bit[7:0] maxNo);
        typedef enum bit[1:0] {COMPARE, MERGE, DONE} State;

        State state;
        bit m[len];  // m[i] == 1 marks data[i] as smaller than some other element
        int i, j;

        always_ff @(posedge clock, negedge resetN) begin
            if (!resetN) begin
                // Reset: clear all marks and start in the COMPARE state
                for (i = 0; i < len; i++) m[i] <= 0;
                state <= COMPARE;
            end else begin
                case (state)
                    // Clock 1: compare every pair; all writers to m[i] write the value 1
                    COMPARE: begin
                        for (i = 0; i < len; i++) begin
                            for (j = 0; j < len; j++) begin
                                if (data[i] < data[j]) m[i] <= 1;
                            end
                        end
                        state <= MERGE;
                    end

                    // Clock 2: every unmarked element equals the maximum and is written to maxNo
                    MERGE: begin
                        for (i = 0; i < len; i++) begin
                            if (m[i] == 0) maxNo <= data[i];
                        end
                        state <= DONE;
                    end
                endcase
            end
        end
    endmodule
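For readers without an FPGA toolchain, the same two-step idea can be simulated sequentially. The sketch below is not a translation of the hardware semantics; it only mirrors the COMPARE and MERGE phases and checks that the concurrent writes agree, as the common CRCW rule requires. The function name `crcw_find_max` and the one-processor-per-pair reading of the loops are assumptions made for this example.

```python
def crcw_find_max(data):
    """Simulate the two-step common-CRCW maximum and return the maximum.

    Hypothetical model: every (i, j) pair is one processor in step 1 and
    every index i is one processor in step 2.
    """
    n = len(data)
    if n == 0:
        raise ValueError("data must be non-empty")

    # Step 1 (COMPARE): processor (i, j) writes 1 to m[i] if data[i] is
    # smaller than data[j]. Every writer to m[i] writes the same value.
    m = [0] * n
    for i in range(n):
        for j in range(n):
            if data[i] < data[j]:
                m[i] = 1

    # Step 2 (MERGE): every unmarked processor writes its element to the
    # result cell. Unmarked elements all equal the maximum, so the
    # concurrent writes carry the same value.
    writes = [data[i] for i in range(n) if m[i] == 0]
    assert all(value == writes[0] for value in writes)
    return writes[0]


if __name__ == "__main__":
    print(crcw_find_max([3, 9, 2, 9, 5]))  # 9
```

The simulation performs n² comparisons in total, which corresponds to the n² processors the constant-time PRAM version assumes.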

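See also

  • Analysis of PRAM algorithms
  • Flynn's taxonomy
  • Lock-free and wait-free algorithms
  • Random-access machine
  • Parallel programming model
  • XMTC
  • Parallel external memory (Model)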

References

  1. ^ Immerman, Neil (1989), "Expressibility and parallel complexity", SIAM Journal on Computing, 18 (3): 625–638
  2. ^ Wyllie, James C. (1979), The Complexity of Parallel Computations, PhD Thesis, Dept. of Computer Science, Cornell University
  • Eppstein, David; Galil, Zvi (1988), "Parallel algorithmic techniques for combinatorial computation", Annu. Rev. Comput. Sci., 3: 233–283, doi:10.1146/annurev.cs.03.060188.001313
  • JaJa, Joseph (1992), An Introduction to Parallel Algorithms, Addison-Wesley, ISBN 0-201-54856-9
  • Karp, Richard M.; Ramachandran, Vijaya (1988), A Survey of Parallel Algorithms for Shared-Memory Machines, University of California, Berkeley, Department of EECS, Tech. Rep. UCB/CSD-88-408
  • Keller, Jörg; Christoph Keßler; Jesper Träff (2001). Practical PRAM Programming. John Wiley and Sons. ISBN 0-471-35351-5.
  • Vishkin, Uzi (2009), Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques, 104 pages (PDF), Class notes of courses on parallel algorithms taught since 1992 at the University of Maryland, College Park, Tel Aviv University and the Technion
  • Vishkin, Uzi (2011), "Using simple abstraction to reinvent computing for parallelism", Communications of the ACM, 54: 75–85, doi:10.1145/1866739.1866757
  • Caragea, George Constantin; Vishkin, Uzi (2011), "Brief announcement: Better speedups for parallel max-flow", Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures - SPAA '11, p. 131, doi:10.1145/1989493.1989511, ISBN 9781450307437, S2CID 5511743
  • Ghanim, Fady; Vishkin, Uzi; Barua, Rajeev (2018), "Easy PRAM-based High-performance Parallel Programming with ICE", IEEE Transactions on Parallel and Distributed Systems, 29 (2): 377–390, doi:10.1109/TPDS.2017.2754376, hdl:1903/18521

External links

  • Saarland University's prototype PRAM
  • University Of Maryland's PRAM-On-Chip prototype. This prototype seeks to put many parallel processors and the fabric for inter-connecting them on a single chip
  • XMTC: PRAM-like Programming - Software release
