fbpx
Wikipedia

Explicit multi-threading

Explicit Multi-Threading (XMT) is a computer science paradigm for building and programming parallel computers designed around the parallel random-access machine (PRAM) parallel computational model. A more direct explanation of XMT starts with the rudimentary abstraction that made serial computing simple: that any single instruction available for execution in a serial program executes immediately. A consequence of this abstraction is a step-by-step (inductive) explication of the instruction available next for execution. The rudimentary parallel abstraction behind XMT, dubbed Immediate Concurrent Execution (ICE) in Vishkin (2011), is that indefinitely many instructions available for concurrent execution execute immediately. A consequence of ICE is a step-by-step (inductive) explication of the instructions available next for concurrent execution. Moving beyond the serial von Neumann computer (the only successful general-purpose platform to date), the aspiration of XMT is that computer science will again be able to augment mathematical induction with a simple one-line computing abstraction.

The random-access machine (RAM) is an abstract machine model used in computer science to study algorithms and complexity for standard serial computing. The PRAM computational model is an abstract parallel machine model that had been introduced to similarly study parallel algorithms and complexity for parallel computing, when they were yet to be built. Researchers have developed a large body of knowledge of parallel algorithms for the PRAM model. These parallel algorithms are also known for being simple, by standards of other approaches to parallel algorithms.

This large body of parallel algorithms knowledge for the PRAM model and their relative simplicity motivated building computers whose programming can be guided by these parallel algorithms. Since productivity of parallel programmers has long been considered crucial for the success a parallel computer, simplicity of algorithms is important.

Multi-core computers are built around two or more processor cores integrated on a single integrated circuit die. They are widely used across many application domains including general-purpose computing. Explicit Multi-Threading (XMT) is a computing paradigm for building and programming multi-core computers with tens, hundreds or thousands of processor cores.

Experimental work published in 2011 and 2012 demonstrates significantly greater speedups for advanced PRAM algorithms on XMT prototypes than for the same problems on state-of-the-art multi-core computers.

Work published in 2018 shows that lock-step parallel programming (using ICE) can achieve the same performance as the fastest hand-tuned multi-threaded code on XMT systems. Such inductive lock-step approach stands in contrast to multi-threaded programming approaches of many other core systems that are known for challenging programmers.

The XMT paradigm was introduced by Uzi Vishkin.

The main levels of abstraction of XMT edit

The Explicit Multi-Threading (XMT) computing paradigm integrates several levels of abstraction.

The work-time (WT) (sometimes called work-depth) framework, introduced by Shiloach & Vishkin (1982), provides a simple way for conceptualizing and describing parallel algorithms. In the WT framework, a parallel algorithm is first described in terms of parallel rounds. For each round, the operations to be performed are characterized, but several issues can be suppressed. For example, the number of operations at each round need not be clear, processors need not be mentioned and any information that may help with the assignment of processors to jobs need not be accounted for. Second, the suppressed information is provided. The inclusion of the suppressed information is, in fact, guided by the proof of a scheduling theorem due to Brent (1974). The WT framework is useful since while it can greatly simplify the initial description of a parallel algorithm, inserting the details suppressed by that initial description is often not very difficult. For example, the WT framework was adopted as the basic presentation framework in the parallel algorithms books (for the PRAM model) JaJa (1992) and Keller, Kessler & Traeff (2001), as well as in the class notes Vishkin (2009). Vishkin (2011) explains the simple connection between the WT framework and the more rudimentary ICE abstraction noted above.

The XMT paradigm can be programmed using XMTC, a parallel multi-threaded programming language which is a small extension of the programming language C. The XMT paradigm include a programmer's workflow that starts with casting an algorithm in the WT framework and proceeds to programming it in XMTC.

The XMT multi-core computer systems provides run-time load-balancing of multi-threaded programs incorporating several patents. One of them [1] generalizes the program counter concept, which is central to the von Neumann architecture to multi-core hardware.

XMT prototyping and links to more information edit

In January 2007, a 64-processor computer [2] named Paraleap,[3] that demonstrates the overall concept was completed. The XMT concept was presented in Vishkin et al. (1998) and Naishlos et al. (2003) and the XMT 64-processor computer in Wen & Vishkin (2008). Since making parallel programming easy is one of the biggest challenges facing computer science today, the demonstration also sought to include teaching the basics of PRAM algorithms and XMTC programming to students ranging from high-school Torbert et al. (2010) to graduate school.

Experimental work reported in Caragea & Vishkin (2011) for the Maximum flow problem, and in two papers by Edwards and Vishkin (2012a, 2012b) for the Graph Connectivity (Connectivity (graph theory)), Graph Biconnectivity (biconnected graph) and Graph Triconnectivity (Triconnected component) problems demonstrated that for some of the most advanced algorithms in the parallel algorithmic literature, the XMT paradigm can offer 8 times to over 100 times greater speedups than for the same problems on state-of-the-art multi-core computers. Each reported speedup was obtained by comparing clock cycles on an XMT prototype relative to the fastest serial algorithm running on the fastest serial machines.

XMT prototyping was culminated in Ghanim, Vishkin & Barua (2018), establishing that lock-step parallel programming (using ICE) can achieve the same performance as the fastest hand-tuned multi-threaded code on XMT systems. This 2018 result sharpens the contrast between XMT programming and the multi-threaded programming approaches employed by nearly all many other-core systems, whose race conditions and other demands tend to challenge, and sometimes even fail programmers Vishkin (2014).

References edit

  • Brent, Richard P. (1974), "The parallel evaluation of general arithmetic expressions", Journal of the ACM, 21 (2): 201–208, CiteSeerX 10.1.1.100.9361, doi:10.1145/321812.321815, S2CID 16416106.
  • Shiloach, Yossi; Vishkin, Uzi (1982), "An O(n2 log n) parallel max-flow algorithm", Journal of Algorithms, 3 (2): 128–146, doi:10.1016/0196-6774(82)90013-X.
  • JaJa, Joseph (1992), An Introduction to Parallel Algorithms, Addison-Wesley, ISBN 978-0-201-54856-3
  • Keller, Jorg; Kessler, Cristoph W.; Traeff, Jesper L. (2001), Practical PRAM Programming, Wiley-Interscience, ISBN 978-0-471-35351-5
  • Naishlos, Dorit; Nuzman, Joseph; Tseng, Chau-Wen; Vishkin, Uzi (2003), "Towards a First Vertical Prototyping of an Extremely Fine-Grained Parallel Programming Approach" (PDF), Theory of Computing Systems, 36 (5): 551–552, doi:10.1007/s00224-003-1086-6, S2CID 1929495.
  • Torbert, Shane; Vishkin, Uzi; Tzur, Ron; Ellison, David (2010), "Is teaching parallel algorithmic thinking to high school students possible?", Proceedings of the 41st ACM technical symposium on Computer science education - SIGCSE '10, p. 290, doi:10.1145/1734263.1734363, ISBN 9781450300063.
  • Vishkin, Uzi; Dascal, Shlomit; Berkovich, Efraim; Nuzman, Joseph (1998), "Explicit Multi-Threading (XMT) bridging models for instruction parallelism", Proc. 1998 ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 140–151.
  • Vishkin, Uzi (2009), Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques, 104 pages (PDF), Class notes of courses on parallel algorithms taught since 1992 at the University of Maryland, College Park, Tel Aviv University and the Technion
  • Wen, Xingzhi; Vishkin, Uzi (2008), "FPGA-based prototype of a PRAM-on-chip processor", Proc. 2008 ACM Conference on Computing Frontiers (Ischia, Italy) (PDF), pp. 55–66, doi:10.1145/1366230.1366240, ISBN 9781605580777, S2CID 11557669.
  • Vishkin, Uzi (2011), "Using simple abstraction to reinvent computing for parallelism", Communications of the ACM, 54: 75–85, doi:10.1145/1866739.1866757.
  • Caragea, George; Vishkin, Uzi (2011), "Brief announcement: better speedups for parallel max-flow", Proc. 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 131–134, doi:10.1145/1989493.1989511, ISBN 9781450307437, S2CID 5511743.
  • Edwards, James A.; Vishkin, Uzi (2012a), "Better speedups using simpler parallel programming for graph connectivity and biconnectivity", Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 103–114, doi:10.1145/2141702.2141714, ISBN 9781450312110, S2CID 15095569.
  • Edwards, James A.; Vishkin, Uzi (2012b), "Brief announcement: speedups for parallel graph triconnectivity", Proc. 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 190–192, doi:10.1145/2312005.2312042, ISBN 9781450312134, S2CID 16908459.
  • Vishkin, Uzi (2014), "Is Multi-Core Hardware for General-Purpose Parallel Processing Broken? Viewpoint article", Communications of the ACM, 57 (4): 35–39, doi:10.1145/2580945, S2CID 30098719.
  • Ghanim, Fady; Vishkin, Uzi; Barua, Rajeev (February 2018), "Easy PRAM-Based High-Performance Parallel Programming with ICE", IEEE Transactions on Parallel and Distributed Systems, 29 (2): 377–390, doi:10.1109/TPDS.2017.2754376, hdl:1903/18521.

Notes edit

  1. ^ Vishkin, Uzi. Spawn-join instruction set architecture for providing explicit multithreading. U.S. Patent 6,463,527. See also Vishkin et al. (1998).
  2. ^ University of Maryland, press release, June 26, 2007: "Maryland Professor Creates Desktop Supercomputer" 2009-12-14 at the Wayback Machine.
  3. ^ University of Maryland, A. James Clark School of Engineering, press release, November 28, 2007: "Next Big "Leap" in Computing Technology Gets a Name".

External links edit

  • Home page of the XMT project, with links to a software release, on-line tutorial and to material for teaching parallelism.

explicit, multi, threading, explicit, multi, threading, computer, science, paradigm, building, programming, parallel, computers, designed, around, parallel, random, access, machine, pram, parallel, computational, model, more, direct, explanation, starts, with,. Explicit Multi Threading XMT is a computer science paradigm for building and programming parallel computers designed around the parallel random access machine PRAM parallel computational model A more direct explanation of XMT starts with the rudimentary abstraction that made serial computing simple that any single instruction available for execution in a serial program executes immediately A consequence of this abstraction is a step by step inductive explication of the instruction available next for execution The rudimentary parallel abstraction behind XMT dubbed Immediate Concurrent Execution ICE in Vishkin 2011 is that indefinitely many instructions available for concurrent execution execute immediately A consequence of ICE is a step by step inductive explication of the instructions available next for concurrent execution Moving beyond the serial von Neumann computer the only successful general purpose platform to date the aspiration of XMT is that computer science will again be able to augment mathematical induction with a simple one line computing abstraction The random access machine RAM is an abstract machine model used in computer science to study algorithms and complexity for standard serial computing The PRAM computational model is an abstract parallel machine model that had been introduced to similarly study parallel algorithms and complexity for parallel computing when they were yet to be built Researchers have developed a large body of knowledge of parallel algorithms for the PRAM model These parallel algorithms are also known for being simple by standards of other approaches to parallel algorithms This large body of parallel algorithms knowledge for the PRAM model and their relative simplicity motivated building computers whose programming can be guided by these parallel algorithms Since productivity of parallel programmers has long been considered crucial for the success a parallel computer simplicity of algorithms is important Multi core computers are built around two or more processor cores integrated on a single integrated circuit die They are widely used across many application domains including general purpose computing Explicit Multi Threading XMT is a computing paradigm for building and programming multi core computers with tens hundreds or thousands of processor cores Experimental work published in 2011 and 2012 demonstrates significantly greater speedups for advanced PRAM algorithms on XMT prototypes than for the same problems on state of the art multi core computers Work published in 2018 shows that lock step parallel programming using ICE can achieve the same performance as the fastest hand tuned multi threaded code on XMT systems Such inductive lock step approach stands in contrast to multi threaded programming approaches of many other core systems that are known for challenging programmers The XMT paradigm was introduced by Uzi Vishkin Contents 1 The main levels of abstraction of XMT 2 XMT prototyping and links to more information 3 References 4 Notes 5 External linksThe main levels of abstraction of XMT editThe Explicit Multi Threading XMT computing paradigm integrates several levels of abstraction The work time WT sometimes called work depth framework introduced by Shiloach amp Vishkin 1982 provides a simple way for conceptualizing and describing parallel algorithms In the WT framework a parallel algorithm is first described in terms of parallel rounds For each round the operations to be performed are characterized but several issues can be suppressed For example the number of operations at each round need not be clear processors need not be mentioned and any information that may help with the assignment of processors to jobs need not be accounted for Second the suppressed information is provided The inclusion of the suppressed information is in fact guided by the proof of a scheduling theorem due to Brent 1974 The WT framework is useful since while it can greatly simplify the initial description of a parallel algorithm inserting the details suppressed by that initial description is often not very difficult For example the WT framework was adopted as the basic presentation framework in the parallel algorithms books for the PRAM model JaJa 1992 and Keller Kessler amp Traeff 2001 as well as in the class notes Vishkin 2009 Vishkin 2011 explains the simple connection between the WT framework and the more rudimentary ICE abstraction noted above The XMT paradigm can be programmed using XMTC a parallel multi threaded programming language which is a small extension of the programming language C The XMT paradigm include a programmer s workflow that starts with casting an algorithm in the WT framework and proceeds to programming it in XMTC The XMT multi core computer systems provides run time load balancing of multi threaded programs incorporating several patents One of them 1 generalizes the program counter concept which is central to the von Neumann architecture to multi core hardware XMT prototyping and links to more information editIn January 2007 a 64 processor computer 2 named Paraleap 3 that demonstrates the overall concept was completed The XMT concept was presented in Vishkin et al 1998 and Naishlos et al 2003 and the XMT 64 processor computer in Wen amp Vishkin 2008 Since making parallel programming easy is one of the biggest challenges facing computer science today the demonstration also sought to include teaching the basics of PRAM algorithms and XMTC programming to students ranging from high school Torbert et al 2010 to graduate school Experimental work reported in Caragea amp Vishkin 2011 for the Maximum flow problem and in two papers by Edwards and Vishkin 2012a 2012b for the Graph Connectivity Connectivity graph theory Graph Biconnectivity biconnected graph and Graph Triconnectivity Triconnected component problems demonstrated that for some of the most advanced algorithms in the parallel algorithmic literature the XMT paradigm can offer 8 times to over 100 times greater speedups than for the same problems on state of the art multi core computers Each reported speedup was obtained by comparing clock cycles on an XMT prototype relative to the fastest serial algorithm running on the fastest serial machines XMT prototyping was culminated in Ghanim Vishkin amp Barua 2018 establishing that lock step parallel programming using ICE can achieve the same performance as the fastest hand tuned multi threaded code on XMT systems This 2018 result sharpens the contrast between XMT programming and the multi threaded programming approaches employed by nearly all many other core systems whose race conditions and other demands tend to challenge and sometimes even fail programmers Vishkin 2014 References editBrent Richard P 1974 The parallel evaluation of general arithmetic expressions Journal of the ACM 21 2 201 208 CiteSeerX 10 1 1 100 9361 doi 10 1145 321812 321815 S2CID 16416106 Shiloach Yossi Vishkin Uzi 1982 An O n2 log n parallel max flow algorithm Journal of Algorithms 3 2 128 146 doi 10 1016 0196 6774 82 90013 X JaJa Joseph 1992 An Introduction to Parallel Algorithms Addison Wesley ISBN 978 0 201 54856 3 Keller Jorg Kessler Cristoph W Traeff Jesper L 2001 Practical PRAM Programming Wiley Interscience ISBN 978 0 471 35351 5 Naishlos Dorit Nuzman Joseph Tseng Chau Wen Vishkin Uzi 2003 Towards a First Vertical Prototyping of an Extremely Fine Grained Parallel Programming Approach PDF Theory of Computing Systems 36 5 551 552 doi 10 1007 s00224 003 1086 6 S2CID 1929495 Torbert Shane Vishkin Uzi Tzur Ron Ellison David 2010 Is teaching parallel algorithmic thinking to high school students possible Proceedings of the 41st ACM technical symposium on Computer science education SIGCSE 10 p 290 doi 10 1145 1734263 1734363 ISBN 9781450300063 Vishkin Uzi Dascal Shlomit Berkovich Efraim Nuzman Joseph 1998 Explicit Multi Threading XMT bridging models for instruction parallelism Proc 1998 ACM Symposium on Parallel Algorithms and Architectures SPAA pp 140 151 Vishkin Uzi 2009 Thinking in Parallel Some Basic Data Parallel Algorithms and Techniques 104 pages PDF Class notes of courses on parallel algorithms taught since 1992 at the University of Maryland College Park Tel Aviv University and the Technion Wen Xingzhi Vishkin Uzi 2008 FPGA based prototype of a PRAM on chip processor Proc 2008 ACM Conference on Computing Frontiers Ischia Italy PDF pp 55 66 doi 10 1145 1366230 1366240 ISBN 9781605580777 S2CID 11557669 Vishkin Uzi 2011 Using simple abstraction to reinvent computing for parallelism Communications of the ACM 54 75 85 doi 10 1145 1866739 1866757 Caragea George Vishkin Uzi 2011 Brief announcement better speedups for parallel max flow Proc 23rd ACM Symposium on Parallelism in Algorithms and Architectures SPAA pp 131 134 doi 10 1145 1989493 1989511 ISBN 9781450307437 S2CID 5511743 Edwards James A Vishkin Uzi 2012a Better speedups using simpler parallel programming for graph connectivity and biconnectivity Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores pp 103 114 doi 10 1145 2141702 2141714 ISBN 9781450312110 S2CID 15095569 Edwards James A Vishkin Uzi 2012b Brief announcement speedups for parallel graph triconnectivity Proc 24th ACM Symposium on Parallelism in Algorithms and Architectures SPAA pp 190 192 doi 10 1145 2312005 2312042 ISBN 9781450312134 S2CID 16908459 Vishkin Uzi 2014 Is Multi Core Hardware for General Purpose Parallel Processing Broken Viewpoint article Communications of the ACM 57 4 35 39 doi 10 1145 2580945 S2CID 30098719 Ghanim Fady Vishkin Uzi Barua Rajeev February 2018 Easy PRAM Based High Performance Parallel Programming with ICE IEEE Transactions on Parallel and Distributed Systems 29 2 377 390 doi 10 1109 TPDS 2017 2754376 hdl 1903 18521 Notes edit Vishkin Uzi Spawn join instruction set architecture for providing explicit multithreading U S Patent 6 463 527 See also Vishkin et al 1998 University of Maryland press release June 26 2007 Maryland Professor Creates Desktop Supercomputer Archived 2009 12 14 at the Wayback Machine University of Maryland A James Clark School of Engineering press release November 28 2007 Next Big Leap in Computing Technology Gets a Name External links editHome page of the XMT project with links to a software release on line tutorial and to material for teaching parallelism Retrieved from https en wikipedia org w index php title Explicit multi threading amp oldid 1193503389, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.