
Algorithmic skeleton

In computing, algorithmic skeletons, or parallelism patterns, are a high-level parallel programming model for parallel and distributed computing.

Algorithmic skeletons take advantage of common programming patterns to hide the complexity of parallel and distributed applications. Starting from a basic set of patterns (skeletons), more complex patterns can be built by combining the basic ones.

Overview

The most outstanding feature of algorithmic skeletons, which differentiates them from other high-level parallel programming models, is that orchestration and synchronization of the parallel activities is implicitly defined by the skeleton patterns. Programmers do not have to specify the synchronizations between the application's sequential parts. This has two implications. First, as the communication/data access patterns are known in advance, cost models can be applied to schedule skeleton programs.[1] Second, algorithmic skeleton programming reduces the number of errors when compared to traditional lower-level parallel programming models (Threads, MPI).

Example program

The following example is based on the Java Skandium library for parallel programming.

The objective is to implement an Algorithmic Skeleton-based parallel version of the QuickSort algorithm using the Divide and Conquer pattern. Notice that the high-level approach hides Thread management from the programmer.

// 1. Define the skeleton program
Skeleton<Range, Range> sort = new DaC<Range, Range>(
    new ShouldSplit(threshold, maxTimes),
    new SplitList(),
    new Sort(),
    new MergeList());

// 2. Input parameters
Future<Range> future = sort.input(new Range(generate(...)));

// 3. Do something else here.
// ...

// 4. Block for the results
Range result = future.get();
  1. The first thing is to define a new instance of the skeleton with the functional code that fills the pattern (ShouldSplit, SplitList, Sort, MergeList). The functional code is written by the programmer without parallelism concerns.
  2. The second step is the input of data, which triggers the computation. In this case Range is a class holding an array and two indexes which allow the representation of a subarray. For each piece of data entered into the framework a new Future object is created. More than one Future can be entered into a skeleton simultaneously.
  3. The Future allows for asynchronous computation, as other tasks can be performed while the results are computed.
  4. We can retrieve the result of the computation, blocking if necessary (i.e. results not yet available).

The functional codes in this example correspond to four types: Condition, Split, Execute, and Merge.

public class ShouldSplit implements Condition<Range>{

  int threshold, maxTimes, times;

  public ShouldSplit(int threshold, int maxTimes){
    this.threshold = threshold;
    this.maxTimes  = maxTimes;
    this.times     = 0;
  }

  @Override
  public synchronized boolean condition(Range r){
    return r.right - r.left > threshold &&
           times++ < this.maxTimes;
  }
}

The ShouldSplit class implements the Condition interface. The function receives an input, Range r in this case, and returns true or false. In the context of Divide and Conquer, where this function will be used, this decides whether a sub-array should be subdivided again or not.

The SplitList class implements the Split interface, which in this case divides a (sub-)array into smaller sub-arrays. The class uses a helper function partition(...) which implements the well-known QuickSort pivot and swap scheme.

public class SplitList implements Split<Range, Range>{

  @Override
  public Range[] split(Range r){
    int i = partition(r.array, r.left, r.right);
    Range[] intervals = {new Range(r.array, r.left, i-1),
                         new Range(r.array, i+1, r.right)};
    return intervals;
  }
}
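The partition(...) helper is not shown above. As a sketch of what it could look like, the following uses the Lomuto scheme with the last element as pivot, operating on the subarray array[left..right]; the class name PartitionHelper is ours, not part of the Skandium example.

```java
import java.util.Arrays;

// Hypothetical partition helper for the SplitList example (Lomuto scheme).
public class PartitionHelper {

    // Places the pivot at its final sorted position and returns that index;
    // smaller elements end up to its left, larger ones to its right.
    public static int partition(int[] array, int left, int right) {
        int pivot = array[right];
        int i = left - 1;
        for (int j = left; j < right; j++) {
            if (array[j] <= pivot) {
                i++;
                int tmp = array[i]; array[i] = array[j]; array[j] = tmp;
            }
        }
        int tmp = array[i + 1]; array[i + 1] = array[right]; array[right] = tmp;
        return i + 1;
    }

    public static void main(String[] args) {
        int[] a = {3, 8, 2, 5, 1, 4, 7, 6};
        int p = partition(a, 0, a.length - 1);
        System.out.println(p + " " + Arrays.toString(a)); // 5 [3, 2, 5, 1, 4, 6, 7, 8]
    }
}
```

After the call, the two sub-ranges on either side of the returned index are exactly the intervals that SplitList hands back to the skeleton.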

The Sort class implements the Execute interface, and is in charge of sorting the sub-array specified by Range r. In this case we simply invoke Java's default (Arrays.sort) method for the given sub-array.

public class Sort implements Execute<Range, Range> {

  @Override
  public Range execute(Range r){
    if (r.right <= r.left) return r;
    Arrays.sort(r.array, r.left, r.right+1);
    return r;
  }
}

Finally, once a set of sub-arrays is sorted, we merge the sub-array parts into a bigger array with the MergeList class, which implements the Merge interface.

public class MergeList implements Merge<Range, Range>{

  @Override
  public Range merge(Range[] r){
    Range result = new Range(r[0].array, r[0].left, r[1].right);
    return result;
  }
}
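To make the orchestration that the skeleton hides concrete, the following is a purely sequential sketch of the control flow a Divide and Conquer skeleton imposes on the four functional parameters. Interface names follow the example above, but the binary merge and the recursion driver are our simplifications: Skandium's Merge receives an array, and the real runtime recurses in parallel.

```java
// Sequential illustration of the Divide and Conquer control flow.
public class DaCSketch {
    interface Condition<P>  { boolean condition(P p); }
    interface Split<P>      { P[] split(P p); }
    interface Execute<P, R> { R execute(P p); }
    interface Merge<R>      { R merge(R left, R right); }

    static <P, R> R dac(Condition<P> c, Split<P> s, Execute<P, R> e,
                        Merge<R> m, P input) {
        if (!c.condition(input))
            return e.execute(input);          // base case: run the sequential code
        P[] parts = s.split(input);           // divide (binary split assumed here)
        return m.merge(dac(c, s, e, m, parts[0]),   // conquer: recurse on each part
                       dac(c, s, e, m, parts[1]));  // (in parallel in a real skeleton)
    }

    // Toy instantiation: sum an array by halving it until length <= 2.
    public static int sum(int[] a) {
        return DaCSketch.<int[], Integer>dac(
            (int[] x) -> x.length > 2,
            (int[] x) -> new int[][]{
                java.util.Arrays.copyOfRange(x, 0, x.length / 2),
                java.util.Arrays.copyOfRange(x, x.length / 2, x.length)},
            (int[] x) -> java.util.Arrays.stream(x).sum(),
            (Integer l, Integer r) -> l + r,
            a);
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{1, 2, 3, 4, 5, 6, 7})); // 28
    }
}
```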

Frameworks and libraries

ASSIST

ASSIST[2][3] is a programming environment which provides programmers with a structured coordination language. The coordination language can express parallel programs as an arbitrary graph of software modules. The module graph describes how a set of modules interact with each other using a set of typed data streams. The modules can be sequential or parallel. Sequential modules can be written in C, C++, or Fortran; and parallel modules are programmed with a special ASSIST parallel module (parmod).

AdHoc,[4][5] a hierarchical and fault-tolerant Distributed Shared Memory (DSM) system, is used to interconnect streams of data between processing elements by providing a repository with get/put/remove/execute operations. Research around AdHoc has focused on transparency, scalability, and fault-tolerance of the data repository.

While not a classical skeleton framework, in the sense that no skeletons are provided, ASSIST's generic parmod can be specialized into classical skeletons such as: farm, map, etc. ASSIST also supports autonomic control of parmods, and can be subject to a performance contract by dynamically adapting the number of resources used.

CO2P3S

CO2P3S (Correct Object-Oriented Pattern-based Parallel Programming System) is a pattern-oriented development environment[6] which achieves parallelism using threads in Java.

CO2P3S is concerned with the complete development process of a parallel application. Programmers interact through a programming GUI to choose a pattern and its configuration options. Then, programmers fill the hooks required for the pattern, and new code is generated as a framework in Java for the parallel execution of the application. The generated framework uses three levels, in descending order of abstraction: patterns layer, intermediate code layer, and native code layer. Thus, advanced programmers may intervene in the generated code at multiple levels to tune the performance of their applications. The generated code is mostly type safe, using the types provided by the programmer which do not require extension of a superclass, but fails to be completely type safe, such as in the reduce(..., Object reducer) method in the mesh pattern.

The set of patterns supported in CO2P3S corresponds to method-sequence, distributor, mesh, and wavefront. Complex applications can be built by composing frameworks with their object references. Nevertheless, if no pattern is suitable, the MetaCO2P3S graphical tool addresses extensibility by allowing programmers to modify the pattern designs and introduce new patterns into CO2P3S.

Support for distributed memory architectures in CO2P3S was introduced later.[7] To use a distributed memory pattern, programmers must change the pattern's memory option from shared to distributed, and generate the new code. From the usage perspective, the distributed memory version of the code requires the management of remote exceptions.

Calcium & Skandium

Calcium is greatly inspired by Lithium and Muskel. As such, it provides algorithmic skeleton programming as a Java library. Both task and data parallel skeletons are fully nestable; and are instantiated via parametric skeleton objects, not inheritance.

Calcium supports the execution of skeleton applications on top of the ProActive environment for distributed cluster like infrastructure. Additionally, Calcium has three distinctive features for algorithmic skeleton programming. First, a performance tuning model which helps programmers identify code responsible for performance bugs.[8] Second, a type system for nestable skeletons which is proven to guarantee subject reduction properties and is implemented using Java Generics.[9] Third, a transparent algorithmic skeleton file access model, which enables skeletons for data intensive applications.[10]

Skandium is a complete re-implementation of Calcium for multi-core computing. Programs written on Skandium may take advantage of shared memory to simplify parallel programming.[11]

Eden

Eden[12] is a parallel programming language for distributed memory environments, which extends Haskell. Processes are defined explicitly to achieve parallel programming, while their communications remain implicit. Processes communicate through unidirectional channels, which connect one writer to exactly one reader. Programmers only need to specify which data a process depends on. Eden's process model provides direct control over process granularity, data distribution and communication topology.

Eden is not a skeleton language in the sense that skeletons are not provided as language constructs. Instead, skeletons are defined on top of Eden's lower-level process abstraction, supporting both task and data parallelism. So, contrary to most other approaches, Eden lets the skeletons be defined in the same language and at the same level as the skeleton instantiation: Eden itself. Because Eden is an extension of a functional language, Eden skeletons are higher order functions. Eden introduces the concept of implementation skeleton, which is an architecture independent scheme that describes a parallel implementation of an algorithmic skeleton.

eSkel

The Edinburgh Skeleton Library (eSkel) is provided in C and runs on top of MPI. The first version of eSkel was described in,[13] while a later version is presented in.[14]

In,[15] nesting-mode and interaction-mode for skeletons are defined. The nesting-mode can be either transient or persistent, while the interaction-mode can be either implicit or explicit. Transient nesting means that the nested skeleton is instantiated for each invocation and destroyed afterwards, while persistent means that the skeleton is instantiated once and the same skeleton instance will be invoked throughout the application. Implicit interaction means that the flow of data between skeletons is completely defined by the skeleton composition, while explicit means that data can be generated or removed from the flow in a way not specified by the skeleton composition. For example, a skeleton that produces an output without ever receiving an input has explicit interaction.
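The transient/persistent distinction can be illustrated with a toy sketch in plain Java (not eSkel's C API); the Skeleton interface and newPipeline factory here are hypothetical stand-ins for skeleton instantiation.

```java
// Toy illustration of transient vs persistent skeleton nesting.
public class NestingModes {
    interface Skeleton { int apply(int x); }

    static int instancesCreated = 0;

    static Skeleton newPipeline() {            // stands in for skeleton instantiation
        instancesCreated++;
        return x -> (x + 1) * 2;
    }

    public static void main(String[] args) {
        // Transient nesting: a fresh instance per invocation, destroyed afterwards.
        for (int x = 0; x < 3; x++) newPipeline().apply(x);
        System.out.println(instancesCreated);  // 3

        // Persistent nesting: one instance serves every invocation.
        Skeleton s = newPipeline();
        for (int x = 0; x < 3; x++) s.apply(x);
        System.out.println(instancesCreated);  // 4
    }
}
```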

Performance prediction for scheduling and resource mapping, mainly for pipe-lines, has been explored by Benoit et al.[16][17][18][19] They provided a performance model for each mapping, based on process algebra, and determined the best scheduling strategy based on the results of the model.

More recent works have addressed the problem of adaptation on structured parallel programming,[20] in particular for the pipe skeleton.[21][22]

FastFlow

FastFlow is a skeletal parallel programming framework specifically targeted to the development of streaming and data-parallel applications. Initially developed to target multi-core platforms, it has successively been extended to target heterogeneous platforms composed of clusters of shared-memory platforms,[23][24] possibly equipped with computing accelerators such as NVidia GPGPUs, Xeon Phi, and Tilera TILE64. The main design philosophy of FastFlow is to provide application designers with key features for parallel programming (e.g. time-to-market, portability, efficiency and performance portability) via suitable parallel programming abstractions and a carefully designed run-time support.[25] FastFlow is a general-purpose C++ programming framework for heterogeneous parallel platforms. Like other high-level programming frameworks, such as Intel TBB and OpenMP, it simplifies the design and engineering of portable parallel applications. However, it has a clear edge in terms of expressiveness and performance with respect to other parallel programming frameworks in specific application scenarios, including, inter alia: fine-grain parallelism on cache-coherent shared-memory platforms; streaming applications; and coupled usage of multi-core and accelerators. In other cases FastFlow is typically comparable to (and in some cases slightly faster than) state-of-the-art parallel programming frameworks such as Intel TBB, OpenMP, Cilk, etc.[26]

HDC

Higher-order Divide and Conquer (HDC)[27] is a subset of the functional language Haskell. Functional programs are presented as polymorphic higher-order functions, which can be compiled into C/MPI, and linked with skeleton implementations. The language focuses on the divide and conquer paradigm, and starting from a general kind of divide and conquer skeleton, more specific cases with efficient implementations are derived. The specific cases correspond to: fixed recursion depth, constant recursion degree, multiple block recursion, elementwise operations, and corresponding communications.[28]

HDC pays special attention to the subproblem's granularity and its relation with the number of available processors. The total number of processors is a key parameter for the performance of the skeleton program, as HDC strives to estimate an adequate assignment of processors for each part of the program. Thus, the performance of the application is strongly related to the estimated number of processors, a poor estimate leading either to an excessive number of subproblems, or to not enough parallelism to exploit the available processors.

HOC-SA

HOC-SA stands for Higher-Order Components-Service Architecture. Higher-Order Components (HOCs) have the aim of simplifying Grid application development.
The objective of HOC-SA is to provide Globus users, who do not want to know about all the details of the Globus middleware (GRAM RSL documents, Web services and resource configuration etc.), with HOCs that provide a higher-level interface to the Grid than the core Globus Toolkit.
HOCs are Grid-enabled skeletons, implemented as components on top of the Globus Toolkit, remotely accessible via Web Services.[29]

JaSkel

JaSkel[30] is a Java-based skeleton framework providing skeletons such as farm, pipe and heartbeat. Skeletons are specialized using inheritance. Programmers implement the abstract methods for each skeleton to provide their application specific code. Skeletons in JaSkel are provided in sequential, concurrent and dynamic versions. For example, the concurrent farm can be used in shared memory environments (threads), but not in distributed environments (clusters), where the distributed farm should be used. To change from one version to the other, programmers must change their classes' signature to inherit from a different skeleton. The nesting of skeletons uses the basic Java Object class, and therefore no type system is enforced during the skeleton composition.
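Inheritance-based specialization of this kind can be sketched in plain Java; the Farm and Doubler classes below are hypothetical illustrations of the idea (extend a skeleton class, fill in its abstract method), not JaSkel's actual API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of inheritance-based skeleton specialization.
public class InheritanceFarm {

    static abstract class Farm<T> {
        abstract T compute(T input);                 // programmer-supplied code

        List<T> apply(List<T> inputs) {              // skeleton-supplied orchestration
            return inputs.parallelStream()
                         .map(this::compute)
                         .collect(Collectors.toList());
        }
    }

    // Switching to another version (e.g. a distributed farm) would mean
    // inheriting from a different skeleton class, as noted above.
    static class Doubler extends Farm<Integer> {
        Integer compute(Integer x) { return 2 * x; }
    }

    public static void main(String[] args) {
        System.out.println(new Doubler().apply(Arrays.asList(1, 2, 3))); // [2, 4, 6]
    }
}
```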

The distribution aspects of the computation are handled in JaSkel using AOP, more specifically the AspectJ implementation. Thus, JaSkel can be deployed on both cluster and Grid like infrastructures.[31] Nevertheless, a drawback of the JaSkel approach is that the nesting of the skeleton strictly relates to the deployment infrastructure. Thus, a double nesting of farm yields a better performance than a single farm on hierarchical infrastructures. This defeats the purpose of using AOP to separate the distribution and functional concerns of the skeleton program.

Lithium & Muskel

Lithium[32][33][34] and its successor Muskel are skeleton frameworks developed at the University of Pisa, Italy. Both of them provide nestable skeletons to the programmer as Java libraries. The evaluation of a skeleton application follows a formal definition of operational semantics introduced by Aldinucci and Danelutto,[35][36] which can handle both task and data parallelism. The semantics describe both the functional and parallel behavior of the skeleton language using a labeled transition system. Additionally, several performance optimizations are applied, such as skeleton rewriting techniques, task lookahead, and server-to-server lazy binding.[37]

At the implementation level, Lithium exploits macro-data flow[38][39] to achieve parallelism. When the input stream receives a new parameter, the skeleton program is processed to obtain a macro-data flow graph. The nodes of the graph are macro-data flow instructions (MDFi) which represent the sequential pieces of code provided by the programmer. Tasks are used to group together several MDFi, and are consumed by idle processing elements from a task pool. When the computation of the graph is concluded, the result is placed into the output stream and thus delivered back to the user.
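The task-pool style of execution described above can be sketched with plain java.util.concurrent primitives rather than Lithium's runtime: sequential pieces of code (the MDF instructions) are submitted to a pool and consumed by idle processing elements, and the results are gathered into the output stream. The class and method names here are ours.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of macro-data flow style execution with a task pool.
public class MacroDataFlowSketch {

    public static List<Integer> run(List<Integer> inputStream) throws Exception {
        ExecutorService taskPool = Executors.newFixedThreadPool(4); // processing elements
        List<Future<Integer>> pending = new ArrayList<>();
        for (int v : inputStream) {
            final int value = v;
            Callable<Integer> mdfi = () -> value * value;  // one MDF instruction
            pending.add(taskPool.submit(mdfi));
        }
        List<Integer> outputStream = new ArrayList<>();
        for (Future<Integer> f : pending) outputStream.add(f.get()); // gather results
        taskPool.shutdown();
        return outputStream;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(java.util.Arrays.asList(1, 2, 3, 4))); // [1, 4, 9, 16]
    }
}
```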

Muskel also provides non-functional features such as Quality of Service (QoS);[40] security between task pool and interpreters;[41][42] and resource discovery, load balancing, and fault tolerance when interfaced with Java / Jini Parallel Framework (JJPF),[43] a distributed execution framework. Muskel also provides support for combining structured with unstructured programming[44] and recent research has addressed extensibility.[45]

Mallba

Mallba[46] is a library for combinatorial optimization supporting exact, heuristic and hybrid search strategies.[47] Each strategy is implemented in Mallba as a generic skeleton which can be used by providing the required code. Among the exact search algorithms Mallba provides branch-and-bound and dynamic-optimization skeletons. For local search heuristics Mallba supports: hill climbing, metropolis, simulated annealing, and tabu search; and also population based heuristics derived from evolutionary algorithms such as genetic algorithms, evolution strategy, and others (CHC). The hybrid skeletons combine strategies, such as: GASA, a mixture of genetic algorithm and simulated annealing, and CHCCES, which combines CHC and ES.

The skeletons are provided as a C++ library and are not nestable but type safe. A custom MPI abstraction layer is used, NetStream, which takes care of primitive data type marshalling, synchronization, etc. A skeleton may have multiple lower-level parallel implementations depending on the target architectures: sequential, LAN, and WAN. For example: centralized master-slave, distributed master-slave, etc.

Mallba also provides state variables which hold the state of the search skeleton. The state links the search with the environment, and can be accessed to inspect the evolution of the search and decide on future actions. For example, the state can be used to store the best solution found so far, or α, β values for branch and bound pruning.[48]

Compared with other frameworks, Mallba's usage of skeletons concepts is unique. Skeletons are provided as parametric search strategies rather than parametric parallelization patterns.

Marrow

Marrow[49][50] is a C++ algorithmic skeleton framework for the orchestration of OpenCL computations in, possibly heterogeneous, multi-GPU environments. It provides a set of both task and data-parallel skeletons that can be composed, through nesting, to build compound computations. The leaf nodes of the resulting composition trees represent the GPU computational kernels, while the remaining nodes denote the skeleton applied to the nested sub-tree. The framework takes upon itself the entire host-side orchestration required to correctly execute these trees in heterogeneous multi-GPU environments, including the proper ordering of the data-transfer and of the execution requests, and the communication required between the tree's nodes.

Among Marrow's most distinguishable features are a set of skeletons previously unavailable in the GPU context, such as Pipeline and Loop, and the skeleton nesting ability – a feature also new in this context. Moreover, the framework introduces optimizations that overlap communication and computation, hence masking the latency imposed by the PCIe bus.

The parallel execution of a Marrow composition tree by multiple GPUs follows a data-parallel decomposition strategy, that concurrently applies the entire computational tree to different partitions of the input dataset. Other than expressing which kernel parameters may be decomposed and, when required, defining how the partial results should be merged, the programmer is completely abstracted from the underlying multi-GPU architecture.

More information, as well as the source code, can be found at the Marrow website.

Muesli

The Muenster Skeleton Library Muesli[51][52] is a C++ template library which re-implements many of the ideas and concepts introduced in Skil, e.g. higher order functions, currying, and polymorphic types. It is built on top of MPI 1.2 and OpenMP 2.5 and, unlike many other skeleton libraries, supports both task and data parallel skeletons. Skeleton nesting (composition) is similar to the two tier approach of P3L, i.e. task parallel skeletons can be nested arbitrarily while data parallel skeletons cannot, but may be used at the leaves of a task parallel nesting tree.[53] C++ templates are used to render skeletons polymorphic, but no type system is enforced. However, the library implements an automated serialization mechanism inspired by[54] such that, in addition to the standard MPI data types, arbitrary user-defined data types can be used within the skeletons. The supported task parallel skeletons[55] are Branch & Bound,[56] Divide & Conquer,[57][58] Farm,[59][60] and Pipe; auxiliary skeletons are Filter, Final, and Initial. Data parallel skeletons, such as fold (reduce), map, permute, zip, and their variants, are implemented as higher order member functions of a distributed data structure. Currently, Muesli supports distributed data structures for arrays, matrices, and sparse matrices.[61]
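The data parallel skeletons named above (map, fold, zip) can be illustrated on a plain array with Java streams; in Muesli these are C++ higher order member functions of distributed data structures, not the hypothetical API shown here.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustration of the map, fold, and zip data parallel skeletons.
public class DataParallelSketch {

    public static int mapThenFold(int[] a) {
        return Arrays.stream(a).parallel()
                     .map(x -> x * x)          // map: apply a function elementwise
                     .reduce(0, Integer::sum); // fold: combine with a binary operator
    }

    public static int[] zipWithSum(int[] a, int[] b) {
        return IntStream.range(0, a.length)    // zip: combine two structures pointwise
                        .map(i -> a[i] + b[i])
                        .toArray();
    }

    public static void main(String[] args) {
        System.out.println(mapThenFold(new int[]{1, 2, 3, 4}));  // 30
        System.out.println(Arrays.toString(zipWithSum(new int[]{1, 2}, new int[]{10, 20}))); // [11, 22]
    }
}
```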

As a unique feature, Muesli's data parallel skeletons automatically scale both on single- as well as on multi-core, multi-node cluster architectures.[62][63] Here, scalability across nodes and cores is ensured by simultaneously using MPI and OpenMP, respectively. However, this feature is optional in the sense that a program written with Muesli still compiles and runs on a single-core, multi-node cluster computer without changes to the source code, i.e. backward compatibility is guaranteed. This is ensured by providing a very thin OpenMP abstraction layer such that the support of multi-core architectures can be switched on/off by simply providing/omitting the OpenMP compiler flag when compiling the program. By doing so, virtually no overhead is introduced at runtime.

P3L, SkIE, SKElib

P3L[64] (Pisa Parallel Programming Language) is a skeleton based coordination language. P3L provides skeleton constructs which are used to coordinate the parallel or sequential execution of C code. A compiler named Anacleto[65] is provided for the language. Anacleto uses implementation templates to compile P3L code into a target architecture. Thus, a skeleton can have several templates each optimized for a different architecture. A template implements a skeleton on a specific architecture and provides a parametric process graph with a performance model. The performance model can then be used to decide program transformations which can lead to performance optimizations.[66]

A P3L module corresponds to a properly defined skeleton construct with input and output streams, and other sub-modules or sequential C code. Modules can be nested using the two tier model, where the outer level is composed of task parallel skeletons, while data parallel skeletons may be used in the inner level [64]. Type verification is performed at the data flow level, when the programmer explicitly specifies the type of the input and output streams, and by specifying the flow of data between sub-modules.

SkIE[67] (Skeleton-based Integrated Environment) is quite similar to P3L, as it is also based on a coordination language, but provides advanced features such as debugging tools, performance analysis, visualization and graphical user interface. Instead of directly using the coordination language, programmers interact with a graphical tool, where parallel modules based on skeletons can be composed.

SKELib[68] builds upon the contributions of P3L and SkIE by inheriting, among others, the template system. It differs from them because a coordination language is no longer used, but instead skeletons are provided as a library in C, with performance similar as the one achieved in P3L. Contrary to Skil, another C like skeleton framework, type safety is not addressed in SKELib.

PAS and EPAS

PAS (Parallel Architectural Skeletons) is a framework for skeleton programming developed in C++ and MPI.[69][70] Programmers use an extension of C++ to write their skeleton applications. The code is then passed through a Perl script which expands the code to pure C++ where skeletons are specialized through inheritance.

In PAS, every skeleton has a Representative (Rep) object which must be provided by the programmer and is in charge of coordinating the skeleton's execution. Skeletons can be nested in a hierarchical fashion via the Rep objects. Besides the skeleton's execution, the Rep also explicitly manages the reception of data from the higher level skeleton, and the sending of data to the sub-skeletons. A parametrized communication/synchronization protocol is used to send and receive data between parent and sub-skeletons.

An extension of PAS labeled as SuperPas[71] and later as EPAS[72] addresses skeleton extensibility concerns. With the EPAS tool, new skeletons can be added to PAS. A Skeleton Description Language (SDL) is used to describe the skeleton pattern by specifying the topology with respect to a virtual processor grid. The SDL can then be compiled into native C++ code, which can be used as any other skeleton.

SBASCO

SBASCO (Skeleton-BAsed Scientific COmponents) is a programming environment oriented towards efficient development of parallel and distributed numerical applications.[73] SBASCO aims at integrating two programming models: skeletons and components with a custom composition language. An application view of a component provides a description of its interfaces (input and output type); while a configuration view provides, in addition, a description of the component's internal structure and processor layout. A component's internal structure can be defined using three skeletons: farm, pipe and multi-block.

SBASCO addresses domain decomposable applications through its multi-block skeleton. Domains are specified through arrays (mainly two dimensional), which are decomposed into sub-arrays with possible overlapping boundaries. The computation then takes place in an iterative BSP-like fashion. The first stage consists of local computations, while the second stage performs boundary exchanges. A use case is presented for a reaction-diffusion problem in.[74]

Two types of components are presented in.[75] Scientific Components (SC), which provide the functional code; and Communication Aspect Components (CAC), which encapsulate non-functional behavior such as communication, distribution processor layout and replication. For example, SC components are connected to a CAC component which can act as a manager at runtime by dynamically re-mapping processors assigned to a SC. A use case showing improved performance when using CAC components is shown in.[76]

SCL

The Structured Coordination Language (SCL)[77] was one of the earliest skeleton programming languages. It provides a co-ordination language approach for skeleton programming over software components. SCL is considered a base language, and was designed to be integrated with a host language, for example Fortran or C, used for developing sequential software components. In SCL, skeletons are classified into three types: configuration, elementary and computation. Configuration skeletons abstract patterns for commonly used data structures such as distributed arrays (ParArray). Elementary skeletons correspond to data parallel skeletons such as map, scan, and fold. Computation skeletons abstract the control flow and correspond mainly to task parallel skeletons such as farm, SPMD, and iterateUntil. The coordination language approach was used in conjunction with performance models for programming traditional parallel machines as well as parallel heterogeneous machines that have multiple different cores on each processing node.[78]

SkePU

SkePU[79] is a skeleton programming framework for multicore CPUs and multi-GPU systems. It is a C++ template library with six data-parallel and one task-parallel skeletons, two container types, and support for execution on multi-GPU systems both with CUDA and OpenCL. Recently, support for hybrid execution, performance-aware dynamic scheduling and load balancing has been developed in SkePU by implementing a backend for the StarPU runtime system. SkePU is being extended for GPU clusters.

SKiPPER & QUAFF

SKiPPER is a domain specific skeleton library for vision applications[80] which provides skeletons in CAML, and thus relies on CAML for type safety. Skeletons are presented in two ways: declarative and operational. Declarative skeletons are directly used by programmers, while their operational versions provide an architecture specific target implementation. From the runtime environment, CAML skeleton specifications, and application specific functions (provided in C by the programmer), new C code is generated and compiled to run the application on the target architecture. One of the interesting things about SKiPPER is that the skeleton program can be executed sequentially for debugging.

Different approaches have been explored in SKiPPER for writing operational skeletons: static data-flow graphs, parametric process networks, hierarchical task graphs, and tagged-token data-flow graphs.[81]

QUAFF[82] is a more recent skeleton library written in C++ and MPI. QUAFF relies on template-based meta-programming techniques to reduce runtime overheads and perform skeleton expansions and optimizations at compilation time. Skeletons can be nested and sequential functions are stateful. Besides type checking, QUAFF takes advantage of C++ templates to generate, at compilation time, new C/MPI code. QUAFF is based on the CSP-model, where the skeleton program is described as a process network and production rules (single, serial, par, join).[83]

SkeTo

The SkeTo[84] project is a C++ library which achieves parallelization using MPI. SkeTo is different from other skeleton libraries because instead of providing nestable parallelism patterns, SkeTo provides parallel skeletons for parallel data structures such as lists, trees,[85][86] and matrices.[87] The data structures are typed using templates, and several parallel operations can be invoked on them. For example, the list structure provides parallel operations such as map, reduce, scan, zip, and shift.

Additional research around SkeTo has also focused on optimization strategies by transformation, and more recently domain specific optimizations.[88] For example, SkeTo provides a fusion transformation[89] which merges two successive function invocations into a single one, thus decreasing the function call overheads and avoiding the creation of intermediate data structures passed between functions.
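What such a fusion buys can be sketched in plain Java: composing two map traversals into one removes the intermediate array and one full pass. This is only an illustration of the idea, not SkeTo's actual transformation machinery.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Unfused vs fused application of two elementwise functions.
public class FusionSketch {

    public static int[] mapTwice(int[] a, IntUnaryOperator f, IntUnaryOperator g) {
        int[] tmp = new int[a.length];                 // intermediate structure
        for (int i = 0; i < a.length; i++) tmp[i] = f.applyAsInt(a[i]);
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) out[i] = g.applyAsInt(tmp[i]);
        return out;
    }

    public static int[] mapFused(int[] a, IntUnaryOperator f, IntUnaryOperator g) {
        int[] out = new int[a.length];                 // single traversal, no intermediate
        for (int i = 0; i < a.length; i++) out[i] = g.applyAsInt(f.applyAsInt(a[i]));
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        IntUnaryOperator inc = x -> x + 1, sq = x -> x * x;
        System.out.println(Arrays.toString(mapFused(a, inc, sq))); // [4, 9, 16]
    }
}
```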

Skil

Skil[90] is an imperative language for skeleton programming. Skeletons are not directly part of the language but are implemented with it. Skil uses a subset of the C language which provides functional-language-like features such as higher order functions, currying and polymorphic types. When Skil is compiled, such features are eliminated and regular C code is produced. Thus, Skil transforms polymorphic higher order functions into monomorphic first order C functions. Skil does not support nestable composition of skeletons. Data parallelism is achieved using specific data parallel structures, for example to spread arrays among available processors. Filter skeletons can be used.

STAPL Skeleton Framework edit

In the STAPL Skeleton Framework,[91][92] skeletons are defined as parametric data flow graphs, letting them scale beyond 100,000 cores. In addition, this framework addresses composition of skeletons as point-to-point composition of their corresponding data flow graphs through the notion of ports, allowing new skeletons to be easily added to the framework. As a result, this framework eliminates the need for reimplementation and global synchronizations in composed skeletons. The STAPL Skeleton Framework supports nested composition and can switch between parallel and sequential execution at each level of nesting. This framework benefits from the scalable implementation of STAPL parallel containers[93] and can run skeletons on various containers including vectors, multidimensional arrays, and lists.

T4P edit

T4P was one of the first systems introduced for skeleton programming.[94] The system relied heavily on functional programming properties, and five skeletons were defined as higher order functions: Divide-and-Conquer, Farm, Map, Pipe and RaMP. A program could have more than one implementation, each using a combination of different skeletons. Furthermore, each skeleton could have different parallel implementations. A methodology based on functional program transformations guided by performance models of the skeletons was used to select the most appropriate skeleton to be used for the program as well as the most appropriate implementation of the skeleton.[95]

Frameworks comparison edit

  • Activity years is the known activity span. The dates in this column correspond to the first and last publication dates of a related article in a scientific journal or conference proceedings. A project may still be active beyond this span; the dates simply mark the earliest and latest publications found.
  • Programming language is the interface with which programmers interact to code their skeleton applications. These languages are diverse, encompassing paradigms such as functional languages, coordination languages, markup languages, imperative languages, object-oriented languages, and even graphical user interfaces. Within the programming language, skeletons have been provided either as language constructs or as libraries. Providing skeletons as language constructs implies the development of a custom domain-specific language and its compiler. This was clearly the stronger trend at the beginning of skeleton research. The more recent trend is to provide skeletons as libraries, in particular with object-oriented languages such as C++ and Java.
  • Execution language is the language in which the skeleton applications are run or compiled. It was recognized very early that the programming languages (especially in the functional cases) were not efficient enough to execute the skeleton programs. Therefore, skeleton applications came to be executed in other languages: transformation processes were introduced to convert a skeleton application (defined in the programming language) into an equivalent application in the target execution language. Different transformation processes were introduced, such as code generation or instantiation of lower-level skeletons (sometimes called operational skeletons) capable of interacting with a library in the execution language. The transformed application also offered the opportunity to introduce target architecture code, customized for performance. Table 1 shows that the favorite execution language has been the C language.
  • Distribution library provides the functionality to achieve parallel/distributed computations. The big favorite in this sense has been MPI, which is not surprising since it integrates well with the C language and is probably the most used tool for parallelism in cluster computing. The dangers of directly programming with the distribution library are, of course, safely hidden away from the programmers, who never interact with the distribution library. Recently, the trend has been to develop skeleton frameworks capable of interacting with more than one distribution library. For example, CO2P3S can use Threads, RMI or Sockets; Mallba can use NetStream or MPI; and JaSkel uses AspectJ to execute the skeleton applications on different skeleton frameworks.
  • Type safety refers to the capability of detecting type incompatibility errors in a skeleton program. Since the first skeleton frameworks were built on functional languages such as Haskell, type safety was simply inherited from the host language. Nevertheless, as custom languages were developed for skeleton programming, compilers had to be written to take type checking into consideration; this was less difficult while skeleton nesting was not fully supported. Recently, however, as skeleton frameworks began to be hosted on object-oriented languages with full nesting, the type safety issue has resurfaced. Unfortunately, type checking has been mostly overlooked (with the exception of QUAFF), especially in Java-based skeleton frameworks.
  • Skeleton nesting is the capability of hierarchical composition of skeleton patterns. Skeleton nesting was identified as an important feature in skeleton programming from the very beginning, because it allows the composition of more complex patterns starting from a basic set of simpler ones. Nevertheless, it has taken the community a long time to fully support arbitrary nesting of skeletons, mainly because of the scheduling and type verification difficulties. The trend is clear: recent skeleton frameworks support full nesting of skeletons.
  • File access is the capability to access and manipulate files from an application. In the past, skeleton programming has proven useful mostly for computationally intensive applications, where small amounts of data require large amounts of computation time. Nevertheless, many distributed applications require or produce large amounts of data during their computation. This is the case for astrophysics, particle physics, bio-informatics, etc. Thus, providing file transfer support that integrates with skeleton programming is a key concern which has been mostly overlooked.
  • Skeleton set is the list of supported skeleton patterns. Skeleton sets vary greatly from one framework to another and, more strikingly, some skeletons with the same name have different semantics in different frameworks. The most common skeleton patterns in the literature are probably farm, pipe, and map.
Non-object oriented algorithmic skeleton frameworks
Activity years Programming language Execution language Distribution library Type safe Skeleton nesting File access Skeleton set
ASSIST 2004–2007 Custom control language C++ TCP/IP + ssh/scp Yes No explicit seq, parmod
SBSACO 2004–2006 Custom composition language C++ MPI Yes Yes No farm, pipe, multi-block
eSkel 2004–2005 C C MPI No ? No pipeline, farm, deal, butterfly, haloSwap
HDC 2004–2005 Haskell subset C MPI Yes ? No dcA, dcB, dcD, dcE, dcF, map, red, scan, filter
SKELib 2000–2000 C C MPI No No No farm, pipe
SkiPPER 1999–2002 CAML C SynDex Yes limited No scm, df, tf, intermem
SkIE 1999–1999 GUI/Custom control language C++ MPI Yes limited No pipe, farm, map, reduce, loop
Eden 1997–2011 Haskell extension Haskell PVM/MPI Yes Yes No map, farm, workpool, nr, dc, pipe, iterUntil, torus, ring
P3L 1995–1998 Custom control language C MPI Yes limited No map, reduce, scan, comp, pipe, farm, seq, loop
Skil 1995–1998 C subset C ? Yes No No pardata, map, fold
SCL 1994–1999 Custom control language Fortran/C MPI Yes limited No map, scan, fold, farm, SPMD, iterateUntil
T4P 1990–1994 Hope+ Hope+ CSTools Yes limited No D&C (Divide-and-Conquer), Map, Pipe, RaMP
Object-oriented algorithmic skeleton frameworks
Activity years Programming language Execution language Distribution library Type safe Skeleton nesting File access Skeleton set
Skandium 2009–2012 Java Java Threads Yes Yes No seq, pipe, farm, for, while, map, d&c, fork
FastFlow 2009– C++ C++11 / CUDA / OpenCL C++11 threads / Posix threads / TCP-IP / OFED-IB / CUDA / OpenCL Yes Yes Yes Pipeline, Farm, ParallelFor, ParallelForReduce, MapReduce, StencilReduce, PoolEvolution, MacroDataFlow
Calcium 2006–2008 Java Java ProActive Yes Yes Yes seq, pipe, farm, for, while, map, d&c, fork
QUAFF 2006–2007 C++ C MPI Yes Yes No seq, pipe, farm, scm, pardo
JaSkel 2006–2007 Java Java/AspectJ MPP / RMI No Yes No farm, pipeline, heartbeat
Muskel 2005–2008 Java Java RMI No Yes No farm, pipe, seq, + custom MDF Graphs
HOC-SA 2004–2008 Java Java Globus, KOALA No No No farm, pipeline, wavefront
SkeTo 2003–2013 C++ C++ MPI Yes No No list, matrix, tree
Mallba 2002–2007 C++ C++ NetStream / MPI Yes No No exact, heuristic, hybrid
Marrow 2013– C++ C++ plus OpenCL (none) No Yes No data parallel: map, map-reduce. task parallel: pipeline, loop, for
Muesli 2002–2013 C++ C++ MPI / OpenMP Yes Yes No data parallel: fold, map, permute, scan, zip, and variants. task parallel: branch & bound, divide & conquer, farm, pipe. auxiliary: filter, final, initial
Alt 2002–2003 Java/GworkflowDL Java Java RMI Yes No No map, zip, reduction, scan, dh, replicate, apply, sort
(E)PAS 1999–2005 C++ extension C++ MPI No Yes No singleton, replication, compositional, pipeline, divideconquer, dataparallel
Lithium 1999–2004 Java Java RMI No Yes No pipe, map, farm, reduce
CO2P3S 1999–2003 GUI/Java Java (generated) Threads / RMI / Sockets Partial No No method-sequence, distributor, mesh, wavefront
STAPL 2010– C++ C++11 STAPL Runtime Library (MPI, OpenMP, PThreads) Yes Yes Yes map, zip<arity>, reduce, scan, farm, (reverse-)butterfly, (reverse-)tree<k-ary>, recursive-doubling, serial, transpose, stencil<n-dim>, wavefront<n-dim>, allreduce, allgather, gather, scatter, broadcast

Operators: compose, repeat, do-while, do-all, do-across

See also edit

References edit

  1. ^ K. Hammond and G. Michelson, editors. "Research Directions in Parallel Functional Programming." Springer-Verlag, London, UK, 1999.
  2. ^ Vanneschi, M. (2002). "The programming model of ASSIST, an environment for parallel and distributed portable applications". Parallel Computing. 28 (12): 1709–1732. CiteSeerX 10.1.1.59.5543. doi:10.1016/S0167-8191(02)00188-6.
  3. ^ M. Aldinucci, M. Coppola, M. Danelutto, N. Tonellotto, M. Vanneschi, and C. Zoccolo. "High level grid programming with ASSIST." Computational Methods in Science and Technology, 12(1):21–32, 2006.
  4. ^ M. Aldinucci and M. Torquati. Accelerating apache farms through ad hoc distributed scalable object repository. In Proc. of 10th Intl. Euro-Par 2004 Parallel Processing, volume 3149 of LNCS, pages 596–605. Springer, 2004.
  5. ^ Aldinucci, M.; Danelutto, M.; Antoniu, G.; Jan, M. (2008). "Fault-Tolerant Data Sharing for High-level Grid: A Hierarchical Storage Architecture". Achievements in European Research on Grid Systems. p. 67. doi:10.1007/978-0-387-72812-4_6. ISBN 978-0-387-72811-7.
  6. ^ S. MacDonald, J. Anvik, S. Bromling, J. Schaeffer, D. Szafron, and K. Tan. "From patterns to frameworks to parallel programs." Parallel Comput., 28(12):1663–1683, 2002.
  7. ^ K. Tan, D. Szafron, J. Schaeffer, J. Anvik, and S. MacDonald. "Using generative design patterns to generate parallel code for a distributed memory environment." In PPoPP '03: Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 203–215, New York, NY, USA, 2003. ACM.
  8. ^ D. Caromel and M. Leyton. "Fine tuning algorithmic skeletons." In 13th International Euro-Par Conference: Parallel Processing, volume 4641 of Lecture Notes in Computer Science, pages 72–81. Springer-Verlag, 2007.
  9. ^ D. Caromel, L. Henrio, and M. Leyton. "Type safe algorithmic skeletons." In Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-based Processing, pages 45–53, Toulouse, France, Feb. 2008. IEEE CS Press.
  10. ^ D. Caromel and M. Leyton. "A transparent non-invasive file data model for algorithmic skeletons." In 22nd International Parallel and Distributed Processing Symposium (IPDPS), pages 1–8, Miami, USA, March 2008. IEEE Computer Society.
  11. ^ Mario Leyton, Jose M. Piquer. "Skandium: Multi-core Programming with algorithmic skeletons", IEEE Euro-micro PDP 2010.
  12. ^ Rita Loogen, Yolanda Ortega-Mallén, and Ricardo Peña-Marí. "Parallel Functional Programming in Eden", Journal of Functional Programming, 15(3):431–475, 2005.
  13. ^ Murray Cole. "Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming." Parallel Computing, 30(3):389–406, 2004.
  14. ^ A. Benoit, M. Cole, S. Gilmore, and J. Hillston. "Flexible skeletal programming with eskel." In J. C. Cunha and P. D. Medeiros, editors, Euro-Par, volume 3648 of Lecture Notes in Computer Science, pages 761–770. Springer, 2005.
  15. ^ A. Benoit and M. Cole. "Two fundamental concepts in skeletal parallel programming." In V. Sunderam, D. van Albada, P. Sloot, and J. Dongarra, editors, The International Confer-ence on Computational Science (ICCS 2005), Part II, LNCS 3515, pages 764–771. Springer Verlag, 2005.
  16. ^ A. Benoit, M. Cole, S. Gilmore, and J. Hillston. Evaluating the performance of skeleton-based high level parallel programs. In M. Bubak, D. van Albada, P. Sloot, and J. Dongarra, editors, The International Conference on Computational Science (ICCS 2004), Part III, LNCS 3038, pages 289–296. Springer Verlag, 2004.
  17. ^ A. Benoit, M. Cole, S. Gilmore, and J. Hillston. "Evaluating the performance of pipeline structured parallel programs with skeletons and process algebra." Scalable Computing: Practice and Experience, 6(4):1–16, December 2005.
  18. ^ A. Benoit, M. Cole, S. Gilmore, and J. Hillston. "Scheduling skeleton-based grid applications using pepa and nws." The Computer Journal, Special issue on Grid Performability Modelling and Measurement, 48(3):369–378, 2005.
  19. ^ A. Benoit and Y. Robert. "Mapping pipeline skeletons onto heterogeneous platforms." In ICCS 2007, the 7th International Conference on Computational Science, LNCS 4487, pages 591–598. Springer Verlag, 2007.
  20. ^ G. Yaikhom, M. Cole, S. Gilmore, and J. Hillston. "A structural approach for modelling performance of systems using skeletons." Electr. Notes Theor. Comput. Sci., 190(3):167–183, 2007.
  21. ^ H. Gonzalez-Velez and M. Cole. "Towards fully adaptive pipeline parallelism for heterogeneous distributed environments." In Parallel and Distributed Processing and Applications, 4th International Symposium (ISPA), Lecture Notes in Computer Science, pages 916–926. Springer-Verlag, 2006.
  22. ^ H. Gonzalez-Velez and M. Cole. "Adaptive structured parallelism for computational grids." In PPoPP '07: Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 140–141, New York, NY, USA, 2007. ACM.
  23. ^ Aldinucci, M.; Campa, S.; Danelutto, M.; Kilpatrick, P.; Torquati, M. (2013). "Targeting Distributed Systems in FastFlow" (PDF). Euro-Par 2012: Parallel Processing Workshops. Euro-Par 2012: Parallel Processing Workshops. Lecture Notes in Computer Science. Vol. 7640. pp. 47–56. doi:10.1007/978-3-642-36949-0_7. ISBN 978-3-642-36948-3.
  24. ^ Aldinucci, M.; Spampinato, C.; Drocco, M.; Torquati, M.; Palazzo, S. (2012). "A parallel edge preserving algorithm for salt and pepper image denoising". 2012 3rd International Conference on Image Processing Theory, Tools and Applications (IPTA). 3rd International Conference on Image Processing Theory, Tools and Applications (IPTA). pp. 97–104. doi:10.1109/IPTA.2012.6469567. hdl:2318/154520.
  25. ^ Aldinucci, M.; Danelutto, M.; Kilpatrick, P.; Meneghin, M.; Torquati, M. (2012). "An Efficient Unbounded Lock-Free Queue for Multi-core Systems". Euro-Par 2012 Parallel Processing. Euro-Par 2012 Parallel Processing. Lecture Notes in Computer Science. Vol. 7484. pp. 662–673. doi:10.1007/978-3-642-32820-6_65. hdl:2318/121343. ISBN 978-3-642-32819-0.
  26. ^ Aldinucci, M.; Meneghin, M.; Torquati, M. (2010). "Efficient Smith-Waterman on Multi-core with Fast Flow". 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing. IEEE. p. 195. CiteSeerX 10.1.1.163.9092. doi:10.1109/PDP.2010.93. ISBN 978-1-4244-5672-7. S2CID 1925361.
  27. ^ C. A. Herrmann and C. Lengauer. "HDC: A higher-order language for divide-and-conquer." Parallel Processing Letters, 10(2–3):239–250, 2000.
  28. ^ C. A. Herrmann. The Skeleton-Based Parallelization of Divide-and-Conquer Recursions. PhD thesis, 2000. ISBN 3-89722-556-5.
  29. ^ J. Dünnweber, S. Gorlatch. "Higher-Order Components for Grid Programming: Making Grids More Usable." Springer-Verlag, 2009. ISBN 978-3-642-00840-5
  30. ^ J. F. Ferreira, J. L. Sobral, and A. J. Proenca. "Jaskel: A java skeleton-based framework for structured cluster and grid computing". In CCGRID '06: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid, pages 301–304, Washington, DC, USA, 2006. IEEE Computer Society.
  31. ^ J. Sobral and A. Proenca. "Enabling jaskel skeletons for clusters and computational grids." In IEEE Cluster. IEEE Press, 9 2007.
  32. ^ M. Aldinucci and M. Danelutto. "Stream parallel skeleton optimization." In Proc. of PDCS: Intl. Conference on Parallel and Distributed Computing and Systems, pages 955–962, Cambridge, Massachusetts, USA, Nov. 1999. IASTED, ACTA press.
  33. ^ Aldinucci, M.; Danelutto, M.; Teti, P. (2003). "An advanced environment supporting structured parallel programming in Java". Future Generation Computer Systems. 19 (5): 611. CiteSeerX 10.1.1.59.3748. doi:10.1016/S0167-739X(02)00172-3.
  34. ^ M. Danelutto and P. Teti. "Lithium: A structured parallel programming environment in Java." In Proc. of ICCS: International Conference on Computational Science, volume 2330 of LNCS, pages 844–853. Springer Verlag, Apr. 2002.
  35. ^ M. Aldinucci and M. Danelutto. "An operational semantics for skeletons." In G. R. Joubert, W. E. Nagel, F. J. Peters, and W. V. Walter, editors, Parallel Computing: Software Technology, Algorithms, Architectures and Applications, PARCO 2003, volume 13 of Advances in Parallel Computing, pages 63–70, Dresden, Germany, 2004. Elsevier.
  36. ^ Aldinucci, M.; Danelutto, M. (2007). "Skeleton-based parallel programming: Functional and parallel semantics in a single shot☆". Computer Languages, Systems & Structures. 33 (3–4): 179. CiteSeerX 10.1.1.164.368. doi:10.1016/j.cl.2006.07.004.
  37. ^ M. Aldinucci, M. Danelutto, and J. Dünnweber. "Optimization techniques for implementing parallel skeletons in grid environments." In S. Gorlatch, editor, Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming, pages 35–47, Stirling, Scotland, UK, July 2004. Universität Munster, Germany.
  38. ^ M. Danelutto. Efficient support for skeletons on workstation clusters. Parallel Processing Letters, 11(1):41–56, 2001.
  39. ^ M. Danelutto. "Dynamic run time support for skeletons." Technical report, 1999.
  40. ^ M. Danelutto. "Qos in parallel programming through application managers." In PDP '05: Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP'05), pages 282–289, Washington, DC, USA, 2005. IEEE Computer Society.
  41. ^ M. Aldinucci and M. Danelutto. "The cost of security in skeletal systems." In P. D'Ambra and M. R. Guarracino, editors, Proc. of Intl. Euromicro PDP 2007: Parallel Distributed and network-based Processing, pages 213–220, Napoli, Italia, February 2007. IEEE.
  42. ^ M. Aldinucci and M. Danelutto. "Securing skeletal systems with limited performance penalty: the muskel experience." Journal of Systems Architecture, 2008.
  43. ^ M. Danelutto and P. Dazzi. "A Java/Jini framework supporting stream parallel computations." In Proc. of Intl. PARCO 2005: Parallel Computing, Sept. 2005.
  44. ^ M. Danelutto and P. Dazzi. "Joint structured/non-structured parallelism exploitation through data flow." In V. Alexandrov, D. van Albada, P. Sloot, and J. Dongarra, editors, Proc. of ICCS: International Conference on Computational Science, Workshop on Practical Aspects of High-level Parallel Programming, LNCS, Reading, UK, May 2006. Springer Verlag.
  45. ^ M. Aldinucci, M. Danelutto, and P. Dazzi. "Muskel: an expandable skeleton environment." Scalable Computing: Practice and Experience, 8(4):325–341, December 2007.
  46. ^ E. Alba, F. Almeida, M. J. Blesa, J. Cabeza, C. Cotta, M. Diaz, I. Dorta, J. Gabarro, C. Leon, J. Luna, L. M. Moreno, C. Pablos, J. Petit, A. Rojas, and F. Xhafa. "Mallba: A library of skeletons for combinatorial optimisation (research note)." In Euro-Par '02: Proceedings of the 8th International Euro-Par Conference on Parallel Processing, pages 927–932, London, UK, 2002. Springer-Verlag.
  47. ^ E. Alba, F. Almeida, M. Blesa, C. Cotta, M. Diaz, I. Dorta, J. Gabarro, C. Leon, G. Luque, J. Petit, C. Rodriguez, A. Rojas, and F. Xhafa. Efficient parallel lan/wan algorithms for optimization: the mallba project. Parallel Computing, 32(5):415–440, 2006.
  48. ^ E. Alba, G. Luque, J. Garcia-Nieto, G. Ordonez, and G. Leguizamon. "Mallba a software library to design efficient optimisation algorithms." International Journal of Innovative Computing and Applications, 1(1):74–85, 2007.
  49. ^ Ricardo Marques, Hervé Paulino, Fernando Alexandre, and Pedro D. Medeiros. "Algorithmic Skeleton Framework for the Orchestration of GPU Computations." Euro-Par 2013: 874–885.
  50. ^ Fernando Alexandre, Ricardo Marques, and Hervé Paulino. "On the Support of Task-Parallel Algorithmic Skeletons for Multi-GPU Computing." ACM SAC 2014: 880–885.
  51. ^ H. Kuchen and J. Striegnitz. "Features from functional programming for a C++ skeleton library". Concurrency – Practice and Experience, 17(7–8):739–756, 2005.
  52. ^ Philipp Ciechanowicz, Michael Poldner, and Herbert Kuchen. "The Muenster Skeleton Library Muesli – A Comprehensive Overview." ERCIS Working Paper No. 7, 2009
  53. ^ H. Kuchen and M. Cole. "The integration of task and data parallel skeletons." Parallel Processing Letters, 12(2):141–155, 2002.
  54. ^ A. Alexandrescu. "Modern C++ Design: Generic Programming and Design Patterns Applied". Addison-Wesley, 2001.
  55. ^ Michael Poldner. "Task Parallel Algorithmic Skeletons." PhD Thesis, University of Münster, 2008.
  56. ^ Michael Poldner and Herbert Kuchen. "Algorithmic Skeletons for Branch and Bound." Proceedings of the 1st International Conference on Software and Data Technology (ICSOFT), 1:291–300, 2006.
  57. ^ Michael Poldner and Herbert Kuchen. "Optimizing Skeletal Stream Processing for Divide and Conquer." Proceedings of the 3rd International Conference on Software and Data Technologies (ICSOFT), 181–189, 2008.
  58. ^ Michael Poldner and Herbert Kuchen. "Skeletons for Divide and Conquer." Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN), 181–188, 2008.
  59. ^ Michael Poldner and Herbert Kuchen. "Scalable Farms." Proceedings of the International Conference on Parallel Processing (ParCo) 33:795–802, 2006.
  60. ^ Michael Poldner and Herbert Kuchen. "On Implementing the Farm Skeleton." Parallel Processing Letters, 18(1):117–131, 2008.
  61. ^ Philipp Ciechanowicz. "Algorithmic Skeletons for General Sparse Matrices." Proceedings of the 20th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS), 188–197, 2008.
  62. ^ Philipp Ciechanowicz, Philipp Kegel, Maraike Schellmann, Sergei Gorlatch, and Herbert Kuchen. "Parallelizing the LM OSEM Image Reconstruction on Multi-Core Clusters." Parallel Computing: From Multicores and GPU's to Petascale, 19: 169–176, 2010.
  63. ^ Philipp Ciechanowicz and Herbert Kuchen. "Enhancing Muesli's Data Parallel Skeletons for Multi-Core Computer Architectures". International Conference on High Performance Computing and Communications (HPCC), 108–113, 2010.
  64. ^ Bacci, B.; Danelutto, M.; Orlando, S.; Pelagatti, S.; Vanneschi, M. (1995). "P3L: A structured high-level parallel language, and its structured support". Concurrency: Practice and Experience. 7 (3): 225. CiteSeerX 10.1.1.215.6425. doi:10.1002/cpe.4330070305.
  65. ^ S. Ciarpaglini, M. Danelutto, L. Folchi, C. Manconi, and S. Pelagatti. "ANACLETO: a template-based p3l compiler." In Proceedings of the Seventh Parallel Computing Workshop (PCW '97), Australian National University, Canberra, August 1997.
  66. ^ M. Aldinucci, M. Coppola, and M. Danelutto. Rewriting skeleton programs: How to evaluate the data-parallel stream-parallel tradeoff. In S. Gorlatch, editor, Proc of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming, pages 44–58. Uni. Passau, Germany, May 1998.
  67. ^ B. Bacci, M. Danelutto, S. Pelagatti, and M. Vanneschi. "Skie: a heterogeneous environment for HPC applications." Parallel Comput., 25(13–14):1827–1852, 1999.
  68. ^ M. Danelutto and M. Stigliani. "Skelib: Parallel programming with skeletons in C." In Euro-Par '00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing, pages 1175–1184, London, UK, 2000. Springer-Verlag.
  69. ^ D. Goswami, A. Singh, and B. R. Preiss. "From design patterns to parallel architectural skeletons." J. Parallel Distrib. Comput., 62(4):669–695, 2002. doi:10.1006/jpdc.2001.1809
  70. ^ D. Goswami, A. Singh, and B. R. Preiss. "Using object-oriented techniques for realizing parallel architectural skeletons." In ISCOPE '99: Proceedings of the Third International Symposium on Computing in Object-Oriented Parallel Environments, Lecture Notes in Computer Science, pages 130–141, London, UK, 1999. Springer-Verlag.
  71. ^ M. M. Akon, D. Goswami, and H. F. Li. "Superpas: A parallel architectural skeleton model supporting extensibility and skeleton composition." In Parallel and Distributed Processing and Applications Second International Symposium, ISPA, Lecture Notes in Computer Science, pages 985–996. Springer-Verlag, 2004.
  72. ^ M. M. Akon, A. Singh, D. Goswami, and H. F. Li. "Extensible parallel architectural skeletons." In High Performance Computing HiPC 2005, 12th International Conference, volume 3769 of Lecture Notes in Computer Science, pages 290–301, Goa, India, December 2005. Springer-Verlag.
  73. ^ M. Diaz, B. Rubio, E. Soler, and J. M. Troya. "SBASCO: Skeleton-based scientific components." In PDP, pages 318–. IEEE Computer Society, 2004.
  74. ^ M. Diaz, S. Romero, B. Rubio, E. Soler, and J. M. Troya. "Using SBASCO to solve reaction-diffusion equations in two-dimensional irregular domains." In Practical Aspects of High-Level Parallel Programming (PAPP), affiliated to the International Conference on Computational Science (ICCS), volume 3992 of Lecture Notes in Computer Science, pages 912–919. Springer, 2006.
  75. ^ M. Diaz, S. Romero, B. Rubio, E. Soler, and J. M. Troya. "An aspect oriented framework for scientific component development." In PDP '05: Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing, pages 290–296, Washington, DC, USA, 2005. IEEE Computer Society.
  76. ^ M. Diaz, S. Romero, B. Rubio, E. Soler, and J. M. Troya. "Dynamic reconfiguration of scientific components using aspect oriented programming: A case study." In R. Meersman And Z. Tari, editors, On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE, volume 4276 of Lecture Notes in Computer Science, pages 1351–1360. Springer-Verlag, 2006.
  77. ^ J. Darlington, Y. ke Guo, H. W. To, and J. Yang. "Parallel skeletons for structured composition." In PPOPP '95: Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 19–28, New York, NY, USA, 1995. ACM.
  78. ^ John Darlington; Moustafa Ghanem; Yike Guo; Hing Wing To (1996), "Guided Resource Organisation in Heterogeneous Parallel Computing", Journal of High Performance Computing, 4 (1): 13–23, CiteSeerX 10.1.1.37.4309
  79. ^ "SkePU".
  80. ^ J. Serot, D. Ginhac, and J. Derutin. "SKiPPER: a skeleton-based parallel programming environment for real-time image processing applications." In V. Malyshkin, editor, 5th International Conference on Parallel Computing Technologies (PaCT-99), volume 1662 of LNCS, pages 296–305. Springer, 6–10 September 1999.
  81. ^ J. Serot and D. Ginhac. "Skeletons for parallel image processing: an overview of the SKiPPER project". Parallel Computing, 28(12):1785–1808, Dec 2002.
  82. ^ J. Falcou, J. Serot, T. Chateau, and J. T. Lapreste. "Quaff: efficient c++ design for parallel skeletons." Parallel Computing, 32(7):604–615, 2006.
  83. ^ J. Falcou and J. Serot. "Formal semantics applied to the implementation of a skeleton-based parallel programming library." In G. R. Joubert, C. Bischof, F. J. Peters, T. Lippert, M. Bcker, P. Gibbon, and B. Mohr, editors, Parallel Computing: Architectures, Algorithms and Applications (Proc. of PARCO 2007, Julich, Germany), volume 38 of NIC, pages 243–252, Germany, September 2007. John von Neumann Institute for Computing.
  84. ^ K. Matsuzaki, H. Iwasaki, K. Emoto, and Z. Hu. "A library of constructive skeletons for sequential style of parallel programming." In InfoScale '06: Proceedings of the 1st international conference on Scalable information systems, page 13, New York, NY, USA, 2006. ACM.
  85. ^ K. Matsuzaki, Z. Hu, and M. Takeichi. "Parallelization with tree skeletons." In Euro-Par, volume 2790 of Lecture Notes in Computer Science, pages 789–798. Springer, 2003.
  86. ^ K. Matsuzaki, Z. Hu, and M. Takeichi. "Parallel skeletons for manipulating general trees." Parallel Computation, 32(7):590–603, 2006.
  87. ^ K. Emoto, Z. Hu, K. Kakehi, and M. Takeichi. "A compositional framework for developing parallel programs on two dimensional arrays." Technical report, Department of Mathematical Informatics, University of Tokyo, 2005.
  88. ^ K. Emoto, K. Matsuzaki, Z. Hu, and M. Takeichi. "Domain-specific optimization strategy for skeleton programs." In Euro-Par, volume 4641 of Lecture Notes in Computer Science, pages 705–714. Springer, 2007.
  89. ^ K. Matsuzaki, K. Kakehi, H. Iwasaki, Z. Hu, and Y. Akashi. "A fusion-embedded skeleton library." In M. Danelutto, M. Vanneschi, and D. Laforenza, editors, Euro-Par, volume 3149 of Lecture Notes in Computer Science, pages 644–653. Springer, 2004.
  90. ^ G. H. Botorog and H. Kuchen. "Efficient high-level parallel programming." Theor. Comput. Sci., 196(1–2):71–107, 1998.
  91. ^ Zandifar, Mani; Abduljabbar, Mustafa; Majidi, Alireza; Keyes, David; Amato, Nancy; Rauchwerger, Lawrence (2015). "Composing Algorithmic Skeletons to Express High-Performance Scientific Applications". Proceedings of the 29th ACM on International Conference on Supercomputing. pp. 415–424. doi:10.1145/2751205.2751241. ISBN 9781450335591. S2CID 13764901.
  92. ^ Zandifar, Mani; Thomas, Nathan; Amato, Nancy M.; Rauchwerger, Lawrence (15 September 2014). Brodman, James; Tu, Peng (eds.). Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science. Springer International Publishing. pp. 176–190. doi:10.1007/978-3-319-17473-0_12. ISBN 9783319174723.
  93. ^ G. Tanase et al. "STAPL Parallel Container Framework." In PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, pages 235–246.
  94. ^ J. Darlington, A. J. Field, P. G. Harrison, P. H. J. Kelly, D. W. N. Sharp, and Q. Wu. "Parallel programming using skeleton functions." In PARLE '93: Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe, pages 146–160, London, UK, 1993. Springer-Verlag.
  95. ^ J. Darlington; M. Ghanem; H. W. To (1993), "Structured Parallel Programming", In Programming Models for Massively Parallel Computers. IEEE Computer Society Press. 1993: 160–169, CiteSeerX 10.1.1.37.4610

algorithmic, skeleton, computing, algorithmic, skeletons, parallelism, patterns, high, level, parallel, programming, model, parallel, distributed, computing, take, advantage, common, programming, patterns, hide, complexity, parallel, distributed, applications,. In computing algorithmic skeletons or parallelism patterns are a high level parallel programming model for parallel and distributed computing Algorithmic skeletons take advantage of common programming patterns to hide the complexity of parallel and distributed applications Starting from a basic set of patterns skeletons more complex patterns can be built by combining the basic ones Contents 1 Overview 2 Example program 3 Frameworks and libraries 3 1 ASSIST 3 2 CO2P3S 3 3 Calcium amp Skandium 3 4 Eden 3 5 eSkel 3 6 FastFlow 3 7 HDC 3 8 HOC SA 3 9 JaSkel 3 10 Lithium amp Muskel 3 11 Mallba 3 12 Marrow 3 13 Muesli 3 14 P3L SkIE SKElib 3 15 PAS and EPAS 3 16 SBASCO 3 17 SCL 3 18 SkePU 3 19 SKiPPER amp QUAFF 3 20 SkeTo 3 21 Skil 3 22 STAPL Skeleton Framework 3 23 T4P 4 Frameworks comparison 5 See also 6 ReferencesOverview editThe most outstanding feature of algorithmic skeletons which differentiates them from other high level parallel programming models is that orchestration and synchronization of the parallel activities is implicitly defined by the skeleton patterns Programmers do not have to specify the synchronizations between the application s sequential parts This yields two implications First as the communication data access patterns are known in advance cost models can be applied to schedule skeletons programs 1 Second that algorithmic skeleton programming reduces the number of errors when compared to traditional lower level parallel programming models Threads MPI Example program editThe following example is based on the Java Skandium library for parallel programming The objective is to implement an Algorithmic Skeleton based parallel version of the QuickSort algorithm using the Divide and Conquer pattern 
  2. The second step is the input of data, which triggers the computation. In this case Range is a class holding an array and two indexes which allow the representation of a subarray. For every data entered into the framework a new Future object is created. More than one Future can be entered into a skeleton simultaneously.
  3. The Future allows for asynchronous computation, as other tasks can be performed while the results are computed.
  4. We can retrieve the result of the computation, blocking if necessary (i.e. results not yet available).

The functional codes in this example correspond to four types: Condition, Split, Execute and Merge.

public class ShouldSplit implements Condition<Range> {

  int threshold, maxTimes, times;

  public ShouldSplit(int threshold, int maxTimes) {
    this.threshold = threshold;
    this.maxTimes  = maxTimes;
    this.times     = 0;
  }

  @Override
  public synchronized boolean condition(Range r) {
    return r.right - r.left > threshold &&
           times++ < this.maxTimes;
  }
}

The ShouldSplit class implements the Condition interface. The function receives an input, Range r in this case, and returns true or false. In the context of Divide and Conquer, where this function will be used, this decides whether a sub-array should be subdivided again or not.

The SplitList class implements the Split interface, which in this case divides a sub-array into smaller sub-arrays. The class uses a helper function partition(...) which implements the well-known QuickSort pivot and swap scheme.

public class SplitList implements Split<Range, Range> {

  @Override
  public Range[] split(Range r) {
    int i = partition(r.array, r.left, r.right);
    Range[] intervals = { new Range(r.array, r.left, i - 1),
                          new Range(r.array, i + 1, r.right) };
    return intervals;
  }
}

The Sort class implements the Execute interface, and is in charge of sorting the sub-array specified by Range r. In this case we simply invoke Java's default (Arrays.sort) method for the given sub-array.

public class Sort implements Execute<Range, Range> {

  @Override
  public Range execute(Range r) {
    if (r.right <= r.left) return r;
    Arrays.sort(r.array, r.left, r.right + 1);
    return r;
  }
}

Finally, once a set of sub-arrays are sorted we merge the sub-array parts into a bigger array with the MergeList class, which implements the Merge interface.

public class MergeList implements Merge<Range, Range> {

  @Override
  public Range merge(Range[] r) {
    Range result = new Range(r[0].array, r[0].left, r[1].right);
    return result;
  }
}

Frameworks and libraries

ASSIST

ASSIST[2][3] is a programming environment which provides programmers with a structured coordination language. The coordination language can express parallel programs as an arbitrary graph of software modules. The module graph describes how a set of modules interact with each other using a set of typed data streams. The modules can be sequential or parallel. Sequential modules can be written in C, C++, or Fortran; and parallel modules are programmed with a special ASSIST parallel module (parmod).

AdHoc,[4][5] a hierarchical and fault-tolerant Distributed Shared Memory (DSM) system is used to interconnect streams of data between processing elements by providing a repository with: get/put/remove/execute operations. Research around AdHoc has focused on transparency, scalability, and fault tolerance of the data repository.

While not a classical skeleton framework, in the sense that no skeletons are provided, ASSIST's generic parmod can be specialized into classical skeletons such as: farm, map, etc. ASSIST also supports autonomic control of parmods, and can be subject to a performance contract by dynamically adapting the number of resources used.

CO2P3S

CO2P3S (Correct Object-Oriented Pattern-based Parallel Programming System) is a pattern oriented development environment,[6] which achieves parallelism using threads in Java.

CO2P3S is concerned with the complete development process of a parallel application. Programmers interact through a programming GUI to choose a pattern and its configuration options. Then, programmers fill the hooks required for the pattern, and new code is generated as a framework in Java for the parallel execution of the application. The generated framework uses three levels, in descending order of abstraction: patterns layer, intermediate code layer, and native code layer. Thus, advanced programmers may intervene the generated code at multiple levels to tune the performance of their applications. The generated code is mostly type safe, using the types provided by the programmer which do not require extension of superclass, but fails to be completely type safe such as in the reduce(..., Object reducer) method in the mesh pattern.

The set of patterns supported in CO2P3S corresponds to method-sequence, distributor, mesh, and wavefront. Complex applications can be built by composing frameworks with their object references. Nevertheless, if no pattern is suitable, the MetaCO2P3S graphical tool addresses extensibility by allowing programmers to modify the pattern designs and introduce new patterns into CO2P3S.

Support for distributed memory architectures in CO2P3S was introduced later.[7] To use a distributed memory pattern, programmers must change the pattern's memory option from shared to distributed, and generate the new code. From the usage perspective, the distributed memory version of the code requires the management of remote exceptions.

Calcium & Skandium

Calcium is greatly inspired by Lithium and Muskel. As such, it provides algorithmic skeleton programming as a Java library.
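Libraries of this kind expose each pattern as an ordinary object that can be nested inside another. As a rough illustration, here is a sketch in plain Java of how nestable skeleton objects can be modeled; this is a hypothetical, simplified API for exposition, not the actual Calcium or Skandium classes:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// A skeleton is a computation from P to R; nesting is plain object composition.
interface Skeleton<P, R> {
    R compute(P input);
}

// seq wraps sequential "muscle" code, written without parallelism concerns.
class Seq<P, R> implements Skeleton<P, R> {
    private final Function<P, R> f;
    Seq(Function<P, R> f) { this.f = f; }
    public R compute(P input) { return f.apply(input); }
}

// map applies a nested skeleton to every element; the runtime may do so in parallel.
class MapSkel<P, R> implements Skeleton<List<P>, List<R>> {
    private final Skeleton<P, R> inner;
    MapSkel(Skeleton<P, R> inner) { this.inner = inner; }
    public List<R> compute(List<P> input) {
        return input.parallelStream().map(inner::compute).collect(Collectors.toList());
    }
}

public class SkeletonDemo {
    public static void main(String[] args) {
        // Nesting by composing parametric objects, not by inheritance.
        Skeleton<List<Integer>, List<Integer>> squares =
                new MapSkel<Integer, Integer>(new Seq<>(x -> x * x));
        System.out.println(squares.compute(List.of(1, 2, 3, 4))); // [1, 4, 9, 16]
    }
}
```

The point of the sketch is that orchestration lives behind compute(); the programmer only composes objects and supplies sequential functions.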
Both task and data parallel skeletons are fully nestable, and are instantiated via parametric skeleton objects, not inheritance. Calcium supports the execution of skeleton applications on top of the ProActive environment for distributed cluster-like infrastructure. Additionally, Calcium has three distinctive features for algorithmic skeleton programming. First, a performance tuning model which helps programmers identify code responsible for performance bugs.[8] Second, a type system for nestable skeletons which is proven to guarantee subject reduction properties, and is implemented using Java Generics.[9] Third, a transparent algorithmic skeleton file access model, which enables skeletons for data intensive applications.[10]

Skandium is a complete re-implementation of Calcium for multi-core computing. Programs written on Skandium may take advantage of shared memory to simplify parallel programming.[11]

Eden

Eden[12] is a parallel programming language for distributed memory environments, which extends Haskell. Processes are defined explicitly to achieve parallel programming, while their communications remain implicit. Processes communicate through unidirectional channels, which connect one writer to exactly one reader. Programmers only need to specify which data a process depends on. Eden's process model provides direct control over process granularity, data distribution and communication topology.

Eden is not a skeleton language in the sense that skeletons are not provided as language constructs. Instead, skeletons are defined on top of Eden's lower-level process abstraction, supporting both task and data parallelism. So, contrary to most other approaches, Eden lets the skeletons be defined in the same language, and at the same level, as the skeleton instantiation: Eden itself. Because Eden is an extension of a functional language, Eden skeletons are higher order functions. Eden introduces the concept of implementation skeleton, which is an architecture independent scheme that describes a parallel implementation of an algorithmic skeleton.

eSkel

The Edinburgh Skeleton Library (eSkel) is provided in C and runs on top of MPI. The first version of eSkel was described in,[13] while a later version is presented in.[14]

In,[15] nesting-mode and interaction-mode for skeletons are defined. The nesting-mode can be either transient or persistent, while the interaction-mode can be either implicit or explicit. Transient nesting means that the nested skeleton is instantiated for each invocation and destroyed afterwards, while persistent means that the skeleton is instantiated once and the same skeleton instance will be invoked throughout the application. Implicit interaction means that the flow of data between skeletons is completely defined by the skeleton composition, while explicit means that data can be generated or removed from the flow in a way not specified by the skeleton composition. For example, a skeleton that produces an output without ever receiving an input has explicit interaction.

Performance prediction for scheduling and resource mapping, mainly for pipe-lines, has been explored by Benoit et al.[16][17][18][19] They provided a performance model for each mapping, based on process algebra, and determine the best scheduling strategy based on the results of the model.

More recent works have addressed the problem of adaptation on structured parallel programming,[20] in particular for the pipe skeleton.[21][22]

FastFlow

FastFlow is a skeletal parallel programming framework specifically targeted to the development of streaming and data-parallel applications. Being initially developed to target multi-core platforms, it has been successively extended to target heterogeneous platforms composed of clusters of shared-memory platforms,[23][24] possibly equipped with computing accelerators such as NVidia GPGPUs, Xeon Phi, Tilera TILE64. The main design philosophy of FastFlow is to provide application designers with key features for parallel programming (e.g. time-to-market, portability, efficiency and performance portability) via suitable parallel programming abstractions and a carefully designed run-time support.[25]

FastFlow is a general-purpose C++ programming framework for heterogeneous parallel platforms. Like other high-level programming frameworks, such as Intel TBB and OpenMP, it simplifies the design and engineering of portable parallel applications. However, it has a clear edge in terms of expressiveness and performance with respect to other parallel programming frameworks in specific application scenarios, including, inter alia: fine-grain parallelism on cache-coherent shared-memory platforms; streaming applications; coupled usage of multi-core and accelerators. In other cases FastFlow is typically comparable to (and in some cases slightly faster than) state-of-the-art parallel programming frameworks such as Intel TBB, OpenMP, Cilk, etc.[26]

HDC

Higher-order Divide and Conquer (HDC)[27] is a subset of the functional language Haskell. Functional programs are presented as polymorphic higher-order functions, which can be compiled into C/MPI, and linked with skeleton implementations. The language focuses on the divide and conquer paradigm, and starting from a general kind of divide and conquer skeleton, more specific cases with efficient implementations are derived. The specific cases correspond to: fixed recursion depth, constant recursion degree, multiple block recursion, elementwise operations, and correspondent communications.[28]

HDC pays special attention to the subproblem's granularity and its relation with the number of available processors. The total number of processors is a key parameter for the performance of the skeleton program, as HDC strives to estimate an adequate assignment of processors for each part of the program. Thus, the performance of the application is strongly related with the estimated number of processors, leading to either an exceeding number of subproblems, or not enough parallelism to exploit available processors.

HOC-SA

HOC-SA is a Globus Incubator project. HOC-SA stands for Higher-Order Components-Service Architecture. Higher-Order Components (HOCs) have the aim of simplifying Grid application development. The objective of HOC-SA is to provide Globus users, who do not want to know about all the details of the Globus middleware (GRAM RSL documents, Web services and resource configuration, etc.), with HOCs that provide a higher-level interface to the Grid than the core Globus Toolkit. HOCs are Grid-enabled skeletons, implemented as components on top of the Globus Toolkit, remotely accessible via Web Services.[29]

JaSkel

JaSkel[30] is a Java-based skeleton framework providing skeletons such as farm, pipe and heartbeat. Skeletons are specialized using inheritance. Programmers implement the abstract methods for each skeleton to provide their application specific code. Skeletons in JaSkel are provided in both sequential, concurrent and dynamic versions. For example, the concurrent farm can be used in shared memory environments (threads), but not in distributed environments (clusters), where the distributed farm should be used. To change from one version to the other, programmers must change their classes' signature to inherit from a different skeleton. The nesting of skeletons uses the basic Java Object class, and therefore no type system is enforced during the skeleton composition.

The distribution aspects of the computation are handled in JaSkel using AOP, more specifically the AspectJ implementation. Thus, JaSkel can be deployed on both cluster and Grid-like infrastructures.[31] Nevertheless, a drawback of the JaSkel approach is that the nesting of the skeleton strictly relates to the deployment infrastructure. Thus, a double nesting of farm yields a better performance than a single farm on hierarchical infrastructures. This defeats the purpose of using AOP to separate the distribution and functional concerns of the skeleton program.

Lithium & Muskel

Lithium[32][33][34] and its successor Muskel are skeleton frameworks developed at the University of Pisa, Italy. Both of them provide nestable skeletons to the programmer as Java libraries. The evaluation of a skeleton application follows a formal definition of operational semantics introduced by Aldinucci and Danelutto,[35][36] which can handle both task and data parallelism. The semantics describe both functional and parallel behavior of the skeleton language using a labeled transition system. Additionally, several performance optimizations are applied such as: skeleton rewriting techniques,[18][10] task lookahead, and server-to-server lazy binding.[37]

At the implementation level, Lithium exploits macro-data flow[38][39] to achieve parallelism. When the input stream receives a new parameter, the skeleton program is processed to obtain a macro-data flow graph. The nodes of the graph are macro-data flow instructions (MDFi), which represent the sequential pieces of code provided by the programmer. Tasks are used to group together several MDFi, and are consumed by idle processing elements from a task pool. When the computation of the graph is concluded, the result is placed into the output stream and thus delivered back to the user.

Muskel also provides non-functional features such as Quality of Service (QoS);[40] security between task pool and interpreters;[41][42] and resource discovery, load balancing, and fault tolerance when interfaced with the Java / Jini Parallel Framework (JJPF),[43] a distributed execution framework. Muskel also provides support for combining structured with unstructured programming,[44] and recent research has addressed extensibility.[45]

Mallba

Mallba[46] is a library for combinatorial optimizations supporting exact, heuristic and hybrid search strategies.[47] Each strategy is implemented in Mallba as a generic skeleton which can be used by providing the required code. For exact search, Mallba provides branch-and-bound and dynamic-optimization skeletons. For local search heuristics, Mallba supports: hill climbing, metropolis, simulated annealing, and tabu search; and also population-based heuristics derived from evolutionary algorithms such as genetic algorithms, evolution strategy, and others (CHC). The hybrid skeletons combine strategies, such as: GASA, a mixture of genetic algorithm and simulated annealing, and CHCCES, which combines CHC and ES.

The skeletons are provided as a C++ library and are not nestable but type safe. A custom MPI abstraction layer is used, NetStream, which takes care of primitive data type marshalling, synchronization, etc. A skeleton may have multiple lower-level parallel implementations depending on the target architectures: sequential, LAN, and WAN. For example: centralized master-slave, distributed master-slave, etc.

Mallba also provides state variables which hold the state of the search skeleton. The state links the search with the environment, and can be accessed to inspect the evolution of the search and decide on future actions. For example, the state can be used to store the best solution found so far, or α, β values for branch and bound pruning.[48]

Compared with other frameworks, Mallba's usage of skeleton concepts is unique. Skeletons are provided as parametric search strategies rather than parametric parallelization patterns.

Marrow

Marrow[49][50] is a C++ algorithmic skeleton framework for the orchestration of OpenCL computations in, possibly heterogeneous, multi-GPU environments. It provides a set of both task and data-parallel skeletons that can be composed, through nesting, to build compound computations. The leaf nodes of the resulting composition trees represent the GPU computational kernels, while the remainder nodes denote the skeleton applied to the nested sub-tree. The framework takes upon itself the entire host-side orchestration required to correctly execute these trees in heterogeneous multi-GPU environments, including the proper ordering of the data-transfer and of the execution requests, and the communication required between the tree's nodes.

Among Marrow's most distinguishable features are a set of skeletons previously unavailable in the GPU context, such as Pipeline and Loop, and the skeleton nesting ability, a feature also new in this context. Moreover, the framework introduces optimizations that overlap communication and computation, hence masking the latency imposed by the PCIe bus.

The parallel execution of a Marrow composition tree by multiple GPUs follows a data-parallel decomposition strategy that concurrently applies the entire computational tree to different partitions of the input dataset. Other than expressing which kernel parameters may be decomposed and, when required, defining how the partial results should be merged, the programmer is completely abstracted from the underlying multi-GPU architecture.

More information, as well as the source code, can be found at the Marrow website.

Muesli

The Muenster Skeleton Library Muesli[51][52] is a C++ template library which re-implements many of the ideas and concepts introduced in Skil, e.g. higher order functions, currying, and polymorphic types.[1] It is built on top of MPI 1.2 and OpenMP 2.5 and, unlike many other skeleton libraries, supports both task and data parallel skeletons. Skeleton nesting (composition) is similar to the two-tier approach of P3L, i.e. task parallel skeletons can be nested arbitrarily, while data parallel skeletons cannot, but may be used at the leaves of a task parallel nesting tree.[53] C++ templates are used to render skeletons polymorphic, but no type system is enforced. However, the library implements an automated serialization mechanism inspired by[54] such that, in addition to the standard MPI data types, arbitrary user-defined data types can be used within the skeletons. The supported task parallel skeletons[55] are Branch & Bound,[56] Divide & Conquer,[57][58] Farm,[59][60] and Pipe; auxiliary skeletons are Filter, Final, and Initial. Data parallel skeletons, such as fold (reduce), map, permute, zip, and their variants, are implemented as higher order member functions of a distributed data structure. Currently, Muesli supports distributed data structures for arrays, matrices, and sparse matrices.[61]

As a unique feature, Muesli's data parallel skeletons automatically scale both on single- as well as on multi-core, multi-node cluster architectures.[62][63] Here, scalability across nodes and cores is ensured by simultaneously using MPI and OpenMP, respectively. However, this feature is optional in the sense that a program written with Muesli still compiles and runs on a single-core, multi-node cluster computer without changes to the source code, i.e. backward compatibility is guaranteed. This is ensured by providing a very thin OpenMP abstraction layer such that the support of multi-core architectures can be switched on/off by simply providing/omitting the OpenMP compiler flag when compiling the program. By doing so, virtually no overhead is introduced at runtime.

P3L, SkIE, SKElib

P3L[64] (Pisa Parallel Programming Language) is a skeleton based coordination language. P3L provides skeleton constructs which are used to coordinate the parallel or sequential execution of C code. A compiler named Anacleto[65] is provided for the language. Anacleto uses implementation templates to compile P3L code into a target architecture. Thus, a skeleton can have several templates, each optimized for a different architecture. A template implements a skeleton on a specific architecture and provides a parametric process graph with a performance model. The performance model can then be used to decide program transformations which can lead to performance optimizations.[66]

A P3L module corresponds to a properly defined skeleton construct with input and output streams, and other sub-modules or sequential C code. Modules can be nested using the two-tier model, where the outer level is composed of task parallel skeletons, while data parallel skeletons may be used in the inner level.[64] Type verification is performed at the data flow level, when the programmer explicitly specifies the type of the input and output streams, and by specifying the flow of data between sub-modules.

SkIE[67] (Skeleton-based Integrated Environment) is quite similar to P3L, as it is also based on a coordination language, but provides advanced features such as debugging tools, performance analysis, visualization and graphical user interface. Instead of directly using the coordination language, programmers interact with a graphical tool where parallel modules based on skeletons can be composed.

SKELib[68] builds upon the contributions of P3L and SkIE by inheriting, among others, the template system. It differs from them because a coordination language is no longer used; instead, skeletons are provided as a library in C, with performance similar to the one achieved in P3L. Contrary to Skil, another C-like skeleton framework, type safety is not addressed in SKELib.

PAS and EPAS

PAS (Parallel Architectural Skeletons) is a framework for skeleton programming developed in C++ and MPI.[69][70] Programmers use an extension of C++ to write their skeleton applications. The code is then passed through a Perl script which expands the code to pure C++ where skeletons are specialized through inheritance.

In PAS, every skeleton has a Representative (Rep) object which must be provided by the programmer and is in charge of coordinating the skeleton's execution. Skeletons can be nested in a hierarchical fashion via the Rep objects. Besides the skeleton's execution, the Rep also explicitly manages the reception of data from the higher level skeleton, and the sending of data to the sub-skeletons. A parametrized communication/synchronization protocol is used to send and receive data between parent and sub-skeletons.

An extension of PAS labeled as SuperPas[71] and later as EPAS[72] addresses skeleton extensibility concerns. With the EPAS tool, new skeletons can be added to PAS. A Skeleton Description Language (SDL) is used to describe the skeleton pattern by specifying the topology with respect to a virtual processor grid. The SDL can then be compiled into native C++ code, which can be used as any other skeleton.

SBASCO

SBASCO (Skeleton-BAsed Scientific COmponents) is a programming environment oriented towards efficient development of parallel and distributed numerical applications.[73] SBASCO aims at integrating two programming models: skeletons and components, with a custom composition language. An application view of a component provides a description of its interfaces (input and output type), while a configuration view provides, in addition, a description of the component's internal structure and processor layout. A component's internal structure can be defined using three skeletons: farm, pipe and multi-block.

SBASCO addresses domain decomposable applications through its multi-block skeleton. Domains are specified through arrays (mainly two dimensional), which are decomposed into sub-arrays with possible overlapping boundaries. The computation then takes place in an iterative BSP-like fashion. The first stage consists of local computations, while the second stage performs boundary exchanges. A use case is presented for a reaction-diffusion problem in.[74]

Two types of components are presented in.[75] Scientific Components (SC), which provide the functional code; and Communication Aspect Components (CAC), which encapsulate non-functional behavior such as communication, distribution, processor layout, and replication. For example, SC components are connected to a CAC component which can act as a manager at runtime by dynamically re-mapping processors assigned to a SC. A use case showing improved performance when using CAC components is shown in.[76]

SCL

The Structured Coordination Language (SCL)[77] was one of the earliest skeleton programming languages. It provides a co-ordination language approach for skeleton programming over software components. SCL is considered a base language, and was designed to be integrated with a host language, for example Fortran or C, used for developing sequential software components. In SCL, skeletons are classified into three types: configuration, elementary and computation. Configuration skeletons abstract patterns for commonly used data structures such as distributed arrays (ParArray). Elementary skeletons correspond to data parallel skeletons such as map, scan, and fold. Computation skeletons abstract the control flow and correspond mainly to task parallel skeletons such as farm, SPMD, and iterateUntil. The coordination language approach was used in conjunction with performance models for programming traditional parallel machines as well as parallel heterogeneous machines that have different multiple cores on each processing node.[78]

SkePU

SkePU[79] is a skeleton programming framework for multicore CPUs and multi-GPU systems. It is a C++ template library with six data-parallel and one task-parallel skeletons, two container types, and support for execution on multi-GPU systems both with CUDA and OpenCL. Recently, support for hybrid execution, performance-aware dynamic scheduling and load balancing has been developed in SkePU by implementing a backend for the StarPU runtime system. SkePU is being extended for GPU clusters.

SKiPPER & QUAFF

SKiPPER is a domain specific skeleton library for vision applications,[80] which provides skeletons in CAML, and thus relies on CAML for type safety. Skeletons are presented in two ways: declarative and operational. Declarative skeletons are directly used by programmers, while their operational versions provide an architecture specific target implementation. From the runtime environment, CAML skeleton specifications, and application specific functions (provided in C by the programmer), new C code is generated and compiled to run the application on the target architecture. One of the interesting things about SKiPPER is that the skeleton program can be executed sequentially for debugging.

Different approaches have been explored in SKiPPER for writing operational skeletons: static data-flow graphs, parametric process networks, hierarchical task graphs, and tagged-token data-flow graphs.[81]

QUAFF[82] is a more recent skeleton library written in C++ and MPI. QUAFF relies on template-based meta-programming techniques to reduce runtime overheads and perform skeleton expansions and optimizations at compilation time. Skeletons can be nested and sequential functions are stateful. Besides type checking, QUAFF takes advantage of C++ templates to generate, at compilation time, new C/MPI code. QUAFF is based on the CSP model, where the skeleton program is described as a process network and production rules (single, serial, par, join).[83]

SkeTo

The SkeTo[84] project is a C++ library which achieves parallelization using MPI. SkeTo is different from other skeleton libraries because instead of providing nestable parallelism patterns, SkeTo provides parallel skeletons for parallel data structures such as: lists, trees,[85][86] and matrices.[87] The data structures are typed using templates, and several parallel operations can be invoked on them. For example, the list structure provides parallel operations such as: map, reduce, scan, zip, shift, etc.

Additional research around SkeTo has also focused on optimization strategies by transformation, and more recently domain specific optimizations.[88] For example, SkeTo provides a fusion transformation,[89] which merges two successive function invocations into a single one, thus decreasing the function call overheads and avoiding the creation of intermediate data structures passed between functions.

Skil

Skil[90] is an imperative language for skeleton programming. Skeletons are not directly part of the language but are implemented with it. Skil uses a subset of the C language which provides functional language-like features such as higher order functions, currying, and polymorphic types. When Skil is compiled, such features are eliminated and regular C code is produced. Thus, Skil transforms polymorphic higher order functions into monomorphic first order C functions. Skil does not support nestable composition of skeletons. Data parallelism is achieved using specific data parallel structures, for example to spread arrays among available processors. Filter skeletons can be used.

STAPL Skeleton Framework

In the STAPL Skeleton Framework,[91][92] skeletons are defined as parametric data flow graphs, letting them scale beyond 100,000 cores. In addition, this framework addresses composition of skeletons as point-to-point composition of their corresponding data flow graphs through the notion of ports, allowing new skeletons to be easily added to the framework. As a result, this framework eliminates the need for reimplementation and global synchronizations in composed skeletons. The STAPL Skeleton Framework supports nested composition and can switch between parallel and sequential execution in each level of nesting. This framework benefits from the scalable implementation of STAPL parallel containers[93] and can run skeletons on various containers including vectors, multidimensional arrays, and lists.

T4P

T4P was one of the first systems introduced for skeleton programming.[94] The system relied heavily on functional programming properties, and five skeletons were defined as higher order functions: Divide-and-Conquer, Farm, Map, Pipe and RaMP. A program could have more than one implementation, each using a combination of different skeletons. Furthermore, each skeleton could have different parallel implementations. A methodology based on functional program transformations guided by performance models of the skeletons was used to select the most appropriate skeleton to be used for the program, as well as the most appropriate implementation of the skeleton.[95]

Frameworks comparison

Activity years is the known activity years span. The dates represented in this column correspond to the first and last publication date of a related article in a scientific journal or conference proceeding. Note that a project may still be active beyond the activity span, and that we may have failed to find a publication for it beyond the given date.

Programming language is the interface with which programmers interact to code their skeleton applications.
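In the library-based frameworks above, the programming language is the host language itself, and a skeleton reduces to a single higher-order call. As a framework-neutral sketch in plain Java (a hypothetical helper for exposition, not any framework's actual API), the widely supported farm skeleton hides worker replication and result collection behind one method:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Farm skeleton sketch: a pool of identical workers consumes independent tasks.
// The user supplies only the sequential worker function; thread management,
// scheduling and result collection are the skeleton's concern.
public class Farm {
    public static <P, R> List<R> farm(List<P> tasks, Function<P, R> worker, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (P t : tasks) futures.add(pool.submit(() -> worker.apply(t)));
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) results.add(f.get()); // preserves task order
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Four workers farm out ten independent squaring tasks.
        List<Integer> in = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(farm(in, x -> x * x, 4));
        // [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
    }
}
```

A language-construct approach would instead express the same farm as dedicated syntax handled by a custom compiler, which is the design split discussed in this comparison.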
These languages are diverse encompassing paradigms such as functional languages coordination languages markup languages imperative languages object oriented languages and even graphical user interfaces Inside the programming language skeletons have been provided either as language constructs or libraries Providing skeletons as language construct implies the development of a custom domain specific language and its compiler This was clearly the stronger trend at the beginning of skeleton research The more recent trend is to provide skeletons as libraries in particular with object oriented languages such as C and Java Execution language is the language in which the skeleton applications are run or compiled It was recognized very early that the programming languages specially in the functional cases were not efficient enough to execute the skeleton programs Therefore skeleton programming languages were simplified by executing skeleton application on other languages Transformation processes were introduced to convert the skeleton applications defined in the programming language into an equivalent application on the target execution language Different transformation processes were introduced such as code generation or instantiation of lowerlevel skeletons sometimes called operational skeletons which were capable of interacting with a library in the execution language The transformed application also gave the opportunity to introduce target architecture code customized for performance into the transformed application Table 1 shows that a favorite for execution language has been the C language Distribution library provides the functionality to achieve parallel distributed computations The big favorite in this sense has been MPI which is not surprising since it integrates well with the C language and is probably the most used tool for parallelism in cluster computing The dangers of directly programming with the distribution library are of course safely hidden away from the 
programmers, who never interact with the distribution library. Recently, the trend has been to develop skeleton frameworks capable of interacting with more than one distribution library. For example, CO2P3S can use Threads, RMI or Sockets; Mallba can use NetStream or MPI; while JaSkel uses AspectJ to execute the skeleton applications on different skeleton frameworks.

Type safety refers to the capability of detecting type incompatibility errors in skeleton programs. Since the first skeleton frameworks were built on functional languages such as Haskell, type safety was simply inherited from the host language. Nevertheless, as custom languages were developed for skeleton programming, compilers had to be written to take type checking into consideration; which was not as difficult, as skeleton nesting was not fully supported. Recently however, as we began to host skeleton frameworks on object-oriented languages with full nesting, the type safety issue has resurfaced. Unfortunately, type checking has been mostly overlooked (with the exception of QUAFF), and especially in Java-based skeleton frameworks.

Skeleton nesting is the capability of hierarchical composition of skeleton patterns. Skeleton nesting was identified as an important feature in skeleton programming from the very beginning, because it allows the composition of more complex patterns starting from a basic set of simpler patterns. Nevertheless, it has taken the community a long time to fully support arbitrary nesting of skeletons, mainly because of the scheduling and type verification difficulties. The trend is clear that recent skeleton frameworks support full nesting of skeletons.

File access is the capability to access and manipulate files from an application. In the past, skeleton programming has proven useful mostly for computational-intensive applications, where small amounts of data require big amounts of computation time. Nevertheless, many distributed applications require or produce large amounts of data during their computation. This
is the case for astrophysics, particle physics, bioinformatics, etc. Thus, providing file transfer support that integrates with skeleton programming is a key concern which has been mostly overlooked.

Skeleton set is the list of supported skeleton patterns. Skeleton sets vary greatly from one framework to the other, and, more shockingly, some skeletons with the same name have different semantics on different frameworks. The most common skeleton patterns in the literature are probably farm, pipe and map.

Non-object-oriented algorithmic skeleton frameworks

| Name | Activity years | Programming language | Execution language | Distribution library | Type safe | Skeleton nesting | File access | Skeleton set |
|---|---|---|---|---|---|---|---|---|
| ASSIST | 2004–2007 | Custom control language | C++ | TCP/IP + ssh/scp | Yes | No | explicit | seq, parmod |
| SBASCO | 2004–2006 | Custom composition language | C++ | MPI | Yes | Yes | No | farm, pipe, multi-block |
| eSkel | 2004–2005 | C | C | MPI | No | ? | No | pipeline, farm, deal, butterfly, hallowSwap |
| HDC | 2004–2005 | Haskell subset | C | MPI | Yes | ? | No | dcA, dcB, dcD, dcE, dcF, map, red, scan, filter |
| SKELib | 2000 | C | C | MPI | No | No | No | farm, pipe |
| SkiPPER | 1999–2002 | CAML | C | SynDex | Yes | limited | No | scm, df, tf, intermem |
| SkIE | 1999 | GUI / Custom control language | C++ | MPI | Yes | limited | No | pipe, farm, map, reduce, loop |
| Eden | 1997–2011 | Haskell extension | Haskell | PVM, MPI | Yes | Yes | No | map, farm, workpool, nr, dc, pipe, iterUntil, torus, ring |
| P3L | 1995–1998 | Custom control language | C | MPI | Yes | limited | No | map, reduce, scan, comp, pipe, farm, seq, loop |
| Skil | 1995–1998 | C subset | C | — | Yes | No | No | pardata, map, fold |
| SCL | 1994–1999 | Custom control language | Fortran/C | MPI | Yes | limited | No | map, scan, fold, farm, SPMD, iterateUntil |
| T4P | 1990–1994 | Hope+ | Hope+ | CSTools | Yes | limited | No | D&C (Divide and Conquer), Map, Pipe, RaMP |

Object-oriented algorithmic skeleton frameworks

| Name | Activity years | Programming language | Execution language | Distribution library | Type safe | Skeleton nesting | File access | Skeleton set |
|---|---|---|---|---|---|---|---|---|
| Skandium | 2009–2012 | Java | Java | Threads | Yes | Yes | No | seq, pipe, farm, for, while, map, d&c, fork |
| FastFlow | 2009– | C++ | C++11, CUDA, OpenCL | C++11 threads, Posix threads, TCP/IP, OFED/IB, CUDA, OpenCL | Yes | Yes | Yes | Pipeline, Farm, ParallelFor, ParallelForReduce, MapReduce, StencilReduce, PoolEvolution, MacroDataFlow |
| Calcium | 2006–2008 | Java | Java | ProActive | Yes | Yes | Yes | seq, pipe, farm, for, while, map, d&c, fork |
| QUAFF | 2006–2007 | C++ | C++ | MPI | Yes | Yes | No | seq, pipe, farm, scm, pardo |
| JaSkel | 2006–2007 | Java | Java, AspectJ | MPP, RMI | No | Yes | No | farm, pipeline, heartbeat |
| Muskel | 2005–2008 | Java | Java | RMI | No | Yes | No | farm, pipe, seq, + custom MDF graphs |
| HOC-SA | 2004–2008 | Java | Java | Globus, KOALA | No | No | No | farm, pipeline, wavefront |
| SkeTo | 2003–2013 | C++ | C++ | MPI | Yes | No | No | list, matrix, tree |
| Mallba | 2002–2007 | C++ | C++ | NetStream, MPI | Yes | No | No | exact, heuristic, hybrid |
| Marrow | 2013– | C++ | C++ plus OpenCL | (none) | No | Yes | No | data parallel: map, map-reduce; task parallel: pipeline, loop, for |
| Muesli | 2002–2013 | C++ | C++ | MPI, OpenMP | Yes | Yes | No | data parallel: fold, map, permute, scan, zip, and variants; task parallel: branch & bound, divide & conquer, farm, pipe; auxiliary: filter, final, initial |
| Alt | 2002–2003 | Java/GworkflowDL | Java | Java RMI | Yes | No | No | map, zip, reduction, scan, dh, replicate, apply, sort |
| (E)PAS | 1999–2005 | C++ extension | C++ | MPI | No | Yes | No | singleton, replication, compositional, pipeline, divideconquer, dataparallel |
| Lithium | 1999–2004 | Java | Java | RMI | No | Yes | No | pipe, map, farm, reduce |
| CO2P3S | 1999–2003 | GUI, Java | Java (generated) | Threads, RMI, Sockets | Partial | No | No | method-sequence, distributor, mesh, wavefront |
| STAPL | 2010– | C++ | C++11 | STAPL Runtime Library (MPI, OpenMP, PThreads) | Yes | Yes | Yes | map, zip&lt;arity&gt;, reduce, scan, farm, (reverse-)butterfly, (reverse-)tree&lt;k-ary&gt;, recursive-doubling, serial, transpose, stencil&lt;n-dim&gt;, wavefront&lt;n-dim&gt;, allreduce, allgather, gather, scatter, broadcast; operators: compose, repeat, do-while, do-all, do-across |

See also

Halide (programming language)
Cuneiform (programming language)
Parallel programming model

References

1. Hammond, K.; Michaelson, G. (eds.). Research Directions in Parallel Functional Programming. Springer-Verlag, London, UK, 1999.
2. Vanneschi, M. (2002). "The programming model of ASSIST, an environment for parallel and distributed portable applications". Parallel Computing 28 (12): 1709–1732. doi:10.1016/S0167-8191(02)00188-6.
3. Aldinucci, M.; Coppola, M.; Danelutto, M.; Tonellotto, N.; Vanneschi, M.; Zoccolo, C. "High level grid programming with ASSIST". Computational Methods in Science and Technology 12 (1): 21–32, 2006.
4. Aldinucci, M.; Torquati, M. "Accelerating Apache farms through ad hoc distributed scalable object repository". In Proc. of 10th Intl. Euro-Par 2004: Parallel Processing, LNCS 3149, pp. 596–605. Springer, 2004.
5. Aldinucci, M.; Danelutto, M.; Antoniu, G.; Jan, M. (2008). "Fault-Tolerant Data Sharing for High-level Grid: A Hierarchical Storage Architecture". Achievements in European Research on Grid Systems, p. 67. doi:10.1007/978-0-387-72812-4_6. ISBN 978-0-387-72811-7.
6. MacDonald, S.; Anvik, J.; Bromling, S.; Schaeffer, J.; Szafron, D.; Tan, K. "From patterns to frameworks to parallel programs". Parallel Computing 28 (12): 1663–1683, 2002.
7. Tan, K.; Szafron, D.; Schaeffer, J.; Anvik, J.; MacDonald, S. "Using generative design patterns to generate parallel code for a distributed memory environment". In PPoPP '03: Proc. of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 203–215. ACM, 2003.
8. Caromel, D.; Leyton, M. "Fine tuning algorithmic skeletons". In 13th International Euro-Par Conference: Parallel Processing, LNCS 4641, pp. 72–81. Springer-Verlag, 2007.
9. Caromel, D.; Henrio, L.; Leyton, M. "Type safe algorithmic skeletons". In Proc. of the 16th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp. 45–53, Toulouse, France, Feb. 2008. IEEE CS Press.
10. Caromel, D.; Leyton, M. "A transparent non-invasive file data model for algorithmic skeletons". In 22nd International Parallel and Distributed Processing Symposium (IPDPS), pp. 1–8, Miami, USA, March 2008. IEEE Computer Society.
11. Leyton, M.; Piquer, J. M. "Skandium: Multi-core programming with algorithmic skeletons". IEEE Euromicro PDP 2010.
12. Loogen, R.; Ortega-Mallén, Y.; Peña-Marí, R. "Parallel Functional Programming in Eden". Journal of Functional Programming 15 (3): 431–475, 2005.
13. Cole, M. "Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming". Parallel Computing 30 (3): 389–406, 2004.
14. Benoit, A.; Cole, M.; Gilmore, S.; Hillston, J. "Flexible skeletal programming with eSkel". In Cunha, J. C.; Medeiros, P. D. (eds.), Euro-Par, LNCS 3648, pp. 761–770. Springer, 2005.
15. Benoit, A.; Cole, M. "Two fundamental concepts in skeletal parallel programming". In The International Conference on Computational Science (ICCS 2005), Part II, LNCS 3515, pp. 764–771. Springer-Verlag, 2005.
16. Benoit, A.; Cole, M.; Gilmore, S.; Hillston, J. "Evaluating the performance of skeleton-based high level parallel programs". In The International Conference on Computational Science (ICCS 2004), Part III, LNCS 3038, pp. 289–296. Springer-Verlag, 2004.
17. Benoit, A.; Cole, M.; Gilmore, S.; Hillston, J. "Evaluating the performance of pipeline-structured parallel programs with skeletons and process algebra". Scalable Computing: Practice and Experience 6 (4): 1–16, December 2005.
18. Benoit, A.; Cole, M.; Gilmore, S.; Hillston, J. "Scheduling skeleton-based grid applications using PEPA and NWS". The Computer Journal, Special issue on Grid Performability Modelling and Measurement, 48 (3): 369–378, 2005.
19. Benoit, A.; Robert, Y. "Mapping pipeline skeletons onto heterogeneous platforms". In ICCS 2007: the 7th International Conference on Computational Science, LNCS 4487, pp. 591–598. Springer-Verlag, 2007.
20. Yaikhom, G.; Cole, M.; Gilmore, S.; Hillston, J. "A structural approach for modelling performance of systems using skeletons". Electr. Notes Theor. Comput. Sci. 190 (3): 167–183, 2007.
21. González-Vélez, H.; Cole, M. "Towards fully adaptive pipeline parallelism for heterogeneous distributed environments". In Parallel and Distributed Processing and Applications, 4th International Symposium (ISPA), LNCS, pp. 916–926. Springer-Verlag, 2006.
22. González-Vélez, H.; Cole, M. "Adaptive structured parallelism for computational grids". In PPoPP '07: Proc. of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 140–141. ACM, 2007.
23. Aldinucci, M.; Campa, S.; Danelutto, M.; Kilpatrick, P.; Torquati, M. (2013). "Targeting Distributed Systems in FastFlow". Euro-Par 2012: Parallel Processing Workshops, LNCS 7640, pp. 47–56. doi:10.1007/978-3-642-36949-0_7. ISBN 978-3-642-36948-3.
24. Aldinucci, M.; Spampinato, C.; Drocco, M.; Torquati, M.; Palazzo, S. (2012). "A parallel edge preserving algorithm for salt and pepper image denoising". 3rd International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 97–104. doi:10.1109/IPTA.2012.6469567. hdl:2318/154520.
25. Aldinucci, M.; Danelutto, M.; Kilpatrick, P.; Meneghin, M.; Torquati, M. (2012). "An Efficient Unbounded Lock-Free Queue for Multi-core Systems". Euro-Par 2012: Parallel Processing, LNCS 7484, pp. 662–673. doi:10.1007/978-3-642-32820-6_65. hdl:2318/121343. ISBN 978-3-642-32819-0.
26. Aldinucci, M.; Meneghin, M.; Torquati, M. (2010). "Efficient Smith-Waterman on Multi-core with FastFlow". 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, IEEE, p. 195. doi:10.1109/PDP.2010.93. ISBN 978-1-4244-5672-7.
27. Herrmann, C. A.; Lengauer, C. "HDC: A higher-order language for divide-and-conquer". Parallel Processing Letters 10 (2/3): 239–250, 2000.
28. Herrmann, C. A. The Skeleton-Based Parallelization of Divide-and-Conquer Recursions. PhD thesis, 2000. ISBN 3-89722-556-5.
29. Dünnweber, J.; Gorlatch, S. Higher-Order Components for Grid Programming: Making Grids More Usable. Springer-Verlag, 2009. ISBN 978-3-642-00840-5.
30. Ferreira, J. F.; Sobral, J. L.; Proenca, A. J. "JaSkel: A Java skeleton-based framework for structured cluster and grid computing". In CCGRID '06: Proc. of the Sixth IEEE International Symposium on Cluster Computing and the Grid, pp. 301–304. IEEE Computer Society, 2006.
31. Sobral, J.; Proenca, A. "Enabling JaSkel skeletons for clusters and computational grids". In IEEE Cluster. IEEE Press, Sept. 2007.
32. Aldinucci, M.; Danelutto, M. "Stream parallel skeleton optimization". In Proc. of PDCS: Intl. Conference on Parallel and Distributed Computing and Systems, pp. 955–962, Cambridge, Massachusetts, USA, Nov. 1999. IASTED, ACTA Press.
33. Aldinucci, M.; Danelutto, M.; Teti, P. (2003). "An advanced environment supporting structured parallel programming in Java". Future Generation Computer Systems 19 (5): 611. doi:10.1016/S0167-739X(02)00172-3.
34. Danelutto, M.; Teti, P. "Lithium: A structured parallel programming environment in Java". In Proc. of ICCS: International Conference on Computational Science, LNCS 2330, pp. 844–853. Springer-Verlag, Apr. 2002.
35. Aldinucci, M.; Danelutto, M. "An operational semantics for skeletons". In Parallel Computing: Software Technology, Algorithms, Architectures and Applications (PARCO 2003), Advances in Parallel Computing vol. 13, pp. 63–70, Dresden, Germany, 2004. Elsevier.
36. Aldinucci, M.; Danelutto, M. (2007). "Skeleton-based parallel programming: Functional and parallel semantics in a single shot". Computer Languages, Systems & Structures 33 (3–4): 179. doi:10.1016/j.cl.2006.07.004.
37. Aldinucci, M.; Danelutto, M.; Dünnweber, J. "Optimization techniques for implementing parallel skeletons in grid environments". In Gorlatch, S. (ed.), Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming, pp. 35–47, Stirling, Scotland, UK, July 2004. Universität Münster, Germany.
38. Danelutto, M. "Efficient support for skeletons on workstation clusters". Parallel Processing Letters 11 (1): 41–56, 2001.
39. Danelutto, M. "Dynamic run time support for skeletons". Technical report, 1999.
40. Danelutto, M. "QoS in parallel programming through application managers". In PDP '05: Proc. of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing, pp. 282–289. IEEE Computer Society, 2005.
41. Aldinucci, M.; Danelutto, M. "The cost of security in skeletal systems". In Proc. of Intl. Euromicro PDP 2007: Parallel, Distributed and Network-based Processing, pp. 213–220, Napoli, Italia, February 2007. IEEE.
42. Aldinucci, M.; Danelutto, M. "Securing skeletal systems with limited performance penalty: the muskel experience". Journal of Systems Architecture, 2008.
43. Danelutto, M.; Dazzi, P. "A Java/Jini framework supporting stream parallel computations". In Proc. of Intl. PARCO 2005: Parallel Computing, Sept. 2005.
44. Danelutto, M.; Dazzi, P. "Joint structured/non-structured parallelism exploitation through data flow". In Proc. of ICCS: Workshop on Practical Aspects of High-level Parallel Programming, LNCS, Reading, UK, May 2006. Springer-Verlag.
45. Aldinucci, M.; Danelutto, M.; Dazzi, P. "Muskel: an expandable skeleton environment". Scalable Computing: Practice and Experience 8 (4): 325–341, December 2007.
46. Alba, E.; Almeida, F.; Blesa, M. J.; Cabeza, J.; Cotta, C.; Diaz, M.; Dorta, I.; Gabarro, J.; Leon, C.; Luna, J.; Moreno, L. M.; Pablos, C.; Petit, J.; Rojas, A.; Xhafa, F. "MALLBA: A library of skeletons for combinatorial optimisation (research note)". In Euro-Par '02, pp. 927–932. Springer-Verlag, 2002.
47. Alba, E.; Almeida, F.; Blesa, M.; Cotta, C.; Diaz, M.; Dorta, I.; Gabarro, J.; Leon, C.; Luque, G.; Petit, J.; Rodriguez, C.; Rojas, A.; Xhafa, F. "Efficient parallel LAN/WAN algorithms for optimization: the MALLBA project". Parallel Computing 32 (5): 415–440, 2006.
48. Alba, E.; Luque, G.; Garcia-Nieto, J.; Ordonez, G.; Leguizamon, G. "MALLBA: a software library to design efficient optimisation algorithms". International Journal of Innovative Computing and Applications 1 (1): 74–85, 2007.
49. Marques, R.; Paulino, H.; Alexandre, F.; Medeiros, P. D. "Algorithmic Skeleton Framework for the Orchestration of GPU Computations". Euro-Par 2013: 874–885.
50. Alexandre, F.; Marques, R.; Paulino, H. "On the Support of Task-Parallel Algorithmic Skeletons for Multi-GPU Computing". ACM SAC 2014: 880–885.
51. Kuchen, H.; Striegnitz, J. "Features from functional programming for a C++ skeleton library". Concurrency: Practice and Experience 17 (7–8): 739–756, 2005.
52. Ciechanowicz, P.; Poldner, M.; Kuchen, H. "The Muenster Skeleton Library Muesli: A Comprehensive Overview". ERCIS Working Paper No. 7, 2009.
53. Kuchen, H.; Cole, M. "The integration of task and data parallel skeletons". Parallel Processing Letters 12 (2): 141–155, 2002.
54. Alexandrescu, A. Modern C++ Design: Generic Programming and Design Patterns Applied. Addison-Wesley, 2001.
55. Poldner, M. Task Parallel Algorithmic Skeletons. PhD thesis, University of Münster, 2008.
56. Poldner, M.; Kuchen, H. "Algorithmic Skeletons for Branch and Bound". Proc. of the 1st International Conference on Software and Data Technology (ICSOFT) 1: 291–300, 2006.
57. Poldner, M.; Kuchen, H. "Optimizing Skeletal Stream Processing for Divide and Conquer". Proc. of the 3rd International Conference on Software and Data Technologies (ICSOFT): 181–189, 2008.
58. Poldner, M.; Kuchen, H. "Skeletons for Divide and Conquer". Proc. of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN): 181–188, 2008.
59. Poldner, M.; Kuchen, H. "Scalable Farms". Proc. of the International Conference on Parallel Processing (ParCo) 33: 795–802, 2006.
60. Poldner, M.; Kuchen, H. "On Implementing the Farm Skeleton". Parallel Processing Letters 18 (1): 117–131, 2008.
61. Ciechanowicz, P. "Algorithmic Skeletons for General Sparse Matrices". Proc. of the 20th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS): 188–197, 2008.
62. Ciechanowicz, P.; Kegel, P.; Schellmann, M.; Gorlatch, S.; Kuchen, H. "Parallelizing the LM OSEM Image Reconstruction on Multi-Core Clusters". Parallel Computing: From Multicores and GPU's to Petascale 19: 169–176, 2010.
63. Ciechanowicz, P.; Kuchen, H. "Enhancing Muesli's Data Parallel Skeletons for Multi-Core Computer Architectures". International Conference on High Performance Computing and Communications (HPCC): 108–113, 2010.
64. Bacci, B.; Danelutto, M.; Orlando, S.; Pelagatti, S.; Vanneschi, M. (1995). "P3L: A structured high-level parallel language, and its structured support". Concurrency: Practice and Experience 7 (3): 225. doi:10.1002/cpe.4330070305.
65. Ciarpaglini, S.; Danelutto, M.; Folchi, L.; Manconi, C.; Pelagatti, S. "ANACLETO: a template-based P3L compiler". In Proc. of the Seventh Parallel Computing Workshop (PCW '97), Australian National University, Canberra, August 1997.
66. Aldinucci, M.; Coppola, M.; Danelutto, M. "Rewriting skeleton programs: How to evaluate the data-parallel stream-parallel tradeoff". In Gorlatch, S. (ed.), Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming, pp. 44–58, Uni. Passau, Germany, May 1998.
67. Bacci, B.; Danelutto, M.; Pelagatti, S.; Vanneschi, M. "SkIE: a heterogeneous environment for HPC applications". Parallel Computing 25 (13–14): 1827–1852, 1999.
68. Danelutto, M.; Stigliani, M. "SKElib: Parallel programming with skeletons in C". In Euro-Par '00, pp. 1175–1184. Springer-Verlag, 2000.
69. Goswami, D.; Singh, A.; Preiss, B. R. "From design patterns to parallel architectural skeletons". J. Parallel Distrib. Comput. 62 (4): 669–695, 2002. doi:10.1006/jpdc.2001.1809.
70. Goswami, D.; Singh, A.; Preiss, B. R. "Using object-oriented techniques for realizing parallel architectural skeletons". In ISCOPE '99, LNCS, pp. 130–141. Springer-Verlag, 1999.
71. Akon, M. M.; Goswami, D.; Li, H. F. "SuperPAS: A parallel architectural skeleton model supporting extensibility and skeleton composition". In Parallel and Distributed Processing and Applications, Second International Symposium (ISPA), LNCS, pp. 985–996. Springer-Verlag, 2004.
72. Akon, M. M.; Singh, A.; Goswami, D.; Li, H. F. "Extensible parallel architectural skeletons". In High Performance Computing (HiPC 2005), LNCS 3769, pp. 290–301, Goa, India, December 2005. Springer-Verlag.
73. Diaz, M.; Rubio, B.; Soler, E.; Troya, J. M. "SBASCO: Skeleton-based scientific components". In PDP, p. 318. IEEE Computer Society, 2004.
74. Diaz, M.; Romero, S.; Rubio, B.; Soler, E.; Troya, J. M. "Using SBASCO to solve reaction-diffusion equations in two-dimensional irregular domains". In Practical Aspects of High-Level Parallel Programming (PAPP), affiliated to ICCS, LNCS 3992, pp. 912–919. Springer, 2006.
75. Diaz, M.; Romero, S.; Rubio, B.; Soler, E.; Troya, J. M. "An aspect-oriented framework for scientific component development". In PDP '05, pp. 290–296. IEEE Computer Society, 2005.
76. Diaz, M.; Romero, S.; Rubio, B.; Soler, E.; Troya, J. M. "Dynamic reconfiguration of scientific components using aspect-oriented programming: A case study". In On the Move to Meaningful Internet Systems 2006 (CoopIS, DOA, GADA, ODBASE), LNCS 4276, pp. 1351–1360. Springer-Verlag, 2006.
77. Darlington, J.; Guo, Y.; To, H. W.; Yang, J. "Parallel skeletons for structured composition". In PPoPP '95, pp. 19–28. ACM, 1995.
78. Darlington, J.; Ghanem, M.; Guo, Y.; To, H. W. (1996). "Guided Resource Organisation in Heterogeneous Parallel Computing". Journal of High Performance Computing 4 (1): 13–23.
79. Serot, J.; Ginhac, D.; Derutin, J. "SKiPPER: a skeleton-based parallel programming environment for real-time image processing applications". In 5th International Conference on Parallel Computing Technologies (PaCT '99), LNCS 1662, pp. 296–305. Springer, September 1999.
80. Serot, J.; Ginhac, D. "Skeletons for parallel image processing: an overview of the SKiPPER project". Parallel Computing 28 (12): 1785–1808, Dec. 2002.
81. Falcou, J.; Serot, J.; Chateau, T.; Lapreste, J. T. "QUAFF: efficient C++ design for parallel skeletons". Parallel Computing 32 (7): 604–615, 2006.
82. Falcou, J.; Serot, J. "Formal semantics applied to the implementation of a skeleton-based parallel programming library". In Parallel Computing: Architectures, Algorithms and Applications (PARCO 2007), NIC vol. 38, pp. 243–252, Jülich, Germany, September 2007. John von Neumann Institute for Computing.
83. Matsuzaki, K.; Iwasaki, H.; Emoto, K.; Hu, Z. "A library of constructive skeletons for sequential style of parallel programming". In InfoScale '06, p. 13. ACM, 2006.
84. Matsuzaki, K.; Hu, Z.; Takeichi, M. "Parallelization with tree skeletons". In Euro-Par, LNCS 2790, pp. 789–798. Springer, 2003.
85. Matsuzaki, K.; Hu, Z.; Takeichi, M. "Parallel skeletons for manipulating general trees". Parallel Computing 32 (7): 590–603, 2006.
86. Emoto, K.; Hu, Z.; Kakehi, K.; Takeichi, M. "A compositional framework for developing parallel programs on two-dimensional arrays". Technical report, Department of Mathematical Informatics, University of Tokyo, 2005.
87. Emoto, K.; Matsuzaki, K.; Hu, Z.; Takeichi, M. "Domain-specific optimization strategy for skeleton programs". In Euro-Par, LNCS 4641, pp. 705–714. Springer, 2007.
88. Matsuzaki, K.; Kakehi, K.; Iwasaki, H.; Hu, Z.; Akashi, Y. "A fusion-embedded skeleton library". In Euro-Par, LNCS 3149, pp. 644–653. Springer, 2004.
89. Botorog, G. H.; Kuchen, H. "Efficient high-level parallel programming". Theor. Comput. Sci. 196 (1–2): 71–107, 1998.
90. Zandifar, M.; Abduljabbar, M.; Majidi, A.; Keyes, D.; Amato, N.; Rauchwerger, L. (2015). "Composing Algorithmic Skeletons to Express High-Performance Scientific Applications". Proc. of the 29th ACM International Conference on Supercomputing, pp. 415–424. doi:10.1145/2751205.2751241. ISBN 9781450335591.
91. Zandifar, M.; Thomas, N.; Amato, N. M.; Rauchwerger, L. (15 September 2014). In Brodman, J.; Tu, P. (eds.), Languages and Compilers for Parallel Computing, LNCS, pp. 176–190. Springer International Publishing. doi:10.1007/978-3-319-17473-0_12. ISBN 9783319174723.
92. Tanase, G. et al. "STAPL Parallel Container Framework". In PPoPP '11: Proc. of the 16th ACM symposium on Principles and practice of parallel programming, pp. 235–246.
93. Darlington, J.; Field, A. J.; Harrison, P. G.; Kelly, P. H. J.; Sharp, D. W. N.; Wu, Q. "Parallel programming using skeleton functions". In PARLE '93: Proc. of the 5th International PARLE Conference on Parallel Architectures and Languages Europe, pp. 146–160. Springer-Verlag, 1993.
94. Darlington, J.; Ghanem, M.; To, H. W. (1993). "Structured Parallel Programming". In Programming Models for Massively Parallel Computers, IEEE Computer Society Press, pp. 160–169.