
OpenCL

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages (based on C99, C++14 and C++17) for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.

OpenCL API
Original author(s): Apple Inc.
Developer(s): Khronos Group
Initial release: August 28, 2009
Stable release: 3.0.11[1] / May 6, 2022
Written in: C with C++ bindings
Operating system: Android (vendor dependent),[2] FreeBSD,[3] Linux, macOS (via Pocl), Windows
Platform: ARMv7, ARMv8,[4] Cell, IA-32, Power, x86-64
Type: Heterogeneous computing API
License: OpenCL specification license
Website: www.khronos.org/opencl/

OpenCL C/C++ and C++ for OpenCL
Paradigm: Imperative (procedural), structured, (C++ only) object-oriented, generic programming
Family: C
Stable release: OpenCL C++ 1.0 revision V2.2-11;[5] OpenCL C 3.0 revision V3.0.11;[6] C++ for OpenCL 1.0 and 2021[7] / December 20, 2021
Typing discipline: Static, weak, manifest, nominal
Implementation language: Implementation specific
Filename extensions: .cl .clcpp
Website: www.khronos.org/opencl
Major implementations: AMD, Gallium Compute, IBM, Intel NEO, Intel SDK, Texas Instruments, Nvidia, POCL, Arm
Influenced by: C99, CUDA, C++14, C++17

OpenCL is an open standard maintained by the non-profit technology consortium Khronos Group. Conformant implementations are available from Altera, AMD, ARM, Creative, IBM, Imagination, Intel, Nvidia, Qualcomm, Samsung, Vivante, Xilinx, and ZiiLABS.[8][9]

Overview

OpenCL views a computing system as consisting of a number of compute devices, which might be central processing units (CPUs) or "accelerators" such as graphics processing units (GPUs), attached to a host processor (a CPU). It defines a C-like language for writing programs. Functions executed on an OpenCL device are called "kernels".[10]: 17  A single compute device typically consists of several compute units, which in turn comprise multiple processing elements (PEs). A single kernel execution can run on all or many of the PEs in parallel. How a compute device is subdivided into compute units and PEs is up to the vendor; a compute unit can be thought of as a "core", but the notion of core is hard to define across all the types of devices supported by OpenCL (or even within the category of "CPUs"),[11]: 49–50  and the number of compute units may not correspond to the number of cores claimed in vendors' marketing literature (which may actually be counting SIMD lanes).[12]

In addition to its C-like programming language, OpenCL defines an application programming interface (API) that allows programs running on the host to launch kernels on the compute devices and manage device memory, which is (at least conceptually) separate from host memory. Programs in the OpenCL language are intended to be compiled at run-time, so that OpenCL-using applications are portable between implementations for various host devices.[13] The OpenCL standard defines host APIs for C and C++; third-party APIs exist for other programming languages and platforms such as Python,[14] Java, Perl,[15] D[16] and .NET.[11]: 15  An implementation of the OpenCL standard consists of a library that implements the API for C and C++, and an OpenCL C compiler for the compute device(s) targeted.

In order to open the OpenCL programming model to other languages or to protect the kernel source from inspection, the Standard Portable Intermediate Representation (SPIR)[17] can be used as a target-independent way to ship kernels between a front-end compiler and the OpenCL back-end.

More recently, the Khronos Group has ratified SYCL,[18] a higher-level programming model for OpenCL, designed as a single-source eDSL based on pure C++17 to improve programming productivity. Developers who want C++ kernels but not the SYCL single-source programming style can use C++ features in compute kernel sources written in the "C++ for OpenCL" language.[19]

Memory hierarchy

OpenCL defines a four-level memory hierarchy for the compute device:[13]

  • global memory: shared by all processing elements, but has high access latency (__global);
  • read-only memory: smaller, low latency, writable by the host CPU but not the compute devices (__constant);
  • local memory: shared by a group of processing elements (__local);
  • per-element private memory (registers; __private).

Not every device needs to implement each level of this hierarchy in hardware. Consistency between the various levels in the hierarchy is relaxed, and only enforced by explicit synchronization constructs, notably barriers.

Devices may or may not share memory with the host CPU.[13] The host API provides handles on device memory buffers and functions to transfer data back and forth between host and devices.
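As a rough illustration of how the levels interact, the following plain C sketch emulates one work-group sum on the host. It is not OpenCL code: the names group_sums, GROUP_SIZE and local_scratch are invented for this example, the loops run serially, and the end of each inner loop stands in for a work-group barrier such as barrier(CLK_LOCAL_MEM_FENCE) in OpenCL C.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed work-group size, chosen for illustration; must be a power of two
   for the tree reduction below. */
#define GROUP_SIZE 4

/* Writes one partial sum per work-group into global_out.
   global_in must hold num_groups * GROUP_SIZE values. */
void group_sums(const float *global_in, float *global_out, size_t num_groups)
{
    assert(global_in != NULL && global_out != NULL);

    for (size_t group = 0; group < num_groups; group++) {
        /* Stands in for __local memory: shared by one work-group only. */
        float local_scratch[GROUP_SIZE];

        /* Phase 1: every work-item loads one value from global memory
           into a private temporary and stores it in local memory. */
        for (size_t lid = 0; lid < GROUP_SIZE; lid++) {
            float private_value = global_in[group * GROUP_SIZE + lid];
            local_scratch[lid] = private_value;
        }
        /* Loop end plays the role of a barrier: all local writes are
           complete before any work-item reads them. */

        /* Phase 2: tree reduction inside local memory. */
        for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
            for (size_t lid = 0; lid < stride; lid++) {
                local_scratch[lid] += local_scratch[lid + stride];
            }
            /* Another barrier would be needed here on a real device. */
        }

        /* Phase 3: one work-item writes the result back to global memory. */
        global_out[group] = local_scratch[0];
    }
}
```

Called on the eight values 1 to 8 with two groups, group_sums produces the partial sums 10 and 26. On a real device the barriers matter because the work-items of a group run concurrently and consistency between the levels is relaxed.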

OpenCL kernel language

The programming language used to write compute kernels is called the kernel language. OpenCL adopts C/C++-based languages to specify the kernel computations performed on the device, with some restrictions and additions that facilitate efficient mapping to the heterogeneous hardware resources of accelerators. Traditionally, OpenCL C was used to program the accelerators in the OpenCL standard; later, the C++ for OpenCL kernel language was developed, which inherits all functionality from OpenCL C while allowing C++ features in the kernel sources.

OpenCL C language

OpenCL C[20] is a C99-based language dialect adapted to fit the device model in OpenCL. Memory buffers reside in specific levels of the memory hierarchy, and pointers are annotated with the region qualifiers __global, __local, __constant, and __private, reflecting this. Instead of a device program having a main function, OpenCL C functions are marked __kernel to signal that they are entry points into the program to be called from the host program. Function pointers, bit fields and variable-length arrays are omitted, and recursion is forbidden.[21] The C standard library is replaced by a custom set of standard functions, geared toward math programming.

OpenCL C is extended to facilitate use of parallelism with vector types and operations, synchronization, and functions to work with work-items and work-groups.[21] In particular, besides scalar types such as float and double, which behave similarly to the corresponding types in C, OpenCL provides fixed-length vector types such as float4 (4-vector of single-precision floats); such vector types are available in lengths two, three, four, eight and sixteen for various base types.[20]: § 6.1.2  Vectorized operations on these types are intended to map onto SIMD instruction sets, e.g., SSE or VMX, when running OpenCL programs on CPUs.[13] Other specialized types include 2-d and 3-d image types.[20]: 10–11 

Example: matrix–vector multiplication

 
Figure: Each invocation (work-item) of the kernel takes a row of the green matrix (A in the code), multiplies this row with the red vector (x) and places the result in an entry of the blue vector (y). The number of columns n is passed to the kernel as ncols; the number of rows is implicit in the number of work-items produced by the host program.

The following is a matrix–vector multiplication algorithm in OpenCL C.

// Multiplies A*x, leaving the result in y.
// A is a row-major matrix, meaning the (i,j) element is at A[i*ncols+j].
__kernel void matvec(__global const float *A, __global const float *x,
                     uint ncols, __global float *y)
{
    size_t i = get_global_id(0);            // Global id, used as the row index
    __global float const *a = &A[i*ncols];  // Pointer to the i'th row
    float sum = 0.f;                        // Accumulator for dot product
    for (size_t j = 0; j < ncols; j++) {
        sum += a[j] * x[j];
    }
    y[i] = sum;
}

The kernel function matvec computes, in each invocation, the dot product of a single row of a matrix A and a vector x:

$$y_i = a_{i,:} \cdot x = \sum_j a_{i,j} x_j.$$

To extend this into a full matrix–vector multiplication, the OpenCL runtime maps the kernel over the rows of the matrix. On the host side, the clEnqueueNDRangeKernel function does this; among other arguments, it takes the command queue, the kernel to execute (whose arguments are set beforehand with clSetKernelArg), and the number of work-items, which here corresponds to the number of rows in the matrix A.
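The mapping of one work-item per row can be mimicked on the host with an ordinary loop, which is also a convenient way to check a kernel's output. The sketch below is plain C, not OpenCL C; matvec_work_item and matvec_reference are names invented for this example, and the loop index i stands in for get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* Body of the matvec kernel for a single work-item.
   `i` plays the role of get_global_id(0). Hypothetical helper. */
static void matvec_work_item(size_t i, const float *A, const float *x,
                             unsigned ncols, float *y)
{
    const float *a = &A[i * ncols];  /* pointer to the i'th row */
    float sum = 0.f;                 /* accumulator for the dot product */
    for (size_t j = 0; j < ncols; j++) {
        sum += a[j] * x[j];
    }
    y[i] = sum;
}

/* Serial stand-in for a one-dimensional range of `nrows` work-items:
   the device would run these iterations in parallel. */
void matvec_reference(const float *A, const float *x,
                      size_t nrows, unsigned ncols, float *y)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < nrows; i++) {
        matvec_work_item(i, A, x, ncols, y);
    }
}
```

Because each iteration writes only y[i] and reads shared inputs, the iterations are independent, which is what allows the runtime to execute them in any order or concurrently.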

Example: computing the FFT

This example will load a fast Fourier transform (FFT) implementation and execute it. The implementation is shown below.[22] The code asks the OpenCL library for the first available graphics card, creates memory buffers for reading and writing (from the perspective of the graphics card), JIT-compiles the FFT-kernel and then finally asynchronously runs the kernel. The result from the transform is not read in this example.

#include <stdio.h>
#include <time.h>
#include "CL/opencl.h"

#define NUM_ENTRIES 1024

int main() // (int argc, const char* argv[])
{
    // CONSTANTS
    // The source code of the kernel is represented as a string
    // located inside file: "fft1D_1024_kernel_src.cl". For the details see the next listing.
    // That file wraps its contents in a raw string literal R"(...)", which
    // needs a C++11 compiler (or a compiler extension).
    const char *KernelSource =
    #include "fft1D_1024_kernel_src.cl"
    ;

    // Looking up the available GPUs
    cl_uint num = 1; // not const: clGetDeviceIDs writes the device count here
    clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 0, NULL, &num);

    // Only the first GPU is used, so request at most one device ID
    cl_device_id devices[1];
    clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, devices, NULL);

    // create a compute context with GPU device
    cl_context context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

    // create a command queue
    clGetDeviceIDs(NULL, CL_DEVICE_TYPE_DEFAULT, 1, devices, NULL);
    cl_command_queue queue = clCreateCommandQueue(context, devices[0], 0, NULL);

    // allocate the buffer memory objects
    // (no host pointer is supplied, so CL_MEM_COPY_HOST_PTR must not be set)
    cl_mem memobjs[] = {
        clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * 2 * NUM_ENTRIES, NULL, NULL),
        clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float) * 2 * NUM_ENTRIES, NULL, NULL)
    };

    // create the compute program
    // const char* fft1D_1024_kernel_src[1] = { };
    cl_program program = clCreateProgramWithSource(context, 1, (const char **)&KernelSource, NULL, NULL);

    // build the compute program executable
    clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

    // create the compute kernel
    cl_kernel kernel = clCreateKernel(program, "fft1D_1024", NULL);

    // set the args values
    size_t local_work_size[1] = { 256 };

    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjs[0]);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjs[1]);
    clSetKernelArg(kernel, 2, sizeof(float) * (local_work_size[0] + 1) * 16, NULL);
    clSetKernelArg(kernel, 3, sizeof(float) * (local_work_size[0] + 1) * 16, NULL);

    // create N-D range object with work-item dimensions and execute kernel
    size_t global_work_size[1] = { 256 };

    global_work_size[0] = NUM_ENTRIES;
    local_work_size[0] = 64; // Nvidia: 192 or 256
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);
}

The actual calculation inside file "fft1D_1024_kernel_src.cl" (based on Fitting FFT onto the G80 Architecture):[23]

R"(
  // This kernel computes FFT of length 1024. The 1024 length FFT is decomposed into
  // calls to a radix 16 function, another radix 16 function and then a radix 4 function
  __kernel void fft1D_1024 (__global float2 *in, __global float2 *out,
                            __local float *sMemx, __local float *sMemy) {
    int tid = get_local_id(0);
    int blockIdx = get_group_id(0) * 1024 + tid;
    float2 data[16];

    // starting index of data to/from global memory
    in = in + blockIdx;  out = out + blockIdx;

    globalLoads(data, in, 64); // coalesced global reads
    fftRadix16Pass(data);      // in-place radix-16 pass
    twiddleFactorMul(data, tid, 1024, 0);

    // local shuffle using local memory
    localShuffle(data, sMemx, sMemy, tid, (((tid & 15) * 65) + (tid >> 4)));
    fftRadix16Pass(data);               // in-place radix-16 pass
    twiddleFactorMul(data, tid, 64, 4); // twiddle factor multiplication

    localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15)));

    // four radix-4 function calls
    fftRadix4Pass(data);      // radix-4 function number 1
    fftRadix4Pass(data + 4);  // radix-4 function number 2
    fftRadix4Pass(data + 8);  // radix-4 function number 3
    fftRadix4Pass(data + 12); // radix-4 function number 4

    // coalesced global writes
    globalStores(data, out, 64);
  }
)"

A full, open source implementation of an OpenCL FFT can be found on Apple's website.[24]

C++ for OpenCL language

In 2020, Khronos announced[25] the transition to the community-driven C++ for OpenCL programming language,[26] which provides features from C++17 in combination with the traditional OpenCL C features. The language makes a rich variety of standard C++ features available while preserving backward compatibility with OpenCL C. This opens up a smooth transition path to C++ functionality for developers of OpenCL kernel code, who can continue to use a familiar programming flow and even tools, as well as existing extensions and libraries available for OpenCL C.

The language semantics are described in the documentation published in the releases of the OpenCL-Docs[27] repository hosted by the Khronos Group, but the language is currently not ratified by the Khronos Group. The C++ for OpenCL language is not documented in a stand-alone document; it is based on the specifications of C++ and OpenCL C. The open source Clang compiler has supported C++ for OpenCL since release 9.[28]

C++ for OpenCL was originally developed as a Clang compiler extension and first appeared in release 9.[29] Because it was tightly coupled with OpenCL C and did not contain any Clang-specific functionality, its documentation was re-hosted in the OpenCL-Docs repository[27] of the Khronos Group, along with the sources of other specifications and reference cards. The first official release of this document, describing C++ for OpenCL version 1.0, was published in December 2020.[30] C++ for OpenCL 1.0 contains features from C++17 and is backward compatible with OpenCL C 2.0. In December 2021, a new provisional C++ for OpenCL version 2021 was released, which is fully compatible with the OpenCL 3.0 standard.[31] A work-in-progress draft of the latest C++ for OpenCL documentation can be found on the Khronos website.[32]

Features

C++ for OpenCL supports most of the features (syntactically and semantically) of OpenCL C except for nested parallelism and blocks.[33] However, there are minor differences in some supported features, mainly related to differences in semantics between C++ and C. For example, C++ is stricter with implicit type conversions and does not support the restrict type qualifier.[33] The following C++ features are not supported by C++ for OpenCL: virtual functions, the dynamic_cast operator, non-placement new/delete operators, exceptions, pointers to member functions, references to functions, and the C++ standard libraries.[33] C++ for OpenCL extends the concept of separate memory regions (address spaces) from OpenCL C to C++ features: functional casts, templates, class members, references, lambda functions, and operators. Most C++ features are not available for kernel functions, e.g. overloading, templating, or arbitrary class layout in parameter types.[33]

Example: complex-number arithmetic

The following code snippet illustrates how kernels with complex-number arithmetic can be implemented in the C++ for OpenCL language with convenient use of C++ features.

// Define a class Complex, that can perform complex-number computations with
// various precision when different types for T are used - double, float, half.
template<typename T>
class complex_t {
    T m_re; // Real component.
    T m_im; // Imaginary component.

public:
    complex_t(T re, T im): m_re{re}, m_im{im} {};
    // Define operator for complex-number multiplication.
    complex_t operator*(const complex_t &other) const
    {
        return {m_re * other.m_re - m_im * other.m_im,
                m_re * other.m_im + m_im * other.m_re};
    }
    T get_re() const { return m_re; }
    T get_im() const { return m_im; }
};

// A helper function to compute multiplication over complex numbers read from
// the input buffer and to store the computed result into the output buffer.
template<typename T>
void compute_helper(__global T *in, __global T *out) {
    auto idx = get_global_id(0);
    // Every work-item uses 4 consecutive items from the input buffer
    // - two for each complex number.
    auto offset = idx * 4;
    auto num1 = complex_t{in[offset], in[offset + 1]};
    auto num2 = complex_t{in[offset + 2], in[offset + 3]};
    // Perform complex-number multiplication.
    auto res = num1 * num2;
    // Every work-item writes 2 consecutive items to the output buffer.
    out[idx * 2] = res.get_re();
    out[idx * 2 + 1] = res.get_im();
}

// This kernel is used for complex-number multiplication in single precision.
__kernel void compute_sp(__global float *in, __global float *out) {
    compute_helper(in, out);
}

#ifdef cl_khr_fp16
// This kernel is used for complex-number multiplication in half precision when
// it is supported by the device.
#pragma OPENCL EXTENSION cl_khr_fp16: enable
__kernel void compute_hp(__global half *in, __global half *out) {
    compute_helper(in, out);
}
#endif
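The buffer layout used by compute_helper (four input floats and two output floats per work-item) can be checked against a serial host-side reference. The following plain C sketch is one way to write such a reference; the function name complex_mul_reference is invented here, and the loop index idx stands in for get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference for the single-precision kernel: multiplies
   `num_items` pairs of complex numbers stored as interleaved
   (re, im) floats. Hypothetical helper for validation only. */
void complex_mul_reference(const float *in, float *out, size_t num_items)
{
    assert(in != NULL && out != NULL);
    for (size_t idx = 0; idx < num_items; idx++) {
        /* Each item reads 4 consecutive floats: two complex numbers. */
        size_t offset = idx * 4;
        float re1 = in[offset],     im1 = in[offset + 1];
        float re2 = in[offset + 2], im2 = in[offset + 3];

        /* (re1 + i*im1) * (re2 + i*im2) */
        out[idx * 2]     = re1 * re2 - im1 * im2;
        out[idx * 2 + 1] = re1 * im2 + im1 * re2;
    }
}
```

For the input pair (1 + 2i, 3 + 4i) the reference writes -5 and 10, matching (1 + 2i)(3 + 4i) = -5 + 10i.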

Tooling and Execution Environment

The C++ for OpenCL language can be used for the same applications or libraries, and in the same way, as the OpenCL C language. Owing to the rich variety of C++ language features, applications written in C++ for OpenCL can express complex functionality more conveniently than applications written in OpenCL C; in particular, the generic programming paradigm from C++ is very attractive to library developers.

C++ for OpenCL sources can be compiled by OpenCL drivers that support the cl_ext_cxx_for_opencl extension.[34] Arm announced support for this extension in December 2020.[35] However, due to the increasing complexity of the algorithms accelerated on OpenCL devices, it is expected that more applications will compile C++ for OpenCL kernels offline using stand-alone compilers such as Clang[36] into an executable binary format or a portable binary format, e.g. SPIR-V.[37] Such an executable can be loaded during the execution of an OpenCL application using a dedicated OpenCL API.[38]

Binaries compiled from sources in C++ for OpenCL 1.0 can be executed on OpenCL 2.0 conformant devices. Depending on the language features used in the kernel sources, they can also be executed on devices supporting earlier OpenCL versions or OpenCL 3.0.

Aside from OpenCL drivers, kernels written in C++ for OpenCL can be compiled for execution on Vulkan devices using the clspv[39] compiler and the clvk[40] runtime layer, in the same way as OpenCL C kernels.

Contributions

C++ for OpenCL is an open language developed by the community of contributors listed in its documentation.[32] New contributions to the language's semantic definition or to open source tooling support are accepted from anyone interested, provided they are aligned with the main design philosophy and are reviewed and approved by the experienced contributors.[19]

History

OpenCL was initially developed by Apple Inc., which holds trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Qualcomm, Intel, and Nvidia. Apple submitted this initial proposal to the Khronos Group. On June 16, 2008, the Khronos Compute Working Group was formed[41] with representatives from CPU, GPU, embedded-processor, and software companies. This group worked for five months to finish the technical details of the specification for OpenCL 1.0 by November 18, 2008.[42] This technical specification was reviewed by the Khronos members and approved for public release on December 8, 2008.[43]

OpenCL 1.0

OpenCL 1.0 was released with Mac OS X Snow Leopard on August 28, 2009. According to an Apple press release:[44]

Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard.

AMD decided to support OpenCL instead of the now deprecated Close to Metal in its Stream framework.[45][46] RapidMind announced their adoption of OpenCL underneath their development platform to support GPUs from multiple vendors with one interface.[47] On December 9, 2008, Nvidia announced its intention to add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit.[48] On October 30, 2009, IBM released its first OpenCL implementation as a part of the XL compilers.[49]

Speedups of up to a factor of 1000 are possible with OpenCL on graphics cards compared with a conventional CPU.[50] Some features that became important in later versions of OpenCL, such as double- and half-precision operations, are optional in 1.0.[51]

OpenCL 1.1

OpenCL 1.1 was ratified by the Khronos Group on June 14, 2010[52] and adds significant functionality for enhanced parallel programming flexibility, functionality, and performance including:

  • New data types including 3-component vectors and additional image formats;
  • Handling commands from multiple host threads and processing buffers across multiple devices;
  • Operations on regions of a buffer including read, write and copy of 1D, 2D, or 3D rectangular regions;
  • Enhanced use of events to drive and control command execution;
  • Additional OpenCL built-in C functions such as integer clamp, shuffle, and asynchronous strided copies;
  • Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.

OpenCL 1.2

On November 15, 2011, the Khronos Group announced the OpenCL 1.2 specification,[53] which added significant functionality over the previous versions in terms of performance and features for parallel programming. Most notable features include:

  • Device partitioning: the ability to partition a device into sub-devices so that work assignments can be allocated to individual compute units. This is useful for reserving areas of the device to reduce latency for time-critical tasks.
  • Separate compilation and linking of objects: the functionality to compile OpenCL into external libraries for inclusion into other programs.
  • Enhanced image support (optional): 1.2 adds support for 1D images and 1D/2D image arrays. Furthermore, the OpenGL sharing extensions now allow for OpenGL 1D textures and 1D/2D texture arrays to be used to create OpenCL images.
  • Built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework. Kernels can be called to use specialised or non-programmable aspects of underlying hardware. Examples include video encoding/decoding and digital signal processors.
  • DirectX functionality: DX9 media surface sharing allows for efficient sharing between OpenCL and DX9 or DXVA media surfaces. Equally, for DX11, seamless sharing between OpenCL and DX11 surfaces is enabled.
  • The ability to force IEEE 754 compliance for single-precision floating-point math: OpenCL by default allows the single-precision versions of the division, reciprocal, and square root operation to be less accurate than the correctly rounded values that IEEE 754 requires.[54] If the programmer passes the "-cl-fp32-correctly-rounded-divide-sqrt" command line argument to the compiler, these three operations will be computed to IEEE 754 requirements if the OpenCL implementation supports this, and will fail to compile if the OpenCL implementation does not support computing these operations to their correctly-rounded values as defined by the IEEE 754 specification.[54] This ability is supplemented by the ability to query the OpenCL implementation to determine if it can perform these operations to IEEE 754 accuracy.[54]

OpenCL 2.0

On November 18, 2013, the Khronos Group announced the ratification and public release of the finalized OpenCL 2.0 specification.[55] Updates and additions to OpenCL 2.0 include:

  • Shared virtual memory
  • Nested parallelism
  • Generic address space
  • Images (optional, include 3D-Image)
  • C11 atomics
  • Pipes
  • Android installable client driver extension
  • half precision extended with optional cl_khr_fp16 extension
  • cl_double: double precision IEEE 754 (optional)

OpenCL 2.1

The ratification and release of the OpenCL 2.1 provisional specification was announced on March 3, 2015 at the Game Developers Conference in San Francisco. It was released on November 16, 2015.[56] It introduced the OpenCL C++ kernel language, based on a subset of C++14, while maintaining support for the preexisting OpenCL C kernel language. Vulkan and OpenCL 2.1 share SPIR-V as an intermediate representation, allowing high-level language front-ends to share a common compilation target. Updates to the OpenCL API include:

  • Additional subgroup functionality
  • Copying of kernel objects and states
  • Low-latency device timer queries
  • Ingestion of SPIR-V code by runtime
  • Execution priority hints for queues
  • Zero-sized dispatches from host

AMD, ARM, Intel, HPC, and YetiWare have declared support for OpenCL 2.1.[57][58]

OpenCL 2.2

OpenCL 2.2 brings the OpenCL C++ kernel language into the core specification for significantly enhanced parallel programming productivity.[59][60][61] It was released on May 16, 2017.[62] A maintenance update with bug fixes was released in May 2018.[63]

  • The OpenCL C++ kernel language is a static subset of the C++14 standard and includes classes, templates, lambda expressions, function overloads and many other constructs for generic and meta-programming.
  • Uses the new Khronos SPIR-V 1.1 intermediate language which fully supports the OpenCL C++ kernel language.
  • OpenCL library functions can now use the C++ language to provide increased safety and reduced undefined behavior while accessing features such as atomics, iterators, images, samplers, pipes, and device queue built-in types and address spaces.
  • Pipe storage is a new device-side type in OpenCL 2.2 that is useful for FPGA implementations by making connectivity size and type known at compile time, enabling efficient device-scope communication between kernels.
  • OpenCL 2.2 also includes features for enhanced optimization of generated code: applications can provide the value of specialization constant at SPIR-V compilation time, a new query can detect non-trivial constructors and destructors of program scope global objects, and user callbacks can be set at program release time.
  • Runs on any OpenCL 2.0-capable hardware (only a driver update is required).

OpenCL 3.0

The OpenCL 3.0 specification was released on September 30, 2020, after being in preview since April 2020. OpenCL 1.2 functionality has become a mandatory baseline, while all OpenCL 2.x and OpenCL 3.0 features were made optional. The specification retains the OpenCL C language and deprecates the OpenCL C++ Kernel Language, replacing it with the C++ for OpenCL language[19] based on a Clang/LLVM compiler which implements a subset of C++17 and SPIR-V intermediate code.[64][65][66] Version 3.0.7 of C++ for OpenCL, with some Khronos OpenCL extensions, was presented at IWOCL 21.[67] The current version is 3.0.11, with some new extensions and corrections. NVIDIA, working closely with the Khronos OpenCL Working Group, improved Vulkan Interop with semaphores and memory sharing.[68]

Roadmap

 
Figure: The International Workshop on OpenCL (IWOCL) held by the Khronos Group

When releasing OpenCL 2.2, the Khronos Group announced that OpenCL would converge where possible with Vulkan to enable OpenCL software deployment flexibility over both APIs.[69][70] This has now been demonstrated by Adobe's Premiere Rush, which uses the clspv[39] open source compiler to compile significant amounts of OpenCL C kernel code to run on a Vulkan runtime for deployment on Android.[71] OpenCL has a forward-looking roadmap independent of Vulkan, with 'OpenCL Next' under development and targeting release in 2020. OpenCL Next may integrate extensions such as Vulkan / OpenCL Interop, Scratch-Pad Memory Management, Extended Subgroups, SPIR-V 1.4 ingestion and SPIR-V Extended debug info. OpenCL is also considering a Vulkan-like loader and layers and a 'Flexible Profile' for deployment flexibility on multiple accelerator types.[72]

Open source implementations

 
Figure: clinfo, a command-line tool to see OpenCL information

OpenCL consists of a set of headers and a shared object that is loaded at runtime. An installable client driver (ICD) must be installed on the platform for every vendor whose devices the runtime needs to support. For example, to support Nvidia devices on a Linux platform, the Nvidia ICD must be installed so that the OpenCL runtime (the ICD loader) can locate the ICD for the vendor and redirect calls appropriately. The consumer application uses the standard OpenCL header; calls to each function are then proxied by the OpenCL runtime to the appropriate driver using the ICD. Each vendor must implement each OpenCL call in its driver.[73]
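The proxying described above can be pictured as a table of function pointers per vendor. The plain C sketch below is a heavily simplified illustration, not the real loader: all names (vendor_dispatch, platform_object, loader_get_vendor_id and the two fictional vendors) are invented, and it assumes that every object a driver hands out carries a pointer to its vendor's function table so the loader can forward calls without knowing the vendor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dispatch table: a real one would have one entry
   per OpenCL API function. */
typedef struct vendor_dispatch {
    unsigned (*get_vendor_id)(void);
} vendor_dispatch;

/* Assumption of this sketch: each driver object starts with a pointer
   to the dispatch table of the vendor that created it. */
typedef struct platform_object {
    const vendor_dispatch *dispatch;
} platform_object;

/* Two fictional vendor drivers, each implementing the same call. */
static unsigned vendor_a_get_id(void) { return 0xA; }
static unsigned vendor_b_get_id(void) { return 0xB; }

static const vendor_dispatch vendor_a_table = { vendor_a_get_id };
static const vendor_dispatch vendor_b_table = { vendor_b_get_id };

const platform_object vendor_a_platform = { &vendor_a_table };
const platform_object vendor_b_platform = { &vendor_b_table };

/* Loader-side entry point: the application calls this one function,
   and the loader forwards the call to whichever driver owns the object. */
unsigned loader_get_vendor_id(const platform_object *platform)
{
    assert(platform != NULL && platform->dispatch != NULL);
    return platform->dispatch->get_vendor_id();
}
```

The application links only against the loader's entry points, so drivers from several vendors can coexist on one system, and each vendor only has to fill in its own table.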

The Apple,[74] Nvidia,[75] ROCm, RapidMind[76] and Gallium3D[77] implementations of OpenCL are all based on the LLVM Compiler technology and use the Clang compiler as their frontend.

MESA Gallium Compute
An implementation of OpenCL (currently an incomplete 1.1, mostly complete for AMD Radeon GCN) for a number of platforms is maintained as part of the Gallium Compute Project,[78] which builds on the work of the Mesa project to support multiple platforms. It was formerly known as CLOVER.[79] Current development mostly keeps the incomplete framework running with current LLVM and Clang, and adds some new features such as fp16 in 17.3.[80] The target is complete OpenCL 1.0, 1.1 and 1.2 support for AMD and Nvidia. New basic development, including SPIR-V support for Clover, is done by Red Hat.[81][82] The new target is a modular OpenCL 3.0 with full support of OpenCL 1.2. The current state is tracked in Mesamatrix. Image support is the focus of development.

RustiCL is a new implementation for Gallium Compute, written in Rust instead of C with the aim of better code. In Mesa 22.2 an experimental implementation will be available with OpenCL 3.0 support and an implementation of the image extension for programs such as Darktable.[83]

BEIGNET
An implementation by Intel for its Ivy Bridge and newer hardware was released in 2013.[84] This software, from Intel's China team, has attracted criticism from developers at AMD and Red Hat,[85] as well as from Michael Larabel of Phoronix.[86] The current version, 1.3.2, supports OpenCL 1.2 completely (Ivy Bridge and higher) and OpenCL 2.0 optionally for Skylake and newer.[87][88] Support for Android has been added to Beignet.[89] Development is limited to OpenCL 1.2 and 2.0 support; work toward OpenCL 2.1, 2.2 and 3.0 has moved to NEO.
NEO
An implementation by Intel for Gen. 8 Broadwell and Gen. 9 hardware, released in 2018.[90] This driver replaces the Beignet implementation on supported platforms (not the older Gen. 6 through Haswell). NEO provides OpenCL 2.1 support on Core platforms and OpenCL 1.2 on Atom platforms.[91] As of 2020, Gen 11 Ice Lake and Gen 12 Tiger Lake graphics are also supported. OpenCL 3.0 is available for Alder Lake and Tiger Lake down to Broadwell with version 20.41 and later. It now includes the optional OpenCL 2.0 and 2.1 features completely, and some of the 2.2 features.
ROCm
Created as part of AMD's GPUOpen, ROCm (Radeon Open Compute) is an open source Linux project built on OpenCL 1.2 with language support for 2.0. The system is compatible with all modern AMD CPUs and APUs (currently partly GFX 7, GFX 8 and 9), as well as Intel Gen7.5+ CPUs (only with PCI 3.0).[92][93] Version 1.9 extends support, in part experimentally, to hardware with PCIe 2.0 and without atomics. An overview of the work at that time was given at XDC2018.[94][95] ROCm version 2.0 supports full OpenCL 2.0, but some errors and limitations are on the to-do list.[96][97] Version 3.3 improves on details.[98] Version 3.5 supports OpenCL 2.2.[99] Version 3.10 brought improvements and new APIs.[100] ROCm 4.0, announced at SC20, supports the AMD Instinct MI 100 compute card.[101] Current documentation for 5.1.1 and earlier is available on GitHub.[102][103] OpenCL 3.0 is available.
POCL
A portable implementation supporting CPUs and some GPUs (via CUDA and HSA), building on Clang and LLVM.[104] With version 1.0, OpenCL 1.2 was nearly fully implemented along with some 2.x features.[105] Version 1.2 supports LLVM/Clang 6.0 and 7.0 and provides full OpenCL 1.2 support, with all tickets in Milestone 1.2 closed.[105][106] OpenCL 2.0 is nearly fully implemented.[107] Version 1.3 supports Mac OS X.[108] Version 1.4 includes support for LLVM 8.0 and 9.0.[109] Version 1.5 implements LLVM/Clang 10 support.[110] Version 1.6 implements LLVM/Clang 11 support and CUDA acceleration.[111] Current targets are complete OpenCL 2.x and OpenCL 3.0 support and improved performance. With manual optimization, POCL 1.6 performs at the same level as the Intel compute runtime.[112] Version 1.7 implements LLVM/Clang 12 support and some new OpenCL 3.0 features.[113] Version 1.8 implements LLVM/Clang 13 support.[114] Version 3.0 implements OpenCL 3.0 at the minimum level and LLVM/Clang 14.[115]
Shamrock
A port of Mesa Clover for ARM with full support of OpenCL 1.2;[116][117] there is no current development toward 2.0.
FreeOCL
A CPU-focused implementation of OpenCL 1.2 that implements an external compiler to create a more reliable platform;[118] there is no current development.
MOCL
An OpenCL implementation based on POCL, developed by NUDT researchers for the Matrix-2000, was released in 2018. The Matrix-2000 architecture is designed to replace the Intel Xeon Phi accelerators of the TianHe-2 supercomputer. This programming framework is built on top of LLVM v5.0 and reuses some code from POCL as well. To unlock the hardware potential, the device runtime uses a push-based task dispatching strategy, and the performance of the kernel atomics is improved significantly. This framework has been deployed on the TH-2A system and is readily available to the public.[119] Some of the software is to be ported next to improve POCL.[105]
VC4CL
An OpenCL 1.2 implementation for the VideoCore IV (BCM2763) processor used in the Raspberry Pi before its model 4.[120]

Vendor implementations

Timeline of vendor implementations

  • June 2008: During Apple's WWDC conference, an early beta of Mac OS X Snow Leopard was made available to the participants; it included the first beta implementation of OpenCL, about six months before the final version 1.0 specification was ratified in late 2008. Apple also showed two demos. One was an 8×8 grid of screens, each displaying the screen of an emulated Apple II machine: 64 independent instances in total, each running a famous karate game. This showed task parallelism on the CPU. The other demo was an N-body simulation running on the GPU of a Mac Pro, a data-parallel task.
  • December 10, 2008: AMD and Nvidia held the first public OpenCL demonstration, a 75-minute presentation at SIGGRAPH Asia 2008. AMD showed a CPU-accelerated OpenCL demo explaining the scalability of OpenCL on one or more cores while Nvidia showed a GPU-accelerated demo.[121][122]
  • March 16, 2009: at the 4th Multicore Expo, Imagination Technologies announced the PowerVR SGX543MP, the first GPU of this company to feature OpenCL support.[123]
  • March 26, 2009: at GDC 2009, AMD and Havok demonstrated the first working implementation of OpenCL, accelerating Havok Cloth on an AMD Radeon HD 4000 series GPU.[124]
  • April 20, 2009: Nvidia announced the release of its OpenCL driver and SDK to developers participating in its OpenCL Early Access Program.[125]
  • August 5, 2009: AMD unveiled the first development tools for its OpenCL platform as part of its ATI Stream SDK v2.0 Beta Program.[126]
  • August 28, 2009: Apple released Mac OS X Snow Leopard, which contains a full implementation of OpenCL.[127]
  • September 28, 2009: Nvidia released its own OpenCL drivers and SDK implementation.
  • October 13, 2009: AMD released the fourth beta of the ATI Stream SDK 2.0, which provides a complete OpenCL implementation on both R700/R800 GPUs and SSE3 capable CPUs. The SDK is available for both Linux and Windows.[128]
  • October 27, 2009: S3 released their first product supporting native OpenCL 1.0 – the Chrome 5400E embedded graphics processor.[129]
  • November 26, 2009: Nvidia released drivers for OpenCL 1.0 (rev 48).
  • December 10, 2009: VIA released their first product supporting OpenCL 1.0 – ChromotionHD 2.0 video processor included in VN1000 chipset.[130]
  • December 21, 2009: AMD released the production version of the ATI Stream SDK 2.0,[131] which provides OpenCL 1.0 support for R800 GPUs and beta support for R700 GPUs.
  • June 1, 2010: ZiiLABS released details of their first OpenCL implementation for the ZMS processor for handheld, embedded and digital home products.[132]
  • June 30, 2010: IBM released a fully conformant version of OpenCL 1.0.[4]
  • September 13, 2010: Intel released details of their first OpenCL implementation for the Sandy Bridge chip architecture. Sandy Bridge will integrate Intel's newest graphics chip technology directly onto the central processing unit.[133]
  • November 15, 2010: Wolfram Research released Mathematica 8 with OpenCLLink package.
  • March 3, 2011: Khronos Group announces the formation of the WebCL working group to explore defining a JavaScript binding to OpenCL. This creates the potential to harness GPU and multi-core CPU parallel processing from a Web browser.[134][135]
  • March 31, 2011: IBM released a fully conformant version of OpenCL 1.1.[4][136]
  • April 25, 2011: IBM released OpenCL Common Runtime v0.1 for Linux on x86 Architecture.[137]
  • May 4, 2011: Nokia Research releases an open source WebCL extension for the Firefox web browser, providing a JavaScript binding to OpenCL.[138]
  • July 1, 2011: Samsung Electronics releases an open source prototype implementation of WebCL for WebKit, providing a JavaScript binding to OpenCL.[139]
  • August 8, 2011: AMD released the OpenCL-driven AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK) v2.5, replacing the ATI Stream SDK as technology and concept.[140]
  • December 12, 2011: AMD released AMD APP SDK v2.6[141] which contains a preview of OpenCL 1.2.
  • February 27, 2012: The Portland Group released the PGI OpenCL compiler for multi-core ARM CPUs.[142]
  • April 17, 2012: Khronos released a WebCL working draft.[143]
  • May 6, 2013: Altera released the Altera SDK for OpenCL, version 13.0.[144] It is conformant to OpenCL 1.0.[145]
  • November 18, 2013: Khronos announced that the specification for OpenCL 2.0 had been finalized.[146]
  • March 19, 2014: Khronos releases the WebCL 1.0 specification.[147][148]
  • August 29, 2014: Intel releases HD Graphics 5300 driver that supports OpenCL 2.0.[149]
  • September 25, 2014: AMD releases Catalyst 14.41 RC1, which includes an OpenCL 2.0 driver.[150]
  • January 14, 2015: Xilinx Inc. announces SDAccel development environment for OpenCL, C, and C++, achieves Khronos Conformance.[151]
  • April 13, 2015: Nvidia releases WHQL driver v350.12, which includes OpenCL 1.2 support for GPUs based on Kepler or later architectures.[152] Drivers 340 and later support OpenCL 1.1 for Tesla and Fermi.
  • August 26, 2015: AMD released AMD APP SDK v3.0,[153] which contains full support for OpenCL 2.0 and sample code.
  • November 16, 2015: Khronos announced that the specification for OpenCL 2.1 had been finalized.[154]
  • April 18, 2016: Khronos announced that the specification for OpenCL 2.2 had been provisionally finalized.[60]
  • November 3, 2016: Intel added OpenCL 2.1 support for Gen7+ in SDK 2016 r3.[155]
  • February 17, 2017: Nvidia begins evaluation support of OpenCL 2.0 with driver 378.66.[156][157][158]
  • May 16, 2017: Khronos announced that the specification for OpenCL 2.2 had been finalized with SPIR-V 1.2.[159]
  • May 14, 2018: Khronos announced a maintenance update for OpenCL 2.2 with bug fixes and unified headers.[63]
  • April 27, 2020: Khronos announced the provisional version of OpenCL 3.0.
  • June 1, 2020: Intel NEO runtime provides OpenCL 3.0 for the new Tiger Lake.
  • June 3, 2020: AMD announced ROCm 3.5 with OpenCL 2.2 support.[160]
  • September 30, 2020: Khronos announced that the specifications for OpenCL 3.0 had been finalized (CTS also available).
  • October 16, 2020: Intel announced OpenCL 3.0 support with NEO 20.41 (including most of the optional OpenCL 2.x features).
  • April 6, 2021: Nvidia supports OpenCL 3.0 for Ampere. Maxwell and later GPUs also support OpenCL 3.0 with Nvidia driver 465+.[161]
  • August 20, 2022: Intel Arc Alchemist GPUs (Arc A380, A350M, A370M, A550M, A730M and A770M) are conformant with OpenCL 3.0.[162]
  • October 14, 2022: Arm Mali-G615 and Mali-G715 Immortalis are conformant with OpenCL 3.0.[162]
  • November 11, 2022: The Rusticl OpenCL Library is conformant with OpenCL 3.0.[162][163]

Devices

As of 2016, OpenCL runs on graphics processing units (GPUs), CPUs with SIMD instructions, FPGAs, Movidius Myriad 2, Adapteva Epiphany and DSPs.

Khronos Conformance Test Suite

To be officially conformant, an implementation must pass the Khronos Conformance Test Suite (CTS), with results being submitted to the Khronos Adopters Program.[164] The Khronos CTS code for all OpenCL versions has been available in open source since 2017.[165]

Conformant products

The Khronos Group maintains an extended list of OpenCL-conformant products.[4]

Synopsis of OpenCL conformant products[4]
AMD SDKs (supports OpenCL CPU and APU devices; GPU: TeraScale 1: OpenCL 1.1, TeraScale 2: 1.2, GCN 1: 1.2+, GCN 2+: 2.0+) | x86 + SSE2 (or higher) compatible CPUs, 64-bit & 32-bit;[166] Linux 2.6 PC, Windows Vista/7/8.x/10 PC | AMD Fusion E-350, E-240, C-50, C-30 with HD 6310/HD 6250 | AMD Radeon/Mobility HD 6800, HD 5x00 series GPU, iGPU HD 6310/HD 6250, HD 7xxx, HD 8xxx, R2xx, R3xx, RX 4xx, RX 5xx, Vega Series | AMD FirePro Vx800 series GPU and later, Radeon Pro
Intel SDK for OpenCL Applications 2013[167] (supports Intel Core processors and Intel HD Graphics 4000/2500); 2017 R2 with OpenCL 2.1 (Gen7+); SDK 2019 removed OpenCL 2.1;[168] current SDK 2020 update 3 | Intel CPUs with SSE 4.1, SSE 4.2 or AVX support;[169][170] Microsoft Windows, Linux | Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3, 3rd Generation Intel Core Processors with Intel HD Graphics 4000/2500 and newer | Intel Core 2 Solo, Duo, Quad, Extreme and newer | Intel Xeon 7x00, 5x00, 3x00 (Core based) and newer
IBM Servers with OpenCL Development Kit for Linux on Power running on Power VSX[171][172] | IBM Power 775 (PERCS), 750 | IBM BladeCenter PS70x Express | IBM BladeCenter JS2x, JS43 | IBM BladeCenter QS22
IBM OpenCL Common Runtime (OCR)[173] | x86 + SSE2 (or higher) compatible CPUs, 64-bit & 32-bit;[174] Linux 2.6 PC | AMD Fusion, Nvidia Ion and Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3 | AMD Radeon, Nvidia GeForce and Intel Core 2 Solo, Duo, Quad, Extreme | ATI FirePro, Nvidia Quadro and Intel Xeon 7x00, 5x00, 3x00 (Core based)
Nvidia OpenCL Driver and Tools;[175] chips: Tesla: OpenCL 1.1 (Driver 340), Fermi: OpenCL 1.1 (Driver 390), Kepler: OpenCL 1.2 (Driver 470), OpenCL 2.0 beta (378.66), OpenCL 3.0: Maxwell to Ada Lovelace (Driver 525+) | Nvidia Tesla C/D/S | Nvidia GeForce GTS/GT/GTX | Nvidia Ion | Nvidia Quadro FX/NVX/Plex, Quadro, Quadro K, Quadro M, Quadro P, Quadro with Volta, Quadro RTX with Turing, Ampere

All standard-conformant implementations can be queried using one of the clinfo tools (there are multiple tools with the same name and similar feature set).[176][177][178]
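
Among the properties such tools report is each device's version string. The OpenCL specification lays this string out as "OpenCL", a space, the major and minor version numbers separated by a dot, a space, and vendor-specific information; it is what clGetDeviceInfo returns for CL_DEVICE_VERSION. The following C sketch parses a string of that shape without calling the OpenCL API itself. The helper names parse_opencl_version and version_at_least are hypothetical, chosen for illustration only, and are not part of OpenCL or of any clinfo tool.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the OpenCL API): extracts the major and
 * minor numbers from a version string laid out as
 * "OpenCL <major>.<minor> <vendor-specific information>".
 * Returns 1 on success and 0 if the string does not match. */
int parse_opencl_version(const char *s, int *major, int *minor)
{
    int ma = 0, mi = 0;

    if (s == NULL || major == NULL || minor == NULL)
        return 0;
    /* The literal prefix must match; each %d then reads one number. */
    if (sscanf(s, "OpenCL %d.%d", &ma, &mi) != 2)
        return 0;
    *major = ma;
    *minor = mi;
    return 1;
}

/* Hypothetical helper: true when a reported version is at least the
 * requested one, e.g. to check for OpenCL 1.2 or later. */
int version_at_least(int major, int minor, int want_major, int want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}
```

An application might call parse_opencl_version on the string obtained from each device and then use version_at_least to decide which code path to take; the vendor-specific tail of the string is deliberately ignored here.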

Version support

Products and their version of OpenCL support include:[179]

OpenCL 3.0 support

Any hardware with OpenCL 1.2 or later can support OpenCL 3.0; OpenCL 2.x features are optional. The Khronos test suite has been available since October 2020.[180][181]

  • (2020) Intel NEO Compute: 20.41+ for Gen 12 Tiger Lake to Broadwell (includes full 2.0 and 2.1 support and parts of 2.2)[182]
  • (2020) Intel 6th, 7th, 8th, 9th, 10th, 11th gen processors (Skylake, Kaby Lake, Coffee Lake, Comet Lake, Ice Lake, Tiger Lake) with latest Intel Windows graphics driver
  • (2021) Intel 11th, 12th gen processors (Rocket Lake, Alder Lake) with latest Intel Windows graphics driver
  • (2021) Arm Mali-G78, Mali-G310, Mali-G510, Mali-G610, Mali-G710 and Mali-G78AE.
  • (2022) Intel 13th gen processors (Raptor Lake) with latest Intel Windows graphics driver
  • (2022) Intel Arc discrete graphics with latest Intel Arc Windows graphics driver
  • (2021) Nvidia Maxwell, Pascal, Volta, Turing and Ampere with Nvidia graphics driver 465+.[161]
  • (2022) Nvidia Ada Lovelace with Nvidia graphics driver 525+.
  • (2022) Samsung Xclipse 920 GPU (based on AMD RDNA2)

OpenCL 2.2 support

None yet. The Khronos test suite is ready; with a driver update, all hardware with 2.0 and 2.1 support could support 2.2.

  • Intel NEO Compute: work in progress for current products[183]
  • ROCm: mostly supported in version 3.5 and later

OpenCL 2.1 support

OpenCL 2.0 support

  • (2011+) AMD GCN GPUs (HD 7700+/HD 8000/Rx 200/Rx 300/Rx 400/Rx 500/Rx 5000-Series); some first-generation GCN GPUs support only 1.2 with some extensions
  • (2013+) AMD GCN APUs (Jaguar, Steamroller, Puma, Excavator & Zen-based)
  • (2014+) Intel 5th & 6th gen processors (Broadwell, Skylake)
  • (2015+) Qualcomm Adreno 5xx series
  • (2018+) Qualcomm Adreno 6xx series
  • (2017+) ARM Mali (Bifrost) G51 and G71 in Android 7.1 and Linux
  • (2018+) ARM Mali (Bifrost) G31, G52, G72 and G76
  • (2017+) incomplete evaluation support: Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900 & 10-series, Quadro K-, M- & P-series, Tesla K-, M- & P-series) with driver version 378.66+

OpenCL 1.2 support

  • (2011+) some first-generation AMD GCN GPUs, which currently lack some OpenCL 2.0 features but offer many more extensions than TeraScale
  • (2009+) AMD TeraScale 2 & 3 GPUs (RV8xx, RV9xx in HD 5000, 6000 & 7000 Series)
  • (2011+) AMD TeraScale APUs (K10, Bobcat & Piledriver-based)
  • (2012+) Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900, 10, 16, 20 series, Quadro K-, M- & P-series, Tesla K-, M- & P-series)
  • (2012+) Intel 3rd & 4th gen processors (Ivy Bridge, Haswell)
  • (2013+) Qualcomm Adreno 4xx series
  • (2013+) ARM Mali Midgard 3rd gen (T760)
  • (2015+) ARM Mali Midgard 4th gen (T8xx)

OpenCL 1.1 support

  • (2008+) some AMD TeraScale 1 GPUs (RV7xx in HD4000-series)
  • (2008+) Nvidia Tesla, Fermi GPUs (GeForce 8, 9, 100, 200, 300, 400, 500-series, Quadro-series or Tesla-series with Tesla or Fermi GPU)
  • (2011+) Qualcomm Adreno 3xx series
  • (2012+) ARM Mali Midgard 1st and 2nd gen (T-6xx, T720)

OpenCL 1.0 support

  • Most implementations were updated to 1.1 and 1.2 after a first driver that supported only 1.0

Portability, performance and alternatives

A key feature of OpenCL is portability, via its abstracted memory and execution model, and the programmer is not able to directly use hardware-specific technologies such as inline Parallel Thread Execution (PTX) for Nvidia GPUs unless they are willing to give up direct portability on other platforms. It is possible to run any OpenCL kernel on any conformant implementation.

However, performance of the kernel is not necessarily portable across platforms. Existing implementations have been shown to be competitive when kernel code is properly tuned, though, and auto-tuning has been suggested as a solution to the performance portability problem,[184] yielding "acceptable levels of performance" in experimental linear algebra kernels.[185] Portability of an entire application containing multiple kernels with differing behaviors was also studied, and the study showed that portability required only limited tradeoffs.[186]
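
The idea behind auto-tuning can be illustrated with a minimal C sketch. It is not the method of the cited studies: it is a plain exhaustive search over a short list of candidate work-group sizes that keeps whichever candidate measured fastest. All names here (measure_fn, autotune_work_group_size, toy_cost) are hypothetical. In a real tuner the measurement callback would launch the kernel and time it; toy_cost is only a stand-in cost model so the sketch runs without an OpenCL device.

```c
#include <stddef.h>

/* Callback type: returns the measured run time (any consistent unit) of one
 * kernel launch with the given work-group size. Supplied by the caller. */
typedef double (*measure_fn)(size_t work_group_size, void *user_data);

/* Exhaustive auto-tuning over a small candidate list: measure each candidate
 * `repeats` times, keep its minimum time, and return the candidate whose
 * minimum is lowest. Returns 0 if there is nothing to try. */
size_t autotune_work_group_size(const size_t *candidates, size_t count,
                                int repeats, measure_fn measure,
                                void *user_data)
{
    size_t best_size = 0;
    double best_time = 0.0;
    int have_best = 0;

    if (candidates == NULL || measure == NULL || repeats < 1)
        return 0;

    for (size_t i = 0; i < count; ++i) {
        double t_min = measure(candidates[i], user_data);
        for (int r = 1; r < repeats; ++r) {
            double t = measure(candidates[i], user_data);
            if (t < t_min)
                t_min = t;
        }
        /* Strict comparison: on a tie the earlier candidate is kept. */
        if (!have_best || t_min < best_time) {
            best_time = t_min;
            best_size = candidates[i];
            have_best = 1;
        }
    }
    return best_size;
}

/* Stand-in cost model for demonstration only: pretends the device is fastest
 * at the size pointed to by user_data and slower the farther away from it. */
double toy_cost(size_t work_group_size, void *user_data)
{
    size_t sweet_spot = *(const size_t *)user_data;
    double d = (double)work_group_size - (double)sweet_spot;
    return 1.0 + (d < 0 ? -d : d);
}
```

Because the best size differs between devices, such a search would be repeated per device, which is what makes the tuned result portable even though any single hand-picked value is not.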

A study at Delft University from 2011 that compared CUDA programs and their straightforward translation into OpenCL C found CUDA to outperform OpenCL by at most 30% on the Nvidia implementation. The researchers noted that their comparison could be made fairer by applying manual optimizations to the OpenCL programs, in which case there was "no reason for OpenCL to obtain worse performance than CUDA". The performance differences could mostly be attributed to differences in the programming model (especially the memory model) and to NVIDIA's compiler optimizations for CUDA compared to those for OpenCL.[184]

Another study at D-Wave Systems Inc. found that "The OpenCL kernel’s performance is between about 13% and 63% slower, and the end-to-end time is between about 16% and 67% slower" than CUDA's performance.[187]

The fact that OpenCL allows workloads to be shared by CPU and GPU, executing the same programs, means that programmers can exploit both by dividing work among the devices.[188] This leads to the problem of deciding how to partition the work, because the relative speeds of operations differ among the devices. Machine learning has been suggested to solve this problem: Grewe and O'Boyle describe a system of support-vector machines trained on compile-time features of programs that can decide the device partitioning problem statically, without actually running the programs to measure their performance.[189]
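
To make the partitioning problem concrete, the following C sketch uses a much simpler rule than the machine-learning approach described above: it splits a one-dimensional range of work-items in proportion to each device's throughput, which the caller is assumed to have measured or estimated beforehand. The type partition, the function split_by_throughput and the granularity parameter are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Result of splitting a one-dimensional range of work-items in two. */
typedef struct {
    size_t cpu_items;   /* work-items [0, cpu_items) go to the CPU */
    size_t gpu_items;   /* the remaining work-items go to the GPU  */
} partition;

/* Splits `total` work-items in proportion to the two throughputs (items per
 * second, supplied by the caller). The CPU share is rounded down to a
 * multiple of `granularity`, for example the work-group size, so that whole
 * work-groups stay on one device. A non-positive rate means "device not
 * usable" and sends all work to the other device. */
partition split_by_throughput(size_t total, double cpu_rate, double gpu_rate,
                              size_t granularity)
{
    partition p = { 0, total };

    if (granularity == 0)
        granularity = 1;
    if (gpu_rate <= 0.0) {          /* no usable GPU: CPU takes everything */
        p.cpu_items = total;
        p.gpu_items = 0;
        return p;
    }
    if (cpu_rate <= 0.0)            /* no usable CPU: GPU takes everything */
        return p;

    /* Both devices finish at about the same time when each receives a share
     * proportional to its throughput. */
    double share = cpu_rate / (cpu_rate + gpu_rate);
    size_t cpu = (size_t)((double)total * share);
    if (cpu > total)
        cpu = total;
    cpu -= cpu % granularity;

    p.cpu_items = cpu;
    p.gpu_items = total - cpu;
    return p;
}
```

For example, with 1024 work-items, a GPU three times as fast as the CPU and a granularity of 64, the CPU receives 256 items and the GPU 768. The difficulty noted in the text is that the rates themselves depend on the kernel, which is why a static predictor is attractive.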

A comparison of current AMD RDNA 2 and Nvidia RTX series graphics cards using OpenCL tests produced no clear winner. Possible performance increases from the use of Nvidia CUDA or OptiX were not tested.[190]

References

  1. ^ "Khronos OpenCL Registry". Khronos Group. April 27, 2020. Retrieved April 27, 2020.
  2. ^ "Android Devices With OpenCL support". Google Docs. ArrayFire. Retrieved April 28, 2015.
  3. ^ "FreeBSD Graphics/OpenCL". FreeBSD. Retrieved December 23, 2015.
  4. ^ a b c d e "Conformant Products". Khronos Group. Retrieved May 9, 2015.
  5. ^ Sochacki, Bartosz (July 19, 2019). "The OpenCL C++ 1.0 Specification" (PDF). Khronos OpenCL Working Group. Retrieved July 19, 2019.
  6. ^ Munshi, Aaftab; Howes, Lee; Sochacki, Bartosz (April 27, 2020). (PDF). Khronos OpenCL Working Group. Archived from the original (PDF) on September 20, 2020. Retrieved April 28, 2021.
  7. ^ "The C++ for OpenCL 1.0 and 2021 Programming Language Documentation". Khronos OpenCL Working Group. December 20, 2021. Retrieved December 2, 2022.
  8. ^ "Conformant Companies". Khronos Group. Retrieved April 8, 2015.
  9. ^ Gianelli, Silvia E. (January 14, 2015). "Xilinx SDAccel Development Environment for OpenCL, C, and C++, Achieves Khronos Conformance". PR Newswire. Xilinx. Retrieved April 27, 2015.
  10. ^ Howes, Lee (November 11, 2015). "The OpenCL Specification Version: 2.1 Document Revision: 23" (PDF). Khronos OpenCL Working Group. Retrieved November 16, 2015.
  11. ^ a b Gaster, Benedict; Howes, Lee; Kaeli, David R.; Mistry, Perhaad; Schaa, Dana (2012). Heterogeneous Computing with OpenCL: Revised OpenCL 1.2 Edition. Morgan Kaufmann.
  12. ^ Tompson, Jonathan; Schlachter, Kristofer (2012). (PDF). New York University Media Research Lab. Archived from the original (PDF) on July 6, 2015. Retrieved July 6, 2015.
  13. ^ a b c d Stone, John E.; Gohara, David; Shi, Guochin (2010). "OpenCL: a parallel programming standard for heterogeneous computing systems". Computing in Science & Engineering. 12 (3): 66–73. Bibcode:2010CSE....12c..66S. doi:10.1109/MCSE.2010.69. PMC 2964860. PMID 21037981.
  14. ^ Klöckner, Andreas; Pinto, Nicolas; Lee, Yunsup; Catanzaro, Bryan; Ivanov, Paul; Fasih, Ahmed (2012). "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation". Parallel Computing. 38 (3): 157–174. arXiv:0911.3456. doi:10.1016/j.parco.2011.09.001. S2CID 18928397.
  15. ^ "OpenCL - Open Computing Language Bindings". metacpan.org. Retrieved August 18, 2018.
  16. ^ "D binding for OpenCL". dlang.org. Retrieved June 29, 2021.
  17. ^ "SPIR - The first open standard intermediate language for parallel compute and graphics". Khronos Group. January 21, 2014.
  18. ^ . Khronos Group. January 21, 2014. Archived from the original on January 18, 2021. Retrieved October 24, 2016.
  19. ^ a b c "C++ for OpenCL, OpenCL-Guide". GitHub. Retrieved April 18, 2021.
  20. ^ a b c Aaftab Munshi, ed. (2014). "The OpenCL C Specification, Version 2.0" (PDF). Retrieved June 24, 2014.
  21. ^ a b (PDF). AMD. pp. 89–90. Archived from the original (PDF) on May 16, 2011. Retrieved August 8, 2017.
  22. ^ (PDF). SIGGRAPH2008. August 14, 2008. Archived from the original (PDF) on February 16, 2012. Retrieved August 14, 2008.
  23. ^ "Fitting FFT onto G80 Architecture" (PDF). Vasily Volkov and Brian Kazian, UC Berkeley CS258 project report. May 2008. Retrieved November 14, 2008.
  24. ^ "OpenCL_FFT". Apple. June 26, 2012. Retrieved June 18, 2022.
  25. ^ Trevett, Neil (April 28, 2020). "Khronos Announcements and Panel Discussion" (PDF).
  26. ^ Stulova, Anastasia; Hickey, Neil; van Haastregt, Sven; Antognini, Marco; Petit, Kevin (April 27, 2020). "The C++ for OpenCL Programming Language". Proceedings of the International Workshop on OpenCL. IWOCL '20. Munich, Germany: Association for Computing Machinery: 1–2. doi:10.1145/3388333.3388647. ISBN 978-1-4503-7531-3. S2CID 216554183.
  27. ^ a b KhronosGroup/OpenCL-Docs, The Khronos Group, April 16, 2021, retrieved April 18, 2021
  28. ^ "Clang release 9 documentation, OpenCL support". releases.llvm.org. September 2019. Retrieved April 18, 2021.
  29. ^ "Clang 9, Language Extensions, OpenCL". releases.llvm.org. September 2019. Retrieved April 18, 2021.
  30. ^ "Release of Documentation of C++ for OpenCL kernel language, version 1.0, revision 1 · KhronosGroup/OpenCL-Docs". GitHub. December 2020. Retrieved April 18, 2021.
  31. ^ "Release of Documentation of C++ for OpenCL kernel language, version 1.0 and 2021 · KhronosGroup/OpenCL-Docs". GitHub. December 2021. Retrieved December 2, 2022.
  32. ^ a b "The C++ for OpenCL 1.0 Programming Language Documentation". www.khronos.org. Retrieved April 18, 2021.
  33. ^ a b c d "Release of C++ for OpenCL Kernel Language Documentation, version 1.0, revision 2 · KhronosGroup/OpenCL-Docs". GitHub. March 2021. Retrieved April 18, 2021.
  34. ^ "cl_ext_cxx_for_opencl". www.khronos.org. September 2020. Retrieved April 18, 2021.
  35. ^ "Mali SDK Supporting Compilation of Kernels in C++ for OpenCL". community.arm.com. December 2020. Retrieved April 18, 2021.
  36. ^ "Clang Compiler User's Manual — C++ for OpenCL Support". clang.llvm.org. Retrieved April 18, 2021.
  37. ^ "OpenCL-Guide, Offline Compilation of OpenCL Kernel Sources". GitHub. Retrieved April 18, 2021.
  38. ^ "OpenCL-Guide, Programming OpenCL Kernels". GitHub. Retrieved April 18, 2021.
  39. ^ a b Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders: google/clspv, August 17, 2019, retrieved August 20, 2019
  40. ^ Petit, Kévin (April 17, 2021), Experimental implementation of OpenCL on Vulkan, retrieved April 18, 2021
  41. ^ (Press release). Khronos Group. June 16, 2008. Archived from the original on June 20, 2008. Retrieved June 18, 2008.
  42. ^ "OpenCL gets touted in Texas". MacWorld. November 20, 2008. Retrieved June 12, 2009.
  43. ^ "The Khronos Group Releases OpenCL 1.0 Specification" (Press release). Khronos Group. December 8, 2008. Retrieved December 4, 2016.
  44. ^ (Press release). Apple Inc. June 9, 2008. Archived from the original on March 18, 2012. Retrieved June 9, 2008.
  45. ^ "AMD Drives Adoption of Industry Standards in GPGPU Software Development" (Press release). AMD. August 6, 2008. Retrieved August 14, 2008.
  46. ^ "AMD Backs OpenCL, Microsoft DirectX 11". eWeek. August 6, 2008. Archived from the original on March 19, 2012. Retrieved August 14, 2008.
  47. ^ . HPCWire. November 10, 2008. Archived from the original on December 18, 2008. Retrieved November 11, 2008.
  48. ^ "Nvidia Adds OpenCL To Its Industry Leading GPU Computing Toolkit" (Press release). Nvidia. December 9, 2008. Retrieved December 10, 2008.
  49. ^ "OpenCL Development Kit for Linux on Power". alphaWorks. October 30, 2009. Retrieved October 30, 2009.
  50. ^ "Opencl Standard - an overview | ScienceDirect Topics". www.sciencedirect.com.
  51. ^ http://developer.amd.com/wordpress/media/2012/10/opencl-1.0.48.pdf
  52. ^ . Archived from the original on March 2, 2016. Retrieved February 24, 2016.
  53. ^ "Khronos Releases OpenCL 1.2 Specification". Khronos Group. November 15, 2011. Retrieved June 23, 2015.
  54. ^ a b c "OpenCL 1.2 Specification" (PDF). Khronos Group. Retrieved June 23, 2015.
  55. ^ "Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing". Khronos Group. November 18, 2013. Retrieved February 10, 2014.
  56. ^ "Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming". Khronos Group. November 16, 2015. Retrieved November 16, 2015.
  57. ^ "Khronos Announces OpenCL 2.1: C++ Comes to OpenCL". AnandTech. March 3, 2015. Retrieved April 8, 2015.
  58. ^ "Khronos Releases OpenCL 2.1 Provisional Specification for Public Review". Khronos Group. March 3, 2015. Retrieved April 8, 2015.
  59. ^ "OpenCL Overview". Khronos Group. July 21, 2013.
  60. ^ a b "Khronos Releases OpenCL 2.2 Provisional Specification with OpenCL C++ Kernel Language for Parallel Programming". Khronos Group. April 18, 2016.
  61. ^ Trevett, Neil (April 2016). "OpenCL – A State of the Union" (PDF). IWOCL. Vienna: Khronos Group. Retrieved January 2, 2017.
  62. ^ "Khronos Releases OpenCL 2.2 With SPIR-V 1.2". Khronos Group. May 16, 2017.
  63. ^ a b "OpenCL 2.2 Maintenance Update Released". The Khronos Group. May 14, 2018.
  64. ^ "OpenCL 3.0 Bringing Greater Flexibility, Async DMA Extensions". www.phoronix.com.
  65. ^ "Khronos Group Releases OpenCL 3.0". April 26, 2020.
  66. ^ https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_API.pdf
  67. ^ https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf
  68. ^ "Using Semaphore and Memory Sharing Extensions for Vulkan Interop with NVIDIA OpenCL". February 24, 2022.
  69. ^ . www.pcper.com. Archived from the original on November 1, 2017. Retrieved May 17, 2017.
  70. ^ "SIGGRAPH 2018: OpenCL-Next Taking Shape, Vulkan Continues Evolving - Phoronix". www.phoronix.com.
  71. ^ "Vulkan Update SIGGRAPH 2019" (PDF).
  72. ^ Trevett, Neil (May 23, 2019). "Khronos and OpenCL Overview EVS Workshop May19" (PDF). Khronos Group.
  73. ^ "OpenCL ICD Specification". Retrieved June 23, 2015.
  74. ^ "Apple entry on LLVM Users page". Retrieved August 29, 2009.
  75. ^ "Nvidia entry on LLVM Users page". Retrieved August 6, 2009.
  76. ^ "Rapidmind entry on LLVM Users page". Retrieved October 1, 2009.
  77. ^ "Zack Rusin's blog post about the Gallium3D OpenCL implementation". February 2009. Retrieved October 1, 2009.
  78. ^ "GalliumCompute". dri.freedesktop.org. Retrieved June 23, 2015.
  79. ^ "Clover Status Update" (PDF).
  80. ^ "mesa/mesa - The Mesa 3D Graphics Library". cgit.freedesktop.org.
  81. ^ . www.phoronix.com. Archived from the original on October 22, 2020. Retrieved December 13, 2018.
  82. ^ https://xdc2018.x.org/slides/clover.pdf
  83. ^ "Mesa's "Rusticl" Implementation Now Manages to Handle Darktable OpenCL".
  84. ^ Larabel, Michael (January 10, 2013). "Beignet: OpenCL/GPGPU Comes For Ivy Bridge On Linux". Phoronix.
  85. ^ Larabel, Michael (April 16, 2013). "More Criticism Comes Towards Intel's Beignet OpenCL". Phoronix.
  86. ^ Larabel, Michael (December 24, 2013). "Intel's Beignet OpenCL Is Still Slowly Baking". Phoronix.
  87. ^ "Beignet". freedesktop.org.
  88. ^ "beignet - Beignet OpenCL Library for Intel Ivy Bridge and newer GPUs". cgit.freedesktop.org.
  89. ^ "Intel Brings Beignet To Android For OpenCL Compute - Phoronix". www.phoronix.com.
  90. ^ "01.org Intel Open Source - Compute Runtime". February 7, 2018.
  91. ^ "NEO GitHub README". GitHub. March 21, 2019.
  92. ^ . GitHub. Archived from the original on October 8, 2016.
  93. ^ "RadeonOpenCompute/ROCm: ROCm - Open Source Platform for HPC and Ultrascale GPU Computing". GitHub. March 21, 2019.
  94. ^ "A Nice Overview Of The ROCm Linux Compute Stack - Phoronix". www.phoronix.com.
  95. ^ "XDC Lightning.pdf". Google Docs.
  96. ^ "Radeon ROCm 2.0 Officially Out With OpenCL 2.0 Support, TensorFlow 1.12, Vega 48-bit VA - Phoronix". www.phoronix.com.
  97. ^ "Taking Radeon ROCm 2.0 OpenCL For A Benchmarking Test Drive - Phoronix". www.phoronix.com.
  98. ^ https://github.com/RadeonOpenCompute/ROCm/blob/master/AMD_ROCm_Release_Notes_v3.3.pdf
  99. ^ "Radeon ROCm 3.5 Released with New Features but Still No Navi Support - Phoronix".
  100. ^ "Radeon ROCm 3.10 Released with Data Center Tool Improvements, New APIs - Phoronix".
  101. ^ "AMD Launches Arcturus as the Instinct MI100, Radeon ROCm 4.0 - Phoronix".
  102. ^ "Welcome to AMD ROCm™ Platform — ROCm Documentation 1.0.0 documentation".
  103. ^ "Home". docs.amd.com.
  104. ^ Jääskeläinen, Pekka; Sánchez de La Lama, Carlos; Schnetter, Erik; Raiskila, Kalle; Takala, Jarmo; Berg, Heikki (2016). "pocl: A Performance-Portable OpenCL Implementation". Int'l J. Parallel Programming. 43 (5): 752–785. arXiv:1611.07083. Bibcode:2016arXiv161107083J. doi:10.1007/s10766-014-0320-y. S2CID 9905244.
  105. ^ a b c "pocl home page". pocl.
  106. ^ "GitHub - pocl/pocl: pocl: Portable Computing Language". March 14, 2019 – via GitHub.
  107. ^ "HSA support implementation status as of 2016-05-17 — Portable Computing Language (pocl) 1.3-pre documentation". portablecl.org.
  108. ^ "PoCL home page".
  109. ^ "PoCL home page".
  110. ^ "PoCL home page".
  111. ^ . Archived from the original on January 17, 2021. Retrieved December 3, 2020.
  112. ^ https://www.iwocl.org/wp-content/uploads/30-iwocl-syclcon-2021-baumann-slides.pdf
  113. ^ "PoCL home page".
  114. ^ "PoCL home page".
  115. ^ "PoCL home page".
  116. ^ "About". Git.Linaro.org.
  117. ^ Gall, T.; Pitney, G. (March 6, 2014). (PDF). Amazon Web Services. Archived from the original (PDF) on July 26, 2020. Retrieved January 22, 2017.
  118. ^ "zuzuf/freeocl". GitHub. Retrieved April 13, 2017.
  119. ^ Zhang, Peng; Fang, Jianbin; Yang, Canqun; Tang, Tao; Huang, Chun; Wang, Zheng (2018). MOCL: An Efficient OpenCL Implementation for the Matrix-2000 Architecture (PDF). Proc. Int'l Conf. on Computing Frontiers. doi:10.1145/3203217.3203244.
  120. ^ "Status". GitHub. March 16, 2022.
  121. ^ "OpenCL Demo, AMD CPU". YouTube. December 10, 2008. Retrieved March 28, 2009.
  122. ^ "OpenCL Demo, Nvidia GPU". YouTube. December 10, 2008. Retrieved March 28, 2009.
  123. ^ . Imagination Technologies. March 19, 2009. Archived from the original on April 3, 2014. Retrieved January 30, 2011.
  124. ^ . PC Perspective. March 26, 2009. Archived from the original on April 5, 2009. Retrieved March 28, 2009.
  125. ^ . Nvidia. April 20, 2009. Archived from the original on February 4, 2012. Retrieved April 27, 2009.
  126. ^ "AMD does reverse GPGPU, announces OpenCL SDK for x86". Ars Technica. August 5, 2009. Retrieved August 6, 2009.
  127. ^ Moren, Dan; Snell, Jason (June 8, 2009). "Live Update: WWDC 2009 Keynote". MacWorld.com. MacWorld. Retrieved June 12, 2009.
  128. ^ . Archived from the original on August 9, 2009. Retrieved October 14, 2009.
  129. ^ . Archived from the original on December 2, 2009. Retrieved October 27, 2009.
  130. ^ . Archived from the original on December 15, 2009. Retrieved December 10, 2009.
  131. ^ . Archived from the original on November 1, 2009. Retrieved October 23, 2009.
  132. ^ "OpenCL". ZiiLABS. Retrieved June 23, 2015.
  133. ^ . Archived from the original on October 31, 2013. Retrieved September 13, 2010.
  134. ^ "WebCL related stories". Khronos Group. Retrieved June 23, 2015.
  135. ^ . Khronos Group. Archived from the original on July 9, 2015. Retrieved June 23, 2015.
  136. ^ "IBM Developer". developer.ibm.com.
  137. ^ "Welcome to Wikis". www.ibm.com. October 20, 2009.
  138. ^ . Khronos Group. May 4, 2011. Archived from the original on December 5, 2020. Retrieved June 23, 2015.
  139. ^ KamathK, Sharath. . Github.com. Archived from the original on February 18, 2015. Retrieved June 23, 2015.
  140. ^ "AMD Opens the Throttle on APU Performance with Updated OpenCL Software Development". Amd.com. August 8, 2011. Retrieved June 16, 2013.
  141. ^ "AMD APP SDK v2.6". Forums.amd.com. March 13, 2015. Retrieved June 23, 2015.[dead link]
  142. ^ "The Portland Group Announces OpenCL Compiler for ST-Ericsson ARM-Based NovaThor SoCs". Retrieved May 4, 2012.
  143. ^ Khronos Group. November 7, 2013. Archived from the original on August 1, 2014. Retrieved June 23, 2015.
  144. ^ Altera.com. Archived from the original on January 9, 2014. Retrieved January 9, 2014.
  145. ^ Altera.com. Archived from the original on January 9, 2014. Retrieved January 9, 2014.
  146. ^ "Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing". Khronos Group. November 18, 2013. Retrieved June 23, 2015.
  147. ^ "WebCL 1.0 Press Release". Khronos Group. March 19, 2014. Retrieved June 23, 2015.
  148. ^ "WebCL 1.0 Specification". Khronos Group. March 14, 2014. Retrieved June 23, 2015.
  149. ^ "Intel OpenCL 2.0 Driver". Archived from the original on September 17, 2014. Retrieved October 14, 2014.
  150. ^ "AMD OpenCL 2.0 Driver". Support.AMD.com. June 17, 2015. Retrieved June 23, 2015.
  151. ^ "Xilinx SDAccel development environment for OpenCL, C, and C++, achieves Khronos Conformance - khronos.org news". The Khronos Group. Retrieved June 26, 2017.
  152. ^ "Release 349 Graphics Drivers for Windows, Version 350.12" (PDF). April 13, 2015. Retrieved February 4, 2016.
  153. ^ "AMD APP SDK 3.0 Released". Developer.AMD.com. August 26, 2015. Retrieved September 11, 2015.
  154. ^ "Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming". Khronos Group. November 16, 2015.
  155. ^ "What's new? Intel® SDK for OpenCL™ Applications 2016, R3". Intel Software.
  156. ^ Khronos Group. February 17, 2017. Archived from the original on August 6, 2020. Retrieved March 17, 2017.
  157. ^ Szuppe, Jakub (February 22, 2017). "NVIDIA enables OpenCL 2.0 beta-support".
  158. ^ Szuppe, Jakub (March 6, 2017). "NVIDIA beta-support for OpenCL 2.0 works on Linux too".
  159. ^ "The Khronos Group". The Khronos Group. March 21, 2019.
  160. ^ "GitHub - RadeonOpenCompute/ROCm at roc-3.5.0". GitHub.
  161. ^ a b "NVIDIA is Now OpenCL 3.0 Conformant". April 12, 2021.
  162. ^ a b c "The Khronos Group". The Khronos Group. December 12, 2022. Retrieved December 12, 2022.
  163. ^ "Mesa's Rusticl Achieves Official OpenCL 3.0 Conformance". www.phoronix.com. Retrieved December 12, 2022.
  164. ^ "The Khronos Group". The Khronos Group. August 20, 2019. Retrieved August 20, 2019.
  165. ^ "KhronosGroup/OpenCL-CTL: The OpenCL Conformance Tests". GitHub. March 21, 2019.
  166. ^ AMD Developer Central. developer.amd.com. Archived from the original on August 4, 2011. Retrieved August 11, 2011.
  167. ^ "About Intel OpenCL SDK 1.1". software.intel.com. intel.com. Retrieved August 11, 2011.
  168. ^ "Intel® SDK for OpenCL™ Applications - Release Notes". software.intel.com. March 14, 2019.
  169. ^ "Product Support". Retrieved August 11, 2011.
  170. ^ Archived from the original on July 17, 2011. Retrieved August 11, 2011.
  171. ^ "Announcing OpenCL Development Kit for Linux on Power v0.3". IBM. Retrieved August 11, 2011.
  172. ^ "IBM releases OpenCL Development Kit for Linux on Power v0.3 – OpenCL 1.1 conformant release available". OpenCL Lounge. ibm.com. Retrieved August 11, 2011.
  173. ^ "IBM releases OpenCL Common Runtime for Linux on x86 Architecture". IBM. October 20, 2009. Retrieved September 10, 2011.
  174. ^ AMD Developer Central. developer.amd.com. Archived from the original on September 6, 2011. Retrieved September 10, 2011.
  175. ^ "Nvidia Releases OpenCL Driver". April 22, 2009. Retrieved August 11, 2011.
  176. ^ "clinfo by Simon Leblanc". GitHub. Retrieved January 27, 2017.
  177. ^ "clinfo by Oblomov". GitHub. Retrieved January 27, 2017.
  178. ^ "clinfo: openCL INFOrmation". Retrieved January 27, 2017.
  179. ^ "Khronos Products". The Khronos Group. Retrieved May 15, 2017.
  180. ^ "OpenCL-CTS/Test_conformance at main · KhronosGroup/OpenCL-CTS". GitHub.
  181. ^ "Issues · KhronosGroup/OpenCL-CTS". GitHub.
  182. ^ "Intel Compute-Runtime 20.43.18277 Brings Alder Lake Support".
  183. ^ "compute-runtime". 01.org. February 7, 2018.
  184. ^ a b Fang, Jianbin; Varbanescu, Ana Lucia; Sips, Henk (2011). "A Comprehensive Performance Comparison of CUDA and OpenCL". 2011 International Conference on Parallel Processing. Proc. Int'l Conf. on Parallel Processing. pp. 216–225. doi:10.1109/ICPP.2011.45. ISBN 978-1-4577-1336-1.
  185. ^ Du, Peng; Weber, Rick; Luszczek, Piotr; Tomov, Stanimire; Peterson, Gregory; Dongarra, Jack (2012). "From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming". Parallel Computing. 38 (8): 391–407. CiteSeerX 10.1.1.193.7712. doi:10.1016/j.parco.2011.10.002.
  186. ^ Dolbeau, Romain; Bodin, François; de Verdière, Guillaume Colin (September 7, 2013). "One OpenCL to rule them all?". 2013 IEEE 6th International Workshop on Multi-/Many-core Computing Systems (MuCoCoS). pp. 1–6. doi:10.1109/MuCoCoS.2013.6633603. ISBN 978-1-4799-1010-6. S2CID 225784.
  187. ^ Karimi, Kamran; Dickson, Neil G.; Hamze, Firas (2011). "A Performance Comparison of CUDA and OpenCL". arXiv:1005.2581v3 [cs.PF].
  188. ^ A Survey of CPU-GPU Heterogeneous Computing Techniques, ACM Computing Surveys, 2015.
  189. ^ Grewe, Dominik; O'Boyle, Michael F. P. (2011). "A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL". Compiler Construction. Proc. Int'l Conf. on Compiler Construction. Lecture Notes in Computer Science. Vol. 6601. pp. 286–305. doi:10.1007/978-3-642-19861-8_16. ISBN 978-3-642-19860-1.
  190. ^ "Radeon RX 6800 Series Has Excellent ROCm-Based OpenCL Performance On Linux". www.phoronix.com.

External links

  • Official website
  • Official website for WebCL
  • International Workshop on OpenCL (IWOCL), sponsored by The Khronos Group (archived January 26, 2021, at the Wayback Machine)

opencl, confused, with, opengl, cryptographic, library, initially, known, botan, programming, library, this, article, uses, bare, urls, which, uninformative, vulnerable, link, please, consider, converting, them, full, citations, ensure, article, remains, verif. Not to be confused with OpenGL For the cryptographic library initially known as OpenCL see Botan programming library This article uses bare URLs which are uninformative and vulnerable to link rot Please consider converting them to full citations to ensure the article remains verifiable and maintains a consistent citation style Several templates and tools are available to assist in formatting such as Reflinks documentation reFill documentation and Citation bot documentation June 2022 Learn how and when to remove this template message This article may be too technical for most readers to understand Please help improve it to make it understandable to non experts without removing the technical details October 2021 Learn how and when to remove this template message OpenCL Open Computing Language is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units CPUs graphics processing units GPUs digital signal processors DSPs field programmable gate arrays FPGAs and other processors or hardware accelerators OpenCL specifies programming languages based on C99 C 14 and C 17 for programming these devices and application programming interfaces APIs to control the platform and execute programs on the compute devices OpenCL provides a standard interface for parallel computing using task and data based parallelism OpenCL APIOriginal author s Apple Inc Developer s Khronos GroupInitial releaseAugust 28 2009 13 years ago 2009 08 28 Stable release3 0 11 1 May 6 2022 9 months ago 2022 05 06 Written inC with C bindingsOperating systemAndroid vendor dependent 2 FreeBSD 3 Linux macOS via Pocl WindowsPlatformARMv7 ARMv8 4 Cell IA 32 Power x86 64TypeHeterogeneous computing 
APILicenseOpenCL specification licenseWebsitewww wbr khronos wbr org wbr opencl wbr OpenCL C C and C for OpenCLParadigmImperative procedural structured C only object oriented generic programmingFamilyCStable releaseOpenCL C 1 0 revision V2 2 11 5 OpenCL C 3 0 revision V3 0 11 6 C for OpenCL 1 0 and 2021 7 December 20 2021 13 months ago 2021 12 20 Typing disciplineStatic weak manifest nominalImplementation languageImplementation specificFilename extensions cl clcppWebsitewww wbr khronos wbr org wbr openclMajor implementationsAMD Gallium Compute IBM Intel NEO Intel SDK Texas Instruments Nvidia POCL ArmInfluenced byC99 CUDA C 14 C 17OpenCL is an open standard maintained by the non profit technology consortium Khronos Group Conformant implementations are available from Altera AMD ARM Creative IBM Imagination Intel Nvidia Qualcomm Samsung Vivante Xilinx and ZiiLABS 8 9 Contents 1 Overview 1 1 Memory hierarchy 2 OpenCL kernel language 2 1 OpenCL C language 2 1 1 Example matrix vector multiplication 2 1 2 Example computing the FFT 2 2 C for OpenCL language 2 2 1 Features 2 2 2 Example complex number arithmetic 2 2 3 Tooling and Execution Environment 2 2 4 Contributions 3 History 3 1 OpenCL 1 0 3 2 OpenCL 1 1 3 3 OpenCL 1 2 3 4 OpenCL 2 0 3 5 OpenCL 2 1 3 6 OpenCL 2 2 3 7 OpenCL 3 0 4 Roadmap 5 Open source implementations 6 Vendor implementations 6 1 Timeline of vendor implementations 7 Devices 7 1 Khronos Conformance Test Suite 7 2 Conformant products 7 3 Version support 7 3 1 OpenCL 3 0 support 7 3 2 OpenCL 2 2 support 7 3 3 OpenCL 2 1 support 7 3 4 OpenCL 2 0 support 7 3 5 OpenCL 1 2 support 7 3 6 OpenCL 1 1 support 7 3 7 OpenCL 1 0 support 8 Portability performance and alternatives 9 See also 10 References 11 External linksOverview EditOpenCL views a computing system as consisting of a number of compute devices which might be central processing units CPUs or accelerators such as graphics processing units GPUs attached to a host processor a CPU It defines a C like 
language for writing programs Functions executed on an OpenCL device are called kernels 10 17 A single compute device typically consists of several compute units which in turn comprise multiple processing elements PEs A single kernel execution can run on all or many of the PEs in parallel How a compute device is subdivided into compute units and PEs is up to the vendor a compute unit can be thought of as a core but the notion of core is hard to define across all the types of devices supported by OpenCL or even within the category of CPUs 11 49 50 and the number of compute units may not correspond to the number of cores claimed in vendors marketing literature which may actually be counting SIMD lanes 12 In addition to its C like programming language OpenCL defines an application programming interface API that allows programs running on the host to launch kernels on the compute devices and manage device memory which is at least conceptually separate from host memory Programs in the OpenCL language are intended to be compiled at run time so that OpenCL using applications are portable between implementations for various host devices 13 The OpenCL standard defines host APIs for C and C third party APIs exist for other programming languages and platforms such as Python 14 Java Perl 15 D 16 and NET 11 15 An implementation of the OpenCL standard consists of a library that implements the API for C and C and an OpenCL C compiler for the compute device s targeted In order to open the OpenCL programming model to other languages or to protect the kernel source from inspection the Standard Portable Intermediate Representation SPIR 17 can be used as a target independent way to ship kernels between a front end compiler and the OpenCL back end More recently Khronos Group has ratified SYCL 18 a higher level programming model for OpenCL as a single source eDSL based on pure C 17 to improve programming productivity People interested by C kernels but not by SYCL single source 
programming style can use C features with compute kernel sources written in C for OpenCL language 19 Memory hierarchy Edit OpenCL defines a four level memory hierarchy for the compute device 13 global memory shared by all processing elements but has high access latency global read only memory smaller low latency writable by the host CPU but not the compute devices constant local memory shared by a group of processing elements local per element private memory registers private Not every device needs to implement each level of this hierarchy in hardware Consistency between the various levels in the hierarchy is relaxed and only enforced by explicit synchronization constructs notably barriers Devices may or may not share memory with the host CPU 13 The host API provides handles on device memory buffers and functions to transfer data back and forth between host and devices OpenCL kernel language EditThe programming language that is used to write compute kernels is called kernel language OpenCL adopts C C based languages to specify the kernel computations performed on the device with some restrictions and additions to facilitate efficient mapping to the heterogeneous hardware resources of accelerators Traditionally OpenCL C was used to program the accelerators in OpenCL standard later C for OpenCL kernel language was developed that inherited all functionality from OpenCL C but allowed to use C features in the kernel sources OpenCL C language Edit OpenCL C 20 is a C99 based language dialect adapted to fit the device model in OpenCL Memory buffers reside in specific levels of the memory hierarchy and pointers are annotated with the region qualifiers global local constant and private reflecting this Instead of a device program having a main function OpenCL C functions are marked kernel to signal that they are entry points into the program to be called from the host program Function pointers bit fields and variable length arrays are omitted and recursion is forbidden 21 The 
C standard library is replaced by a custom set of standard functions geared toward math programming OpenCL C is extended to facilitate use of parallelism with vector types and operations synchronization and functions to work with work items and work groups 21 In particular besides scalar types such as float and double which behave similarly to the corresponding types in C OpenCL provides fixed length vector types such as float4 4 vector of single precision floats such vector types are available in lengths two three four eight and sixteen for various base types 20 6 1 2 Vectorized operations on these types are intended to map onto SIMD instructions sets e g SSE or VMX when running OpenCL programs on CPUs 13 Other specialized types include 2 d and 3 d image types 20 10 11 Example matrix vector multiplication Edit Each invocation work item of the kernel takes a row of the green matrix A in the code multiplies this row with the red vector x and places the result in an entry of the blue vector y The number of columns n is passed to the kernel as ncols the number of rows is implicit in the number of work items produced by the host program The following is a matrix vector multiplication algorithm in OpenCL C Multiplies A x leaving the result in y A is a row major matrix meaning the i j element is at A i ncols j kernel void matvec global const float A global const float x uint ncols global float y size t i get global id 0 Global id used as the row index global float const a amp A i ncols Pointer to the i th row float sum 0 f Accumulator for dot product for size t j 0 j lt ncols j sum a j x j y i sum The kernel function matvec computes in each invocation the dot product of a single row of a matrix A and a vector x y i a i x j a i j x j displaystyle y i a i cdot x sum j a i j x j To extend this into a full matrix vector multiplication the OpenCL runtime maps the kernel over the rows of the matrix On the host side the clEnqueueNDRangeKernel function does this it takes as 
arguments the kernel to execute its arguments and a number of work items corresponding to the number of rows in the matrix A Example computing the FFT Edit This example will load a fast Fourier transform FFT implementation and execute it The implementation is shown below 22 The code asks the OpenCL library for the first available graphics card creates memory buffers for reading and writing from the perspective of the graphics card JIT compiles the FFT kernel and then finally asynchronously runs the kernel The result from the transform is not read in this example include lt stdio h gt include lt time h gt include CL opencl h define NUM ENTRIES 1024 int main int argc const char argv CONSTANTS The source code of the kernel is represented as a string located inside file fft1D 1024 kernel src cl For the details see the next listing const char KernelSource include fft1D 1024 kernel src cl Looking up the available GPUs const cl uint num 1 clGetDeviceIDs NULL CL DEVICE TYPE GPU 0 NULL cl uint amp num cl device id devices 1 clGetDeviceIDs NULL CL DEVICE TYPE GPU num devices NULL create a compute context with GPU device cl context context clCreateContextFromType NULL CL DEVICE TYPE GPU NULL NULL NULL create a command queue clGetDeviceIDs NULL CL DEVICE TYPE DEFAULT 1 devices NULL cl command queue queue clCreateCommandQueue context devices 0 0 NULL allocate the buffer memory objects cl mem memobjs clCreateBuffer context CL MEM READ ONLY CL MEM COPY HOST PTR sizeof float 2 NUM ENTRIES NULL NULL clCreateBuffer context CL MEM READ WRITE sizeof float 2 NUM ENTRIES NULL NULL create the compute program const char fft1D 1024 kernel src 1 cl program program clCreateProgramWithSource context 1 const char amp KernelSource NULL NULL build the compute program executable clBuildProgram program 0 NULL NULL NULL NULL create the compute kernel cl kernel kernel clCreateKernel program fft1D 1024 NULL set the args values size t local work size 1 256 clSetKernelArg kernel 0 sizeof cl mem void 
amp memobjs 0 clSetKernelArg kernel 1 sizeof cl mem void amp memobjs 1 clSetKernelArg kernel 2 sizeof float local work size 0 1 16 NULL clSetKernelArg kernel 3 sizeof float local work size 0 1 16 NULL create N D range object with work item dimensions and execute kernel size t global work size 1 256 global work size 0 NUM ENTRIES local work size 0 64 Nvidia 192 or 256 clEnqueueNDRangeKernel queue kernel 1 NULL global work size local work size 0 NULL NULL The actual calculation inside file fft1D 1024 kernel src cl based on Fitting FFT onto the G80 Architecture 23 R This kernel computes FFT of length 1024 The 1024 length FFT is decomposed into calls to a radix 16 function another radix 16 function and then a radix 4 function kernel void fft1D 1024 global float2 in global float2 out local float sMemx local float sMemy int tid get local id 0 int blockIdx get group id 0 1024 tid float2 data 16 starting index of data to from global memory in in blockIdx out out blockIdx globalLoads data in 64 coalesced global reads fftRadix16Pass data in place radix 16 pass twiddleFactorMul data tid 1024 0 local shuffle using local memory localShuffle data sMemx sMemy tid tid amp 15 65 tid gt gt 4 fftRadix16Pass data in place radix 16 pass twiddleFactorMul data tid 64 4 twiddle factor multiplication localShuffle data sMemx sMemy tid tid gt gt 4 64 tid amp 15 four radix 4 function calls fftRadix4Pass data radix 4 function number 1 fftRadix4Pass data 4 radix 4 function number 2 fftRadix4Pass data 8 radix 4 function number 3 fftRadix4Pass data 12 radix 4 function number 4 coalesced global writes globalStores data out 64 A full open source implementation of an OpenCL FFT can be found on Apple s website 24 C for OpenCL language Edit In 2020 Khronos announced 25 the transition to the community driven C for OpenCL programming language 26 that provides features from C 17 in combination with the traditional OpenCL C features This language allows to leverage a rich variety of language features from 
standard C while preserving backward compatibility to OpenCL C This opens up a smooth transition path to C functionality for the OpenCL kernel code developers as they can continue using familiar programming flow and even tools as well as leverage existing extensions and libraries available for OpenCL C The language semantics is described in the documentation published in the releases of OpenCL Docs 27 repository hosted by the Khronos Group but it is currently not ratified by the Khronos Group The C for OpenCL language is not documented in a stand alone document and it is based on the specification of C and OpenCL C The open source Clang compiler has supported C for OpenCL since release 9 28 C for OpenCL has been originally developed as a Clang compiler extension and appeared in the release 9 29 As it was tightly coupled with OpenCL C and did not contain any Clang specific functionality its documentation has been re hosted to the OpenCL Docs repository 27 from the Khronos Group along with the sources of other specifications and reference cards The first official release of this document describing C for OpenCL version 1 0 has been published in December 2020 30 C for OpenCL 1 0 contains features from C 17 and it is backward compatible with OpenCL C 2 0 In December 2021 a new provisional C for OpenCL version 2021 has been released which is fully compatible with the OpenCL 3 0 standard 31 A work in progress draft of the latest C for OpenCL documentation can be found on the Khronos website 32 Features Edit C for OpenCL supports most of the features syntactically and semantically from OpenCL C except for nested parallelism and blocks 33 However there are minor differences in some supported features mainly related to differences in semantics between C and C For example C is more strict with the implicit type conversions and it does not support the restrict type qualifier 33 The following C features are not supported by C for OpenCL virtual functions dynamic cast operator 
non placement new delete operators exceptions pointer to member functions references to functions C standard libraries 33 C for OpenCL extends the concept of separate memory regions address spaces from OpenCL C to C features functional casts templates class members references lambda functions operators Most of C features are not available for the kernel functions e g overloading or templating arbitrary class layout in parameter type 33 Example complex number arithmetic EditThe following code snippet illustrates how kernels with complex number arithmetic can be implemented in C for OpenCL language with convenient use of C features Define a class Complex that can perform complex number computations with various precision when different types for T are used double float half template lt typename T gt class complex t T m re Real component T m im Imaginary component public complex t T re T im m re re m im im Define operator for complex number multiplication complex t operator const complex t amp other const return m re other m re m im other m im m re other m im m im other m re T get re const return m re T get im const return m im A helper function to compute multiplication over complex numbers read from the input buffer and to store the computed result into the output buffer template lt typename T gt void compute helper global T in global T out auto idx get global id 0 Every work item uses 4 consecutive items from the input buffer two for each complex number auto offset idx 4 auto num1 complex t in offset in offset 1 auto num2 complex t in offset 2 in offset 3 Perform complex number multiplication auto res num1 num2 Every work item writes 2 consecutive items to the output buffer out idx 2 res get re out idx 2 1 res get im This kernel is used for complex number multiplication in single precision kernel void compute sp global float in global float out compute helper in out ifdef cl khr fp16 This kernel is used for complex number multiplication in half precision when it is 
supported by the device pragma OPENCL EXTENSION cl khr fp16 enable kernel void compute hp global half in global half out compute helper in out endif Tooling and Execution Environment Edit C for OpenCL language can be used for the same applications or libraries and in the same way as OpenCL C language is used Due to the rich variety of C language features applications written in C for OpenCL can express complex functionality more conveniently than applications written in OpenCL C and in particular generic programming paradigm from C is very attractive to the library developers C for OpenCL sources can be compiled by OpenCL drivers that support cl ext cxx for opencl extension 34 Arm has announced support for this extension in December 2020 35 However due to increasing complexity of the algorithms accelerated on OpenCL devices it is expected that more applications will compile C for OpenCL kernels offline using stand alone compilers such as Clang 36 into executable binary format or portable binary format e g SPIR V 37 Such an executable can be loaded during the OpenCL applications execution using a dedicated OpenCL API 38 Binaries compiled from sources in C for OpenCL 1 0 can be executed on OpenCL 2 0 conformant devices Depending on the language features used in such kernel sources it can also be executed on devices supporting earlier OpenCL versions or OpenCL 3 0 Aside from OpenCL drivers kernels written in C for OpenCL can be compiled for execution on Vulkan devices using clspv 39 compiler and clvk 40 runtime layer just the same way as OpenCL C kernels Contributions Edit C for OpenCL is an open language developed by the community of contributors listed in its documentation 32 New contributions to the language semantic definition or open source tooling support are accepted from anyone interested as soon as they are aligned with the main design philosophy and they are reviewed and approved by the experienced contributors 19 History EditOpenCL was initially developed 
by Apple Inc which holds trademark rights and refined into an initial proposal in collaboration with technical teams at AMD IBM Qualcomm Intel and Nvidia Apple submitted this initial proposal to the Khronos Group On June 16 2008 the Khronos Compute Working Group was formed 41 with representatives from CPU GPU embedded processor and software companies This group worked for five months to finish the technical details of the specification for OpenCL 1 0 by November 18 2008 42 This technical specification was reviewed by the Khronos members and approved for public release on December 8 2008 43 OpenCL 1 0 Edit OpenCL 1 0 released with Mac OS X Snow Leopard on August 28 2009 According to an Apple press release 44 Snow Leopard further extends support for modern hardware with Open Computing Language OpenCL which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications OpenCL is based on the C programming language and has been proposed as an open standard AMD decided to support OpenCL instead of the now deprecated Close to Metal in its Stream framework 45 46 RapidMind announced their adoption of OpenCL underneath their development platform to support GPUs from multiple vendors with one interface 47 On December 9 2008 Nvidia announced its intention to add full support for the OpenCL 1 0 specification to its GPU Computing Toolkit 48 On October 30 2009 IBM released its first OpenCL implementation as a part of the XL compilers 49 Acceleration of calculations with factor to 1000 are possible with OpenCL in graphic cards against normal CPU 50 Some important features of next Version of OpenCL are optional in 1 0 like double or half precision operations 51 OpenCL 1 1 Edit OpenCL 1 1 was ratified by the Khronos Group on June 14 2010 52 and adds significant functionality for enhanced parallel programming flexibility functionality and performance including New data types including 3 component vectors and additional 
image formats Handling commands from multiple host threads and processing buffers across multiple devices Operations on regions of a buffer including read write and copy of 1D 2D or 3D rectangular regions Enhanced use of events to drive and control command execution Additional OpenCL built in C functions such as integer clamp shuffle and asynchronous strided copies Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events OpenCL 1 2 Edit On November 15 2011 the Khronos Group announced the OpenCL 1 2 specification 53 which added significant functionality over the previous versions in terms of performance and features for parallel programming Most notable features include Device partitioning the ability to partition a device into sub devices so that work assignments can be allocated to individual compute units This is useful for reserving areas of the device to reduce latency for time critical tasks Separate compilation and linking of objects the functionality to compile OpenCL into external libraries for inclusion into other programs Enhanced image support optional 1 2 adds support for 1D images and 1D 2D image arrays Furthermore the OpenGL sharing extensions now allow for OpenGL 1D textures and 1D 2D texture arrays to be used to create OpenCL images Built in kernels custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework Kernels can be called to use specialised or non programmable aspects of underlying hardware Examples include video encoding decoding and digital signal processors DirectX functionality DX9 media surface sharing allows for efficient sharing between OpenCL and DX9 or DXVA media surfaces Equally for DX11 seamless sharing between OpenCL and DX11 surfaces is enabled The ability to force IEEE 754 compliance for single precision floating point math OpenCL by default allows the single precision versions of the division reciprocal and 
square root operation to be less accurate than the correctly rounded values that IEEE 754 requires 54 If the programmer passes the cl fp32 correctly rounded divide sqrt command line argument to the compiler these three operations will be computed to IEEE 754 requirements if the OpenCL implementation supports this and will fail to compile if the OpenCL implementation does not support computing these operations to their correctly rounded values as defined by the IEEE 754 specification 54 This ability is supplemented by the ability to query the OpenCL implementation to determine if it can perform these operations to IEEE 754 accuracy 54 OpenCL 2 0 Edit On November 18 2013 the Khronos Group announced the ratification and public release of the finalized OpenCL 2 0 specification 55 Updates and additions to OpenCL 2 0 include Shared virtual memory Nested parallelism Generic address space Images optional include 3D Image C11 atomics Pipes Android installable client driver extension half precision extended with optional cl khr fp16 extension cl double double precision IEEE 754 optional OpenCL 2 1 Edit The ratification and release of the OpenCL 2 1 provisional specification was announced on March 3 2015 at the Game Developer Conference in San Francisco It was released on November 16 2015 56 It introduced the OpenCL C kernel language based on a subset of C 14 while maintaining support for the preexisting OpenCL C kernel language Vulkan and OpenCL 2 1 share SPIR V as an intermediate representation allowing high level language front ends to share a common compilation target Updates to the OpenCL API include Additional subgroup functionality Copying of kernel objects and states Low latency device timer queries Ingestion of SPIR V code by runtime Execution priority hints for queues Zero sized dispatches from hostAMD ARM Intel HPC and YetiWare have declared support for OpenCL 2 1 57 58 OpenCL 2 2 Edit OpenCL 2 2 brings the OpenCL C kernel language into the core specification for 
significantly enhanced parallel programming productivity 59 60 61 It was released on May 16 2017 62 Maintenance Update released in May 2018 with bugfixes 63 The OpenCL C kernel language is a static subset of the C 14 standard and includes classes templates lambda expressions function overloads and many other constructs for generic and meta programming Uses the new Khronos SPIR V 1 1 intermediate language which fully supports the OpenCL C kernel language OpenCL library functions can now use the C language to provide increased safety and reduced undefined behavior while accessing features such as atomics iterators images samplers pipes and device queue built in types and address spaces Pipe storage is a new device side type in OpenCL 2 2 that is useful for FPGA implementations by making connectivity size and type known at compile time enabling efficient device scope communication between kernels OpenCL 2 2 also includes features for enhanced optimization of generated code applications can provide the value of specialization constant at SPIR V compilation time a new query can detect non trivial constructors and destructors of program scope global objects and user callbacks can be set at program release time Runs on any OpenCL 2 0 capable hardware only a driver update is required OpenCL 3 0 Edit The OpenCL 3 0 specification was released on September 30 2020 after being in preview since April 2020 OpenCL 1 2 functionality has become a mandatory baseline while all OpenCL 2 x and OpenCL 3 0 features were made optional The specification retains the OpenCL C language and deprecates the OpenCL C Kernel Language replacing it with the C for OpenCL language 19 based on a Clang LLVM compiler which implements a subset of C 17 and SPIR V intermediate code 64 65 66 Version 3 0 7 of C for OpenCL with some Khronos openCL extensions were presented at IWOCL 21 67 Actual is 3 0 11 with some new extensions and corrections NVIDIA working closely with the Khronos OpenCL Working Group 
improved Vulkan interop with semaphores and memory sharing.[68]

Roadmap

[Image: The International Workshop on OpenCL (IWOCL) held by the Khronos Group]

When releasing OpenCL 2.2, the Khronos Group announced that OpenCL would converge where possible with Vulkan to enable OpenCL software deployment flexibility over both APIs.[69][70] This has now been demonstrated by Adobe's Premiere Rush using the clspv[39] open source compiler to compile significant amounts of OpenCL C kernel code to run on a Vulkan runtime for deployment on Android.[71] OpenCL has a forward-looking roadmap independent of Vulkan, with "OpenCL Next" under development and targeting release in 2020. OpenCL Next may integrate extensions such as Vulkan/OpenCL Interop, Scratch-Pad Memory Management, Extended Subgroups, SPIR-V 1.4 ingestion and SPIR-V Extended debug info. OpenCL is also considering a Vulkan-like loader and layers and a "Flexible Profile" for deployment flexibility on multiple accelerator types.[72]

Open source implementations

[Image: clinfo, a command line tool to see OpenCL information]

OpenCL consists of a set of headers and a shared object that is loaded at runtime. An installable client driver (ICD) must be installed on the platform for every class of vendor that the runtime needs to support. For example, in order to support Nvidia devices on a Linux platform, the Nvidia ICD would need to be installed such that the OpenCL runtime (the ICD loader) would be able to locate the ICD for the vendor and redirect the calls appropriately. The standard OpenCL header is used by the consumer application; calls to each function are then proxied by the OpenCL runtime to the appropriate driver using the ICD. Each vendor must implement each OpenCL call in their driver.[73]

The Apple,[74] Nvidia,[75] ROCm, RapidMind[76] and Gallium3D[77] implementations of OpenCL are all based on the LLVM Compiler technology and use the Clang compiler as their frontend.

MESA Gallium Compute
An implementation of OpenCL, currently at 1.1 and incomplete but mostly done on
AMD Radeon GCN, for a number of platforms is maintained as part of the Gallium Compute Project,[78] which builds on the work of the Mesa project to support multiple platforms. Formerly this was known as CLOVER.[79] Current development mostly supports running the incomplete framework with current LLVM and Clang, plus some new features such as fp16 in 17.3.[80] The target is complete OpenCL 1.0, 1.1 and 1.2 for AMD and Nvidia. New basic development is done by Red Hat with SPIR-V also for Clover.[81][82] The new target is modular OpenCL 3.0 with full support of OpenCL 1.2. The current state is available in Mesamatrix. Image support is the focus of development here.
RustiCL is a new implementation for Gallium compute written in Rust instead of C, aiming for better code. In Mesa 22.2 an experimental implementation will be available with OpenCL 3.0 support and an image extension implementation for programs like Darktable.[83]

BEIGNET
An implementation by Intel for its Ivy Bridge hardware was released in 2013.[84] This software from Intel's China Team has attracted criticism from developers at AMD and Red Hat,[85] as well as Michael Larabel of Phoronix.[86] The current version 1.3.2 supports OpenCL 1.2 completely (Ivy Bridge and higher) and OpenCL 2.0 optionally for Skylake and newer.[87][88] Support for Android has been added to Beignet.[89] Current development targets only 1.2 and 2.0 support; the road to OpenCL 2.1, 2.2 and 3.0 has moved to NEO.

NEO
An implementation by Intel for Gen 8 (Broadwell) and Gen 9 hardware released in 2018.[90] This driver replaces the Beignet implementation for supported platforms (not older 6.gen to Haswell). NEO provides OpenCL 2.1 support on Core platforms and OpenCL 1.2 on Atom platforms.[91] As of 2020, Graphics Gen 11 (Ice Lake) and Gen 12 (Tiger Lake) are also supported. OpenCL 3.0 is available for Alder Lake, Tiger Lake to Broadwell with version 20.41. It now includes the optional OpenCL 2.0 and 2.1 features in full and some of 2.2.

ROCm
Created as part of AMD's GPUOpen, ROCm (Radeon Open Compute) is an open source Linux project built on OpenCL 1.2 with
language support for 2.0. The system is compatible with all modern AMD CPUs and APUs (currently partly GFX 7, GFX 8 and 9), as well as Intel Gen7.5 CPUs (only with PCI 3.0).[92][93] With version 1.9, support is in some points extended experimentally to hardware with PCIe 2.0 and without atomics. An overview of current work was given at XDC2018.[94][95] ROCm version 2.0 supports full OpenCL 2.0, but some errors and limitations are on the todo list.[96][97] Version 3.3 improves in details.[98] Version 3.5 supports OpenCL 2.2.[99] Version 3.10 came with improvements and new APIs.[100] ROCm 4.0, announced at SC20, supports the AMD compute card Instinct MI 100.[101] Current documentation of 5.1.1 and before is available at GitHub.[102][103] OpenCL 3.0 is available.

POCL
A portable implementation supporting CPUs and some GPUs (via CUDA and HSA), building on Clang and LLVM.[104] With version 1.0, OpenCL 1.2 was nearly fully implemented along with some 2.x features.[105] Version 1.2 works with LLVM/Clang 6.0 and 7.0 and has full OpenCL 1.2 support, with all tickets in Milestone 1.2 closed.[105][106] OpenCL 2.0 is nearly fully implemented.[107] Version 1.3 supports Mac OS X.[108] Version 1.4 includes support for LLVM 8.0 and 9.0.[109] Version 1.5 implements LLVM/Clang 10 support.[110] Version 1.6 implements LLVM/Clang 11 support and CUDA acceleration.[111] Current targets are complete OpenCL 2.x, OpenCL 3.0 and improvement of performance. With manual optimization, POCL 1.6 is at the same level as the Intel compute runtime.[112] Version 1.7 implements LLVM/Clang 12 support and some new OpenCL 3.0 features.[113] Version 1.8 implements LLVM/Clang 13 support.[114] Version 3.0 implements OpenCL 3.0 at minimum level and LLVM/Clang 14.[115]

Shamrock
A port of Mesa Clover for ARM with full support of OpenCL 1.2;[116][117] no current development for 2.0.

FreeOCL
A CPU-focused implementation of OpenCL 1.2 that implements an external compiler to create a more reliable platform;[118] no current development.

MOCL
An OpenCL implementation based on POCL by the NUDT
researchers for Matrix-2000 was released in 2018. The Matrix-2000 architecture is designed to replace the Intel Xeon Phi accelerators of the TianHe-2 supercomputer. This programming framework is built on top of LLVM v5.0 and reuses some code pieces from POCL as well. To unlock the hardware potential, the device runtime uses a push-based task dispatching strategy, and the performance of the kernel atomics is improved significantly. This framework has been deployed on the TH-2A system and is readily available to the public.[119] Some of the software will next be ported to improve POCL.[105]

VC4CL
An OpenCL 1.2 implementation for the VideoCore IV (BCM2763) processor used in the Raspberry Pi before its model 4.[120]

Vendor implementations

Timeline of vendor implementations

June 2008: During Apple's WWDC conference an early beta of Mac OS X Snow Leopard was made available to the participants; it included the first beta implementation of OpenCL, about 6 months before the final version 1.0 specification was ratified in late 2008. They also showed two demos. One was a grid of 8x8 screens rendered, each displaying the screen of an emulated Apple II machine (64 independent instances in total, each running a famous karate game). This showed task parallelism on the CPU. The other demo was an N-body simulation running on the GPU of a Mac Pro, a data parallel task.
December 10, 2008: AMD and Nvidia held the first public OpenCL demonstration, a 75-minute presentation at SIGGRAPH Asia 2008. AMD showed a CPU-accelerated OpenCL demo explaining the scalability of OpenCL on one or more cores, while Nvidia showed a GPU-accelerated demo.[121][122]
March 16, 2009: At the 4th Multicore Expo, Imagination Technologies announced the PowerVR SGX543MP, the first GPU of this company to feature OpenCL support.[123]
March 26, 2009: At GDC 2009, AMD and Havok demonstrated the first working implementation for OpenCL accelerating Havok Cloth on an AMD Radeon HD 4000 series GPU.[124]
April 20, 2009: Nvidia announced the release of its OpenCL driver
and SDK to developers participating in its OpenCL Early Access Program.[125]
August 5, 2009: AMD unveiled the first development tools for its OpenCL platform as part of its ATI Stream SDK v2.0 Beta Program.[126]
August 28, 2009: Apple released Mac OS X Snow Leopard, which contains a full implementation of OpenCL.[127]
September 28, 2009: Nvidia released its own OpenCL drivers and SDK implementation.
October 13, 2009: AMD released the fourth beta of the ATI Stream SDK 2.0, which provides a complete OpenCL implementation on both R700/R800 GPUs and SSE3-capable CPUs. The SDK is available for both Linux and Windows.[128]
November 26, 2009: Nvidia released drivers for OpenCL 1.0 (rev 48).
October 27, 2009: S3 released their first product supporting native OpenCL 1.0, the Chrome 5400E embedded graphics processor.[129]
December 10, 2009: VIA released their first product supporting OpenCL 1.0, the ChromotionHD 2.0 video processor included in the VN1000 chipset.[130]
December 21, 2009: AMD released the production version of the ATI Stream SDK 2.0,[131] which provides OpenCL 1.0 support for R800 GPUs and beta support for R700 GPUs.
June 1, 2010: ZiiLABS released details of their first OpenCL implementation for the ZMS processor for handheld, embedded and digital home products.[132]
June 30, 2010: IBM released a fully conformant version of OpenCL 1.0.[4]
September 13, 2010: Intel released details of their first OpenCL implementation for the Sandy Bridge chip architecture. Sandy Bridge will integrate Intel's newest graphics chip technology directly onto the central processing unit.[133]
November 15, 2010: Wolfram Research released Mathematica 8 with the OpenCLLink package.
March 3, 2011: Khronos Group announces the formation of the WebCL working group to explore defining a JavaScript binding to OpenCL. This creates the potential to harness GPU and multi-core CPU parallel processing from a Web browser.[134][135]
March 31, 2011: IBM released a fully conformant version of OpenCL 1.1.[4][136]
April 25, 2011: IBM released OpenCL Common Runtime v0.1 for Linux on
x86 Architecture.[137]
May 4, 2011: Nokia Research releases an open source WebCL extension for the Firefox web browser, providing a JavaScript binding to OpenCL.[138]
July 1, 2011: Samsung Electronics releases an open source prototype implementation of WebCL for WebKit, providing a JavaScript binding to OpenCL.[139]
August 8, 2011: AMD released the OpenCL-driven AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK) v2.5, replacing the ATI Stream SDK as technology and concept.[140]
December 12, 2011: AMD released AMD APP SDK v2.6,[141] which contains a preview of OpenCL 1.2.
February 27, 2012: The Portland Group released the PGI OpenCL compiler for multi-core ARM CPUs.[142]
April 17, 2012: Khronos released a WebCL working draft.[143]
May 6, 2013: Altera released the Altera SDK for OpenCL, version 13.0.[144] It is conformant to OpenCL 1.0.[145]
November 18, 2013: Khronos announced that the specification for OpenCL 2.0 had been finalized.[146]
March 19, 2014: Khronos releases the WebCL 1.0 specification.[147][148]
August 29, 2014: Intel releases an HD Graphics 5300 driver that supports OpenCL 2.0.[149]
September 25, 2014: AMD releases Catalyst 14.41 RC1, which includes an OpenCL 2.0 driver.[150]
January 14, 2015: Xilinx Inc. announces that the SDAccel development environment for OpenCL, C, and C++ achieves Khronos Conformance.[151]
April 13, 2015: Nvidia releases WHQL driver v350.12, which includes OpenCL 1.2 support for GPUs based on Kepler or later architectures.[152] Driver 340 supports OpenCL 1.1 for Tesla and Fermi.
August 26, 2015: AMD released AMD APP SDK v3.0,[153] which contains full support of OpenCL 2.0 and sample coding.
November 16, 2015: Khronos announced that the specification for OpenCL 2.1 had been finalized.[154]
April 18, 2016: Khronos announced that the specification for OpenCL 2.2 had been provisionally finalized.[60]
November 3, 2016: Intel support for Gen7 of OpenCL 2.1 in SDK 2016 r3.[155]
February 17, 2017: Nvidia begins evaluation support of OpenCL 2.0 with driver 378.66.[156][157][158]
May 16, 2017: Khronos announced that the
specification for OpenCL 2.2 had been finalized with SPIR-V 1.2.[159]
May 14, 2018: Khronos announced a maintenance update for OpenCL 2.2 with bug fixes and unified headers.[63]
April 27, 2020: Khronos announced the provisional version of OpenCL 3.0.
June 1, 2020: Intel Neo Runtime with OpenCL 3.0 for new Tiger Lake.
June 3, 2020: AMD announced ROCm 3.5 with OpenCL 2.2 support.[160]
September 30, 2020: Khronos announced that the specifications for OpenCL 3.0 had been finalized (CTS also available).
October 16, 2020: Intel announced, with Neo 20.41, support for OpenCL 3.0 (includes most of the optional OpenCL 2.x features).
April 6, 2021: Nvidia supports OpenCL 3.0 for Ampere. Maxwell and later GPUs also support OpenCL 3.0 with Nvidia driver 465.[161]
August 20, 2022: Intel Arc Alchemist GPUs (Arc A380, A350M, A370M, A550M, A730M and A770M) are conformant with OpenCL 3.0.[162]
October 14, 2022: Arm Mali-G615 and Mali-G715-Immortalis are conformant with OpenCL 3.0.[162]
November 11, 2022: The Rusticl OpenCL Library is conformant with OpenCL 3.0.[162][163]

Devices

As of 2016, OpenCL runs on graphics processing units (GPUs), CPUs with SIMD instructions, FPGAs, Movidius Myriad 2, Adapteva Epiphany and DSPs.

Khronos Conformance Test Suite

To be officially conformant, an implementation must pass the Khronos Conformance Test Suite (CTS), with results being submitted to the Khronos Adopters Program.[164] The Khronos CTS code for all OpenCL versions has been available in open source since 2017.[165]

Conformant products

The Khronos Group maintains an extended list of OpenCL-conformant products.[4]

Synopsis of OpenCL conformant products[4]

AMD SDKs (supports OpenCL CPU and APU devices; GPU: TeraScale 1: OpenCL 1.1, TeraScale 2: 1.2, GCN 1: 1.2, GCN 2: 2.0) | X86 with SSE2 (or higher) compatible CPUs, 64-bit & 32-bit;[166] Linux 2.6 PC, Windows Vista/7/8.x/10 PC | AMD Fusion E-350, E-240, C-50, C-30 with HD 6310/HD 6250 | AMD Radeon/Mobility HD 6800, HD 5x00 series GPU, iGPU HD 6310/HD 6250, HD 7xxx, HD 8xxx, R2xx, R3xx, RX 4xx, RX 5xx, Vega Series | AMD FirePro Vx800 series GPU and later,
Radeon Pro
Intel SDK for OpenCL Applications 2013[167] (supports Intel Core processors and Intel HD Graphics 4000/2500); 2017 R2 with OpenCL 2.1 (Gen7); SDK 2019 removed OpenCL 2.1;[168] current SDK 2020 update 3 | Intel CPUs with SSE 4.1, SSE 4.2 or AVX support;[169][170] Microsoft Windows, Linux | Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3; 3rd Generation Intel Core Processors with Intel HD Graphics 4000/2500 and newer | Intel Core 2 Solo, Duo, Quad, Extreme and newer | Intel Xeon 7x00, 5x00, 3x00 (Core based) and newer
IBM Servers with OpenCL Development Kit for Linux on Power running on Power VSX[171][172] | IBM Power 775 (PERCS), 750 | IBM BladeCenter PS70x Express | IBM BladeCenter JS2x, JS43 | IBM BladeCenter QS22
IBM OpenCL Common Runtime (OCR)[173] | X86 with SSE2 (or higher) compatible CPUs, 64-bit & 32-bit;[174] Linux 2.6 PC | AMD Fusion, Nvidia Ion and Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3 | AMD Radeon, Nvidia GeForce and Intel Core 2 Solo, Duo, Quad, Extreme | ATI FirePro, Nvidia Quadro and Intel Xeon 7x00, 5x00, 3x00 (Core based)
Nvidia OpenCL Driver and Tools;[175] chips: Tesla: OpenCL 1.1 (Driver 340), Fermi: OpenCL 1.1 (Driver 390), Kepler: OpenCL 1.2 (Driver 470), OpenCL 2.0 beta (378.66), OpenCL 3.0: Maxwell to Ada Lovelace (Driver 525) | Nvidia Tesla C/D/S | Nvidia GeForce GTS/GT/GTX | Nvidia Ion | Nvidia Quadro FX/NVX/Plex, Quadro, Quadro K, Quadro M, Quadro P, Quadro with Volta, Quadro RTX with Turing, Ampere

All standard-conformant implementations can be queried using one of the clinfo tools (there are multiple tools with the same name and similar feature set).[176][177][178]

Version support

Products and their version of OpenCL support include:[179]

OpenCL 3.0 support

All hardware with OpenCL 1.2 support is possible; OpenCL 2.x is only optional. The Khronos Test Suite has been available since 2020-10.[180][181]

2020: Intel NEO Compute 20.41 for Gen 12 Tiger Lake to Broadwell (includes full 2.0 and 2.1 support and parts of 2.2)[182]
2020: Intel 6th, 7th, 8th, 9th, 10th, 11th gen processors (Skylake, Kaby Lake, Coffee Lake, Comet Lake, Ice Lake, Tiger Lake) with latest Intel
Windows graphics driver
2021: Intel 11th, 12th gen processors (Rocket Lake, Alder Lake) with latest Intel Windows graphics driver
2021: Arm Mali-G78, Mali-G310, Mali-G510, Mali-G610, Mali-G710 and Mali-G78AE
2022: Intel 13th gen processors (Raptor Lake) with latest Intel Windows graphics driver
2022: Intel Arc discrete graphics with latest Intel Arc Windows graphics driver
2021: Nvidia Maxwell, Pascal, Volta, Turing and Ampere with Nvidia graphics driver 465[161]
2022: Nvidia Ada Lovelace with Nvidia graphics driver 525
2022: Samsung Xclipse 920 GPU (based on AMD RDNA2)

OpenCL 2.2 support

None yet: the Khronos Test Suite is ready; with a driver update, all hardware with 2.0 and 2.1 support is possible.

Intel NEO Compute: work in progress for current products[183]
ROCm: version 3.5, mostly

OpenCL 2.1 support

2018: Support backported to Intel 5th and 6th gen processors (Broadwell, Skylake)
2017: Intel 7th, 8th, 9th, 10th gen processors (Kaby Lake, Coffee Lake, Comet Lake, Ice Lake)
Khronos: with a driver update, all hardware with 2.0 support is possible

OpenCL 2.0 support

2011: AMD GCN GPUs (HD 7700/HD 8000/Rx 200/Rx 300/Rx 400/Rx 500/Rx 5000 Series); some GCN 1st gen only 1.2 with some extensions
2013: AMD GCN APUs (Jaguar, Steamroller, Puma, Excavator & Zen-based)
2014: Intel 5th & 6th gen processors (Broadwell, Skylake)
2015: Qualcomm Adreno 5xx series
2018: Qualcomm Adreno 6xx series
2017: ARM Mali (Bifrost) G51 and G71 in Android 7.1 and Linux
2018: ARM Mali (Bifrost) G31, G52, G72 and G76
2017: incomplete evaluation support: Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900 & 10 series; Quadro K, M & P series; Tesla K, M & P series) with driver version 378.66

OpenCL 1.2 support

2011: for some AMD GCN 1st gen, some OpenCL 2.0 features are not possible today, but many more extensions than TeraScale
2009: AMD TeraScale 2 & 3 GPUs (RV8xx, RV9xx in HD 5000, 6000 & 7000 Series)
2011: AMD TeraScale APUs (K10, Bobcat & Piledriver-based)
2012: Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs: GeForce 600, 700, 800,
900, 10, 16, 20 series; Quadro K, M & P series; Tesla K, M & P series
2012: Intel 3rd & 4th gen processors (Ivy Bridge, Haswell)
2013: Qualcomm Adreno 4xx series
2013: ARM Mali Midgard 3rd gen (T760)
2015: ARM Mali Midgard 4th gen (T8xx)

OpenCL 1.1 support

2008: some AMD TeraScale 1 GPUs (RV7xx in HD4000 series)
2008: Nvidia Tesla, Fermi GPUs (GeForce 8, 9, 100, 200, 300, 400, 500 series, Quadro series or Tesla series with Tesla or Fermi GPU)
2011: Qualcomm Adreno 3xx series
2012: ARM Mali Midgard 1st and 2nd gen (T-6xx, T720)

OpenCL 1.0 support

mostly updated to 1.1 and 1.2 after first driver for 1.0 only

Portability, performance and alternatives

A key feature of OpenCL is portability, via its abstracted memory and execution model, and the programmer is not able to directly use hardware-specific technologies such as inline Parallel Thread Execution (PTX) for Nvidia GPUs unless they are willing to give up direct portability on other platforms. It is possible to run any OpenCL kernel on any conformant implementation.

However, performance of the kernel is not necessarily portable across platforms. Existing implementations have been shown to be competitive when kernel code is properly tuned, though, and auto-tuning has been suggested as a solution to the performance portability problem,[184] yielding "acceptable levels of performance" in experimental linear algebra kernels.[185] Portability of an entire application containing multiple kernels with differing behaviors was also studied, and shows that portability only required limited tradeoffs.[186]

A study at Delft University from 2011 that compared CUDA programs and their straightforward translation into OpenCL C found CUDA to outperform OpenCL by at most 30% on the Nvidia implementation. The researchers noted that their comparison could be made fairer by applying manual optimizations to the OpenCL programs, in which case there was "no reason for OpenCL to obtain worse performance than CUDA". The performance differences could mostly be attributed to
differences in the programming model (especially the memory model) and to NVIDIA's compiler optimizations for CUDA compared to those for OpenCL.[184]

Another study at D-Wave Systems Inc. found that "The OpenCL kernel's performance is between about 13% and 63% slower, and the end-to-end time is between about 16% and 67% slower" than CUDA's performance.[187]

The fact that OpenCL allows workloads to be shared by CPU and GPU, executing the same programs, means that programmers can exploit both by dividing work among the devices.[188] This leads to the problem of deciding how to partition the work, because the relative speeds of operations differ among the devices. Machine learning has been suggested to solve this problem: Grewe and O'Boyle describe a system of support-vector machines trained on compile-time features of a program that can decide the device partitioning problem statically, without actually running the programs to measure their performance.[189]

In a comparison of current graphics cards of the AMD RDNA 2 and Nvidia RTX series, the OpenCL tests gave an undecided result. Possible performance increases from the use of Nvidia CUDA or OptiX were not tested.[190]

See also

Advanced Simulation Library
AMD FireStream
BrookGPU
C++ AMP
Close to Metal
CUDA
DirectCompute
GPGPU
HIP
Larrabee
Lib Sh
List of OpenCL applications
OpenACC
OpenGL
OpenHMPP
OpenMP
Metal
RenderScript
SequenceL
SIMD
SYCL
Vulkan
WebCL

References

1. "Khronos OpenCL Registry". Khronos Group. April 27, 2020. Retrieved April 27, 2020.
2. "Android Devices With OpenCL support". Google Docs. ArrayFire. Retrieved April 28, 2015.
3. "FreeBSD Graphics/OpenCL". FreeBSD. Retrieved December 23, 2015.
4. "Conformant Products". Khronos Group. Retrieved May 9, 2015.
5. Sochacki, Bartosz (July 19, 2019). "The OpenCL C++ 1.0 Specification" (PDF). Khronos OpenCL Working Group. Retrieved July 19, 2019.
6. Munshi, Aaftab; Howes, Lee; Sochacki, Bartosz (April 27, 2020). "The OpenCL C Specification Version: 3.0 Document Revision: V3.0.7" (PDF). Khronos OpenCL Working Group. Archived from the original (PDF) on
September 20, 2020. Retrieved April 28, 2021.
7. "The C++ for OpenCL 1.0 and 2021 Programming Language Documentation". Khronos OpenCL Working Group. December 20, 2021. Retrieved December 2, 2022.
8. "Conformant Companies". Khronos Group. Retrieved April 8, 2015.
9. Gianelli, Silvia E. (January 14, 2015). "Xilinx SDAccel Development Environment for OpenCL, C, and C++, Achieves Khronos Conformance". PR Newswire. Xilinx. Retrieved April 27, 2015.
10. Howes, Lee (November 11, 2015). "The OpenCL Specification Version: 2.1 Document Revision: 23" (PDF). Khronos OpenCL Working Group. Retrieved November 16, 2015.
11. Gaster, Benedict; Howes, Lee; Kaeli, David R.; Mistry, Perhaad; Schaa, Dana (2012). Heterogeneous Computing with OpenCL: Revised OpenCL 1.2 Edition. Morgan Kaufmann.
12. Tompson, Jonathan; Schlachter, Kristofer (2012). "An Introduction to the OpenCL Programming Model" (PDF). New York University Media Research Lab. Archived from the original (PDF) on July 6, 2015. Retrieved July 6, 2015.
13. Stone, John E.; Gohara, David; Shi, Guochun (2010). "OpenCL: a parallel programming standard for heterogeneous computing systems". Computing in Science & Engineering. 12 (3): 66-73. Bibcode:2010CSE....12c..66S. doi:10.1109/MCSE.2010.69. PMC 2964860. PMID 21037981.
14. Klöckner, Andreas; Pinto, Nicolas; Lee, Yunsup; Catanzaro, Bryan; Ivanov, Paul; Fasih, Ahmed (2012). "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation". Parallel Computing. 38 (3): 157-174. arXiv:0911.3456. doi:10.1016/j.parco.2011.09.001. S2CID 18928397.
15. "OpenCL - Open Computing Language Bindings". metacpan.org. Retrieved August 18, 2018.
16. "D binding for OpenCL". dlang.org. Retrieved June 29, 2021.
17. "SPIR - The first open standard intermediate language for parallel compute and graphics". Khronos Group. January 21, 2014.
18. "SYCL - C++ Single-source Heterogeneous Programming for OpenCL". Khronos Group. January 21, 2014. Archived from the original on January 18, 2021. Retrieved October 24, 2016.
19. "C++ for OpenCL, OpenCL-Guide". GitHub. Retrieved April 18, 2021.
20. Aaftab Munshi, ed. (2014). "The OpenCL C Specification, Version 2.0" (PDF). Retrieved June 24, 2014.
21. "Introduction to OpenCL Programming 201005" (PDF). AMD. pp. 89-90. Archived from the original (PDF) on May 16, 2011. Retrieved August 8, 2017.
22. "OpenCL" (PDF). SIGGRAPH2008. August 14, 2008. Archived from the original (PDF) on February 16, 2012. Retrieved August 14, 2008.
23. "Fitting FFT onto G80 Architecture" (PDF). Vasily Volkov and Brian Kazian, UC Berkeley CS258 project report. May 2008. Retrieved November 14, 2008.
24. "OpenCL_FFT". Apple. June 26, 2012. Retrieved June 18, 2022.
25. Trevett, Neil (April 28, 2020). "Khronos Announcements and Panel Discussion" (PDF).
26. Stulova, Anastasia; Hickey, Neil; van Haastregt, Sven; Antognini, Marco; Petit, Kevin (April 27, 2020). "The C++ for OpenCL Programming Language". Proceedings of the International Workshop on OpenCL. IWOCL '20. Munich, Germany: Association for Computing Machinery. pp. 1-2. doi:10.1145/3388333.3388647. ISBN 978-1-4503-7531-3. S2CID 216554183.
27. KhronosGroup/OpenCL-Docs, The Khronos Group, April 16, 2021, retrieved April 18, 2021.
28. "Clang release 9 documentation, OpenCL support". releases.llvm.org. September 2019. Retrieved April 18, 2021.
29. "Clang 9, Language Extensions, OpenCL". releases.llvm.org. September 2019. Retrieved April 18, 2021.
30. "Release of Documentation of C++ for OpenCL kernel language, version 1.0, revision 1 - KhronosGroup/OpenCL-Docs". GitHub. December 2020. Retrieved April 18, 2021.
31. "Release of Documentation of C++ for OpenCL kernel language, version 1.0 and 2021 - KhronosGroup/OpenCL-Docs". GitHub. December 2021. Retrieved December 2, 2022.
32. "The C++ for OpenCL 1.0 Programming Language Documentation". www.khronos.org. Retrieved April 18, 2021.
33. "Release of C++ for OpenCL Kernel Language Documentation, version 1.0, revision 2 - KhronosGroup/OpenCL-Docs". GitHub. March 2021. Retrieved April 18, 2021.
34. "cl_ext_cxx_for_opencl". www.khronos.org. September 2020. Retrieved April 18, 2021.
35. "Mali SDK Supporting Compilation of Kernels in C++ for OpenCL". community.arm.com. December 2020. Retrieved April 18, 2021.
36. "Clang Compiler User's Manual - C++ for OpenCL Support". clang.llvm.org. Retrieved April 18, 2021.
37. "OpenCL-Guide, Offline Compilation of OpenCL Kernel Sources". GitHub. Retrieved April 18, 2021.
38. "OpenCL-Guide, Programming OpenCL Kernels". GitHub. Retrieved April 18, 2021.
39. Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders: google/clspv, August 17, 2019, retrieved August 20, 2019.
40. Petit, Kevin (April 17, 2021), Experimental implementation of OpenCL on Vulkan, retrieved April 18, 2021.
41. "Khronos Launches Heterogeneous Computing Initiative" (Press release). Khronos Group. June 16, 2008. Archived from the original on June 20, 2008. Retrieved June 18, 2008.
42. "OpenCL gets touted in Texas". MacWorld. November 20, 2008. Retrieved June 12, 2009.
43. "The Khronos Group Releases OpenCL 1.0 Specification" (Press release). Khronos Group. December 8, 2008. Retrieved December 4, 2016.
44. "Apple Previews Mac OS X Snow Leopard to Developers" (Press release). Apple Inc. June 9, 2008. Archived from the original on March 18, 2012. Retrieved June 9, 2008.
45. "AMD Drives Adoption of Industry Standards in GPGPU Software Development" (Press release). AMD. August 6, 2008. Retrieved August 14, 2008.
46. "AMD Backs OpenCL, Microsoft DirectX 11". eWeek. August 6, 2008. Archived from the original on March 19, 2012. Retrieved August 14, 2008.
47. "HPCWire: RapidMind Embraces Open Source and Standards Projects". HPCWire. November 10, 2008. Archived from the original on December 18, 2008. Retrieved November 11, 2008.
48. "Nvidia Adds OpenCL To Its Industry Leading GPU Computing Toolkit" (Press release). Nvidia. December 9, 2008. Retrieved December 10, 2008.
49. "OpenCL Development Kit for Linux on Power". alphaWorks. October 30, 2009. Retrieved October 30, 2009.
50. "Opencl Standard - an overview | ScienceDirect Topics". www.sciencedirect.com.
51. http://developer.amd.com/wordpress/media/2012/10/opencl-1.0.48.pdf
52. "Khronos Drives Momentum of Parallel Computing Standard with Release of OpenCL 1.1 Specification". Archived from the original on March 2, 2016. Retrieved February 24, 2016.
53. "Khronos Releases OpenCL 1.2 Specification". Khronos Group. November 15, 2011. Retrieved June 23, 2015.
54. "OpenCL 1.2 Specification" (PDF). Khronos Group. Retrieved June 23, 2015.
55. "Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing". Khronos Group. November 18, 2013. Retrieved February 10, 2014.
56. "Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming". Khronos Group. November 16, 2015. Retrieved November 16, 2015.
57. "Khronos Announces OpenCL 2.1: C++ Comes to OpenCL". AnandTech. March 3, 2015. Retrieved April 8, 2015.
58. "Khronos Releases OpenCL 2.1 Provisional Specification for Public Review". Khronos Group. March 3, 2015. Retrieved April 8, 2015.
59. "OpenCL Overview". Khronos Group. July 21, 2013.
60. "Khronos Releases OpenCL 2.2 Provisional Specification with OpenCL C++ Kernel Language for Parallel Programming". Khronos Group. April 18, 2016.
61. Trevett, Neil (April 2016). "OpenCL - A State of the Union" (PDF). IWOCL. Vienna: Khronos Group. Retrieved January 2, 2017.
62. "Khronos Releases OpenCL 2.2 With SPIR-V 1.2". Khronos Group. May 16, 2017.
63. "OpenCL 2.2 Maintenance Update Released". The Khronos Group. May 14, 2018.
64. "OpenCL 3.0 Bringing Greater Flexibility, Async DMA Extensions". www.phoronix.com.
65. "Khronos Group Releases OpenCL 3.0". April 26, 2020.
66. https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_API.pdf
67. https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf
68. "Using Semaphore and Memory Sharing Extensions for Vulkan Interop with NVIDIA OpenCL". February 24, 2022.
69. "Breaking: OpenCL Merging Roadmap into Vulkan | PC Perspective". www.pcper.com. Archived from the original on November 1, 2017. Retrieved May 17, 2017.
70. "SIGGRAPH 2018: OpenCL-Next Taking Shape, Vulkan Continues Evolving - Phoronix". www.phoronix.com.
71. "Vulkan Update SIGGRAPH 2019" (PDF).
72. Trevett, Neil (May 23, 2019). "Khronos and OpenCL Overview EVS Workshop May19" (PDF). Khronos Group.
73. "OpenCL ICD Specification". Retrieved June 23, 2015.
74. "Apple entry on LLVM Users page". Retrieved August 29, 2009.
75. "Nvidia entry on LLVM Users page". Retrieved August 6, 2009.
76. "Rapidmind entry on LLVM Users page". Retrieved October 1, 2009.
77. "Zack Rusin's blog post about the Gallium3D OpenCL implementation". February 2009. Retrieved October 1, 2009.
78. "GalliumCompute". dri.freedesktop.org. Retrieved June 23, 2015.
79. "Clover Status Update" (PDF).
80. "mesa/mesa - The Mesa 3D Graphics Library". cgit.freedesktop.org.
81. "Gallium Clover With SPIR-V & NIR Opening Up New Compute Options Inside Mesa - Phoronix". www.phoronix.com. Archived from the original on October 22, 2020. Retrieved December 13, 2018.
82. https://xdc2018.x.org/slides/clover.pdf
83. "Mesa's 'Rusticl' Implementation Now Manages to Handle Darktable OpenCL".
84. Larabel, Michael (January 10, 2013). "Beignet: OpenCL/GPGPU Comes For Ivy Bridge On Linux". Phoronix.
85. Larabel, Michael (April 16, 2013). "More Criticism Comes Towards Intel's Beignet OpenCL". Phoronix.
86. Larabel, Michael (December 24, 2013). "Intel's Beignet OpenCL Is Still Slowly Baking". Phoronix.
87. "Beignet". freedesktop.org.
88. "beignet - Beignet OpenCL Library for Intel Ivy Bridge and newer GPUs". cgit.freedesktop.org.
89. "Intel Brings Beignet To Android For OpenCL Compute - Phoronix". www.phoronix.com.
90. "01.org Intel Open Source - Compute Runtime". February 7, 2018.
91. "NEO GitHub README". GitHub. March 21, 2019.
92. "ROCm". GitHub. Archived from the original on October 8, 2016.
93. "RadeonOpenCompute/ROCm: ROCm - Open Source Platform for HPC and Ultrascale GPU Computing". GitHub. March 21, 2019.
94. "A Nice Overview Of The ROCm Linux Compute Stack - Phoronix". www.phoronix.com.
95. "XDC Lightning.pdf". Google Docs.
96. "Radeon ROCm 2.0 Officially Out With OpenCL 2.0 Support, TensorFlow 1.12, Vega 48-bit VA - Phoronix". www.phoronix.com.
97. "Taking Radeon ROCm 2.0 OpenCL For A Benchmarking Test Drive - Phoronix". www.phoronix.com.
98. https://github.com/RadeonOpenCompute/ROCm/blob/master/AMD_ROCm_Release_Notes_v3.3.pdf
99. "Radeon ROCm 3.5 Released with New Features but Still No Navi Support - Phoronix".
100. "Radeon ROCm 3.10 Released with Data Center Tool Improvements, New APIs - Phoronix".
101. "AMD Launches Arcturus as the Instinct MI100, Radeon ROCm 4.0 - Phoronix".
102. "Welcome to AMD ROCm Platform - ROCm Documentation 1.0.0 documentation".
103. "Home". docs.amd.com.
104. Jääskeläinen, Pekka; Sánchez de La Lama, Carlos; Schnetter, Erik; Raiskila, Kalle; Takala, Jarmo; Berg, Heikki (2016). "pocl: A Performance-Portable OpenCL Implementation". Int'l J. Parallel Programming. 43 (5): 752-785. arXiv:1611.07083. Bibcode:2016arXiv161107083J. doi:10.1007/s10766-014-0320-y. S2CID 9905244.
105. "pocl home page". pocl.
106. "GitHub - pocl/pocl: pocl: Portable Computing Language". March 14, 2019, via GitHub.
107. "HSA support implementation status as of 2016-05-17 - Portable Computing Language (pocl) 1.3-pre documentation". portablecl.org.
108. "PoCL home page".
109. "PoCL home page".
110. "PoCL home page".
111. "POCL 1.6-RC1 Released with Better CUDA Performance - Phoronix". Archived from the original on January 17, 2021. Retrieved December 3, 2020.
112. https://www.iwocl.org/wp-content/uploads/30-iwocl-syclcon-2021-baumann-slides.pdf
113. "PoCL home page".
114. "PoCL home page".
115. "PoCL home page".
116. "About". Git.Linaro.org.
117. Gall, T.; Pitney, G. (March 6, 2014). "LCA14-412: GPGPU on ARM SoC" (PDF). Amazon Web Services. Archived from the original (PDF) on July 26, 2020. Retrieved January 22, 2017.
118. "zuzuf/freeocl". GitHub. Retrieved April 13, 2017.
119. Zhang, Peng; Fang, Jianbin; Yang, Canqun; Tang, Tao; Huang, Chun; Wang, Zheng (2018). "MOCL: An Efficient OpenCL Implementation for the Matrix-2000 Architecture" (PDF). Proc. Int'l Conf. on Computing Frontiers. doi:10.1145/3203217.3203244.
120. "Status". GitHub. March 16, 2022.
121. "OpenCL Demo, AMD CPU". YouTube. December 10, 2008. Retrieved March 28, 2009.
122. "OpenCL Demo, Nvidia GPU". YouTube. December 10, 2008. Retrieved March 28, 2009.
123. "Imagination Technologies launches advanced, highly-efficient POWERVR SGX543MP multi-processor graphics IP family". Imagination Technologies. March 19, 2009. Archived from the original on April 3, 2014. Retrieved January 30, 2011.
124. "AMD and Havok demo OpenCL accelerated physics". PC Perspective. March 26, 2009. Archived from the original on April 5, 2009. Retrieved March 28, 2009.
125. "Nvidia Releases OpenCL Driver To Developers". Nvidia. April 20, 2009. Archived from the original on February 4, 2012. Retrieved April 27, 2009.
126. "AMD does reverse GPGPU, announces OpenCL SDK for x86". Ars Technica. August 5, 2009. Retrieved August 6, 2009.
127. Moren, Dan; Snell, Jason (June 8, 2009). "Live Update: WWDC 2009 Keynote". MacWorld.com. MacWorld. Retrieved June 12, 2009.
128. "ATI Stream Software Development Kit (SDK) v2.0 Beta Program". Archived from the original on August 9, 2009. Retrieved October 14, 2009.
129. "S3 Graphics launched the Chrome 5400E embedded graphics processor". Archived from the original on December 2, 2009. Retrieved October 27, 2009.
130. "VIA Brings Enhanced VN1000 Graphics Processor". Archived from the original on December 15, 2009. Retrieved December 10, 2009.
131. "ATI Stream SDK v2.0 with OpenCL 1.0 Support". Archived from the original on November 1, 2009. Retrieved October 23, 2009.
132. "OpenCL". ZiiLABS. Retrieved June 23, 2015.
133. "Intel discloses new Sandy Bridge technical details". Archived from the original on October 31, 2013. Retrieved September 13, 2010.
134. "WebCL related stories". Khronos Group. Retrieved June 23, 2015.
135. "Khronos Releases Final WebGL 1.0 Specification". Khronos Group. Archived from the original on July 9, 2015. Retrieved June 23, 2015.
136. "IBM Developer". developer.ibm.com.
137. "Welcome to Wikis". www.ibm.com. October 20, 2009.
138. "Nokia Research releases WebCL prototype". Khronos Group. May 4, 2011. Archived from the original on December 5, 2020. Retrieved June 23, 2015.
139. KamathK, Sharath. "Samsung's WebCL Prototype for WebKit". Github.com. Archived from the original on February 18, 2015. Retrieved June 23, 2015.
140. "AMD Opens the Throttle on APU Performance with Updated OpenCL Software Development". Amd.com. August 8, 2011. Retrieved June 16, 2013.
141. "AMD APP SDK v2.6". Forums.amd.com. March 13, 2015. Retrieved June 23, 2015.
142. "The Portland Group Announces OpenCL Compiler for ST-Ericsson ARM-Based NovaThor SoCs". Retrieved May 4, 2012.
143. "WebCL Latest Spec". Khronos Group. November 7, 2013. Archived from the original on August 1, 2014. Retrieved June 23, 2015.
144. "Altera Opens the World of FPGAs to Software Programmers with Broad Availability of SDK and Off-the-Shelf Boards for OpenCL". Altera.com. Archived from the original on January 9, 2014. Retrieved January 9, 2014.
145. "Altera SDK for OpenCL is First in Industry to Achieve Khronos Conformance for FPGAs". Altera.com. Archived from the original on January 9, 2014. Retrieved
January 9 2014 Khronos Finalizes OpenCL 2 0 Specification for Heterogeneous Computing Khronos Group November 18 2013 Retrieved June 23 2015 WebCL 1 0 Press Release Khronos Group March 19 2014 Retrieved June 23 2015 WebCL 1 0 Specification Khronos Group March 14 2014 Retrieved June 23 2015 Intel OpenCL 2 0 Driver Archived from the original on September 17 2014 Retrieved October 14 2014 AMD OpenCL 2 0 Driver Support AMD com June 17 2015 Retrieved June 23 2015 Xilinx SDAccel development environment for OpenCL C and C achieves Khronos Conformance khronos org news The Khronos Group Retrieved June 26 2017 Release 349 Graphics Drivers for Windows Version 350 12 PDF April 13 2015 Retrieved February 4 2016 AMD APP SDK 3 0 Released Developer AMD com August 26 2015 Retrieved September 11 2015 Khronos Releases OpenCL 2 1 and SPIR V 1 0 Specifications for Heterogeneous Parallel Programming Khronos Group November 16 2015 What s new Intel SDK for OpenCL Applications 2016 R3 Intel Software NVIDIA 378 66 drivers for Windows offer OpenCL 2 0 evaluation support Khronos Group February 17 2017 Archived from the original on August 6 2020 Retrieved March 17 2017 Szuppe Jakub February 22 2017 NVIDIA enables OpenCL 2 0 beta support Szuppe Jakub March 6 2017 NVIDIA beta support for OpenCL 2 0 works on Linux too The Khronos Group The Khronos Group March 21 2019 GitHub RadeonOpenCompute ROCm at roc 3 5 0 GitHub a b NVIDIA is Now OpenCL 3 0 Conformant April 12 2021 a b c The Khronos Group The Khronos Group December 12 2022 Retrieved December 12 2022 Mesa s Rusticl Achieves Official OpenCL 3 0 Conformance www phoronix com Retrieved December 12 2022 The Khronos Group The Khronos Group August 20 2019 Retrieved August 20 2019 KhronosGroup OpenCL CTL The OpenCL Conformance Tests GitHub March 21 2019 OpenCL and the AMD APP SDK AMD Developer Central developer amd com Archived from the original on August 4 2011 Retrieved August 11 2011 About Intel OpenCL SDK 1 1 software intel com intel com Retrieved 
August 11 2011 Intel SDK for OpenCL Applications Release Notes software intel com March 14 2019 Product Support Retrieved August 11 2011 Intel OpenCL SDK Release Notes Archived from the original on July 17 2011 Retrieved August 11 2011 Announcing OpenCL Development Kit for Linux on Power v0 3 IBM Retrieved August 11 2011 IBM releases OpenCL Development Kit for Linux on Power v0 3 OpenCL 1 1 conformant release available OpenCL Lounge ibm com Retrieved August 11 2011 IBM releases OpenCL Common Runtime for Linux on x86 Architecture IBM October 20 2009 Retrieved September 10 2011 OpenCL and the AMD APP SDK AMD Developer Central developer amd com Archived from the original on September 6 2011 Retrieved September 10 2011 Nvidia Releases OpenCL Driver April 22 2009 Retrieved August 11 2011 clinfo by Simon Leblanc GitHub Retrieved January 27 2017 clinfo by Oblomov GitHub Retrieved January 27 2017 clinfo openCL INFOrmation Retrieved January 27 2017 Khronos Products The Khronos Group Retrieved May 15 2017 OpenCL CTS Test conformance at main KhronosGroup OpenCL CTS GitHub Issues KhronosGroup OpenCL CTS GitHub Intel Compute Runtime 20 43 18277 Brings Alder Lake Support compute runtime 01 org February 7 2018 a b Fang Jianbin Varbanescu Ana Lucia Sips Henk 2011 A Comprehensive Performance Comparison of CUDA and OpenCL 2011 International Conference on Parallel Processing Proc Int l Conf on Parallel Processing pp 216 225 doi 10 1109 ICPP 2011 45 ISBN 978 1 4577 1336 1 Du Peng Weber Rick Luszczek Piotr Tomov Stanimire Peterson Gregory Dongarra Jack 2012 From CUDA to OpenCL Towards a performance portable solution for multi platform GPU programming Parallel Computing 38 8 391 407 CiteSeerX 10 1 1 193 7712 doi 10 1016 j parco 2011 10 002 Dolbeau Romain Bodin Francois de Verdiere Guillaume Colin September 7 2013 One OpenCL to rule them all 2013 IEEE 6th International Workshop on Multi Many core Computing Systems MuCoCoS pp 1 6 doi 10 1109 MuCoCoS 2013 6633603 ISBN 978 1 4799 1010 6 
S2CID 225784 Karimi Kamran Dickson Neil G Hamze Firas 2011 A Performance Comparison of CUDA and OpenCL arXiv 1005 2581v3 cs PF A Survey of CPU GPU Heterogeneous Computing Techniques ACM Computing Surveys 2015 Grewe Dominik O Boyle Michael F P 2011 A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL Compiler Construction Proc Int l Conf on Compiler Construction Lecture Notes in Computer Science Vol 6601 pp 286 305 doi 10 1007 978 3 642 19861 8 16 ISBN 978 3 642 19860 1 Radeon RX 6800 Series Has Excellent ROCm Based OpenCL Performance On Linux www phoronix com External links EditOfficial website Official website for WebCL International Workshop on OpenCL Archived January 26 2021 at the Wayback Machine IWOCL sponsored by The Khronos Group Retrieved from https en wikipedia org w index php title OpenCL amp oldid 1137713597, wikipedia, wiki, book, books, library,
