
AltiVec

AltiVec is a single-precision floating point and integer SIMD instruction set designed and owned by Apple, IBM, and Freescale Semiconductor (formerly Motorola's Semiconductor Products Sector) — the AIM alliance. It is implemented on versions of the PowerPC processor architecture, including Motorola's G4, IBM's G5 and POWER6 processors, and P.A. Semi's PWRficient PA6T. AltiVec is a trademark owned solely by Freescale, so the system is also referred to as Velocity Engine by Apple and VMX (Vector Multimedia Extension) by IBM and P.A. Semi.

While AltiVec refers to an instruction set, the implementations in CPUs produced by IBM and Motorola are separate in terms of logic design. To date, no IBM core has included an AltiVec logic design licensed from Motorola or vice versa.

AltiVec is a standard part of the Power ISA v.2.03[1] specification. It was never formally a part of the PowerPC architecture until this specification although it used PowerPC instruction formats and syntax and occupied the opcode space expressly allocated for such purposes.

Comparison to x86-64 SSE

Both VMX/AltiVec and SSE feature 128-bit vector registers that can represent sixteen 8-bit signed or unsigned chars, eight 16-bit signed or unsigned shorts, four 32-bit ints or four 32-bit floating-point variables. Both provide cache-control instructions intended to minimize cache pollution when working on streams of data.

They also exhibit important differences. Unlike SSE2, VMX/AltiVec supports a special RGB "pixel" data type, but it does not operate on 64-bit double-precision floats, and there is no way to move data directly between scalar and vector registers. In keeping with the "load/store" model of the PowerPC's RISC design, the vector registers, like the scalar registers, can only be loaded from and stored to memory. However, VMX/AltiVec provides a much more complete set of "horizontal" operations that work across all the elements of a vector; the allowable combinations of data type and operations are much more complete. Thirty-two 128-bit vector registers are provided, compared to eight for SSE and SSE2 (extended to 16 in x86-64), and most VMX/AltiVec instructions take three register operands compared to only two register/register or register/memory operands on IA-32.
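The element layouts shared by both instruction sets can be illustrated with a small portable C model. This is only a sketch: the `vreg128` union and the `add_u8` and `add_u32` helpers are hypothetical names invented for this example, not part of AltiVec or SSE, and the loops stand in for what a single SIMD instruction would do. It shows that the same 128 bits give different results depending on the element width chosen for the operation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit vector register: the same 16 bytes
   viewed as sixteen 8-bit, eight 16-bit or four 32-bit elements. */
typedef union {
    uint8_t  u8[16];
    uint16_t u16[8];
    uint32_t u32[4];
    float    f32[4];
} vreg128;

static_assert(sizeof(vreg128) == 16, "model register must be 128 bits");

/* Element-wise add of sixteen 8-bit unsigned lanes (each wraps modulo 2^8). */
static vreg128 add_u8(vreg128 a, vreg128 b)
{
    vreg128 r;
    for (int i = 0; i < 16; i++)
        r.u8[i] = (uint8_t)(a.u8[i] + b.u8[i]);
    return r;
}

/* Element-wise add of four 32-bit unsigned lanes (each wraps modulo 2^32). */
static vreg128 add_u32(vreg128 a, vreg128 b)
{
    vreg128 r;
    for (int i = 0; i < 4; i++)
        r.u32[i] = a.u32[i] + b.u32[i];
    return r;
}
```

With every byte of `a` set to 0xFF and every byte of `b` set to 0x01, `add_u8` yields all-zero bytes, while `add_u32` yields 0x01010100 in each 32-bit lane, because carries propagate within a lane but never across lane boundaries.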

VMX/AltiVec is also unique in its support for a flexible vector permute instruction, in which each byte of a resulting vector value can be taken from any byte of either of two other vectors, parametrized by yet another vector. This allows for sophisticated manipulations in a single instruction.
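The permute operation described above can be modeled in portable C as follows. This is a scalar sketch, not the hardware implementation: `vec128`, `perm_model` and `perm_model_selfcheck` are names made up for this example, and the model assumes big-endian byte numbering, where indices 0 to 15 select from the first source and 16 to 31 from the second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit vector modeled as 16 bytes. */
typedef struct { uint8_t b[16]; } vec128;

/* Scalar model of a two-source byte permute: result byte i is taken from
   the 32-byte concatenation of a and b, at the position given by the low
   5 bits of control byte i. */
static vec128 perm_model(vec128 a, vec128 b, vec128 ctl)
{
    vec128 r;
    for (int i = 0; i < 16; i++) {
        unsigned idx = ctl.b[i] & 0x1Fu;
        r.b[i] = (idx < 16) ? a.b[idx] : b.b[idx - 16];
    }
    return r;
}

/* Example use: reverse the byte order of a with a single permute. */
static void perm_model_selfcheck(void)
{
    vec128 a, b, ctl;
    for (int i = 0; i < 16; i++) {
        a.b[i]   = (uint8_t)i;          /* a = 0, 1, ..., 15       */
        b.b[i]   = (uint8_t)(100 + i);  /* b = 100, 101, ..., 115  */
        ctl.b[i] = (uint8_t)(15 - i);   /* select a[15], ..., a[0] */
    }
    vec128 r = perm_model(a, b, ctl);
    for (int i = 0; i < 16; i++)
        assert(r.b[i] == 15 - i);
}
```

Because the control vector is ordinary data, the same operation covers byte reversal, interleaving, shifting across a register pair and table lookups, depending only on the indices supplied.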

Recent versions of the GNU Compiler Collection (GCC), the IBM VisualAge compiler and other compilers provide intrinsics to access VMX/AltiVec instructions directly from C and C++ programs. As of version 4, GCC also includes auto-vectorization capabilities that attempt to create VMX/AltiVec-accelerated binaries without the programmer having to use intrinsics directly. The "vector" type keyword is introduced to permit the declaration of native vector types, e.g., "vector unsigned char foo;" declares a 128-bit vector variable named "foo" containing sixteen 8-bit unsigned chars. The full complement of arithmetic and binary operators is defined on vector types, so that the normal C expression language can be used to manipulate vector variables. There are also overloaded intrinsic functions such as "vec_add" that emit the appropriate opcode based on the type of the elements within the vector, and very strong type checking is enforced. In contrast, the Intel-defined data types for IA-32 SIMD registers declare only the size of the vector register (128 or 64 bits) and, in the case of a 128-bit register, whether it contains integers or floating-point values. The programmer must select the appropriate intrinsic for the data types in use, e.g., "_mm_add_epi16(x,y)" for adding two vectors containing eight 16-bit integers.
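The difference between a type-overloaded intrinsic and a type-suffixed one can be sketched in portable C11 with `_Generic`. Everything here is hypothetical and chosen for illustration: `v_u16x8`, `v_u32x4`, `add_u16x8`, `add_u32x4` and `my_vec_add` are not real AltiVec or SSE names, and actual compilers implement `vec_add` as a built-in rather than as a macro like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical element-typed vector types: the type records the lane width. */
typedef struct { uint16_t e[8]; } v_u16x8;
typedef struct { uint32_t e[4]; } v_u32x4;

/* Type-suffixed operations, analogous in spirit to picking an intrinsic
   by name for each element width. */
static v_u16x8 add_u16x8(v_u16x8 a, v_u16x8 b)
{
    for (int i = 0; i < 8; i++)
        a.e[i] = (uint16_t)(a.e[i] + b.e[i]);
    return a;
}

static v_u32x4 add_u32x4(v_u32x4 a, v_u32x4 b)
{
    for (int i = 0; i < 4; i++)
        a.e[i] = a.e[i] + b.e[i];
    return a;
}

/* Overloaded front end: the operand type selects the operation, and an
   unsupported operand type is rejected at compile time. */
#define my_vec_add(a, b) \
    _Generic((a), v_u16x8: add_u16x8, v_u32x4: add_u32x4)((a), (b))

static void my_vec_add_selfcheck(void)
{
    v_u16x8 s     = {{65535, 1, 2, 3, 4, 5, 6, 7}};
    v_u16x8 one16 = {{1, 1, 1, 1, 1, 1, 1, 1}};
    v_u32x4 w     = {{65535, 1, 2, 3}};
    v_u32x4 one32 = {{1, 1, 1, 1}};

    assert(my_vec_add(s, one16).e[0] == 0);       /* wraps at 16 bits */
    assert(my_vec_add(w, one32).e[0] == 65536u);  /* no wrap at 32 bits */
}
```

With the overloaded form, changing a variable's element type changes the operation automatically. With type-suffixed names, every call site has to be edited by hand.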

Development history

The Power Vector Media Extension (VMX) was developed between 1996 and 1998 as a collaborative project between Apple, IBM, and Motorola. Apple was the primary customer for VMX until it announced its switch to Intel-made, x86-based CPUs on June 6, 2005. Apple used it to accelerate multimedia applications such as QuickTime and iTunes, as well as key parts of Mac OS X, including the Quartz graphics compositor. Other companies such as Adobe used AltiVec to optimize image-processing programs such as Adobe Photoshop. Motorola was the first to supply AltiVec-enabled processors, starting with its G4 line. AltiVec was also used in some embedded systems for high-performance digital signal processing.

IBM consistently left VMX out of their earlier POWER microprocessors, which were intended for server applications where it was not very useful. The POWER6 microprocessor, introduced in 2007, implements AltiVec. The last desktop microprocessor from IBM, the PowerPC 970 (dubbed the "G5" by Apple) also implemented AltiVec with hardware similar to that of the PowerPC 7400.

AltiVec is a brandname trademarked by Freescale (previously Motorola) for the standard Category:Vector part of the Power ISA v.2.03[1] specification. This category is also known as VMX (used by IBM), and "Velocity Engine" (a brand name previously used by Apple).

The Cell Broadband Engine, used in (amongst other things) the PlayStation 3, also supports Power Vector Media Extension (VMX) in its PPU, with the SPU ISA being enhanced but architecturally similar.

Freescale is bringing an enhanced version of AltiVec to e6500 based QorIQ processors.

VMX128

IBM enhanced VMX for use in Xenon (Xbox 360) and called this enhancement VMX128. The enhancements comprise new routines targeted at gaming (accelerating 3D graphics and game physics)[2] and a total of 128 registers. VMX128 is not entirely compatible with VMX/AltiVec, as a number of integer operations were removed to make space for the larger register file and additional application-specific operations.[3][4]

VSX (Vector Scalar Extension)

Power ISA v2.06 introduced VSX vector-scalar instructions[5] which extend SIMD processing for the Power ISA to support up to 64 registers, with support for regular floating point, decimal floating point and vector execution. POWER7 is the first Power ISA processor to implement Power ISA v2.06.

New instructions for integer operations were introduced by IBM under the Vector Media Extension category as part of the VSX extension in Power ISA 2.07.

Further integer vector instructions following the VMX encodings were introduced by IBM as part of the VSX extension in Power ISA v3.0, which is implemented by POWER9 processors.[6]

Issues

In C++, the standard way of accessing AltiVec support is mutually exclusive with the use of the Standard Template Library vector<> class template due to the treatment of "vector" as a reserved word when the compiler does not implement the context-sensitive keyword version of vector. However, it may be possible to combine them using compiler-specific workarounds; for instance, in GCC one may do #undef vector to remove the vector keyword, and then use the GCC-specific __vector keyword in its place.

AltiVec prior to Power ISA 2.06 with VSX cannot load from memory using a type's natural alignment. For example, the code below requires special handling on POWER6 and earlier when the effective address is not 16-byte aligned. When VSX is not available, the special handling adds three instructions to a load operation.

    #include <altivec.h>

    typedef __vector unsigned char uint8x16_p;
    typedef __vector unsigned int  uint32x4_p;
    ...
    int main(int argc, char* argv[])
    {
        /* Natural alignment of vals is 4; and not 16 as required */
        unsigned int vals[4] = { 1, 2, 3, 4 };
        uint32x4_p vec;

    #if defined(__VSX__) || defined(_ARCH_PWR8)
        vec = vec_xl(0, vals);
    #else
        /* View the data as bytes so that the loads yield byte vectors */
        const unsigned char* ptr = (const unsigned char*)vals;
        const uint8x16_p perm = vec_lvsl(0, ptr);
        const uint8x16_p low  = vec_ld(0, ptr);
        const uint8x16_p high = vec_ld(15, ptr);
        vec = (uint32x4_p)vec_perm(low, high, perm);
    #endif

        return 0;
    }
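The non-VSX path above can be followed step by step with a portable scalar model. This is a sketch under stated assumptions, not the hardware definition: memory is modeled as a byte array whose index plays the role of the effective address, the array is assumed to start on a 16-byte boundary and to extend to the end of the last 16-byte block touched, and the names `ld_model`, `lvsl_model`, `perm_model` and `load_unaligned_model` are invented for this example. Big-endian byte numbering is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t b[16]; } vec128;

/* Model of an aligned vector load: the low 4 bits of the effective
   address are ignored, so the enclosing 16-byte block is loaded. */
static vec128 ld_model(const uint8_t *mem, size_t ea)
{
    vec128 r;
    size_t base = ea & ~(size_t)15;
    for (int i = 0; i < 16; i++)
        r.b[i] = mem[base + i];
    return r;
}

/* Model of the "load vector for shift left" control vector:
   bytes sh, sh+1, ..., sh+15 where sh is the misalignment of ea. */
static vec128 lvsl_model(size_t ea)
{
    vec128 r;
    uint8_t sh = (uint8_t)(ea & 15);
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)(sh + i);
    return r;
}

/* Two-source byte permute: index 0..15 selects from a, 16..31 from b. */
static vec128 perm_model(vec128 a, vec128 b, vec128 ctl)
{
    vec128 r;
    for (int i = 0; i < 16; i++) {
        unsigned idx = ctl.b[i] & 0x1Fu;
        r.b[i] = (idx < 16) ? a.b[idx] : b.b[idx - 16];
    }
    return r;
}

/* Unaligned load built from two aligned loads and one permute. */
static vec128 load_unaligned_model(const uint8_t *mem, size_t ea)
{
    vec128 low  = ld_model(mem, ea);       /* block containing byte ea      */
    vec128 high = ld_model(mem, ea + 15);  /* block containing byte ea + 15 */
    return perm_model(low, high, lvsl_model(ea));
}

static void load_unaligned_selfcheck(void)
{
    uint8_t mem[48];
    for (int i = 0; i < 48; i++)
        mem[i] = (uint8_t)i;

    vec128 r = load_unaligned_model(mem, 5);
    for (int i = 0; i < 16; i++)
        assert(r.b[i] == 5 + i);
}
```

When the address is already aligned, both loads return the same block and the control vector is 0 through 15, so the permute simply passes the first block through.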

AltiVec prior to Power ISA 2.07 lacks 64-bit integer support. Developers who wish to operate on 64-bit data must build routines from 32-bit components. For example, below are 64-bit add and subtract routines in C that use a vector of four 32-bit words on a big-endian machine. The permutes move the carry and borrow bits from columns 1 and 3 to columns 0 and 2, as in schoolbook arithmetic. A little-endian machine would need a different mask.

    #include <altivec.h>

    typedef __vector unsigned char uint8x16_p;
    typedef __vector unsigned int  uint32x4_p;
    ...

    /* Performs a+b as if the vector held two 64-bit double words */
    uint32x4_p add64(const uint32x4_p a, const uint32x4_p b)
    {
        const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
        const uint32x4_p zero = {0, 0, 0, 0};

        /* Carry out of each 32-bit column, moved to the column on its left */
        uint32x4_p cy = vec_addc(a, b);
        cy = vec_perm(cy, zero, cmask);
        return vec_add(vec_add(a, b), cy);
    }

    /* Performs a-b as if the vector held two 64-bit double words */
    uint32x4_p sub64(const uint32x4_p a, const uint32x4_p b)
    {
        const uint8x16_p bmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
        const uint32x4_p amask = {1, 1, 1, 1};
        const uint32x4_p zero = {0, 0, 0, 0};

        /* vec_subc yields 1 where no borrow occurred; invert it to get the borrow */
        uint32x4_p bw = vec_subc(a, b);
        bw = vec_andc(amask, bw);
        bw = vec_perm(bw, zero, bmask);
        return vec_sub(vec_sub(a, b), bw);
    }
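The carry and borrow handling in these routines can be checked against ordinary 64-bit arithmetic with a portable scalar model. This is a sketch only: `u32x4`, `add64_model`, `sub64_model` and `dword` are hypothetical names for this example, and the model assumes the big-endian layout used above, in which words 0 and 2 are the high halves and words 1 and 3 the low halves of the two double words.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit words in big-endian order: {hi0, lo0, hi1, lo1}. */
typedef struct { uint32_t w[4]; } u32x4;

/* Column-wise add, then add each low word's carry into its high word. */
static u32x4 add64_model(u32x4 a, u32x4 b)
{
    uint32_t cy[4];
    for (int i = 0; i < 4; i++)
        cy[i] = (uint32_t)(a.w[i] + b.w[i]) < a.w[i];  /* carry out of column i */

    /* The permute step: carries from columns 1 and 3 move to 0 and 2. */
    const uint32_t moved[4] = { cy[1], 0, cy[3], 0 };

    u32x4 r;
    for (int i = 0; i < 4; i++)
        r.w[i] = a.w[i] + b.w[i] + moved[i];
    return r;
}

/* Column-wise subtract, then subtract each low word's borrow from its high word. */
static u32x4 sub64_model(u32x4 a, u32x4 b)
{
    uint32_t bw[4];
    for (int i = 0; i < 4; i++)
        bw[i] = a.w[i] < b.w[i];                       /* borrow out of column i */

    const uint32_t moved[4] = { bw[1], 0, bw[3], 0 };

    u32x4 r;
    for (int i = 0; i < 4; i++)
        r.w[i] = a.w[i] - b.w[i] - moved[i];
    return r;
}

/* Reassemble double word k (0 or 1) as a native 64-bit value. */
static uint64_t dword(u32x4 v, int k)
{
    return ((uint64_t)v.w[2 * k] << 32) | v.w[2 * k + 1];
}

static void add_sub64_selfcheck(void)
{
    u32x4 a = {{1, 0xFFFFFFFFu, 0, 5}};
    u32x4 b = {{0, 1, 0, 7}};
    u32x4 s = add64_model(a, b);
    assert(dword(s, 0) == dword(a, 0) + dword(b, 0));  /* carry crosses into hi0 */
    assert(dword(s, 1) == 12);

    u32x4 c = {{1, 0, 0, 7}};
    u32x4 d = {{0, 1, 0, 5}};
    u32x4 t = sub64_model(c, d);
    assert(dword(t, 0) == dword(c, 0) - dword(d, 0));  /* borrow taken from hi0 */
    assert(dword(t, 1) == 2);
}
```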

Power ISA 2.07, used in POWER8, finally provided native 64-bit double words. A developer working with POWER8 needs only the following.

    #include <altivec.h>

    typedef __vector unsigned long long uint64x2_p;
    ...

    /* Performs a+b using native vector 64-bit double words */
    uint64x2_p add64(const uint64x2_p a, const uint64x2_p b)
    {
        return vec_add(a, b);
    }

    /* Performs a-b using native vector 64-bit double words */
    uint64x2_p sub64(const uint64x2_p a, const uint64x2_p b)
    {
        return vec_sub(a, b);
    }

Implementations

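The following processors include AltiVec, VMX or VMX128.

Motorola/Freescale

  • MPC7400
  • MPC7410
  • MPC7450
  • MPC7445/7455
  • MPC7447/7447A/7457
  • MPC7448
  • MPC8641/8641D
  • MPC8640/8640D
  • MPC8610
  • T2081
  • T2080
  • T4080
  • T4160
  • T4240
  • B4420
  • B4860

IBM

  • PowerPC 970
  • PowerPC 970FX
  • PowerPC 970MP
  • Xenon
  • Cell B.E.
  • PowerXCell 8i
  • POWER6
  • POWER6+
  • POWER7
  • POWER7+
  • POWER8
  • POWER9
  • Power10

P.A. Semi

  • PA6T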

Software Applications

The following software applications are known to leverage AltiVec or VMX hardware acceleration.

  • Helios has a native POWER9 / POWER10 port with support for VMX.[7]

References

  1. "Power ISA v.2.03" (PDF). Power.org. [permanent dead link]
  2. "The Microsoft Xbox 360 CPU story". IBM. October 2015. Archived from the original on 2008-01-20.
  3. "Using data-parallel SIMD architecture in video games and supercomputers". IBM Research.
  4. "Implementing instruction set architectures with non-contiguous register file specifiers". US Patent 7,421,566. Archived 2022-01-25 at the Wayback Machine.
  5. "Workload acceleration with the IBM POWER vector-scalar architecture". IBM. 2016-03-01. Archived from the original on 2022-01-25. Retrieved 2017-05-02.
  6. "Peter Bergner - [PATCH, COMMITTED] Add full Power ISA 3.0 / POWER9 binutils support". Archived from the original on 2016-03-07. Retrieved 2016-12-24.
  7. "FAQ, Helios". Helios. Retrieved 2021-07-09.

External links

  • "Unrolling AltiVec, Part 1: Introducing the PowerPC SIMD unit" at IBM (archived at the Wayback Machine on 2012-09-10)
  • "AltiVec Technologies" at Freescale (archived at the Wayback Machine on 2012-02-04)
  • "Using data-parallel SIMD architecture in video games and supercomputers" at IBM (archived at the Wayback Machine on 2012-02-08)
  • "Velocity Engine" at Apple (archived at the Wayback Machine on 2009-11-28)
  • SIMD history and performance comparison