fbpx
Wikipedia

Predication (computer architecture)

In computer architecture, predication is a feature that provides an alternative to conditional transfer of control, as implemented by conditional branch machine instructions. Predication works by having conditional (predicated) non-branch instructions associated with a predicate, a Boolean value used by the instruction to control whether the instruction is allowed to modify the architectural state or not. If the predicate specified in the instruction is true, the instruction modifies the architectural state; otherwise, the architectural state is unchanged. For example, a predicated move instruction (a conditional move) will only modify the destination if the predicate is true. Thus, instead of using a conditional branch to select an instruction or a sequence of instructions to execute based on the predicate that controls whether the branch occurs, the instructions to be executed are associated with that predicate, so that they will be executed, or not executed, based on whether that predicate is true or false.[1]

Vector processors, some SIMD ISAs (such as AVX2 and AVX-512) and GPUs in general make heavy use of predication, applying one bit of a conditional mask vector to the corresponding elements in the vector registers being processed, whereas scalar predication in scalar instruction sets only need the one predicate bit. Where predicate masks become particularly powerful in vector processing is if an array of condition codes, one per vector element, may feed back into predicate masks that are then applied to subsequent vector instructions.

Overview edit

Most computer programs contain conditional code, which will be executed only under specific conditions depending on factors that cannot be determined beforehand, for example depending on user input. As the majority of processors simply execute the next instruction in a sequence, the traditional solution is to insert branch instructions that allow a program to conditionally branch to a different section of code, thus changing the next step in the sequence. This was sufficient until designers began improving performance by implementing instruction pipelining, a method which is slowed down by branches. For a more thorough description of the problems which arose, and a popular solution, see branch predictor.

Luckily, one of the more common patterns of code that normally relies on branching has a more elegant solution. Consider the following pseudocode:[1]

if condition  {dosomething} else  {dosomethingelse} 

On a system that uses conditional branching, this might translate to machine instructions looking similar to:[1]

branch-if-condition to label1 dosomethingelse branch-to label2 label1: dosomething label2: ... 

With predication, all possible branch paths are coded inline, but some instructions execute while others do not. The basic idea is that each instruction is associated with a predicate (the word here used similarly to its usage in predicate logic) and that the instruction will only be executed if the predicate is true. The machine code for the above example using predication might look something like this:[1]

(condition) dosomething (not condition) dosomethingelse 

Besides eliminating branches, less code is needed in total, provided the architecture provides predicated instructions. While this does not guarantee faster execution in general, it will if the dosomething and dosomethingelse blocks of code are short enough.

Predication's simplest form is partial predication, where the architecture has conditional move or conditional select instructions. Conditional move instructions write the contents of one register over another only if the predicate's value is true, whereas conditional select instructions choose which of two registers has its contents written to a third based on the predicate's value. A more generalized and capable form is full predication. Full predication has a set of predicate registers for storing predicates (which allows multiple nested or sequential branches to be simultaneously eliminated) and most instructions in the architecture have a register specifier field to specify which predicate register supplies the predicate.[2]

Advantages edit

The main purpose of predication is to avoid jumps over very small sections of program code, increasing the effectiveness of pipelined execution and avoiding problems with the cache. It also has a number of more subtle benefits:

  • Functions that are traditionally computed using simple arithmetic and bitwise operations may be quicker to compute using predicated instructions.
  • Predicated instructions with different predicates can be mixed with each other and with unconditional code, allowing better instruction scheduling and so even better performance.
  • Elimination of unnecessary branch instructions can make the execution of necessary branches, such as those that make up loops, faster by lessening the load on branch prediction mechanisms.
  • Elimination of the cost of a branch misprediction which can be high on deeply pipelined architectures.
  • Instruction sets that have comprehensive Condition Codes generated by instructions may reduce code size further by directly using the Condition Registers in or as predication.

Disadvantages edit

Predication's primary drawback is in increased encoding space. In typical implementations, every instruction reserves a bitfield for the predicate specifying under what conditions that instruction should have an effect. When available memory is limited, as on embedded devices, this space cost can be prohibitive. However, some architectures such as Thumb-2 are able to avoid this issue (see below). Other detriments are the following:[3]

  • Predication complicates the hardware by adding levels of logic to critical paths and potentially degrades clock speed.
  • A predicated block includes cycles for all operations, so shorter paths may take longer and be penalized.
  • Predication is not usually speculated and causes a longer dependency chain. For ordered data this translates to a performance loss compared to a predictable branch.[4]

Predication is most effective when paths are balanced or when the longest path is the most frequently executed,[3] but determining such a path is very difficult at compile time, even in the presence of profiling information.

History edit

Predicated instructions were popular in European computer designs of the 1950s, including the Mailüfterl (1955), the Zuse Z22 (1955), the ZEBRA (1958), and the Electrologica X1 (1958). The IBM ACS-1 design of 1967 allocated a "skip" bit in its instruction formats, and the CDC Flexible Processor in 1976 allocated three conditional execution bits in its microinstruction formats.

Hewlett-Packard's PA-RISC architecture (1986) had a feature called nullification, which allowed most instructions to be predicated by the previous instruction. IBM's POWER architecture (1990) featured conditional move instructions. POWER's successor, PowerPC (1993), dropped these instructions. Digital Equipment Corporation's Alpha architecture (1992) also featured conditional move instructions. MIPS gained conditional move instructions in 1994 with the MIPS IV version; and SPARC was extended in Version 9 (1994) with conditional move instructions for both integer and floating-point registers.

In the Hewlett-Packard/Intel IA-64 architecture, most instructions are predicated. The predicates are stored in 64 special-purpose predicate registers; and one of the predicate registers is always true so that unpredicated instructions are simply instructions predicated with the value true. The use of predication is essential in IA-64's implementation of software pipelining because it avoids the need for writing separated code for prologs and epilogs.[clarification needed]

In the x86 architecture, a family of conditional move instructions (CMOV and FCMOV) were added to the architecture by the Intel Pentium Pro (1995) processor. The CMOV instructions copied the contents of the source register to the destination register depending on a predicate supplied by the value of the flag register.

In the ARM architecture, the original 32-bit instruction set provides a feature called conditional execution that allows most instructions to be predicated by one of 13 predicates that are based on some combination of the four condition codes set by the previous instruction. ARM's Thumb instruction set (1994) dropped conditional execution to reduce the size of instructions so they could fit in 16 bits, but its successor, Thumb-2 (2003) overcame this problem by using a special instruction which has no effect other than to supply predicates for the following four instructions. The 64-bit instruction set introduced in ARMv8-A (2011) replaced conditional execution with conditional selection instructions.

SIMD, SIMT and vector predication edit

Some SIMD instruction sets, like AVX2, have the ability to use a logical mask to conditionally load/store values to memory, a parallel form of the conditional move, and may also apply individual mask bits to individual arithmetic units executing a parallel operation. The technique is known in Flynn's taxonomy as "associative processing".

This form of predication is also used in vector processors and single instruction, multiple threads GPU computing. All the techniques, advantages and disadvantages of single scalar predication apply just as well to the parallel processing case.

See also edit

References edit

  1. ^ a b c d Rick Vinyard (2000-04-26). . cs.nmsu.edu. Archived from the original on 20 Apr 2015. Retrieved 2014-04-22.
  2. ^ Mahlke, Scott A.; Hank, Richard E.; McCormick, James E.; August, David I.; Hwn, Wen-mei W. (1995). A Comparison of Full and Partial Predicated Execution Support for ILP Processors. The 22nd International Symposium on Computer Architecture, 22–24 June 1995. CiteSeerX 10.1.1.19.3187. doi:10.1145/223982.225965. ISBN 0-89791-698-0.
  3. ^ a b Fisher, Joseph A.; Faraboschi, Paolo; Young, Cliff (2004). "4.5.2 Predication § Predication in the Embedded Domain". Embedded Computing — A VLIW Approach to Architecture, Compilers, and Tools. Elsevier. p. 172. ISBN 9780080477541.
  4. ^ Cordes, Peter. "assembly - How does Out of Order execution work with conditional instructions, Ex: CMOVcc in Intel or ADDNE (Add not equal) in ARM". Stack Overflow. Unlike with control dependencies (branches), they don't predict or speculate what the flags will be, so a cmovcc instead of a jcc can create a loop-carried dependency chain and end up being worse than a predictable branch. gcc optimization flag -O3 makes code slower than -O2 is an example of that. {{cite web}}: External link in |quote= (help)

Further reading edit

  • Clements, Alan (2013). "8.3.7 Predication". Computer Organization & Architecture: Themes and Variations. Cengage Learning. pp. 532–9. ISBN 978-1-285-41542-0.

predication, computer, architecture, this, article, needs, additional, citations, verification, please, help, improve, this, article, adding, citations, reliable, sources, unsourced, material, challenged, removed, find, sources, predication, computer, architec. This article needs additional citations for verification Please help improve this article by adding citations to reliable sources Unsourced material may be challenged and removed Find sources Predication computer architecture news newspapers books scholar JSTOR March 2014 Learn how and when to remove this template message Not to be confused with branch prediction In computer architecture predication is a feature that provides an alternative to conditional transfer of control as implemented by conditional branch machine instructions Predication works by having conditional predicated non branch instructions associated with a predicate a Boolean value used by the instruction to control whether the instruction is allowed to modify the architectural state or not If the predicate specified in the instruction is true the instruction modifies the architectural state otherwise the architectural state is unchanged For example a predicated move instruction a conditional move will only modify the destination if the predicate is true Thus instead of using a conditional branch to select an instruction or a sequence of instructions to execute based on the predicate that controls whether the branch occurs the instructions to be executed are associated with that predicate so that they will be executed or not executed based on whether that predicate is true or false 1 Vector processors some SIMD ISAs such as AVX2 and AVX 512 and GPUs in general make heavy use of predication applying one bit of a conditional mask vector to the corresponding elements in the vector registers being processed whereas scalar predication in scalar instruction sets only need the one predicate bit Where predicate masks become particularly powerful in vector processing is if an array of condition codes one per vector element may feed back into predicate masks that are then applied to subsequent vector instructions Contents 1 Overview 2 Advantages 3 Disadvantages 4 History 5 SIMD SIMT and vector predication 6 See also 7 References 8 Further readingOverview editMost computer programs contain conditional code which will be executed only under specific conditions depending on factors that cannot be determined beforehand for example depending on user input As the majority of processors simply execute the next instruction in a sequence the traditional solution is to insert branch instructions that allow a program to conditionally branch to a different section of code thus changing the next step in the sequence This was sufficient until designers began improving performance by implementing instruction pipelining a method which is slowed down by branches For a more thorough description of the problems which arose and a popular solution see branch predictor Luckily one of the more common patterns of code that normally relies on branching has a more elegant solution Consider the following pseudocode 1 if condition dosomething else dosomethingelse On a system that uses conditional branching this might translate to machine instructions looking similar to 1 branch if condition to label1 dosomethingelse branch to label2 label1 dosomething label2 With predication all possible branch paths are coded inline but some instructions execute while others do not The basic idea is that each instruction is associated with a predicate the word here used similarly to its usage in predicate logic and that the instruction will only be executed if the predicate is true The machine code for the above example using predication might look something like this 1 condition dosomething not condition dosomethingelse Besides eliminating branches less code is needed in total provided the architecture provides predicated instructions While this does not guarantee faster execution in general it will if the dosomething and dosomethingelse blocks of code are short enough Predication s simplest form is partial predication where the architecture has conditional move or conditional select instructions Conditional move instructions write the contents of one register over another only if the predicate s value is true whereas conditional select instructions choose which of two registers has its contents written to a third based on the predicate s value A more generalized and capable form is full predication Full predication has a set of predicate registers for storing predicates which allows multiple nested or sequential branches to be simultaneously eliminated and most instructions in the architecture have a register specifier field to specify which predicate register supplies the predicate 2 Advantages editThe main purpose of predication is to avoid jumps over very small sections of program code increasing the effectiveness of pipelined execution and avoiding problems with the cache It also has a number of more subtle benefits Functions that are traditionally computed using simple arithmetic and bitwise operations may be quicker to compute using predicated instructions Predicated instructions with different predicates can be mixed with each other and with unconditional code allowing better instruction scheduling and so even better performance Elimination of unnecessary branch instructions can make the execution of necessary branches such as those that make up loops faster by lessening the load on branch prediction mechanisms Elimination of the cost of a branch misprediction which can be high on deeply pipelined architectures Instruction sets that have comprehensive Condition Codes generated by instructions may reduce code size further by directly using the Condition Registers in or as predication Disadvantages editPredication s primary drawback is in increased encoding space In typical implementations every instruction reserves a bitfield for the predicate specifying under what conditions that instruction should have an effect When available memory is limited as on embedded devices this space cost can be prohibitive However some architectures such as Thumb 2 are able to avoid this issue see below Other detriments are the following 3 Predication complicates the hardware by adding levels of logic to critical paths and potentially degrades clock speed A predicated block includes cycles for all operations so shorter paths may take longer and be penalized Predication is not usually speculated and causes a longer dependency chain For ordered data this translates to a performance loss compared to a predictable branch 4 Predication is most effective when paths are balanced or when the longest path is the most frequently executed 3 but determining such a path is very difficult at compile time even in the presence of profiling information History editPredicated instructions were popular in European computer designs of the 1950s including the Mailufterl 1955 the Zuse Z22 1955 the ZEBRA 1958 and the Electrologica X1 1958 The IBM ACS 1 design of 1967 allocated a skip bit in its instruction formats and the CDC Flexible Processor in 1976 allocated three conditional execution bits in its microinstruction formats Hewlett Packard s PA RISC architecture 1986 had a feature called nullification which allowed most instructions to be predicated by the previous instruction IBM s POWER architecture 1990 featured conditional move instructions POWER s successor PowerPC 1993 dropped these instructions Digital Equipment Corporation s Alpha architecture 1992 also featured conditional move instructions MIPS gained conditional move instructions in 1994 with the MIPS IV version and SPARC was extended in Version 9 1994 with conditional move instructions for both integer and floating point registers In the Hewlett Packard Intel IA 64 architecture most instructions are predicated The predicates are stored in 64 special purpose predicate registers and one of the predicate registers is always true so that unpredicated instructions are simply instructions predicated with the value true The use of predication is essential in IA 64 s implementation of software pipelining because it avoids the need for writing separated code for prologs and epilogs clarification needed In the x86 architecture a family of conditional move instructions CMOV and FCMOV were added to the architecture by the Intel Pentium Pro 1995 processor The CMOV instructions copied the contents of the source register to the destination register depending on a predicate supplied by the value of the flag register In the ARM architecture the original 32 bit instruction set provides a feature called conditional execution that allows most instructions to be predicated by one of 13 predicates that are based on some combination of the four condition codes set by the previous instruction ARM s Thumb instruction set 1994 dropped conditional execution to reduce the size of instructions so they could fit in 16 bits but its successor Thumb 2 2003 overcame this problem by using a special instruction which has no effect other than to supply predicates for the following four instructions The 64 bit instruction set introduced in ARMv8 A 2011 replaced conditional execution with conditional selection instructions SIMD SIMT and vector predication editSome SIMD instruction sets like AVX2 have the ability to use a logical mask to conditionally load store values to memory a parallel form of the conditional move and may also apply individual mask bits to individual arithmetic units executing a parallel operation The technique is known in Flynn s taxonomy as associative processing This form of predication is also used in vector processors and single instruction multiple threads GPU computing All the techniques advantages and disadvantages of single scalar predication apply just as well to the parallel processing case See also editBranch predictor Control flow Delay slot Instruction level parallelism Optimizing compiler Pipeline stall Software pipelining Speculative execution Vector processor Very long instruction wordReferences edit a b c d Rick Vinyard 2000 04 26 Predication cs nmsu edu Archived from the original on 20 Apr 2015 Retrieved 2014 04 22 Mahlke Scott A Hank Richard E McCormick James E August David I Hwn Wen mei W 1995 A Comparison of Full and Partial Predicated Execution Support for ILP Processors The 22nd International Symposium on Computer Architecture 22 24 June 1995 CiteSeerX 10 1 1 19 3187 doi 10 1145 223982 225965 ISBN 0 89791 698 0 a b Fisher Joseph A Faraboschi Paolo Young Cliff 2004 4 5 2 Predication Predication in the Embedded Domain Embedded Computing A VLIW Approach to Architecture Compilers and Tools Elsevier p 172 ISBN 9780080477541 Cordes Peter assembly How does Out of Order execution work with conditional instructions Ex CMOVcc in Intel or ADDNE Add not equal in ARM Stack Overflow Unlike with control dependencies branches they don t predict or speculate what the flags will be so a cmovcc instead of a jcc can create a loop carried dependency chain and end up being worse than a predictable branch gcc optimization flag O3 makes code slower than O2 is an example of that a href Template Cite web html title Template Cite web cite web a External link in code class cs1 code quote code help Further reading editClements Alan 2013 8 3 7 Predication Computer Organization amp Architecture Themes and Variations Cengage Learning pp 532 9 ISBN 978 1 285 41542 0 Retrieved from https en wikipedia org w index php title Predication computer architecture amp oldid 1200765675 History, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.