Replay system

The replay system is a subsystem within the Intel Pentium 4 processor.[1] Its primary function is to catch operations that have been mistakenly sent for execution by the processor's scheduler. Operations caught by the replay system are then re-executed in a loop until the conditions necessary for their proper execution have been fulfilled.[2]

Overview

The replay system came about as a result of Intel's pursuit of ever-higher clock speeds. Those clock speeds required very long pipelines (up to 31 stages in the Prescott core), and as a consequence there are six stages between the scheduler and the execution units in Prescott. With that distance, the scheduler cannot wait to see whether an operation has executed successfully before dispatching the operations that depend on it. To maintain acceptable performance, Intel engineers therefore designed the scheduler to be very optimistic.[2]

The scheduler in a Pentium 4 processor is so aggressive that it sends operations for execution without a guarantee that they can execute successfully. (Among other things, it assumes that all data is present in the level 1 data cache.) The most common reason execution fails is that the requisite data is not available, most likely because of a cache miss. When this happens, the replay system signals the scheduler to stop, then repeatedly re-executes the failed chain of dependent operations until they complete successfully.[2][3]
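
The replay loop can be illustrated with a toy model. The sketch below is not a description of Intel's actual circuitry: the function name `simulate_replay`, the cycle counts, and the fixed loop length are all assumptions chosen for illustration. It counts how many times one dependent operation occupies an execution unit when it is dispatched on the assumption of a cache hit but its data arrives later.

```python
from typing import Tuple


def simulate_replay(data_ready_cycle: int,
                    first_execute_cycle: int,
                    replay_loop_length: int) -> Tuple[int, int]:
    """Toy model of one operation circulating in a replay loop.

    All parameters are hypothetical cycle counts, not measured Pentium 4 values.
    Returns (number_of_executions, cycle_of_successful_execution).
    """
    if replay_loop_length <= 0:
        raise ValueError("replay_loop_length must be positive")

    cycle = first_execute_cycle
    executions = 1  # the optimistic first attempt always happens
    # If the operand is not ready when the operation reaches the execution
    # unit, the result is invalid and the operation goes around the loop again.
    while cycle < data_ready_cycle:
        cycle += replay_loop_length
        executions += 1
    return executions, cycle


# Cache hit: the data is ready when the operation first executes.
print(simulate_replay(data_ready_cycle=2, first_execute_cycle=2, replay_loop_length=7))
# Cache miss: the data arrives at cycle 20, so the operation is replayed.
print(simulate_replay(data_ready_cycle=20, first_execute_cycle=2, replay_loop_length=7))
```

With these made-up numbers, the hit case executes once, while the miss case executes four times and only succeeds at cycle 23, three cycles after the data became available. Both effects, the wasted executions and the extra delay, follow from the fixed length of the loop in this model.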

Performance considerations

In some cases the replay system can severely hurt performance. Under normal circumstances, the execution units in the Pentium 4 are in use roughly 33% of the time. When the replay system is invoked, it occupies the execution units in nearly every available cycle. For a single thread this wastes power, an increasingly important architectural design metric, but imposes no performance penalty, because the execution units would otherwise sit idle. With hyper-threading in use, however, the replay system prevents the other thread from using the execution units, and this is the cause of the performance degradation attributed to hyper-threading. In Prescott, the Pentium 4 gained a replay queue, which reduces the time the replay system occupies the execution units.[2]

In other cases, where each thread is processing different types of operations, the replay system does not interfere, and hyper-threading can yield a performance increase. This explains why performance with hyper-threading is application-dependent.[2]
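
The interaction with hyper-threading comes down to simple slot accounting. The following sketch, using hypothetical names and whole-number percentages, estimates how many execution slots remain for a second thread once replays fill some share of the cycles the first thread leaves idle. The 33% utilization figure comes from the text; the replay shares are assumed inputs, not measurements.

```python
def slots_left_for_second_thread(total_cycles: int,
                                 useful_percent: int,
                                 replay_percent_of_idle: int) -> int:
    """Execution slots still free for another thread (toy accounting model).

    useful_percent: share of cycles doing useful work for the first thread.
    replay_percent_of_idle: assumed share of the remaining cycles that
    replayed operations occupy.
    """
    for pct in (useful_percent, replay_percent_of_idle):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")
    useful = total_cycles * useful_percent // 100
    idle = total_cycles - useful
    replayed = idle * replay_percent_of_idle // 100
    return idle - replayed


# No replays: every idle slot is available to the second thread.
print(slots_left_for_second_thread(1000, 33, 0))
# Replays fill every idle slot: nothing is left for the second thread.
print(slots_left_for_second_thread(1000, 33, 100))
```

Out of 1000 cycles, 670 slots are free for the second thread when nothing is replaying, and none are free when replays occupy every idle cycle. This mirrors the degradation described in this section, under the simplifying assumption that any slot can serve either thread.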

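See also

  * Instruction pipeline
  * Speculative execution
  * Out-of-order execution
  * Simultaneous multithreading
  * Data dependency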

References

  1. Carmean, Doug (Spring 2002). "The Intel® Pentium® 4 Processor" (PDF).
  2. "Replay: Unknown Features of the NetBurst Core". X-bit labs. 2005-06-06. Archived from the original on 2014-04-08. Retrieved 2014-04-07.
  3. González, Antonio; Latorre, Fernando; Magklis, Grigorios (2010). Processor Microarchitecture: An Implementation Perspective. Morgan & Claypool Publishers. p. 68. ISBN 978-1-60845-452-5.
