
Bulldozer (microarchitecture)

The AMD Bulldozer Family 15h is a microprocessor microarchitecture for the FX and Opteron lines of processors, developed by AMD for the desktop and server markets.[1][2] Bulldozer is the codename for this family of microarchitectures. It was released on October 12, 2011, as the successor to the K10 microarchitecture.

Bulldozer - Family 15h
General information
Launched: Late 2011
Discontinued: Present
Common manufacturer(s): AMD
Architecture and classification
Technology node: 32 nm
Instruction set: x86-64
Physical specifications
Socket(s): Socket AM3+
Products, models, variants
Core name(s): AMD FX, Opteron
History
Predecessor: Family 10h (K10)
Successor: Piledriver - Family 15h (2nd-gen)

Bulldozer is designed from scratch, not a development of earlier processors.[3] The core is specifically aimed at computing products with TDPs of 10 to 125 watts. AMD claims dramatic performance-per-watt efficiency improvements in high-performance computing (HPC) applications with Bulldozer cores.

The Bulldozer cores support most of the instruction sets implemented by the Intel processors (Sandy Bridge) available at its introduction (including SSE4.1, SSE4.2, AES, CLMUL, and AVX), as well as new instruction sets proposed by AMD: ABM, XOP, FMA4 and F16C.[4][5] Only Bulldozer GEN4 (Excavator) supports the AVX2 instruction set.

Overview

According to AMD, Bulldozer-based CPUs are built on GlobalFoundries' 32 nm silicon-on-insulator (SOI) process technology and reuse DEC's approach to multitasking computer performance, on the argument that it, according to press notes, "balances dedicated and shared computer resources to provide a highly compact, high units count design that is easily replicated on a chip for performance scaling."[6] In other words, by eliminating some of the "redundant" elements that naturally creep into multicore designs, AMD hoped to take better advantage of its hardware capabilities while using less power.

Bulldozer-based implementations built on 32nm SOI with HKMG arrived in October 2011 for both servers and desktops. The server segment included the dual chip (16-core) Opteron processor codenamed Interlagos (for Socket G34) and single chip (4, 6 or 8 cores) Valencia (for Socket C32), while the Zambezi (4, 6 and 8 cores) targeted desktops on Socket AM3+.[7][8]

Bulldozer was the first major redesign of AMD's processor architecture since 2003, when the firm launched its K8 processors. The design features two 128-bit FMA-capable FPUs, which can be combined into one 256-bit FPU, accompanied by two integer clusters, each with 4 pipelines (the fetch/decode stage is shared). Bulldozer also introduced a shared L2 cache. AMD calls this design a "module". A 16-core processor design would feature eight of these modules,[9] but the operating system will recognize each module as two logical cores.
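The relationship between modules and OS-visible logical cores can be sketched in a few lines. This is a minimal illustration, not AMD's or any operating system's actual enumeration: the `LogicalCore` type, the `cpu_id` numbering (siblings placed next to each other) and the function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalCore:
    module: int   # index of the module this core belongs to
    cluster: int  # integer cluster inside the module (0 or 1)
    cpu_id: int   # hypothetical OS-visible logical CPU number


def enumerate_logical_cores(modules: int, clusters_per_module: int = 2) -> list[LogicalCore]:
    """List the logical cores an OS would see for a given module count."""
    cores = []
    for m in range(modules):
        for c in range(clusters_per_module):
            # Assumed numbering: the two clusters of a module get adjacent IDs.
            cores.append(LogicalCore(module=m, cluster=c, cpu_id=m * clusters_per_module + c))
    return cores


def shares_module(a: LogicalCore, b: LogicalCore) -> bool:
    """True when two logical cores share fetch/decode, the FPU and L2."""
    return a.module == b.module


cores = enumerate_logical_cores(modules=8)
print(len(cores))  # eight modules appear as 16 logical cores
```

Under the assumed numbering, `shares_module(cores[0], cores[1])` is true, while `shares_module(cores[1], cores[2])` is false because those two cores sit in different modules.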

The modular architecture consists of a multithreaded shared L2 cache and the FlexFPU, which uses simultaneous multithreading. Each physical integer core, two per module, is single-threaded, in contrast with Intel's Hyper-Threading, where two virtual simultaneous threads share the resources of a single physical core.[10][11]

In a retrospective review, Jeremy Laird of APC magazine commented on Bulldozer's issues, noting that it was slower than the outgoing Phenom II (K10) design and that the PC software ecosystem had not yet "embraced" the multi-threaded model. By his account, these issues caused heavy losses for AMD: the company lost over 1 billion USD in 2012, and some industry observers were predicting bankruptcy by mid-2015. The company later returned to profit. The reasons cited for the recovery were the earlier divestment of in-house manufacturing into GlobalFoundries, the later outsourcing of manufacturing to TSMC, and the new Ryzen CPU design.[12]

Architecture

Bulldozer core

 
Block diagram of a complete Bulldozer module, showing 2 integer clusters
 
Block diagram of a 4 module design with 8 integer clusters
 
Memory topology of a Bulldozer server
 
Bulldozer die shot with highlighted parts

Bulldozer made use of "Clustered Multithreading" (CMT), a technique where some parts of the processor are shared between two threads and some parts are unique to each thread. Prior examples of such an unconventional approach to multithreading can be traced back to Sun Microsystems' UltraSPARC T1 CPU of 2005. In terms of hardware complexity and functionality, a Bulldozer CMT module is equal to a dual-core processor in its integer calculation capabilities, and to either a single-core processor or a handicapped dual-core in terms of floating-point computational power, depending on whether both threads running on the same CMT module are saturated with floating-point instructions, and on whether the FPU is performing 128-bit or 256-bit floating-point operations. The reason is that for each pair of integer cores, that is, within the same module, there is a single floating-point unit consisting of a pair of 128-bit FMAC execution units.
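A rough peak-throughput calculation shows why the shared FPU makes a module look like a "single-core or handicapped dual-core" for floating-point work. The sketch below assumes that a fused multiply-add counts as two floating-point operations per lane and that two FP-saturated threads split the pair of FMAC units evenly; the function names are illustrative, and real throughput depends on the instruction mix.

```python
def fmac_flops_per_cycle(vector_bits: int = 128, element_bits: int = 64) -> int:
    """Peak FLOPs per cycle of one FMAC pipe (FMA = multiply + add per lane)."""
    lanes = vector_bits // element_bits
    return lanes * 2


def module_peak_flops_per_cycle(fmac_units: int = 2, element_bits: int = 64) -> int:
    """Peak FLOPs per cycle of one module with a pair of 128-bit FMAC units."""
    return fmac_units * fmac_flops_per_cycle(128, element_bits)


def per_thread_peak(fp_saturated_threads: int, element_bits: int = 64) -> float:
    """Assumed even split of the module's FP peak between saturated threads."""
    return module_peak_flops_per_cycle(element_bits=element_bits) / fp_saturated_threads


print(module_peak_flops_per_cycle(element_bits=64))  # double precision, whole module
print(module_peak_flops_per_cycle(element_bits=32))  # single precision, whole module
print(per_thread_peak(1), per_thread_peak(2))        # one vs. two FP-heavy threads
```

With one FP-heavy thread the whole module peak is available to it; with two, each thread sees half under the even-split assumption, which is the contention the paragraph above describes.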

CMT is in some ways a simpler design philosophy than SMT, though a similar one: both designs try to use execution units efficiently, and in either method, when two threads compete for the same execution pipelines, one or more of the threads loses performance. Thanks to its dedicated integer cores, a Bulldozer family module performed roughly like a dual-core, dual-threaded processor during sections of code that were either wholly integer or a mix of integer and floating-point calculations; yet, because the shared floating-point pipelines are used in SMT fashion, the module would perform similarly to a single-core, dual-threaded SMT processor (SMT2) for a pair of threads saturated with floating-point instructions. (Both of these comparisons assume that the processor being compared possesses an equally wide and capable execution core, integer-wise and floating-point-wise, respectively.)

Both CMT and SMT are at peak effectiveness while running integer and floating-point code on a pair of threads. CMT stays at peak effectiveness when both threads consist of integer code, while under SMT one or both threads will underperform due to competition for integer execution units. The disadvantage of CMT is a greater number of idle integer execution units in the single-threaded case: a single thread can use at most half of the integer execution units in its module, while SMT imposes no such limit. A large SMT core with integer circuitry as wide and fast as two CMT cores could in theory momentarily deliver up to twice the integer performance in the single-threaded case. (More realistically, for general code as a whole, Pollack's Rule estimates a speedup factor of √2, or approximately a 40% increase in performance.)
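The Pollack's Rule estimate is a one-line calculation: performance is taken to grow with the square root of the resources (complexity) devoted to a core. The sketch below applies it to a core with twice the integer circuitry; `pollack_speedup` is an illustrative name, and the rule itself is an empirical rule of thumb rather than a guarantee.

```python
import math


def pollack_speedup(resource_ratio: float) -> float:
    """Pollack's Rule: performance scales with the square root of complexity."""
    return math.sqrt(resource_ratio)


speedup = pollack_speedup(2.0)          # one core with twice the integer resources
increase_percent = (speedup - 1.0) * 100
print(f"{speedup:.3f}x, about {increase_percent:.0f}% faster")
```

Doubling the resources therefore yields a factor of roughly 1.41 rather than 2, which is where the "approximately 40%" figure comes from.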

CMT processors and a typical SMT processor are similar in their efficient shared use of the L2 cache between a pair of threads.

  • A module consists of a coupling of two "conventional" x86 out-of-order processing cores. Each processing core shares the early pipeline stages (e.g. L1i, fetch, decode), the FPUs, and the L2 cache with the rest of the module.
    • Each module has the following independent hardware resources:[13][14]
    • 16 KB of 4-way L1d (way-predicted) per core and 64 KB of 2-way L1i per module, one way for each of the two cores[15][16][17]
    • 2 MB of L2 cache per module (shared between the two integer cores)
    • The Write Coalescing Cache (WCC)[18] is a special cache that is part of the L2 cache in the Bulldozer microarchitecture. Stores from both L1d caches in the module go through the WCC, where they are buffered and coalesced. The WCC's task is to reduce the number of writes to the L2 cache.
    • Two dedicated integer cores
      • each one includes two ALUs and two AGUs, which together are capable of four independent arithmetic and memory operations per clock per core
      • duplicating the integer schedulers and execution pipelines gives each of the two threads dedicated hardware, which doubles performance for multi-threaded integer loads
      • the second integer core increases the Bulldozer module's die area by around 12%, which at chip level adds about 5% of total die space[19]
    • Two symmetrical 128-bit FMAC (fused multiply–add capability) floating-point pipelines per module, which can be unified into one large 256-bit-wide unit if one of the integer cores dispatches an AVX instruction, and two symmetrical x87/MMX/SSE-capable FPPs for backward compatibility with software not optimized for SSE2. Each FMAC unit is also capable of division and square root operations with variable latency.
  • All modules present share the L3 cache as well as an Advanced Dual-Channel Memory Sub-System (IMC – Integrated Memory Controller).
  • A module has 213 million transistors in an area of 30.9 mm² (including the 2 MB shared L2 cache) on an Orochi die.[20]
  • The pipeline depth of Bulldozer (as well as Piledriver and Steamroller) is 20 cycles, compared to 12 cycles of the K10 core predecessor.[21]
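The idea behind the Write Coalescing Cache in the list above can be illustrated with a toy model: stores that fall in the same cache line are merged in a small buffer, so the L2 sees one write per line instead of one per store. This is a conceptual sketch only, not AMD's design; the 64-byte line size, the four-line capacity, the oldest-first eviction and the class name are all assumptions made for the example.

```python
LINE_BYTES = 64  # assumed cache-line size


class WriteCoalescingBuffer:
    """Toy buffer that merges stores to the same line before writing to L2."""

    def __init__(self, capacity_lines: int = 4):
        self.capacity = capacity_lines
        self.lines = {}      # line number -> {offset: value}; dicts keep insertion order
        self.l2_writes = 0   # how many line-sized writes reached the (imaginary) L2

    def store(self, addr: int, value: int) -> None:
        line = addr // LINE_BYTES
        if line not in self.lines:
            if len(self.lines) >= self.capacity:
                self._evict_oldest()
            self.lines[line] = {}
        # Stores to an already-buffered line are merged instead of written out.
        self.lines[line][addr % LINE_BYTES] = value

    def _evict_oldest(self) -> None:
        oldest = next(iter(self.lines))
        del self.lines[oldest]
        self.l2_writes += 1

    def flush(self) -> None:
        self.l2_writes += len(self.lines)
        self.lines.clear()


wcc = WriteCoalescingBuffer()
for addr in range(128):      # 128 single-byte stores covering two lines
    wcc.store(addr, 0xFF)
wcc.flush()
print(wcc.l2_writes)         # number of L2 writes after coalescing
```

In this run the 128 stores collapse into two line writes; access patterns that keep bouncing between more lines than the buffer holds coalesce far less.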

The longer pipeline allowed the Bulldozer family of processors to achieve a much higher clock frequency compared to its K10 predecessors. While this increased frequencies and throughput, the longer pipeline also increased latencies and increased branch misprediction penalties.
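The cost of a deeper pipeline can be estimated with a textbook first-order model in which every mispredicted branch adds a fixed number of stall cycles. The sketch below assumes the penalty equals the pipeline depth (20 versus 12 cycles), which is a simplification, and the branch fraction and misprediction rate are made-up illustrative values rather than measurements.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Cycles per instruction including branch-misprediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Illustrative workload: 20% branches, 5% of them mispredicted, ideal CPI of 1.0.
k10_like = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=12)
bulldozer_like = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=20)
print(f"12-stage: {k10_like:.2f} CPI, 20-stage: {bulldozer_like:.2f} CPI")
```

Under these assumptions the deeper pipeline needs a correspondingly higher clock frequency just to break even on branch-heavy code, which is the trade-off the paragraph above describes.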

  • The width of the Bulldozer integer core, four (2 ALU, 2 AGU), is somewhat less than the width of the K10 core, six (3 ALU, 3 AGU). Bobcat and Jaguar also used a four wide integer core, yet with lighter execution units: 1 ALU, 1 simple ALU, 1 load AGU, 1 store AGU.[22]

The issue widths (and peak instruction executions per cycle) of a Jaguar, K10, and Bulldozer core are 2, 3, and 4 respectively. This made Bulldozer a more superscalar design compared to Jaguar/Bobcat. However, due to K10's somewhat wider core (in addition to the lack of refinements and optimizations in a first generation design) the Bulldozer architecture typically performed with somewhat lower IPC compared to its K10 predecessors. It was not until the refinements made in Piledriver and Steamroller, that the IPC of the Bulldozer family distinctly began to exceed that of K10 processors such as Phenom II.

Branch predictor

  • Two-level Branch Target Buffer(BTB)[23]
  • Hybrid predictor for conditionals
  • Indirect predictor

Instruction set extensions

  • Support for Intel's Advanced Vector Extensions (AVX) instruction set, which supports 256-Bit floating point operations, and SSE4.1, SSE4.2, AES, CLMUL, as well as future 128-bit instruction sets proposed by AMD (XOP, FMA4, and F16C),[24] which have the same functionality as the SSE5 instruction set formerly proposed by AMD, but with compatibility to the AVX coding scheme.
  • Bulldozer GEN4 (Excavator) supports AVX2 instruction sets.
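On Linux, the extensions listed above appear as flags in `/proc/cpuinfo`, so software can check for them before choosing a code path. The sketch below parses a cpuinfo-style text and reports which extensions are present; the flag spellings follow the Linux kernel's names, while the sample text is a made-up, abbreviated line and not a dump from a real processor.

```python
# Extension name -> flag name as spelled in Linux /proc/cpuinfo.
EXTENSION_FLAGS = {
    "SSE4.1": "sse4_1", "SSE4.2": "sse4_2", "AES": "aes", "CLMUL": "pclmulqdq",
    "AVX": "avx", "ABM": "abm", "XOP": "xop", "FMA4": "fma4", "F16C": "f16c",
    "AVX2": "avx2",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the flag set from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def supported_extensions(cpuinfo_text: str) -> dict:
    """Map each extension name to True/False depending on the flags present."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in EXTENSION_FLAGS.items()}


sample = "processor\t: 0\nflags\t\t: fpu avx xop fma4\n"  # hypothetical input
report = supported_extensions(sample)
print(report["XOP"], report["AVX2"])
```

On a real system the same functions could be fed the contents of `/proc/cpuinfo`, for example via `open("/proc/cpuinfo").read()`.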

Process technology and clock frequency

  • 11-metal-layer 32 nm SOI process implementing GlobalFoundries' first-generation High-K Metal Gate (HKMG)
  • Turbo Core 2 performance boost, which increases clock frequency by up to 500 MHz with all threads active (for most workloads) and by up to 1 GHz with half of the threads active, within the TDP limit.[25]
  • The chip operates at 0.775 to 1.425 V, achieving clock frequencies of 3.6 GHz or more[20]
  • Min-Max TDP: 25 – 140 watts
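The Turbo Core figures in the list above are upper bounds on the boost, so the highest frequency a part could reach follows directly from its base clock. The helper below is a hypothetical illustration of that arithmetic only; it does not model the TDP-limited behaviour of the real mechanism, and actual models are specified below these ceilings.

```python
def turbo_ceiling_mhz(base_mhz: int, half_load: bool) -> int:
    """Upper bound on the boosted clock implied by the Turbo Core 2 limits."""
    boost_mhz = 1000 if half_load else 500  # up to 1 GHz at half load, else 500 MHz
    return base_mhz + boost_mhz


print(turbo_ceiling_mhz(3600, half_load=False))  # all threads active
print(turbo_ceiling_mhz(3600, half_load=True))   # half of the threads active
```

The FX-8150 in the desktop table, for instance, has a 3.6 GHz base clock and is listed with turbo clocks of 3.9 and 4.2 GHz, both under the ceilings computed here.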

Cache and memory interface

  • Up to 8 MB of L3 shared among all cores on the same silicon die (8 MB for 4 cores in Desktop segment and 16 MB for 8 cores in the Server segment), divided into four subcaches of 2 MB each, capable of operating at 2.2 GHz at 1.1125 V[20]
  • Native DDR3 memory support up to DDR3-1866[26]
  • Dual Channel DDR3 integrated memory controller for Desktop and Server/Workstation Opteron 42xx "Valencia";[27] Quad Channel DDR3 Integrated Memory Controller[28] for Server/Workstation Opteron 62xx "Interlagos"
  • AMD claims support for two DIMMs of DDR3-1600 per channel. Two DIMMs of DDR3-1866 on a single channel will be down-clocked to 1600.
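The memory figures above translate into peak bandwidth with simple arithmetic. The sketch below assumes the standard 64-bit (8-byte) DDR3 channel and nominal transfer rates, and encodes the down-clocking rule from the last bullet; the function names are illustrative.

```python
def effective_ddr3_rate(rated_mt_s: int, dimms_per_channel: int) -> int:
    """Two DIMMs per channel rated above DDR3-1600 are down-clocked to 1600."""
    if dimms_per_channel >= 2 and rated_mt_s > 1600:
        return 1600
    return rated_mt_s


def peak_bandwidth_gb_s(rate_mt_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: transfers/s x bytes per transfer x channels."""
    return rate_mt_s * bus_bytes * channels / 1000


one_dimm = peak_bandwidth_gb_s(effective_ddr3_rate(1866, 1), channels=2)
two_dimms = peak_bandwidth_gb_s(effective_ddr3_rate(1866, 2), channels=2)
quad = peak_bandwidth_gb_s(effective_ddr3_rate(1600, 1), channels=4)
print(one_dimm, two_dimms, quad)
```

A dual-channel desktop part therefore loses some theoretical bandwidth when both slots of a channel are populated with DDR3-1866, while a quad-channel server part at DDR3-1600 has twice the dual-channel DDR3-1600 peak.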

I/O and socket interface

  • HyperTransport Technology rev. 3.1 (3.20 GHz, 6.4 GT/s, 25.6 GB/s & 16-bit wide link) [first implemented into HY-D1 revision "Magny-Cours" on the socket G34 Opteron platform in March 2010 and "Lisbon" on the socket C32 Opteron platform in June 2010]
  • Socket AM3+ (AM3r2)
    • 942-pin, DDR3 support only
    • Will retain backward compatibility with Socket AM3 motherboards (at the motherboard manufacturer's discretion and if BIOS updates are provided[29][30]); however, this is not officially supported by AMD. AM3+ motherboards will be backward-compatible with AM3 processors.[31]
  • For the server segment, the existing socket G34 (LGA1974) and socket C32 (LGA1207) will be used.
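The HyperTransport figures quoted in the list above are mutually consistent, which can be checked with a short calculation. The sketch assumes a double-data-rate link (two transfers per clock) and that the 25.6 GB/s figure is the aggregate of both directions; the function name is illustrative.

```python
def ht_link_bandwidth(clock_ghz: float, width_bits: int, directions: int = 2):
    """Return (GT/s, GB/s per direction, aggregate GB/s) for one link."""
    transfers_gt_s = clock_ghz * 2                  # double data rate
    per_direction = transfers_gt_s * width_bits / 8  # bits -> bytes
    return transfers_gt_s, per_direction, per_direction * directions


gt_s, one_way, aggregate = ht_link_bandwidth(3.2, 16)
print(gt_s, one_way, aggregate)
```

A 3.2 GHz clock gives 6.4 GT/s, a 16-bit link moves 2 bytes per transfer for 12.8 GB/s in each direction, and both directions together give the quoted 25.6 GB/s.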

Features

CPU features table

Processors

 
Chipset and I/Os for 1st CMT generation
 
AMD FX 8350 Processor
 
AMD Opteron 6282 processor

The first revenue shipments of Bulldozer-based Opteron processors were announced on September 7, 2011.[32] The FX-4100, FX-6100, FX-8120 and FX-8150 were released in October 2011, with the remaining FX series processors released at the end of the first quarter of 2012.

Desktop

Model    Modules/  Cores/   Freq.  Max. turbo (GHz)      L2 cache  L3    TDP
         FPUs      threads  (GHz)  Full load  Half load            (MB)  (W)
FX-8100  4         8        2.8    3.1        3.7        4 × 2 MB  8     95
FX-8120  4         8        3.1    3.4        4.0        4 × 2 MB  8     125
FX-8140  4         8        3.2    3.6        4.1        4 × 2 MB  8     95
FX-8150  4         8        3.6    3.9        4.2        4 × 2 MB  8     125
FX-8170  4         8        3.9    4.2        4.5        4 × 2 MB  8     125
FX-6100  3         6        3.3    3.6        3.9        3 × 2 MB  8     95
FX-6120  3         6        3.6    3.9        4.2        3 × 2 MB  8     95
FX-6130  3         6        3.6    3.8        3.9        3 × 2 MB  8     95
FX-6200  3         6        3.8    4.0        4.1        3 × 2 MB  8     125
FX-4100  2         4        3.6    3.7        3.8        2 × 2 MB  8     95
FX-4120  2         4        3.9    4.0        4.1        2 × 2 MB  8     95
FX-4130  2         4        3.8    3.9        4.0        2 × 2 MB  4     125
FX-4150  2         4        3.8                          2 × 2 MB  8     95/125
FX-4170  2         4        4.2    4.3                   2 × 2 MB  8     125

All listed models support DDR3-1866 memory and Turbo Core 2.0, and use Socket AM3+.

Major Sources: CPU-World[33] and Xbit-Labs[34]

Server

There are two series of Bulldozer-based processors for servers: Opteron 4200 series (Socket C32, code named Valencia, with up to four modules) and Opteron 6200 series (Socket G34, code named Interlagos, with up to 8 modules).[35][36]

False advertising lawsuit

In November 2015, AMD was sued under the California Consumers Legal Remedies Act and Unfair Competition Law for allegedly misrepresenting the specifications of Bulldozer chips. The class-action lawsuit, filed on 26 October in the US District Court for the Northern District of California, claims that each Bulldozer module is in fact a single CPU core with a few dual-core traits, rather than a true dual-core design.[37] In August 2019, AMD agreed to settle the suit for $12.1M.[38][39]

Performance

Performance on Linux

On 24 October 2011, the first generation tests done by Phoronix confirmed that the performance of Bulldozer CPU was somewhat less than expected.[40] In several tests, the CPU performed similarly to the older generation Phenom 1060T.

The performance later substantially increased, as various compiler optimizations and CPU driver fixes were released.[41][42]

Performance on Windows

The first Bulldozer CPUs were met with a mixed response. It was discovered that the FX-8150 performed poorly in benchmarks that were not highly threaded, falling behind the second-generation Intel Core i* series processors and being matched or even outperformed by AMD's own Phenom II X6 at lower clock speeds. In highly threaded benchmarks, the FX-8150 performed on par with the Phenom II X6, and the Intel Core i7 2600K, depending on the benchmark. Given the overall more consistent performance of the Intel Core i5 2500K at a lower price, these results left many reviewers underwhelmed. The processor was found to be extremely power-hungry under load, especially when overclocked, compared to Intel's Sandy Bridge.[43][44]

On 13 October 2011, AMD stated on its blog that "there are some in our community who feel the product performance did not meet their expectations", but showed benchmarks on actual applications where it outperformed the Sandy Bridge i7 2600k and AMD X6 1100T.[45]

In January 2012, Microsoft released two hotfixes for Windows 7 and Server 2008 R2 that marginally improve the performance of Bulldozer CPUs by addressing the thread scheduling concerns raised after the release of Bulldozer.[46][47][48]
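The source does not describe how the hotfixes work internally. As a generic illustration of the scheduling concern, a module-aware policy would place threads on separate modules first, so that no two threads share a module's front end and FPU until every module already has one thread. The function below is a hypothetical sketch of such a policy, not Microsoft's implementation.

```python
def place_threads(n_threads: int, n_modules: int, cores_per_module: int = 2):
    """Assign threads to (module, core) slots, filling distinct modules first."""
    if n_threads > n_modules * cores_per_module:
        raise ValueError("more threads than logical cores")
    placement = []
    for t in range(n_threads):
        module = t % n_modules   # spread across modules before doubling up
        core = t // n_modules    # second core of a module is used only later
        placement.append((module, core))
    return placement


print(place_threads(4, n_modules=4))  # one thread per module
print(place_threads(5, n_modules=4))  # fifth thread must share module 0
```

With four threads on a four-module part every thread gets a module to itself; only the fifth thread has to share a module with another thread.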

On 6 March 2012, AMD posted a knowledge base article stating that there was a compatibility problem with FX processors, and certain games on the widely used digital game distribution platform, Steam. AMD stated that they had provided a BIOS update to several motherboard manufacturers (namely: Asus, Gigabyte Technology, MSI, and ASRock) that would fix the problem.[49]

In September 2014, AMD CEO Rory Read conceded the Bulldozer design had not been a "game-changing part", and that AMD had to live with the design for four years.[50]

Overclocking

On 31 August 2011, AMD and a group of well-known overclockers including Brian McLachlan, Sami Mäkinen, Aaron Schradin, and Simon Solotko set a new world record for CPU frequency using an overclocked, then-unreleased FX-8150 Bulldozer processor. Before that day, the record stood at 8.309 GHz, but the Bulldozer, combined with liquid helium cooling, reached a new high of 8.429 GHz. The record was later surpassed by Andre Yang, who reached 8.58 GHz using liquid nitrogen.[51][52] On August 22, 2014, using an FX-8370 (Piledriver), The Stilt from Team Finland achieved a maximum CPU frequency of 8.722 GHz.[53]

The CPU clock frequency records set by overclocked Bulldozer CPUs were only broken almost a decade later by overclocks of Intel's 13th generation Core Raptor Lake CPUs in October 2022.[54]

Revisions

Piledriver is the AMD codename for its improved second-generation microarchitecture based on Bulldozer. AMD Piledriver cores are found in Socket FM2 Trinity and Richland based series of APUs and CPUs and the Socket AM3+ Vishera based FX-series of CPUs. Piledriver was the last generation in the Bulldozer family to be available for socket AM3+ and to be available with an L3 cache. The Piledriver processors available for FM2 (and its mobile variant) sockets did not come with a L3 cache, as the L2 cache is the last-level cache for all FM2/FM2+ processors.

Steamroller is the AMD codename for its third-generation microarchitecture based on an improved version of Piledriver. Steamroller cores are found in the Socket FM2+ Kaveri based series of APUs and CPUs.

Excavator is the codename for the fourth-generation Bulldozer core.[55] Excavator was implemented as "Carrizo" A-series APUs, "Bristol Ridge" A-series APUs, and Athlon X4 CPUs.[56]

See also

References

  1. ^ "FX Processors". AMD. 24 February 2016. Retrieved 24 February 2016.
  2. ^ "AMD ships 16 core bulldozer powered Opteron 6200". Engadget. 14 November 2011. Retrieved 24 February 2016.
  3. ^ Bulldozer 50% Faster than Core i7 and Phenom II, techPowerUp, retrieved 2012-01-23
  4. ^ AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP, and FMA4 Instructions (PDF), AMD, May 1, 2009, retrieved 2009-05-08
  5. ^ , Dave Christie, AMD Developer blogs, 7 May 2009, archived from the original on 2012-04-02, retrieved 2009-05-08
  6. ^ AMD Sets New Mark in x86 Innovation with First Detailed Disclosures of Two New Core Designs, AMD, August 24, 2011, p. 1, retrieved September 18, 2011
  7. ^ Analyst Day 2009 Summary, AMD, November 11, 2009, retrieved 2009-11-14
  8. ^ AMD bestätigt: "Zambezi" ist inkompatibel zum Sockel AM3, Planet3dnow.de, retrieved 2012-01-23
  9. ^ Analyst Day 2009 Presentations, AMD, November 11, 2009, retrieved 2009-11-14
  10. ^ . Archived from the original on 2013-10-17. Retrieved 2013-07-22.
  11. ^ "AMD unveils Flex FP - bit-tech.net". bit-tech.net.
  12. ^ Laird, Jeremy (August 2022). "Ryzen again: The resurrection of AMD". APC. No. 509. Future Publishing. pp. 56–57. ISSN 0725-4415.
  13. ^ Bulldozer microarchitecture block, AnandTech, August 24, 2010
  14. ^ , AMD, August 24, 2010, archived from the original on October 1, 2012, retrieved August 25, 2010
  15. ^ More On Bulldozer, Tomshardware.com, 2010-08-24, retrieved 2012-01-23
  16. ^ AMD Reveals Details About Bulldozer Microprocessors, , Xbitlabs.com, archived from the original on 2011-09-03, retrieved 2012-01-23
  17. ^ Real World Technologies (2010-08-26), AMD's Bulldozer Microarchitecture, Realworldtech.com, retrieved 2012-01-23
  18. ^ David Kanter (August 26, 2010). "AMD's Bulldozer Microarchitecture Memory Subsystem Continued". Real World Technologies.
  19. ^ Bulldozer design power efficiency, AMD, August 24, 2010
  20. ^ a b c (PDF), archived from the original (PDF) on 2012-01-20, retrieved 2012-01-23
  21. ^ Johan De Gelas, The Bulldozer Aftermath: Delving Even Deeper
  22. ^ Anand Lal Shimpi, AMD's Jaguar Architecture: The CPU Powering Xbox One, PlayStation 4, Kabini & Temash
  23. ^ https://www.olcf.ornl.gov/wp-content/uploads/2012/01/TitanWorkshop2012_Day1_AMD.pdf
  24. ^ XOP and FMA4 Instruction set in SSE5, Techreport.com, 2009-05-06, retrieved 2012-01-23
  25. ^ , Ir.amd.com, 2010-11-09, archived from the original on 2013-11-12, retrieved 2012-01-23
  26. ^ AMD Roadmap, retrieved 2012-01-23
  27. ^ AMD (2012-05-14), AMD Opteron 4200 Series Processor Quick Reference Guide (PDF), www.amd.com, retrieved 2012-08-15
  28. ^ AMD (2012-05-14), AMD Opteron 6200 Series Processor Quick Reference Guide (PDF), www.amd.com, retrieved 2012-08-15
  29. ^ ASUS confirms AM3+ compatibility on AM3 boards, Event.asus.com, retrieved 2012-01-23
  30. ^ MSI confirms AM3+ compatibility on AM3 boards, Event.msi.com, April 2011, retrieved 2012-01-23
  31. ^ AM3 processors will work in the AM3+ socket, but Bulldozer chips will not work in non-AM3+ motherboards December 10, 2010, at the Wayback Machine
  32. ^ AMD Ships First "Bulldozer" Processors
  33. ^ AMD FX-Series processor families, Cpu-world.com, 2012-10-02, retrieved 2012-10-21
  34. ^ Shilov, Anton (2012-09-21). . X-bit laboratories. X-bit labs. Archived from the original on 2012-09-24. Retrieved 2012-09-23.
  35. ^ , 2010-08-02, archived from the original on August 6, 2010
  36. ^ AMD Opteron 6200 series microprocessor family, cpu-world.com
  37. ^ "AMD sued over allegedly misleading Bulldozer core count". Ars Technica. Retrieved 8 November 2015.
  38. ^ "AMD Bulldozer 'Core' Lawsuit: AMD Settles for $12.1m, Payouts for Some". AnandTech. Retrieved 19 January 2021.
  39. ^ . Archived from the original on 19 October 2019. Retrieved 19 January 2021.
  40. ^ AMD FX-8150 Bulldozer On Ubuntu Linux, phoronix.com, 2011-10-24, retrieved 2012-12-13
  41. ^ AMD Bulldozer Cache Aliasing Issue Fix, phoronix.com
  42. ^ AMD's FX-8150 Bulldozer Benefits From New Compilers, Tuning, phoronix.com
  43. ^ , X-bit labs, 2011-10-11, p. 13, archived from the original on 2012-01-13, retrieved 2012-01-23
  44. ^ , X-bit labs, 2011-10-11, p. 14, archived from the original on 2012-01-16, retrieved 2012-01-23
  45. ^ , 'akozak' on behalf of AMD Blogs, 2011-10-13, archived from the original on 15 October 2011, retrieved 23 January 2012
  46. ^ An update is available for computers that have an AMD FX, AMD Opteron 4200, AMD Opteron 6200, or AMD Bulldozer series processor installed and that are running Windows 7 or Windows Server 2008 R2, support.microsoft.com, January 2012, retrieved 2014-02-11
  47. ^ An update that selectively disables the Core Parking feature in Windows 7 or in Windows Server 2008 R2 is available, support.microsoft.com, January 2012, retrieved 2014-02-11
  48. ^ "AMD's FX-8150 After Two Windows 7 Hotfixes And UEFI Updates". tomshardware.com. 24 January 2012.
  49. ^ STEAM Games on AMD FX platforms, support.amd.com, 2012-06-12, retrieved 2012-10-11
  50. ^ "AMD: next-generation microarchitecture will make up for muted Bulldozer reception". pcgamer.com.
  51. ^ , geek.com, 2011-11-01, archived from the original on 2012-04-28, retrieved 2012-10-16
  52. ^ "AMD Bulldozer Speed Record Broken Again at 8.58GHz". tomshardware.com. 5 November 2011.
  53. ^ Samuel D. "CPU-Z Validator 4.0". Retrieved 23 September 2014.
  54. ^ "Intel Core i9 13900K @ 8812.85 MHz - CPU-Z VALIDATOR". valid.x86.fr. Retrieved 2022-10-23.
  55. ^ The Bulldozer Review: AMD FX-8150 Tested, AnandTech, 2011-10-12, retrieved 2012-01-23
  56. ^ Cutress, Ian (2016-02-02). "AMD launches excavator on desktop: the 65w athlon x4 845 for $70". anandtech. Retrieved 2017-03-28.

External links

  • www.amd.com/en-us/products/processors/desktop/fx

bulldozer, microarchitecture, bulldozer, family, microprocessor, microarchitecture, opteron, line, processors, developed, desktop, server, markets, bulldozer, codename, this, family, microarchitectures, released, october, 2011, successor, microarchitecture, bu. The AMD Bulldozer Family 15h is a microprocessor microarchitecture for the FX and Opteron line of processors developed by AMD for the desktop and server markets 1 2 Bulldozer is the codename for this family of microarchitectures It was released on October 12 2011 as the successor to the K10 microarchitecture Bulldozer Family 15hGeneral informationLaunchedLate 2011DiscontinuedPresentCommon manufacturer s AMDArchitecture and classificationTechnology node32 nmInstruction setx86 64Physical specificationsSocket s Socket AM3 Products models variantsCore name s AMD FXOpteronHistoryPredecessorFamily 10h K10 SuccessorPiledriver Family 15h 2nd gen Bulldozer is designed from scratch not a development of earlier processors 3 The core is specifically aimed at computing products with TDPs of 10 to 125 watts AMD claims dramatic performance per watt efficiency improvements in high performance computing HPC applications with Bulldozer cores The Bulldozer cores support most of the instruction sets implemented by Intel processors Sandy Bridge available at its introduction including SSE4 1 SSE4 2 AES CLMUL and AVX as well as new instruction sets proposed by AMD ABM XOP FMA4 and F16C 4 5 Only Bulldozer GEN4 Excavator supports AVX2 instruction sets Contents 1 Overview 2 Architecture 2 1 Bulldozer core 2 2 Branch predictor 2 3 Instruction set extensions 2 4 Process technology and clock frequency 2 5 Cache and memory interface 2 6 I O and socket interface 3 Features 4 Processors 4 1 Desktop 4 2 Server 5 False advertising lawsuit 6 Performance 6 1 Performance on Linux 6 2 Performance on Windows 6 3 Overclocking 7 Revisions 8 See also 9 References 10 External linksOverview EditAccording to AMD Bulldozer based CPUs are based on 
GlobalFoundries 32 nm Silicon on insulator SOI process technology and reuses the approach of DEC for multitasking computer performance with the arguments that it according to press notes balances dedicated and shared computer resources to provide a highly compact high units count design that is easily replicated on a chip for performance scaling 6 In other words by eliminating some of the redundant elements that naturally creep into multicore designs AMD has hoped to take better advantage of its hardware capabilities while using less power Bulldozer based implementations built on 32nm SOI with HKMG arrived in October 2011 for both servers and desktops The server segment included the dual chip 16 core Opteron processor codenamed Interlagos for Socket G34 and single chip 4 6 or 8 cores Valencia for Socket C32 while the Zambezi 4 6 and 8 cores targeted desktops on Socket AM3 7 8 Bulldozer is the first major redesign of AMD s processor architecture since 2003 when the firm launched its K8 processors and also features two 128 bit FMA capable FPUs which can be combined into one 256 bit FPU This design is accompanied by two integer clusters each with 4 pipelines the fetch decode stage is shared Bulldozer also introduced shared L2 cache in the new architecture AMD calls this design a Module A 16 core processor design would feature eight of these modules 9 but the operating system will recognize each module as two logical cores The modular architecture consists of multithreaded shared L2 cache and FlexFPU which uses simultaneous multithreading Each physical integer core two per module is single threaded in contrast with Intel s Hyperthreading where two virtual simultaneous threads share the resources of a single physical core 10 11 In a retrospective review Jeremy Laird of APC magazine commented on Bulldozer issues noted that it was slower than outgoing Phenom II K10 design and that the PC software ecosystem had not yet embraced the multi threaded model By his observation 
issues caused a big loss for AMD that the company lost over 1 billion USD in 2012 and that some industry observers were predicting the bankruptcy by mid 2015 Company later managed to return to profit Mentioned reasons for regaining the profitability were the earlier divesting of in house manufacturing into GlobalFoundries and then outsourcing the manufacturing to TSMC and making a new Ryzen CPU design 12 Architecture EditBulldozer core Edit Block diagram of a complete Bulldozer module showing 2 integer clusters Block diagram of a 4 module design with 8 integer clusters Memory topology of a Bulldozer server Bulldozer die shot with highlighted parts Bulldozer made use of Clustered Multithreading CMT a technique where some parts of the processor are shared between two threads and some parts are unique for each thread Prior examples of such an approach to unconventional multithreading can be traced way back to the 2005 Sun Microsystems UltraSPARC T1 CPU In terms of hardware complexity and functionality a Bulldozer CMT module is equal to a dual core processor in its integer calculation capabilities and to either a single core processor or a handicapped dual core in terms of floating point computational power depending on whether the code is saturated in floating point instructions in both threads running on the same CMT module and whether the FPU is performing 128 bit or 256 bit floating point operations The reason for this is that for each two integer cores that is within the same module there is a single floating point unit consisting of a pair of 128 bit FMAC execution units CMT is in some way a simpler but similar design philosophy to SMT both designs try to utilize execution units efficiently in either method when two threads compete for some execution pipelines there is a loss in performance in one or more of the threads Due to dedicated integer cores the Bulldozer family modules performed roughly like a dual core dual threaded processor during sections of code 
that were either wholly integer or a mix of integer and floating point calculations yet due to the SMT use of the shared floating point pipelines the module would perform similarly to a single core dual threaded SMT processor SMT2 for a pair of threads saturated with floating point instructions Both of these last two comparisons make the assumption that the processor possesses an equally wide and capable execution core integer wise and floating point wise respectively Both CMT and SMT are at peak effectiveness while running integer and floating point code on a pair of threads CMT stays at peak effectiveness while working on a pair of threads consisting both of integer code while under SMT one or both threads will underperform due to competition for integer execution units The disadvantage for CMT is a greater number of idle integer execution units in a single threaded case In the single threaded case CMT is limited to use at most half of the integer execution units in its module while SMT imposes no such limit A large SMT core with integer circuitry as wide and fast as two CMT cores could in theory have momentarily up to twice an integer performance in a single thread case More realistically for general code as a whole Pollack s Rule estimates a speedup factor of 2 displaystyle sqrt 2 or approximately 40 increase in performance CMT processors and a typical SMT processor are similar in their efficient shared use of the L2 cache between a pair of threads A module consists of a coupling of two conventional x86 out of order processing cores The processing core shares the early pipeline stages e g L1i fetch decode the FPUs and the L2 cache with the rest of the module Each module has the following independent hardware resources 13 14 16 KB 4 way of L1d way predicted per core and 2 way 64 KB of L1i per module one way for each of the two cores 15 16 17 2 MB of L2 cache per module shared between the two integer cores Write Coalescing Cache 18 is a special cache that is part 
of the L2 cache in the Bulldozer microarchitecture. Stores from both L1D caches in the module go through the WCC, where they are buffered and coalesced. The WCC's task is to reduce the number of writes to the L2 cache.
Two dedicated integer cores: each includes two ALUs and two AGUs, which are capable of a total of four independent arithmetic and memory operations per clock and per core. Duplicating the integer schedulers and execution pipelines offers dedicated hardware to each of the two threads, which doubles performance for multi-threaded integer loads. The second integer core in the module increases the Bulldozer module die by around 12%, which at chip level adds about 5% of total die space.[19]
Two symmetrical 128-bit FMAC (fused multiply-add capability) floating-point pipelines per module that can be unified into one large 256-bit-wide unit if one of the integer cores dispatches an AVX instruction, and two symmetrical x87/MMX/SSE-capable FPPs for backward compatibility with SSE2-non-optimized software. Each FMAC unit is also capable of division and square root operations with variable latency.
All modules present share the L3 cache as well as an Advanced Dual-Channel Memory Sub-System (IMC, Integrated Memory Controller).
A module has 213 million transistors in an area of 30.9 mm² (including the 2 MB shared L2 cache) on an Orochi die.[20]
The pipeline depth of Bulldozer (as well as Piledriver and Steamroller) is 20 cycles, compared to 12 cycles for its K10 predecessor.[21]

The longer pipeline allowed the Bulldozer family of processors to achieve a much higher clock frequency than its K10 predecessors. While this increased frequencies and throughput, the longer pipeline also increased latencies and branch misprediction penalties.

The width of the Bulldozer integer core, four (2 ALU, 2 AGU), is somewhat less than the width of the K10 core, six (3 ALU, 3 AGU). Bobcat and Jaguar also used a four-wide integer core, yet with lighter execution units: 1 ALU, 1 simple ALU, 1 load AGU, 1 store AGU.[22]

The issue widths and
peak instruction executions per cycle of a Jaguar, K10, and Bulldozer core are 2, 3, and 4, respectively. This made Bulldozer a more superscalar design compared to Jaguar/Bobcat. However, due to K10's somewhat wider core (in addition to the lack of refinements and optimizations in a first-generation design), the Bulldozer architecture typically performed with somewhat lower IPC than its K10 predecessors. It was not until the refinements made in Piledriver and Steamroller that the IPC of the Bulldozer family distinctly began to exceed that of K10 processors such as the Phenom II.

Branch predictor
Two-level Branch Target Buffer (BTB)[23]
Hybrid predictor for conditionals
Indirect predictor

Instruction set extensions
Support for Intel's Advanced Vector Extensions (AVX) instruction set, which supports 256-bit floating-point operations, and for SSE4.1, SSE4.2, AES, and CLMUL, as well as future 128-bit instruction sets proposed by AMD (XOP, FMA4, and F16C),[24] which have the same functionality as the SSE5 instruction set formerly proposed by AMD, but with compatibility with the AVX coding scheme.
Bulldozer GEN4 (Excavator) supports AVX2 instruction sets.

Process technology and clock frequency
11-metal-layer 32 nm SOI process with GlobalFoundries' first-generation High-K Metal Gate (HKMG)
Turbo Core 2 performance boost to increase clock frequency by up to 500 MHz with all threads active (for most workloads) and by up to 1 GHz with half of the threads active, within the TDP limit[25]
The chip operates at 0.775 to 1.425 V, achieving clock frequencies of 3.6 GHz or more[20]
Min-Max TDP: 25 to 140 watts

Cache and memory interface
Up to 8 MB of L3 shared among all cores on the same silicon die (8 MB for 4 cores in the Desktop segment and 16 MB for 8 cores in the Server segment), divided into four subcaches of 2 MB each, capable of operating at 2.2 GHz at 1.1125 V[20]
Native DDR3 memory support up to DDR3-1866[26]
Dual-channel DDR3 integrated memory controller for Desktop and Server/Workstation Opteron 42xx
Valencia[27]
Quad-channel DDR3 integrated memory controller[28] for Server/Workstation Opteron 62xx Interlagos
AMD claims support for two DIMMs of DDR3-1600 per channel. Two DIMMs of DDR3-1866 on a single channel will be down-clocked to 1600.

I/O and socket interface
HyperTransport Technology rev. 3.1 (3.20 GHz, 6.4 GT/s, 25.6 GB/s, and 16-bit-wide link), first implemented in the HY-D1 revision Magny-Cours on the Socket G34 Opteron platform in March 2010 and Lisbon on the Socket C32 Opteron platform in June 2010
Socket AM3+ (AM3r2): 942-pin, DDR3 support only. It will retain backward compatibility with Socket AM3 motherboards (as per motherboard manufacturer choice and if BIOS updates are provided[29][30]); however, this is not officially supported by AMD. AM3+ motherboards will be backward compatible with AM3 processors.[31]
For the server segment, the existing Socket G34 (LGA1974) and Socket C32 (LGA1207) will be used.

Processors
Main articles: List of AMD FX microprocessors, List of AMD Accelerated Processing Unit microprocessors, and Opteron
[Image: Chipset and I/Os for 1st CMT generation]
[Image: AMD FX 8350 Processor]
[Image: AMD Opteron 6282 processor]

The first revenue shipments of Bulldozer-based Opteron processors were announced on September 7, 2011.[32] The FX-4100, FX-6100, FX-8120, and FX-8150 were released in October 2011, with the remaining FX-series AMD processors released at the end of the first quarter of 2012.

Desktop
Columns: model; modules/FPUs; cores/threads; frequency (GHz); max. turbo at full load / half load (GHz); L2 cache; L3 (MB); TDP (W); DDR3 memory; Turbo Core 2.0; socket. A value not listed for a model is shared with the model above it.
FX-8100: 4 modules/FPUs; 8 cores/threads; 2.8 GHz; turbo 3.1 / 3.7 GHz; L2 4× 2 MB; L3 8 MB; TDP 95 W; DDR3-1866; Turbo Core 2.0: yes; Socket AM3+
FX-8120: 3.1 GHz; turbo 3.4 / 4.0 GHz; TDP 125 W
FX-8140: 3.2 GHz; turbo 3.6 / 4.1 GHz; TDP 95 W
FX-8150: 3.6 GHz; turbo 3.9 / 4.2 GHz; TDP 125 W
FX-8170: 3.9 GHz; turbo 4.2 / 4.5 GHz
FX-6100: 3 modules/FPUs; 6 cores/threads; 3.3 GHz; turbo 3.6 / 3.9 GHz; L2 3× 2 MB; TDP 95 W
FX-6120: 3.6 GHz; turbo 3.9 / 4.2 GHz
FX-6130: 3.6 GHz; turbo 3.8 / 3.9 GHz
FX-6200: 3.8 GHz; turbo 4.0 / 4.1 GHz; TDP 125 W
FX-4100: 2 modules/FPUs; 4 cores/threads; 3.6 GHz; turbo 3.7 / 3.8 GHz; L2 2× 2 MB; TDP 95 W
FX-4120: 3.9 GHz; turbo 4.0 / 4.1 GHz
FX-4130: 3.8 GHz; turbo 3.9 / 4.0 GHz; L3 4 MB; TDP 125 W
FX-4150: 3.8 GHz; L3 8 MB; TDP 95/125 W
FX-4170: 4.2 GHz; turbo 4.3 GHz; TDP 125 W
Major sources: CPU-World[33] and Xbit-Labs[34]

Server
There are two series of Bulldozer-based
processors for servers: the Opteron 4200 series (Socket C32, code-named Valencia, with up to four modules) and the Opteron 6200 series (Socket G34, code-named Interlagos, with up to 8 modules).[35][36]

False advertising lawsuit
In November 2015, AMD was sued under the California Consumers Legal Remedies Act and Unfair Competition Law for allegedly misrepresenting the specifications of Bulldozer chips. The class-action lawsuit, filed on 26 October in the US District Court for the Northern District of California, claims that each Bulldozer module is in fact a single CPU core with a few dual-core traits, rather than a true dual-core design.[37] In August 2019, AMD agreed to settle the suit for $12.1M.[38][39]

Performance
Performance on Linux
On 24 October 2011, the first-generation tests done by Phoronix confirmed that the performance of the Bulldozer CPU was somewhat less than expected.[40] In several tests the CPU performed similarly to the older-generation Phenom 1060T.

The performance later substantially increased, as various compiler optimizations and CPU driver fixes were released.[41][42]

Performance on Windows
The first Bulldozer CPUs were met with a mixed response. It was discovered that the FX-8150 performed poorly in benchmarks that were not highly threaded, falling behind the second-generation Intel Core i-series processors and being matched or even outperformed by AMD's own Phenom II X6 at lower clock speeds. In highly threaded benchmarks, the FX-8150 performed on par with the Phenom II X6 and the Intel Core i7 2600K, depending on the benchmark. Given the overall more consistent performance of the Intel Core i5 2500K at a lower price, these results left many reviewers underwhelmed. The processor was found to be extremely power-hungry under load, especially when overclocked, compared to Intel's Sandy Bridge.[43][44]

On 13 October 2011, AMD stated on its blog that "there are some in our community who feel the product performance did not meet their expectations", but showed benchmarks on actual
applications where it outperformed the Sandy Bridge i7 2600K and AMD X6 1100T.[45]

In January 2012, Microsoft released two hotfixes for Windows 7 and Server 2008 R2 that marginally improve the performance of Bulldozer CPUs by addressing the thread scheduling concerns raised after the release of Bulldozer.[46][47][48]

On 6 March 2012, AMD posted a knowledge base article stating that there was a compatibility problem with FX processors and certain games on the widely used digital game distribution platform Steam. AMD stated that they had provided a BIOS update to several motherboard manufacturers (namely Asus, Gigabyte Technology, MSI, and ASRock) that would fix the problem.[49]

In September 2014, AMD CEO Rory Read conceded that the Bulldozer design had not been a game-changing part, and that AMD had to live with the design for four years.[50]

Overclocking
On 31 August 2011, AMD and a group of well-known overclockers including Brian McLachlan, Sami Makinen, Aaron Schradin, and Simon Solotko managed to set a new world record for CPU frequency using the unreleased and overclocked FX-8150 Bulldozer processor. Before that day, the record sat at 8.309 GHz, but the Bulldozer combined with liquid helium cooling reached a new high of 8.429 GHz. The record has since been overtaken at 8.58 GHz by Andre Yang using liquid nitrogen.[51][52] On August 22, 2014, using an FX-8370 (Piledriver), The Stilt from Team Finland achieved a maximum CPU frequency of 8.722 GHz.[53] The CPU clock frequency records set by overclocked Bulldozer CPUs were only broken almost a decade later by overclocks of Intel's 13th-generation Core Raptor Lake CPUs in October 2022.[54]

Revisions
Piledriver is the AMD codename for its improved second-generation microarchitecture based on Bulldozer. AMD Piledriver cores are found in the Socket FM2 Trinity- and Richland-based series of APUs and CPUs and the Socket AM3+ Vishera-based FX series of CPUs. Piledriver was the last generation in the Bulldozer family to be available for Socket AM3+ and to be
available with an L3 cache. The Piledriver processors available for FM2 (and its mobile variant) sockets did not come with an L3 cache, as the L2 cache is the last-level cache for all FM2/FM2+ processors.

Steamroller is the AMD codename for its third-generation microarchitecture based on an improved version of Piledriver. Steamroller cores are found in the Socket FM2+ Kaveri-based series of APUs and CPUs.

Excavator is the codename for the fourth-generation Bulldozer core.[55] Excavator was implemented as Carrizo A-series APUs, Bristol Ridge A-series APUs, and Athlon x4 CPUs.[56]

See also
List of AMD CPU microarchitectures
List of AMD FX microprocessors
Charles R. Moore (computer engineer)
Alpha 21264
K10 (microarchitecture)
Bobcat (microarchitecture)
Opteron
Piledriver (microarchitecture)
Steamroller (microarchitecture)
Excavator (microarchitecture)
Zen (microarchitecture)

References
1. "FX Processors". AMD. 24 February 2016. Retrieved 24 February 2016.
2. "AMD ships 16 core bulldozer powered Opteron 6200". Engadget. 14 November 2011. Retrieved 24 February 2016.
3. "Bulldozer 50% Faster than Core i7 and Phenom II". techPowerUp. Retrieved 2012-01-23.
4. "AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions" (PDF). AMD. May 1, 2009. Retrieved 2009-05-08.
5. "Striking a balance". Dave Christie, AMD Developer blogs. 7 May 2009. Archived from the original on 2012-04-02. Retrieved 2009-05-08.
6. "AMD Sets New Mark in x86 Innovation with First Detailed Disclosures of Two New Core Designs". AMD. August 24, 2011. p. 1. Retrieved September 18, 2011.
7. "Analyst Day 2009 Summary". AMD. November 11, 2009. Retrieved 2009-11-14.
8. "AMD bestätigt: Zambezi ist inkompatibel zum Sockel AM3". Planet3dnow.de. Retrieved 2012-01-23.
9. "Analyst Day 2009 Presentations". AMD. November 11, 2009. Retrieved 2009-11-14.
10. "Archived copy". Archived from the original on 2013-10-17. Retrieved 2013-07-22.
11. "AMD unveils Flex FP". bit-tech.net.
12. Laird, Jeremy (August
2022). "Ryzen again: The resurrection of AMD". APC. No. 509. Future Publishing. pp. 56-57. ISSN 0725-4415.
13. "Bulldozer microarchitecture block". AnandTech. August 24, 2010.
14. "Bulldozer module functional schematic". AMD. August 24, 2010. Archived from the original on October 1, 2012. Retrieved August 25, 2010.
15. "More On Bulldozer". Tomshardware.com. 2010-08-24. Retrieved 2012-01-23.
16. "AMD Reveals Details About Bulldozer Microprocessors". Xbitlabs.com. Archived from the original on 2011-09-03. Retrieved 2012-01-23.
17. Real World Technologies (2010-08-26). "AMD's Bulldozer Microarchitecture". Realworldtech.com. Retrieved 2012-01-23.
18. David Kanter (August 26, 2010). "AMD's Bulldozer Microarchitecture: Memory Subsystem Continued". Real World Technologies.
19. "Bulldozer design power efficiency". AMD. August 24, 2010.
20. "AP" (PDF). Archived from the original (PDF) on 2012-01-20. Retrieved 2012-01-23.
21. Johan De Gelas. "The Bulldozer Aftermath: Delving Even Deeper".
22. Anand Lal Shimpi. "AMD's Jaguar Architecture: The CPU Powering Xbox One, PlayStation 4, Kabini & Temash".
23. https://www.olcf.ornl.gov/wp-content/uploads/2012/01/TitanWorkshop2012_Day1_AMD.pdf (PDF)
24. "XOP and FMA4 Instruction set in SSE5". Techreport.com. 2009-05-06. Retrieved 2012-01-23.
25. "AMD Financial Analyst Day 2010, Server Platforms Presentation". Ir.amd.com. 2010-11-09. Archived from the original on 2013-11-12. Retrieved 2012-01-23.
26. "AMD Roadmap". Retrieved 2012-01-23.
27. AMD (2012-05-14). "AMD Opteron 4200 Series Processor Quick Reference Guide" (PDF). www.amd.com. Retrieved 2012-08-15.
28. AMD (2012-05-14). "AMD Opteron 6200 Series Processor Quick Reference Guide" (PDF). www.amd.com. Retrieved 2012-08-15.
29. "ASUS confirms AM3+ compatibility on AM3 boards". Event.asus.com. Retrieved 2012-01-23.
30. "MSI confirms AM3+ compatibility on AM3 boards". Event.msi.com. April 2011. Retrieved 2012-01-23.
31. "AM3 processors will work in the AM3+ socket, but Bulldozer chips will not work in non-AM3+ motherboards". Archived December 10, 2010, at the Wayback Machine.
32. "AMD Ships First Bulldozer Processors".
33. "AMD FX-Series processor
families". Cpu-world.com. 2012-10-02. Retrieved 2012-10-21.
34. Shilov, Anton (2012-09-21). "AMD Sets the FX Vishera Launch Date". X-bit laboratories. X-bit labs. Archived from the original on 2012-09-24. Retrieved 2012-09-23.
35. "What Is Bulldozer?". 2010-08-02. Archived from the original on August 6, 2010.
36. "AMD Opteron 6200-series microprocessor family". cpu-world.com.
37. "AMD sued over allegedly misleading Bulldozer core count". Ars Technica. Retrieved 8 November 2015.
38. "AMD Bulldozer Core Lawsuit: AMD Settles for $12.1m, Payouts for Some". AnandTech. Retrieved 19 January 2021.
39. "Tony Dickey and Paul Parmer, et al. v. Advanced Micro Devices". Archived from the original on 19 October 2019. Retrieved 19 January 2021.
40. "AMD FX-8150 Bulldozer On Ubuntu Linux". phoronix.com. 2011-10-24. Retrieved 2012-12-13.
41. "AMD Bulldozer Cache Aliasing Issue Fix". phoronix.com.
42. "AMD's FX-8150 Bulldozer Benefits From New Compilers, Tuning". phoronix.com.
43. "Bulldozer Has Arrived: AMD FX-8150 Processor Review". X-bit labs. 2011-10-11. p. 13. Archived from the original on 2012-01-13. Retrieved 2012-01-23.
44. "Bulldozer Has Arrived: AMD FX-8150 Processor Review". X-bit labs. 2011-10-11. p. 14. Archived from the original on 2012-01-16. Retrieved 2012-01-23.
45. "Our Take on AMD FX". akozak on behalf of AMD, Blogs. 2011-10-13. Archived from the original on 15 October 2011. Retrieved 23 January 2012.
46. "An update is available for computers that have an AMD FX, AMD Opteron 4200, AMD Opteron 6200, or AMD Bulldozer series processor installed and that are running Windows 7 or Windows Server 2008 R2". support.microsoft.com. January 2012. Retrieved 2014-02-11.
47. "An update that selectively disables the Core Parking feature in Windows 7 or in Windows Server 2008 R2 is available". support.microsoft.com. January 2012. Retrieved 2014-02-11.
48. "AMD's FX-8150 After Two Windows 7 Hotfixes And UEFI Updates". tomshardware.com. 24 January 2012.
49. "STEAM Games on AMD FX platforms". support.amd.com. 2012-06-12. Retrieved 2012-10-11.
50. "AMD: next-generation microarchitecture will make up for muted Bulldozer reception". pcgamer.com.
51. "AMD Bulldozer CPU beats
world record again, achieving 8.461GHz". geek.com. 2011-11-01. Archived from the original on 2012-04-28. Retrieved 2012-10-16.
52. "AMD Bulldozer Speed Record Broken Again at 8.58GHz". tomshardware.com. 5 November 2011.
53. Samuel D. "CPU-Z Validator 4.0". Retrieved 23 September 2014.
54. "Intel Core i9 13900K @ 8812.85 MHz - CPU-Z VALIDATOR". valid.x86.fr. Retrieved 2022-10-23.
55. "The Bulldozer Review: AMD FX-8150 Tested". AnandTech. 2011-10-12. Retrieved 2012-01-23.
56. Cutress, Ian (2016-02-02). "AMD launches excavator on desktop: the 65w athlon x4 845 for $70". anandtech. Retrieved 2017-03-28.

External links
www.amd.com/en-us/products/processors/desktop/fx

Retrieved from https://en.wikipedia.org/w/index.php?title=Bulldozer_(microarchitecture)&oldid=1130942706
