Wikipedia

IA-64

IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in collaboration with HP. The first Itanium processor, codenamed Merced, was released in 2001.

Intel Itanium architecture
Designer: HP and Intel
Bits: 64-bit
Introduced: 2001
Design: EPIC
Type: Register–Register
Encoding: Fixed
Branching: Condition register
Endianness: Selectable
Registers
General purpose: 128 (64 bits plus 1 trap bit; 32 are static, 96 use register windows); 64 1-bit predicate registers
Floating point: 128
The Intel Itanium architecture

The Itanium architecture is based on explicit instruction-level parallelism, in which the compiler decides which instructions to execute in parallel. This contrasts with superscalar architectures, which depend on the processor to manage instruction dependencies at runtime. In all Itanium models, up to and including Tukwila, cores execute up to six instructions per clock cycle.

In 2008, Itanium was the fourth-most deployed microprocessor architecture for enterprise-class systems, behind x86-64, Power ISA, and SPARC.[1]

History

Development: 1989–2000

In 1989, HP became concerned that reduced instruction set computing (RISC) architectures were approaching a processing limit of one instruction per cycle. Intel and HP researchers had each been exploring computer architecture options for future designs, and separately began investigating a concept known as very long instruction word (VLIW),[2] which came out of research at Yale University in the early 1980s.[3]

VLIW is a computer architecture concept (like RISC and CISC) in which a single very long instruction word encodes multiple instructions, so that the processor can execute several instructions in each clock cycle. Typical VLIW implementations rely heavily on sophisticated compilers to determine at compile time which instructions can execute together, to schedule those instructions, and to help predict the direction of branches. The approach aims to do more useful work in fewer clock cycles and to simplify the processor's instruction scheduling and branch prediction hardware, with a penalty in increased processor complexity, cost, and energy consumption in exchange for faster execution.

Production

During this time, HP had begun to believe that it was no longer cost-effective for individual enterprise systems companies such as itself to develop proprietary microprocessors. Intel had also been researching several architectural options for going beyond the x86 ISA to address high-end enterprise server and high-performance computing (HPC) requirements.

Intel and HP partnered in 1994 to develop the IA-64 ISA, using a variation of VLIW design concepts which Intel named explicitly parallel instruction computing (EPIC). Intel's goal was to leverage the expertise HP had developed in their early VLIW work along with their own to develop a volume product line targeted at the aforementioned high-end systems that could be sold to all original equipment manufacturers (OEMs), while HP wished to be able to purchase off-the-shelf processors built using Intel's volume manufacturing and contemporary process technology that were better than their PA-RISC processors.

Intel took the lead on the design and commercialization process, while HP contributed to the ISA definition, the Merced/Itanium microarchitecture, and Itanium 2. The original goal year for delivering the first Itanium family product, Merced, was 1998.[2]

Marketing

Intel's product marketing and industry engagement efforts were substantial and achieved design wins with the majority of enterprise server OEMs, including those based on RISC processors at the time. Compaq and Silicon Graphics decided to abandon further development of the Alpha and MIPS architectures respectively in favor of migrating to IA-64.[4]

By 1997, it was apparent that the IA-64 architecture and the compiler were much more difficult to implement than originally thought, and the delivery of Itanium began slipping.[5] Since Itanium was the first ever EPIC processor, the development effort encountered more unanticipated problems than the team was accustomed to. In addition, the EPIC concept depends on compiler capabilities that had never been implemented before, so more research was needed.[6]

Several groups developed operating systems for the architecture, including Microsoft Windows, Unix and Unix-like systems such as Linux, HP-UX, FreeBSD, Solaris,[7][8][9] Tru64 UNIX,[4] and Monterey/64[10] (the last three were canceled before reaching the market). In 1999, Intel led the formation of an open-source industry consortium to port Linux to IA-64. The consortium was named "Trillium" (later renamed "Trillian" due to a trademark issue) and included Caldera Systems, CERN, Cygnus Solutions, Hewlett-Packard, IBM, Red Hat, SGI, SuSE, TurboLinux and VA Linux Systems. As a result, a working IA-64 Linux was delivered ahead of schedule and was the first OS to run on the new Itanium processors.

Intel announced the official name of the processor, Itanium, on October 4, 1999.[11] Within hours, the name Itanic had been coined on a Usenet newsgroup as a pun on the name Titanic, the "unsinkable" ocean liner that sank on its maiden voyage in 1912.[12]

The next day, October 5, 1999, AMD announced plans to extend Intel's x86 instruction set with a fully backward-compatible 64-bit mode. It also revealed that its forthcoming 64-bit x86 architecture, already in development, would be incorporated into its upcoming eighth-generation microprocessor, code-named SledgeHammer.[13] AMD indicated that the full architecture specification and further details would be available in August 2000.[14]

Because AMD was never invited to contribute to the IA-64 architecture and licensing of any kind seemed unlikely, AMD positioned its AMD64 extension from the beginning as an evolutionary way to add 64-bit computing capabilities to the existing x86 architecture while still supporting legacy 32-bit x86 code. Intel's approach, by contrast, was to create an entirely new 64-bit architecture, IA-64, that was incompatible with x86.

Itanium (Merced): 2001

Itanium (Merced)

Itanium processor
General information
Launched: June 2001
Discontinued: June 2002
Common manufacturer(s)
  • Intel
Performance
Max. CPU clock rate: 733 MHz to 800 MHz
FSB speeds: 266 MT/s
Cache
L2 cache: 96 KB
L3 cache: 2 or 4 MB
Architecture and classification
Instruction set: Itanium
Physical specifications
Cores
  • 1
Socket(s)
  • PAC418
Products, models, variants
Core name(s)
  • Merced

By the time Itanium was released in June 2001, its performance was not superior to competing RISC and CISC processors.[15]

Recognizing that the lack of software could be a serious problem for the future, Intel made thousands of these early systems available to independent software vendors (ISVs) to stimulate development. HP and Intel brought the next-generation Itanium 2 processor to market a year later.

Itanium 2: 2002–2010

Itanium 2 (McKinley)

Itanium 2 processor
General information
Launched: 2002
Discontinued: present
Designed by: Intel
Common manufacturer(s)
  • Intel
Performance
Max. CPU clock rate: 733 MHz to 2.66 GHz
Cache
L2 cache: 256 KB on Itanium 2; 256 KB (D) + 1 MB (I) or 512 KB (I) on Itanium 2 9x00 series
L3 cache: 1.5–32 MB
Architecture and classification
Instruction set: Itanium
Physical specifications
Cores
  • 1, 2, 4 or 8
Socket(s)
  • PAC611
  • LGA1248 (FC-LGA6; Itanium 9300 series)
Products, models, variants
Core name(s)
  • McKinley
  • Madison
  • Hondo
  • Deerfield
  • Montecito
  • Montvale
  • Tukwila
  • Poulson
 
Itanium 2 in 2003

The Itanium 2 processor was released in 2002. It relieved many of the performance problems of the original Itanium processor, which were mostly caused by an inefficient memory subsystem.

In 2003, AMD released the Opteron, which implemented its own 64-bit architecture (x86-64). Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from x86. Intel responded by implementing x86-64 (as EM64T) in its Xeon microprocessors in 2004.[4]

In November 2005, the major Itanium server manufacturers joined with Intel and a number of software vendors to form the Itanium Solutions Alliance to promote the architecture and accelerate software porting.[16]

In 2006, Intel delivered Montecito (marketed as the Itanium 2 9000 series), a dual-core processor that roughly doubled performance and decreased energy consumption by about 20 percent.[17]

Itanium 9300 (Tukwila): 2010

The Itanium 9300 series processor, codenamed Tukwila, was released on 8 February 2010 with greater performance and memory capacity.[18] Tukwila had originally been slated for release in 2007.[19]

The device uses a 65 nm process and includes two to four cores, up to 24 MB of on-die cache, Hyper-Threading technology, and integrated memory controllers. It implements double-device data correction (DDDC), which helps to fix memory errors. Tukwila also implements Intel QuickPath Interconnect (QPI) to replace the Itanium bus-based architecture. It has a peak interprocessor bandwidth of 96 GB/s and a peak memory bandwidth of 34 GB/s. With QuickPath, the processor has integrated memory controllers and interfaces with memory directly, using QPI links to connect directly to other processors and I/O hubs. QuickPath is also used on Intel processors based on the Nehalem microarchitecture, which made it probable that Tukwila and Nehalem would be able to use the same chipsets.[20] Tukwila incorporates four memory controllers, each of which supports multiple DDR3 DIMMs via a separate memory controller,[21] much like the Nehalem-based Xeon processor code-named Beckton.[22]

Itanium 9500 (Poulson): 2012

The Itanium 9500 series processor, codenamed Poulson, is the follow-on processor to Tukwila. It features eight cores, a 12-wide issue architecture, multithreading enhancements, and new instructions to take advantage of parallelism, especially in virtualization.[20][23][24] The Poulson L3 cache size is 32 MB. The L2 cache totals 6 MB, with 512 KB for instructions and 256 KB for data per core.[25] The die size is 544 mm², smaller than that of its predecessor Tukwila (698.75 mm²).[26][27]

At ISSCC 2011, Intel presented a paper called "A 32nm 3.1 Billion Transistor 12-Wide-Issue Itanium Processor for Mission Critical Servers."[25][28] Given Intel's history of disclosing details about Itanium microprocessors at ISSCC, this paper most likely refers to Poulson. Analyst David Kanter speculated that Poulson would use a new microarchitecture, with a more advanced form of multithreading that uses as many as two threads, to improve performance for single-threaded and multithreaded workloads.[29] Further information was released at the Hot Chips conference.[30][31] It described improvements in multithreading, resiliency improvements (Instruction Replay RAS), and a few new instructions (thread priority, integer instruction, cache prefetching, data access hints).

Itanium 9700 (Kittson): 2017

Kittson is the same as the 9500 series Poulson, but clocked slightly higher.[32]

End of life: 2021

In January 2019, Intel announced that Kittson would be discontinued, with a last order date of January 2020, and a last ship date of July 2021.[32][33]

There is no planned successor.

Architecture

Intel has extensively documented the Itanium instruction set[34] and the technical press has provided overviews.[35][5] The architecture has been renamed several times during its history. HP originally called it PA-WideWord. Intel later called it IA-64, then Itanium Processor Architecture (IPA),[36] before settling on Intel Itanium Architecture, but it is still widely referred to as IA-64.

It is a 64-bit register-rich explicitly parallel architecture. The base data word is 64 bits, byte-addressable. The logical address space is 2^64 bytes. The architecture implements predication, speculation, and branch prediction. It uses variable-sized register windowing for parameter passing. The same mechanism is also used to permit parallel execution of loops. Speculation, prediction, predication, and renaming are under control of the compiler: each instruction word includes extra bits for this. This approach is the distinguishing characteristic of the architecture.

The architecture implements a large number of registers:[37][38][39]

  • 128 general integer registers, which are 64-bit plus one trap bit ("NaT", which stands for "not a thing") used for speculative execution. 32 of these are static, the other 96 are stacked using variably-sized register windows, or rotating for pipelined loops. gr0 always reads 0.
  • 128 floating-point registers. The floating-point registers are 82 bits long to preserve precision for intermediate results. Instead of a dedicated "NaT" trap bit like the integer registers, floating-point registers have a trap value called "NaTVal" ("Not a Thing Value"), similar to (but distinct from) NaN. These also have 32 static registers and 96 windowed or rotating registers. fr0 always reads +0.0, and fr1 always reads +1.0.
  • 64 one-bit predicate registers. These have 16 static registers and 48 windowed or rotating registers. pr0 always reads 1 (true).
  • 8 branch registers, for the addresses of indirect jumps. br0 is set to the return address when a function is called with br.call.
  • 128 special purpose (or "application") registers, which are mostly of interest to the kernel and not ordinary applications. For example, one register called bsp points to the second stack, which is where the hardware will automatically spill registers when the register window wraps around.
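
The role of the NaT bit in speculative execution can be illustrated with a small model. This is a simplified sketch, not the architecture's actual semantics: the names `GeneralRegister`, `speculative_load`, `add`, and `needs_recovery` are hypothetical, and memory is modeled as a plain dictionary in which a missing address stands in for a load that would have faulted.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # general registers hold a 64-bit value


@dataclass(frozen=True)
class GeneralRegister:
    """Hypothetical model of one general register: 64-bit value plus NaT bit."""
    value: int = 0
    nat: bool = False  # "not a thing": set when a speculative result is invalid


def speculative_load(memory, address):
    """Model a speculative load: on failure, defer the fault by setting NaT."""
    if address not in memory:
        return GeneralRegister(0, nat=True)
    return GeneralRegister(memory[address] & MASK64)


def add(a, b):
    """Model an add: the NaT bit propagates from either operand to the result."""
    return GeneralRegister((a.value + b.value) & MASK64, a.nat or b.nat)


def needs_recovery(reg):
    """Model a later check: True means the speculated work must be redone."""
    return reg.nat


memory = {0x1000: 40}
good = add(speculative_load(memory, 0x1000), GeneralRegister(2))
bad = add(speculative_load(memory, 0x2000), GeneralRegister(2))
print(good.value, needs_recovery(good))
print(needs_recovery(bad))
```

The point of the sketch is that a failed speculative load does not trap immediately. The invalid marker travels with the data, and only a later check decides whether recovery code must run.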

Each 128-bit instruction word is called a bundle, and contains three slots each holding a 41-bit instruction, plus a 5-bit template indicating which type of instruction is in each slot. Those types are M-unit (memory instructions), I-unit (integer ALU, non-ALU integer, or long immediate extended instructions), F-unit (floating-point instructions), or B-unit (branch or long branch extended instructions). The template also encodes stops which indicate that a data dependency exists between data before and after the stop. All instructions between a pair of stops constitute an instruction group, regardless of their bundling, and must be free of many types of data dependencies; this knowledge allows the processor to execute instructions in parallel without having to perform its own complicated data analysis, since that analysis was already done when the instructions were written.
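
The bundle layout described above (a 5-bit template plus three 41-bit slots, 128 bits in total) can be sketched as bit packing. The field order used here, with the template in the lowest-order bits followed by slots 0 to 2, is an assumption of this example rather than something stated in the text; `pack_bundle` and `unpack_bundle` are hypothetical helper names.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a template and three 41-bit slot encodings into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template  # assumed: template occupies the low-order bits
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit integer back into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(SLOTS_PER_BUNDLE)
    )
    return template, slots


bundle = pack_bundle(0x10, (1, 2, 3))
print(unpack_bundle(bundle))
```

The arithmetic 5 + 3 × 41 = 128 is why exactly three instructions fit in a bundle with no bits left over.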

Within each slot, all but a few instructions are predicated, specifying a predicate register, the value of which (true or false) will determine whether the instruction is executed. Predicated instructions which should always execute are predicated on pr0, which always reads as true.
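
Predication can be sketched as a small interpreter in which every instruction carries a qualifying predicate and takes effect only when that predicate reads true. This is an illustrative model only: `PredicateFile` and `predicated_max` are hypothetical names, the register numbers are arbitrary, and the pseudo-assembly in the comments is meant as a reading aid rather than verified syntax.

```python
class PredicateFile:
    """Hypothetical model of the 64 one-bit predicate registers."""

    def __init__(self):
        self._bits = [False] * 64

    def read(self, index):
        # pr0 always reads as true, so instructions predicated on it always run.
        return True if index == 0 else self._bits[index]

    def write(self, index, value):
        if index != 0:  # writes to pr0 have no effect in this model
            self._bits[index] = bool(value)


def predicated_max(a, b):
    """Compute max(a, b) without a branch, using two complementary predicates."""
    pr = PredicateFile()
    regs = {"r1": a, "r2": b, "r3": 0}

    # cmp.gt p6, p7 = r1, r2   (sets a predicate and its complement)
    condition = regs["r1"] > regs["r2"]
    pr.write(6, condition)
    pr.write(7, not condition)

    # (p6) mov r3 = r1
    # (p7) mov r3 = r2
    program = [(6, "r3", "r1"), (7, "r3", "r2")]
    for qualifying_predicate, dest, src in program:
        if pr.read(qualifying_predicate):
            regs[dest] = regs[src]
    return regs["r3"]


print(predicated_max(3, 9), predicated_max(9, 3))
```

Both moves are issued on every run, but only the one whose predicate is true updates `r3`. This is how a compiler can replace a short if/else with straight-line code.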

The IA-64 assembly language and instruction format were deliberately designed to be written mainly by compilers, not by humans. Instructions must be grouped into bundles of three, ensuring that the three instructions match an allowed template. Instructions must issue stops between certain types of data dependencies, and stops can also only be used in limited places according to the allowed templates.

Instruction execution

The fetch mechanism can read up to two bundles per clock from the L1 cache into the pipeline. When the compiler can take maximum advantage of this, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit can execute a particular subset of the instruction set, and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units.

The execution unit groups include:

  • Six general-purpose ALUs, two integer units, one shift unit
  • Four data cache units
  • Six multimedia units, two parallel shift units, one parallel multiply, one population count
  • Two 82-bit floating-point multiply–accumulate units, two SIMD floating-point multiply–accumulate units (two 32-bit operations each)[40]
  • Three branch units

Ideally, the compiler can often group instructions into sets of six that can execute at the same time. Since the floating-point units implement a multiply–accumulate operation, a single floating-point instruction can perform the work of two instructions when the application requires a multiply followed by an add: this is very common in scientific processing. When it occurs, the processor can execute four FLOPs per cycle. For example, the 800 MHz Itanium had a theoretical rating of 3.2 GFLOPS and the fastest Itanium 2, at 1.67 GHz, was rated at 6.67 GFLOPS.
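
The peak figures above follow from simple arithmetic: two floating-point multiply–accumulate units, each counted as two operations (a multiply and an add) per cycle, give four FLOPs per cycle. A minimal sketch, with `peak_gflops` as a hypothetical helper name:

```python
FMA_UNITS = 2      # floating-point multiply-accumulate units per core
FLOPS_PER_FMA = 2  # one multiply plus one add per fused operation


def peak_gflops(clock_ghz):
    """Theoretical peak GFLOPS for a single core at the given clock rate."""
    return clock_ghz * FMA_UNITS * FLOPS_PER_FMA


print(round(peak_gflops(0.8), 2))   # 800 MHz Itanium
print(round(peak_gflops(1.67), 2))  # fastest Itanium 2, clock rounded to 1.67 GHz
```

With the clock rounded to 1.67 GHz the product is 6.68, so the quoted 6.67 GFLOPS presumably reflects an unrounded clock rate of about 1.667 GHz.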

In practice, the processor may often be underutilized, with not all slots filled with useful instructions due to, for example, data dependencies or limitations in the available bundle templates. The densest possible code requires about 42.7 bits per instruction (128 bits for every three instructions), compared to 32 bits per instruction on traditional RISC processors of the time, and no-ops due to wasted slots further decrease the density of code. Additional instructions for speculative loads and hints for branches and cache are difficult to generate optimally, even with modern compilers.
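
The code-density figure comes from dividing the 128-bit bundle among its three slots, and it worsens as slots are filled with no-ops. A short sketch of that calculation, using the hypothetical helper `bits_per_useful_instruction`:

```python
BUNDLE_BITS = 128
SLOTS_PER_BUNDLE = 3


def bits_per_useful_instruction(useful_slots, bundles):
    """Average encoded bits per useful (non-no-op) instruction."""
    if bundles <= 0:
        raise ValueError("need at least one bundle")
    if not 0 < useful_slots <= bundles * SLOTS_PER_BUNDLE:
        raise ValueError("useful_slots must be between 1 and 3 per bundle")
    return bundles * BUNDLE_BITS / useful_slots


# Best case: every slot holds a useful instruction (128 / 3 bits each).
print(round(bits_per_useful_instruction(3, 1), 2))
# One no-op per bundle: only two useful instructions share the 128 bits.
print(bits_per_useful_instruction(2, 1))
```

A single wasted slot per bundle already raises the cost to 64 bits per useful instruction, twice that of a 32-bit fixed-width RISC encoding.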

Memory architecture

From 2002 to 2006, Itanium 2 processors shared a common cache hierarchy. They had 16 KB of Level 1 instruction cache and 16 KB of Level 1 data cache. The L2 cache was unified (both instruction and data) and was 256 KB. The Level 3 cache was also unified and varied in size from 1.5 MB to 24 MB. The 256 KB L2 cache contained sufficient logic to handle semaphore operations without disturbing the main arithmetic logic unit (ALU).

Main memory is accessed through a bus to an off-chip chipset. The Itanium 2 bus was initially called the McKinley bus, but is now usually referred to as the Itanium bus. The speed of the bus increased steadily with new processor releases. The bus transfers 2×128 bits per clock cycle, so the 200 MHz McKinley bus transferred 6.4 GB/s, and the 533 MHz Montecito bus transfers 17.056 GB/s.[41]
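
The bus bandwidth figures can be checked directly: 2×128 bits is 32 bytes per clock, multiplied by the clock rate. The helper name `bus_bandwidth_gb_per_s` is hypothetical, and GB is taken as 10^9 bytes, which is what the quoted figures imply.

```python
def bus_bandwidth_gb_per_s(clock_mhz, bits_per_clock=2 * 128):
    """Peak bus bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_clock = bits_per_clock // 8  # 256 bits -> 32 bytes
    return clock_mhz * 1e6 * bytes_per_clock / 1e9


print(round(bus_bandwidth_gb_per_s(200), 3))  # McKinley bus
print(round(bus_bandwidth_gb_per_s(533), 3))  # Montecito bus
```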

Architectural changes

Itanium processors released prior to 2006 had hardware support for the IA-32 architecture to permit support for legacy server applications, but performance for IA-32 code was much worse than for native code and also worse than the performance of contemporaneous x86 processors. In 2005, Intel developed the IA-32 Execution Layer (IA-32 EL), a software emulator that provides better performance. With Montecito, Intel therefore eliminated hardware support for IA-32 code.

In 2006, with the release of Montecito, Intel made a number of enhancements to the basic processor architecture including:[42]

  • Hardware multithreading: Each processor core maintains context for two threads of execution. When one thread stalls during memory access, the other thread can execute. Intel calls this "coarse multithreading" to distinguish it from the "hyper-threading technology" Intel integrated into some x86 and x86-64 microprocessors.
  • Hardware support for virtualization: Intel added Intel Virtualization Technology (Intel VT-i), which provides hardware assists for core virtualization functions. Virtualization allows a software "hypervisor" to run multiple operating system instances on the processor concurrently.
  • Cache enhancements: Montecito added a split L2 cache, which included a dedicated 1 MB L2 cache for instructions. The original 256 KB L2 cache was converted to a dedicated data cache. Montecito also included up to 12 MB of on-die L3 cache.

See Chipsets...Other markets.

See also

References

  1. ^ Morgan, Timothy (2008-05-27). "The Server Biz Enjoys the X64 Upgrade Cycle in Q1". IT Jungle. Archived from the original on 2016-03-03. Retrieved 2008-10-29.
  2. ^ a b "Inventing Itanium: How HP Labs Helped Create the Next-Generation Chip Architecture". HP Labs. June 2001. Archived from the original on 2012-03-04. Retrieved 2007-03-23.
  3. ^ Fisher, Joseph A. (1983). "Very Long Instruction Word architectures and the ELI-512". Proceedings of the 10th annual international symposium on Computer architecture. International Symposium on Computer Architecture. New York, NY, USA: Association for Computing Machinery (ACM). pp. 140–150. doi:10.1145/800046.801649. ISBN 0-89791-101-6.
  4. ^ a b c Tech News on ZDNet. 2005-12-07. Archived from the original on 2008-02-09. Retrieved 2007-11-01.
  5. ^ a b Shankland, Stephen (1999-07-08). "Intel's Merced chip may slip further". CNET News. Archived from the original on 2012-10-24. Retrieved 2008-10-16.
  6. ^ "Microprocessors - VLIW, The Past" (PDF). NY University. 2002-04-18. Archived (PDF) from the original on 2018-06-27. Retrieved 2018-06-26.
  7. ^ Vijayan, Jaikumar (1999-09-01). Computerworld. Archived from the original on 2000-01-05.
  8. ^ Wolfe, Alexander (1999-09-02). "Core-logic efforts under way for Merced". EE Times. Archived from the original on 2016-03-06. Retrieved 2016-02-27.
  9. ^ Business Wire. 1998-03-10. Archived from the original on 2004-09-20. Retrieved 2008-10-16.
  10. ^ CNET News.com. 1999-09-17. Archived from the original on 2011-08-09. Retrieved 2007-11-01.
  11. ^ Kanellos, Michael (1999-10-04). "Intel names Merced chip Itanium". CNET News.com. Archived from the original on 2015-12-30. Retrieved 2007-04-30.
  12. ^ Finstad, Kraig (1999-10-04). "Re:Itanium". USENET group comp.sys.mac.advocacy. Retrieved 2013-12-19.
  13. ^ (Press release). AMD. 1999-10-05. Archived from the original on 2012-03-08. Retrieved 2022-08-15.
  14. ^ (Press release). AMD. 2000-08-10. Archived from the original on 2012-03-08. Retrieved 2022-08-15.
  15. ^ Gwennap, Linley (2001-06-04). "Itanium era dawns". EE Times. Archived from the original on 2019-12-17. Retrieved 2020-01-19.
  16. ^ ISA web site. Archived from the original on 2008-09-08. Retrieved 2007-05-16.
  17. ^ Niccolai, James (2008-05-20). "'Tukwila' Itanium servers due early next year, Intel says". Computerworld. Retrieved 2022-09-26.
  18. ^ Burt, Jeffrey (2010-02-08). "New Intel Itanium Offers Greater Performance, Memory Capacity". eWeek.
  19. ^ Merritt, Rick (2005-03-02). "Intel preps HyperTransport competitor for Xeon, Itanium CPUs". EE Times. Archived from the original on 2018-11-30. Retrieved 2018-11-30.
  20. ^ a b Tan, Aaron (2007-06-15). "Intel updates Itanium line with 'Kittson'". ZDNet. Archived from the original on 2020-08-27. Retrieved 2021-02-22.
  21. ^ Stokes, Jon (2009-02-05). "Intel delays quad Itanium to boost platform memory capacity". Ars Technica. Archived from the original on 2012-01-22. Retrieved 2009-02-05.
  22. ^ Ng, Jansen (2009-02-10). DailyTech. Archived from the original on 2009-02-13. Retrieved 2009-02-10.
  23. ^ "Poulson: The Future of Itanium Servers". realworldtech.com. 2011-05-18. Archived from the original on 2011-06-10. Retrieved 2011-05-24.
  24. ^ (PDF) (Press release). 2011-08-19. Archived from the original (PDF) on 2012-03-24. Retrieved 2011-08-19.
  25. ^ a b Riedlinger, Reid J.; Bhatia, Rohit; Biro, Larry; Bowhill, Bill; Fetzer, Eric; Gronowski, Paul; Grutkowski, Tom (2011-02-24). "A 32nm 3.1 billion transistor 12-wide-issue Itanium® processor for mission-critical servers". 2011 IEEE International Solid-State Circuits Conference. pp. 84–86. doi:10.1109/ISSCC.2011.5746230. ISBN 978-1-61284-303-2. S2CID 20112763.
  26. ^ Merritt, Rick (2010-11-23). "Researchers carve CPU into plastic foil". EE Times. Archived from the original on 2013-05-20. Retrieved 2020-01-19.
  27. ^ O'Brien, Terrence (2011-08-22). "Intel talks up next-gen Itanium: 32nm, 8-core Poulson". Engadget. Archived from the original on 2018-04-21. Retrieved 2020-01-19.
  28. ^ (PDF). Archived from the original (PDF) on 2012-03-02. Retrieved 2011-11-20.
  29. ^ Kanter, David (2010-11-17). "Preparing for Tukwila: The Next Generation of Intel's Itanium Processor Family". Real World Tech. Archived from the original on 2010-11-23. Retrieved 2010-11-17.
  30. ^ 2011-08-19. Archived from the original on 2012-02-11. Retrieved 2012-01-23.
  31. ^ "Intel Itanium Hotchips 2011 Overview". 2011-08-18. Archived from the original on 2012-02-14. Retrieved 2012-01-23.
  32. ^ a b Shilov, Anton (2019-01-31). "Intel to Discontinue Itanium 9700 'Kittson' Processor, the Last of the Itaniums". AnandTech. Archived from the original on 2019-04-16. Retrieved 2019-04-16.
  33. ^ "Product Change Notification" (PDF). 2019-01-30. Archived (PDF) from the original on 2019-02-01. Retrieved 2019-05-09.
  34. ^ "Intel Itanium Architecture Software Developer's Manual". Archived from the original on 2019-04-08. Retrieved 2019-04-08.
  35. ^ De Gelas, Johan (2005-11-09). "Itanium–Is there light at the end of the tunnel?". AnandTech. Archived from the original on 2012-05-03. Retrieved 2007-03-23.
  36. ^ September 2001. Archived from the original on 2008-11-20. Retrieved 2008-01-24.
  37. ^ Chen, Raymond (2015-07-27). "The Itanium processor, part 1: Warming up". Archived from the original on 2018-11-01. Retrieved 2018-10-31.
  38. ^ Chen, Raymond (2015-07-28). "The Itanium processor, part 2: Instruction encoding, templates, and stops". Archived from the original on 2018-11-01. Retrieved 2018-10-31.
  39. ^ Chen, Raymond (2015-07-29). "The Itanium processor, part 3: The Windows calling convention, how parameters are passed". Archived from the original on 2018-11-01. Retrieved 2018-10-31.
  40. ^ Sharangpani, Harsh; Arora, Ken (2000). "Itanium Processor Microarchitecture". IEEE Micro. pp. 38–39.
  41. ^ Cataldo, Anthony (2001-08-30). "Intel outfits Itanium processor for faster runs". EE Times. Archived from the original on 2020-08-01. Retrieved 2020-01-19.
  42. ^ Intel web site. Archived from the original on 2007-11-07. Retrieved 2007-05-16.

External links

  • Intel Itanium Home Page
  • Hewlett Packard Enterprise Integrity Servers Home Page
  • Intel Itanium Specifications
  • at the Wayback Machine (archived 2007-02-23)
  • Itanium Docs at HP

confused, with, intel, itanium, architecture, instruction, architecture, itanium, family, intel, microprocessors, basic, specification, originated, hewlett, packard, subsequently, implemented, intel, collaboration, with, first, itanium, processor, codenamed, m. Not to be confused with x86 64 IA 64 Intel Itanium architecture is the instruction set architecture ISA of the Itanium family of 64 bit Intel microprocessors The basic ISA specification originated at Hewlett Packard HP and was subsequently implemented by Intel in collaboration with HP The first Itanium processor codenamed Merced was released in 2001 Intel Itanium architectureDesignerHP and IntelBits64 bitIntroduced2001DesignEPICTypeRegister RegisterEncodingFixedBranchingCondition registerEndiannessSelectableRegistersGeneral purpose128 64 bits plus 1 trap bit 32 are static 96 use register windows 64 1 bit predicate registersFloating point128The Intel Itanium architecture The Itanium architecture is based on explicit instruction level parallelism in which the compiler decides which instructions to execute in parallel This contrasts with superscalar architectures which depend on the processor to manage instruction dependencies at runtime In all Itanium models up to and including Tukwila cores execute up to six instructions per clock cycle In 2008 Itanium was the fourth most deployed microprocessor architecture for enterprise class systems behind x86 64 Power ISA and SPARC 1 Contents 1 History 1 1 Development 1989 2000 1 1 1 Production 1 1 2 Marketing 1 2 Itanium Merced 2001 1 3 Itanium 2 2002 2010 1 4 Itanium 9300 Tukwila 2010 1 5 Itanium 9500 Poulson 2012 1 6 Itanium 9700 Kittson 2017 1 7 End of life 2021 2 Architecture 2 1 Instruction execution 2 2 Memory architecture 2 3 Architectural changes 3 See also 4 References 5 External linksHistory EditDevelopment 1989 2000 Edit In 1989 HP began to become concerned that reduced instruction set computing RISC architectures were approaching a processing limit at one 
instruction per cycle Both Intel and HP researchers had been exploring computer architecture options for future designs and separately began investigating a new concept known as very long instruction word VLIW 2 which came out of research by Yale University in the early 1980s 3 VLIW is a computer architecture concept like RISC and CISC where a single instruction word contains multiple instructions encoded in one very long instruction word to facilitate the processor executing multiple instructions in each clock cycle Typical VLIW implementations rely heavily on sophisticated compilers to determine at compile time which instructions can be executed at the same time and the proper scheduling of these instructions for execution and also to help predict the direction of branch operations The value of this approach is to do more useful work in fewer clock cycles and to simplify processor instruction scheduling and branch prediction hardware requirements with a penalty in increased processor complexity cost and energy consumption in exchange for faster execution Production Edit During this time HP had begun to believe that it was no longer cost effective for individual enterprise systems companies such as itself to develop proprietary microprocessors Intel had also been researching several architectural options for going beyond the x86 ISA to address high end enterprise server and high performance computing HPC requirements Intel and HP partnered in 1994 to develop the IA 64 ISA using a variation of VLIW design concepts which Intel named explicitly parallel instruction computing EPIC Intel s goal was to leverage the expertise HP had developed in their early VLIW work along with their own to develop a volume product line targeted at the aforementioned high end systems that could be sold to all original equipment manufacturers OEMs while HP wished to be able to purchase off the shelf processors built using Intel s volume manufacturing and contemporary process technology 
that were better than their PA RISC processors Intel took the lead on the design and commercialization process while HP contributes to the ISA definition the Merced Itanium microarchitecture and Itanium 2 The original goal year for delivering the first Itanium family product Merced was 1998 2 Marketing Edit Intel s product marketing and industry engagement efforts were substantial and achieved design wins with the majority of enterprise server OEMs including those based on RISC processors at the time Compaq and Silicon Graphics decided to abandon further development of the Alpha and MIPS architectures respectively in favor of migrating to IA 64 4 By 1997 it was apparent that the IA 64 architecture and the compiler were much more difficult to implement than originally thought and the delivery of Itanium began slipping 5 Since Itanium was the first ever EPIC processor the development effort encountered more unanticipated problems than the team was accustomed to In addition the EPIC concept depends on compiler capabilities that had never been implemented before so more research was needed 6 Several groups developed operating systems for the architecture including Microsoft Windows Unix and Unix like systems such as Linux HP UX FreeBSD Solaris 7 8 9 Tru64 UNIX 4 and Monterey 64 10 the last three were canceled before reaching the market In 1999 Intel led the formation of an open source industry consortium to port Linux to IA 64 they named Trillium and later renamed Trillian due to a trademark issue which was led by Intel and included Caldera Systems CERN Cygnus Solutions Hewlett Packard IBM Red Hat SGI SuSE TurboLinux and VA Linux Systems As a result a working IA 64 Linux was delivered ahead of schedule and was the first OS to run on the new Itanium processors Intel announced the official name of the processor Itanium on October 4 1999 11 Within hours the name Itanic had been coined on a Usenet newsgroup as a pun on the name Titanic the unsinkable ocean liner that sank 
on its maiden voyage in 1912.[12]

The very next day, on October 5, 1999, AMD announced plans to extend Intel's x86 instruction set with a fully backward-compatible 64-bit mode. It also revealed its forthcoming x86 64-bit architecture, which the company had already been working on, to be incorporated into its upcoming eighth-generation microprocessor, code-named SledgeHammer.[13] AMD also signaled that it would fully disclose the architecture's specifications, with further details to be available in August 2000.[14]

As AMD was never invited to be a contributing party for the IA-64 architecture and any kind of licensing seemed unlikely, AMD's AMD64 architecture extension was positioned from the beginning as an evolutionary way to add 64-bit computing capabilities to the existing x86 architecture while still supporting legacy 32-bit x86 code, as opposed to Intel's approach of creating an entirely new, completely x86-incompatible 64-bit architecture with IA-64.

Itanium (Merced): 2001

Itanium (Merced)
Itanium processor
General information
Launched: June 2001
Discontinued: June 2002
Common manufacturer(s): Intel
Performance
Max. CPU clock rate: 733 MHz to 800 MHz
FSB speeds: 266 MT/s
Cache
L2 cache: 96 KB
L3 cache: 2 or 4 MB
Architecture and classification
Instruction set: Itanium
Physical specifications
Cores: 1
Socket(s): PAC418
Products, models, variants
Core name(s): Merced

By the time Itanium was released in June 2001, its performance was not superior to competing RISC and CISC processors.[15]

Recognizing that the lack of software could be a serious problem for the future, Intel made thousands of these early systems available to independent software vendors (ISVs) to stimulate development. HP and Intel brought the next-generation Itanium 2 processor to market a year later.

Itanium 2: 2002–2010

Itanium 2 (McKinley)
Itanium 2 processor
General information
Launched: 2002
Discontinued: present
Designed by: Intel
Common manufacturer(s): Intel
Performance
Max. CPU clock rate: 733 MHz to 2.66 GHz
Cache
L2 cache: 256 KB on Itanium 2; 256 KB (D) + 1 MB (I) or 512
KB (I) on Itanium 2 9x00 series
L3 cache: 1.5–32 MB
Architecture and classification
Instruction set: Itanium
Physical specifications
Cores: 1, 2, 4 or 8
Socket(s): PAC611; LGA1248; FC-LGA6 (Itanium 9300 series)
Products, models, variants
Core name(s): McKinley, Madison, Hondo, Deerfield, Montecito, Montvale, Tukwila, Poulson

Figure: Itanium 2 in 2003

The Itanium 2 processor was released in 2002. It relieved many of the performance problems of the original Itanium processor, which were mostly caused by an inefficient memory subsystem.

In 2003, AMD released the Opteron, which implemented its own 64-bit architecture (x86-64). Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from x86. Intel responded by implementing x86-64 (as EM64T) in its Xeon microprocessors in 2004.[4]

In November 2005, the major Itanium server manufacturers joined with Intel and a number of software vendors to form the Itanium Solutions Alliance to promote the architecture and accelerate software porting.[16]

In 2006, Intel delivered Montecito (marketed as the Itanium 2 9000 series), a dual-core processor that roughly doubled performance and decreased energy consumption by about 20 percent.[17]

Itanium 9300 (Tukwila): 2010

Main article: Tukwila (processor)

The Itanium 9300 series processor, codenamed Tukwila, was released on 8 February 2010 with greater performance and memory capacity.[18] Tukwila had originally been slated for release in 2007.[19]

The device uses a 65 nm process and includes two to four cores, up to 24 MB of on-die cache, Hyper-Threading technology and integrated memory controllers. It implements double-device data correction (DDDC), which helps to fix memory errors. Tukwila also implements Intel QuickPath Interconnect (QPI) to replace the Itanium bus-based architecture. It has a peak interprocessor bandwidth of 96 GB/s and a peak memory bandwidth of 34 GB/s. With QuickPath, the processor has integrated memory controllers and interfaces the memory directly, using QPI interfaces to directly connect to other processors and I/O
hubs. QuickPath is also used on Intel processors using the Nehalem microarchitecture, which made it probable that Tukwila and Nehalem would be able to use the same chipsets.[20] Tukwila incorporates four memory controllers, each of which supports multiple DDR3 DIMMs via a separate memory controller,[21] much like the Nehalem-based Xeon processor code-named Beckton.[22]

Itanium 9500 (Poulson): 2012

The Itanium 9500 series processor, codenamed Poulson, is the follow-on processor to Tukwila. It features eight cores, a 12-wide issue architecture, multithreading enhancements, and new instructions to take advantage of parallelism, especially in virtualization.[20][23][24] The Poulson L3 cache size is 32 MB; the L2 cache size is 6 MB (512 KB instruction and 256 KB data per core).[25] The die size is 544 mm², less than that of its predecessor Tukwila (698.75 mm²).[26][27]

At ISSCC 2011, Intel presented a paper called "A 32nm 3.1 Billion Transistor 12-Wide-Issue Itanium Processor for Mission Critical Servers".[25][28] Given Intel's history of disclosing details about Itanium microprocessors at ISSCC, this paper most likely refers to Poulson. Analyst David Kanter speculated that Poulson would use a new microarchitecture, with a more advanced form of multithreading that uses as many as two threads, to improve performance for single-threaded and multithreaded workloads.[29] Some new information was released at the Hot Chips conference.[30][31]

The new information covers improvements in multithreading, resiliency improvements (Instruction Replay RAS) and a few new instructions (thread priority, integer instruction, cache prefetching, data access hints).

Itanium 9700 (Kittson): 2017

Kittson is the same as the 9500 Poulson, but slightly higher clocked.[32]

End of life: 2021

In January 2019, Intel announced that Kittson would be discontinued, with a last order date of January 2020 and a last ship date of July 2021.[32][33] There is no planned
successor.

Architecture

For AMD64 and Intel64 architecture, see x86-64.

Intel has extensively documented the Itanium instruction set[34] and the technical press has provided overviews.[35][5]

The architecture has been renamed several times during its history. HP originally called it PA-WideWord. Intel later called it IA-64, then Itanium Processor Architecture (IPA),[36] before settling on Intel Itanium Architecture, but it is still widely referred to as IA-64.

It is a 64-bit, register-rich, explicitly parallel architecture. The base data word is 64 bits, byte-addressable. The logical address space is 2^64 bytes. The architecture implements predication, speculation, and branch prediction. It uses variable-sized register windowing for parameter passing. The same mechanism is also used to permit parallel execution of loops. Speculation, prediction, predication, and renaming are under control of the compiler: each instruction word includes extra bits for this. This approach is the distinguishing characteristic of the architecture.

The architecture implements a large number of registers:[37][38][39]

128 general integer registers, which are 64-bit plus one trap bit ("NaT", which stands for "not a thing") used for speculative execution. 32 of these are static; the other 96 are stacked using variably sized register windows, or rotating for pipelined loops. gr0 always reads 0.
128 floating-point registers. The floating-point registers are 82 bits long to preserve precision for intermediate results. Instead of a dedicated "NaT" trap bit like the integer registers, floating-point registers have a trap value called "NaTVal" ("Not a Thing Value"), similar to (but distinct from) NaN. These also have 32 static registers and 96 windowed or rotating registers. fr0 always reads +0.0, and fr1 always reads +1.0.
64 one-bit predicate registers. These have 16 static registers and 48 windowed or rotating registers. pr0 always reads 1 (true).
8 branch registers, for the addresses of indirect jumps. br0 is set to
the return address when a function is called with br.call.
128 special-purpose (or "application") registers, which are mostly of interest to the kernel and not ordinary applications. For example, one register called bsp points to the second stack, which is where the hardware will automatically spill registers when the register window wraps around.

Each 128-bit instruction word is called a bundle, and contains three slots each holding a 41-bit instruction, plus a 5-bit template indicating which type of instruction is in each slot (3 × 41 + 5 = 128 bits). Those types are M-unit (memory instructions), I-unit (integer ALU, non-ALU integer, or long immediate extended instructions), F-unit (floating-point instructions), or B-unit (branch or long branch extended instructions). The template also encodes stops, which indicate that a data dependency exists between data before and after the stop. All instructions between a pair of stops constitute an instruction group, regardless of their bundling, and must be free of many types of data dependencies. This knowledge allows the processor to execute instructions in parallel without having to perform its own complicated data analysis, since that analysis was already done when the instructions were written.

Within each slot, all but a few instructions are predicated, specifying a predicate register, the value of which (true or false) will determine whether the instruction is executed. Predicated instructions which should always execute are predicated on pr0, which always reads as true.

The IA-64 assembly language and instruction format were deliberately designed to be written mainly by compilers, not by humans. Instructions must be grouped into bundles of three, ensuring that the three instructions match an allowed template. Instructions must issue stops between certain types of data dependencies, and stops can only be used in limited places according to the allowed templates.

Instruction execution

The fetch mechanism can read up to two bundles per clock from the L1 cache into the
pipeline. When the compiler can take maximum advantage of this, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit can execute a particular subset of the instruction set, and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units.

The execution unit groups include:

Six general-purpose ALUs, two integer units, one shift unit
Four data cache units
Six multimedia units, two parallel shift units, one parallel multiply, one population count
Two 82-bit floating-point multiply-accumulate units, two SIMD floating-point multiply-accumulate units (two 32-bit operations each)[40]
Three branch units

Ideally, the compiler can group instructions into sets of six that can execute at the same time. Since the floating-point units implement a multiply-accumulate operation, a single floating-point instruction can perform the work of two instructions when the application requires a multiply followed by an add, which is very common in scientific processing. When it occurs, the processor can execute four FLOPs per cycle (two units, each performing a multiply and an add). For example, the 800 MHz Itanium had a theoretical rating of 3.2 GFLOPS and the fastest Itanium 2, at 1.67 GHz, was rated at 6.67 GFLOPS.

In practice, the processor may often be underutilized, with not all slots filled with useful instructions due to, for example, data dependencies or limitations in the available bundle templates. The densest possible code requires 42.6 bits per instruction (one 128-bit bundle shared by three instructions), compared to 32 bits per instruction on traditional RISC processors of the time, and no-ops due to wasted slots further decrease the density of code. Additional instructions for speculative loads and hints for branches and cache are difficult to generate optimally, even with modern compilers.

Memory architecture

From 2002 to 2006, Itanium 2 processors shared
a common cache hierarchy. They had 16 KB of Level 1 instruction cache and 16 KB of Level 1 data cache. The L2 cache was unified (both instruction and data) and was 256 KB. The Level 3 cache was also unified and varied in size from 1.5 MB to 24 MB. The 256 KB L2 cache contains sufficient logic to handle semaphore operations without disturbing the main arithmetic logic unit (ALU).

Main memory is accessed through a bus to an off-chip chipset. The Itanium 2 bus was initially called the McKinley bus, but is now usually referred to as the Itanium bus. The speed of the bus has increased steadily with new processor releases. The bus transfers 2×128 bits (32 bytes) per clock cycle, so the 200 MHz McKinley bus transferred 6.4 GB/s, and the 533 MHz Montecito bus transfers 17.056 GB/s.[41]

Architectural changes

"Intel VT-i" redirects here. For the x86 virtualization extensions, see Intel VT-x.

Itanium processors released prior to 2006 had hardware support for the IA-32 architecture to permit support for legacy server applications, but performance for IA-32 code was much worse than for native code and also worse than the performance of contemporaneous x86 processors. In 2005, Intel developed the IA-32 Execution Layer (IA-32 EL), a software emulator that provides better performance. With Montecito, Intel therefore eliminated hardware support for IA-32 code.

In 2006, with the release of Montecito, Intel made a number of enhancements to the basic processor architecture, including:[42]

Hardware multithreading: Each processor core maintains context for two threads of execution. When one thread stalls during memory access, the other thread can execute. Intel calls this "coarse multithreading" to distinguish it from the "hyper-threading technology" Intel integrated into some x86 and x86-64 microprocessors.
Hardware support for virtualization: Intel added Intel Virtualization Technology (Intel VT-i), which provides hardware assists for core virtualization functions. Virtualization allows a software "hypervisor" to run multiple operating system
instances on the processor concurrently.
Cache enhancements: Montecito added a split L2 cache, which included a dedicated 1 MB L2 cache for instructions. The original 256 KB L2 cache was converted to a dedicated data cache. Montecito also included up to 12 MB of on-die L3 cache.

See also

List of Intel Itanium microprocessors

References

1. Morgan, Timothy (2008-05-27). "The Server Biz Enjoys the X64 Upgrade Cycle in Q1". IT Jungle. Archived from the original on 2016-03-03. Retrieved 2008-10-29.
2. "Inventing Itanium: How HP Labs Helped Create the Next-Generation Chip Architecture". HP Labs. June 2001. Archived from the original on 2012-03-04. Retrieved 2007-03-23.
3. Fisher, Joseph A. (1983). "Very Long Instruction Word architectures and the ELI-512". Proceedings of the 10th annual international symposium on Computer architecture. International Symposium on Computer Architecture. New York, NY, USA: Association for Computing Machinery (ACM). pp. 140–150. doi:10.1145/800046.801649. ISBN 0-89791-101-6.
4. "Itanium: A cautionary tale". Tech News on ZDNet. 2005-12-07. Archived from the original on 2008-02-09. Retrieved 2007-11-01.
5. Shankland, Stephen (1999-07-08). "Intel's Merced chip may slip further". CNET News. Archived from the original on 2012-10-24. Retrieved 2008-10-16.
6. "Microprocessors: VLIW, The Past" (PDF). NY University. 2002-04-18. Archived (PDF) from the original on 2018-06-27. Retrieved 2018-06-26.
7. Vijayan, Jaikumar (1999-09-01). "Solaris for IA-64 coming this fall". Computerworld. Archived from the original on 2000-01-05.
8. Wolfe, Alexander (1999-09-02). "Core-logic efforts under way for Merced". EE Times. Archived from the original on 2016-03-06. Retrieved February 27, 2016.
9. "Sun Introduces Solaris Developer Kit for Intel to Speed Development of Applications On Solaris; Award-winning Sun Tools Help ISVs Easily Develop for Solaris on Intel Today". Business Wire. 1998-03-10. Archived from the original on 2004-09-20. Retrieved 2008-10-16.
10. "Next-generation chip passes key milestone". CNET News.com. 1999-09-17. Archived from the
original on 2011-08-09. Retrieved 2007-11-01.
11. Kanellos, Michael (1999-10-04). "Intel names Merced chip Itanium". CNET News.com. Archived from the original on 2015-12-30. Retrieved 2007-04-30.
12. Finstad, Kraig (1999-10-04). "Re: Itanium". USENET group comp.sys.mac.advocacy. Retrieved 2013-12-19.
13. "AMD Discloses New Technologies At Microprocessor Forum" (Press release). AMD. October 5, 1999. Archived from the original on March 8, 2012. Retrieved August 15, 2022.
14. "AMD Releases x86-64 Architectural Specification; Enables Market Driven Migration to 64-Bit Computing" (Press release). AMD. August 10, 2000. Archived from the original on March 8, 2012. Retrieved August 15, 2022.
15. Linley Gwennap (2001-06-04). "Itanium era dawns". EE Times. Archived from the original on 2019-12-17. Retrieved 2020-01-19.
16. "Itanium Solutions Alliance". ISA web site. Archived from the original on 2008-09-08. Retrieved 2007-05-16.
17. Niccolai, James (2008-05-20). "Tukwila Itanium servers due early next year, Intel says". Computerworld. Retrieved 2022-09-26.
18. Burt, Jeffrey (2010-02-08). "New Intel Itanium Offers Greater Performance, Memory Capacity". eWeek.
19. Merritt, Rick (2005-03-02). "Intel preps HyperTransport competitor for Xeon, Itanium CPUs". EE Times. Archived from the original on 2018-11-30. Retrieved 2018-11-30.
20. Tan, Aaron (2007-06-15). "Intel updates Itanium line with Kittson". ZDNet. Archived from the original on 2020-08-27. Retrieved 2021-02-22.
21. Stokes, Jon (2009-02-05). "Intel delays quad Itanium to boost platform memory capacity". Ars Technica. Archived from the original on 2012-01-22. Retrieved 2009-02-05.
22. Ng, Jansen (10 February 2009). "Intel Aims for Efficiency With New Server Roadmap". DailyTech. Archived from the original on 2009-02-13. Retrieved 2009-02-10.
23. "Poulson: The Future of Itanium Servers". realworldtech.com. 2011-05-18. Archived from the original on 2011-06-10. Retrieved 2011-05-24.
24. "Intel Discloses Architecture Features of Next Itanium Processor at Hot Chips 2011" (PDF) (Press release). 2011-08-19. Archived from the original (PDF) on 2012-03-24. Retrieved 2011-08-19.
25. Riedlinger, Reid J.; Bhatia, Rohit; Biro, Larry;
Bowhill, Bill; Fetzer, Eric; Gronowski, Paul; Grutkowski, Tom (2011-02-24). "A 32nm 3.1 billion transistor 12-wide-issue Itanium processor for mission-critical servers". 2011 IEEE International Solid-State Circuits Conference. pp. 84–86. doi:10.1109/ISSCC.2011.5746230. ISBN 978-1-61284-303-2. S2CID 20112763.
26. Merritt, Rick (2010-11-23). "Researchers carve CPU into plastic foil". EE Times. Archived from the original on 2013-05-20. Retrieved 2020-01-19.
27. O'Brien, Terrence (2011-08-22). "Intel talks up next-gen Itanium: 32nm, 8-core Poulson". Engadget. Archived from the original on 2018-04-21. Retrieved 2020-01-19.
28. "ISSCC 2011" (PDF). Archived from the original (PDF) on 2012-03-02. Retrieved 2011-11-20.
29. Kanter, David (2010-11-17). "Preparing for Tukwila: The Next Generation of Intel's Itanium Processor Family". Real World Tech. Archived from the original on 2010-11-23. Retrieved 2010-11-17.
30. "Itanium Poulson Update: Greater Parallelism, New Instruction Replay & More; Catch the details from Hotchips". 2011-08-19. Archived from the original on 2012-02-11. Retrieved 2012-01-23.
31. "Intel Itanium Hotchips 2011 Overview". 2011-08-18. Archived from the original on 2012-02-14. Retrieved 2012-01-23.
32. Anton Shilov (January 31, 2019). "Intel to Discontinue Itanium 9700 Kittson Processor, the Last of the Itaniums". AnandTech. Archived from the original on April 16, 2019. Retrieved April 16, 2019.
33. "Product Change Notification" (PDF). January 30, 2019. Archived (PDF) from the original on February 1, 2019. Retrieved May 9, 2019.
34. "Intel Itanium Architecture Software Developer's Manual". Archived from the original on 2019-04-08. Retrieved 2019-04-08.
35. De Gelas, Johan (2005-11-09). "Itanium: Is there light at the end of the tunnel?". AnandTech. Archived from the original on 2012-05-03. Retrieved 2007-03-23.
36. "HPWorks Newsletter". September 2001. Archived from the original on 2008-11-20. Retrieved 2008-01-24.
37. Chen, Raymond (2015-07-27). "The Itanium processor, part 1: Warming up". Archived from the original on 2018-11-01. Retrieved 2018-10-31.
38. Chen, Raymond (2015-07-28). "The Itanium processor, part 2: Instruction encoding,
templates, and stops". Archived from the original on 2018-11-01. Retrieved 2018-10-31.
39. Chen, Raymond (2015-07-29). "The Itanium processor, part 3: The Windows calling convention, how parameters are passed". Archived from the original on 2018-11-01. Retrieved 2018-10-31.
40. Sharangpani, Harsh; Arora, Ken (2000). "Itanium Processor Microarchitecture". IEEE Micro. pp. 38–39.
41. Cataldo, Anthony (2001-08-30). "Intel outfits Itanium processor for faster runs". EE Times. Archived from the original on 2020-08-01. Retrieved 2020-01-19.
42. "Intel product announcement". Intel web site. Archived from the original on November 7, 2007. Retrieved 2007-05-16.

External links

Intel Itanium Home Page
Hewlett Packard Enterprise Integrity Servers Home Page
Intel Itanium Specifications
Some undocumented Itanium 2 microarchitectural information at the Wayback Machine (archived 2007-02-23)
IA-64 tutorial, including code examples
Itanium Docs at HP

Retrieved from "https://en.wikipedia.org/w/index.php?title=IA-64&oldid=1112615881"
