
ARM big.LITTLE

ARM big.LITTLE is a heterogeneous computing architecture developed by ARM Holdings that couples relatively power-efficient, slower processor cores (LITTLE) with relatively more powerful, power-hungry ones (big). Typically, only one "side" or the other is active at a time, but all cores have access to the same memory regions, so workloads can be moved between big and LITTLE cores on the fly.[1] The intention is to create a multi-core processor that adapts better to dynamic computing needs and uses less power than clock scaling alone. ARM's marketing material promises power savings of up to 75% for some activities.[2] Most commonly, ARM big.LITTLE architectures are used to create a multi-processor system-on-chip (MPSoC).

[Figure: Cortex-A57/A53 MPCore big.LITTLE CPU chip]

In October 2011, big.LITTLE was announced along with the Cortex-A7, which was designed to be architecturally compatible with the Cortex-A15.[3] In October 2012, ARM announced the Cortex-A53 and Cortex-A57 (ARMv8-A) cores, which are also compatible with each other so that they can be used together in a big.LITTLE chip.[4] ARM later announced the Cortex-A12 at Computex 2013, followed by the Cortex-A17 in February 2014. Both the Cortex-A12 and the Cortex-A17 can also be paired with the Cortex-A7 in a big.LITTLE configuration.[5][6]

The problem that big.LITTLE solves

For a given library of CMOS logic, active power increases as the logic switches more times per second, while leakage increases with the number of transistors. CPUs designed to run fast are therefore different from CPUs designed to save power. When a very fast out-of-order CPU is idling at very low speeds, a CPU with much less leakage (fewer transistors) could do the same work. For example, it might use a smaller memory cache (fewer transistors) or a simpler microarchitecture, such as an in-order pipeline. big.LITTLE is a way to optimize for both cases, power and speed, in the same system.
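The trade-off described above can be illustrated with a first-order power model, in which total power is the sum of a dynamic term (switched capacitance × voltage² × frequency) and a constant leakage term. The following is a minimal sketch only: the `CoreModel` class and every numeric value in it are hypothetical, chosen for illustration rather than taken from any real ARM core, and the comparison ignores differences in work done per clock cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreModel:
    """Hypothetical first-order power model of a single CPU core."""
    name: str
    switched_capacitance_nf: float  # effective switched capacitance in nF (illustrative)
    leakage_w: float                # static leakage while powered, in W (illustrative)
    max_freq_mhz: float             # highest supported clock in MHz (illustrative)

    def power_w(self, freq_mhz: float, volts: float) -> float:
        """Return dynamic power (C * V^2 * f) plus leakage, in watts."""
        if freq_mhz > self.max_freq_mhz:
            raise ValueError(f"{self.name} cannot run at {freq_mhz} MHz")
        dynamic = self.switched_capacitance_nf * 1e-9 * volts ** 2 * freq_mhz * 1e6
        return dynamic + self.leakage_w


# Assumed, made-up parameters: the big core has more transistors,
# hence more switched capacitance and more leakage.
BIG = CoreModel("big", switched_capacitance_nf=1.0, leakage_w=0.20, max_freq_mhz=2000)
LITTLE = CoreModel("LITTLE", switched_capacitance_nf=0.3, leakage_w=0.02, max_freq_mhz=1200)

if __name__ == "__main__":
    for core in (BIG, LITTLE):
        print(core.name, round(core.power_w(freq_mhz=500, volts=0.9), 4), "W")
```

Under these assumed numbers, the LITTLE core draws far less power at a low clock, while only the big core can reach the highest frequencies; this is the gap that big.LITTLE is meant to cover.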

In practice, a big.LITTLE system can be surprisingly inflexible. One issue is the number and types of power and clock domains that the IC provides. These may not match the standard power management features offered by an operating system. Another is that the CPUs no longer have equivalent abilities, and matching the right software task to the right CPU becomes more difficult. Most of these problems are being solved by making the electronics and software more flexible.

Run-state migration

There are three ways[7] for the different processor cores to be arranged in a big.LITTLE design, depending on the scheduler implemented in the kernel.[8]

Clustered switching

 
[Figure: big.LITTLE clustered switching]

The clustered model is the first and simplest implementation. It arranges the processor into identically sized clusters of "big" or "LITTLE" cores. The operating system scheduler can see only one cluster at a time; when the load on the whole processor changes between low and high, the system transitions to the other cluster. All relevant data are passed through the common L2 cache, the active cluster is powered off, and the other one is activated. A Cache Coherent Interconnect (CCI) is used. This model has been implemented in the Samsung Exynos 5 Octa (5410).[9]
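The cluster-level decision can be sketched as a small state machine. This is an illustrative model, not the logic of any shipping switcher: the `ClusterSwitcher` class, its two thresholds, and the use of hysteresis (separate up and down thresholds to avoid rapid back-and-forth switching) are assumptions introduced for the example.

```python
class ClusterSwitcher:
    """Toy model of clustered switching: one cluster is active at a time."""

    def __init__(self, up_threshold: float = 0.8, down_threshold: float = 0.3):
        if not 0.0 <= down_threshold < up_threshold <= 1.0:
            raise ValueError("need 0 <= down_threshold < up_threshold <= 1")
        self.up_threshold = up_threshold      # assumed load above which big is used
        self.down_threshold = down_threshold  # assumed load below which LITTLE is used
        self.active = "LITTLE"                # start on the low-power cluster
        self.switch_count = 0

    def update(self, load: float) -> str:
        """Take whole-processor load in [0, 1] and return the active cluster."""
        if self.active == "LITTLE" and load > self.up_threshold:
            self._switch_to("big")
        elif self.active == "big" and load < self.down_threshold:
            self._switch_to("LITTLE")
        return self.active

    def _switch_to(self, target: str) -> None:
        # On real hardware, state would pass through the shared L2 cache and
        # the outgoing cluster would be powered off; here only the bookkeeping
        # is modelled.
        self.active = target
        self.switch_count += 1


if __name__ == "__main__":
    switcher = ClusterSwitcher()
    print([switcher.update(load) for load in (0.2, 0.9, 0.5, 0.2, 0.1)])
```

Because the whole cluster switches at once, a single demanding thread is enough to move every running task onto the big cores in this model.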

In-kernel switcher (CPU migration)

 
[Figure: big.LITTLE in-kernel switcher]

CPU migration via the in-kernel switcher (IKS) involves pairing a "big" core with a "LITTLE" core, with possibly many identical pairs in one chip. Each pair operates as one so-called virtual core, and only one real core is (fully) powered up and running at a time. The "big" core is used when demand is high and the "LITTLE" core when demand is low. When demand on the virtual core changes (between high and low), the incoming core is powered up, the running state is transferred, the outgoing core is shut down, and processing continues on the new core. Switching is done via the cpufreq framework. A complete big.LITTLE IKS implementation was added in Linux 3.11. big.LITTLE IKS is an improvement on cluster migration (see Clustered switching), the main difference being that each pair is visible to the scheduler.
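The order of operations in a switch (power up the incoming core, transfer state, shut down the outgoing core) can be sketched as follows. The `PhysicalCore` and `VirtualCore` classes are hypothetical stand-ins, and the `state` dictionary is only a placeholder for the register and context state that real hardware would move.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalCore:
    kind: str                                  # "big" or "LITTLE"
    powered: bool = False
    state: dict = field(default_factory=dict)  # placeholder for running state


class VirtualCore:
    """One big/LITTLE pair presented as a single core (illustrative model)."""

    def __init__(self) -> None:
        self.cores = {
            "big": PhysicalCore("big"),
            "LITTLE": PhysicalCore("LITTLE", powered=True),
        }
        self.active = "LITTLE"
        self.log: list[str] = []

    def switch(self, target: str) -> None:
        if target not in self.cores:
            raise ValueError(f"unknown core kind: {target}")
        if target == self.active:
            return  # already running on the requested core
        incoming = self.cores[target]
        outgoing = self.cores[self.active]
        # 1. Power up the incoming core.
        incoming.powered = True
        self.log.append(f"power up {target}")
        # 2. Transfer the running state.
        incoming.state, outgoing.state = outgoing.state, {}
        self.log.append("transfer state")
        # 3. Shut down the outgoing core; processing continues on the new one.
        outgoing.powered = False
        self.log.append(f"power down {self.active}")
        self.active = target


if __name__ == "__main__":
    vc = VirtualCore()
    vc.cores["LITTLE"].state["pc"] = 0x1000
    vc.switch("big")
    print(vc.active, vc.log)
```

Both cores are briefly powered during the hand-over in this sketch, which is consistent with the text's qualification that only one core is "(fully)" powered at a time.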

A more complex arrangement involves a non-symmetric grouping of 'big' and 'LITTLE' cores. A single chip could have one or two 'big' cores and many more 'LITTLE' cores, or vice versa. Nvidia created something similar to this with the low-power 'companion core' in their Tegra 3 System-on-Chip.

Heterogeneous multi-processing (global task scheduling)

 
[Figure: big.LITTLE heterogeneous multi-processing]

The most powerful use model of the big.LITTLE architecture is Heterogeneous Multi-Processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or high computational intensity can in this case be allocated to the "big" cores, while threads with lower priority or lower computational intensity, such as background tasks, can be run on the "LITTLE" cores.[10]
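A placement rule of this kind can be sketched in a few lines. The `Task` type, the `up_threshold` value, and the rule itself (send a thread to a big core if it is high priority or its load is at or above the threshold) are assumptions made for illustration; real schedulers use more elaborate load tracking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    load: float                  # tracked computational intensity in [0, 1]
    high_priority: bool = False


def choose_cluster(task: Task, up_threshold: float = 0.6) -> str:
    """Return "big" for demanding or high-priority tasks, else "LITTLE"."""
    if task.high_priority or task.load >= up_threshold:
        return "big"
    return "LITTLE"


def partition(tasks: list[Task], up_threshold: float = 0.6) -> dict[str, list[str]]:
    """Group task names by the cluster they would be placed on."""
    placement: dict[str, list[str]] = {"big": [], "LITTLE": []}
    for task in tasks:
        placement[choose_cluster(task, up_threshold)].append(task.name)
    return placement


if __name__ == "__main__":
    tasks = [
        Task("game-render", load=0.9),
        Task("ui-thread", load=0.2, high_priority=True),
        Task("mail-sync", load=0.1),
    ]
    print(partition(tasks))
```

Unlike the switching models, both lists can be non-empty at the same time, since all physical cores are available to the scheduler.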

This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series (5420, 5422, 5430),[11][12] and Apple A series processors starting with the Apple A11.[13]

Scheduling

The paired arrangement allows switching to be done transparently to the operating system using the existing dynamic voltage and frequency scaling (DVFS) facility. The existing DVFS support in the kernel (e.g. cpufreq in Linux) simply sees a list of frequencies/voltages and switches between them as it sees fit, just as it does on existing hardware. However, the low-end slots activate the "LITTLE" core and the high-end slots activate the "big" core. This is the early solution provided by Linux's "deadline" CPU scheduler (not to be confused with the I/O scheduler with the same name) since 2012.[14]
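The idea of a single frequency list whose low-end slots map to the LITTLE core and whose high-end slots map to the big core can be sketched as a lookup table. The table below is hypothetical: the `OperatingPoint` type, the frequencies, and the distinction between a "virtual" frequency seen by the governor and the "real" clock of the selected core are assumptions for illustration, not values from any actual driver.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    virtual_khz: int  # frequency as seen by the DVFS governor
    core: str         # physical core that serves this slot
    real_khz: int     # clock actually programmed on that core


# Hypothetical table, sorted by ascending virtual frequency.
VIRTUAL_OPP_TABLE = [
    OperatingPoint(300_000, "LITTLE", 600_000),
    OperatingPoint(500_000, "LITTLE", 1_000_000),
    OperatingPoint(800_000, "big", 800_000),
    OperatingPoint(1_600_000, "big", 1_600_000),
]


def resolve(target_khz: int) -> OperatingPoint:
    """Return the lowest slot that meets the request, or the highest slot."""
    for opp in VIRTUAL_OPP_TABLE:
        if opp.virtual_khz >= target_khz:
            return opp
    return VIRTUAL_OPP_TABLE[-1]


if __name__ == "__main__":
    for request in (200_000, 400_000, 700_000, 2_000_000):
        print(request, resolve(request))
```

From the governor's point of view this is an ordinary frequency change; crossing from the second slot to the third is what triggers a core switch underneath.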

Alternatively, all the cores may be exposed to the kernel scheduler, which decides where each process or thread is executed. This is required for the non-paired arrangement but could also be used on paired cores. It poses unique problems for the kernel scheduler, which, at least with modern commodity hardware, has been able to assume that all cores in an SMP system are equal rather than heterogeneous. Energy Aware Scheduling, added in Linux 5.0 in 2019, is an example of a scheduler that treats cores differently.[15][16]
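What it means for a scheduler to treat cores differently can be illustrated with a heavily simplified placement function: among the CPUs with enough spare capacity for a task, pick the one with the lowest estimated energy cost. This is a sketch of the general idea only and is not the algorithm used by Linux's Energy Aware Scheduling; the `Cpu` type, the capacity units, and the linear `cost_per_unit` energy estimate are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cpu:
    name: str
    capacity: int         # maximum compute capacity, arbitrary units (big > LITTLE)
    util: int             # utilisation already placed on this CPU, same units
    cost_per_unit: float  # hypothetical energy cost per unit of utilisation


def pick_cpu(cpus: list[Cpu], task_util: int) -> Optional[Cpu]:
    """Return the cheapest CPU that can fit the task, or None if none fits."""
    candidates = [c for c in cpus if c.util + task_util <= c.capacity]
    if not candidates:
        return None
    # Assumed linear energy model: added energy is cost_per_unit * task_util.
    return min(candidates, key=lambda c: c.cost_per_unit * task_util)


if __name__ == "__main__":
    cpus = [
        Cpu("little0", capacity=400, util=100, cost_per_unit=1.0),
        Cpu("big0", capacity=1024, util=0, cost_per_unit=3.0),
    ]
    print(pick_cpu(cpus, 200).name)  # small task fits on the LITTLE core
    print(pick_cpu(cpus, 500).name)  # larger task only fits on the big core
```

A scheduler that assumes identical cores would have no reason to prefer one CPU over the other here, which is the problem the paragraph above describes.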

Advantages of global task scheduling

  • Finer-grained control of workloads that are migrated between cores. Because the scheduler is directly migrating tasks between cores, kernel overhead is reduced and power savings can be correspondingly increased.
  • Implementation in the scheduler also makes switching decisions faster than in the cpufreq framework used by IKS.
  • The ability to easily support non-symmetrical clusters (e.g. with 2 Cortex-A15 cores and 4 Cortex-A7 cores).
  • The ability to use all cores simultaneously to provide improved peak performance throughput of the SoC compared to IKS.
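The peak-throughput advantage in the last point can be made concrete with simple arithmetic. The functions below are an illustrative sketch; the per-core throughput figures passed to them are hypothetical relative units, and throughput is assumed to scale linearly with the number of active cores.

```python
def peak_throughput_iks(pairs: int, big_perf: float, little_perf: float) -> float:
    """IKS: each big/LITTLE pair runs only one core at a time."""
    return pairs * max(big_perf, little_perf)


def peak_throughput_gts(n_big: int, n_little: int,
                        big_perf: float, little_perf: float) -> float:
    """Global task scheduling: every physical core can run simultaneously."""
    return n_big * big_perf + n_little * little_perf


if __name__ == "__main__":
    # Assumed relative per-core throughput: big = 2.0, LITTLE = 1.0.
    print(peak_throughput_iks(pairs=4, big_perf=2.0, little_perf=1.0))
    print(peak_throughput_gts(n_big=4, n_little=4, big_perf=2.0, little_perf=1.0))
    # A non-symmetrical layout (2 big + 4 LITTLE) is also expressible.
    print(peak_throughput_gts(n_big=2, n_little=4, big_perf=2.0, little_perf=1.0))
```

With these assumed figures, a 4+4 chip peaks at 8.0 units under IKS and 12.0 under global task scheduling, and the same function handles non-symmetrical layouts that the paired model cannot represent.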

Successor

In May 2017, ARM announced DynamIQ as the successor to big.LITTLE.[17] DynamIQ is expected to allow more flexibility and scalability in the design of multi-core processors. In contrast to big.LITTLE, it increases the maximum number of cores in a cluster to 8, allows different core designs within a single cluster, and supports up to 32 clusters in total. The technology also offers finer-grained per-core voltage control and faster L2 cache speeds. However, DynamIQ is incompatible with previous ARM designs and is initially supported only by the Cortex-A75 and Cortex-A55 CPU cores.

References

  1. ^ Nguyen, Hubert (17 January 2013). "What Is ARM big.LITTLE?". UberGizmo.com.
  2. ^ "big.LITTLE technology". ARM.com. Archived from the original on 22 October 2012. Retrieved 17 October 2012.
  3. ^ "ARM Unveils its Most Energy Efficient Application Processor Ever; Redefines Traditional Power And Performance Relationship With big.LITTLE Processing" (Press release). ARM Holdings. 19 October 2011. Retrieved 31 October 2012.
  4. ^ "ARM Launches Cortex-A50 Series, the World's Most Energy-Efficient 64-bit Processors" (Press release). ARM Holdings. Retrieved 31 October 2012.
  5. ^ "ARM's new Cortex-A12 is ready to power 2014's $200 midrange smartphones". The Verge. April 2014.
  6. ^ "ARM Cortex A17: An Evolved Cortex A12 for the Mainstream in 2015". AnandTech. April 2014.
  7. ^ Brian Jeff (18 June 2013). "Ten Things to Know About big.LITTLE". ARM Holdings. Archived from the original on 10 September 2013. Retrieved 17 September 2013.
  8. ^ George Grey (10 July 2013). "big.LITTLE Software Update". Linaro. Archived from the original on 4 October 2013. Retrieved 17 September 2013.
  9. ^ Peter Clarke (6 August 2013). "Benchmarking ARM's big-little architecture". Retrieved 17 September 2013.
  10. ^ "Big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7" (PDF), ARM Holdings, September 2013, archived from the original (PDF) on 17 April 2012, retrieved 17 September 2013
  11. ^ Brian Klug (11 September 2013). "Samsung Announces big.LITTLE MP Support in Exynos 5420". AnandTech. Retrieved 16 September 2013.
  12. ^ "Samsung Unveils New Products from its System LSI Business at Mobile World Congress". Samsung Tomorrow. Retrieved 26 February 2013.
  13. ^ "The future is here: iPhone X". Apple Newsroom. Retrieved 25 February 2018.
  14. ^ McKenney, Paul (12 June 2012). "A big.LITTLE scheduler update". LWN.net.
  15. ^ Perret, Quentin (25 February 2019). "Energy Aware Scheduling merged in Linux 5.0". community.arm.com.
  16. ^ "Energy Aware Scheduling". The Linux Kernel documentation.
  17. ^ Humrick, Matt (29 May 2017). "Exploring Dynamiq and ARM's New CPUs". AnandTech. Retrieved 10 July 2017.

Further reading

  • David Zinman (25 January 2013). "big.LITTLE MP status Jan 25, 2013". LWN.net. Retrieved 25 January 2013.
  • Nicolas Pitre (15 February 2012). "Linux support for ARM big.LITTLE". LWN.net. Retrieved 18 October 2012.
  • Paul McKenney (12 June 2012). "A big.LITTLE scheduler update". LWN.net. Retrieved 18 October 2012.
  • Jake Edge (5 September 2012). "KS2012: ARM: A big.LITTLE update". LWN.net. Retrieved 18 October 2012.
  • Jon Stokes (20 October 2011). "ARM's new Cortex A7 is tailor-made for Android superphones". Ars Technica. Retrieved 31 October 2012.
  • Andrew Cunningham (30 October 2012). "ARM goes 64-bit with new Cortex-A53 and Cortex-A57 designs". Ars Technica. Retrieved 31 October 2012.

External links

  • "big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7" (PDF) (full technical explanation)
