fbpx
Wikipedia

Thunk

In computer programming, a thunk is a subroutine used to inject a calculation into another subroutine. Thunks are primarily used to delay a calculation until its result is needed, or to insert operations at the beginning or end of the other subroutine. They have many other applications in compiler code generation and modular programming.

The term originated as a whimsical irregular form of the verb think. It refers to the original use of thunks in ALGOL 60 compilers, which required special analysis (thought) to determine what type of routine to generate.[1][2]

Background edit

The early years of compiler research saw broad experimentation with different evaluation strategies. A key question was how to compile a subroutine call if the arguments can be arbitrary mathematical expressions rather than constants. One approach, known as "call by value", calculates all of the arguments before the call and then passes the resulting values to the subroutine. In the rival "call by name" approach, the subroutine receives the unevaluated argument expression and must evaluate it.

A simple implementation of "call by name" might substitute the code of an argument expression for each appearance of the corresponding parameter in the subroutine, but this can produce multiple versions of the subroutine and multiple copies of the expression code. As an improvement, the compiler can generate a helper subroutine, called a thunk, that calculates the value of the argument. The address and environment[a] of this helper subroutine are then passed to the original subroutine in place of the original argument, where it can be called as many times as needed. Peter Ingerman first described thunks in reference to the ALGOL 60 programming language, which supports call-by-name evaluation.[4]

Applications edit

Functional programming edit

Although the software industry largely standardized on call-by-value and call-by-reference evaluation,[5] active study of call-by-name continued in the functional programming community. This research produced a series of lazy evaluation programming languages in which some variant of call-by-name is the standard evaluation strategy. Compilers for these languages, such as the Glasgow Haskell Compiler, have relied heavily on thunks, with the added feature that the thunks save their initial result so that they can avoid recalculating it;[6] this is known as memoization or call-by-need.

Functional programming languages have also allowed programmers to explicitly generate thunks. This is done in source code by wrapping an argument expression in an anonymous function that has no parameters of its own. This prevents the expression from being evaluated until a receiving function calls the anonymous function, thereby achieving the same effect as call-by-name.[7] The adoption of anonymous functions into other programming languages has made this capability widely available.

The following is a simple demonstration in JavaScript (ES6):

// 'hypot' is a binary function const hypot = (x, y) => Math.sqrt(x * x + y * y); // 'thunk' is a function that takes no arguments and, when invoked, performs a potentially expensive // operation (computing a square root, in this example) and/or causes some side-effect to occur const thunk = () => hypot(3, 4); // the thunk can then be passed around without being evaluated... doSomethingWithThunk(thunk); // ...or evaluated thunk(); // === 5 

Object-oriented programming edit

Thunks are useful in object-oriented programming platforms that allow a class to inherit multiple interfaces, leading to situations where the same method might be called via any of several interfaces. The following code illustrates such a situation in C++.

class A {  public:  virtual int Access() const { return value_; }  private:  int value_; }; class B {  public:  virtual int Access() const { return value_; }  private:  int value_; }; class C : public A, public B {  public:  int Access() const override { return better_value_; }  private:  int better_value_; }; int use(B *b) { return b->Access(); } int main() {  // ...  B some_b;  use(&some_b);  C some_c;  use(&some_c); } 

In this example, the code generated for each of the classes A, B and C will include a dispatch table that can be used to call Access on an object of that type, via a reference that has the same type. Class C will have an additional dispatch table, used to call Access on an object of type C via a reference of type B. The expression b->Access() will use B's own dispatch table or the additional C table, depending on the type of object b refers to. If it refers to an object of type C, the compiler must ensure that C's Access implementation receives an instance address for the entire C object, rather than the inherited B part of that object.[8]

As a direct approach to this pointer adjustment problem, the compiler can include an integer offset in each dispatch table entry. This offset is the difference between the reference's address and the address required by the method implementation. The code generated for each call through these dispatch tables must then retrieve the offset and use it to adjust the instance address before calling the method.

The solution just described has problems similar to the naïve implementation of call-by-name described earlier: the compiler generates several copies of code to calculate an argument (the instance address), while also increasing the dispatch table sizes to hold the offsets. As an alternative, the compiler can generate an adjustor thunk along with C's implementation of Access that adjusts the instance address by the required amount and then calls the method. The thunk can appear in C's dispatch table for B, thereby eliminating the need for callers to adjust the address themselves.[9]

Numerical calculations requiring evaluations at multiple points edit

Routines for computations such as integration need to calculate an expression at multiple points. Call by name was used for this purpose in languages that didn't support closures or procedure parameters.

Interoperability edit

Thunks have been widely used to provide interoperability between software modules whose routines cannot call each other directly. This may occur because the routines have different calling conventions, run in different CPU modes or address spaces, or at least one runs in a virtual machine. A compiler (or other tool) can solve this problem by generating a thunk that automates the additional steps needed to call the target routine, whether that is transforming arguments, copying them to another location, or switching the CPU mode. A successful thunk minimizes the extra work the caller must do compared to a normal call.

Much of the literature on interoperability thunks relates to various Wintel platforms, including MS-DOS, OS/2,[10] Windows[11][12][13][14] and .NET, and to the transition from 16-bit to 32-bit memory addressing. As customers have migrated from one platform to another, thunks have been essential to support legacy software written for the older platforms.

The transition from 32-bit to 64-bit code on x86 also uses a form of thunking (WoW64). However, because the x86-64 address space is larger than the one available to 32-bit code, the old "generic thunk" mechanism could not be used to call 64-bit code from 32-bit code.[15] The only case of 32-bit code calling 64-bit code is in the WoW64's thunking of Windows APIs to 32-bit.

Overlays and dynamic linking edit

On systems that lack automatic virtual memory hardware, thunks can implement a limited form of virtual memory known as overlays. With overlays, a developer divides a program's code into segments that can be loaded and unloaded independently, and identifies the entry points into each segment. A segment that calls into another segment must do so indirectly via a branch table. When a segment is in memory, its branch table entries jump into the segment. When a segment is unloaded, its entries are replaced with "reload thunks" that can reload it on demand.[16]

Similarly, systems that dynamically link modules of a program together at run-time can use thunks to connect the modules. Each module can call the others through a table of thunks that the linker fills in when it loads the module. This way the modules can interact without prior knowledge of where they are located in memory.[17]

See also edit

Thunk technologies edit

Related concepts edit

Notes edit

  1. ^ A thunk is an early limited type of closure. The environment passed for the thunk is that of the expression, not that of the called routine.[3]

References edit

  1. ^ Eric Raymond rejects "a couple of onomatopoeic myths circulating about the origin of this term" and cites the inventors of the thunk recalling that the term "was coined after they realized (in the wee hours after hours of discussion) that the type of an argument in Algol-60 could be figured out in advance with a little compile-time thought [...] In other words, it had 'already been thought of'; thus it was christened a thunk, which is 'the past tense of "think" at two in the morning'. See: Raymond, Eric S. (1996). Raymond, Eric S. (ed.). The New Hacker's Dictionary. MIT Press. p. 445. ISBN 9780262680929. Retrieved 2015-05-25.
  2. ^ See Ingerman (1961): "The translator knows what kind of thunk to create by considering the formation of the actual parameter and the previously scanned declarations.… [W]hen a procedure declaration is being compiled, the translator, again by observing syntax, knows what kind of address to expect from a thunk."
  3. ^ E. T. Irons (1961-01-01). "Comments on the Implementation of Recursive Procedures and Blocks in ALGOL". Communications of the ACM. Association for Computing Machinery (ACM). 4 (1): 65–69. doi:10.1145/366062.366084. ISSN 0001-0782. S2CID 14646332.
  4. ^ Ingerman, P. Z. (1961-01-01). "Thunks: a way of compiling procedure statements with some comments on procedure declarations". Communications of the ACM. Association for Computing Machinery (ACM). 4 (1): 55–58. doi:10.1145/366062.366084. ISSN 0001-0782. S2CID 14646332.
  5. ^ Scott, Michael (2009). Programming Language Pragmatics. p. 395.
  6. ^ Marlow, Simon (2013). Parallel and Concurrent Programming in Haskell. p. 10.
  7. ^ Queinnec, Christian (2003). Lisp in Small Pieces. p. 176.
  8. ^ Stroustrup, Bjarne (Fall 1989). "Multiple Inheritance for C++" (PDF). Computing Systems. USENIX. 1 (4). Retrieved 2014-08-04.
  9. ^ Driesen, Karel; Hölzle, Urs (1996). "The Direct Cost of Virtual Function Calls in C++" (PDF). Proceedings of the 1996 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages & Applications, OOPSLA 1996, San Jose, California, USA, October 6-10, 1996. 11th OOPSLA 1996: San Jose, California, USA. ACM. ISBN 0-89791-788-X. Archived from the original (PDF) on 2019-12-29. Retrieved 2011-02-24.[dead link]
  10. ^ Calcote, John (May 1995). "Thunking: Using 16-Bit Libraries in OS/2 2.0". OS/2 Developer Magazine. 7 (3): 48–56.
  11. ^ King, Adrian (1994). Inside Microsoft Windows 95 (2nd ed.). Redmond, Washington, USA: Microsoft Press. ISBN 1-55615-626-X.
  12. ^ Programmer's Guide to Microsoft Windows 95: Key Topics on Programming for Windows from the Microsoft Windows Development Team (1st ed.). Redmond, Washington, USA: Microsoft Press. 1995-07-01. ISBN 1-55615-834-3. Retrieved 2016-05-26. {{cite book}}: |work= ignored (help)
  13. ^ Hazzah, Karen (1997). Writing Windows VxDs and Device Drivers - Programming Secrets for Virtual Device Drivers (2nd printing, 2nd ed.). Lawrence, Kansas, USA: R&D Books / Miller Freeman, Inc. ISBN 0-87930-438-3.
  14. ^ Kauler, Barry (August 1997). Windows Assembly Language and Systems Programming - 16- and 32-Bit Low-Level Programming for the PC and Windows (2nd ed.). Lawrence, Kansas, USA: R&D Books / Miller Freeman, Inc. ISBN 0-87930-474-X.
  15. ^ "Why can't you thunk between 32-bit and 64-bit Windows?". The Old New Thing. 2008-10-20.
  16. ^ Bright, Walter (1990-07-01). "Virtual Memory For 640K DOS". Dr. Dobb's Journal. Retrieved 2014-03-06.
  17. ^ Levine, John R. (2000) [October 1999]. Linkers and Loaders. The Morgan Kaufmann Series in Software Engineering and Programming (1 ed.). San Francisco, USA: Morgan Kaufmann. ISBN 1-55860-496-0. OCLC 42413382. Archived from the original on 2012-12-05. Retrieved 2020-01-12. Code: [1][2] Errata: [3]

thunk, other, uses, disambiguation, look, thunk, wiktionary, free, dictionary, computer, programming, thunk, subroutine, used, inject, calculation, into, another, subroutine, primarily, used, delay, calculation, until, result, needed, insert, operations, begin. For other uses see Thunk disambiguation Look up thunk in Wiktionary the free dictionary In computer programming a thunk is a subroutine used to inject a calculation into another subroutine Thunks are primarily used to delay a calculation until its result is needed or to insert operations at the beginning or end of the other subroutine They have many other applications in compiler code generation and modular programming The term originated as a whimsical irregular form of the verb think It refers to the original use of thunks in ALGOL 60 compilers which required special analysis thought to determine what type of routine to generate 1 2 Contents 1 Background 2 Applications 2 1 Functional programming 2 2 Object oriented programming 2 3 Numerical calculations requiring evaluations at multiple points 2 4 Interoperability 2 5 Overlays and dynamic linking 3 See also 3 1 Thunk technologies 3 2 Related concepts 4 Notes 5 ReferencesBackground editThe early years of compiler research saw broad experimentation with different evaluation strategies A key question was how to compile a subroutine call if the arguments can be arbitrary mathematical expressions rather than constants One approach known as call by value calculates all of the arguments before the call and then passes the resulting values to the subroutine In the rival call by name approach the subroutine receives the unevaluated argument expression and must evaluate it A simple implementation of call by name might substitute the code of an argument expression for each appearance of the corresponding parameter in the subroutine but this can produce multiple versions of the subroutine and multiple copies of the expression code As an improvement the compiler can generate a helper subroutine called a thunk that calculates the value of the argument The address and environment a of this helper subroutine are then passed to the original subroutine in place of the original argument where it can be called as many times as needed Peter Ingerman first described thunks in reference to the ALGOL 60 programming language which supports call by name evaluation 4 Applications editFunctional programming edit Although the software industry largely standardized on call by value and call by reference evaluation 5 active study of call by name continued in the functional programming community This research produced a series of lazy evaluation programming languages in which some variant of call by name is the standard evaluation strategy Compilers for these languages such as the Glasgow Haskell Compiler have relied heavily on thunks with the added feature that the thunks save their initial result so that they can avoid recalculating it 6 this is known as memoization or call by need Functional programming languages have also allowed programmers to explicitly generate thunks This is done in source code by wrapping an argument expression in an anonymous function that has no parameters of its own This prevents the expression from being evaluated until a receiving function calls the anonymous function thereby achieving the same effect as call by name 7 The adoption of anonymous functions into other programming languages has made this capability widely available The following is a simple demonstration in JavaScript ES6 hypot is a binary function const hypot x y gt Math sqrt x x y y thunk is a function that takes no arguments and when invoked performs a potentially expensive operation computing a square root in this example and or causes some side effect to occur const thunk gt hypot 3 4 the thunk can then be passed around without being evaluated doSomethingWithThunk thunk or evaluated thunk 5 Object oriented programming edit Thunks are useful in object oriented programming platforms that allow a class to inherit multiple interfaces leading to situations where the same method might be called via any of several interfaces The following code illustrates such a situation in C class A public virtual int Access const return value private int value class B public virtual int Access const return value private int value class C public A public B public int Access const override return better value private int better value int use B b return b gt Access int main B some b use amp some b C some c use amp some c In this example the code generated for each of the classes A B and C will include a dispatch table that can be used to call Access on an object of that type via a reference that has the same type Class C will have an additional dispatch table used to call Access on an object of type C via a reference of type B The expression b gt Access will use B s own dispatch table or the additional C table depending on the type of object b refers to If it refers to an object of type C the compiler must ensure that C s Access implementation receives an instance address for the entire C object rather than the inherited B part of that object 8 As a direct approach to this pointer adjustment problem the compiler can include an integer offset in each dispatch table entry This offset is the difference between the reference s address and the address required by the method implementation The code generated for each call through these dispatch tables must then retrieve the offset and use it to adjust the instance address before calling the method The solution just described has problems similar to the naive implementation of call by name described earlier the compiler generates several copies of code to calculate an argument the instance address while also increasing the dispatch table sizes to hold the offsets As an alternative the compiler can generate an adjustor thunk along with C s implementation of Access that adjusts the instance address by the required amount and then calls the method The thunk can appear in C s dispatch table for B thereby eliminating the need for callers to adjust the address themselves 9 Numerical calculations requiring evaluations at multiple points edit Routines for computations such as integration need to calculate an expression at multiple points Call by name was used for this purpose in languages that didn t support closures or procedure parameters Interoperability edit Thunks have been widely used to provide interoperability between software modules whose routines cannot call each other directly This may occur because the routines have different calling conventions run in different CPU modes or address spaces or at least one runs in a virtual machine A compiler or other tool can solve this problem by generating a thunk that automates the additional steps needed to call the target routine whether that is transforming arguments copying them to another location or switching the CPU mode A successful thunk minimizes the extra work the caller must do compared to a normal call Much of the literature on interoperability thunks relates to various Wintel platforms including MS DOS OS 2 10 Windows 11 12 13 14 and NET and to the transition from 16 bit to 32 bit memory addressing As customers have migrated from one platform to another thunks have been essential to support legacy software written for the older platforms The transition from 32 bit to 64 bit code on x86 also uses a form of thunking WoW64 However because the x86 64 address space is larger than the one available to 32 bit code the old generic thunk mechanism could not be used to call 64 bit code from 32 bit code 15 The only case of 32 bit code calling 64 bit code is in the WoW64 s thunking of Windows APIs to 32 bit Overlays and dynamic linking edit On systems that lack automatic virtual memory hardware thunks can implement a limited form of virtual memory known as overlays With overlays a developer divides a program s code into segments that can be loaded and unloaded independently and identifies the entry points into each segment A segment that calls into another segment must do so indirectly via a branch table When a segment is in memory its branch table entries jump into the segment When a segment is unloaded its entries are replaced with reload thunks that can reload it on demand 16 Similarly systems that dynamically link modules of a program together at run time can use thunks to connect the modules Each module can call the others through a table of thunks that the linker fills in when it loads the module This way the modules can interact without prior knowledge of where they are located in memory 17 See also editThunk technologies edit DOS Protected Mode Interface DPMI DOS Protected Mode Services DPMS J Direct Microsoft Layer for Unicode Platform Invocation Services Win32s Windows on Windows WoW64 libffiRelated concepts edit Anonymous function Futures and promises Remote procedure call Shim computing Trampoline computing Reducible expressionNotes edit A thunk is an early limited type of closure The environment passed for the thunk is that of the expression not that of the called routine 3 References edit Eric Raymond rejects a couple of onomatopoeic myths circulating about the origin of this term and cites the inventors of the thunk recalling that the term was coined after they realized in the wee hours after hours of discussion that the type of an argument in Algol 60 could be figured out in advance with a little compile time thought In other words it had already been thought of thus it was christened a thunk which is the past tense of think at two in the morning See Raymond Eric S 1996 Raymond Eric S ed The New Hacker s Dictionary MIT Press p 445 ISBN 9780262680929 Retrieved 2015 05 25 See Ingerman 1961 The translator knows what kind of thunk to create by considering the formation of the actual parameter and the previously scanned declarations W hen a procedure declaration is being compiled the translator again by observing syntax knows what kind of address to expect from a thunk E T Irons 1961 01 01 Comments on the Implementation of Recursive Procedures and Blocks in ALGOL Communications of the ACM Association for Computing Machinery ACM 4 1 65 69 doi 10 1145 366062 366084 ISSN 0001 0782 S2CID 14646332 Ingerman P Z 1961 01 01 Thunks a way of compiling procedure statements with some comments on procedure declarations Communications of the ACM Association for Computing Machinery ACM 4 1 55 58 doi 10 1145 366062 366084 ISSN 0001 0782 S2CID 14646332 Scott Michael 2009 Programming Language Pragmatics p 395 Marlow Simon 2013 Parallel and Concurrent Programming in Haskell p 10 Queinnec Christian 2003 Lisp in Small Pieces p 176 Stroustrup Bjarne Fall 1989 Multiple Inheritance for C PDF Computing Systems USENIX 1 4 Retrieved 2014 08 04 Driesen Karel Holzle Urs 1996 The Direct Cost of Virtual Function Calls in C PDF Proceedings of the 1996 ACM SIGPLAN Conference on Object Oriented Programming Systems Languages amp Applications OOPSLA 1996 San Jose California USA October 6 10 1996 11th OOPSLA 1996 San Jose California USA ACM ISBN 0 89791 788 X Archived from the original PDF on 2019 12 29 Retrieved 2011 02 24 dead link Calcote John May 1995 Thunking Using 16 Bit Libraries in OS 2 2 0 OS 2 Developer Magazine 7 3 48 56 King Adrian 1994 Inside Microsoft Windows 95 2nd ed Redmond Washington USA Microsoft Press ISBN 1 55615 626 X Programmer s Guide to Microsoft Windows 95 Key Topics on Programming for Windows from the Microsoft Windows Development Team 1st ed Redmond Washington USA Microsoft Press 1995 07 01 ISBN 1 55615 834 3 Retrieved 2016 05 26 a href Template Cite book html title Template Cite book cite book a work ignored help Hazzah Karen 1997 Writing Windows VxDs and Device Drivers Programming Secrets for Virtual Device Drivers 2nd printing 2nd ed Lawrence Kansas USA R amp D Books Miller Freeman Inc ISBN 0 87930 438 3 Kauler Barry August 1997 Windows Assembly Language and Systems Programming 16 and 32 Bit Low Level Programming for the PC and Windows 2nd ed Lawrence Kansas USA R amp D Books Miller Freeman Inc ISBN 0 87930 474 X Why can t you thunk between 32 bit and 64 bit Windows The Old New Thing 2008 10 20 Bright Walter 1990 07 01 Virtual Memory For 640K DOS Dr Dobb s Journal Retrieved 2014 03 06 Levine John R 2000 October 1999 Linkers and Loaders The Morgan Kaufmann Series in Software Engineering and Programming 1 ed San Francisco USA Morgan Kaufmann ISBN 1 55860 496 0 OCLC 42413382 Archived from the original on 2012 12 05 Retrieved 2020 01 12 Code 1 2 Errata 3 Retrieved from https en wikipedia org w index php title Thunk amp oldid 1209826279, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.