fbpx
Wikipedia

Algebraic data type

In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.

Two common classes of algebraic types are product types (i.e., tuples and records) and sum types (i.e., tagged or disjoint unions, coproduct types or variant types).[1]

The values of a product type typically contain several values, called fields. All values of that type have the same combination of field types. The set of all possible values of a product type is the set-theoretic product, i.e., the Cartesian product, of the sets of all possible values of its field types.

The values of a sum type are typically grouped into several classes, called variants. A value of a variant type is usually created with a quasi-functional entity called a constructor. Each variant has its own constructor, which takes a specified number of arguments with specified types. The set of all possible values of a sum type is the set-theoretic sum, i.e., the disjoint union, of the sets of all possible values of its variants. Enumerated types are a special case of sum types in which the constructors take no arguments, as exactly one value is defined for each constructor.

Values of algebraic types are analyzed with pattern matching, which identifies a value by its constructor or field names and extracts the data it contains.

History edit

Algebraic data types were introduced in Hope, a small functional programming language developed in the 1970s at the University of Edinburgh.[2]

Examples edit

Singly Linked List edit

One of the most common examples of an algebraic data type is the singly linked list. A list type is a sum type with two variants, Nil for an empty list and Cons x xs for the combination of a new element x with a list xs to create a new list. Here is an example of how a singly linked list would be declared in Haskell:

data List a = Nil | Cons a (List a) 

or

data [] a = [] | a : [a] 

Cons is an abbreviation of construct. Many languages have special syntax for lists defined in this way. For example, Haskell and ML use [] for Nil, : or :: for Cons, respectively, and square brackets for entire lists. So Cons 1 (Cons 2 (Cons 3 Nil)) would normally be written as 1:2:3:[] or [1,2,3] in Haskell, or as 1::2::3::[] or [1,2,3] in ML.

Binary Tree edit

For a slightly more complex example, binary trees may be implemented in Haskell as follows:

data Tree = Empty  | Leaf Int  | Node Int Tree Tree 

or

data BinaryTree a = BTNil   | BTNode a (BinaryTree a) (BinaryTree a) 

Here, Empty represents an empty tree, Leaf represents a leaf node, and Node organizes the data into branches.

In most languages that support algebraic data types, it is possible to define parametric types. Examples are given later in this article.

Somewhat similar to a function, a data constructor is applied to arguments of an appropriate type, yielding an instance of the data type to which the type constructor belongs. For example, the data constructor Leaf is logically a function Int -> Tree, meaning that giving an integer as an argument to Leaf produces a value of the type Tree. As Node takes two arguments of the type Tree itself, the datatype is recursive.

Operations on algebraic data types can be defined by using pattern matching to retrieve the arguments. For example, consider a function to find the depth of a Tree, given here in Haskell:

 depth :: Tree -> Int  depth Empty = 0  depth (Leaf n) = 1  depth (Node n l r) = 1 + max (depth l) (depth r) 

Thus, a Tree given to depth can be constructed using any of Empty, Leaf, or Node and must be matched for any of them respectively to deal with all cases. In case of Node, the pattern extracts the subtrees l and r for further processing.

Abstract Syntax edit

Algebraic data types are highly suited to implementing abstract syntax. For example, the following algebraic data type describes a simple language representing numerical expressions:

data Expression = Number Int   | Add Expression Expression   | Minus Expression Expression   | Mult Expression Expression   | Divide Expression Expression 

An element of such a data type would have a form such as Mult (Add (Number 4) (Minus (Number 0) (Number 1))) (Number 2).

Writing an evaluation function for this language is a simple exercise; however, more complex transformations also become feasible. For example, an optimization pass in a compiler might be written as a function taking an abstract expression as input and returning an optimized form.

Pattern matching edit

Algebraic data types are used to represent values that can be one of several types of things. Each type of thing is associated with an identifier called a constructor, which can be considered a tag for that kind of data. Each constructor can carry with it a different type of data.

For example, considering the binary Tree example shown above, a constructor could carry no data (e.g., Empty), or one piece of data (e.g., Leaf has one Int value), or multiple pieces of data (e.g., Node has two Tree values).

To do something with a value of this Tree algebraic data type, it is deconstructed using a process called pattern matching. This involves matching the data with a series of patterns. The example function depth above pattern-matches its argument with three patterns. When the function is called, it finds the first pattern that matches its argument, performs any variable bindings that are found in the pattern, and evaluates the expression corresponding to the pattern.

Each pattern above has a form that resembles the structure of some possible value of this datatype. The first pattern simply matches values of the constructor Empty. The second pattern matches values of the constructor Leaf. Patterns are recursive, so then the data that is associated with that constructor is matched with the pattern "n". In this case, a lowercase identifier represents a pattern that matches any value, which then is bound to a variable of that name — in this case, a variable “n” is bound to the integer value stored in the data type — to be used in the expression to evaluate.

The recursion in patterns in this example are trivial, but a possible more complex recursive pattern would be something like:

Node (Node (Leaf 4) x) (Node y (Node Empty z))

Recursive patterns several layers deep are used for example in balancing red–black trees, which involve cases that require looking at colors several layers deep.

The example above is operationally equivalent to the following pseudocode:

 switch on (data.constructor) case Empty: return 0 case Leaf: let n = data.field1 return 1 case Node: let l = data.field1 let r = data.field2 return 1 + max (depth l) (depth r) 

The advantages of algebraic data types can be highlighted by comparison of the above pseudocode with a pattern matching equivalent.

Firstly, there is type safety. In the pseudocode example above, programmer diligence is required to not access field2 when the constructor is a Leaf. Also, the type of field1 is different for Leaf and Node. For Leaf it is Int but for Node it is Tree. The type system would have difficulties assigning a static type in a safe way for traditional record data structures. However, in pattern matching such problems are not faced. The type of each extracted value is based on the types declared by the relevant constructor. The number of values that can be extracted is known based on the constructor.

Secondly, in pattern matching, the compiler performs exhaustiveness checking to ensure all cases are handled. If one of the cases of the depth function above were missing, the compiler would issue a warning. Exhaustiveness checking may seem easy for simple patterns, but with many complex recursive patterns, the task soon becomes difficult for the average human (or compiler, if it must check arbitrary nested if-else constructs). Similarly, there may be patterns which never match (i.e., are already covered by prior patterns). The compiler can also check and issue warnings for these, as they may indicate an error in reasoning.

Algebraic data type pattern matching should not be confused with regular expression string pattern matching. The purpose of both is similar (to extract parts from a piece of data matching certain constraints) however, the implementation is very different. Pattern matching on algebraic data types matches on the structural properties of an object rather than on the character sequence of strings.

Theory edit

A general algebraic data type is a possibly recursive sum type of product types. Each constructor tags a product type to separate it from others, or if there is only one constructor, the data type is a product type. Further, the parameter types of a constructor are the factors of the product type. A parameterless constructor corresponds to the empty product. If a datatype is recursive, the entire sum of products is wrapped in a recursive type, and each constructor also rolls the datatype into the recursive type.

For example, the Haskell datatype:

 data List a = Nil | Cons a (List a) 

is represented in type theory as   with constructors   and  .

The Haskell List datatype can also be represented in type theory in a slightly different form, thus:  . (Note how the   and   constructs are reversed relative to the original.) The original formation specified a type function whose body was a recursive type. The revised version specifies a recursive function on types. (The type variable   is used to suggest a function rather than a base type like  , since   is like a Greek f.) The function must also now be applied   to its argument type   in the body of the type.

For the purposes of the List example, these two formulations are not significantly different; but the second form allows expressing so-called nested data types, i.e., those where the recursive type differs parametrically from the original. (For more information on nested data types, see the works of Richard Bird, Lambert Meertens, and Ross Paterson.)

In set theory the equivalent of a sum type is a disjoint union, a set whose elements are pairs consisting of a tag (equivalent to a constructor) and an object of a type corresponding to the tag (equivalent to the constructor arguments).

Programming languages with algebraic data types edit

Many programming languages incorporate algebraic data types as a first class notion, including:

See also edit

References edit

  1. ^ Records and variants- OCaml manual section 1.4 2020-04-28 at the Wayback Machine
  2. ^ Paul Hudak; John Hughes; Simon Peyton Jones; Philip Wadler. "A history of Haskell: being lazy with class". Proceedings of the third ACM SIGPLAN conference on History of programming languages. Presentations included Rod Burstall, Dave MacQueen, and Don Sannella on Hope, the language that introduced algebraic data types
  3. ^ Calculus of Inductive Constructions, and basic standard libraries : Datatypes and Logic.
  4. ^ "CppCon 2016: Ben Deane "Using Types Effectively"". Archived from the original on 2021-12-12 – via www.youtube.com.
  5. ^ "Sealed class modifier". Dart.
  6. ^ "Algebraic Data Types in Haskell". Serokell.
  7. ^ "Enum Instance". Haxe - The Cross-platform Toolkit.
  8. ^ "JEP 395: Records". OpenJDK.
  9. ^ "JEP 409: Sealed Classes". OpenJDK.
  10. ^ "Sealed Classes - Kotlin Programming Language". Kotlin.
  11. ^ "Reason · Reason lets you write simple, fast and quality type safe code while leveraging both the JavaScript & OCaml ecosystems". reasonml.github.io.
  12. ^ "Enums and Pattern Matching - The Rust Programming Language". doc.rust-lang.org. Retrieved 31 August 2021.

algebraic, data, type, confused, with, abstract, data, type, this, article, multiple, issues, please, help, improve, discuss, these, issues, talk, page, learn, when, remove, these, template, messages, this, article, needs, additional, citations, verification, . Not to be confused with Abstract data type This article has multiple issues Please help improve it or discuss these issues on the talk page Learn how and when to remove these template messages This article needs additional citations for verification Please help improve this article by adding citations to reliable sources Unsourced material may be challenged and removed Find sources Algebraic data type news newspapers books scholar JSTOR October 2015 Learn how and when to remove this message This article may be too technical for most readers to understand Please help improve it to make it understandable to non experts without removing the technical details September 2017 Learn how and when to remove this message Learn how and when to remove this message In computer programming especially functional programming and type theory an algebraic data type ADT is a kind of composite type i e a type formed by combining other types Two common classes of algebraic types are product types i e tuples and records and sum types i e tagged or disjoint unions coproduct types or variant types 1 The values of a product type typically contain several values called fields All values of that type have the same combination of field types The set of all possible values of a product type is the set theoretic product i e the Cartesian product of the sets of all possible values of its field types The values of a sum type are typically grouped into several classes called variants A value of a variant type is usually created with a quasi functional entity called a constructor Each variant has its own constructor which takes a specified number of arguments with specified types The set of all possible values of a sum type is the set theoretic sum i e the disjoint union of the sets of all possible values of its variants Enumerated types are a special case of sum types in which the constructors take no arguments as exactly one value is defined for each constructor Values of algebraic types are analyzed with pattern matching which identifies a value by its constructor or field names and extracts the data it contains Contents 1 History 2 Examples 2 1 Singly Linked List 2 2 Binary Tree 2 3 Abstract Syntax 3 Pattern matching 4 Theory 5 Programming languages with algebraic data types 6 See also 7 ReferencesHistory editAlgebraic data types were introduced in Hope a small functional programming language developed in the 1970s at the University of Edinburgh 2 Examples editSingly Linked List edit One of the most common examples of an algebraic data type is the singly linked list A list type is a sum type with two variants Nil for an empty list and a href Cons html title Cons Cons a i x i i xs i for the combination of a new element x with a list xs to create a new list Here is an example of how a singly linked list would be declared in Haskell data List a Nil Cons a List a or data a a a Cons is an abbreviation of construct Many languages have special syntax for lists defined in this way For example Haskell and ML use for Nil or for Cons respectively and square brackets for entire lists So Cons 1 Cons 2 Cons 3 Nil would normally be written as 1 2 3 or 1 2 3 in Haskell or as 1 2 3 or 1 2 3 in ML Binary Tree edit For a slightly more complex example binary trees may be implemented in Haskell as follows data Tree Empty Leaf Int Node Int Tree Tree or data BinaryTree a BTNil BTNode a BinaryTree a BinaryTree a Here Empty represents an empty tree Leaf represents a leaf node and Node organizes the data into branches In most languages that support algebraic data types it is possible to define parametric types Examples are given later in this article Somewhat similar to a function a data constructor is applied to arguments of an appropriate type yielding an instance of the data type to which the type constructor belongs For example the data constructor Leaf is logically a function Int gt Tree meaning that giving an integer as an argument to Leaf produces a value of the type Tree As Node takes two arguments of the type Tree itself the datatype is recursive Operations on algebraic data types can be defined by using pattern matching to retrieve the arguments For example consider a function to find the depth of a Tree given here in Haskell depth Tree gt Int depth Empty 0 depth Leaf n 1 depth Node n l r 1 max depth l depth r Thus a Tree given to depth can be constructed using any of Empty Leaf or Node and must be matched for any of them respectively to deal with all cases In case of Node the pattern extracts the subtrees l and r for further processing Abstract Syntax edit Algebraic data types are highly suited to implementing abstract syntax For example the following algebraic data type describes a simple language representing numerical expressions data Expression Number Int Add Expression Expression Minus Expression Expression Mult Expression Expression Divide Expression Expression An element of such a data type would have a form such as Mult Add Number 4 Minus Number 0 Number 1 Number 2 Writing an evaluation function for this language is a simple exercise however more complex transformations also become feasible For example an optimization pass in a compiler might be written as a function taking an abstract expression as input and returning an optimized form Pattern matching editMain article Pattern Matching Algebraic data types are used to represent values that can be one of several types of things Each type of thing is associated with an identifier called a constructor which can be considered a tag for that kind of data Each constructor can carry with it a different type of data For example considering the binary Tree example shown above a constructor could carry no data e g Empty or one piece of data e g Leaf has one Int value or multiple pieces of data e g Node has two Tree values To do something with a value of this Tree algebraic data type it is deconstructed using a process called pattern matching This involves matching the data with a series of patterns The example function depth above pattern matches its argument with three patterns When the function is called it finds the first pattern that matches its argument performs any variable bindings that are found in the pattern and evaluates the expression corresponding to the pattern Each pattern above has a form that resembles the structure of some possible value of this datatype The first pattern simply matches values of the constructor Empty The second pattern matches values of the constructor Leaf Patterns are recursive so then the data that is associated with that constructor is matched with the pattern n In this case a lowercase identifier represents a pattern that matches any value which then is bound to a variable of that name in this case a variable n is bound to the integer value stored in the data type to be used in the expression to evaluate The recursion in patterns in this example are trivial but a possible more complex recursive pattern would be something like code class mw highlight mw highlight lang lisp mw content ltr dir ltr span class nv Node span span class w span span class p span span class nv Node span span class w span span class p span span class nv Leaf span span class w span span class mi 4 span span class p span span class w span span class nv x span span class p span span class w span span class p span span class nv Node span span class w span span class nv y span span class w span span class p span span class nv Node span span class w span span class nv Empty span span class w span span class nv z span span class p span code Recursive patterns several layers deep are used for example in balancing red black trees which involve cases that require looking at colors several layers deep The example above is operationally equivalent to the following pseudocode switch on data constructor case Empty return 0 case Leaf let n data field1 return 1 case Node let l data field1 let r data field2 return 1 max depth l depth r The advantages of algebraic data types can be highlighted by comparison of the above pseudocode with a pattern matching equivalent Firstly there is type safety In the pseudocode example above programmer diligence is required to not access field2 when the constructor is a Leaf Also the type of field1 is different for Leaf and Node For Leaf it is Int but for Node it is Tree The type system would have difficulties assigning a static type in a safe way for traditional record data structures However in pattern matching such problems are not faced The type of each extracted value is based on the types declared by the relevant constructor The number of values that can be extracted is known based on the constructor Secondly in pattern matching the compiler performs exhaustiveness checking to ensure all cases are handled If one of the cases of the depth function above were missing the compiler would issue a warning Exhaustiveness checking may seem easy for simple patterns but with many complex recursive patterns the task soon becomes difficult for the average human or compiler if it must check arbitrary nested if else constructs Similarly there may be patterns which never match i e are already covered by prior patterns The compiler can also check and issue warnings for these as they may indicate an error in reasoning Algebraic data type pattern matching should not be confused with regular expression string pattern matching The purpose of both is similar to extract parts from a piece of data matching certain constraints however the implementation is very different Pattern matching on algebraic data types matches on the structural properties of an object rather than on the character sequence of strings Theory editMain article Recursive data type A general algebraic data type is a possibly recursive sum type of product types Each constructor tags a product type to separate it from others or if there is only one constructor the data type is a product type Further the parameter types of a constructor are the factors of the product type A parameterless constructor corresponds to the empty product If a datatype is recursive the entire sum of products is wrapped in a recursive type and each constructor also rolls the datatype into the recursive type For example the Haskell datatype data List a Nil Cons a List a is represented in type theory as l a m b 1 a b displaystyle lambda alpha mu beta 1 alpha times beta nbsp with constructors n i l a r o l l i n l displaystyle mathrm nil alpha mathrm roll mathrm inl langle rangle nbsp and c o n s a x l r o l l i n r x l displaystyle mathrm cons alpha x l mathrm roll mathrm inr langle x l rangle nbsp The Haskell List datatype can also be represented in type theory in a slightly different form thus m ϕ l a 1 a ϕ a displaystyle mu phi lambda alpha 1 alpha times phi alpha nbsp Note how the m displaystyle mu nbsp and l displaystyle lambda nbsp constructs are reversed relative to the original The original formation specified a type function whose body was a recursive type The revised version specifies a recursive function on types The type variable ϕ displaystyle phi nbsp is used to suggest a function rather than a base type like b displaystyle beta nbsp since ϕ displaystyle phi nbsp is like a Greek f The function must also now be applied ϕ displaystyle phi nbsp to its argument type a displaystyle alpha nbsp in the body of the type For the purposes of the List example these two formulations are not significantly different but the second form allows expressing so called nested data types i e those where the recursive type differs parametrically from the original For more information on nested data types see the works of Richard Bird Lambert Meertens and Ross Paterson In set theory the equivalent of a sum type is a disjoint union a set whose elements are pairs consisting of a tag equivalent to a constructor and an object of a type corresponding to the tag equivalent to the constructor arguments Programming languages with algebraic data types editMain article Comparison of programming languages algebraic data type Many programming languages incorporate algebraic data types as a first class notion including Ceylon Clean Coq 3 C 4 Elm Dart 5 Flow F F Free Pascal Haskell 6 Haxe 7 Hope Idris Java 16 for product types 8 17 for sum types 9 Kotlin 10 Limbo Language Of Temporal Ordering Specification LOTOS Mercury Miranda Nemerle Nim OCaml Opa OpenCog Perl PureScript Python Racket Reason 11 Rust 12 Scala Standard ML Swift Tom TypeScript Visual PrologSee also editDisjoint union Generalized algebraic data type Initial algebra Quotient type Tagged union Type theory Visitor patternReferences edit Records and variants OCaml manual section 1 4 Archived 2020 04 28 at the Wayback Machine Paul Hudak John Hughes Simon Peyton Jones Philip Wadler A history of Haskell being lazy with class Proceedings of the third ACM SIGPLAN conference on History of programming languages Presentations included Rod Burstall Dave MacQueen and Don Sannella on Hope the language that introduced algebraic data types Calculus of Inductive Constructions and basic standard libraries Datatypes and Logic CppCon 2016 Ben Deane Using Types Effectively Archived from the original on 2021 12 12 via www youtube com Sealed class modifier Dart Algebraic Data Types in Haskell Serokell Enum Instance Haxe The Cross platform Toolkit JEP 395 Records OpenJDK JEP 409 Sealed Classes OpenJDK Sealed Classes Kotlin Programming Language Kotlin Reason Reason lets you write simple fast and quality type safe code while leveraging both the JavaScript amp OCaml ecosystems reasonml github io Enums and Pattern Matching The Rust Programming Language doc rust lang org Retrieved 31 August 2021 Retrieved from https en wikipedia org w index php title Algebraic data type amp oldid 1204639468, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.