Mathematical Models of Structured Data Types
A key feature of any programming language is its type system, the system describing how the values of the language are organized into a system of types.
Programming languages generally support both primitive types, which are predefined in the language and structured data types, which allow programmers to build new type structures in terms of existing types.
- Primitive types: integers, floating point numbers, characters.
- Structured data types:
- cartesian products (tuples)
- discriminated unions
- sequences,
- function types
- recursive types
- sets
Most of these data structuring techniques can be studied in terms of mathematical models.
- The mathematical models describe the basic concepts underlying the data structuring mechanism, independent of the details of its syntax in any given language. By understanding the models, we can easily adapt to new programming languages.
- The models also give us an abstract characterization of data structures against which we can measure the restrictions imposed by a particular language, e.g., for efficiency reasons.
- For each type of mathematical model, there are generally four issues related to how data structures may be specified and programmed in a particular programming language:
- Definition Mechanisms are needed for defining the structure associated with a data type.
- Access Mechanisms are needed for examining and analyzing instances of a data type.
- Construction Mechanisms are needed for building new instances of a data type.
- Restrictions may place the limitations on how the data structuring mechanism is used.
Cartesian Products (Records)
- Mathematical model:
- compositions of a fixed number of objects from possibly different domains. Given subdomains \(D_1, D_2, \ldots, D_n\), the cartesian product domain is the domain of all \(n\)-tuples \((a_1, a_2, \ldots, a_n)\) where each \(a_i\) is a member of \(D_i\).
Denoted \(D_1 \times D_2 \times \ldots \times D_n\).
- Haskell Tuple Types
- Modula-3 records
- C structs
Discriminated Unions (Unions, Variant Records)
- Mathematical model:
- objects come from a union of possible subdomains.
Given subdomains \(D_1, D_2, \ldots, D_n\), the discriminated union domain \(U\) is denoted \(U = D_1 + D_2 + \ldots + D_n\). Any object \(d\) in \(U\) belongs to one of the subdomains \(D_i\).
- Definition:
- e.g., Algol 68 united modes.
mode ib = union(int, bool) ib x
defines type ib
as a union type and variable x
as a variable of this type.
Pascal's variant record mechanism provides a combination of discriminated union and cartesian product.
- Access:
- a method of dicriminating the subtype is needed.
e.g., the Algol 68 case structure (cf. Modula-3TYPECASE
).
case x in (int i): process_integer(i) (bool b): process_boolean(b) esac
- Construction:
- values are constructed using operations of the appropriate subdomain; the value is usually injected into the union domain automatically.
x := 5; x := true
- Algebraic Data Types in Haskell, ML and other functional languages.
Sequences (Strings, Lists)
- Mathematical model:
- Ordered collections of any number of elements of a base domain \(C\).
Denoted \(\lbrace\rbrace + C + C × C + C × C × C + \ldots\)
- Definition:
- e.g., Turbo Pascal character strings.
VAR s1, s2 : STRING[40];
declares s1
and s2
to be sequences of up to 40 characters.
- Construction:
- String values may be constructed using literal notation:
s1 := '383';
- The key construction operation that distinguishes a sequence data type from an array data type is that of concatenation.
s2 := s1 + ' is easy.';
- Access:
- Index notation for individual elements.
- Subsequence selection (Ada)
e.g.,s2(4..7)
=' is '
Recursive Types
- Mathematical model:
- Example: domain B of binary trees of integers. B= {} + Int × B × B. A binary tree is either the empty tree (a null value) or a non-empty tree with an integer value and left and right subtrees. Must always have a union of a recursive substructure and at least one nonrecursive subdomain.
- Definition:
- Indirect recursion with pointers in imperative languages: (e.g., Modula-3)
TYPE IntTree = REF RECORD value : INTEGER; left, right : IntTree END;
- Direct recursion in functional languages (e.g., Miranda):
IntTree :: EmptyTree | NonEmptyTree Int IntTree IntTree
- Access:
- Discrimination between the nonrecursive and recursive subdomains is necessary.
Using pointers: the null pointerNIL
, usually used to represent the nonrecursive case.
IF theTree = NIL THEN (* Handle the empty tree case. *) ELSE (* Handle the nonempty case. *)
Accessing the components of the recursive subdomain uses cartesian product facilities (e.g., record access aTree.left
).
- Construction:
- Using pointers: space for a tree node must be dynamically allocated and filled in. Third generation languages: sequential allocation and construction, e.g., Pascal:
NEW(theTree); WITH theTree^ DO BEGIN value := 6; left := subtree1; right := subtree2 END
Simultaneous allocation and construction (Modula-3):
theTree := NEW(IntTree, value := 6, left := subtree1, right := subtree2)
Parallel construction and implicit allocation (Miranda):
theTree = NonEmptyTree 6 subtree1 subtree2
- Recursive types in Haskell and other functional languages are naturally implemented using Algebraic Data Types.
Finite Mappings (Arrays)
- Mathematical model:
- Mappings (functions) specifying objects in some range domain R for each value in some finite index domain I. Denoted \(I \rightarrow R\).
- Definition:
- e.g., Modula-3 arrays.
VAR x : ARRAY [1..3, 'a'..'c'] OF REAL;
or, equivalently,
VAR x : ARRAY [1..3] OF ARRAY ['a'..'c'] OF REAL;
defines a mapping from the domain {1,2,3} to the domain of mappings from {'a'
, 'b'
, 'c'
} to reals.
- Access:
- index notation.
- an individual element:
x[i,ch]
- a row:
x[i]
- a slice:
Y[12..19]
an 8-element subarray ofY
(Ada)
- Sequential construction:
- index notation in assignment.
FOR i := 1 TO 3 DO FOR ch := 'a' TO 'c' DO x[i,ch] := ...
- Simultaneous construction:
- (Modula-3)
x[1] := ARRAY ['a'..'c'] OF REAL{1.2, 1.7, 0.3}
General Function Types
- Mathematical model:
- Mappings (functions) specifying values in some range domain R for each value in some (arbitrary) input domain I. Denoted \(I \rightarrow R\).
Fully general function types are generally only supported in pure functional languages such as Haskell.
Functional languages have many elegant ways of constructing new functions in terms of other functions and values.
- Operator sections.
- lambda expressions for anonymous functions
- functionals
Powersets (Set Types)
- Mathematical model:
- Objects are sets of elements of a given base type.
The domain is the set of all sets of the base type, called the powerset of the base type. - Definition:
- e.g., Modula-2.
VAR digits, alphabet, alphanumerics : SET OF CHAR;
N.B. SET
types are very restricted in Pascal/Modula; the only allowable base types are simple discrete types (enumerations and subranges).
- Construction:
- set notation:
digits := ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']; alphabet := ['A'..'Z', 'a'..'z'];
- set union
alphanumerics := digits + alphanumerics
- set intersection:
*
operator in Modula, Pascal
- Access:
- set membership.
CharRead IN digits
- subset test.
OptionSet < AllowableOptions
- Restrictions
- SETs in Pascal/Modula languages are restricted to small discrete types that conveniently be implemented using a bitset representation.