Mathematical Models of Structured Data Types

A key feature of any programming language is its type system, the system describing how the values of the language are organized into a system of types.

Programming languages generally support both primitive types, which are predefined in the language and structured data types, which allow programmers to build new type structures in terms of existing types.

Primitive types: integers, floating point numbers, characters.
Structured data types:
- cartesian products (tuples)
- discriminated unions
- sequences,
- function types
- recursive types
- sets

Most of these data structuring techniques can be studied in terms of mathematical models.

The mathematical models describe the basic concepts underlying the data structuring mechanism, independent of the details of its syntax in any given language. By understanding the models, we can easily adapt to new programming languages.
The models also give us an abstract characterization of data structures against which we can measure the restrictions imposed by a particular language, e.g., for efficiency reasons.
For each type of mathematical model, there are generally four issues related to how data structures may be specified and programmed in a particular programming language:

Definition Mechanisms are needed for defining the structure associated with a data type.
Access Mechanisms are needed for examining and analyzing instances of a data type.
Construction Mechanisms are needed for building new instances of a data type.
Restrictions may place the limitations on how the data structuring mechanism is used.

Cartesian Products (Records)

Mathematical model:: compositions of a fixed number of objects from possibly different domains. Given subdomains \(D_1, D_2, \ldots, D_n\), the cartesian product domain is the domain of all \(n\)-tuples \((a_1, a_2, \ldots, a_n)\) where each \(a_i\) is a member of \(D_i\).

Denoted \(D_1 \times D_2 \times \ldots \times D_n\).

Discriminated Unions (Unions, Variant Records)

Mathematical model:: objects come from a union of possible subdomains.

Given subdomains \(D_1, D_2, \ldots, D_n\), the discriminated union domain \(U\) is denoted \(U = D_1 + D_2 + \ldots + D_n\). Any object \(d\) in \(U\) belongs to one of the subdomains \(D_i\).

Definition:: e.g., Algol 68 united modes.

mode ib = union(int, bool)
ib x

defines type ib as a union type and variable x as a variable of this type. Pascal's variant record mechanism provides a combination of discriminated union and cartesian product.

Access:: a method of dicriminating the subtype is needed.
e.g., the Algol 68 case structure (cf. Modula-3 TYPECASE).

case x in
  (int i):   process_integer(i)
  (bool b):  process_boolean(b)
esac

Construction:: values are constructed using operations of the appropriate subdomain; the value is usually injected into the union domain automatically.

x := 5;
x := true

Algebraic Data Types in Haskell, ML and other functional languages.

Sequences (Strings, Lists)

Mathematical model:: Ordered collections of any number of elements of a base domain \(C\).
Denoted \(\lbrace\rbrace + C + C × C + C × C × C + \ldots\)

Definition:: e.g., Turbo Pascal character strings.

VAR s1, s2 : STRING[40];

declares s1 and s2 to be sequences of up to 40 characters.

Construction:

String values may be constructed using literal notation:

s1 := '383';

The key construction operation that distinguishes a sequence data type from an array data type is that of concatenation.

s2 := s1 +  ' is easy.';

Access:: Index notation for individual elements.; Subsequence selection (Ada)
e.g., s2(4..7)= ' is '

List Types in Haskell

Recursive Types

Mathematical model:: Example: domain B of binary trees of integers. B= {} + Int × B × B. A binary tree is either the empty tree (a null value) or a non-empty tree with an integer value and left and right subtrees. Must always have a union of a recursive substructure and at least one nonrecursive subdomain.
Definition:

Indirect recursion with pointers in imperative languages: (e.g., Modula-3)

TYPE IntTree = REF RECORD
                 value : INTEGER;
                 left, right : IntTree
               END;

Direct recursion in functional languages (e.g., Miranda):

IntTree :: EmptyTree | 
           NonEmptyTree Int IntTree IntTree

Access:: Discrimination between the nonrecursive and recursive subdomains is necessary.
Using pointers: the null pointer NIL, usually used to represent the nonrecursive case.

IF theTree = NIL 
  THEN (* Handle the empty tree case. *)
  ELSE (* Handle the nonempty case. *)

Accessing the components of the recursive subdomain uses cartesian product facilities (e.g., record access aTree.left).

Construction:: Using pointers: space for a tree node must be dynamically allocated and filled in. Third generation languages: sequential allocation and construction, e.g., Pascal:

NEW(theTree);
WITH theTree^ DO BEGIN
  value := 6;
  left := subtree1;
  right := subtree2
END

Simultaneous allocation and construction (Modula-3):

theTree := 
  NEW(IntTree, value := 6,
               left := subtree1,
               right := subtree2)

Parallel construction and implicit allocation (Miranda):

theTree = NonEmptyTree 6 subtree1 subtree2

Recursive types in Haskell and other functional languages are naturally implemented using Algebraic Data Types.

Finite Mappings (Arrays)

Mathematical model:: Mappings (functions) specifying objects in some range domain R for each value in some finite index domain I. Denoted \(I \rightarrow R\).
Definition:: e.g., Modula-3 arrays.

VAR x : ARRAY [1..3, 'a'..'c'] OF REAL;

or, equivalently,

VAR x : ARRAY [1..3] OF ARRAY ['a'..'c'] OF REAL;

defines a mapping from the domain {1,2,3} to the domain of mappings from {'a', 'b', 'c'} to reals.

Access:: index notation.

an individual element: x[i,ch]
a row: x[i]
a slice: Y[12..19]
an 8-element subarray of Y (Ada)

Sequential construction:: index notation in assignment.

FOR i := 1 TO 3 DO
  FOR ch := 'a' TO 'c' DO
    x[i,ch] := ...

Simultaneous construction:: (Modula-3)

x[1] := ARRAY ['a'..'c'] OF REAL{1.2, 1.7, 0.3}

General Function Types

Mathematical model:: Mappings (functions) specifying values in some range domain R for each value in some (arbitrary) input domain I. Denoted \(I \rightarrow R\).

Fully general function types are generally only supported in pure functional languages such as Haskell.

Functional languages have many elegant ways of constructing new functions in terms of other functions and values.

Operator sections.
lambda expressions for anonymous functions
functionals

Powersets (Set Types)

Mathematical model:: Objects are sets of elements of a given base type.
The domain is the set of all sets of the base type, called the powerset of the base type.
Definition:: e.g., Modula-2.

VAR digits, alphabet, alphanumerics : 
      SET OF CHAR;

N.B. SET types are very restricted in Pascal/Modula; the only allowable base types are simple discrete types (enumerations and subranges).

Construction:

set notation:

digits := ['0', '1', '2', '3', '4', 
           '5', '6', '7', '8', '9'];
alphabet := ['A'..'Z', 'a'..'z'];

set union

alphanumerics := digits + alphanumerics

set intersection: * operator in Modula, Pascal

Access:

set membership.

CharRead IN digits

subset test.

OptionSet < AllowableOptions

Restrictions: SETs in Pascal/Modula languages are restricted to small discrete types that conveniently be implemented using a bitset representation.

Updated Thu Oct. 08 2015, 06:39 by cameron.

Simon Fraser University
Engaging the World

CourSys