On this page:
1.1 Tensors
1.2 Automatic Differentiation
1.3 Operator Extension
1.4 Deep Learning Functions
1.5 Summary of Types

1 Overview

The toolkit has four major pieces that are important for understanding this documentation:
  • Tensors

  • Automatic Differentiation

  • Operator Extension

  • Deep Learning Functions

These pieces work together to form the whole toolkit. This overview provides a high-level roadmap for understanding this documentation and the accompanying code.

1.1 Tensors

A tensor is the fundamental data structure in deep learning. A tensor can be thought of as an n-dimensional array, where n may be 0. When n is 0, the tensor is known as a scalar.

The easiest way to think about it is that a scalar is a single number, and a tensor is either a vector of scalars or a vector of tensors.

Every tensor has a shape. The shape of a tensor is a list of n members where the ith member of the list is the size of the ith dimension of the tensor. For scalars, the shape is the empty list.

The number n is known as the rank of the tensor. The following types are used to denote entities related to tensors.
  • scalar? - Tensors of rank 0

  • tensor? - Tensors of rank 0 or higher

  • shape? - The type (listof natural?) signifies the shape of a tensor.
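The recursive view of tensors makes rank and shape straightforward to compute. The following is a conceptual sketch in Python, not Malt's API: tensors are modeled as nested lists, with a bare number standing in for a scalar.

```python
# Conceptual sketch (not Malt's API): tensors as nested lists,
# where a bare number is a scalar (a tensor of rank 0).

def is_scalar(t):
    """A scalar is a single number: a tensor of rank 0."""
    return isinstance(t, (int, float))

def rank(t):
    """The number n of dimensions: 0 for scalars."""
    return 0 if is_scalar(t) else 1 + rank(t[0])

def shape(t):
    """A list of n members whose ith member is the size of the
    ith dimension of the tensor."""
    return [] if is_scalar(t) else [len(t)] + shape(t[0])

t = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
print(rank(t))     # 2
print(shape(t))    # [2, 3]
print(shape(7.0))  # [] -- the shape of a scalar is the empty list
```

Note how the shape has exactly rank-many members, and how the rank of a tensor is one more than the rank of its elements.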

1.2 Automatic Differentiation

Malt provides a simple reverse-mode automatic differentiation mechanism based on the concept of duals. A dual carries the tensor result of a function along with a link, which encodes the chain of operations that produced the tensor result. This allows the gradient of the function that produced the tensor result to be computed.

Duals are automatically constructed when differentiable primitive operators (also provided by Malt) are used.

For interoperability, numerical constants are also considered to be duals with an empty link (known as end-of-chain).

Duals and tensors can contain each other, depending upon the representation. Malt provides three representations of tensors in increasing order of complexity and efficiency.

The default representation for tensors in Malt is learner. The Malt source repository can be configured and recompiled to choose different tensor representations in order to experiment with them. To set a specific implementation, see Setting tensor implementations.

The following types are used to denote entities related to duals.
  • dual? - Duals

  • link? - Links included in a dual. Defined as the type
    (-> dual? tensor? gradient-state? gradient-state?)

  • gradient-state? - A hash table mapping dual? to tensor?

  • differentiable? - Either a dual?, or a (listof differentiable?). In the learner representation (vectorof differentiable?) is also considered to be differentiable?, but not in other representations.
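The interplay of duals, links, and gradient state can be sketched as follows. This is an illustrative Python sketch of the idea, not Malt's implementation; the names Dual, const, mul, and gradient are hypothetical. Following the link? type above, a link maps a dual, an accumulated gradient, and a gradient state to an updated gradient state.

```python
# Conceptual sketch (not Malt's implementation): a dual pairs a result
# with a link. Per the doc's link? type, a link maps
# (dual, accumulated-gradient, gradient-state) -> gradient-state.

class Dual:
    def __init__(self, value, link):
        self.value = value
        self.link = link

def end_of_chain(d, acc, sigma):
    # The empty link: record the accumulated gradient for this leaf.
    sigma[d] = sigma.get(d, 0.0) + acc
    return sigma

def const(v):
    # Numerical constants are duals with an end-of-chain link.
    return Dual(v, end_of_chain)

def mul(a, b):
    # A differentiable primitive: its link applies the chain rule,
    # passing each operand its share of the incoming gradient.
    def link(d, acc, sigma):
        sigma = a.link(a, acc * b.value, sigma)
        sigma = b.link(b, acc * a.value, sigma)
        return sigma
    return Dual(a.value * b.value, link)

def gradient(result, wrt):
    # Walk the chain backwards, seeding the result's gradient with 1.0.
    sigma = result.link(result, 1.0, {})
    return [sigma.get(d, 0.0) for d in wrt]

x, y = const(3.0), const(4.0)
z = mul(x, y)                  # z = x * y, so dz/dx = y and dz/dy = x
print(gradient(z, [x, y]))     # [4.0, 3.0]
```

The gradient state here is a plain dictionary from duals to accumulated gradients, mirroring the gradient-state? type; real tensors take the place of the numbers in Malt.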

1.3 Operator Extension

The simple recursive structure of tensors allows commonly used numerical primitives to be extended to produce what are known as pointwise extensions. These are also known as broadcast operations over arrays. Malt provides an additional ability to pause the extension at a certain rank. So, rather than go all the way to the scalars in the array, the extension can stop at one of the higher dimensions. This allows the construction of polynomial-complexity functions by composing extensions.
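The idea of pausing an extension at a certain rank can be sketched as follows. This is a conceptual Python sketch, loosely modeled on Malt's extension tools; the helper name ext1 and its exact signature here are illustrative, not Malt's representation-specific API.

```python
# Conceptual sketch (not Malt's API): extend a unary function f,
# defined on tensors of rank base_rank, to tensors of higher rank.
# The extension descends through the nested structure until it
# reaches base_rank, instead of always descending to the scalars.

def rank(t):
    return 0 if isinstance(t, (int, float)) else 1 + rank(t[0])

def ext1(f, base_rank):
    def extended(t):
        if rank(t) == base_rank:
            return f(t)
        return [extended(u) for u in t]
    return extended

# Extended at base rank 0: a pointwise (broadcast) operation.
sqr = ext1(lambda x: x * x, 0)
print(sqr([[1.0, 2.0], [3.0, 4.0]]))   # [[1.0, 4.0], [9.0, 16.0]]

# Extended at base rank 1: sum each rank-1 tensor, pausing the
# descent one dimension above the scalars.
sum1 = ext1(sum, 1)
print(sum1([[1.0, 2.0], [3.0, 4.0]]))  # [3.0, 7.0]
```

With base rank 0 the extension behaves like an ordinary broadcast; with a higher base rank, whole lower-rank tensors are handed to the underlying function.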

Additionally, these extended primitives are automatically differentiable, and functions built by composing these primitives can also be automatically differentiated (within the limits of differentiability of the function).

Section Differentiable extended numerical functions lists the primitives provided by Malt. Malt also provides tools to build extended versions of user-defined functions. The type signatures of these tools are specific to the representation of tensors described above.

The following types are used to denote entities related to operator extension.
  • primitive-1? - A unary non-extended primitive.

  • primitive-2? - A binary non-extended primitive.

1.4 Deep Learning Functions

Building on top of tensors and automatic differentiation, Malt provides a collection of deep-learning-specific functions: loss functions, layer functions, gradient descent, compositional mechanisms, hyperparameters, and more.

The following types are used to describe some of these functions.
  • theta? - A list of tensors which forms a parameter set.
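To make the role of a theta? concrete, here is a conceptual Python sketch of naive gradient descent over a parameter set represented as a list of tensors. The names gradient_descent, obj_grad, alpha, and revs are illustrative, not Malt's API; the tensors here are flat lists of numbers.

```python
# Conceptual sketch (not Malt's API): gradient descent where the
# parameter set theta is a list of tensors (here, lists of numbers).

def gradient_descent(obj_grad, theta, alpha, revs):
    """obj_grad(theta) returns the gradient of the objective with
    respect to each tensor in theta; alpha is the learning rate and
    revs is the number of revisions (update steps)."""
    for _ in range(revs):
        grads = obj_grad(theta)
        # Update every parameter in every tensor of theta pointwise.
        theta = [[p - alpha * g for p, g in zip(t, gt)]
                 for t, gt in zip(theta, grads)]
    return theta

# Minimize f(theta) = (w - 3)^2 + (b + 1)^2 with theta = [[w], [b]].
grad = lambda theta: [[2 * (theta[0][0] - 3.0)],
                      [2 * (theta[1][0] + 1.0)]]
theta = gradient_descent(grad, [[0.0], [0.0]], alpha=0.1, revs=100)
print(theta)  # approximately [[3.0], [-1.0]]
```

In Malt, the gradient of the objective is obtained through the automatic differentiation machinery described above rather than supplied by hand.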

1.5 Summary of Types

The following types are used primarily in the description of functions. Some types are marked as "virtual", in the sense that predicates for them are not defined, but their intent is clear.