Concepts and usage#

This page introduces the essential concepts of genno and demonstrates basic usage. Click on the names of classes and methods to access complete descriptions in the Top-level classes and functions.

Quantity#

genno.Quantity represents a sparse, multi-dimensional array with labels and units. In research code it is common to use terms like ‘variables’, ‘parameters’, etc.; in genno, all data is ‘quantities’.

A Quantity has:

  • 0 or more dimensions, with labels along those dimensions (e.g. specific years; the names of specific technologies);

    • A 0-dimensional Quantity is a single or ‘scalar’ (as opposed to ‘vector’) value.

  • sparse coverage or “missingness,” i.e. there is not necessarily a value for each combination of labels; and

  • associated units.

Notation:

\[\begin{split}\begin{align} A^{ij} & = \left[a_{i,j} \right] \\ i & \in I \\ j & \in J \\ a_{i,j} & \in \left\{ \mathbb{R}, \text{NaN} \right\} \\ a_{i,j} & [=]\, \text{units of X} \end{align}\end{split}\]
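As a rough mental model only, and not a description of how genno stores data, a quantity can be pictured as a mapping from tuples of labels to values, together with dimension names and units. The class `SketchQuantity` and its attributes below are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SketchQuantity:
    """Hypothetical stand-in for a sparse, labelled array with units."""

    dims: tuple  # dimension names, e.g. ("i", "j")
    units: str  # units shared by every value
    values: dict = field(default_factory=dict)  # label tuple -> float

    def get(self, *labels):
        # Combinations of labels with no stored value behave like NaN
        return self.values.get(labels, float("nan"))


# A^{ij} with I = {2020, 2030} and J = {"coal", "wind"}.
# There is no value for (2030, "coal"): coverage is sparse.
A = SketchQuantity(
    dims=("i", "j"),
    units="GW",
    values={(2020, "coal"): 1.5, (2020, "wind"): 0.2, (2030, "wind"): 0.9},
)

# A 0-dimensional (scalar) quantity: one value, keyed by the empty tuple
C = SketchQuantity(dims=(), units="GW", values={(): 3.0})
```

Storing only the combinations that exist is what makes the representation sparse: the missing entry costs nothing and reads back as NaN.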

Dimensionality of quantities#

Quantities may have many dimensions. For instance, consider \(X^{abcdefghij}\), which has ten dimensions. Some calculations do not depend on all of these dimensions. In that case, what is needed is not the full 10-dimensional quantity, but its partial sum over some dimensions, with the others retained.

Notation. Consider a quantity with three dimensions, \(A^{ijk}\), and another with two, \(B^{kl}\), and a scalar \(C\). We define partial sums over every possible combination of dimensions:

\[\begin{split}\begin{array}{lll} A^{ij} = \left[ a_{i,j} \right], & a_{i,j} = \sum_{k}{a_{i,j,k}} \ \forall \ i, j & \text{similarly } A^{ik}, A^{jk} \\ A^{i} = \left[ a_i \right], & a_i = \sum_j\sum_{k}{a_{i,j,k}} \ \forall\ i & \text{similarly } A^j, A^k \\ A = \sum_i\sum_j\sum_k{a_{i,j,k}} & & \text{(a scalar)} \end{array}\end{split}\]
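The partial sums defined above can be sketched with plain dictionaries. This is an illustration under the assumption that a quantity is stored as a mapping from label tuples to numbers; the helper `partial_sum` is a hypothetical name, not part of genno.

```python
from collections import defaultdict


def partial_sum(values, dims, drop):
    """Sum `values` (label tuple -> number) over the dimensions named in `drop`."""
    keep = [n for n, d in enumerate(dims) if d not in drop]
    result = defaultdict(float)
    for labels, value in values.items():
        # Entries that agree on all retained dimensions are added together
        result[tuple(labels[n] for n in keep)] += value
    return dict(result)


dims = ("i", "j", "k")
a_ijk = {
    ("i1", "j1", "k1"): 1.0,
    ("i1", "j1", "k2"): 2.0,
    ("i1", "j2", "k1"): 3.0,
    ("i2", "j1", "k1"): 4.0,
}

a_ij = partial_sum(a_ijk, dims, drop={"k"})  # A^{ij}
a_i = partial_sum(a_ijk, dims, drop={"j", "k"})  # A^{i}
a = partial_sum(a_ijk, dims, drop={"i", "j", "k"})  # A, a scalar

print(a_ij)  # {('i1', 'j1'): 3.0, ('i1', 'j2'): 3.0, ('i2', 'j1'): 4.0}
print(a_i)  # {('i1',): 6.0, ('i2',): 4.0}
print(a)  # {(): 10.0}
```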

Note that \(A\) and \(B\) share one dimension, \(k\), but the other dimensions are distinct. We specify that simple arithmetic operations result in a quantity whose dimensions are the union of the dimensions of the operands. In other words:

\[\begin{split}\begin{array}{ll} C + A^{i} = X^{i} = \left[ x_{i} \right], & x_{i} = C + a_{i} \ \forall \ i \\ A^{jk} \times B^{kl} = Y^{jkl} = \left[ y_{j,k,l} \right], & y_{j,k,l} = a_{j,k} \times b_{k,l} \ \forall \ j, k, l \\ A^{j} - B^{j} = Z^{j} = \left[ z_{j} \right], & z_{j} = a_{j} - b_{j} \ \forall \ j \\ \end{array}\end{split}\]

As a result of this rule:

  • The difference \(Z^j\) has the same dimensionality as both of its operands.

  • The sum \(X^i\) has the same dimensionality as one of its operands.

  • The product \(Y^{jkl}\) has a different dimensionality from each of its operands.

These operations are called broadcasting and alignment: The scalar value \(C\) is broadcast across all labels on the dimension \(i\) that it lacks, in order to calculate \(x_i\). \(A^{jk}\) and \(B^{kl}\) are aligned on matching values of \(k\), but broadcast over dimensions \(j\) and \(l\), respectively.
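Broadcasting and alignment can be sketched in a few lines, again assuming quantities stored as mappings from label tuples to numbers. The functions `multiply` and `add_scalar` are hypothetical helpers written for this illustration and say nothing about how genno implements arithmetic.

```python
def multiply(a_jk, b_kl):
    """Y^{jkl} = A^{jk} x B^{kl}: align on k, broadcast over j and l."""
    return {
        (j, k_a, l_b): a * b
        for (j, k_a), a in a_jk.items()
        for (k_b, l_b), b in b_kl.items()
        if k_a == k_b  # alignment: only matching labels on the shared dimension
    }


def add_scalar(c, a_i):
    """X^{i} = C + A^{i}: the scalar C is broadcast across every label i."""
    return {labels: c + a for labels, a in a_i.items()}


a_jk = {("j1", "k1"): 2.0, ("j1", "k2"): 3.0, ("j2", "k1"): 5.0}
b_kl = {("k1", "l1"): 10.0, ("k2", "l1"): 100.0}

y_jkl = multiply(a_jk, b_kl)
x_i = add_scalar(1.0, {("i1",): 0.5, ("i2",): 1.5})

print(y_jkl)
# {('j1', 'k1', 'l1'): 20.0, ('j1', 'k2', 'l1'): 300.0, ('j2', 'k1', 'l1'): 50.0}
print(x_i)  # {('i1',): 1.5, ('i2',): 2.5}
```

The result of `multiply` carries the union of the operands' dimensions, (j, k, l), which matches the rule stated above.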

Key#

genno.Key is used to refer to a Quantity, before it is computed. For multi-dimensional calculations, we need keys that distinguish \(A^i\)—the partial sum of \(A^{ijk}\) used in the calculation of \(X^i\)—from \(A^{jk}\)—a different partial sum used in the calculation of \(Y^{jkl}\). It is not sufficient to refer to both as ‘A’, since this is ambiguous about what calculation we want to perform.

A Key has a name, zero or more dimensions, and an optional tag:

In [1]: from genno import Key

# Quantity named 'A' with dimensions i, j, k
In [2]: A_ijk = Key("A", ["i", "j", "k"])

In [3]: type(A_ijk)
Out[3]: genno.core.key.Key

In [4]: repr(A_ijk)
Out[4]: '<A:i-j-k>'

In [5]: str(A_ijk)
Out[5]: 'A:i-j-k'

# With different dimensions
In [6]: A_jk = Key("A", ["j", "k"])

In [7]: A_jk
Out[7]: <A:j-k>

Key has methods that allow producing related keys:

# Drop dimensions from a key
In [8]: A_ijk.drop("i")
Out[8]: <A:j-k>

# Describe a key that is the product of two others; add a tag
In [9]: B_kl = Key("B", ["k", "l"])

In [10]: B_kl
Out[10]: <B:k-l>

In [11]: Key.product("Y", A_ijk.drop("i"), B_kl, tag="initial")
Out[11]: <Y:j-k-l:initial>

A Key object can also be produced by parsing a string representation:

In [12]: Z_j = Key("Z:j")

In [13]: Z_j
Out[13]: <Z:j>

# Keys compare and hash() identically to their str() representation
In [14]: Z_j == "Z:j"
Out[14]: True

In [15]: Z_j == "Y:i-j-k"
Out[15]: False
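To show why comparing and hashing like a string is useful, the following is a simplified sketch of a key-like class. `SketchKey` is a hypothetical name and covers only a small part of what the real genno.Key does; the point is that an object which hashes and compares like its string form can be looked up in a dictionary by either.

```python
class SketchKey:
    """Hypothetical, simplified stand-in for a key with name, dims and tag."""

    def __init__(self, name, dims=(), tag=None):
        self.name, self.dims, self.tag = name, tuple(dims), tag

    @classmethod
    def parse(cls, text):
        # "name:dim1-dim2:tag"; the dims and tag parts are optional
        name, dims, tag = (text.split(":") + ["", ""])[:3]
        return cls(name, dims.split("-") if dims else (), tag or None)

    def drop(self, *dims):
        # Same name and tag, without the given dimensions
        kept = [d for d in self.dims if d not in dims]
        return SketchKey(self.name, kept, self.tag)

    def __str__(self):
        text = f"{self.name}:{'-'.join(self.dims)}"
        return f"{text}:{self.tag}" if self.tag else text

    def __repr__(self):
        return f"<{self}>"

    # Compare and hash like the string form
    def __eq__(self, other):
        return str(self) == str(other)

    def __hash__(self):
        return hash(str(self))


graph = {SketchKey("A", ["j", "k"]): 1.0}

print(repr(SketchKey.parse("A:i-j-k").drop("i")))  # <A:j-k>
print(graph["A:j-k"])  # 1.0
```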

Computer#

Computer provides the main interface of genno. Usage of a Computer involves two steps:

  1. Use Computer.add() and other helper methods to describe all the tasks the Computer might perform.

  2. Use Computer.get() to trigger the execution of one or more tasks.

This two-step process allows genno to deliver good performance by skipping irrelevant tasks and avoiding re-computing intermediate results that are used in multiple places.

Graph#

Computer is built around a graph of nodes and edges; specifically, a directed, acyclic graph. This means:

  • Every edge has a direction, from one node to another.

  • There are no recursive loops in the graph; i.e. no node is its own ancestor.
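These two properties can be checked with the Python standard library. The sketch below uses `graphlib.TopologicalSorter` (Python 3.9 or later) on a small dependency mapping; it illustrates the general idea of a directed, acyclic graph and is not taken from genno itself.

```python
from graphlib import CycleError, TopologicalSorter

# Map each node to the set of nodes it depends on (its inputs)
dependencies = {"A": set(), "B": set(), "C": {"A", "B"}}

# A valid execution order places every node after all of its inputs
order = list(TopologicalSorter(dependencies).static_order())
print(order[-1])  # C

# Adding an edge from "C" back to "A" makes "A" its own ancestor: a cycle
cyclic = {"A": {"C"}, "B": set(), "C": {"A", "B"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True

print(has_cycle)  # True
```

Because the first graph is acyclic, an execution order always exists; in the cyclic graph no such order can be found.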

In the graph, every node represents a computation; usually a tuple called a task wherein the first element is a callable() like a function. This callable can be:

  • a numerical calculation operating on one or more Quantity objects;

  • more generally, an operator that can perform any other action, for instance transforming data formats, reading and writing files, or writing plots.

Every node has a unique key that labels the results of its computation. In genno, these labels can be a Key object (if the task produces a Quantity), a str (most other cases), or generally any other hashable object.

The computation at any node may depend on certain inputs. In the graph, these can be literal values, or keys that refer to the outputs produced by other nodes. These associations—output from one node, input to another—are the edges of the graph.

When Computer.get() is called, the values for each input in a task tuple are first computed and then passed, in the same order, as positional arguments to the callable.

Note

genno relies on the Dask implementation of task graphs. For a complete description of tasks, see the Specification in the dask documentation.
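The evaluation rule described above (inputs are computed first, then passed in order as positional arguments) can be sketched as a short recursive function over a dictionary. This is a deliberately simplified illustration: the function `get` below is hypothetical, handles only string keys and flat task tuples, and omits much of what the Dask specification covers.

```python
def get(graph, key):
    """Compute the value at `key` in `graph`, a dict of key -> literal or task."""
    node = graph[key]
    if isinstance(node, tuple) and node and callable(node[0]):
        func, *inputs = node
        args = [
            # Inputs that name other nodes are computed first;
            # anything else is passed through as a literal value
            get(graph, item) if isinstance(item, str) and item in graph else item
            for item in inputs
        ]
        return func(*args)  # positional arguments, in the same order
    return node  # a literal value


graph = {
    "A": 1,
    "B": 2,
    "C": (lambda *inputs: sum(inputs), "A", "B"),
    "D": (lambda x, y: x - y, "C", 10),  # the second input is a literal
}

print(get(graph, "C"))  # 3
print(get(graph, "D"))  # -7
```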

Describe tasks#

For example, the following equation:

\[C = A + B\]

…is represented by:

  • A node named “A” that provides the value of A.

  • A node named “B” that provides the value of B.

  • A node named “C” that computes a sum of its inputs.

  • An edge from “A” to “C”, indicating that the value of A is an input to C.

  • An edge from “B” to “C”.

To describe this using the Computer (step 1):

In [16]: from genno import Computer

# Create a new Computer object
In [17]: c = Computer()

# Add two nodes
# These have no inputs; they only return a literal value.
In [18]: c.add("A", 1)
Out[18]: 'A'

In [19]: c.add("B", 2)
Out[19]: 'B'

# Add one node and two edges
In [20]: c.add("C", (lambda *inputs: sum(inputs), "A", "B"))
Out[20]: 'C'

# Equivalent, without parentheses
In [21]: c.add("C", lambda *inputs: sum(inputs), "A", "B")
Out[21]: 'C'

To unpack this code:

  • Computer.add() is used to build the graph.

  • The first argument to add() is the label or key of the node, that is, a description of what it will produce.

  • The following arguments describe the computation (for instance, a task with a specific operator) to be performed:

    • For nodes ‘A’ and ‘B’, these are simply a raw or literal value. When the node is executed, this value is returned.

    • For node ‘C’, it is a tuple with 3 items: (lambda *inputs: sum(inputs), 'A', 'B').

      1. lambda *inputs: sum(inputs) is an anonymous or ‘lambda’ function that computes the sum of its inputs.

      2. The label "A" is a reference to another node. This indicates that there is a graph edge from node "A" into node "C".

      3. Likewise, the label "B" is a reference to another node, indicating a graph edge from node "B" into node "C".

All the keys in a Computer can be listed with keys().

Execute tasks#

The task to produce “C”, and any direct or indirect inputs required, is executed using Computer.get():

In [22]: c.get("C")
Out[22]: 3

Computer.describe() displays a simple textual trace of the tasks used in this chain of computations. A portion of the graph is printed out as a nested list:

In [23]: print(c.describe("C"))
'C':
- <lambda>
- 'A':
  - 1
- 'B':
  - 2
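A nested trace of this kind can be produced by walking the graph recursively. The sketch below is a hypothetical re-implementation for a plain dictionary graph, written only to show the idea; it is not the code behind Computer.describe() and makes no claim about that method's output for other graphs.

```python
def describe(graph, key, depth=0):
    """Return lines tracing `key` and, recursively, the nodes it depends on."""
    head = f"{'  ' * (depth - 1)}- {key!r}:" if depth else f"{key!r}:"
    pad = "  " * depth
    node = graph[key]
    if not (isinstance(node, tuple) and node and callable(node[0])):
        return [head, f"{pad}- {node!r}"]  # a literal value
    # First the callable, then each input in order
    lines = [head, f"{pad}- {node[0].__name__}"]
    for item in node[1:]:
        if isinstance(item, str) and item in graph:
            lines.extend(describe(graph, item, depth + 1))
        else:
            lines.append(f"{pad}- {item!r}")
    return lines


graph = {"A": 1, "B": 2, "C": (lambda *inputs: sum(inputs), "A", "B")}
print("\n".join(describe(graph, "C")))
```

For this small graph, the printed lines have the same shape as the trace shown above.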

This description shows how genno traverses the graph in order to calculate the desired quantity:

  1. The desired value is from node “C”, which computes a function of some arguments.

  2. The first argument is "A".

  3. “A” is the name of another node.

  4. Node “A” gives a literal value int(1), which is stored.

  5. The Computer returns to “C” and moves on to the next argument, “B”.

  6. Steps 3 and 4 are repeated for “B”, giving int(2).

  7. All of the arguments to “C” have been processed.

  8. The function/operator for “C” is called.

    As arguments, instead of the strings “A” and “B”, this function receives the computed int values from steps 4 and 6 respectively.

  9. The result is returned.

In this example, “A” and “B” are, at most, 1 step away from the node requested, and are each used once. In more realistic examples, the graph can have:

  • Long chains of calculations, each depending on the output of its ancestors, and/or

  • Multiple connections, so that results like “A” are used by more than one child calculation.

However, the Computer still follows the same procedure to traverse the graph and calculate the results.
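When one result feeds several child calculations, it only needs to be computed once. The sketch below shows one way to get that behaviour, by storing each node's result in a dictionary during traversal. The functions `record` and `get` are hypothetical helpers, and the caching strategy is an assumption made for illustration rather than a description of genno or Dask internals.

```python
calls = []


def record(name, func):
    """Wrap `func` so that each call appends `name` to `calls`."""

    def wrapper(*args):
        calls.append(name)
        return func(*args)

    return wrapper


def get(graph, key, cache=None):
    """Compute `key`, computing each node it depends on at most once."""
    cache = {} if cache is None else cache
    if key not in cache:
        node = graph[key]
        if isinstance(node, tuple) and node and callable(node[0]):
            func, *inputs = node
            args = [
                get(graph, i, cache) if isinstance(i, str) and i in graph else i
                for i in inputs
            ]
            cache[key] = func(*args)
        else:
            cache[key] = node
    return cache[key]


graph = {
    "A": (record("A", lambda: 1),),
    "B": (record("B", lambda a: a + 1), "A"),
    "C": (record("C", lambda a, b: a + b), "A", "B"),  # uses "A" again
    "unused": (record("unused", lambda: 0),),  # not needed for "C"
}

print(get(graph, "C"))  # 3
print(calls)  # ['A', 'B', 'C']
```

Although “A” is an input to both “B” and “C”, it runs once, and the node “unused” never runs because nothing requested depends on it.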

Operators#

An operator is any Python function or callable that operates on Quantities or other data. genno.operator includes many common operators; see the API documentation for descriptions of each.

The power of genno is the ability to link any code, no matter how complex, into the graph, and have it operate on the results of other code. Tasks can perform complex actions such as:

  • Read in exogenous data, including over a network connection,

  • Trigger output to file(s) or a database, or

  • Execute user-defined methods.
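As a sketch of a non-numerical operator, the following defines two ordinary functions that write and read a JSON file, then runs one of them from a task tuple by hand. The names `write_report` and `read_report` and the file name are hypothetical; in real use, the tuple would be added to a Computer rather than called directly.

```python
import json
import tempfile
from pathlib import Path


def write_report(path, data):
    """Hypothetical user-defined operator: write `data` to `path` as JSON."""
    path.write_text(json.dumps(data, sort_keys=True))
    return path


def read_report(path):
    """Hypothetical operator: read data back from a JSON file."""
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    # A task is a tuple: the callable first, then its inputs
    task = (write_report, Path(tmp) / "report.json", {"A": 1, "B": 2})
    written = task[0](*task[1:])
    result = read_report(written)

print(result)  # {'A': 1, 'B': 2}
```

Nothing about these functions is specific to a graph: any callable with positional arguments can serve as the first element of a task.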