Caching

Basics

Use Computer.cache() to decorate another function, func, that will be added as the operator/callable in a task. Caching is useful when get() is called repeatedly, on the same Computer or across processes, and each call would otherwise invoke a slow func.

from pathlib import Path

from genno import Computer

# Configure the directory for cache file storage
c = Computer(cache_path=Path("/some/directory"))

@c.cache
def myfunction(*args, **kwargs):
    # Expensive operations, e.g. load large files;
    # invoke external programs
    return data

c.add("myvar", (myfunction,))

# Data is cached in /some/directory/myfunction-*.pkl
c.get("myvar")

# Cached value is loaded and returned
c.get("myvar")

A cache key is computed from:

  1. the name of func,

  2. the arguments to func, and

  3. the compiled bytecode of func (see hash_code()).

If a file exists in cache_path with a matching key, it is loaded and returned instead of calling func.

If no matching file exists (a “cache miss”) or the cache_skip configuration option is True, func is executed and its return value is cached in the cache directory, cache_path (see Configuration → Caching). A cache miss will occur if any part of the key changes; that is, if:

  1. the function is renamed in the source code,

  2. the function is called with different arguments, or

  3. the function source code is modified in a way that changes its compiled bytecode.
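The three ingredients of the key can be illustrated with a short, standard-library sketch. This is not genno's implementation: sketch_cache_key is a hypothetical helper, and the JSON serialization with a repr() fallback, the 10-byte digest size, and the key layout are all assumptions chosen for illustration.

```python
import hashlib
import json


def sketch_cache_key(func, *args, **kwargs):
    """Illustrative only: combine name, arguments, and bytecode into one key."""
    # 1. The name of the function
    name = func.__name__
    # 2. A digest of the arguments, serialized in a stable order
    arg_bytes = json.dumps([args, kwargs], sort_keys=True, default=repr).encode()
    arg_hash = hashlib.blake2b(arg_bytes, digest_size=10).hexdigest()
    # 3. A digest of the compiled bytecode and its constants
    code = func.__code__
    code_bytes = code.co_code + repr(code.co_consts).encode()
    code_hash = hashlib.blake2b(code_bytes, digest_size=10).hexdigest()
    return f"{name}-{arg_hash}-{code_hash}"


def square(x):
    return x ** 2


key_a = sketch_cache_key(square, 2)
key_b = sketch_cache_key(square, 2)  # Same name, arguments, code: same key
key_c = sketch_cache_key(square, 3)  # Different arguments: different key

# Two functions with the same name ("<lambda>") but different code
f1 = lambda x: x + 1
f2 = lambda x: x + 2
key_e = sketch_cache_key(f1, 2)
key_f = sketch_cache_key(f2, 2)  # Different bytecode/constants: different key
```

In this sketch, any change to one of the three parts produces a different key, which corresponds to a cache miss in the description above.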

Cache data loaded from files

Consider a function that loads a very large file, or performs some slow processing on its contents:

from pathlib import Path

import pandas as pd
from genno import Quantity

@c.cache
def slow_data_load(path, _extra_cache_key=None):
    # Load data in some way
    result = pd.read_xml(path, ...)
    # … further processing …
    return Quantity(result)

We want to cache the result of slow_data_load, but have the cache refreshed when the file contents change. We do this using the _extra_cache_key argument to the function. This argument is not used in the function, but does affect the value of the cache key.

When calling the function, pass some value that indicates whether the contents of path have changed. One possibility is the modification time, via pathlib.Path.stat():

def load_cached_1(path):
    # Accept str or Path; the modification time becomes part of the cache key
    return slow_data_load(path, Path(path).stat().st_mtime)

Another possibility is to hash the entire file. hash_contents() is provided for this purpose:

from genno.caching import hash_contents

def load_cached_2(path):
    return slow_data_load(path, hash_contents(path))

Warning

For very large files, even hashing the contents in this way can be slow, and the hash must be recomputed on every call in order to look for a matching cache key.

The decorated functions can be used as operators in the graph, or called directly:

c.add("A1", load_cached_1, "example-file-A.xml")
c.add("A2", load_cached_2, "example-file-A.xml")

# Load and process the contents of example-file-A.xml
c.get("A1")

# Load again; the value is retrieved from cache if the
# file has not been modified
c.get("A1")

# Same without using the Computer
load_cached_1("example-file-A.xml")
load_cached_1("example-file-A.xml")
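The role of the unused _extra_cache_key argument can be seen in a small in-memory analogue. The sketch below uses functools.lru_cache from the standard library rather than genno's on-disk cache; slow_read, read_cached, and the temporary file are hypothetical names for illustration, and modification times are set explicitly with os.utime() so the behaviour does not depend on filesystem timestamp resolution.

```python
import os
import tempfile
from functools import lru_cache
from pathlib import Path

calls = []  # Records each time the slow function body actually runs


@lru_cache(maxsize=None)
def slow_read(path, _extra_cache_key=None):
    # _extra_cache_key is not used in the body, but is part of the cache lookup
    calls.append(path)
    return Path(path).read_text()


def read_cached(path):
    # Pass the modification time so that a changed file gives a new cache entry
    return slow_read(path, os.stat(path).st_mtime_ns)


with tempfile.TemporaryDirectory() as tmp:
    p = os.path.join(tmp, "example.txt")
    Path(p).write_text("first")
    os.utime(p, ns=(1_000_000_000, 1_000_000_000))  # Fix the mtime explicitly
    first = read_cached(p)  # Cache miss: the body runs
    again = read_cached(p)  # Same mtime: served from the cache

    Path(p).write_text("second")
    os.utime(p, ns=(2_000_000_000, 2_000_000_000))  # Simulate a later edit
    updated = read_cached(p)  # New mtime: the body runs again
```

Only two of the three calls execute the function body; the second call reuses the cached value because neither the path nor the extra key changed.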

Integrate and extend

  • Encoder may be configured to handle (or ignore) additional/custom types that may appear as arguments to functions decorated with Computer.cache(). See the examples for Encoder.register() and Encoder.ignore().

  • decorate() can be used entirely independently of any Computer by passing the cache_path (and optional cache_skip) keyword arguments:

    from functools import partial
    from pathlib import Path
    
    from genno.caching import decorate
    
    # Create a decorator with a custom cache path
    mycache = partial(decorate, cache_path=Path("/path/to/cache"))
    
    @mycache
    def func(a, b=2):
        return a ** b
    

    In this usage, it offers a subset of the feature set of joblib.Memory.

Internals and utilities

class genno.caching.Encoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

JSON encoder.

This is a one-way encoder used only to serialize arguments for hash_args() and hash_code().

default(o)

For o, return an object serializable by the base json.JSONEncoder.

  • pathlib.Path: the string representation of o.

  • Code objects (from Python’s built-in inspect module), for instance a function or lambda: blake2b() hash of the object’s bytecode and its serialized constants.

    Warning

    This is not 100% guaranteed to change if the operation of o (or other code called in turn by o) changes. If relying on this behaviour, check carefully.

  • Any type indicated with ignore(): empty tuple.

  • Any type with a serializer registered with register(): the return value of the serializer, called on o.
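The dispatch described above can be approximated with a json.JSONEncoder subclass. This is a sketch under stated assumptions, not the library's code: SketchEncoder is a hypothetical class, its register() takes the type explicitly (unlike the decorator form documented below), and code-object hashing is omitted.

```python
import json
from pathlib import PurePath, PurePosixPath


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder with a registry of serializers and ignored types."""

    _serializers = {}  # type -> function returning something JSON can encode
    _ignored = ()  # Types whose instances are replaced by an empty tuple

    @classmethod
    def register(cls, type_, func):
        cls._serializers[type_] = func

    @classmethod
    def ignore(cls, *types):
        cls._ignored = cls._ignored + types

    def default(self, o):
        # json calls default() only for objects it cannot encode itself
        if isinstance(o, PurePath):
            return str(o)
        for type_, func in self._serializers.items():
            if isinstance(o, type_):
                return func(o)
        if isinstance(o, self._ignored):
            return ()
        return super().default(o)  # Raises TypeError for unknown types


class Foo:
    a = 3


class Bar:
    pass


SketchEncoder.register(Foo, lambda o: {"a": o.a})
SketchEncoder.ignore(Bar)

encoded = json.dumps(
    [PurePosixPath("data/file.csv"), Foo(), Bar()], cls=SketchEncoder
)
print(encoded)  # ["data/file.csv", {"a": 3}, []]
```

Registered serializers are checked before ignored types in this sketch, so ignoring a broad type such as object would not hide a type that has its own serializer.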

classmethod ignore(*types)

Tell the Encoder (thus hash_args()) to ignore arguments of types.

Example

>>> class Bar:
...    pass
...
>>> # Don't use Bar instances in cache keys
>>> genno.caching.Encoder.ignore(Bar)

Ignore all unrecognized types:

>>> genno.caching.Encoder.ignore(object)

classmethod register(func)

Register func to serialize a type not handled by json.JSONEncoder.

func should return a type that is handled by JSONEncoder; see the docs.

Example

>>> class Foo:
...    a = 3
...
>>> @genno.caching.Encoder.register
... def _encode_foo(o: Foo):
...     return dict(a=o.a)  # JSONEncoder can handle dict()

genno.caching.decorate(func: Callable, computer: Computer | None = None, cache_path=None, cache_skip: bool = False) → Callable

Helper for Computer.cache().

Parameters:
  • computer (Computer, optional) – If supplied, the config dictionary stored in the Computer is used to look up values for cache_path and cache_skip, at the moment when func is called.

  • cache_path (os.PathLike, optional) – Directory in which to store cache files.

  • cache_skip (bool, optional) – If True, ignore existing cache entries and overwrite them with new values from func.

See also

hash_args, hash_code
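The effect of cache_path and cache_skip can be sketched with a minimal pickle-based decorator. This is an illustration only, not the behaviour of decorate() itself: sketch_decorate is a hypothetical function, its key uses only the function name and a digest of repr() of the arguments (no bytecode), and the options are fixed at decoration time rather than looked up from a Computer when func is called.

```python
import hashlib
import pickle
import tempfile
from functools import wraps
from pathlib import Path


def sketch_decorate(func, cache_path, cache_skip=False):
    """Hypothetical pickle-based cache keyed on name and arguments only."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        raw = repr((args, sorted(kwargs.items()))).encode()
        digest = hashlib.blake2b(raw, digest_size=10).hexdigest()
        target = Path(cache_path) / f"{func.__name__}-{digest}.pkl"

        # Cache hit: load and return, unless cache_skip forces a refresh
        if target.exists() and not cache_skip:
            with open(target, "rb") as f:
                return pickle.load(f)

        # Cache miss (or cache_skip): run func and (over)write the entry
        result = func(*args, **kwargs)
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "wb") as f:
            pickle.dump(result, f)
        return result

    return wrapper


calls = []  # Records each time the wrapped function actually runs


def power(a, b=2):
    calls.append((a, b))
    return a ** b


with tempfile.TemporaryDirectory() as tmp:
    cached = sketch_decorate(power, cache_path=tmp)
    r1 = cached(3)  # Miss: runs power() and writes a file
    r2 = cached(3)  # Hit: loaded from the file
    r3 = cached(3, b=3)  # Different arguments: miss
    forced = sketch_decorate(power, cache_path=tmp, cache_skip=True)
    r4 = forced(3)  # Existing entry is ignored and overwritten
```

Of the four calls, three execute power(): the repeated call is served from the cache, while the cache_skip call recomputes the value even though an entry exists.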

genno.caching.hash_args(*args, **kwargs)

Return a 20-character hashlib.blake2b() hex digest of args and kwargs.

Used by decorate().

See also

Encoder

genno.caching.hash_code(func: Callable) → str

Return the hashlib.blake2b() hex digest of the compiled bytecode of func.

See also

Encoder

genno.caching.hash_contents(path: Path | str, chunk_size=65536) → str

Return the hashlib.blake2b() hex digest of the file contents of path.

Parameters:

chunk_size (int, optional) – Read the file in chunks of this size; default 65536 bytes (64 KiB).
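Reading in chunks keeps memory use bounded regardless of file size. A minimal sketch of the technique, assuming only the standard library, is shown below; sketch_hash_contents is a hypothetical stand-in, and its default digest size is not necessarily the one hash_contents() uses.

```python
import hashlib
import tempfile
from pathlib import Path


def sketch_hash_contents(path, chunk_size=65536):
    """Illustrative chunked hash of the contents of the file at path."""
    digest = hashlib.blake2b()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "data.bin"
    data = bytes(range(256)) * 1000  # 256,000 bytes, several chunks
    p.write_bytes(data)

    chunked = sketch_hash_contents(p)
    small_chunks = sketch_hash_contents(p, chunk_size=1000)
    whole = hashlib.blake2b(data).hexdigest()  # Hash of all bytes at once
```

The digest does not depend on the chunk size: feeding the same bytes to update() in any partition gives the same result as hashing them in one call.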