Use Computer.cache() to decorate another function, func, that will be added as the computation/callable in a task. Caching is useful when get() is called multiple times on the same Computer, or across processes, and would otherwise invoke a slow func each time.

    from pathlib import Path

    from genno import Computer

    # Configure the directory for cache file storage
    c = Computer(cache_path=Path("/some/directory"))

    @c.cache
    def myfunction(*args, **kwargs):
        # Expensive operations, e.g. load large files;
        # invoke external programs. `data` stands for the result.
        return data

    c.add("myvar", (myfunction,))

    # Data is cached in /some/directory/myfunction-*.pkl
    c.get("myvar")

    # Cached value is loaded and returned
    c.get("myvar")

A cache key is computed from:
- the name of func,
- the arguments to func, and
- the compiled bytecode of func (see hash_code()).
If a file exists in cache_path with a matching key, it is loaded and returned instead of calling func. If no matching file exists (a “cache miss”) or the cache_skip configuration option is True, func is executed and its return value is cached in the cache directory, cache_path (see Configuration → Caching).
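The hit/miss logic above can be sketched with the standard library alone. This is not genno's implementation: simple_cache() and _cache_key() are hypothetical names, results are stored with pickle in files named after the function and its key (mirroring the myfunction-*.pkl pattern), and the key is a blake2b hash of the function name, the JSON-serialized arguments and the bytecode.

```python
import functools
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def _cache_key(func, args, kwargs):
    # Hypothetical key: function name + arguments + compiled bytecode
    h = hashlib.blake2b(digest_size=10)
    h.update(func.__name__.encode())
    # repr() is a crude fallback for arguments that JSON cannot serialize
    h.update(json.dumps([args, kwargs], sort_keys=True, default=repr).encode())
    h.update(func.__code__.co_code)
    return h.hexdigest()


def simple_cache(cache_path, cache_skip=False):
    """Hypothetical stand-in for Computer.cache(), storing pickle files."""
    cache_path = Path(cache_path)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = _cache_key(func, args, kwargs)
            path = cache_path / f"{func.__name__}-{key}.pkl"
            if path.exists() and not cache_skip:
                # Cache hit: load and return the stored value
                with open(path, "rb") as f:
                    return pickle.load(f)
            # Cache miss, or cache_skip=True: call func and (over)write the file
            result = func(*args, **kwargs)
            cache_path.mkdir(parents=True, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator


calls = []  # records each time the wrapped function really runs

with tempfile.TemporaryDirectory() as tmp:

    @simple_cache(tmp)
    def slow_square(x):
        calls.append(x)
        return x * x

    slow_square(3)  # miss: runs and writes slow_square-<key>.pkl
    slow_square(3)  # hit: value loaded from the file
    slow_square(4)  # different argument, different key: miss

print(calls)  # [3, 4]
```

Passing cache_skip=True to the sketch reproduces the documented behaviour of that option: the existing file is ignored and overwritten with a freshly computed value.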
A cache miss will occur if any part of the key changes; that is, if:
- the function is renamed in the source code,
- the function is called with different arguments, or
- the function source code is modified.
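Each of the three conditions can be seen with a toy key function. cache_key() below is hypothetical and only approximates the idea: it hashes the function's name, its JSON-serialized arguments, its bytecode and a repr() of its constants. Constants are included because bodies such as x + 1 and x + 2 may compile to identical bytecode that differs only in the constant used.

```python
import hashlib
import json


def cache_key(func, *args, **kwargs):
    """Hypothetical cache key: name, arguments, bytecode and constants of func."""
    code = func.__code__
    h = hashlib.blake2b(digest_size=10)
    h.update(func.__name__.encode())
    h.update(json.dumps([args, kwargs], sort_keys=True).encode())
    h.update(code.co_code)
    # Simple constants only; nested code objects would need recursive hashing
    h.update(repr(code.co_consts).encode())
    return h.hexdigest()


def f(x):
    return x + 1


def g(x):  # same body as f, different name
    return x + 1


base = cache_key(f, 1)
assert cache_key(f, 1) == base  # nothing changed: same key (cache hit)
assert cache_key(g, 1) != base  # function renamed
assert cache_key(f, 2) != base  # called with different arguments


def f(x):  # source code modified
    return x + 2


assert cache_key(f, 1) != base
```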
Cache data loaded from files
Consider a function that loads a very large file, or performs some slow processing on its contents:

    from pathlib import Path

    import pandas as pd

    from genno import Quantity

    @c.cache
    def slow_data_load(path, _extra_cache_key=None):
        # Load data in some way
        result = pd.read_xml(path, ...)
        # … further processing …
        return Quantity(result)

We want to cache the result of slow_data_load(), but have the cache refreshed when the file contents change. We do this using the _extra_cache_key argument to the function. This argument is not used in the function body, but it does affect the value of the cache key.
When calling the function, pass some value that indicates whether the contents of path have changed. One possibility is the modification time, via Path.stat():
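
    def load_cached_1(path):
        # Use the file's modification time as the extra cache key;
        # Path() also accepts a plain string such as "example-file-A.xml"
        return slow_data_load(path, Path(path).stat().st_mtime)
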
Another possibility is to hash the entire file. hash_contents() is provided for this purpose:

    from genno.caching import hash_contents

    def load_cached_2(path):
        # Use a hash of the file contents as the extra cache key
        return slow_data_load(path, hash_contents(path))

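The trade-off between the two kinds of extra key can be checked with the standard library. hash_file() below is a hypothetical stand-in for hash_contents(), not its actual implementation: it feeds the file to blake2b in fixed-size chunks. Rewriting a file with identical bytes can change its modification time (forced here with os.utime() so the result is deterministic) but not its hash, so an mtime-based key causes an unnecessary cache miss, while a content hash changes only when the bytes do.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def hash_file(path, chunk_size=2**16):
    """Hypothetical stand-in for hash_contents(): blake2b over the file's bytes."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large file is never held in memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.xml"
    path.write_text("<data>1</data>")
    key_a = hash_file(path)
    mtime_a = path.stat().st_mtime_ns

    # Same bytes, newer modification time: the hash is unchanged
    path.write_text("<data>1</data>")
    os.utime(path, ns=(mtime_a + 10**9, mtime_a + 10**9))
    key_b = hash_file(path)
    mtime_b = path.stat().st_mtime_ns

    # Different bytes: the hash changes
    path.write_text("<data>2</data>")
    key_c = hash_file(path)
```

The sketch reads st_mtime_ns rather than st_mtime only so that the comparison is exact; either value can serve as an extra cache key.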
For very large files, even hashing the file in this way can be slow, and the hash must be computed on every call, because it is needed to look up a matching cache key.
The functions load_cached_1() and load_cached_2() can be used as computations in the graph, or called directly:

    c.add("A1", load_cached_1, "example-file-A.xml")
    c.add("A2", load_cached_2, "example-file-A.xml")

    # Load and process the contents of example-file-A.xml
    c.get("A1")

    # Load again; the value is retrieved from cache if the
    # file has not been modified
    c.get("A1")

    # Same without using the Computer
    load_cached_1("example-file-A.xml")
    load_cached_1("example-file-A.xml")

Integrate and extend
Encoder may be configured to handle (or ignore) additional/custom types that may appear as arguments to functions decorated with Computer.cache(). See the examples for Encoder.ignore() and Encoder.register().
decorate() can be used entirely independently of any Computer by passing the cache_path (and optional cache_skip) keyword arguments:

    from functools import partial
    from pathlib import Path

    from genno.caching import decorate

    # Create a decorator with a custom cache path
    mycache = partial(decorate, cache_path=Path("/path/to/cache"))

    @mycache
    def func(a, b=2):
        return a ** b

In this usage, it offers a subset of the feature-set of Computer.cache().
Internals and utilities
- class genno.caching.Encoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
This is a one-way encoder used only to serialize arguments for hash_args().
- default(o)
For o, return an object serializable by the base JSONEncoder:
  - pathlib.Path: the string representation of o.
  - code objects (from Python’s built-in inspect module), e.g. a function or lambda: blake2b hash of the object’s bytecode and its serialized constants.
    This is not 100% guaranteed to change if the operation of o (or other code called in turn by o) changes. If relying on this behaviour, check carefully.
  - Any type with a serializer registered with register(): the return value of the serializer, called on o.
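The rules above map onto a json.JSONEncoder subclass that overrides default(). SketchEncoder is a hypothetical illustration of that shape, not genno's Encoder: it handles pathlib.Path and code objects (unwrapping functions and lambdas to their __code__), and serializes constants by encoding them recursively with the same class, so constant types that JSON cannot represent (for example a frozenset) still raise TypeError.

```python
import hashlib
import json
from pathlib import Path
from types import CodeType, FunctionType


class SketchEncoder(json.JSONEncoder):
    """Hypothetical one-way encoder in the spirit of genno.caching.Encoder."""

    def default(self, o):
        if isinstance(o, Path):
            # pathlib.Path: its string representation
            return str(o)
        if isinstance(o, FunctionType):
            # Functions and lambdas are represented by their code object
            o = o.__code__
        if isinstance(o, CodeType):
            # blake2b of the bytecode, updated with the serialized constants
            h = hashlib.blake2b(o.co_code)
            h.update(json.dumps(o.co_consts, cls=SketchEncoder).encode())
            return h.hexdigest()
        # Anything else: the base class raises TypeError
        return super().default(o)


encoded = json.dumps([Path("data.csv"), lambda x: x * 2], cls=SketchEncoder)
```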
- classmethod ignore(*types)
Tell the Encoder (thus hash_args()) to ignore arguments of types.

    >>> import genno.caching
    >>> class Bar:
    ...     pass
    ...
    >>> # Don't use Bar instances in cache keys
    >>> genno.caching.Encoder.ignore(Bar)

Ignore all unrecognized types
- classmethod register(func)
Register a func to serialize a type not handled by JSONEncoder.
func should return a type that is handled by JSONEncoder; see the documentation of Python’s json module.

    >>> class Foo:
    ...     a = 3
    ...
    >>> @genno.caching.Encoder.register
    ... def _encode_foo(o: Foo):
    ...     return dict(a=o.a)  # JSONEncoder can handle dict()

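Registering a serializer from the type annotation of its first parameter is the same mechanism offered by functools.singledispatch. The module-level encode(), register(), ignore() and RegistryEncoder below are hypothetical and only illustrate the idea; in particular, this sketch assumes that "ignoring" a type means encoding every instance as JSON null, so that instances can no longer vary the key.

```python
import json
from functools import singledispatch


@singledispatch
def encode(o):
    # No serializer registered for this type
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


def register(func):
    """Register func for the type annotated on its first parameter."""
    return encode.register(func)


def ignore(*types):
    """Encode all instances of types as None (JSON null)."""
    for t in types:
        encode.register(t, lambda o: None)


class RegistryEncoder(json.JSONEncoder):
    def default(self, o):
        return encode(o)


class Foo:
    a = 3


class Bar:
    pass


@register
def _encode_foo(o: Foo):
    return dict(a=o.a)  # JSONEncoder can handle dict()


ignore(Bar)

print(json.dumps([Foo(), Bar()], cls=RegistryEncoder))  # [{"a": 3}, null]
```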
- genno.caching.decorate(func: Callable, computer=None, cache_path=None, cache_skip=False) → Callable
  - computer (Computer, optional) – If supplied, the config dictionary stored in the Computer is used to look up values for cache_path and cache_skip, at the moment when func is called.
  - cache_path (os.PathLike, optional) – Directory in which to store cache files.
  - cache_skip (bool, optional) – If True, ignore existing cache entries and overwrite them with new values from func.
- genno.caching.hash_args(*args, **kwargs)
Return a 20-character blake2b hex digest of args and kwargs.
- genno.caching.hash_code(func: Callable) → str
Return the blake2b hex digest of the compiled bytecode of func.
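A 20-character hex digest corresponds to a 10-byte blake2b digest, because each byte is written as two hex characters. The functions below are a hypothetical sketch of digests with those lengths, not genno's code; serializing with json.dumps(..., sort_keys=True) is an assumption chosen so that keyword-argument order does not affect the result, and the 64-byte default digest used for the bytecode is likewise only an assumption.

```python
import hashlib
import json


def sketch_hash_args(*args, **kwargs):
    """Hypothetical: 10-byte (20 hex character) blake2b digest of the arguments."""
    payload = json.dumps([args, kwargs], sort_keys=True).encode()
    return hashlib.blake2b(payload, digest_size=10).hexdigest()


def sketch_hash_code(func):
    """Hypothetical: default 64-byte blake2b digest of func's bytecode."""
    return hashlib.blake2b(func.__code__.co_code).hexdigest()


print(len(sketch_hash_args(1, b=2)))  # 20
print(sketch_hash_args(a=1, b=2) == sketch_hash_args(b=2, a=1))  # True
```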