Caching
Basics
Use Computer.cache() to decorate another function, func, that will be added as the operator/callable in a task.
Caching is useful when get() is called multiple times on the same Computer, or across processes, invoking a slow func each time.
from pathlib import Path
from genno import Computer
# Configure the directory for cache file storage
c = Computer(cache_path=Path("/some/directory"))
@c.cache
def myfunction(*args, **kwargs):
    # Expensive operations, e.g. load large files;
    # invoke external programs
    return data
c.add("myvar", (myfunction,))
# Data is cached in /some/directory/myfunction-*.pkl
c.get("myvar")
# Cached value is loaded and returned
c.get("myvar")
A cache key is computed from:
the name of func,
the arguments to func, and
the compiled bytecode of func (see hash_code()).
If a file exists in cache_path with a matching key, it is loaded and returned instead of calling func.
If no matching file exists (a “cache miss”) or the cache_skip configuration option is True, func is executed and its return value is cached in the cache directory, cache_path (see Configuration → Caching).
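The hit/miss behaviour described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not genno's implementation: `simple_cache` and its simplified key (the function name plus `repr()` of the arguments) are hypothetical, and only the `cache_path` and `cache_skip` names and the `<name>-*.pkl` file pattern are taken from the text.

```python
import pickle
import tempfile
from hashlib import blake2b
from pathlib import Path


def simple_cache(cache_path: Path, cache_skip: bool = False):
    """Hypothetical stand-in for a caching decorator; not genno's code."""

    def decorator(func):
        def wrapped(*args, **kwargs):
            # Simplified key: function name plus repr() of the arguments
            raw = repr((func.__name__, args, sorted(kwargs.items())))
            key = blake2b(raw.encode(), digest_size=10).hexdigest()
            target = cache_path / f"{func.__name__}-{key}.pkl"

            if target.exists() and not cache_skip:
                # Cache hit: load the stored value instead of calling func
                with open(target, "rb") as f:
                    return pickle.load(f)

            # Cache miss, or cache_skip is True: call func and store the result
            result = func(*args, **kwargs)
            with open(target, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapped

    return decorator


calls = []  # Records each actual invocation of the slow function


@simple_cache(Path(tempfile.mkdtemp()))
def square(x):
    calls.append(x)
    return x**2


first = square(3)  # Miss: square() runs and the result is written to disk
second = square(3)  # Hit: the value is read from the .pkl file
third = square(4)  # Miss: different arguments give a different key
```

After these three calls, `calls` holds only the two invocations that missed the cache.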
A cache miss will occur if any part of the key changes; that is, if:
the function is renamed in the source code,
the function is called with different arguments, or
the function source code is modified.
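The three miss conditions can be reproduced with a small, self-contained key function. This is only a sketch: `sketch_key` is a hypothetical helper that combines the same three ingredients (name, arguments, bytecode), and the exact serialization used by genno's hash_args() and hash_code() may differ.

```python
import json
from hashlib import blake2b


def sketch_key(func, *args, **kwargs) -> str:
    """Hypothetical key function; genno's own hashing may differ."""
    code = func.__code__
    parts = {
        "name": func.__name__,
        "args": args,
        "kwargs": kwargs,
        # Bytecode and constants stand in for "the compiled bytecode of func"
        "code": code.co_code.hex(),
        "consts": repr(code.co_consts),
    }
    text = json.dumps(parts, sort_keys=True, default=repr)
    return blake2b(text.encode(), digest_size=10).hexdigest()


def load(x):
    return x + 1


def load_renamed(x):  # Same body, different name
    return x + 1


def make_modified():
    def load(x):  # Same name, different body
        return x * 2 + 1

    return load


base = sketch_key(load, 1)
same = sketch_key(load, 1)  # Nothing changed: same key
other_args = sketch_key(load, 2)  # Different arguments
renamed = sketch_key(load_renamed, 1)  # Function renamed
modified = sketch_key(make_modified(), 1)  # Function body modified
```

Here `base` and `same` are equal, while each of the other three keys differs from `base` and would therefore produce a cache miss.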
Cache data loaded from files
Consider a function that loads a very large file, or performs some slow processing on its contents:
from pathlib import Path
import pandas as pd
from genno import Quantity
@c.cache
def slow_data_load(path, _extra_cache_key=None):
    # Load data in some way
    result = pd.read_xml(path, ...)
    # … further processing …
    return Quantity(result)
We want to cache the result of slow_data_load, but have the cache refreshed when the file contents change.
We do this using the _extra_cache_key argument to the function.
This argument is not used in the function, but does affect the value of the cache key.
When calling the function, pass some value that indicates whether the contents of path have changed.
One possibility is the modification time, via pathlib.Path.stat():
def load_cached_1(path):
    return slow_data_load(path, path.stat().st_mtime)
Another possibility is to hash the entire file. hash_contents() is provided for this purpose:
from genno.caching import hash_contents
def load_cached_2(path):
    return slow_data_load(path, hash_contents(path))
Warning
For very large files, even hashing the file in this way can be slow, and the hash must be recomputed on every call in order to check for a matching cache key.
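To make the cost mentioned in the warning concrete, the following sketch hashes a file in fixed-size chunks. `sketch_hash_contents` is a hypothetical stand-in written for this illustration; it is not the implementation of genno's hash_contents(), and the chunk size is an arbitrary choice.

```python
import tempfile
from hashlib import blake2b
from pathlib import Path


def sketch_hash_contents(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so it is never fully held in memory."""
    h = blake2b()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


path = Path(tempfile.mkdtemp()) / "example-file-A.xml"

path.write_text("<data>1</data>")
hash_before = sketch_hash_contents(path)
hash_repeat = sketch_hash_contents(path)  # Unchanged file: same digest

path.write_text("<data>2</data>")
hash_after = sketch_hash_contents(path)  # Changed contents: new digest
```

Every call reads the whole file, which is the overhead the warning refers to. The modification time is cheaper to obtain, but it also changes when a file is rewritten with identical contents.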
The decorated functions can be used as operators in the graph, or called directly:
c.add("A1", load_cached_1, Path("example-file-A.xml"))
c.add("A2", load_cached_2, Path("example-file-A.xml"))
# Load and process the contents of example-file-A.xml
c.get("A1")
# Load again; the value is retrieved from cache if the
# file has not been modified
c.get("A1")
# Same without using the Computer
load_cached_1(Path("example-file-A.xml"))
load_cached_1(Path("example-file-A.xml"))
Integrate and extend
Encoder may be configured to handle (or ignore) additional/custom types that may appear as arguments to functions decorated with Computer.cache(). See the examples for Encoder.register() and Encoder.ignore().
decorate() can be used entirely independently of any Computer by passing the cache_path (and optional cache_skip) keyword arguments:
from functools import partial
from pathlib import Path
from genno.caching import decorate
# Create a decorator with a custom cache path
mycache = partial(decorate, cache_path=Path("/path/to/cache"))
@mycache
def func(a, b=2):
    return a ** b
In this usage, it offers a subset of the feature-set of joblib.Memory.
Internals and utilities
- class genno.caching.Encoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
JSON encoder.
This is a one-way encoder used only to serialize arguments for hash_args() and hash_code().
- default(o)
For o, return an object serializable by the base json.JSONEncoder.
pathlib.Path: the string representation of o.
Code objects (from Python’s built-in inspect module), for instance a function or lambda: blake2b() hash of the object’s bytecode and its serialized constants.
Warning
This is not 100% guaranteed to change if the operation of o (or other code called in turn by o) changes. If relying on this behaviour, check carefully.
Any type with a serializer registered with register(): the return value of the serializer, called on o.
- classmethod ignore(*types)
Tell the Encoder (thus hash_args()) to ignore arguments of types.
Example
>>> class Bar:
>>>     pass
>>>
>>> # Don't use Bar instances in cache keys
>>> @genno.caching.Encoder.ignore(Bar)
Ignore all unrecognized types
>>> @genno.caching.Encoder.ignore(object)
- classmethod register(func)
Register func to serialize a type not handled by json.JSONEncoder.
func should return a type that is handled by JSONEncoder; see the docs.
Example
>>> class Foo:
>>>     a = 3
>>>
>>> @genno.caching.Encoder.register
>>> def _encode_foo(o: Foo):
>>>     return dict(a=o.a)  # JSONEncoder can handle dict()
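The register/ignore mechanism can be approximated by subclassing json.JSONEncoder and overriding default(). The class below is a hypothetical sketch, not genno's Encoder: `SketchEncoder`, its explicit `register(type_, func)` signature, and the empty-string placeholder for ignored objects are all assumptions made for illustration.

```python
import json
from pathlib import Path


class SketchEncoder(json.JSONEncoder):
    """Hypothetical one-way encoder with a registry; not genno's Encoder."""

    _serializers = {}  # Maps a type to the function that serializes it
    _ignored = ()  # Types that are replaced by a placeholder

    @classmethod
    def register(cls, type_, func):
        cls._serializers[type_] = func

    @classmethod
    def ignore(cls, *types):
        cls._ignored = cls._ignored + types

    def default(self, o):
        # Called only for objects the base JSONEncoder cannot handle
        if isinstance(o, Path):
            return str(o)
        for type_, func in self._serializers.items():
            if isinstance(o, type_):
                return func(o)
        if isinstance(o, self._ignored):
            return ""  # Ignored objects contribute nothing to the output
        return super().default(o)  # Raises TypeError for unknown types


class Foo:
    a = 3


class Bar:
    pass


SketchEncoder.register(Foo, lambda o: dict(a=o.a))
SketchEncoder.ignore(Bar)

encoded = json.dumps([Path("x.csv"), Foo(), Bar()], cls=SketchEncoder)
```

Because the encoder is one-way, the serialized form only needs to be stable and distinguishing; nothing is ever decoded back into the original objects.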
- genno.caching.decorate(func: Callable, computer: Computer | None = None, cache_path=None, cache_skip: bool = False) → Callable
Helper for Computer.cache().
- Parameters:
computer (Computer, optional) – If supplied, the config dictionary stored in the Computer is used to look up values for cache_path and cache_skip, at the moment when func is called.
cache_path (os.PathLike, optional) – Directory in which to store cache files.
cache_skip (bool, optional) – If True, ignore existing cache entries and overwrite them with new values from func.
- genno.caching.hash_args(*args, **kwargs)
Return a 20-character hashlib.blake2b() hex digest of args and kwargs.
Used by decorate().
- genno.caching.hash_code(func: Callable) → str
Return the hashlib.blake2b() hex digest of the compiled bytecode of func.
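As a rough illustration of hashing compiled code, and of the caveat in the warning under Encoder.default(), the sketch below digests a function's bytecode together with its constants. `sketch_hash_code` is a hypothetical helper; the exact inputs and digest size used by hash_code() are not shown in this document and may differ.

```python
from hashlib import blake2b


def sketch_hash_code(func) -> str:
    """Hypothetical digest of a function's bytecode and constants."""
    code = func.__code__
    h = blake2b()
    h.update(code.co_code)  # Raw bytecode
    h.update(repr(code.co_consts).encode())  # Literal values used by the code
    return h.hexdigest()


def add_one(x):
    return x + 1


def add_two(x):
    return x + 2


def helper():
    return 1


def caller():
    return helper()


digest_one = sketch_hash_code(add_one)
digest_two = sketch_hash_code(add_two)
digest_caller_before = sketch_hash_code(caller)


def helper():  # Redefine the helper; caller() itself is unchanged
    return 2


digest_caller_after = sketch_hash_code(caller)
```

The digests for `add_one` and `add_two` differ, but the digest for `caller` is the same before and after `helper` is redefined: a change in code that is called in turn is not detected by hashing the caller alone.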