Algorithms¶
Algorithms perform the graph computations in Metagraph. They can also be used as helper functions or anywhere that type conversion needs additional inputs.
Abstract Algorithms¶
Abstract algorithms define the API that the user will call. It indicates which abstract types are used as inputs and what abstract type will be the output.
Abstract algorithms can also specify required abstract properties that must be satisfied in order for the algorithm to work properly. For example, strongly connected components needs a directed graph. Passing in an undirected graph is invalid for the algorithm, regardless of which specific implementation is chosen.
Abstract algorithm definitions look like this.
@abstract_algorithm("clustering.strongly_connected_components")
def strongly_connected_components(graph: Graph(is_directed=True)) -> NodeMap:
pass
The abstract_algorithm
decorator is required. The full path of the algorithm is passed to
the decorator. This determines how the algorithm will be found below resolver.algos
.
The actual name of the Python function does not matter. By convention, it should match the final name in the algorithm path.
Typing is required in the function signature as this is used to validate concrete versions of the algorithm. Required properties should be passed in to the abstract type constructor. If an argument is a plain Python type (like bool or int), that should be indicated as such. Default values may be listed for plain Python types.
The body of the function is irrelevant for abstract algorithms. Convention is to simply pass
.
Concrete Algorithms¶
Concrete algorithms do the actual work on real data objects. They must match the signature of their corresponding abstract algorithm, but be typed with concrete types rather than abstract types.
Abstract properties should not be listed in the function signature, but concrete properties can be indicated by passing them to the concrete type. Default values must never be listed. Instead, they are inherited from the abstract signature.
Concrete algorithm definitions look like this.
@concrete_algorithm("clustering.strongly_connected_components")
def nx_strongly_connected_components(graph: NetworkXGraph) -> PythonNodeMapType:
index_to_label = dict()
for i, nodes in enumerate(nx.strongly_connected_components(graph.value)):
for node in nodes:
index_to_label[node] = i
return index_to_label
The concrete_algorithm
decorator is required. The full path of the algorithm must match
the abstract algorithm’s path exactly. This is the piece that links the concrete version to
the abstract definition.
As with the abstract algorithm, the actual name of the Python function does not matter. There is no convention for what it should be named.
Notice that the signature matches the abstract algorithm, but choosing a concrete type rather than the abstract type. Concrete algorithms must contain a matching entry for every parameter in the abstract signature, but may contain additional parameters which are specialized to this implementation.
Each additional parameter must contain a default value. When calling using the normal dispatch mechanism, these default values will be used. The only way for a user to indicate a value for the additional parameters is to make an exact algorithm call.
The body of the function can be as complex as needed. Often, when building a plugin for an existing library, the concrete algorithm body is quite short because it calls the underlying library’s implementation. The only thing else to do is package the results in the correct concrete type and return.
Using the Resolver in Concrete Algorithms¶
If a concrete algorithm needs access to the Resolver
object which called it,
set the include_resolver
flag in the decorator and include a resolver
keyword argument
in the signature.
@concrete_algorithm("path.to.algorithm", include_resolver=True)
def my_conc_algo(x: NumpyNodeMap, *, resolver) -> int:
# resolver is now available
Union, List, and Optional types¶
Typically, an algorithm parameter has a single type – either a Metagraph defined type like
NodeMap
or a Python type like int
.
There are cases, however, where a single type is not sufficient. graph_build
is a good example
as shown below:
@abstract_algorithm("util.graph.build")
def graph_build(
edges: mg.Union[EdgeSet, EdgeMap],
nodes: mg.Optional[mg.Union[NodeSet, NodeMap]] = None,
) -> Graph:
pass
edges
can be either an EdgeSet
or an EdgeMap
. nodes
can be one of two possible
types, but can additionally be unspecified (i.e. optional).
To indicate these, we use the standard Python typing
objects Union
and Optional
. However,
these are limited to class objects only. In Metagraph, we often need to specialize our types –
EdgeSet(is_directed=True)
rather than just EdgeSet
. For the specialized case, the regular
Python typing.Union
would fail. To work around this limitation, Metagraph has mg.Union
, mg.List
, and
mg.Optional
which behave identically to the typing
counterparts, but accept classes and instances.
It is recommended to always use the Metagraph versions of Union
, List
, and Optional
when
defining algorithms in Metagraph.
Interaction between Union and unambiguous_subcomponents¶
When a single type is declared, or is declared as Optional, Metagraph will attempt to translate
input to be compatible with an algorithm signature. With unambiguous_subcomponents
allowing
translation across abstract types, this leads to a nice outcome where passing a NodeMap
to an
algorithm expecting a NodeSet
will just work. The algorithm obviously only needs the set of nodes,
so dropping the weights from the NodeMap
allows the algorithm to still run correctly.
For the case of Union
, however, allowing translation across abstract types is problematic.
For the case of graph_build
, if we allowed an EdgeMap
to be translated into an EdgeSet
,
we would lose critical information. A Union
indicates either is acceptable, but does not indicate
that both are equivalent.
For this reason, when a Union
is used in an algorithm signature, unambiguous_subcomponents
will be ignored for the purpose of translating input objects.
Algorithm Versions¶
Metagraph allows algorithms to be versioned. By default, all algorithm signatures define version 0 of the algorithm. To indicate other versions, include the version in the decorator.
@abstract_algorithm("clustering.strongly_connected_components", version=2)
def strongly_connected_components(graph: Graph(is_directed=True)) -> NodeMap:
pass
The algorithm version must be an integer (i.e. no semantic versioning) and should increment one higher than the previous version.
Algorithms might need to bump their version when the algorithm signature changes, but also to allow rearranging of the algorithm hierarchy and path structure.
Multiple versions of an algorithm are allowed to be defined within a single release of Metagraph or a Metagraph plugin. Even though multiple versions are defined, Metagraph will only use the latest abstract version defined. This keeps the usage of Metagraph simple while allowing plugin authors to write implementations for multiple releases of Metagraph. This allows plugins to update asynchronously from core Metagraph.