This page was generated from getting_started/tutorial.ipynb.
This is a brief tutorial of basic Metagraph usage.
First, we import Metagraph:
import metagraph as mg
The default resolver automatically pulls in all registered Metagraph plugins.
res = mg.resolver
A hierarchy of available types is automatically added as properties on res.
res
dir(res.types)
['BipartiteGraph', 'DataFrame', 'EdgeMap', 'EdgeSet', 'Graph', 'Matrix', 'NodeMap', 'NodeSet', 'Vector']
Two important concepts in Metagraph are abstract types and concrete types.
Abstract types describe a generic kind of data container with potentially many equivalent representations.
Concrete types describe a specific data object which fits under the abstract type category.
One can think of abstract types as data container specifications and concrete types as implementations of those specifications.
For each abstract type, there are several concrete types.
Within a single abstract type, all concrete types are able to represent equivalent data, but in a different format or data structure.
Here we show the concrete types which represent Graphs:
Graphs
dir(res.types.Graph)
['GrblasGraphType', 'NetworkXGraphType', 'ScipyGraphType']
Algorithms are also listed under res.algos and grouped by categories.
res.algos
dir(res.algos)
['bipartite', 'centrality', 'clustering', 'embedding', 'flow', 'subgraph', 'traversal', 'util']
dir(res.algos.traversal)
['all_pairs_shortest_paths', 'astar_search', 'bellman_ford', 'bfs_iter', 'bfs_tree', 'dfs_iter', 'dfs_tree', 'dijkstra', 'minimum_spanning_tree']
Let’s see how to use Metagraph by first constructing a graph from an edge list.
Begin with an input csv file representing an edge list and weights.
data = """ Source,Destination,Weight 0,1,4 0,3,2 0,4,7 1,3,3 1,4,5 2,4,5 2,5,2 2,6,8 3,4,1 4,7,4 5,6,4 5,7,6 """
Read in the csv file and convert to a Pandas DataFrame.
DataFrame
import pandas as pd import io csv_file = io.StringIO(data) df = pd.read_csv(csv_file)
This DataFrame represents a graph’s edges, but Metagraph doesn’t know that yet. To use the DataFrame within Metagraph, we first need to convert it into an EdgeMap.
EdgeMap
A PandasEdgeMap takes a DataFrame plus the labels of the columns representing source, destination, and weight. With these, Metagraph will know how to interpret the DataFrame as an EdgeMap.
PandasEdgeMap
em = res.wrappers.EdgeMap.PandasEdgeMap(df, 'Source', 'Destination', 'Weight', is_directed=False) em.value
Graphs and EdgeMaps have many similarities, but Graphs are more powerful. Graphs can have weights on the nodes, not just on the edges. Graphs can also have isolate nodes (nodes with no edges), which EdgeMaps cannot have.
EdgeMaps
Most Metagraph algorithms take a Graph as input, so we will convert our PandasEdgeMap into a Graph. In this case, it will become a NetworkXGraph.
Graph
NetworkXGraph
g = res.algos.util.graph.build(em) g
<metagraph.plugins.networkx.types.NetworkXGraph at 0x7f30cb809790>
g.value.edges(data=True)
EdgeDataView([(0, 1, {'weight': 4}), (0, 3, {'weight': 2}), (0, 4, {'weight': 7}), (1, 3, {'weight': 3}), (1, 4, {'weight': 5}), (3, 4, {'weight': 1}), (4, 2, {'weight': 5}), (4, 7, {'weight': 4}), (2, 5, {'weight': 2}), (2, 6, {'weight': 8}), (5, 6, {'weight': 4}), (5, 7, {'weight': 6})])
Because Metagraph knows how to interpret g as a Graph, we can easily convert it other Graph formats.
g
Let’s convert it to a ScipyGraph. This format stores the edges and weights in a scipy.sparse matrix along with a numpy array mapping the position to a NodeId (in case the nodes are not sequential from 0..n). Any node weights are stored in a separate numpy array.
ScipyGraph
g2 = res.translate(g, res.wrappers.Graph.ScipyGraph) g2
<metagraph.plugins.scipy.types.ScipyGraph at 0x7f30bf36bd90>
The matrix is accessed using g2.value. The node list is accessed using .node_list.
g2.value
.node_list
We can verify the weighs and edges by inspecting the sparse adjacency matrix directly.
g2.value.toarray()
array([[0, 4, 0, 2, 7, 0, 0, 0], [4, 0, 0, 3, 5, 0, 0, 0], [0, 0, 0, 0, 5, 2, 8, 0], [2, 3, 0, 0, 1, 0, 0, 0], [7, 5, 5, 1, 0, 0, 0, 4], [0, 0, 2, 0, 0, 0, 4, 6], [0, 0, 8, 0, 0, 4, 0, 0], [0, 0, 0, 0, 4, 6, 0, 0]])
We can also convert g into an adjacency matrix representation using a GrblasGraph. This also stores the edges and node weights separately.
GrblasGraph
g3 = res.translate(g, res.types.Graph.GrblasGraphType) g3
<metagraph.plugins.graphblas.types.GrblasGraph at 0x7f30e00f44d0>
g3.value
grblas.Matrix
nvals
nrows
ncols
dtype
We can also visualize the graph.
import grblas grblas.io.draw(g3.value)
Rather than actually converting g into other formats, let’s ask Metagraph how it will do the conversion. Each conversion requires a translator (written by plugin developers) to convert between the two formats. However, even if there isn’t a direct translator between two formats, Metagraph will find a path and take several translation steps as needed to perform the task.
The mechanism for viewing the plan is to invoke the translation from res.plan.translate rather than res.translate. Other than the additional .plan, the call signature is identical.
res.plan.translate
res.translate
.plan
In this first example, there is a direct function which translates between NetworkXGraphType and ScipyGraphType.
NetworkXGraphType
ScipyGraphType
res.plan.translate(g, res.types.Graph.ScipyGraphType)
[Direct Translation] NetworkXGraphType -> ScipyGraphType
In this next example, there is no direct function which convert NetworkXGraphType into a GrblasGraphType. Instead, we have to first convert to ScipyGraphType and then to GrblasGraphType before finally arriving at our desired format.
GrblasGraphType
While Metagraph will do the conversion automatically, understanding the steps involved helps users plan for expected computation time and memory usage. If needed, plugin developers can write a plugin to provide a direct translation path.
res.plan.translate(g, res.types.Graph.GrblasGraphType)
[Multi-step Translation] (start) NetworkXGraphType -> ScipyGraphType (end) -> GrblasGraphType
Algorithms are described initially in an abstract definition. For bfs_iter, we take a Graph and return a Vector indicating the NodeIDs in the order visited.
Vector
After the abstract definition is written, multiple concrete implementations are written to operate on concrete types.
Let’s look at the signature and specific implementations available for bfs_iter.
res.algos.traversal.bfs_iter.signatures
Signature: (graph: metagraph.types.Graph, source_node: NodeID, depth_limit: int = -1) -> metagraph.types.Vector Implementations: (graph: NetworkXGraphType({}), source_node: NodeID, depth_limit: int = -1) -> NumpyVectorType({}) (graph: ScipyGraphType({}), source_node: NodeID, depth_limit: int = -1) -> NumpyVectorType({})
We see that there are two implementations available, each with a different type of input graph.
Let’s perform a breadth-first search with our different representations of g. We should get approximately the same answer no matter which implementation is chosen (same NodeIDs within each depth level of the traversal).
cc = res.algos.traversal.bfs_iter(g, 0) cc
array([0, 1, 3, 4, 2, 7, 5, 6])
cc2 = res.algos.traversal.bfs_iter(g2, 0) cc2
Similar to how we can view the plan for translations, we can view the plan for algorithms.
No translation is needed because we already have a concrete implementation which takes a NetworkXGraph as input.
res.plan.algos.traversal.bfs_iter(g, 0)
nx_breadth_first_search_iter (graph: NetworkXGraphType({}), source_node: NodeID, depth_limit: int = -1) -> NumpyVectorType({}) ===================== Argument Translations --------------------- ** graph ** NetworkXGraphType ** source_node ** NodeID ** depth_limit ** int ---------------------
In the next example, g2 also satisfies a concrete implementation, so no input translation is required.
g2
res.plan.algos.traversal.bfs_iter(g2, 0)
ss_breadth_first_search_iter (graph: ScipyGraphType({}), source_node: NodeID, depth_limit: int = -1) -> NumpyVectorType({}) ===================== Argument Translations --------------------- ** graph ** ScipyGraphType ** source_node ** NodeID ** depth_limit ** int ---------------------
Let’s look at the same pieces of information, but for pagerank. Pagerank takes a Graph and returns a NodeMap indicating the rank value of each node in the graph.
NodeMap
First, let’s verify the signature and the implementations available.
We see that there are two implementations available, taking a NetworkXGraph or GrblasGraph as input.
res.algos.centrality.pagerank.signatures
Signature: (graph: Graph({'edge_type': 'map', 'edge_dtype': ('float', 'int')}), damping: float = 0.85, maxiter: int = 50, tolerance: float = 1e-05) -> metagraph.types.NodeMap Implementations: (graph: GrblasGraphType({}), damping: float = 0.85, maxiter: int = 50, tolerance: float = 1e-05) -> GrblasNodeMapType({}) (graph: NetworkXGraphType({}), damping: float = 0.85, maxiter: int = 50, tolerance: float = 1e-05) -> PythonNodeMapType({})
Let’s look at the steps required in the plan if we start with a ScipyGraph. Then let’s perform the computation.
We see that the ScipyGraph will need to be translated to a GrblasGraph in order to call the algorithm. Metagraph will do this for us automatically.
res.plan.algos.centrality.pagerank(g2)
grblas_pagerank (graph: GrblasGraphType({}), damping: float = 0.85, maxiter: int = 50, tolerance: float = 1e-05) -> GrblasNodeMapType({}) ===================== Argument Translations --------------------- ** graph ** [Direct Translation] ScipyGraphType -> GrblasGraphType ** damping ** float ** maxiter ** int ** tolerance ** float ---------------------
pr = res.algos.centrality.pagerank(g2) pr
<metagraph.plugins.graphblas.types.GrblasNodeMap at 0x7f30bc1e97d0>
The result is a GrblasNodeMap which can be inspected by looking at the underlying .value.
GrblasNodeMap
.value
pr.value
grblas.Vector
size
Let’s translate it to a numpy array.
pr_array = res.translate(pr, res.types.NodeMap.NumpyNodeMapType) pr_array.value
array([0.11990989, 0.11990989, 0.12919109, 0.11990989, 0.19538403, 0.13300793, 0.09304149, 0.08964579])
Now let’s verify that we get the same answer with the NetworkX implementation of Pagerank. We can ensure the NetworkX implementation is called by passing in a NetworkXGraph. Because no translations are required, it will choose that implementation.
The result is a PythonNodeMapType, which is simply a Python dict.
PythonNodeMapType
dict
pr2 = res.algos.centrality.pagerank(g) pr2
{0: 0.11990989117844908, 1: 0.11990989117844908, 3: 0.11990989117844908, 4: 0.1953840289789895, 2: 0.12919108800740858, 5: 0.13300793197881575, 6: 0.09304148578762082, 7: 0.08964579171181795}
Translate to a numpy array and verify the same results (within tolerance)
pr2_array = res.translate(pr2, res.types.NodeMap.NumpyNodeMapType) pr2_array.value
abs(pr2_array - pr_array.value) < 1e-15
array([ True, True, True, True, True, True, True, True])