This document lists the core abstract algorithms in Metagraph.
Graphs often have natural structure which can be discovered, allowing them to be clustered into different groups or partitions.
clustering.
connected_components
The connected components algorithm groups nodes of an undirected graph into subgraphs where all subgraph nodes are reachable within a component.
a dense NodeMap where each node is assigned an integer indicating the component.
strongly_connected_components
Groups nodes of a directed graph into subgraphs where all subgraph nodes are reachable by each other along directed edges.
label_propagation_community
This algorithm discovers communities using label propagation.
a dense NodeMap where each node is assigned an integer indicating the community.
louvain_community_step
This algorithm performs one step of the Louvain algorithm, which discovers communities by maximizing modularity.
a dense NodeMap where each node is assigned an integer indicating the community
the modularity score
cluster.
triangle_count
This algorithm returns the total number of triangles in the graph.
global_clustering_coefficient
This algorithm returns the global clustering coefficient. The global clustering coefficient is the number of closed triplets over the total number of triplets in a graph. A triplet in a graph is a subgraph of 3 nodes where at least 2 edges are present. An open triplet has exactly 2 edges. A closed triplet has exactly 3 edges. A deeped explanation can be found here.
clustering.coloring.
greedy
Attempts to find the minimum number of colors required to label the graph such that no connected nodes have the same color. Color is represented as a value from 0..n.
(color for each node, number of unique colors)
Traversing through the nodes of a graph is extremely common and important in the area of search and understanding distance between nodes.
traversal.
bellman_ford
This algorithm calculates single-source shortest path (SSSP). It is slower than Dijkstra’s algorithm, but can handle negative weights and is parallelizable.
(parents, distance)
all_pairs_shortest_paths
This algorithm calculates the shortest paths between all node pairs. Choices for which algorithm to be used are backend implementation dependent.
bfs_iter
Breadth-first search algorithm.
Node IDs in breadth-first search order
bfs_tree
Breadth-first search algorithm. The return result parents will have the parent of source_node be source_node.
parents
source_node
(depth, parents)
dfs_iter
Depth-first search algorithm.
Node IDs in depth-first search order
dfs_tree
Depth-first search algorithm. The return result parents will have the parent of source_node be source_node.
dijkstra
Calculates single-source shortest path (SSSP) via Dijkstra’s algorithm.
minimum_spanning_tree
Minimum spanning tree (or forest in the case of multiple connected components in the graph).
Graph containing only the relevant edges from the original graph
astar_search
Finds the (possibly non-unique) shortest path via the A* algorithm. heuristic_func is a unary function that takes a node id and returns an estimated distance to target_node.
heuristic_func
target_node
Vector of node ids specifying the path from source_node to target_node
Many algorithms assign a ranking or value to each vertex/node in the graph based on different properties. This is usually done to find the most important nodes for that metric.
centrality.
betweenness
This algorithm calculates centrality based on the number of shortest paths passing through a node.
If nodes are provided, only computes an approximation of betweenness centrality based on those nodes.
nodes
katz
This algorithm calculates centrality based on total number of walks (as opposed to only considering shortest paths) passing through a node.
pagerank
This algorithm determines the importance of a given node in the network based on links between important nodes.
closeness
Calculates the closeness centrality metric, which estimates the average distance from a node to all other nodes. A high closeness score indicates a small average distance to other nodes.
eigenvector
Calculates the eigenvector centrality, which estimates the importance of a node in the graph.
hits
Hyperlink-Induced Topic Search (HITS) centrality ranks nodes based on incoming and outgoing edges.
(hubs, authority)
degree
Calculates the degree centrality for each node. The degree centrality for a node is its degree over the number of nodes minus 1.
If in_edges and out_edges are both false, the degree centrality for all nodes is 0. If the graph is undirected, setting in_edges or out_edges or both to true will give identical results (edges will only be counted once per node). If the graph is directed, in_edges and out_edges dictate which edges are considered for each node.
in_edges
out_edges
Graphs are often too large to handle, so a portion of the graph is extracted. Often this subgraph must satisfy certain properties or have properties similar to the original graph for the subsequent analysis to give good results.
subgraph.
extract_subgraph
Given a set of nodes, this algorithm extracts the subgraph containing those nodes and any edges between those nodes.
k_core
This algorithm finds a maximal subgraph that contains nodes of at least degree k.
k
k_truss
Finds the maximal subgraph whose edges are supported by k - 2 other edges forming triangles.
maximal_independent_set
Finds a maximal set of independent nodes, meaning the nodes in the set share no edges with each other and no additional nodes in the graph can be added which satisfy this criteria.
subisomorphic
Indicates whether subgraph is an isomorphic subcomponent of graph.
subgraph
graph
subgraph.sample.
node_sampling
Returns a subgraph created by randomly sampling nodes and including edges which exist between sampled nodes in the original graph.
edge_sampling
Returns a subgraph created by randomly sampling edges and including both node endpoints.
ties
Totally Induced Edge Sampling extends edge sampling by also including any edges between the nodes which exist in the original graph. See the paper for more details.
random_walk
Samples the graph using a random walk. For each step, there is a jump_probability to reset the walk. When resetting the walk, if the start_node is specified, it always returns to this node. Otherwise a random node is chosen for each resetting. The sampling stops when any of num_steps, num_nodes, or num_edges is reached.
jump_probability
start_node
num_steps
num_nodes
num_edges
Bipartite Graphs contain two unique sets of nodes. Edges can exist between nodes from different groups, but not between nodes of the same group.
bipartite.
graph_projection
Given a bipartite graph, project a graph for one of the two node groups (group 0 or 1).
Algorithms pertaining to the flow capacity of edges.
flow.
max_flow
Compute the maximum flow possible from source_node to target_node.
(max flow rate, computed flow graph)
min_cut
Compute the minimum cut to separate source from target node. This is the list of edges which disconnect the graph along edges with sum to the minimum weight. Performing this computation yields the maximum flow.
(max flow rate, graph containing cut edges)
These algorithms are small utility functions which perform common operations needed in graph analysis.
util.nodeset.
choose_random
Given a set of nodes, choose k random nodes (no duplicates).
from_vector
Convert the values in a Vector into a NodeSet
util.nodemap.
sort
Sorts nodes by value, returning a Vector of NodeIDs.
select
Selects certain nodes to keep from a NodeMap.
filter
Filters a NodeMap based on values passed through the filter function. Returns a set of nodes where the function returned True.
apply
Applies a unary function to every node, mapping the values to different values.
reduce
Performs a reduction across all nodes, collapsing the values into a single result.
util.edgemap.
from_edgeset
Converts and EdgeSet into an EdgeMap by giving each edge a default value.
util.graph.
Computes the degree of each node. in_edges and out_edges can be used to control which degree is computed.
aggregate_edges
Aggregates the edge weights around a node, returning a single value per node.
If in_edges and out_edges are False, each node will contain the initial value. For undirected graphs, setting in_edges or out_edges or both to true will give identical results (edges will only be counted once per node). For directed graphs, in_edges and out_edges affect the result. Setting both will still only give a single value per node, combining all outbound and inbound edge weights.
filter_edges
Removes edges if filter function returns True. All nodes remain, even if they becomes isolate nodes in the graph.
assign_uniform_weight
Update all edge weights (or if none exist, assign them) to a uniform value of weight.
weight
build
Given edges and possibly nodes, build a Graph.
If nodes are not provided, assume the only nodes are those found in the EdgeSet/Map.
collapse_by_label
Collapse a Graph into a smaller Graph by combining clusters of nodes into a single node. labels indicates the node groupings. aggregator indicates how to combine edge weights.
labels
aggregator
isomorphic
Indicates whether g1 and g2 are isomorphic.
g1
g2
util.node_embedding.
Returns a dense matrix given an embedding and a vector of NodeIDs.
Embeddings convert graph nodes or whole graphs into a dense vector representations.
embedding.train.
node2vec
Computes the node2vec embedding.