List of Core Algorithms

This document lists the core abstract algorithms in Metagraph.

Bipartite

Bipartite Graphs contain two unique sets of nodes. Edges can exist between nodes from different groups, but not between nodes of the same group.

bipartite.graph_projection(bgraph: BipartiteGraph, nodes_retained: int = 0) → Graph

Given a bipartite graph, project a graph for one of the two node groups (group 0 or 1).

Centrality

Many algorithms assign a ranking or value to each vertex/node in the graph based on different properties. This is usually done to find the most important nodes for that metric.

centrality.betweenness(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’}), nodes: Optional[NodeSet] = None, normalize: bool = False) → NodeMap

This algorithm calculates centrality based on the number of shortest paths passing through a node.

If nodes are provided, only computes an approximation of betweenness centrality based on those nodes.

centrality.closeness(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’}), nodes: Optional[NodeSet] = None) → NodeMap

Calculates the closeness centrality metric, which estimates the average distance from a node to all other nodes. A high closeness score indicates a small average distance to other nodes.

centrality.degree(graph: Graph, in_edges: bool = False, out_edges: bool = True) → NodeMap

Calculates the degree centrality for each node. The degree centrality for a node is its degree over the number of nodes minus 1.

If in_edges and out_edges are both false, the degree centrality for all nodes is 0. If the graph is undirected, setting in_edges or out_edges or both to true will give identical results (edges will only be counted once per node). If the graph is directed, in_edges and out_edges dictate which edges are considered for each node.

centrality.eigenvector(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’})) → NodeMap

Calculates the eigenvector centrality, which estimates the importance of a node in the graph.

centrality.hits(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’}), max_iter: int = 100, tol: float = 1e-05, normalize: bool = True) → Tuple[NodeMap, NodeMap]

Hyperlink-Induced Topic Search (HITS) centrality ranks nodes based on incoming and outgoing edges.

Return type:

(hubs, authority)

centrality.katz(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’}), attenuation_factor: float = 0.01, immediate_neighbor_weight: float = 1.0, maxiter: int = 50, tolerance: float = 1e-05) → NodeMap

This algorithm calculates centrality based on total number of walks (as opposed to only considering shortest paths) passing through a node.

centrality.pagerank(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’}), damping: float = 0.85, maxiter: int = 50, tolerance: float = 1e-05) → NodeMap

This algorithm determines the importance of a given node in the network based on links between important nodes.

Clustering

Graphs often have natural structure which can be discovered, allowing them to be clustered into different groups or partitions.

clustering.coloring.greedy(graph: Graph(is_directed=False)) → Tuple[NodeMap, int]

Attempts to find the minimum number of colors required to label the graph such that no connected nodes have the same color. Color is represented as a value from 0..n.

Return type:

(color for each node, number of unique colors)

clustering.connected_components(graph: Graph(is_directed=False)) → NodeMap

The connected components algorithm groups nodes of an undirected graph into subgraphs where all subgraph nodes are reachable within a component.

Return type:

a dense NodeMap where each node is assigned an integer indicating the component.

clustering.strongly_connected_components(graph: Graph(is_directed=True)) → NodeMap

Groups nodes of a directed graph into subgraphs where all subgraph nodes are reachable by each other along directed edges.

Return type:

a dense NodeMap where each node is assigned an integer indicating the component.

clustering.global_clustering_coefficient(graph: Graph(is_directed=False)) → float

This algorithm returns the global clustering coefficient. The global clustering coefficient is the number of closed triplets over the total number of triplets in a graph. A triplet in a graph is a subgraph of 3 nodes where at least 2 edges are present. An open triplet has exactly 2 edges. A closed triplet has exactly 3 edges. A deeped explanation can be found here.

clustering.label_propagation_community(graph: Graph(is_directed=False)) → NodeMap

This algorithm discovers communities using label propagation.

Return type:

a dense NodeMap where each node is assigned an integer indicating the community.

clustering.louvain_community(graph: Graph(is_directed=False, edge_type=’map’, edge_dtype={‘int’, ‘float’})) → Tuple[NodeMap, float]

This algorithm performs one step of the Louvain algorithm, which discovers communities by maximizing modularity.

Return type:

  • a dense NodeMap where each node is assigned an integer indicating the community

  • the modularity score

clustering.triangle_count(graph: Graph(is_directed=False)) → int

This algorithm returns the total number of triangles in the graph.

Embedding

Embeddings convert graph nodes or whole graphs into a dense vector representations.

embedding.apply.graph_sage(embedding: GraphSageNodeEmbedding, graph: Graph, node_features: Matrix, node2row: NodeMap) → Matrix

Returns a dense matrix from a GraphSage embedding.

embedding.apply.nodes(matrix: Matrix, node2row: NodeMap, nodes: Vector) → Matrix

Returns a dense matrix given an embedding, node-to-row mapping, and a vector of NodeIDs.

embedding.train.graph2vec(graphs: mg.List[Graph(edge_type=’set’, is_directed=False)], subgraph_degree: int, embedding_size: int, epochs: int, learning_rate: float) → Matrix

Computes the graph2vec embedding.

embedding.train.graph_sage.mean(graph: Graph(edge_type=’map’, is_directed=True), node_features: Matrix, node2row: NodeMap, walk_length: int, walks_per_node: int, layer_sizes: Vector, samples_per_layer: Vector, epochs: int, learning_rate: float, batch_size: int) → GraphSageNodeEmbedding

Computes the GraphSAGE embedding.

embedding.train.graphwave(graph: Graph(edge_type=’set’, is_directed=False), scales: Vector, sample_point_count: int, sample_point_max: float, chebyshev_degree: int) → Tuple[Matrix, NodeMap]

Computes the graphwave embedding.

embedding.train.hope.katz(graph: Graph(edge_type=’map’, is_directed=True), embedding_size: int, beta: float) → Tuple[Matrix, NodeMap]

Computes the High-Order Proximity preserved Embedding (HOPE).

embedding.train.line(graph: Graph, walks_per_node: int, negative_sample_count: int, embedding_size: int, epochs: int, learning_rate: float, batch_size: int) → Tuple[Matrix, NodeMap]

Computes the Large-scale Information Network Embedding (LINE).

embedding.train.node2vec(graph: Graph, p: float, q: float, walks_per_node: int, walk_length: int, embedding_size: int, epochs: int, learning_rate: float) → Tuple[Matrix, NodeMap]

Computes the node2vec embedding.

Flow

Algorithms pertaining to the flow capacity of edges.

flow.max_flow(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’}), source_node: NodeID, target_node: NodeID) → Tuple[float, Graph]

Compute the maximum flow possible from source_node to target_node.

Return type:

(max flow rate, computed flow graph)

flow.min_cut(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’}), source_node: NodeID, target_node: NodeID) → Tuple[float, Graph]

Compute the minimum cut to separate source from target node. This is the list of edges which disconnect the graph along edges with sum to the minimum weight. Performing this computation yields the maximum flow.

Return type:

(max flow rate, graph containing cut edges)

Subgraph

Graphs are often too large to handle, so a portion of the graph is extracted. Often this subgraph must satisfy certain properties or have properties similar to the original graph for the subsequent analysis to give good results.

subgraph.extract_subgraph(graph: Graph, nodes: NodeSet) → Graph

Given a set of nodes, this algorithm extracts the subgraph containing those nodes and any edges between those nodes.

subgraph.k_core(graph: Graph(is_directed=False), k: int) → Graph

This algorithm finds a maximal subgraph that contains nodes of at least degree k.

subgraph.k_truss(graph: Graph(is_directed=False), k: int) → Graph

Finds the maximal subgraph whose edges are supported by k - 2 other edges forming triangles.

subgraph.maximal_independent_set(graph: Graph) → NodeSet

Finds a maximal set of independent nodes, meaning the nodes in the set share no edges with each other and no additional nodes in the graph can be added which satisfy this criteria.

subgraph.sample.edge_sampling(graph: Graph, p: float = 0.2) → Graph

Returns a subgraph created by randomly sampling edges and including both node endpoints.

subgraph.sample.node_sampling(graph: Graph, p: float = 0.2) → Graph

Returns a subgraph created by randomly sampling nodes and including edges which exist between sampled nodes in the original graph.

subgraph.sample.random_walk(graph: Graph, num_steps: Optional[int] = None, num_nodes: Optional[int] = None, num_edges: Optional[int] = None, jump_probability: int = 0.15, start_node: Optional[NodeID] = None) → Graph

Samples the graph using a random walk. For each step, there is a jump_probability to reset the walk. When resetting the walk, if the start_node is specified, it always returns to this node. Otherwise a random node is chosen for each resetting. The sampling stops when any of num_steps, num_nodes, or num_edges is reached.

subgraph.sample.ties(graph: Graph, p: float = 0.2) → Graph

Totally Induced Edge Sampling extends edge sampling by also including any edges between the nodes which exist in the original graph. See the paper for more details.

subgraph.subisomorphic(graph: Graph, subgraph: Graph) → bool

Indicates whether subgraph is an isomorphic subcomponent of graph.

Traversal

Traversing through the nodes of a graph is extremely common and important in the area of search and understanding distance between nodes.

traversal.all_pairs_shortest_paths(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’})) → Tuple[Graph, Graph]

This algorithm calculates the shortest paths between all node pairs. Choices for which algorithm to be used are backend implementation dependent.

Return type:

(parents, distance)

Finds the (possibly non-unique) shortest path via the A* algorithm. heuristic_func is a unary function that takes a node id and returns an estimated distance to target_node.

Return type:

Vector of node ids specifying the path from source_node to target_node

traversal.bellman_ford(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’}), source_node: NodeID) → Tuple[NodeMap, NodeMap]

This algorithm calculates single-source shortest path (SSSP). It is slower than Dijkstra’s algorithm, but can handle negative weights and is parallelizable.

Return type:

(parents, distance)

traversal.bfs_iter(graph: Graph, source_node: NodeID, depth_limit: int = - 1) → Vector

Breadth-first search algorithm.

Return type:

Node IDs in breadth-first search order

traversal.bfs_tree(graph: Graph, source_node: NodeID, depth_limit: int = - 1) → Tuple[NodeMap, NodeMap]

Breadth-first search algorithm. The return result parents will have the parent of source_node be source_node.

Return type:

(depth, parents)

traversal.dfs_iter(graph: Graph, source_node: NodeID) → Vector

Depth-first search algorithm.

Return type:

Node IDs in depth-first search order

traversal.dfs_tree(graph: Graph, source_node: NodeID) → NodeMap

Depth-first search algorithm. The return result parents will have the parent of source_node be source_node.

Return type:

parents

traversal.dijkstra(graph: Graph(edge_type=’map’, edge_dtype={‘int’, ‘float’}, edge_has_negative_weights=False), source_node: NodeID) → Tuple[NodeMap, NodeMap]

Calculates single-source shortest path (SSSP) via Dijkstra’s algorithm.

Return type:

(parents, distance)

traversal.minimum_spanning_tree(graph: Graph(is_directed=False, edge_type=’map’, edge_dtype={‘int’, ‘float’})) → Graph

Minimum spanning tree (or forest in the case of multiple connected components in the graph).

Return type:

Graph containing only the relevant edges from the original graph

Utility

These algorithms are small utility functions which perform common operations needed in graph analysis.

util.edgemap.from_edgeset(edgeset: EdgeSet, default_value: Any) → EdgeMap

Converts and EdgeSet into an EdgeMap by giving each edge a default value.

util.graph.aggregate_edges(graph: Graph(edge_type="map"), func: Callable[[Any, Any], Any]), initial_value: Any, in_edges: bool = False, out_edges: bool = True) → NodeMap

Aggregates the edge weights around a node, returning a single value per node.

If in_edges and out_edges are False, each node will contain the initial value. For undirected graphs, setting in_edges or out_edges or both to true will give identical results (edges will only be counted once per node). For directed graphs, in_edges and out_edges affect the result. Setting both will still only give a single value per node, combining all outbound and inbound edge weights.

util.graph.assign_uniform_weight(graph: Graph, weight: Any = 1)

Update all edge weights (or if none exist, assign them) to a uniform value of weight.

util.graph.build(edges: Union[EdgeSet, EdgeMap], nodes: Optional[Union[NodeSet, NodeMap]] = None) → Graph

Given edges and possibly nodes, build a Graph.

If nodes are not provided, assume the only nodes are those found in the EdgeSet/Map.

util.graph.collapse_by_label(graph: Graph(is_directed=False), labels: NodeMap, aggregator: Callable[[Any, Any], Any]) → Graph

Collapse a Graph into a smaller Graph by combining clusters of nodes into a single node. labels indicates the node groupings. aggregator indicates how to combine edge weights.

util.graph.degree(graph: Graph, in_edges: bool = False, out_edges: bool = True) → NodeMap

Computes the degree of each node. in_edges and out_edges can be used to control which degree is computed.

util.graph.filter_edges(graph: Graph(edge_type=’map’), func: Callable[[Any], bool]) → Graph

Removes edges if filter function returns True. All nodes remain, even if they becomes isolate nodes in the graph.

util.graph.isomorphic(g1: Graph, g2: Graph) → bool

Indicates whether g1 and g2 are isomorphic.

util.nodemap.apply(x: NodeMap, func: Callable[[Any], Any]) → NodeMap

Applies a unary function to every node, mapping the values to different values.

util.nodemap.filter(x: NodeMap, func: Callable[[Any], bool]) → NodeSet

Filters a NodeMap based on values passed through the filter function. Returns a set of nodes where the function returned True.

util.nodemap.reduce(x: NodeMap, func: Callable[[Any, Any], Any]) → Any

Performs a reduction across all nodes, collapsing the values into a single result.

util.nodemap.select(x: NodeMap, nodes: NodeSet) → NodeMap

Selects certain nodes to keep from a NodeMap.

util.nodemap.sort(x: NodeMap, ascending: bool = True, limit: Optional[int] = None) → Vector

Sorts nodes by value, returning a Vector of NodeIDs.

util.nodeset.choose_random(x: NodeSet, k: int) → NodeSet

Given a set of nodes, choose k random nodes (no duplicates).

util.nodeset.from_vector(x: Vector) → NodeSet

Convert the values in a Vector into a NodeSet