Minimum spanning tree

data structure, subgraph of a weighted graph

A number of problems from graph theory are called Minimum spanning tree. In graph theory, a tree is a way of connecting all the vertices together, so that there is exactly one path from any one vertex, to any other vertex of the tree. If the graph represents a number of cities connected by roads, one could select a number of roads, so that each city can be reached from every other, but that there is no more than one way to travel from one city to another.

The minimum spanning tree of a planar graph. Each edge is labeled with its weight, which here is roughly proportional to its length.

A graph can have more than one spanning tree, just like there may be more than one way to select the roads between the cities.

Most of the time, graphs are weighted; each connection between two cities has a weight: It might cost something to travel on a given road, or one connection may be longer than the other, this means it takes more time to travel on that connection. In the language of graph theory, the connections are called edges.

A minimum spanning tree is a tree. It is different from other trees in that it minimizes the total of the weights attached to the edges. Depending on what the graph looks like, there may be more than one minimum spanning tree. In a graph where all the edges have the same weight, every tree is a minimum spanning tree. If all the edges have different weights (that is: there are no two edges with the same weight), there is exactly one minimal spanning tree.

Finding the minimum spanning tree

change

A first try

change

It can be very simple to make an algorithm that will discover a minimum spanning tree:

 function MST(G,W):
    T = {}
    while T does not form a spanning tree:
        find the minimum weighted edge in E that is safe for T
        T = T union {(u,v)}
    return T

In this case, "safe" means that including the edge does not form a cycle in the graph. A cycle means starting at a vertex, travelling to a number of other vertices and ending up at the starting point again without using the same edge twice.

History

change

Czech scientist Otakar Borůvka developed the first known algorithm for finding a minimum spanning tree in 1926. He wanted to solve the problem of finding an efficient coverage of Moravia with electricity. Today, this algorithm is known as Borůvka's algorithm. Two other algorithms are commonly used today. One of them was developed by Vojtěch Jarník in 1930, and put in practice by Robert Clay Prim in 1957. Edsger Wybe Dijkstra rediscovered it in 1959, and called it Prim's algorithm. The other algorithm is called Kruskal's algorithm, and was published by Joseph Kruskal in 1956. All three algorithms are greedy, and run in polynomial time.

The fastest minimum spanning tree algorithm to date was developed by Bernard Chazelle. The algorithm is based on the soft heap, an approximate priority queue.[1][2] Its running time is O(m α(m,n)), where m is the number of edges, n is the number of vertices and α is the classical functional inverse of the Ackermann function. The function α grows extremely slowly, so that for all practical purposes it may be considered a constant no greater than 4; thus Chazelle's algorithm takes very close to linear time.

What is the fastest possible algorithm for this problem? That is one of the oldest open questions in computer science. There is clearly a linear lower bound, since we must at least examine all the weights. If the edge weights are integers with a bounded bit length, then deterministic algorithms are known with linear running time.[3] For general weights, there are randomized algorithms whose expected running time is linear.[4][5]

The problem can also be approached in a distributed manner. If each node is considered a computer and no node knows anything except its own connected links, one can still calculate the distributed minimum spanning tree.

References

change
  1. Chazelle, B. 2000. A minimum spanning tree algorithm with inverse-Ackermann type complexity. J. ACM 47, 6 (Nov. 2000), 1028-1047.
  2. Chazelle, B. 2000. The soft heap: an approximate priority queue with optimal error rate. J. ACM 47, 6 (Nov. 2000), 1012-1027.
  3. Fredman, M. L. and Willard, D. E. 1994. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. Syst. Sci. 48, 3 (Jun. 1994), 533-551.
  4. Karger, D. R., Klein, P. N., and Tarjan, R. E. 1995. A randomized linear-time algorithm to find minimum spanning trees. J. ACM 42, 2 (Mar. 1995), 321-328.
  5. Pettie, S. and Ramachandran, V. 2002. Minimizing randomness in minimum spanning tree, parallel connectivity, and set maxima algorithms. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, California, January 06 - 08, 2002). Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, 713-722.