Min-heaps

Welcome back! Great to see you again!

Last week, we looked at how to implement the DFS and BFS using queuing structures. This week, to implement Dijkstra’s algorithm, we’re going to need a queuing structure that’s a bit more elaborate: one that’s going to take the weights in the graph into account.

Let’s take a look at another queuing structure to implement Dijkstra’s algorithm. Say hello to the min-heap.

A min-heap is a data structure in which the elements are stored as a couple *(key, value)*. Value is a quantity that has to be part of a totally ordered set, for example, the set of integers or real numbers.

With a min-heap, two elementary operations can be performed. The first one is add-or-replace. The add-or-replace operation is used to add a (key, value) couple to the min-heap. If the key already exists, and its associated value is larger than the one we want to add, then the value is updated with the new value. If the value is smaller, then the operation is ignored.

The second elementary operation of the min-heap is remove, which will return the (key, value) couple corresponding to the minimum value in the min-heap.

Let’s illustrate the use of a min-heap with an example. We initialize an empty min-heap.

Let’s add three *(key, value)* couples: $(A,50)$ , $(B,22)$ and $(C,10)$ .

Now let’s remove a value from this min-heap. This operation will return $(C, 10)$ , and the status of the min-heap will be $(A,50)$ , $(B, 22)$ .

Let’s now add-or-replace in the min-heap with the couple $(B, 45)$ . Because 45 is larger than 22, nothing happens.

Now, let’s add-or-replace with the couple $(A, 35)$ . 35 is smaller than 50, and so the entry corresponding to $A$ will be updated, resulting in the following min-heap.

Finally, let’s remove an element from the min-heap. This will return $(B,22)$ , and update the min-heap as follows.

Got it? Now let’s see how we can use a min-heap to implement the Dijkstra algorithm.

Dijkstra using a min-heap

The goal of Dijkstra’s algorithm is to find shortest paths between a starting position and all other vertices in a weighted graph. To implement this algorithm using a min-heap, all we have to do is to add vertices as keys, and distances to the starting position as associated values.

We can simply initialize an empty min-heap, and add the starting vertex with a value of 0.

Next, we’ll perform a loop while the min-heap is not empty. We remove a *(key, value)* pair from the min-heap, which will return a vertex and its distance.

Next, we examine each neighbor of the vertex that we’ve just removed from the min-heap. Distances are updated by adding the stored distance and the edge weight corresponding to this neighbor.

Then, we add-or-replace the entry corresponding to this neighbor in the min-heap, which will automatically update the distance to the minimum one, if needed.

This process continues until the min-heap is empty, corresponding to the fact that all vertices have been visited.

def dijkstra(graph, start_vertex):

    # initialize
    min_heap = new min_heap
    add_or_replace(min_heap, start_vertex, 0)
    
    # algorithm loop
    while not(empty(min_heap)):
        v, distance = remove(min_heap)
        for each i neighbor of v:
            distance_through_v = distance + graph[v][i]
            add_or_replace(min_heap, i, distance_through_v)

Concluding words

So, that’s it for today, see you for the next video!

Remark #1: In this video, we have used the terms key (the thing we want to store) and value (its importance in the heap). Other terminologies exist! The only important aspect is that items in the min-heaps should be sorted according to elements of an ordered set.

Remark #2: In the video, you are presented a possible implementation of Dijkstra’s algorithm, that uses a function called add_or_replace. Here, we just explain you what the function does but do not provide its code. When implementing the algorithm in Python, you will need to create such a function, or to think otherwise… Indeed, a naive implementation of add_or_replace may have a large complexity, which would impact code performance and energy consumption. On the other hand, we could notice that it is not really a problem if an element has multiple occurrences in the structure, if we can determine which one has the smallest associated value. You will learn more about that in the Lab.

Naive implementation of a priority queue using a list

A simple (yet quite inefficient) way to implement a priority queue is to use a list. This list stores pairs (key, value). Here, we show here how to create a min-heap using a list.

Creation

To create a priority queue using a list, simply create an empty list:

def priority_queue () :
    return []

Vacuity test

To test if a priority queue is empty, simply compare it with the empty list:

def is_empty (queue) :
    return queue == priority_queue()

Insertion

For insertion and recovery, a choice is available to us. We can either do the expensive work at the time of insertion, for example by maintaining our list sorted in ascending order of values:

def insert (queue, key, value) :
    for i in range(len(queue)) :
        key_i, value_i = queue[i]
        if value < value_i  :
            return queue[:i] + [(key, value)] + queue[i:]
    return queue + [(key, value)]

Or do so at the time of recovery, looking for the smallest value. So we simply add the item to the list:

def insert (queue, key, value) :
    return queue + [(key, value)]

Extraction

To retrieve the element with the smallest value in this priority queue (and the remaining priority queue), we find ourselves with two versions, depending on the scenario chosen for insertion. For version 1:

def extract (queue) :
    return queue[0], queue[1:]

For version 2:

def extract (queue) :
    min_i = 0
    min_key, min_value = queue[min_i]
    for i in range(len(queue)) :
        key, value = queue[i]
        if value < min_value :
            min_key = key_i
            min_value = value_i
            min_i = i
    return (min_key, min_value), queue[:min_i] + queue[min_i+1:]

Tests

To test our functions, let’s execute the following commands:

# Structure preparation
queue = priority_queue()
queue = insert(queue, "five", 5)
queue = insert(queue, "three", 3)
queue = insert(queue, "eight", 8)

# [('three', 3), ('five', 5), ('eight', 8)] if solution 1 is chosen
# [('five', 5), ('three', 3), ('eight', 8)] if solution 2 is chosen
print(queue)

# "three", then "five", then "eight"
while not empty(queue) :
    (key, value), queue = extract(queue)
    print(value)

# []
print(queue)

Complexity

The problem with this implementation is the complexity of the operations. Depending on the version, either insertion or recovery requires browsing the entire list, leading to a number of elementary operations at least proportional to the size of the list (i.e., $O(n)$ , with $n$ the number of items in the list). In practice this complexity is far too great.

Implementation using a balanced binary tree

Definitions

Let us start with a few definitions. You should already know what a tree is, and what we call its root.

A vertex in a tree is a node if it has neighbors farther from the tree root (which we call children). If not, it is a leaf.

The depth of a tree is the length of the longest shortest path starting from the root.

A tree is balanced if all its shortest paths starting from the root have the same length, $\pm 1$ .

The tree model we will use

Instead of using a list, it is proposed to use a balanced binary tree with the following properties:

Each node of the tree is associated with a couple (key, value). This property makes it possible to link the tree to the elements to be stored/extracted.
Every parent node has two children, unless the total number of elements in the priority queue is even, in which case a single node has only one child and all the others have two.
Note: Ternary/quaternary/quaternary/etc. trees could also have been considered (this changes the number of children). As the number of children increases, the complexity of the insertion operation decreases, but the complexity of the extraction increases.
The tree is balanced. This property ensures that the tree does not have a shape similar to a long chain, by making a kind of list, thus losing all the interest of using trees.
A parent’s value is always smaller than that of his sons. This essential property ensures that the minimum value node is always at the root.

Creation

To initialize such a priority queue, simply return an empty tree.

Vacuity test

To test if a priority queue is empty, check if it is equal to an empty tree.

Insertion

To insert a couple (key, value) into the tree, proceed as follows:

The couple (key, value) is added at the end of the tree, i.e.:
- If all parent nodes have two children, a leaf of minimum depth is transformed into a parent of this node.
- If a parent node has only one child, it is added as a second child this node.
If the value of this new node is smaller than that of its parent, the couples (key, value) of the node and its parent are exchanged.
We continue by going back up the tree as much as necessary.

Since we are only replacing elements in the tree with ones of smaller values, the resulting tree keeps the stated properties if the original one had them.

Extraction

To retrieve an element, proceed as follows:

The couple (key, value) associated with the root of the tree is returned.
We replace it with one leaf of maximum depth of the tree.
If the new root has a value larger than one of its children, the couple (key, value) is exchanged with the child with smallest value.
We start again by going down as much as necessary.

Here again, it is easy to check that the resulting tree respects the stated properties if the starting tree also respects them.

Complexity

The complexity of insertion and extraction operations requires to go from the root to a leaf of maximum depth in the worst case. The tree being binary, this represents $log_2(n)$ operations, where $n$ is the number of elements in the queue. We are therefore much less complex than with a naive implementation using a list (where the number of operations was at least linear in $n$ ).

To go further

A more complete list of priority structures.
Priority structures help developing heuristics for visiting neighbors.
Python’s heapq module: in practice, you should use this for PyRat.

Min-heaps

Min-heaps

Dijkstra using a min-heap

Concluding words

Naive implementation of a priority queue using a list

Creation

Vacuity test

Insertion

Extraction

Tests

Complexity

Implementation using a balanced binary tree

Definitions

The tree model we will use

Creation

Vacuity test

Insertion

Extraction

Complexity

To go further

Quiz