Contents of the page


On this page, we discuss how to think about the trade-off between speed and correctness that arises with approximate algorithms. Such a trade-off needs to be considered whenever exact solutions are not feasible.


So, here we are again, nice to see you’ve made it to week 5! Today we’ll talk about comparing implementations. When facing practical problems like the Traveling Salesman Problem (TSP), we’d obviously prefer a solution that’s both fast and correct.

Pareto frontier and Pareto optimal

In the case of NP-complete problems, such as the TSP, we know that this just isn’t possible: we have to sacrifice speed, correctness, or a bit of both. This kind of trade-off is often described by a Pareto frontier, which answers the following question: what is the fastest algorithm we can find that solves the problem with a given level of correctness?


A point lying on the Pareto frontier is called Pareto optimal: it corresponds to a trade-off between speed and correctness that cannot be improved on one criterion without degrading the other. Which Pareto optimal point to choose typically depends on the relative importance of two or more parameters, such as the size of the problem we are dealing with.
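To make this more concrete, here is a minimal sketch with made-up (time, error) measurements for three hypothetical algorithms; it keeps only the points that no other point dominates, i.e., the Pareto frontier:

# Minimal sketch: made-up (time in seconds, error) measurements for three hypothetical algorithms
candidates = {"greedy" : (0.1, 0.30), "local_search" : (2.0, 0.05), "slow_and_bad" : (5.0, 0.40)}

# A candidate is Pareto optimal if no other candidate is at least as good on both criteria
# and strictly better on at least one of them
def pareto_frontier (points) :
    frontier = {}
    for name, (time_a, error_a) in points.items() :
        dominated = any(time_b <= time_a and error_b <= error_a and (time_b, error_b) != (time_a, error_a)
                        for time_b, error_b in points.values())
        if not dominated :
            frontier[name] = (time_a, error_a)
    return frontier

print(pareto_frontier(candidates))
# {'greedy': (0.1, 0.3), 'local_search': (2.0, 0.05)} -- "slow_and_bad" is dominated by both

Here, slow_and_bad is dominated by the two other points, so only the remaining two are worth considering; which of them is preferable depends on how much correctness we are ready to trade for speed.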

Measuring speed and correctness

Let’s define some notions related to measuring speed and correctness. We can measure how fast an algorithm is through its complexity or by timing its execution. The correctness of an algorithm, however, is trickier to define, which makes it difficult to measure.

For the TSP, we know that we want to find a shortest path, and so correctness could simply be measured as a deviation from the length of the shortest path.
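As a quick illustration, and assuming we somehow know the optimal length for a small instance, this deviation can be computed as a relative error (the lengths below are made-up values):

# Relative deviation of an approximate tour from a known optimal length
def relative_deviation (approximate_length, optimal_length) :
    return (approximate_length - optimal_length) / optimal_length

print(relative_deviation(108.0, 100.0))
# 0.08, i.e., the approximate solution is 8 % longer than the optimal one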

But in practice, it isn’t always so simple. Finding the actual shortest path can be impossible if there are more than a few dozen vertices, as we saw that the complexity of an exhaustive search scales exponentially. This means that the correctness of an approximate algorithm for the TSP is very hard to estimate.

Comparing algorithms

What we do know is that the more complexity we accept, the better the solution should be. In the worst case, we can always fall back on a less complex solution that has already been proposed.

Let’s take the TSP again as an example. There are many intermediate steps between the greedy algorithm, which always targets the next closest city, and the brute-force one, which exhaustively examines all possible paths visiting every city. An intermediate between these two extremes should only be considered if it actually provides an improvement in the trade-off between complexity and correctness.
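For reference, the greedy (nearest-neighbor) idea can be sketched as follows, assuming the instance is given as a symmetric distance matrix with made-up values; this is only one possible implementation:

# Greedy (nearest-neighbor) heuristic for the TSP
# The instance is given as a symmetric distance matrix (made-up values below)
def greedy_tsp (distances, start=0) :
    path = [start]
    remaining = set(range(len(distances))) - {start}
    while len(remaining) > 0 :
        current = path[-1]
        # Always move to the closest city that has not been visited yet
        next_city = min(remaining, key=lambda city : distances[current][city])
        path.append(next_city)
        remaining.remove(next_city)
    return path

distances = [[0, 2, 9, 10],
             [2, 0, 6, 4],
             [9, 6, 0, 3],
             [10, 4, 3, 0]]
print(greedy_tsp(distances))
# [0, 1, 3, 2]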

To determine whether a proposed solution is efficient, you should always compare it with a simple approach, both in terms of correctness and complexity. If your solution needs to be 10 times more complex to provide a 1 % improvement in correctness, it may not be worth the effort, unless this 1 % improvement makes a significant difference compared with other solutions.


Example: coloring graphs

Algorithms

Let us consider the following example: we call a coloring of a graph a vector of integers, one per vertex. These integers are called vertex colors, and several vertices may share the same color.

A coloring is said to be clean if any two vertices connected by an edge have different colors. The chromatic number of a graph is the minimum number of colors needed to obtain a clean coloring.

This problem is a well-known example of an NP-complete problem. One way to solve it exactly is to list all possible colorings with 1 color, then with 2 colors, and so on, until a clean coloring is found.

The following Python code solves the problem as described above:

# Easy iteration over permutations
import itertools



# Function to check if a coloring is correct
# A coloring is correct if no neighbours share a color
def check_coloring (graph, colors) :
    
    # We test the colors of every pair of connected nodes
    for vertex in range(len(graph)) : 
        for neighbor in graph[vertex] :
            if colors[vertex] == colors[neighbor] :
                return False
    return True



# This function returns a coloring of the given graph using a minimum number of colors
def find_coloring (graph) :
    
    # We gradually increase the number of available colors (up to one color per vertex)
    for nb_colors in range(1, len(graph) + 1) :
        
        # We test all possible arrangements of colors
        # This could be improved, as the colorings tested with 2 colors are tested again with 3 colors, etc.
        for coloring in itertools.product(range(nb_colors), repeat=len(graph)) :
            print("Nb colors :", nb_colors, "- Candidate coloring :", coloring)
            if check_coloring(graph, coloring) :
                return coloring

    
    
# Test graph
graph = [[1, 2, 5], [0, 2, 5], [0, 1], [4, 5], [3, 5], [0, 1, 3, 4]]
result = find_coloring(graph)
print(result)

In this example, the graph is implemented as a list of lists. It contains 6 vertices and 8 (symmetric) edges. The solution returned by the algorithm is:

(0, 1, 2, 0, 1, 2)

This indicates that three colors are sufficient to obtain a clean coloring of the graph, with vertices 0 and 3 sharing color 0, vertices 1 and 4 sharing color 1, and vertices 2 and 5 sharing color 2.
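Assuming the code above has just been run, we can quickly double-check this result by reusing check_coloring and counting the distinct colors:

# Quick verification of the returned coloring
print(check_coloring(graph, result))
# True: no two neighbors share a color
print(len(set(result)))
# 3: three distinct colors are used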

This algorithm is very costly, and a well-known approximate alternative is to sort the vertices by decreasing number of neighbors and to color them in this order, always choosing the smallest color that keeps the coloring clean. This approximate algorithm is implemented below:

# For min-heaps
import heapq



# Function to check if a coloring is correct
# A coloring is correct if no neighbours share a color
def check_coloring (graph, colors) :
    
    # We test the colors of every pair of connected nodes
    for vertex in range(len(graph)) :
        if colors[vertex] is not None :
            for neighbor in graph[vertex] :
                if colors[neighbor] is not None :
                    if colors[vertex] == colors[neighbor] :
                        return False
    return True
    
    
 
# This function greedily tries to color the graph from highest degree node to lowest degree one
def greedy_coloring (graph) :
    
    # Sorting nodes in descending degree order using a max-heap (negative min-heap)
    heap = []
    for vertex in range(len(graph)) :
        heapq.heappush(heap, (-len(graph[vertex]), vertex))
    
    # Coloring
    colors = [None] * len(graph)
    while len(heap) > 0 :
        degree, vertex = heapq.heappop(heap)
        for color in range(len(graph)) :
            colors[vertex] = color
            if check_coloring(graph, colors) :
                break
    return colors
    
    
    
# Test graph
graph = [[1, 2, 5], [0, 2, 5], [0, 1], [4, 5], [3, 5], [0, 1, 3, 4]]
result = greedy_coloring(graph)
print(result)

This algorithm is much less complex than the previous one, and therefore makes it possible to handle graphs with many more vertices.

Simulations

In order to evaluate the quality of our approximate algorithm, we will focus on two things: computation time and accuracy. To test these two quantities, we will generate a large number of random graphs using the following program:

# Various imports
import math
import random
import heapq



# Constants
NB_NODES = 100
EDGE_PROBABILITY = math.log(NB_NODES) / NB_NODES
NB_TESTS = 1000



# Generates an Erdos-Renyi graph
def generate_graph () :
    graph = []
    for i in range(NB_NODES) :
        graph.append([])
    for i in range(NB_NODES) :
        for j in range(i + 1, NB_NODES) :
            if random.random() < EDGE_PROBABILITY :
                graph[i].append(j)
                graph[j].append(i)
    return graph



# Function to check if a coloring is correct
# A coloring is correct if no neighbours share a color
def check_coloring (graph, colors) :
    
    # We test the colors of every pair of connected nodes
    for vertex in range(len(graph)) :
        if colors[vertex] is not None :
            for neighbor in graph[vertex] :
                if colors[neighbor] is not None :
                    if colors[vertex] == colors[neighbor] :
                        return False
    return True
    
    
 
# This function greedily tries to color the graph from highest degree node to lowest degree one
def greedy_coloring (graph) :
    
    # Sorting nodes in descending degree order using a max-heap (negative min-heap)
    heap = []
    for vertex in range(len(graph)) :
        heapq.heappush(heap, (-len(graph[vertex]), vertex))
    
    # Coloring
    colors = [None] * len(graph)
    while len(heap) > 0 :
        degree, vertex = heapq.heappop(heap)
        for color in range(len(graph)) :
            colors[vertex] = color
            if check_coloring(graph, colors) :
                break
    return colors


# Tests
nb_colors = 0.0
for i in range(NB_TESTS) :
    solution = greedy_coloring(generate_graph())
    nb_colors += len(set(solution)) / NB_TESTS
print(nb_colors)

We call this program measure_greedy.py and use a similar program, measure_exhaustive.py, for the exhaustive solution.
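The exhaustive version is not reproduced here; a possible sketch of measure_exhaustive.py, reusing the find_coloring logic from above (without the debug print, and with a much smaller NB_NODES since the exhaustive search quickly becomes intractable), could look like this:

# Possible sketch of measure_exhaustive.py

# Various imports
import math
import random
import itertools



# Constants (kept much smaller than for the greedy version)
NB_NODES = 8
EDGE_PROBABILITY = math.log(NB_NODES) / NB_NODES
NB_TESTS = 100



# Generates an Erdos-Renyi graph
def generate_graph () :
    graph = []
    for i in range(NB_NODES) :
        graph.append([])
    for i in range(NB_NODES) :
        for j in range(i + 1, NB_NODES) :
            if random.random() < EDGE_PROBABILITY :
                graph[i].append(j)
                graph[j].append(i)
    return graph



# Function to check if a coloring is correct
# A coloring is correct if no neighbours share a color
def check_coloring (graph, colors) :
    for vertex in range(len(graph)) :
        for neighbor in graph[vertex] :
            if colors[vertex] == colors[neighbor] :
                return False
    return True



# Exhaustive search for a coloring using a minimum number of colors
def find_coloring (graph) :
    for nb_colors in range(1, len(graph) + 1) :
        for coloring in itertools.product(range(nb_colors), repeat=len(graph)) :
            if check_coloring(graph, coloring) :
                return coloring



# Tests
nb_colors = 0.0
for i in range(NB_TESTS) :
    solution = find_coloring(generate_graph())
    nb_colors += len(set(solution)) / NB_TESTS
print(nb_colors)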

Computation time

To measure execution time, we use the Unix time command:

time python measure_greedy.py

This generates the following output:

4.36799999999995

real	0m5,001s
user	0m4,992s
sys	0m0,008s

The first line is the output of our program, i.e., the number of colors used on average over the 1,000 tests.

Then we can read the different times: real is the wall-clock time that actually elapsed, user is the CPU time the process spent in user mode, and sys is the CPU time spent in the kernel on its behalf. We are interested in user here.

Note that these measurements cover everything the process does, including the interpreter start-up, the module imports, and the creation of the graphs. To time the test loop itself from within the program, we can adapt our Python code as follows:

# Various imports
import math
import random
import heapq
import time



# Constants
NB_NODES = 100
EDGE_PROBABILITY = math.log(NB_NODES) / NB_NODES
NB_TESTS = 1000



# Generates an Erdos-Renyi graph
def generate_graph () :
    graph = []
    for i in range(NB_NODES) :
        graph.append([])
    for i in range(NB_NODES) :
        for j in range(i + 1, NB_NODES) :
            if random.random() < EDGE_PROBABILITY :
                graph[i].append(j)
                graph[j].append(i)
    return graph



# Function to check if a coloring is correct
# A coloring is correct if no neighbours share a color
def check_coloring (graph, colors) :
    
    # We test the colors of every pair of connected nodes
    for vertex in range(len(graph)) :
        if colors[vertex] is not None :
            for neighbor in graph[vertex] :
                if colors[neighbor] is not None :
                    if colors[vertex] == colors[neighbor] :
                        return False
    return True
    
    
 
# This function greedily tries to color the graph from highest degree node to lowest degree one
def greedy_coloring (graph) :
    
    # Sorting nodes in descending degree order using a max-heap (negative min-heap)
    heap = []
    for vertex in range(len(graph)) :
        heapq.heappush(heap, (-len(graph[vertex]), vertex))
    
    # Coloring
    colors = [None] * len(graph)
    while len(heap) > 0 :
        degree, vertex = heapq.heappop(heap)
        for color in range(len(graph)) :
            colors[vertex] = color
            if check_coloring(graph, colors) :
                break
    return colors



# Start timing
time_start = time.time()



# Tests
nb_colors = 0.0
for i in range(NB_TESTS) :
    solution = greedy_coloring(generate_graph())
    nb_colors += len(set(solution)) / NB_TESTS
print(nb_colors)



# Stop timing
print("Elapsed", time.time() - time_start)

This generates the following output:

4.363999999999949
Elapsed 4.759778022766113

Plotting curves

In order to automate things a bit, we will adapt our code to take its arguments from the command line:

# Various imports
import math
import random
import heapq
import time
import sys



# Arguments (command-line arguments are strings, so we convert them to integers)
NB_NODES = int(sys.argv[1])
EDGE_PROBABILITY = math.log(NB_NODES) / NB_NODES
NB_TESTS = int(sys.argv[2])



# Generates an Erdos-Renyi graph
def generate_graph () :
    graph = []
    for i in range(NB_NODES) :
        graph.append([])
    for i in range(NB_NODES) :
        for j in range(i + 1, NB_NODES) :
            if random.random() < EDGE_PROBABILITY :
                graph[i].append(j)
                graph[j].append(i)
    return graph



# Function to check if a coloring is correct
# A coloring is correct if no neighbours share a color
def check_coloring (graph, colors) :
    
    # We test the colors of every pair of connected nodes
    for vertex in range(len(graph)) :
        if colors[vertex] is not None :
            for neighbor in graph[vertex] :
                if colors[neighbor] is not None :
                    if colors[vertex] == colors[neighbor] :
                        return False
    return True
    
    
 
# This function greedily tries to color the graph from highest degree node to lowest degree one
def greedy_coloring (graph) :
    
    # Sorting nodes in descending degree order using a max-heap (negative min-heap)
    heap = []
    for vertex in range(len(graph)) :
        heapq.heappush(heap, (-len(graph[vertex]), vertex))
    
    # Coloring
    colors = [None] * len(graph)
    while len(heap) > 0 :
        degree, vertex = heapq.heappop(heap)
        for color in range(len(graph)) :
            colors[vertex] = color
            if check_coloring(graph, colors) :
                break
    return colors



# Start timing
time_start = time.time()



# Tests
nb_colors = 0.0
for i in range(NB_TESTS) :
    solution = greedy_coloring(generate_graph())
    nb_colors += len(set(solution)) / NB_TESTS
print(nb_colors)



# Stop timing
print("Elapsed", time.time() - time_start)

We can now run the following command:

for n in {1..10}; do python measure_greedy.py ${n}0 100 >> results.txt ; done

And using Gnuplot, we get the following curve:

The curve is not very smooth because we have only run 100 tests for averaging here.
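If Gnuplot is not at hand, a minimal matplotlib sketch could produce a similar execution-time curve; it assumes results.txt contains the two lines printed per run by measure_greedy.py (the average number of colors, then the elapsed time) for sizes 10 to 100:

# Possible alternative to Gnuplot: plotting results.txt with matplotlib
import matplotlib.pyplot as plt

sizes = list(range(10, 101, 10))
avg_colors = []
elapsed = []
with open("results.txt") as results :
    tokens = results.read().split()
    # Each run contributes three tokens: <average colors> "Elapsed" <seconds>
    for i in range(0, len(tokens), 3) :
        avg_colors.append(float(tokens[i]))
        elapsed.append(float(tokens[i + 2]))

plt.plot(sizes, elapsed, marker="o")
plt.xlabel("Number of vertices")
plt.ylabel("Greedy coloring time (s)")
plt.show()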

Similarly, let us compare the performance (in terms of precision and execution time) of the two algorithms:

Clearly, the gain in execution time is significant.

Looking at the precision curve, the precision loss does not seem very high, at least for the problem sizes that could be handled by the exhaustive approach. For larger graphs, we cannot compute the exact solution because of the complexity of finding it.

To evaluate this aspect a bit further, we are going to evaluate the solutions found by the greedy approach on bipartite graphs, for which we know the chromatic number is at most 2:
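The exact generator used for this experiment is not shown; a possible sketch, producing random bipartite graphs in the same list-of-lists format and assuming the greedy_coloring function defined above is available, could be:

# Possible generator for random bipartite test graphs (illustrative assumption)
import random

def generate_bipartite_graph (nb_nodes, edge_probability) :
    graph = [[] for _ in range(nb_nodes)]
    # Vertices are split into two halves, and edges only connect the two halves
    half = nb_nodes // 2
    for i in range(half) :
        for j in range(half, nb_nodes) :
            if random.random() < edge_probability :
                graph[i].append(j)
                graph[j].append(i)
    return graph

# The greedy coloring should then use a number of colors close to 2
print(len(set(greedy_coloring(generate_bipartite_graph(100, 0.1)))))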

Here again, the result seems reasonable, which leads us to think that this heuristic is well suited to the problem.


To go further


Quiz

An approximate solution is used...
A Pareto frontier...
A Pareto optimal is...