Hi everyone! In this lesson, we will talk about how to choose data structures for representing graphs.

Representing data in computer memory

As I’m sure you’ll recall, computer memory is organized into locations in which data are stored.

Accessing data therefore requires us to know at which location (or address) they’ve been stored.
In the same way that it’s easier to find a house by its address rather than by its description, or a library book by its call number, it’s easier to find data when you know the specific address.

Basic data structures to represent graphs

Let’s imagine that we want to store the adjacency matrix of a graph.

One way to store data in memory is to use an array. Arrays are contiguous data structures used to represent sequences of data, where each piece of data occupies the same amount of memory.

So if you know the address of the first piece of data, the address of any other piece can easily be computed. For example, if the first element is stored at address 45 and each piece of data occupies one address, then the fourth element is at address 48.
So, this means that you can rapidly access the n-th piece of data of an array. On the other hand, accessing the n-th non-zero element can be time-consuming, because all cells need to be checked one by one to see whether or not they contain a zero.
In terms of graph adjacency matrices, arrays make it efficient to check whether two vertices u and v are connected by an edge.
On the other hand, they are not that efficient if you want to retrieve all the neighbors of u, because you need to test each vertex v one by one.
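
To make this concrete, here is a minimal Python sketch (the 3-vertex graph is an arbitrary example, and plain Python lists stand in for fixed-size arrays): checking an edge is a single lookup, while collecting the neighbors of a vertex requires scanning an entire row.

```python
# Adjacency matrix of a small undirected graph, stored row by row.
# (Python lists stand in here for fixed-size arrays.)
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

def has_edge(u, v):
    """Constant-time check: a single lookup at a known position."""
    return adjacency[u][v] != 0

def neighbors(u):
    """Linear-time scan: every vertex v must be tested one by one."""
    return [v for v in range(len(adjacency)) if adjacency[u][v] != 0]

print(has_edge(0, 1))   # True
print(neighbors(1))     # [0, 2]
```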

Another common data structure is a list. Lists are not contiguous in memory, and to find a piece of data, you need to go through all the previous ones first.

The principle of lists is that each address not only contains data, but also the address of the next piece of data to look for.
So if you want to access the third piece of data in a list, you must first access the first, which gives you the address of the second, then access the second to find the address of the third.

Accessing the n-th piece of data in a list can therefore be time-consuming. In exchange, a list only stores the pieces of data of interest, which can considerably reduce memory usage compared to an array.
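
Here is a minimal sketch of such a linked list in Python (the Cell class below is purely illustrative, not a standard library type): each cell stores a value together with a reference to the next cell, so reaching the n-th element means following n links.

```python
class Cell:
    """One cell of a linked list: a value plus the address of the next cell."""
    def __init__(self, value, next_cell=None):
        self.value = value
        self.next = next_cell

# Build the list 10 -> 20 -> 30 by hand.
head = Cell(10, Cell(20, Cell(30)))

def nth(head, n):
    """Reach the n-th element (0-indexed) by following links one by one."""
    cell = head
    for _ in range(n):
        cell = cell.next
    return cell.value

print(nth(head, 2))  # 30, reached after following two links
```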

In terms of graphs, lists can be used to store, for each vertex, only information about its neighbors, since non-neighbors are often irrelevant. If each vertex has only a few neighbors, representing only neighbors with lists can save a significant amount of memory. On the other hand, checking whether u and v are connected by an edge may require searching the full list of u's neighbors.
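
A short sketch of this idea (the vertex labels and edges below are an arbitrary example): each vertex keeps only the list of its neighbors, so listing neighbors is immediate, but testing a single edge means scanning a list.

```python
# Each vertex stores only its neighbors (vertices labelled 0..3 here).
adjacency_lists = [
    [1, 2],   # neighbors of vertex 0
    [0],      # neighbors of vertex 1
    [0, 3],   # neighbors of vertex 2
    [2],      # neighbors of vertex 3
]

def neighbors(u):
    """Immediate: the neighbors are stored explicitly."""
    return adjacency_lists[u]

def has_edge(u, v):
    """May require scanning the whole list of u's neighbors."""
    return v in adjacency_lists[u]

print(neighbors(0))    # [1, 2]
print(has_edge(1, 3))  # False, found after scanning vertex 1's list
```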

A final example of a data structure is the dictionary. Dictionaries are elaborate structures that aim to combine the advantages of both arrays and lists: fast access and efficient memory usage, respectively.

Dictionaries make use of hash functions, which are basically mechanisms that transform contents into addresses. This makes dictionaries an excellent choice for prototyping, because they allow programmers to benefit from fast access and low memory usage at the same time.

As a rule of thumb, dictionaries should always be used if you don’t know exactly what you’re doing.
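
As an illustration (the dictionary-of-sets layout below is just one common way to do this, not the only one), a Python dictionary keyed by vertex gives fast edge tests while storing only the vertices and edges that actually exist:

```python
# Graph stored as a dictionary mapping each vertex to the set of its neighbors.
# Only existing vertices and edges consume memory.
graph = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": {"a"},
}

def has_edge(u, v):
    """Hash-based lookup: constant time on average."""
    return v in graph.get(u, set())

def neighbors(u):
    """Neighbors are stored explicitly, as with lists."""
    return graph.get(u, set())

print(has_edge("a", "c"))  # False
print(neighbors("b"))      # {'a', 'c'}
```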

Concluding words

Thanks for your attention! That’s it for memory structures, and I will see you again very soon to talk about good programming practices. Bye!


More details on the list-based solution

We have seen before that an adjacency matrix is a convenient object for representing a graph in memory.

However, in most cases, graphs are sparse objects, i.e., the number of existing edges is low compared to the number of edges of a complete graph. A direct implication is that most of the entries of the adjacency matrix are 0s. Since the number of elements in an adjacency matrix is equal to the square of the graph order, this can quickly lead to a large amount of memory being used just to store zeros.

A possible solution to circumvent this problem is to use a different data structure: a list of lists. Let us call such an object L = [l_1, \dots, l_n], with l_1, \dots, l_n being lists. In this structure, l_i (i \in [1, n]) will contain the neighbors of vertex i, i.e., the vertices that can be reached from vertex i along an edge.

As an example, consider the following adjacency matrix: \begin{pmatrix}0 & 1 & 0 & 1 & 0 & 0 & 0 \\1 & 0 & 1 & 1 & 0 & 1 & 0 \\0 & 1 & 0 & 1 & 0 & 0 & 0 \\1 & 1 & 1 & 0 & 0 & 1 & 0 \\0 & 0 & 0 & 0 & 0 & 1 & 0 \\0 & 1 & 0 & 1 & 1 & 0 & 1 \\0 & 0 & 0 & 0 & 0 & 1 & 0\end{pmatrix}.

Assuming vertices to be labelled from 1 to n, this matrix is equivalent to the list L = [ [2, 4] , [1, 3, 4, 6], [2, 4], [1, 2, 3, 6], [6], [2, 4, 5, 7], [6] ].

We can quickly notice that the number of stored numbers has shrunk from n^2 = 49 to 18.
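
For illustration, the list L above can be derived mechanically from the matrix; here is a minimal Python sketch (indices are handled 0-based internally and shifted back to the 1-based labels of the example):

```python
# Adjacency matrix of the example (rows/columns labelled 1..7, stored 0-based).
A = [
    [0, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
]

# Keep only the indices of non-zero entries, shifted to 1-based labels.
L = [[j + 1 for j, a in enumerate(row) if a != 0] for row in A]

print(L)
# [[2, 4], [1, 3, 4, 6], [2, 4], [1, 2, 3, 6], [6], [2, 4, 5, 7], [6]]
print(sum(len(l) for l in L))  # 18 numbers stored, instead of 7 * 7 = 49
```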

While this solution saves some memory space, it suffers from several limitations:

  • Checking the existence of an edge \{i, j\} requires going through all elements of the list l_i to verify whether j is one of its elements. This can take some time if i has a lot of neighbors. In comparison, making the same check with an adjacency matrix A takes a single operation, as one just needs to verify that A[i, j] \neq 0.
  • It is not as easy to extend to weighted graphs. In the case of adjacency matrices, each entry can directly store the weight associated with the corresponding edge. Here, entries are indices of non-zero elements, which cannot be altered without creating/deleting edges. A possible solution is to replace the lists of indices l_i = [j, k, \dots] with lists of couples l_i = [(j, w_j), (k, w_k), \dots], where w_j is the weight of edge \{i, j\}; a sketch of this idea is given after this list.
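
A small sketch of this couple-based variant (the graph and weights below are arbitrary placeholders, not the example above):

```python
# Each vertex stores (neighbor, weight) couples instead of bare indices.
# Vertices are labelled 1..4; weights are arbitrary placeholders.
L_weighted = [
    [(2, 1.5), (4, 0.2)],   # couples for vertex 1
    [(1, 1.5), (3, 3.0)],   # couples for vertex 2
    [(2, 3.0)],             # couples for vertex 3
    [(1, 0.2)],             # couples for vertex 4
]

def edge_weight(i, j):
    """Return the weight of edge {i, j}, or None if the edge does not exist."""
    for neighbor, weight in L_weighted[i - 1]:
        if neighbor == j:
            return weight
    return None

print(edge_weight(1, 4))  # 0.2
print(edge_weight(1, 3))  # None
```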

Quiz

Which data structures need the number of elements to be defined beforehand?
We store the adjacency matrix of a graph of order 20 in an array. How many elements does this array contain?