Hi everyone! In this lesson, we will talk about how to choose data structures for representing graphs.
Representing data in computer memory
As I’m sure you’ll recall, computer memory is organized into locations in which data are stored.
Basic data structures to represent graphs
Let’s imagine that we want to store the adjacency matrix of a graph.
One way to store data in memory is to use an array. Arrays are contiguous data structures used to represent sequences of data, where each piece of data occupies the same amount of memory.
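To make the array idea concrete, here is a minimal Python sketch. It assumes a small, hypothetical 3-vertex graph (not one taken from the lesson) whose adjacency matrix is stored row by row in a single contiguous array. The helper name `entry` and the 0-based vertex indices are choices made for this illustration only.

```python
from array import array

n = 3  # hypothetical graph order

# Row-major adjacency matrix of an undirected graph with edges 0-1 and 1-2.
# Every entry has the same size (one signed byte, type code "b").
flat = array("b", [
    0, 1, 0,
    1, 0, 1,
    0, 1, 0,
])

def entry(flat, n, i, j):
    """Return the matrix entry (i, j) using a single index computation."""
    return flat[i * n + j]

print(entry(flat, n, 0, 1))  # 1: vertices 0 and 1 are adjacent
print(entry(flat, n, 0, 2))  # 0: vertices 0 and 2 are not adjacent
```

Because every entry has the same size and the entries are stored next to each other, the position of entry (i, j) can be computed directly instead of being searched for.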
Another common data structure is a list. The elements of a list are not contiguous in memory, so to reach a piece of data you first need to go through all the previous ones.
Accessing the $i$-th piece of data in a list can therefore be time-consuming. In return, a list makes it possible to store only the pieces of data of interest, which can considerably reduce memory usage compared to an array.
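The access cost described here can be sketched as follows, assuming that "list" means a singly linked list. The `Node` class and the `get` function are hypothetical names written for this example. Note that the built-in `list` type in CPython is array-backed and does not behave this way.

```python
class Node:
    """One cell of a singly linked list (illustrative, not a library type)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def get(head, i):
    """Return the i-th value by walking i links from the head."""
    node = head
    for _ in range(i):
        if node is None:
            raise IndexError("index out of range")
        node = node.next
    if node is None:
        raise IndexError("index out of range")
    return node.value

# Store only the data of interest: the two neighbors (1 and 4) of some
# hypothetical vertex, instead of a full row of zeros and ones.
neighbors = Node(1, Node(4))

print(get(neighbors, 1))  # 4, reached after following one link
```

The number of links followed grows with the index, which is the time cost mentioned above. In exchange, only two values are stored.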
A final example of data structures is dictionaries. Dictionaries are more elaborate structures that aim to combine the advantages of both arrays and lists, namely fast access and efficient memory usage, respectively.
As a rule of thumb, use a dictionary when you are not sure which structure best fits your needs.
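As an illustration, a dictionary can hold only the existing edges of a graph while keeping edge lookups fast on average. The sketch below assumes a hypothetical directed graph with two edges. The key format `(i, j)` and the helper `has_edge` are choices made for this example.

```python
# Only existing edges are stored; absent edges take no memory.
edges = {
    (0, 1): 1,
    (1, 2): 1,
}

def has_edge(edges, i, j):
    """Hash-based membership test: no scan over the stored edges."""
    return (i, j) in edges

print(has_edge(edges, 0, 1))  # True
print(has_edge(edges, 0, 2))  # False
```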
Thanks for your attention! That’s it for memory structures, and I will see you again very soon to talk about good programming practices. Bye!
More details on the list-based solution
We have seen before that an adjacency matrix is a convenient object for representing a graph in memory.
However, in most cases, graphs are sparse objects, i.e., the number of existing edges is low compared to the number of edges of a complete graph. A direct implication is that most of the entries of the adjacency matrix are 0s. Since the number of elements in an adjacency matrix is equal to the square of the graph order, this can quickly lead to a large amount of memory being used, mostly to store zeros.
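A quick calculation shows how fast this grows. The figures below are hypothetical and chosen only for illustration: a graph of order 10,000 in which 100,000 matrix entries are non-zero.

```python
def matrix_entries(order):
    """Number of entries in the adjacency matrix of a graph of this order."""
    return order * order

def nonzero_fraction(order, nonzero_entries):
    """Share of the adjacency matrix that actually encodes an edge."""
    return nonzero_entries / matrix_entries(order)

order = 10_000             # hypothetical graph order
nonzero_entries = 100_000  # hypothetical number of non-zero entries

print(matrix_entries(order))                     # 100000000
print(nonzero_fraction(order, nonzero_entries))  # 0.001
```

Under these assumptions, 99.9% of the stored entries are zeros.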
A possible solution to circumvent this problem is to use a different data structure: a list of lists. Let us call such an object $L$, with $L[i]$ being lists. In this structure, $L[i]$ ($i$ being a vertex index) will represent the edges that can be accessed from vertex $i$.
As an example, consider the following adjacency matrix: .
Assuming vertices to be labelled from 1 to , this matrix is equivalent to the list .
We can quickly notice that the number of stored numbers has shrunk from to .
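The conversion from an adjacency matrix to a list of lists can be sketched in Python as follows. The 4-vertex matrix used here is a hypothetical one chosen for this illustration, and the function name `matrix_to_lists` is also an assumption. Vertices are labelled from 1, so the neighbors of vertex 1 end up at Python index 0.

```python
def matrix_to_lists(matrix):
    """For each row, keep the 1-based labels of the non-zero columns."""
    return [
        [j + 1 for j, value in enumerate(row) if value != 0]
        for row in matrix
    ]

# Hypothetical undirected graph: vertex 2 is linked to vertices 1, 3 and 4.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

L = matrix_to_lists(A)
print(L)  # [[2], [1, 3, 4], [2], [2]]

stored_in_matrix = sum(len(row) for row in A)
stored_in_lists = sum(len(neighbors) for neighbors in L)
print(stored_in_matrix, stored_in_lists)  # 16 6
```

For this particular matrix, the number of stored values drops from 16 to 6.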
While this solution saves some memory space, it suffers from several limitations:
- Checking the existence of an edge $(i, j)$ requires going through all elements of the list $L[i]$ to verify whether $j$ is one of them. This can take some time if $i$ has a lot of neighbors. In comparison, making the same check with an adjacency matrix $A$ takes a single operation, as one just needs to verify that the entry $A[i, j]$ is non-zero.
- It is not as easy to extend to weighted graphs. In the case of adjacency matrices, entries represent the weight associated with the edge. Here, entries are indices of non-zero elements, which cannot be altered without creating/deleting edges. A possible solution is to replace the lists of indices with lists of couples $(j, w)$, where $w$ is the weight of edge $(i, j)$.
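Both limitations can be seen in a short Python sketch. It assumes a hypothetical weighted graph on three vertices labelled from 1. The function names and the weights are illustrative and are not part of the lesson.

```python
def has_edge_lists(L, i, j):
    """Scan the neighbors of vertex i: cost grows with its number of neighbors."""
    return j in L[i - 1]

def has_edge_matrix(A, i, j):
    """Single lookup in the adjacency matrix."""
    return A[i - 1][j - 1] != 0

def edge_weight(W, i, j):
    """Scan the (neighbor, weight) couples of vertex i; None if no edge."""
    for neighbor, weight in W[i - 1]:
        if neighbor == j:
            return weight
    return None

# The same hypothetical graph in three representations.
A = [[0, 2.5, 0], [2.5, 0, 1.0], [0, 1.0, 0]]       # weighted matrix
L = [[2], [1, 3], [2]]                              # lists of neighbors
W = [[(2, 2.5)], [(1, 2.5), (3, 1.0)], [(2, 1.0)]]  # lists of couples

print(has_edge_lists(L, 2, 3), has_edge_matrix(A, 2, 3))  # True True
print(edge_weight(W, 2, 3))                               # 1.0
print(edge_weight(W, 1, 3))                               # None
```

With lists of couples, the weight of an edge can be changed without adding or removing the edge itself, at the cost of storing two values per edge.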
To go further
- Understanding the efficiency of GPU algorithms for matrix-matrix multiplication: A research paper illustrating one of the main reasons why matrices are frequently used.
- Graph Processing on FPGAs: Taxonomy, Survey, Challenges: A research paper illustrating the use of specific hardware (here, FPGA) for processing large graphs.