Computer science is a difficult science: concepts are often linked to discrete mathematics and proving intuitive results can be complex.

Rather than proving a property with a mathematical formalism, it often happens that we try to support a property by numerous and well thought simulations.

Let us show you on an example how such a scientific approach is implemented to analyze a problem. 

Problem

The problem we are considering here is that of the famous Syracuse sequence.

A Syracuse sequence is obtained by induction as follows:

  • The first term is a parameter u_0, which is a strictly positive integer.
  • The u_{i+1} term is computed as follows:
    • if u_i is even, then u_{i+1} = u_i / 2.
    • if u_i is odd, then u_{i+1} = 3\cdot u_i + 1.

As soon as a 1 is encountered in this sequence, the cycle 1,4,2,1,4,2,… repeats itself indefinitely.

It is supposed that for any choice of u_0, the sequence will end up passing through 1 and thus repeating this cycle. However, no one has proved this.

We will try to analyze the impact of  u_0 on the number of values the sequence takes before meeting the value 1.

Python program

Let’s write a program for computing the number of iterations of the Syracuse sequence in a file called syracuse.py:

import sys

# We retrieve the first argument as u0
u0 = int(sys.argv[1])

# The syracuse function takes as an argument an integer n strictly higher than 0
# It outputs the number of steps before encountering 1
def syracuse (n) :
  if n == 1 :
    return 0
  else :
    if n % 2 == 0 :
      return 1 + syracuse(n / 2)
    else :
      return 1 + syracuse(3 * n + 1)

# We execute the function and output its result
print(u0, ",", syracuse(u0))

Let’s start our program from the terminal by setting u_0 as argument. To do this, we write in a terminal:

python syracuse.py 14

The output printed in the terminal is 14, 17.

Average results

We will execute our script for values of u_0 between 1 and 100 to see how it behaves. We therefore write in a terminal:

for i in {1..100}; do python syracuse.py $i; done

Note that this line would only work in a Unix environment (e.g., Linux). If you are using Windows instead, it could be replaced by the following (Windows translations courtesy of Colin McNicholl):

for ($i = 1; $i -le 20; $i++) {py -3 syracuse.py $i}

The output (100 lines with a number on each) is not very readable… Instead we would prefer to have a plot. To do this, we will create a file in csv format with the result as a function of u_0. The command line in the terminal becomes as follows, where >> results.csv allows to redirect the output of the program to the file results.csv:

for i in {1..100}; do python syracuse.py $i >> results.csv; done

The first line ensures that the old results.csv file, which might exist, is removed before new data is added to the file. 

Again, for Windows users, the command is a bit different:

for ($i=1; $i -le 100; $i++) {py -3 syracuse.py $i | Out-File results.csv -Append}

To plot these data, we use Libreoffice:

Number of steps before encountering 1, as a function of the starting number u_0.

Behavior analysis

What we have obtained is already interesting. It seems that globally our curve is increasing even if there are strong local variations.

In order to verify this statement, we propose to check the situation for larger values of u_0 . We cannot be exhaustive so we will look at what happens for the first multiples of 1,000,000. The problem is that given the variations, we cannot just draw a plot with a point for every multiple of 1,000,000.

What we propose then is to look at a large number of numbers close to multiples of 1,000,000 and average them:

import sys

# We retrieve the first argument as u0
u0 = int(sys.argv[1])

# We study a neighborhood of ± 50,000 around u0
NEIGHBORHOOD = 50000

# The syracuse function takes as an argument an integer n strictly higher than 0
# It outputs the number of steps before encountering 1
def syracuse (n) :
  if n == 1 :
    return 0
  else :
    if n % 2 == 0 :
      return 1 + syracuse(n / 2)
    else :
      return 1 + syracuse(3 * n + 1)

# We start the computations and store the results in total
total = 0.0
for i in range(-NEIGHBORHOOD, NEIGHBORHOOD) :
    total += syracuse(u0 + i) / (2 * NEIGHBORHOOD)

# We write the output
print(u0, ",", total)

Let’s calculate the result for different values, and write the result in a file results2.csv:

for i in {1..100}; do python syracuse.py ${i}000000 >> results2.csv; done

For Windows users:

for ($i=1; $i -le 100; $i++) {py -3 syracuse.py ${i}000000 | Out-File results2.csv -Append}

This script takes a little time, which is normal given the amount of calculations required. We obtain the following curve:

Number of steps before encountering 1, as a function of the starting number u_0.