Contents of the page


Computer science is a difficult science: concepts are often linked to discrete mathematics and proving intuitive results can be complex.

Rather than proving a property with a mathematical formalism, it often happens that we try to support a property by numerous and well thought simulations.

Let us show you on an example how such a scientific approach is implemented to analyze a problem. 

Problem

The problem we are considering here is that of the famous Syracuse sequence.

A Syracuse sequence is obtained by induction as follows:

  • The first term is a parameter u_0, which is a strictly positive integer.
  • The u_{i+1} term is computed as follows:
    • if u_i is even, then u_{i+1} = u_i / 2.
    • if u_i is odd, then u_{i+1} = 3\cdot u_i + 1.

As soon as a 1 is encountered in this sequence, the cycle 1,4,2,1,4,2,… repeats itself indefinitely.

It is supposed that for any choice of u_0, the sequence will end up passing through 1 and thus repeating this cycle. However, no one has proved this.

We will try to analyze the impact of  u_0 on the number of values the sequence takes before meeting the value 1.

Python program

In order to analyze this impact, we start by writing the function which calculates the Syracuse sequence:

# The syracuse function takes as an argument an integer n strictly higher than 0
# It outputs the number of steps before encountering 1
def syracuse (n) :
  if n == 1 :
    return 0
  else :
    if n % 2 == 0 :
      return 1 + syracuse(n / 2)
    else :
      return 1 + syracuse(3 * n + 1)

Let us save this function in a file called syracuse.py.

Small values of u_0

First, we will look at what happens with small values of u_0. To do this, we are going to make our program interfaceable with a script, i.e., ease its execution with different parameters.

The Python sys module is used for this:

import sys

# We retrieve the first argument as u0
u0 = int(sys.argv[1])

# The syracuse function takes as an argument an integer n strictly higher than 0
# It outputs the number of steps before encountering 1
def syracuse (n) :
  if n == 1 :
    return 0
  else :
    if n % 2 == 0 :
      return 1 + syracuse(n / 2)
    else :
      return 1 + syracuse(3 * n + 1)

# We execute the function and output its result
print(syracuse(u0))

Thanks to this module, we can now start our program from the terminal by setting u_0 as argument. To do this, we write in a terminal:

python syracuse.py 14

The output printed in the terminal is 17. 

Average results

We will execute our script for values of u_0 between 1 and 100 to see how it behaves. We therefore write in a terminal:

for i in {1..100}; do python syracuse.py $i; done

Note that this line would only work in a unix environment (e.g., Linux). If you are using Windows instead, it could be replaced bythe following (Windows translations courtesy of Colin_McNicholl):

for ($i = 1; $i -le 20; $i++) {py -3 syracuse.py $i}

The output (100 lines with a number on each) is not very readable …

Instead we would prefer to have a plot. To do this, we will create a file in csv format with the result as a function of u_0 . Modify the display of the result in the file syracuse.py:

print(u0, syracuse(u0))

The command line in the terminal becomes as follows, where >> results.csv allows to redirect the output of the program to the file results.csv:

rm -f results.csv
for i in {1..100}; do python syracuse.py $i >> results.csv; done

The first line ensures that the old results.csv file, which might exist, is removed before new data is added to the file. 

Again, for Windows users, the command is a bit different:

for ($i=1; $i -le 100; $i++) {py -3 syracuse.py $i | Out-File results.csv -Append}

To plot these data, we use LibreOffice:

libreoffice results.csv

We specify that the data are separated by spaces and we can then draw the following plot:

x axis: u_0 — y axis: Number of steps before encountering 1

Behavior analysis

What we have obtained is already interesting. It seems that globally our curve is increasing even if there are strong local variations.

In order to verify this statement, we propose to check the situation for larger values of u_0 . We cannot be exhaustive so we will look at what happens for the first multiples of 1,000,000. The problem is that given the variations, we cannot just draw a plot with a point for every multiple of 1,000,000.

What we propose then is to look at a large number of numbers close to multiples of 1,000,000 and average them. For that, we will use the random module of Python:

import sys
import random

# We retrieve the first argument as u0
u0 = int(sys.argv[1])

# We perform 10.000 tests
TESTS = 10000

# We study a neighborhood of ± 50.000 around u0
NEIGHBORHOOD = 50000

# The syracuse function takes as an argument an integer n strictly higher than 0
# It outputs the number of steps before encountering 1
def syracuse (n) :
  if n == 1 :
    return 0
  else :
    if n % 2 == 0 :
      return 1 + syracuse(n / 2)
    else :
      return 1 + syracuse(3 * n + 1)

# We start the computations and store the results in total
total = 0
for _ in range(TESTS) :
    total = total + syracuse(u0 + random.randint(-NEIGHBORHOOD, NEIGHBORHOOD))

# We write the output
print(u0, total / TESTS)

If we run the tests multiple times with

python syracuse.py 1000000

we notice that the values do not change much, which means that 10,000 tests for each value is reasonable.

We still have to calculate the result for our different values, and write the result in a file results.csv:

rm -f results.csv
for i in {1..100}; do python syracuse.py ${i}000000 >> results.csv; done

For Windows users (note that the new file is named results2.csv this time):

for ($i=1; $i -le 100; $i++) {py -3 syracuse.py ${i}000000 | Out-File results2.csv -Append}

This script takes a little time, which is normal given the amount of calculations required. We obtain the following curve:

x axis: Average starting u_0 — y axis: Number of steps before encountering 1

Let’s optimize the program

Instead of recalculating everything each time, the results already obtained can be stored in memory to speed up calculations. This is at the expense of a much higher memory consumption (and potentially too high for your computers: if your computer runs out of memory, it will use the SWAP or worse, it will freeze. Do not hesitate to interrupt the program by pressing CTRL+C in case of problems):

import sys

# We retrieve the first argument as u0
u0 = int(sys.argv[1])

# The idea is to fill a dictionary to count the number of values
# before a 1 is encountered, and to refer to them instead of computing
# everything again for other values
values = {1:0}

# Now, finding the requested value can be done either by consulting the dictionary,
# or by computing it if not present
def syracuse (n) :
    if n in values.keys() :
        return values[n]
    else :
        if n % 2 == 0 :
            res = 1 + syracuse(int(n / 2))
        else :
            res = 1 + syracuse(3 * n + 1)
    values[n] = res
    return res

# Function to compute everythig for a given u0
def computations (u0) :

    # We compute the mean for a certain neighborhood
    # Remark: here, we compute exhaustively in the neighborhood rather than using a random function
    NEIGHBORHOOD = 50000

    # We start the computations and store the results in total
    total = 0
    for i in range(NEIGHBORHOOD) :
        total = total + syracuse(u0 + i) + syracuse(u0 - i)

    return total / 2 / NEIGHBORHOOD

# We write the output
print(u0, computations(u0))

The program is called with large values of u_0:

rm -f results.csv
for i in {1..100}; do echo $i; python syracuseopt.py $[ $i * 4 * 10000000 ] >> results.csv; done

For Windows users:

for ($i=1; $i -le 100; $i++) {py -3 syracuse_opt.py ($i * 4 * 10000000) | Out-File results3.csv -Append}

We obtain the following plot:

x axis: Center u_0 — y axis: Number of steps before encountering 1

This plot is obtained with the Gnuplot tool and the following script:

# Header         
set encoding utf8

# Labels
set xlabel "Central value of u0"
set ylabel "Number of steps before obtaining a 1"
set style line 11 lc rgb '#606060' lt 1
set border 3 back ls 11
set tics nomirror out scale 0.75

# Ranges
set yrange [0:]

# Tics
set xtics 1000000000

# Zoom
set size 0.8,0.8

# Caption position
set key at 0.7,0.95

# Output options
set terminal png
set output "syracuse.png"

# Lines definitions
set linestyle 1 lt 1 lw 4

# Plot
plot "results.csv" using 1:2 notitle with linespoints ls 1 smooth bezier