### Contents of the page

Computer science is a difficult science: concepts are often linked to discrete mathematics and proving intuitive results can be complex.

Rather than proving a property with a mathematical formalism, it often happens that we try to support a property by numerous and well thought simulations.

Let us show you on an example how such a scientific approach is implemented to analyze a problem.

## Problem

The problem we are considering here is that of the famous Syracuse sequence.

A Syracuse sequence is obtained by induction as follows:

• The first term is a parameter $u_0$, which is a strictly positive integer.
• The $u_{i+1}$ term is computed as follows:
• if $u_i$ is even, then $u_{i+1} = u_i / 2$.
• if $u_i$ is odd, then $u_{i+1} = 3\cdot u_i + 1$.

As soon as a 1 is encountered in this sequence, the cycle 1,4,2,1,4,2,… repeats itself indefinitely.

It is supposed that for any choice of $u_0$, the sequence will end up passing through 1 and thus repeating this cycle. However, no one has proved this.

We will try to analyze the impact of $u_0$ on the number of values the sequence takes before meeting the value 1.

## Python program

In order to analyze this impact, we start by writing the function which calculates the Syracuse sequence:

# The syracuse function takes as an argument an integer n strictly higher than 0
# It outputs the number of steps before encountering 1
def syracuse (n) :
if n == 1 :
return 0
else :
if n % 2 == 0 :
return 1 + syracuse(n / 2)
else :
return 1 + syracuse(3 * n + 1)

Let us save this function in a file called syracuse.py.

## Small values of $u_0$

First, we will look at what happens with small values of $u_0$. To do this, we are going to make our program interfaceable with a script, i.e., ease its execution with different parameters.

The Python sys module is used for this:

import sys

# We retrieve the first argument as u0
u0 = int(sys.argv)

# The syracuse function takes as an argument an integer n strictly higher than 0
# It outputs the number of steps before encountering 1
def syracuse (n) :
if n == 1 :
return 0
else :
if n % 2 == 0 :
return 1 + syracuse(n / 2)
else :
return 1 + syracuse(3 * n + 1)

# We execute the function and output its result
print(syracuse(u0))

Thanks to this module, we can now start our program from the terminal by setting $u_0$ as argument. To do this, we write in a terminal:

python syracuse.py 14

The output printed in the terminal is 17.

## Average results

We will execute our script for values of $u_0$ between 1 and 100 to see how it behaves. We therefore write in a terminal:

for i in {1..100}; do python syracuse.py $i; done Note that this line would only work in a unix environment (e.g., Linux). If you are using Windows instead, it could be replaced bythe following (Windows translations courtesy of Colin_McNicholl): for ($i = 1; $i -le 20;$i++) {py -3 syracuse.py $i} The output (100 lines with a number on each) is not very readable … Instead we would prefer to have a plot. To do this, we will create a file in csv format with the result as a function of $u_0$ . Modify the display of the result in the file syracuse.py: print(u0, syracuse(u0)) The command line in the terminal becomes as follows, where >> results.csv allows to redirect the output of the program to the file results.csv: rm -f results.csv for i in {1..100}; do python syracuse.py$i >> results.csv; done

The first line ensures that the old results.csv file, which might exist, is removed before new data is added to the file.

Again, for Windows users, the command is a bit different:

for ($i=1;$i -le 100; $i++) {py -3 syracuse.py$i | Out-File results.csv -Append}

To plot these data, we use LibreOffice:

libreoffice results.csv

We specify that the data are separated by spaces and we can then draw the following plot:

### Behavior analysis

What we have obtained is already interesting. It seems that globally our curve is increasing even if there are strong local variations.

In order to verify this statement, we propose to check the situation for larger values of $u_0$ . We cannot be exhaustive so we will look at what happens for the first multiples of 1,000,000. The problem is that given the variations, we cannot just draw a plot with a point for every multiple of 1,000,000.

What we propose then is to look at a large number of numbers close to multiples of 1,000,000 and average them. For that, we will use the random module of Python:

import sys
import random

# We retrieve the first argument as u0
u0 = int(sys.argv)

# We perform 10.000 tests
TESTS = 10000

# We study a neighborhood of ± 50.000 around u0
NEIGHBORHOOD = 50000

# The syracuse function takes as an argument an integer n strictly higher than 0
# It outputs the number of steps before encountering 1
def syracuse (n) :
if n == 1 :
return 0
else :
if n % 2 == 0 :
return 1 + syracuse(n / 2)
else :
return 1 + syracuse(3 * n + 1)

# We start the computations and store the results in total
total = 0
for _ in range(TESTS) :
total = total + syracuse(u0 + random.randint(-NEIGHBORHOOD, NEIGHBORHOOD))

# We write the output
print(u0, total / TESTS)

If we run the tests multiple times with

python syracuse.py 1000000

we notice that the values do not change much, which means that 10,000 tests for each value is reasonable.

We still have to calculate the result for our different values, and write the result in a file results.csv:

rm -f results.csv
for i in {1..100}; do python syracuse.py ${i}000000 >> results.csv; done For Windows users (note that the new file is named results2.csv this time): for ($i=1; $i -le 100;$i++) {py -3 syracuse.py ${i}000000 | Out-File results2.csv -Append} This script takes a little time, which is normal given the amount of calculations required. We obtain the following curve: x axis: Average starting — y axis: Number of steps before encountering 1 ### Let’s optimize the program Instead of recalculating everything each time, the results already obtained can be stored in memory to speed up calculations. This is at the expense of a much higher memory consumption (and potentially too high for your computers: if your computer runs out of memory, it will use the SWAP or worse, it will freeze. Do not hesitate to interrupt the program by pressing CTRL+C in case of problems): import sys # We retrieve the first argument as u0 u0 = int(sys.argv) # The idea is to fill a dictionary to count the number of values # before a 1 is encountered, and to refer to them instead of computing # everything again for other values values = {1:0} # Now, finding the requested value can be done either by consulting the dictionary, # or by computing it if not present def syracuse (n) : if n in values.keys() : return values[n] else : if n % 2 == 0 : res = 1 + syracuse(int(n / 2)) else : res = 1 + syracuse(3 * n + 1) values[n] = res return res # Function to compute everythig for a given u0 def computations (u0) : # We compute the mean for a certain neighborhood # Remark: here, we compute exhaustively in the neighborhood rather than using a random function NEIGHBORHOOD = 50000 # We start the computations and store the results in total total = 0 for i in range(NEIGHBORHOOD) : total = total + syracuse(u0 + i) + syracuse(u0 - i) return total / 2 / NEIGHBORHOOD # We write the output print(u0, computations(u0)) The program is called with large values of $u_0$: rm -f results.csv for i in {1..100}; do echo$i; python syracuseopt.py $[$i * 4 * 10000000 ] >> results.csv; done

For Windows users:

for ($i=1;$i -le 100; $i++) {py -3 syracuse_opt.py ($i * 4 * 10000000) | Out-File results3.csv -Append}

We obtain the following plot:

This plot is obtained with the Gnuplot tool and the following script:

# Header
set encoding utf8

# Labels
set xlabel "Central value of u0"
set ylabel "Number of steps before obtaining a 1"
set style line 11 lc rgb '#606060' lt 1
set border 3 back ls 11
set tics nomirror out scale 0.75

# Ranges
set yrange [0:]

# Tics
set xtics 1000000000

# Zoom
set size 0.8,0.8

# Caption position
set key at 0.7,0.95

# Output options
set terminal png
set output "syracuse.png"

# Lines definitions
set linestyle 1 lt 1 lw 4

# Plot
plot "results.csv" using 1:2 notitle with linespoints ls 1 smooth bezier