### Contents of the page

Computer science is a difficult science: concepts are often linked to discrete mathematics and proving intuitive results can be complex.

Rather than proving a property with a mathematical formalism, it often happens that we try to support a property by numerous and well thought simulations.

Let us show you on an example how such a scientific approach is implemented to analyze a problem.

## Problem

The problem we are considering here is that of the famous Syracuse sequence.

A Syracuse sequence is obtained by induction as follows:

- The first term is a parameter , which is a strictly positive integer.
- The term is computed as follows:
- if is even, then .
- if is odd, then .

As soon as a 1 is encountered in this sequence, the cycle 1,4,2,1,4,2,… repeats itself indefinitely.

It is supposed that for any choice of , the sequence will end up passing through 1 and thus repeating this cycle. However, no one has proved this.

We will try to analyze the impact of on the number of values the sequence takes before meeting the value 1.

## Python program

In order to analyze this impact, we start by writing the function which calculates the Syracuse sequence:

# The syracuse function takes as an argument an integer n strictly higher than 0 # It outputs the number of steps before encountering 1 def syracuse (n) : if n == 1 : return 0 else : if n % 2 == 0 : return 1 + syracuse(n / 2) else : return 1 + syracuse(3 * n + 1)

Let us save this function in a file called `syracuse.py`

.

## Small values of

First, we will look at what happens with small values of . To do this, we are going to make our program interfaceable with a script, *i.e.*, ease its execution with different parameters.

The Python `sys`

module is used for this:

import sys # We retrieve the first argument as u0 u0 = int(sys.argv[1]) # The syracuse function takes as an argument an integer n strictly higher than 0 # It outputs the number of steps before encountering 1 def syracuse (n) : if n == 1 : return 0 else : if n % 2 == 0 : return 1 + syracuse(n / 2) else : return 1 + syracuse(3 * n + 1) # We execute the function and output its result print(syracuse(u0))

Thanks to this module, we can now start our program from the terminal by setting as argument. To do this, we write in a terminal:

python syracuse.py 14

The output printed in the terminal is 17.

## Average results

We will execute our script for values of between 1 and 100 to see how it behaves. We therefore write in a terminal:

for i in {1..100}; do python syracuse.py $i; done

Note that this line would only work in a unix environment (*e.g.*, Linux). If you are using Windows instead, it could be replaced bythe following (*Windows translations courtesy of Colin_McNicholl*):

for ($i = 1; $i -le 20; $i++) {py -3 syracuse.py $i}

The output (100 lines with a number on each) is not very readable …

Instead we would prefer to have a plot. To do this, we will create a file in `csv`

format with the result as a function of . Modify the display of the result in the file` syracuse.py`

:

print(u0, syracuse(u0))

The command line in the terminal becomes as follows, where `>> results.csv`

allows to redirect the output of the program to the file `results.csv`

:

rm -f results.csv for i in {1..100}; do python syracuse.py $i >> results.csv; done

The first line ensures that the old `results.csv`

file, which might exist, is removed before new data is added to the file.

Again, for Windows users, the command is a bit different:

for ($i=1; $i -le 100; $i++) {py -3 syracuse.py $i | Out-File results.csv -Append}

To plot these data, we use LibreOffice:

libreoffice results.csv

We specify that the data are separated by spaces and we can then draw the following plot:

### Behavior analysis

What we have obtained is already interesting. It seems that globally our curve is increasing even if there are strong local variations.

In order to verify this statement, we propose to check the situation for larger values of . We cannot be exhaustive so we will look at what happens for the first multiples of 1,000,000. The problem is that given the variations, we cannot just draw a plot with a point for every multiple of 1,000,000.

What we propose then is to look at a large number of numbers close to multiples of 1,000,000 and average them. For that, we will use the `random`

module of Python:

import sys import random # We retrieve the first argument as u0 u0 = int(sys.argv[1]) # We perform 10.000 tests TESTS = 10000 # We study a neighborhood of ± 50.000 around u0 NEIGHBORHOOD = 50000 # The syracuse function takes as an argument an integer n strictly higher than 0 # It outputs the number of steps before encountering 1 def syracuse (n) : if n == 1 : return 0 else : if n % 2 == 0 : return 1 + syracuse(n / 2) else : return 1 + syracuse(3 * n + 1) # We start the computations and store the results in total total = 0 for _ in range(TESTS) : total = total + syracuse(u0 + random.randint(-NEIGHBORHOOD, NEIGHBORHOOD)) # We write the output print(u0, total / TESTS)

If we run the tests multiple times with

python syracuse.py 1000000

we notice that the values do not change much, which means that 10,000 tests for each value is reasonable.

We still have to calculate the result for our different values, and write the result in a file `results.csv`

:

rm -f results.csv for i in {1..100}; do python syracuse.py ${i}000000 >> results.csv; done

For Windows users (note that the new file is named `results2.csv`

this time):

for ($i=1; $i -le 100; $i++) {py -3 syracuse.py ${i}000000 | Out-File results2.csv -Append}

This script takes a little time, which is normal given the amount of calculations required. We obtain the following curve:

### Let’s optimize the program

Instead of recalculating everything each time, the results already obtained can be stored in memory to speed up calculations. This is at the expense of a much higher memory consumption (and potentially too high for your computers: if your computer runs out of memory, it will use the SWAP or worse, it will freeze. Do not hesitate to interrupt the program by pressing `CTRL+C`

in case of problems):

import sys # We retrieve the first argument as u0 u0 = int(sys.argv[1]) # The idea is to fill a dictionary to count the number of values # before a 1 is encountered, and to refer to them instead of computing # everything again for other values values = {1:0} # Now, finding the requested value can be done either by consulting the dictionary, # or by computing it if not present def syracuse (n) : if n in values.keys() : return values[n] else : if n % 2 == 0 : res = 1 + syracuse(int(n / 2)) else : res = 1 + syracuse(3 * n + 1) values[n] = res return res # Function to compute everythig for a given u0 def computations (u0) : # We compute the mean for a certain neighborhood # Remark: here, we compute exhaustively in the neighborhood rather than using a random function NEIGHBORHOOD = 50000 # We start the computations and store the results in total total = 0 for i in range(NEIGHBORHOOD) : total = total + syracuse(u0 + i) + syracuse(u0 - i) return total / 2 / NEIGHBORHOOD # We write the output print(u0, computations(u0))

The program is called with large values of :

rm -f results.csv for i in {1..100}; do echo $i; python syracuseopt.py $[ $i * 4 * 10000000 ] >> results.csv; done

For Windows users:

for ($i=1; $i -le 100; $i++) {py -3 syracuse_opt.py ($i * 4 * 10000000) | Out-File results3.csv -Append}

We obtain the following plot:

This plot is obtained with the Gnuplot tool and the following script:

# Header set encoding utf8 # Labels set xlabel "Central value of u0" set ylabel "Number of steps before obtaining a 1" set style line 11 lc rgb '#606060' lt 1 set border 3 back ls 11 set tics nomirror out scale 0.75 # Ranges set yrange [0:] # Tics set xtics 1000000000 # Zoom set size 0.8,0.8 # Caption position set key at 0.7,0.95 # Output options set terminal png set output "syracuse.png" # Lines definitions set linestyle 1 lt 1 lw 4 # Plot plot "results.csv" using 1:2 notitle with linespoints ls 1 smooth bezier