Multiple model runs

On the last page you performed a single run modelling an outbreak of the lurgy that started in London. This run (which we call a model run) is stochastic, meaning that the results will be slightly different every time it is performed.

To gain confidence in any predictions, we need to perform a model run multiple times, and average over the results.

Performing multiple model runs

metawards has the command line option --repeats (or -r) to set the number of times a model run should be repeated. For example, run the command below to repeat the model run four times:

metawards -d lurgy -a ExtraSeedsLondon.dat --repeats 4

metawards will automatically use as many of the cores on your computer as it can to parallelise the jobs. On my computer, the output shows:

Performing 4 runs of each set of parameters

Number of threads to use for each model run is 1
Number of processes used to parallelise model runs is 4
Parallelisation will be achieved using multiprocessing

I have four processor cores on my laptop, so I see the four repeats run in parallel using four processes, with each model run performed using 1 thread. You will see a different distribution of threads and processes if you have a different number of cores on your computer. You can set the number of processes that metawards should use via the --nprocs command line option. You can set the number of threads that metawards should use via the --nthreads command line option.

This calculation may take some time (2-3 minutes). This time, instead of seeing a summary of the outbreak, metawards will show a summary of the different model run jobs. Something similar to this should be printed:

Running 4 jobs using 4 process(es)

Running jobs in parallel using a multiprocessing pool...

Completed job 1 of 4
(NO_CHANGE)[repeat 1]
2020-12-19: DAY: 243 S: 11776504    E: 0    I: 0    R: 44305573    IW: 1   TOTAL POPULATION 56082077

Completed job 2 of 4
(NO_CHANGE)[repeat 2]
2020-12-16: DAY: 240 S: 11787147    E: 0    I: 0    R: 44294930    IW: 0   TOTAL POPULATION 56082077

Completed job 3 of 4
(NO_CHANGE)[repeat 3]
2020-11-25: DAY: 219 S: 11789948    E: 0    I: 0    R: 44292129    IW: 0   TOTAL POPULATION 56082077

Completed job 4 of 4
(NO_CHANGE)[repeat 4]
2020-12-04: DAY: 228 S: 11782418    E: 0    I: 0    R: 44299659    IW: 1   TOTAL POPULATION 56082077

Writing a summary of all results into the
csv file /Users/chris/GitHub/tutorial/output/results.csv.bz2.

In this case, all four outbreaks completed within 219-243 days, and in each run about 44.3 million people progressed to the ‘R’ state.
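The range and the rough average quoted above can be checked by hand from the four final lines of the job summary. The following is only a minimal sketch using Python's standard library; the variable names are made up for illustration, and the numbers are copied from the example output above, so yours will differ.

```python
from statistics import mean

# (final day, final R) for repeats 1-4, copied by hand from the
# job summary printed above. Your own runs will give other values.
final_states = {
    1: (243, 44305573),
    2: (240, 44294930),
    3: (219, 44292129),
    4: (228, 44299659),
}

days = [day for day, _ in final_states.values()]
recovered = [r for _, r in final_states.values()]

# Shortest and longest outbreak, and the averages over the four repeats
shortest, longest = min(days), max(days)
mean_days = mean(days)
mean_recovered = mean(recovered)

print(f"Outbreak length: {shortest}-{longest} days (mean {mean_days})")
print(f"Mean final R: {mean_recovered / 1e6:.1f} million")
```

Four repeats is a small sample, so treat these averages as a rough guide rather than a firm prediction.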

The results.csv.bz2 file

The day-by-day progress of the outbreak for each model run is recorded in the output file results.csv.bz2. This is a comma-separated file that has been compressed using bzip2.

You can read this file easily using Python Pandas or with R. You can even import this into Excel (although you may need to uncompress this file first using bunzip2).
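If bunzip2 is not available on your system, Python's standard bz2 module can uncompress the file instead. The sketch below is an illustration only: it writes a tiny stand-in file into a temporary directory (the header is assumed from the column names that Pandas reports for this file, and the single data row is taken from the example output in this tutorial) and then uncompresses it, leaving the compressed file in place. For real use you would point `compressed` at output/results.csv.bz2.

```python
import bz2
import shutil
import tempfile
from pathlib import Path

# Work in a temporary directory so that nothing real is overwritten
workdir = Path(tempfile.mkdtemp())
compressed = workdir / "results.csv.bz2"
uncompressed = workdir / "results.csv"

# Stand-in for the real output file: a header plus one data row.
# The header is an assumption based on the column names shown by Pandas.
with bz2.open(compressed, "wt") as f:
    f.write("fingerprint,repeat,day,date,S,E,I,R,IW\n")
    f.write("NO_CHANGE,1,0,2020-04-20,56082077,0,0,0,0\n")

# Stream the decompressed bytes into a plain .csv file that a
# spreadsheet program can open. The .bz2 file is left untouched.
with bz2.open(compressed, "rb") as src, open(uncompressed, "wb") as dst:
    shutil.copyfileobj(src, dst)

header = uncompressed.read_text().splitlines()[0]
print(header)
```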

For example, if you have Pandas installed, then you can read this file in an ipython or Jupyter notebook session:

>>> import pandas as pd
>>> df = pd.read_csv("output/results.csv.bz2")
>>> df
    fingerprint  repeat  day        date         S  E  I         R  IW
0     NO_CHANGE       1    0  2020-04-20  56082077  0  0         0   0
1     NO_CHANGE       1    1  2020-04-21  56082077  0  0         0   0
2     NO_CHANGE       1    2  2020-04-22  56082072  5  0         0   0
3     NO_CHANGE       1    3  2020-04-23  56082072  0  5         0   0
4     NO_CHANGE       1    4  2020-04-24  56082068  0  5         4   4
..          ...     ...  ...         ...       ... .. ..       ...  ..
929   NO_CHANGE       4  224  2020-11-30  11782419  0  4  44299654   0
930   NO_CHANGE       4  225  2020-12-01  11782419  0  3  44299655   0
931   NO_CHANGE       4  226  2020-12-02  11782419  0  1  44299657   0
932   NO_CHANGE       4  227  2020-12-03  11782419  0  1  44299657   0
933   NO_CHANGE       4  228  2020-12-04  11782418  0  0  44299659   1

[934 rows x 9 columns]

Each repeat is given its own number, which is in the repeat column. The day of the outbreak is given in the day column. This counts up from day zero when the outbreak started, to the last day when the outbreak was over. You can control the start day of the outbreak using the --start-day command line option.

The date column contains the date of each day in the outbreak. By default, metawards assumes that day zero is today. You can set the date of day zero using the --start-date command line option, e.g. --start-date tomorrow would start tomorrow, while --start-date Jan 1 would start on January 1st this year.
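In the example output above, the date column appears to be simply the date of day zero plus the day number. The sketch below illustrates that relationship with Python's datetime module. The helper `date_of_day` is hypothetical (it is not part of metawards), and it assumes that the run starts counting from day zero.

```python
from datetime import date, timedelta

# Date of day zero in the example output shown above
start_date = date(2020, 4, 20)


def date_of_day(day, start=start_date):
    """Return the calendar date of a given outbreak day.

    Hypothetical helper for illustration; assumes counting starts at day 0.
    """
    return start + timedelta(days=day)


# Day 228 is the last day of repeat 4 in the example output
print(date_of_day(228))  # 2020-12-04
```

The same arithmetic reproduces the final dates of the other repeats in the job summary, e.g. day 243 falls on 2020-12-19.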

The values of S, E, I, R and IW for each repeat for each day are then given in their correspondingly named columns.
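As a quick sanity check on these columns, the sketch below rebuilds the ten rows displayed above as a small DataFrame and confirms that S + E + I + R on each row equals the total population of 56082077 reported in the job summary. It also shows one way to select a single repeat and to pick out the last recorded row of each repeat. This is an illustration only: with the real file you would load `df` using pd.read_csv as shown earlier, and because only an excerpt is rebuilt here, the "last row" for repeat 1 is just the last row of the excerpt.

```python
import pandas as pd

columns = ["fingerprint", "repeat", "day", "date", "S", "E", "I", "R", "IW"]

# The ten rows displayed above (first five of repeat 1, last five of repeat 4)
rows = [
    ("NO_CHANGE", 1, 0, "2020-04-20", 56082077, 0, 0, 0, 0),
    ("NO_CHANGE", 1, 1, "2020-04-21", 56082077, 0, 0, 0, 0),
    ("NO_CHANGE", 1, 2, "2020-04-22", 56082072, 5, 0, 0, 0),
    ("NO_CHANGE", 1, 3, "2020-04-23", 56082072, 0, 5, 0, 0),
    ("NO_CHANGE", 1, 4, "2020-04-24", 56082068, 0, 5, 4, 4),
    ("NO_CHANGE", 4, 224, "2020-11-30", 11782419, 0, 4, 44299654, 0),
    ("NO_CHANGE", 4, 225, "2020-12-01", 11782419, 0, 3, 44299655, 0),
    ("NO_CHANGE", 4, 226, "2020-12-02", 11782419, 0, 1, 44299657, 0),
    ("NO_CHANGE", 4, 227, "2020-12-03", 11782419, 0, 1, 44299657, 0),
    ("NO_CHANGE", 4, 228, "2020-12-04", 11782418, 0, 0, 44299659, 1),
]
df = pd.DataFrame(rows, columns=columns)

# Total population as printed in the job summary
TOTAL_POPULATION = 56082077

# S + E + I + R should account for everybody on every day
totals = df[["S", "E", "I", "R"]].sum(axis=1)
conserved = bool((totals == TOTAL_POPULATION).all())
print("Population conserved on every row:", conserved)

# Select the rows belonging to a single repeat
repeat4 = df[df["repeat"] == 4]

# Last recorded row of each repeat (here: of each excerpt)
last_rows = df.loc[df.groupby("repeat")["day"].idxmax()].set_index("repeat")
print(last_rows[["day", "date", "S", "R"]])
```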

The fingerprint column is not used for this calculation - we will see what it is later.