Multiple model runs

On the previous page you successfully performed a single run modelling an outbreak of the lurgy that started in London. This run (which we call a model run) is stochastic, meaning that the results will differ slightly every time it is performed.

To gain confidence in any predictions, we need to repeat the model run multiple times and average over the results.

Performing multiple model runs

metawards has the command-line option --repeats (or -r) to set the number of times a model run should be repeated. For example, run the command below to repeat the model run four times:

metawards -d lurgy -a ExtraSeedsLondon.dat --repeats 4

metawards will automatically use as many of the cores on your computer as it can to parallelise the jobs. On my computer, the output shows:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Running the model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Using random number seed: 87340504
Running 4 jobs using 4 process(es)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ MULTIPROCESSING ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

I have four processor cores on my laptop, so the four repeats run in parallel across four processes, with each model run using one thread. You will see a different distribution of processes and threads if your computer has a different number of cores. You can set the number of processes that metawards should use via the --nprocs command-line option, and the number of threads via the --nthreads command-line option.

This calculation may take some time (2-3 minutes). This time, instead of a summary of the outbreak, metawards will print a summary of the different model run jobs. Something similar to this should be printed:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ MULTIPROCESSING ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Computing model run ✔
┌────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                        │
│  Completed job 1 of 4                                                                  │
│  (NO_CHANGE)[repeat 1]                                                                 │
│  2021-01-13: DAY: 238 S: 11784852    E: 0    I: 0    R: 44297225    IW: 0   UV: 1.0    │
│  TOTAL POPULATION 56082077                                                             │
│                                                                                        │
└────────────────────────────────────────────────────────────────────────────────────────┘
Computing model run ✔
┌────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                        │
│  Completed job 2 of 4                                                                  │
│  (NO_CHANGE)[repeat 2]                                                                 │
│  2021-01-05: DAY: 230 S: 11770162    E: 0    I: 0    R: 44311915    IW: 1   UV: 1.0    │
│  TOTAL POPULATION 56082077                                                             │
│                                                                                        │
└────────────────────────────────────────────────────────────────────────────────────────┘
Computing model run ✔
┌────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                        │
│  Completed job 3 of 4                                                                  │
│  (NO_CHANGE)[repeat 3]                                                                 │
│  2021-02-05: DAY: 261 S: 11789449    E: 0    I: 0    R: 44292628    IW: 1   UV: 1.0    │
│  TOTAL POPULATION 56082077                                                             │
│                                                                                        │
└────────────────────────────────────────────────────────────────────────────────────────┘
Computing model run ✔
┌────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                        │
│  Completed job 4 of 4                                                                  │
│  (NO_CHANGE)[repeat 4]                                                                 │
│  2021-01-04: DAY: 229 S: 11779688    E: 0    I: 0    R: 44302389    IW: 0   UV: 1.0    │
│  TOTAL POPULATION 56082077                                                             │
│                                                                                        │
└────────────────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                        │
│  Writing a summary of all results into the csv file                                    │
│  /Users/chris/GitHub/tutorial/test/output/results.csv.bz2. You can use this to         │
│  quickly look at statistics across all runs using e.g. R or pandas                     │
│                                                                                        │
└────────────────────────────────────────────────────────────────────────────────────────┘

Note

metawards prints a progress spinner to the screen while the jobs are running, to show you that it hasn't crashed. You can switch the spinner off using the --no-spinner option if it annoys you. Similarly, metawards by default writes output to the screen using a very colourful theme. You can switch to a simpler, less colourful theme by passing the option --theme simple.

In this case, all four outbreaks completed within 229-261 days, and the number of the population who progressed to the 'R' state was around 44.3 million in each run.

The results.csv.bz2 file

The day-by-day progress of the outbreak for each model run is recorded in the output file results.csv.bz2. This is a comma-separated file that has been compressed using bzip2.

You can read this file easily using Python pandas or R. You can even import it into Excel (although you may need to uncompress the file first using bunzip2).

For example, if you have pandas installed, then you can read this file from an IPython or Jupyter notebook session via:

>>> import pandas as pd
>>> df = pd.read_csv("output/results.csv.bz2")
>>> df
    fingerprint  repeat  day        date         S  E  I         R  IW
0     NO_CHANGE       1    0  2020-04-20  56082077  0  0         0   0
1     NO_CHANGE       1    1  2020-04-21  56082077  0  0         0   0
2     NO_CHANGE       1    2  2020-04-22  56082072  5  0         0   0
3     NO_CHANGE       1    3  2020-04-23  56082072  0  5         0   0
4     NO_CHANGE       1    4  2020-04-24  56082068  0  5         4   4
..          ...     ...  ...         ...       ... .. ..       ...  ..
929   NO_CHANGE       4  224  2020-11-30  11782419  0  4  44299654   0
930   NO_CHANGE       4  225  2020-12-01  11782419  0  3  44299655   0
931   NO_CHANGE       4  226  2020-12-02  11782419  0  1  44299657   0
932   NO_CHANGE       4  227  2020-12-03  11782419  0  1  44299657   0
933   NO_CHANGE       4  228  2020-12-04  11782418  0  0  44299659   1

[934 rows x 9 columns]

Each repeat is given its own number, recorded in the repeat column. The day of the outbreak is given in the day column. This counts up from day zero, when the outbreak started, to the last day, when the outbreak was over. You can control the start day of the outbreak using the --start-day command-line option.
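Because each repeat carries its own number, you can pull out per-repeat summaries, such as the length of each outbreak, with a simple groupby. The sketch below uses a few made-up rows in place of the real results file (in practice you would load it with pd.read_csv("output/results.csv.bz2") as shown later); only the repeat, day and R columns are kept for brevity:

```python
import pandas as pd

# A few made-up rows standing in for output/results.csv.bz2
# (in practice: df = pd.read_csv("output/results.csv.bz2"))
df = pd.DataFrame({
    "repeat": [1, 1, 1, 2, 2],
    "day":    [0, 1, 2, 0, 1],
    "R":      [0, 4, 9, 0, 5],
})

# The last recorded day of each repeat is the length of that outbreak
last_day = df.groupby("repeat")["day"].max()

# R is cumulative, so its maximum per repeat is the final recovered total
final_R = df.groupby("repeat")["R"].max()

print(last_day)
print(final_R)
```

With the real file, last_day would reproduce the 229-261 day range seen in the job summaries above.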

The date column contains the date of each day in the outbreak. By default, metawards assumes that day zero is today. You can set the date of day zero using the --start-date command line option, e.g. --start-date tomorrow would start tomorrow, while --start-date Jan 1 would start on January 1st this year.

The values of S, E, I, R and IW for each repeat for each day are then given in their correspondingly named columns.
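Averaging over the repeats, as motivated at the start of this page, is then a one-liner: group by day and take the mean of each column. Again, a few made-up rows stand in for the real file (in practice you would load output/results.csv.bz2 as shown above):

```python
import pandas as pd

# A few made-up rows standing in for output/results.csv.bz2
# (in practice: df = pd.read_csv("output/results.csv.bz2"))
df = pd.DataFrame({
    "repeat": [1, 1, 2, 2],
    "day":    [0, 1, 0, 1],
    "S": [100, 95, 100, 93],
    "E": [0, 3, 0, 5],
    "I": [0, 2, 0, 2],
    "R": [0, 0, 0, 0],
    "IW": [0, 1, 0, 1],
})

# Average each column across the repeats, day by day, to smooth
# out the stochastic variation between individual model runs
mean_by_day = df.groupby("day")[["S", "E", "I", "R", "IW"]].mean()

print(mean_by_day)
```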

The fingerprint column is not used for this calculation - we will see what it is used for later.