===================
Multiple model runs
===================

In the last page you successfully performed a single run modelling an
outbreak of the lurgy that started in London. This run (which we call
a *model run*) is stochastic, meaning that the results will be slightly
different every time it is performed. To gain confidence in any
predictions, we need to perform a *model run* multiple times, and
average over the results.

Performing multiple model runs
------------------------------

``metawards`` has the command line option ``--repeats`` (or ``-r``) to
set the number of times a *model run* should be repeated. For example,
run the command below to repeat the *model run* four times;

.. code-block:: bash

   metawards -d lurgy -a ExtraSeedsLondon.dat --repeats 4

``metawards`` will automatically use as many of the cores on your
computer as it can to parallelise the jobs. On my computer, the output
shows;

::

    Performing 4 runs of each set of parameters
    Number of threads to use for each model run is 1
    Number of processes used to parallelise model runs is 4
    Parallelisation will be achieved using multiprocessing

I have four processor cores on my laptop, so I see the four repeats run
in parallel using four processes, with each *model run* performed using
one thread. You will see a different distribution of threads and
processes if you have a different number of cores on your computer.

You can set the number of processes that ``metawards`` should use via
the ``--nprocs`` command line option, and the number of threads via the
``--nthreads`` command line option.
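For example, the command below sketches how you could combine these
options. The flags are those described above, but the specific values
(two processes, each using two threads) are only an illustration and
assume that your computer has at least four cores;

.. code-block:: bash

   # illustrative values only - match them to the cores on your computer
   metawards -d lurgy -a ExtraSeedsLondon.dat --repeats 4 --nprocs 2 --nthreads 2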
This calculation may take some time (2-3 minutes). This time, instead
of seeing a summary of the outbreak, ``metawards`` will show a summary
of the different *model run* jobs. Something similar to this should be
printed;

::

    Running 4 jobs using 4 process(es)

    Running jobs in parallel using a multiprocessing pool...

    Completed job 1 of 4
    (NO_CHANGE)[repeat 1] 2020-12-19: DAY: 243 S: 11776504  E: 0  I: 0  R: 44305573  IW: 1  TOTAL POPULATION 56082077

    Completed job 2 of 4
    (NO_CHANGE)[repeat 2] 2020-12-16: DAY: 240 S: 11787147  E: 0  I: 0  R: 44294930  IW: 0  TOTAL POPULATION 56082077

    Completed job 3 of 4
    (NO_CHANGE)[repeat 3] 2020-11-25: DAY: 219 S: 11789948  E: 0  I: 0  R: 44292129  IW: 0  TOTAL POPULATION 56082077

    Completed job 4 of 4
    (NO_CHANGE)[repeat 4] 2020-12-04: DAY: 228 S: 11782418  E: 0  I: 0  R: 44299659  IW: 1  TOTAL POPULATION 56082077

    Writing a summary of all results into the csv file /Users/chris/GitHub/tutorial/output/results.csv.bz2.

In this case, all four outbreaks completed within 219-243 days, while
the number of the population who progressed to the '**R**' state was
around 44.3 million in every run.

The results.csv.bz2 file
------------------------

The day-by-day progress of the outbreak for each *model run* is
recorded in the output file ``results.csv.bz2``. This is a
comma-separated file that has been compressed using
`bzip2 <https://en.wikipedia.org/wiki/Bzip2>`__.

You can read this file easily using
`Python Pandas <https://pandas.pydata.org>`__ or with
`R <https://www.r-project.org>`__. You can even import it into Excel
(although you may need to uncompress the file first using ``bunzip2``).

For example, if you have Pandas installed, then you can read this file
in an `ipython <https://ipython.org>`__ or
`Jupyter notebook <https://jupyter.org>`__ session via;

.. code-block:: python

   >>> import pandas as pd
   >>> df = pd.read_csv("output/results.csv.bz2")
   >>> df
       fingerprint  repeat  day        date         S  E  I         R  IW
   0     NO_CHANGE       1    0  2020-04-20  56082077  0  0         0   0
   1     NO_CHANGE       1    1  2020-04-21  56082077  0  0         0   0
   2     NO_CHANGE       1    2  2020-04-22  56082072  5  0         0   0
   3     NO_CHANGE       1    3  2020-04-23  56082072  0  5         0   0
   4     NO_CHANGE       1    4  2020-04-24  56082068  0  5         4   4
   ..          ...     ...  ...         ...       ... .. ..       ...  ..
   929   NO_CHANGE       4  224  2020-11-30  11782419  0  4  44299654   0
   930   NO_CHANGE       4  225  2020-12-01  11782419  0  3  44299655   0
   931   NO_CHANGE       4  226  2020-12-02  11782419  0  1  44299657   0
   932   NO_CHANGE       4  227  2020-12-03  11782419  0  1  44299657   0
   933   NO_CHANGE       4  228  2020-12-04  11782418  0  0  44299659   1

   [934 rows x 9 columns]

Each repeat is given its own number, which is in the ``repeat`` column.

The day of the outbreak is given in the ``day`` column. This counts up
from *day zero*, when the outbreak started, to the last day, when the
outbreak was over. You can control the start day of the outbreak using
the ``--start-day`` command line option.

The ``date`` column contains the date of each day in the outbreak. By
default, ``metawards`` assumes that *day zero* is today. You can set
the date of *day zero* using the ``--start-date`` command line option,
e.g. ``--start-date tomorrow`` would start tomorrow, while
``--start-date Jan 1`` would start on January 1st this year.

The values of **S**, **E**, **I**, **R** and **IW** for each repeat on
each day are then given in their correspondingly named columns. The
*fingerprint* column is not used for this calculation - we will see
what it is later.
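To get a feel for averaging over the repeats, a minimal sketch (which
assumes only the column names shown in the table above) is to group
the rows by the ``day`` column and take the mean of the population
counts. Note that the repeats finish on different days, so the last
few days of the outbreak are averaged over fewer runs;

.. code-block:: python

   >>> import pandas as pd
   >>> df = pd.read_csv("output/results.csv.bz2")
   >>> # group the rows by the day of the outbreak and average the
   >>> # S, E, I, R and IW counts over the four repeats
   >>> averages = df.groupby("day")[["S", "E", "I", "R", "IW"]].mean()

The resulting ``averages`` dataframe can then be inspected or plotted
like any other Pandas dataframe.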