Extracting rates by ward
One of the major applications of a custom extractor is to enable you
to add your own live analysis that is performed while the simulation
is running. This is much faster than having metawards write all of the
data to disk and then running the analysis in a post-processing step.
To analyse data, you need to understand the Workspace class.
Workspace
The Workspace class provides a workspace in which data is accumulated
and analysis is performed. It is cleared between iterations, and holds
the following information:
- Data indexed by disease stage:
  - inf_tot: the total number of workers in each disease stage, summed over all wards.
  - pinf_tot: the total number of players in each disease stage, summed over all wards.
  - n_inf_wards: the number of wards that have at least one individual in the disease stage.
- Data indexed by ward:
  - total_inf_ward: the total number of infections in each ward.
  - total_new_inf_ward: the total number of new infections in each ward.
  - incidence: the incidence of infection in each ward.
  - S_in_wards: the total population in the S state in each ward.
  - E_in_wards: the total population in the E state in each ward.
  - I_in_wards: the total population in the I state in each ward.
  - R_in_wards: the total population in the R state in each ward.
- Data indexed by disease stage and then by ward:
  - ward_inf_tot: the total population in each disease stage in each ward.
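Before moving on, it may help to see how these attributes fit together.
The below is a minimal sketch (not part of metawards itself, and
summary.csv is a hypothetical filename) of an output function that
uses them; the extractor signature matches the ones used throughout
this page;

def output_summary(population, workspace, output_dir, **kwargs):
    # the number of disease stages in this model
    n_stages = workspace.n_inf_classes

    FILE = output_dir.open("summary.csv")
    FILE.write(str(population.day))

    for i in range(0, n_stages):
        # total in stage i is workers plus players, summed over all wards
        FILE.write("," + str(workspace.inf_tot[i] + workspace.pinf_tot[i]))

    FILE.write("\n")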
Extracting the population in I1
We can use a custom extractor to report the total number of individuals
who are in the first I stage (I1) for each ward for each day.
To do this, create a new extractor called extract_i1.py and copy
in the below;
from metawards.extractors import extract_default


def output_i1(population, workspace, output_dir, **kwargs):
    # Open the file "total_i1.csv" in the output directory
    FILE = output_dir.open("total_i1.csv")

    ward_inf_tot = workspace.ward_inf_tot

    # The I1 state is stage 2
    I1_inf_tot = ward_inf_tot[2]

    FILE.write(str(population.day) + ",")
    FILE.write(",".join([str(x) for x in I1_inf_tot]) + "\n")


def extract_i1(**kwargs):
    # return all of the functions from "extract_default"
    # plus our new "output_i1"
    funcs = extract_default(**kwargs)
    funcs.append(output_i1)
    return funcs
This defines a new output function called output_i1. This calls the
open function of the OutputFiles object held in output_dir to open
the file total_i1.csv in the output directory.
Note
The OutputFiles class is clever enough to only open the file once,
and will automatically close it when needed. It will also ensure that
the file is opened in the correct output directory for a model run,
and will compress the file using bz2 if the --auto-bzip command-line
option has been passed (the default), or will disable automatic
compression if --no-auto-bzip is set.
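One consequence is that it is safe to call open on every iteration.
A minimal sketch (with a hypothetical filename) of this pattern;

def output_example(population, output_dir, **kwargs):
    # "example.csv" is really opened on the first call only;
    # subsequent calls return the same handle, and the file is
    # closed (and bz2-compressed if requested) at the end of the run
    FILE = output_dir.open("example.csv")
    FILE.write(f"{population.day}\n")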
The function then gets I1_inf_tot from the third disease stage data
held in workspace.ward_inf_tot. The third stage is the first I stage,
as the first stage (ward_inf_tot[0]) is a special * stage used for
advanced bookkeeping, while the second stage (ward_inf_tot[1]) is the
latent, E stage.
Note
The * stage is used to help evaluate how many wards see new infections.
Individuals are moved into the * stage at the point of infection,
and are moved into the E stage on the day after infection. By default,
individuals in the * stage are counted into R, which is why R does not
appear to rise continuously. You can control how individuals in the *
stage are counted using either --star-is-E to count them as E (an
additional latent stage), --star-is-R to count them as R (the default
behaviour), or --disable-star to remove the * stage and instead treat
this first stage as the one and only E stage. Note that if you do this,
you will need to update the disease files to remove the * stage, and
to update your above extractor, as stage 1 will now be the first I
stage.
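If you find the index-to-stage mapping hard to remember, a small helper
such as the below can make extractors more readable. This is a sketch
that assumes the default layout described above (the * stage, then E,
then the I stages, then R);

def stage_name(i, n_inf_classes):
    # map an index into ward_inf_tot to a stage name, assuming
    # the default "*", E, I1...In, R layout
    if i == 0:
        return "*"
    elif i == 1:
        return "E"
    elif i == n_inf_classes - 1:
        return "R"
    else:
        return f"I{i - 1}"

# e.g. for the six-stage lurgy3 disease used in this tutorial,
# [stage_name(i, 6) for i in range(6)] gives ["*", "E", "I1", "I2", "I3", "R"]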
Now that we have the population in the I1 stage in each ward in
I1_inf_tot, we write this as a comma-separated line to the file,
starting each line with the day number obtained from the passed
population object (of type Population).
To use your extractor, run metawards using;
metawards -d lurgy3 --extract extract_i1 -a ExtraSeedsLondon.dat --nsteps 30
Note
Note that we’ve set nsteps
to 30 for illustration only,
just to limit the runtime and the size of the file. In a real production
run you wouldn’t need to set the number of steps.
You should see that your file called total_i1.csv.bz2
has been created
in the output directory, and that this contains the populations of the I1
state for each ward.
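Since this is just a bz2-compressed CSV file, you can check its
contents with a few lines of Python. This sketch assumes the default
--auto-bzip output and simply sums the I1 population over all wards
for each day;

import bz2

# each line holds the day number followed by the I1 population
# of every ward
with bz2.open("output/total_i1.csv.bz2", "rt") as f:
    for line in f:
        day, *counts = line.strip().split(",")
        print(day, sum(int(c) for c in counts))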
Calculating rates
As well as outputting raw data, you can also perform some simple analysis that is run live during the model run. For example, you may want to record the number of individuals entering each state, so that you can calculate the rate of progress across states.
To do this, you will need to save the ward_inf_tot data from the
previous day's state. You can do this by adding it as a custom
attribute to the workspace. Create a new extractor by creating
extract_rate.py and copying in the below;
from metawards.extractors import extract_default
from copy import deepcopy


def output_rate(population, workspace, output_dir, **kwargs):
    if not hasattr(workspace, "output_rate_previous"):
        # This is the first day, so we cannot calculate the rate.
        # Instead, just save today's data so that it can be
        # used tomorrow
        workspace.output_rate_previous = deepcopy(workspace.ward_inf_tot)
        return

    # get yesterday's data
    ward_inf_previous = workspace.output_rate_previous

    # get today's data
    ward_inf_tot = workspace.ward_inf_tot

    # calculate and write the difference between the two to files for
    # each disease stage...
    for i in range(0, workspace.n_inf_classes):
        FILE = output_dir.open(f"rate_{i}.csv")
        FILE.write(str(population.day))

        # loop over the data for each ward and write the
        # difference to the file
        for old, new in zip(ward_inf_previous[i], ward_inf_tot[i]):
            FILE.write("," + str(new - old))

        FILE.write("\n")

    # save today's data so that it can be used tomorrow
    workspace.output_rate_previous = deepcopy(ward_inf_tot)


def extract_rate(**kwargs):
    funcs = extract_default(**kwargs)
    funcs.append(output_rate)
    return funcs
This extractor looks a little more complex, but it builds on what you
have seen before. It defines output_rate, which is the function that
will output the rates, and extract_rate, which returns all of the
functions from extract_default plus your new output_rate function.
The first job of output_rate is to determine whether it has been called
on the first day of the model. If it has, then there is no previous
data from “yesterday” that can be used to calculate the rate. The
function detects this case by checking for a custom attribute that is
under the control of this function. We call this attribute
output_rate_previous to minimise the risk of a name collision. If this
attribute doesn't exist, then we must be on the first day, so we save
today's data so that it can be used tomorrow.
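This is a general pattern that any extractor can use to carry state
from one day to the next. A compact sketch, using hypothetical names;

def output_custom(workspace, **kwargs):
    # "output_custom_state" is a hypothetical attribute; prefixing
    # it with the function name minimises the risk of colliding
    # with attributes set by other extractors
    if not hasattr(workspace, "output_custom_state"):
        workspace.output_custom_state = {}   # first call, so initialise
    state = workspace.output_custom_state
    # ...read from and update state here...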
If the attribute does exist, then we can calculate a rate. We do that
by getting yesterday's data from output_rate_previous and today's data
from workspace.ward_inf_tot. We then loop over all of the disease
stages, and open an output file for each stage (called rate_{i}.csv).
We write into this file the day, followed by the difference between
today's and yesterday's population in each ward for the ith disease
stage. Finally, we save today's data into
workspace.output_rate_previous, so that it can be used tomorrow as
yesterday's data.
Run this extractor in metawards using;
metawards -d lurgy3 --extract extract_rate -a ExtraSeedsLondon.dat --nsteps 30
(again, we are limiting this to 30 steps just for demonstration reasons)
You should see that files rate_0.csv.bz2, rate_1.csv.bz2 etc. have now
been created in the output directory. If you look at these files, you
should see that they contain the day-to-day change in the population
of each ward for each disease stage.
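As these are bz2-compressed CSV files, they are easy to post-process.
For example, this hypothetical analysis uses pandas to sum the per-ward
changes in the first I stage (rate_2.csv.bz2) into a national daily
change;

import pandas as pd

# the first column is the day; the remaining columns are the
# per-ward differences for this stage
df = pd.read_csv("output/rate_2.csv.bz2", header=None, index_col=0)

# sum over wards to get the national daily change in I1
print(df.sum(axis=1))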
Writing to a database
In the last example we wrote rates to a large number of files. While this has worked, the data is beginning to get so large and multi-dimensional that we are reaching the limits of what a CSV or other data file can reasonably support. As data sizes get larger, it is better to start writing data to a database.
The OutputFiles class has in-built support for opening connections to
SQLite3 databases. To use this, we call the function open_db().
For example, let's now create a new extractor that will output the
size of the population at each disease stage for each day, and the
change compared to the previous day. To do this, open a file called
extract_db.py and copy in the below;
from metawards.extractors import extract_default


def create_tables(N_INF_CLASSES):
    # return a function that creates the tables
    # for the specified number of disease classes
    def initialise(conn):
        # create a table for the values...
        values = ",".join([f"stage_{i} int" for i in range(0, N_INF_CLASSES)])
        c = conn.cursor()
        c.execute(f"create table totals(day int, S int, {values})")

        # create a table for the rates...
        c.execute(f"create table deltas(day int, S int, {values})")
        conn.commit()

    return initialise


def output_db(population, workspace, output_dir, **kwargs):
    # get today's data
    inf_tot = workspace.inf_tot
    pinf_tot = workspace.pinf_tot
    S = population.susceptibles

    N_INF_CLASSES = workspace.n_inf_classes

    # open a database to hold the data - call the 'create_tables'
    # function on this database when it is first opened
    conn = output_dir.open_db("stages.db",
                              initialise=create_tables(N_INF_CLASSES))
    c = conn.cursor()

    # get the values for today
    today = [population.day, S] + [inf + pinf for inf, pinf
                                   in zip(inf_tot, pinf_tot)]

    # convert this to a string
    today_str = ",".join([str(t) for t in today])

    # write these to the database
    c.execute(f"insert into totals values ({today_str})")

    if hasattr(workspace, "output_rate_db"):
        yesterday = workspace.output_rate_db

        # calculate the difference in all columns of today and yesterday
        deltas = [t - y for t, y in zip(today, yesterday)]

        # (except for the day, which should be today)
        deltas[0] = today[0]

        delta_str = ",".join([str(d) for d in deltas])

        # write this to the database
        c.execute(f"insert into deltas values ({delta_str})")

    conn.commit()

    # save today's data so that it can be used tomorrow
    workspace.output_rate_db = today


def extract_db(**kwargs):
    funcs = extract_default(**kwargs)
    funcs.append(output_db)
    return funcs
Here, we have created a new function called create_tables that builds
and returns the initialise function that is passed to open_db().
This initialise function creates two tables in the database: totals,
which contains the total population at each disease stage, and deltas,
which contains the difference from the previous day.
Next, we have output_db. This function calls open_db() to create the
connection, conn, to the SQLite3 database. This connection is a
standard Python sqlite3 connection object.
We calculate the total population in each stage as the sum of inf_tot
(the workers at each stage) and pinf_tot (the players at each stage).
We prepend the day number and the number of susceptibles, giving a row
of (day, S, stage_0, ..., stage_n).
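To make the row layout concrete, here is a worked example with made-up
numbers for a three-stage model;

# hypothetical data: day 5, 1000 susceptibles, workers per stage
# [1, 2, 3] and players per stage [4, 5, 6]
today = [5, 1000] + [w + p for w, p in zip([1, 2, 3], [4, 5, 6])]
# today == [5, 1000, 5, 7, 9]
# matching the columns (day, S, stage_0, stage_1, stage_2)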
We then write this, as today’s data, to the database via a cursor.
Next, we check if there is any data from yesterday by looking for the
custom attribute workspace.output_rate_db
. If there is, then we
get this data, and then calculate the difference from the previous day.
This is then written to the deltas
table via the cursor.
Finally, we commit the changes to the database, and then save today’s
data to workspace.output_rate_db
so that it can be used tomorrow.
Run this extractor by typing;
metawards -d lurgy3 --extract extract_db -a ExtraSeedsLondon.dat --nsteps 30
(again, we limit to 30 days just for illustration purposes)
Once this has finished, you should see a file called
output/stages.db.bz2. Uncompress this file and then examine it using
any SQLite3 database viewer, e.g.
# sqlite3 output/stages.db
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE totals(day int, S int, stage_0 int,stage_1 int,stage_2 int,stage_3 int,stage_4 int,stage_5 int);
INSERT INTO totals VALUES(0,56082077,0,0,0,0,0,0);
INSERT INTO totals VALUES(1,56082072,0,5,0,0,0,0);
INSERT INTO totals VALUES(2,56082072,0,0,5,0,0,0);
INSERT INTO totals VALUES(3,56082067,5,0,3,2,0,0);
INSERT INTO totals VALUES(4,56082064,3,5,2,1,2,0);
INSERT INTO totals VALUES(5,56082061,3,3,6,1,2,1);
INSERT INTO totals VALUES(6,56082051,10,3,8,1,3,1);
INSERT INTO totals VALUES(7,56082039,12,10,9,3,2,2);
INSERT INTO totals VALUES(8,56082026,13,12,17,4,2,3);
INSERT INTO totals VALUES(9,56082002,24,13,26,4,4,4);
INSERT INTO totals VALUES(10,56081982,20,24,34,8,3,6);
INSERT INTO totals VALUES(11,56081950,32,20,55,5,8,7);
INSERT INTO totals VALUES(12,56081887,63,32,66,10,6,13);
INSERT INTO totals VALUES(13,56081827,60,63,76,30,6,15);
INSERT INTO totals VALUES(14,56081733,94,60,128,25,20,17);
INSERT INTO totals VALUES(15,56081588,145,94,169,31,21,29);
INSERT INTO totals VALUES(16,56081405,183,145,222,58,22,42);
INSERT INTO totals VALUES(17,56081131,274,183,317,78,42,52);
INSERT INTO totals VALUES(18,56080808,323,274,434,105,57,76);
INSERT INTO totals VALUES(19,56080318,490,323,618,144,82,102);
INSERT INTO totals VALUES(20,56079687,631,490,798,229,96,146);
INSERT INTO totals VALUES(21,56078886,801,631,1134,273,159,193);
INSERT INTO totals VALUES(22,56077748,1138,801,1539,356,224,271);
INSERT INTO totals VALUES(23,56076190,1558,1138,2028,488,301,374);
INSERT INTO totals VALUES(24,56074172,2018,1558,2759,652,388,530);
INSERT INTO totals VALUES(25,56071530,2642,2018,3760,883,525,719);
INSERT INTO totals VALUES(26,56067981,3549,2642,4997,1226,714,968);
INSERT INTO totals VALUES(27,56063321,4660,3549,6723,1522,961,1341);
INSERT INTO totals VALUES(28,56057020,6301,4660,8943,2095,1221,1837);
INSERT INTO totals VALUES(29,56048552,8468,6301,11808,2834,1631,2483);
CREATE TABLE deltas(day int, S int, stage_0 int,stage_1 int,stage_2 int,stage_3 int,stage_4 int,stage_5 int);
INSERT INTO deltas VALUES(1,-5,0,5,0,0,0,0);
INSERT INTO deltas VALUES(2,0,0,-5,5,0,0,0);
INSERT INTO deltas VALUES(3,-5,5,0,-2,2,0,0);
INSERT INTO deltas VALUES(4,-3,-2,5,-1,-1,2,0);
INSERT INTO deltas VALUES(5,-3,0,-2,4,0,0,1);
INSERT INTO deltas VALUES(6,-10,7,0,2,0,1,0);
INSERT INTO deltas VALUES(7,-12,2,7,1,2,-1,1);
INSERT INTO deltas VALUES(8,-13,1,2,8,1,0,1);
INSERT INTO deltas VALUES(9,-24,11,1,9,0,2,1);
INSERT INTO deltas VALUES(10,-20,-4,11,8,4,-1,2);
INSERT INTO deltas VALUES(11,-32,12,-4,21,-3,5,1);
INSERT INTO deltas VALUES(12,-63,31,12,11,5,-2,6);
INSERT INTO deltas VALUES(13,-60,-3,31,10,20,0,2);
INSERT INTO deltas VALUES(14,-94,34,-3,52,-5,14,2);
INSERT INTO deltas VALUES(15,-145,51,34,41,6,1,12);
INSERT INTO deltas VALUES(16,-183,38,51,53,27,1,13);
INSERT INTO deltas VALUES(17,-274,91,38,95,20,20,10);
INSERT INTO deltas VALUES(18,-323,49,91,117,27,15,24);
INSERT INTO deltas VALUES(19,-490,167,49,184,39,25,26);
INSERT INTO deltas VALUES(20,-631,141,167,180,85,14,44);
INSERT INTO deltas VALUES(21,-801,170,141,336,44,63,47);
INSERT INTO deltas VALUES(22,-1138,337,170,405,83,65,78);
INSERT INTO deltas VALUES(23,-1558,420,337,489,132,77,103);
INSERT INTO deltas VALUES(24,-2018,460,420,731,164,87,156);
INSERT INTO deltas VALUES(25,-2642,624,460,1001,231,137,189);
INSERT INTO deltas VALUES(26,-3549,907,624,1237,343,189,249);
INSERT INTO deltas VALUES(27,-4660,1111,907,1726,296,247,373);
INSERT INTO deltas VALUES(28,-6301,1641,1111,2220,573,260,496);
INSERT INTO deltas VALUES(29,-8468,2167,1641,2865,739,410,646);
COMMIT;
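You can equally query the database from Python. This sketch
decompresses the file and then reads the totals table using the
standard sqlite3 module;

import bz2
import shutil
import sqlite3

# decompress output/stages.db.bz2 (written because --auto-bzip
# is the default) to a working copy
with bz2.open("output/stages.db.bz2", "rb") as fin, \
        open("stages.db", "wb") as fout:
    shutil.copyfileobj(fin, fout)

conn = sqlite3.connect("stages.db")

# print the day and the number of susceptibles for each day
for row in conn.execute("select day, S from totals order by day"):
    print(row)

conn.close()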