# Viewing output/results.csv.bz2

This notebook is used to view the data in output/results.csv.bz2 which is produced by [metawards](https://metawards.org). This notebook is described in the [metawards tutorial here](https://metawards.org/tutorial/part01/02_repeating.html).

We start this notebook by importing pandas and matplotlib.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

Next, read the results into a pandas dataframe.

In [None]:
df = pd.read_csv("output/results.csv.bz2")
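pandas picks the decompression method from the file extension (its default is `compression="infer"`), so no extra argument is needed for the `.bz2` file. The sketch below shows a round trip through a temporary `results.csv.bz2`. It assumes the column layout described in this notebook, and the numbers are invented for illustration, not metawards output. As an optional extra, it passes `parse_dates=["date"]` so the dates are stored as datetimes rather than the plain strings the call above produces.

```python
import os
import tempfile

import pandas as pd

# Invented rows using the column layout described in this notebook
# (repeat, day, date, S, E, I, R, IW). NOT real metawards output.
toy = pd.DataFrame({
    "repeat": [1, 1, 2, 2],
    "day": [0, 1, 0, 1],
    "date": ["2020-04-20", "2020-04-21", "2020-04-20", "2020-04-21"],
    "S": [1000, 990, 1000, 995],
    "E": [0, 5, 0, 3],
    "I": [5, 8, 5, 6],
    "R": [0, 2, 0, 1],
    "IW": [1, 2, 1, 1],
})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "results.csv.bz2")
    # compression="infer" (the default) chooses bz2 from the ".bz2" suffix
    toy.to_csv(path, index=False)
    # parse_dates converts the 'date' strings into datetime64 values
    roundtrip = pd.read_csv(path, parse_dates=["date"])

print(roundtrip.dtypes)
```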

This table contains the data for four repeats of a *model run* simulating an outbreak of the lurgy that started in London.

The columns hold the repeat number, the day within the outbreak, the date, and the values of **S**, **E**, **I**, **R** and **IW**, [as described here](https://metawards.github.io/MetaWards/tutorial/part01/02_repeating.html).

In [None]:
df

Our first plot consists of four graphs, one each for E, I, IW and R. For each of these columns in turn, the data is pivoted so that it is indexed by the 'date' column and has one column for each value of the 'repeat' column. For example, for 'E' this gives a dataframe indexed by 'date' with four columns of 'E' values, one per repeat.

This dataframe is then plotted into one panel of a 2-by-2 grid of graphs set up in matplotlib.

The remaining lines rotate the x-axis labels by 90 degrees, remove the legend, set the title and set the y-axis label to 'Population'.

Finally, we save the plot to a file called 'overview.pdf'. You can choose a different format by changing the file extension.
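Because matplotlib takes the output format from the file extension, switching from PDF to, say, PNG or SVG is a one-word change. The sketch below is a minimal, self-contained illustration; the file names `example.png` and `example.svg` and the plotted numbers are placeholders. It builds a `Figure` directly rather than through pyplot, so it runs without a display and does not change the notebook's plotting backend.

```python
from matplotlib.figure import Figure

# Build a figure without pyplot, so no GUI backend or display is needed
fig = Figure(figsize=(4, 3))
ax = fig.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# The file extension selects the output format
fig.savefig("example.png", dpi=150)  # raster image at 150 dots per inch
fig.savefig("example.svg")           # scalable vector graphic

# Extensions this matplotlib installation can write, e.g. 'png', 'pdf', 'svg'
formats = fig.canvas.get_supported_filetypes()
print(sorted(formats))
```

The `dpi` argument only matters for raster formats such as PNG. Vector formats such as PDF and SVG stay sharp at any zoom.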

In [None]:
# One panel per quantity in a 2-by-2 grid
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))

# axes.flat walks the grid row by row, pairing each column with a panel
for column, ax in zip(["E", "I", "IW", "R"], axes.flat):
    # Rows indexed by date, one line per repeat
    df.pivot(index="date", columns="repeat", values=column).plot.line(ax=ax)
    ax.tick_params(axis="x", labelrotation=90)
    ax.get_legend().remove()
    ax.set_title(column)
    ax.set_ylabel("Population")

fig.tight_layout(pad=1)

# Save before showing; the format follows the file extension
fig.savefig("overview.pdf")
plt.show()
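To see what the pivot step does on its own, the sketch below applies the same `pivot(index="date", columns="repeat", values=...)` call to a tiny long-format table. The values are invented for illustration and only the 'E' column is kept.

```python
import pandas as pd

# Invented long-format rows: one row per (repeat, date) pair
long_df = pd.DataFrame({
    "repeat": [1, 1, 2, 2],
    "date": ["2020-04-20", "2020-04-21", "2020-04-20", "2020-04-21"],
    "E": [0, 5, 0, 3],
})

# One row per date, one column per repeat, cells filled from "E"
wide = long_df.pivot(index="date", columns="repeat", values="E")
print(wide)
```

Each repeat becomes its own column, which is why `.plot.line()` then draws one line per repeat. Note that `pivot` raises a `ValueError` if the same (date, repeat) pair appears more than once, because it cannot decide which value to keep.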

Next we calculate the mean and standard deviation of each of the E, I, IW and R columns for each day. The 'groupby' function groups together rows that share the same value in a given column, so 'groupby("date")' gathers the rows for each date (one per repeat). The '.mean()' function then averages each column within every group, while the '.std()' function calculates the sample standard deviation, which by default divides by N-1 rather than N.
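As a small illustration of what `groupby`, `.mean()` and `.std()` return, the sketch below applies them to a made-up table with two repeats per date (the numbers are invented, not metawards output). It also contrasts pandas' default `ddof=1` (sample standard deviation) with `ddof=0` (population standard deviation).

```python
import pandas as pd

# Invented values: two repeats for each of two dates
toy = pd.DataFrame({
    "repeat": [1, 2, 1, 2],
    "date": ["2020-04-20", "2020-04-20", "2020-04-21", "2020-04-21"],
    "I": [5, 5, 8, 6],
})

grouped = toy.groupby("date")["I"]
mean_I = grouped.mean()           # average over repeats for each date
std_I = grouped.std()             # sample std: divides by N-1 (ddof=1)
std_I_pop = grouped.std(ddof=0)   # population std: divides by N

print(pd.DataFrame({"mean": mean_I, "std": std_I, "std_pop": std_I_pop}))
```

For 2020-04-21 the two values are 8 and 6, so the mean is 7, the sample standard deviation is sqrt(2), about 1.41, and the population standard deviation is 1. With only a handful of repeats the choice of divisor makes a visible difference.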

We can then plot each of the four columns into a pre-prepared 2-by-2 graph grid, using the same matplotlib calls as above to lay out the graphs and set the titles and axis labels. The standard deviation is drawn as error bars around the mean. Finally, we save the figure to a file called 'average.pdf'.

Note that the mean and standard deviation only give a rough view of the data, particularly with just four repeats. More rigorous statistical methods are needed to gain proper insight.

In [None]:
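fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))

columns = ["E", "I", "IW", "R"]

# Group rows that share a date (one row per repeat) and summarise only the
# plotted columns; averaging 'repeat' or 'day' would be meaningless
mean_average = df.groupby("date")[columns].mean()
stddev = df.groupby("date")[columns].std()  # sample std (ddof=1)

for column, ax in zip(columns, axes.flat):
    # Mean across repeats, with the standard deviation as error bars
    mean_average.plot.line(y=column, yerr=stddev[column], ax=ax)
    ax.tick_params(axis="x", labelrotation=90)
    ax.get_legend().remove()
    ax.set_title(column)
    ax.set_ylabel("Population")

fig.tight_layout(pad=1)

fig.savefig("average.pdf")
plt.show()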