One of the things that data scientists, specially those in healthcare will likely have to do is to simulate the spread of an infection, an epidemic or a pandemic as we are currently experiencing. Simulating that is actually not as complicated as it may seem. One of the models which is rather simple but effective is the SIR model.
The SIR Model
SIR in an abbreviation for susceptible, infected and removed (think recovered). The model assumes that the population remains constant and at any point during the spread, there are people who are susceptible to the infection but not yet infection, infected people and those who are done with it. You also assume that the population moves from susceptible to infected to removed (recovered or died).
The SIR equation:
At any point the susceptible population is the group that is not yet infection or removed. Let’s say we start from day 1, n, we can represent the S population as follow:
S=S-I-R
Now moving forward, in each step, we have to take the number of people that get infected or removed out of the susceptible population. now each infected person can infect a fraction of S, and that fraction is represented by B. Thus the number of people that be infection by an infected person would be BXS. Total number of people infected in the next step would be I x β x S(n). Thus if the starting point in n, the S population at the next step, n+1 would be represented as:
S(n+1) = S(n) - I(n) x β x S(n)
S(n+1) = S(n) - I(n) x β x S(n)
Keeping the same logic in mind, we can calculate the number of recovered people in the next step. Consider γ fraction of the infected population recover daily. So the next day, the number of recovered people will increase by γ x I(n), so we can calculate R as:
R(n+1) = R(n) + γ x I(n)
The next component is calculating the number of infected people. It is a little complicated as newly infected are added and recovered are subtracted. Each day, each member of I infects β fraction of S, so I increases by I x β x S while γ fraction of I recovers.
I(n+1) = I(n) + I x β x S(n) - γ x I(n)
Now that we know how the population progresses from S to I to R, we can plot this changing population type over a period of time and predict the course of the disease as it spreads through the population.
Vaccination
Vaccination will protect a fraction of the S population therefore we remove them from the S population and add them to the R population. This fraction of the population, the vaccinated one is represented by kvacc. Thus we can change the equations to:
S(n+1) = S(n)-I(n) x β x S(n) - kvacc x S(n) x I(n+1)
= I(n )+ I x β x S(n) - γ x I(n) + kvacc x S(n)
Coding the SIR model in python The purpose of the SIR model is to plot the progression of the disease as it spreads through the population. We can write a function when takes beta, γ and the number of people in the population and plots the daily number of S, I and R over a period specified by the parameter days.
from matplotlib import pyplot as plt
def sir(beta, gamma, population, days, kvacc=0):
plt.clf
N=population
I=[1/N]
S=[1.0-I[0]]
R=[0]
T=[0]
for t in range(days):
s=S[t]-I[t]*beta*S[t]-kvacc*S[t]
i=I[t]+I[t]*beta*S[t]-I[t]*gamma
r=R[t]+I[t]*gamma+kvacc*S[t]
S.append(s)
I.append(i)
R.append(r)
T.append(t+1)
_ = plt.figure(figsize=(6,4))
plt.plot(T,S, 'green', label='Susceptible')
plt.plot(T,I, 'red', label='Infected')
plt.plot(T,R, 'black', label='Recovered')
plt.xlabel('Days')
plt.ylabel('Fraction')
plt.title('SIR Model')
plt.grid(True)
Plotting the data 1/β represents the number of days an infected person will continue to infect others, so if we say, an infected person in infectious for 10 days, β=1/10 or 0.1. Similarly, if we say it takes 20 days to recover, then 1/γ is 20, γ=1/20 or γ= 0.05. We can plot this model, imagining we have a population of 100000. We want to display data over a period of a year or 365 days.
sir(0.1, 0.05, 100000, 365)
The progression of the disease process is thus impacted by beta and γ. Anything that speeds up recovery will increase γ, anything that reduces infectivity or provides protection of the infection will reduce beta.
Vaccination: Let’s consider we have a vaccine and we vaccinate 0.15% of the population daily. So we can plot the graph and see if it makes a difference. Our kvacc is 0.0015%.
sir(0.1, 0.05, 100000, 365, 0.0015)
You can see how that has sort of “flattened” the curve and prevented the peak.
Medications/Treatment: Let’s say we come up with a treatment that cuts down recovery time by half. So now our γ is 1/10 or 0.1. Let’s plot the data.
sir(0.1, 0.06, 100000, 365, 0.0015)
This was a very basic SIR model. Obviously it does not apply 100% to any population. This model assumes the population to be static but in reality, no population is static and is very dynamic where people move in and out of the population. Many other parameters can be considered in the SIR model like number of initially infected, incubation period etc. but a basic SIR model is a good start.