
COVID-19 outbreak: the case of Portugal (Update)

COVID-19: a data epidemic

With the outbreak of COVID-19, one thing is certain: never before has a virus gone so viral on the internet. In particular, a great deal of data about the spread of the virus is circulating.

In the space of a few weeks, technical terms like “herd immunity” or “flattening the curve” started to circulate on social networks.

Here we take an approach that demonstrates the limitations of the model from the previous post, The importance of “flattening the curve”.
This post is based on Contagiousness of COVID-19 Part I: Improvements of Mathematical Fitting (Guest Post). Compared to the previous one, it shows that fitting the data with the SIR model is tricky: the general problem is that the fitting algorithm does not always find its way to the best solution.

Why doesn’t the algorithm converge to the optimal solution?

There are two main reasons why the model is not converging well.

Early stopping of the algorithm

The first reason is that the optim algorithm stops too early, before it has found the right solution.

Ill-conditioned problem

The second reason for the bad convergence behaviour of the algorithm is that the problem is ill-conditioned: very different combinations of $\beta$ and $\gamma$ produce almost the same early epidemic curve, so the residual sum of squares has a long, flat valley around the optimum.
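As a minimal sketch of what this means (not from the original post), consider two quite different (beta, gamma) pairs that share the same difference K = beta - gamma. While S ≈ N, both produce nearly the same infection curve, so the residual sum of squares can barely tell them apart:

library(deSolve)

# Self-contained illustration; N and the SIR function mirror the code used below.
N <- 10276617                  # population of Portugal (INE)
init <- c(S = N - 2, I = 2)    # start with two infections
SIR <- function(time, state, parameters) {
    par <- as.list(c(state, parameters))
    with(par, {
        dS <- -beta/N * I * S
        dI <- beta/N * I * S - gamma * I
        list(c(dS, dI))
    })
}

# Same K = beta - gamma = 0.25, but very different beta and gamma:
out1 <- ode(y = init, times = 1:30, func = SIR, parms = c(beta = 1.0, gamma = 0.75))
out2 <- ode(y = init, times = 1:30, func = SIR, parms = c(beta = 0.5, gamma = 0.25))

# The maximum relative difference between the two infection curves is tiny,
# which is why the optimizer sees a long, flat valley in the RSS surface.
max(abs(out1[, "I"] - out2[, "I"]) / out1[, "I"])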

The latest data comes from the Johns Hopkins CSSE repository:

https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
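The code below assumes the following packages are loaded. util_scrape_table is the author's helper for reading the tables from the GitHub pages; its definition is not shown here.

library(dplyr)      # %>%, filter(), group_by(), summarise(), mutate()
library(tidyr)      # gather()
library(janitor)    # row_to_names(), clean_names(), remove_empty()
library(stringr)    # str_remove_all(), str_glue()
library(lubridate)  # mdy(), dmy()
library(readr)      # parse_number()
library(deSolve)    # ode()
library(optimx)     # optimr()
library(ggplot2)    # forecast plot
library(scales)     # comma(), pretty_breaks()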

x <- util_scrape_table("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")

deaths <- util_scrape_table("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv")
recovered <- util_scrape_table("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv")

y <- x %>% row_to_names(1) %>% clean_names %>% remove_empty() %>%
    gather(date, infections, -1:-4) %>%
    mutate(date = str_remove_all(date, "x")) %>%
    mutate(date = mdy(date)) %>%
    mutate(infections = parse_number(infections))

dy <- deaths %>% row_to_names(1) %>% clean_names %>% remove_empty() %>%
    gather(date, infections, -1:-4) %>%
    mutate(date = str_remove_all(date, "x")) %>%
    mutate(date = mdy(date)) %>%
    mutate(infections = parse_number(infections))

ry <- recovered %>% row_to_names(1) %>% clean_names %>% remove_empty() %>%
    gather(date, infections, -1:-4) %>%
    mutate(date = str_remove_all(date, "x")) %>%
    mutate(date = mdy(date)) %>%
    mutate(infections = parse_number(infections))
###################
from_date <-  dmy("02/03/2020")
sir_start_date <- "2020-03-02"
####################

pti    <- y %>% filter(country_region == "Portugal") %>%
    filter(date >= from_date) %>%
    group_by(date) %>% summarise(infections_pt = sum(infections)) #%>%

ptd    <- dy %>% filter(country_region == "Portugal") %>%
    filter(date >= from_date) %>%
    group_by(date) %>% summarise(death_pt = sum(infections)) #%>%

ptr    <- ry %>% filter(country_region == "Portugal") %>%
    filter(date >= from_date) %>%
    group_by(date) %>% summarise(recovered_pt = sum(infections)) #%>%



pt <- pti %>%
    left_join(ptd, by ="date") %>%
    left_join(ptr, by ="date")
Infected <- pt$infections_pt
Infected
##  [1]     2     2     5     8    13    20    30    30    41    59    59   112   169   245   331   448   448
## [18]   785  1020  1280  1600  2060  2362  2995  3544  4268  5170  5962  6408  7443  8251  9034  9886 10524
## [35] 11278 11730 12442 13141 13956 15472 15987 16585 16934 17448 18091
D <- pt$death_pt
D
##  [1]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   2   3   6  12  14  23  33  43  60
## [26]  76 100 119 140 160 187 209 246 266 295 311 345 380 409 435 470 504 535 567 599
R <- pt$recovered_pt
R
##  [1]   0   0   0   0   0   0   0   0   0   0   0   1   2   2   3   3   3   3   5   5   5   5  22  22  43
## [26]  43  43  43  43  43  43  68  68  75  75 140 184 196 205 233 266 277 277 347 383
Day <- 1:(length(Infected))
#N <- 1400000000 # population of mainland china
N <- 10276617 # population of Portugal (INE, Statistics Portugal)

RSS <- function(parameters) {
    names(parameters) <- c("beta", "gamma")
    out <- ode(y = init, times = Day, func = SIR, parms = parameters)
    fitInfected <- out[,3]
    sum((Infected - fitInfected)^2)
}

SIR <- function(time, state, parameters) {
    par <- as.list(c(state, parameters))
    with(par, {
        dS <- -beta/N * I * S
        dI <- beta/N * I * S - gamma * I
        list(c(dS, dI))
    })
}

SIR2 <- function(time, state, parameters) {
    par <- as.list(c(state, parameters))
    with(par, {
        dS <- I * K * (-S/N *  R0/(R0-1))
        dI <- I * K * ( S/N *  R0/(R0-1) - 1/(R0-1))
        list(c(dS, dI))
    })
}

RSS2 <- function(parameters) {
    names(parameters) <- c("K", "R0")
    out <- ode(y = init, times = Day, func = SIR2, parms = parameters)
    fitInfected <- out[,3]
    #fitInfected <- N-out[,2]
    sum((Infected - fitInfected)^2)
}

### Two RSS functions to do the optimization in a nested way: for each
### candidate K, the inner optimize profiles out the best R0; the outer
### optimize then searches over K alone.
Infected_MC <- Infected
SIRMC2 <- function(R0,K) {
    parameters <- c(K=K, R0=R0)
    out <- ode(y = init, times = Day, func = SIR2, parms = parameters)
    fitInfected <- out[,3]
    #fitInfected <- N-out[,2]
    RSS <- sum((Infected_MC - fitInfected)^2)
    return(RSS)
}
SIRMC <- function(K) {
    optimize(SIRMC2, lower=1,upper=10^5,K=K, tol = .Machine$double.eps)$objective
}

#####
# wrapper to optimize and return estimated values
getOptim <- function() {
    opt1 <- optimize(SIRMC,lower=0,upper=1, tol = .Machine$double.eps)
    opt2 <- optimize(SIRMC2, lower=1,upper=10^5,K=opt1$minimum, tol = .Machine$double.eps)
    return(list(RSS=opt2$objective,K=opt1$minimum,R0=opt2$minimum))
}

# starting condition
init <- c(S = N-Infected[1], I = Infected[1]-R[1]-D[1])
#init <- c(S = N-Infected[1], I = Infected[1])

Result of estimation of the reproduction number (R0)

Opt <- optimr(par=c(0.5, 0.5), fn=RSS,
              #x=x,
              method = "L-BFGS-B",
              lower=c(0, 0),
              upper = c(1, 1),
              control = list(parscale = c(10^-4,10^-4)))
Opt$message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
#
Opt_par <- setNames(Opt$par, c("beta", "gamma"))
Opt_par
##      beta     gamma 
## 1.0000000 0.7826315
#
R0 <- as.numeric(Opt_par[1] / Opt_par[2])
R0
## [1] 1.277741

The calculation of R0 by the model of the previous post currently gives: R0 = 1.2777406

# fit <- data.frame(ode(y = init, times = t, func = SIR, parms = Opt_par))
#
# height_pand<- fit[fit$I == max(fit$I), "I", drop = FALSE] # height of pandemic
# height_pand
# attributes(height_pand)
# height_day <- as.integer(row.names(height_pand))
# from_date + height_day # height of pandemic
# # [1] "2020-05-02"
################################################################
Opt3 <- getOptim()
Opt3
## $RSS
## [1] 163285478
## 
## $K
## [1] 0.2632098
## 
## $R0
## [1] 1.06201
Opt3$R0
## [1] 1.06201
#  Opt3$R0
# [1] 1.0607

# Opt_par2 <- setNames(Opt2$par, c("K", "R0"))
# Opt_par2

R0 by the current model

The current model gives R0 = 1.0620105; as we can see, the result is quite different.

Some differences in the SIR model

This SIR model uses the parameters $K = \beta - \gamma$ (the initial exponential growth rate) and $R_0 = \frac{\beta}{\gamma}$. Inverting these relations gives $\beta = K \frac{R_0}{R_0-1}$ and $\gamma = \frac{K}{R_0-1}$.
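Substituting these expressions into the standard SIR equation for I recovers exactly the form coded in SIR2:

$\frac{dI}{dt} = \frac{\beta}{N} S I - \gamma I = I \, K \left( \frac{S}{N} \cdot \frac{R_0}{R_0-1} - \frac{1}{R_0-1} \right)$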

Opt_par3 <- setNames(Opt3[2:3], c("K", "R0"))
Opt_par3
## $K
## [1] 0.2632098
## 
## $R0
## [1] 1.06201
# plotting the result
t <- 1:80 # time in days

fit3 <- data.frame(ode(y = init, times = t, func = SIR2, parms = Opt_par3))

par(mar = c(1, 2, 3, 1.1)) # plot margins: c(bottom, left, top, right)
old <- par(mfrow = c(1, 1))

#Opt3 <- getOptim()
#Opt_par3 <- setNames(Opt3[2:3], c("K", "R0"))
Opt_par3
## $K
## [1] 0.2632098
## 
## $R0
## [1] 1.06201
##############
from_date <-  dmy("02/03/2020")
sir_start_date <- "2020-03-02"

height_pand<- fit3[fit3$I == max(fit3$I), "I", drop = FALSE] # height of pandemic
height_pand
##           I
## 41 17870.26
#           I
# 57 330697.5 # 10-04-2020
# 58 318210.1 # 11-04-2020
attributes(height_pand)
## $names
## [1] "I"
## 
## $row.names
## [1] 41
## 
## $class
## [1] "data.frame"
height_day <- as.integer(row.names(height_pand))
height_day
## [1] 41
max_infected <- max(fit3$I)
height_of_pandemic <- max_infected

forecast_infections <- fit3$I

max_infected_day <- from_date + height_day # height of pandemic
max_infected_day
## [1] "2020-04-12"
# [1] "2020-04-11"

height_of_pandemic_date <- max_infected_day

actual_infections <- Infected

days_from_inic <- data.frame(
    date = as.Date(from_date + t))

#
fit_i <- fit3 %>%
    cbind(days_from_inic)

fit_i$actual_infections <- c(Infected, rep(NA, nrow(fit_i)-length(Infected)))

#
full_forecast_plot <- fit_i %>% # output_df
  ggplot(aes(x = date)) +
  geom_line(aes(y = I / 1000), colour = "blue") +
  geom_point(aes(y = actual_infections / 1000), colour = "red") +
  geom_vline(xintercept = height_of_pandemic_date, linetype = "dotted") +
  annotate(geom = "text",
           x = height_of_pandemic_date,
           y = (height_of_pandemic / 1000) / 2,
           label = str_glue("Max. infections of {comma(height_of_pandemic / 1000, accuracy = 0.01)} on {height_of_pandemic_date}"),
           color = "black",
           angle = 90,
           vjust = -0.25,
           size = 3) +
  
  theme_bw() +
  xlab("") +
  ylab("Number of Infections - Thousands") +
  scale_x_date(breaks = scales::pretty_breaks(n = 12))+   # for one year date labels
  scale_y_continuous(label = comma) +
  ggtitle(label = "Portugal - Corona Virus Infections Forecast",
          subtitle = "Blue Line is the Forecast and Red Points are Actuals")
# caption  = "Data: https://github.com/CSSEGISandData/COVID-19")


full_forecast_plot

[Plot: Portugal - Corona Virus Infections Forecast]

height_of_pandemic_date
## [1] "2020-04-12"

Final conclusions

Based on this model, the peak of the epidemic would already have happened on 2020-04-12, with a maximum of 17 870 infected.
This is all well and good, but it doesn’t mean it is correct. Just because we use a mathematical model does not mean that our conclusions/predictions are trustworthy. We need to challenge the premises: the underlying data and the models themselves.

References

The importance of “flattening the curve”

Lessons from the 1918 Spanish flu pandemic

The world of 2020 is vastly different from 1918, the year Spanish flu began to spread around the world. By 1920, Spanish flu is thought to have claimed the lives of up to 100 million people. But, as science writer and journalist Laura Spinney notes, many of the public health measures were similar to measures governments are taking today.

This chart of the 1918 Spanish flu shows why social distancing works

In 1918, the city of Philadelphia threw a parade that killed thousands of people. Ignoring warnings of influenza among soldiers preparing for World War I, the march to support the war effort drew 200,000 people who crammed together to watch the procession. Three days later, every bed in Philadelphia’s 31 hospitals was filled with sick and dying patients, infected by the Spanish flu. By the end of the week, more than 4,500 were dead in an outbreak that would claim as many as 100 million people worldwide. By the time Philadelphia’s politicians closed down the city, it was too late.

Proceedings of the National Academy of Sciences

A different story played out in St. Louis, just 900 miles away. Within two days of detecting its first cases among civilians, the city closed schools, playgrounds, libraries, courtrooms, and even churches. Work shifts were staggered and streetcar ridership was strictly limited. Public gatherings of more than 20 people were banned. The extreme measures—now known as social distancing, which is being called for by global health agencies to mitigate the spread of the novel coronavirus—kept per capita flu-related deaths in St. Louis to less than half of those in Philadelphia, according to a 2007 paper in the Proceedings of the National Academy of Sciences.

What The 1918 Flu Pandemic Teaches Us About The Coronavirus Outbreak

What The 1918 Flu Pandemic Teaches Us


John Barry, professor at the Tulane University School of Public Health and Tropical Medicine. Author of “The Great Influenza: The Story of the Deadliest Pandemic in History.” (@johnmbarry)
Jack Beatty, On Point news analyst. (@JackBeattyNPR)

The COVID-19 curve

Reducing the basic reproduction number by drastically reducing contacts or by quickly isolating infectious individuals also reduces the size of the outbreak. We use a simple model to illustrate this point.

The SIR model

The SIR model is one of the simplest compartmental models, and many models are derivatives of this basic form. The model consists of three compartments: S for the number of susceptible, I for the number of infectious, and R for the number of recovered or deceased (or immune) individuals. Each member of the population typically progresses from susceptible to infectious to recovered. This can be shown as a flow diagram in which the boxes represent the different compartments and the arrows the transitions between compartments: S → I → R.

To model the dynamics of the outbreak we need three differential equations, one for the change in each group, where $\beta$ is the parameter that controls the transition between S and I and $\gamma$ is the parameter that controls the transition between I and R:

$\frac{dS}{dt} = -\frac{\beta I S}{N}, \qquad \frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I$

#
x <- util_scrape_table("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")

deaths <- util_scrape_table("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv")
recovered <- util_scrape_table("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv")

y <- x %>% row_to_names(1) %>% clean_names %>% remove_empty() %>%
  gather(date, infections, -1:-4) %>%
  mutate(date = str_remove_all(date, "x")) %>%
  mutate(date = mdy(date)) %>%
  mutate(infections = parse_number(infections))

dy <- deaths %>% row_to_names(1) %>% clean_names %>% remove_empty() %>%
  gather(date, infections, -1:-4) %>%
  mutate(date = str_remove_all(date, "x")) %>%
  mutate(date = mdy(date)) %>%
  mutate(infections = parse_number(infections))

ry <- recovered %>% row_to_names(1) %>% clean_names %>% remove_empty() %>%
  gather(date, infections, -1:-4) %>%
  mutate(date = str_remove_all(date, "x")) %>%
  mutate(date = mdy(date)) %>%
  mutate(infections = parse_number(infections))

#dy

#time_series_covid19_all <- rbind(y, dy, ry)
#time_series_covid19_all

from_date <-  dmy("02/03/2020")
sir_start_date <- "2020-03-02"

#y$country_region
#filter(y, country_region == "Portugal")
#filter(dy, country_region == "Portugal")
#filter(ry, country_region == "Portugal")


pt    <- y %>% filter(country_region == "Portugal") %>%
  filter(date >= from_date) %>%
  group_by(date, province_state) %>% summarise(infections_pt = sum(infections)) #%>%
  #filter(province_state == "New South Wales")
# group_by(date) %>% summarise(infections_pt = sum(infections))

#pt

Infected <- pt$infections_pt
Infected
##  [1]     2     2     5     8    13    20    30    30    41    59    59   112   169   245   331   448   448
## [18]   785  1020  1280  1600  2060  2362  2995  3544  4268  5170  5962  6408  7443  8251  9034  9886 10524
## [35] 11278 11730 12442 13141 13956
Day <- 1:(length(Infected))
N <- 10276617 # population of Portugal (INE, Statistics Portugal)

Putting this into R code:

SIR <- function(time, state, parameters) {
  par <- as.list(c(state, parameters))
  with(par, {
    dS <- -beta/N * I * S
    dI <- beta/N * I * S - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
  })
}

To fit the model to the data we need two things: a solver for differential equations and an optimizer. To solve differential equations, the function ode from the deSolve package (on CRAN) is an excellent choice; to optimize, we will use the optim function from base R. Concretely, we will minimize the sum of the squared differences between the number of infected $I(t)$ at time $t$ and the corresponding number of cases predicted by our model $\hat{I}(t)$:

$RSS(\beta, \gamma) = \sum_{t} \left( I(t) - \hat{I}(t) \right)^2$

Putting it all together:

library(deSolve)
init <- c(S = N-Infected[1], I = Infected[1], R = 0)
RSS <- function(parameters) {
  names(parameters) <- c("beta", "gamma")
  out <- ode(y = init, times = Day, func = SIR, parms = parameters)
  fit <- out[ , 3]
  sum((Infected - fit)^2)
}
 
Opt <- optim(c(0.5, 0.5), RSS, 
             method = "L-BFGS-B", 
             lower = c(0, 0), 
             upper = c(1, 1),
             control = list(parscale = c(10^-4,10^-4))) # optimize with some sensible conditions
Opt$message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
 
Opt_par <- setNames(Opt$par, c("beta", "gamma"))
Opt_par
##      beta     gamma 
## 1.0000000 0.7571908
t <- 1:80 # time in days
fit <- data.frame(ode(y = init, times = t, func = SIR, parms = Opt_par))
col <- 1:4 # colour

old <- par(mfrow = c(1, 2))
# Plot Margins
par(mar = c(1, 2, 3, 1)) # Set the margin on all sides to 2
# par(mar = c(bottom, left, top, right))

matplot(fit$time, fit[ , 2:4], type = "l", xlab = "Day", ylab = "Number of subjects", lwd = 2, lty = 1, col = col)
matplot(fit$time, fit[ , 2:4], type = "l", xlab = "Day", ylab = "Number of subjects", lwd = 2, lty = 1, col = col, log = "y")
 
points(Day, Infected,  col = "blue")

legend("bottomleft", c("Susceptibles", "Forcast Infecteds", "Forecast Recovereds", "Infecteds"), cex=0.9, lty = 1, lwd = 3, col = col, inset = c(0.19, 0.01), box.col="green")

title("SIR model COVID-19 Portugal", outer = TRUE, line = -2)

[Plot: SIR model COVID-19 Portugal, linear and log scale]

We can now extract some interesting statistics. One important number is the so-called basic reproduction number (also basic reproduction ratio) $R_0$ (pronounced “R naught”) which basically shows how many healthy people get infected by a sick person on average: $R_0 = \frac{\beta}{\gamma}$

par(old)
 
R0 <- setNames(Opt_par["beta"] / Opt_par["gamma"], "R0")
R0
##       R0 
## 1.320671
height_pand <- fit[fit$I == max(fit$I), "I", drop = FALSE] # height of pandemic
height_pand
##           I
## 57 330697.5
height_day <- as.integer(row.names(height_pand))

max_infected <- max(fit$I)
max_infected
## [1] 330697.5
max_infected / 5 # severe cases
## [1] 66139.5
max_infected_day <- from_date + height_day # height of pandemic
max_infected_day
## [1] "2020-04-28"
max_infected * 0.06 # cases with need for intensive care
## [1] 19841.85
max_deaths <- max(fit$I) * 0.02 # max deaths with supposed 2% fatality rate
max_deaths
## [1] 6613.95

According to this model, the height of a possible pandemic would be reached by 2020-04-28 (57 days after it started). About 330 698 people would be infected by then, which translates to about 66 140 severe cases, about 19 842 cases in need of intensive care and up to 6 614 deaths.
Those are the numbers our model produces, and nobody knows whether they are correct, while everybody hopes they are not.
Do not panic! All of this is hopefully (probably!) false. When you play around with the above model you will see that the fitted parameters are far from stable…

References

The importance of “flattening the curve”

Lessons from the 1918 Spanish flu pandemic

The world of 2020 is vastly different from 1918, the year Spanish flu began to spread around the world. By 1920, Spanish flu is thought to have claimed the lives of up to 100 million people. But, as science writer and journalist Laura Spinney notes, many of the public health measures were similar to measures governments are taking today.

This chart of the 1918 Spanish flu shows why social distancing works

In 1918, the city of Philadelphia threw a parade that killed thousands of people. Ignoring warnings of influenza among soldiers preparing for World War I, the march to support the war effort drew 200,000 people who crammed together to watch the procession. Three days later, every bed in Philadelphia’s 31 hospitals was filled with sick and dying patients, infected by the Spanish flu. By the end of the week, more than 4,500 were dead in an outbreak that would claim as many as 100 million people worldwide. By the time Philadelphia’s politicians closed down the city, it was too late.

Proceedings of the National Academy of Sciences

A different story played out in St. Louis, just 900 miles away. Within two days of detecting its first cases among civilians, the city closed schools, playgrounds, libraries, courtrooms, and even churches. Work shifts were staggered and streetcar ridership was strictly limited. Public gatherings of more than 20 people were banned. The extreme measures—now known as social distancing, which is being called for by global health agencies to mitigate the spread of the novel coronavirus—kept per capita flu-related deaths in St. Louis to less than half of those in Philadelphia, according to a 2007 paper in the Proceedings of the National Academy of Sciences.

What The 1918 Flu Pandemic Teaches Us About The Coronavirus Outbreak

What The 1918 Flu Pandemic Teaches Us


John Barry, professor at the Tulane University School of Public Health and Tropical Medicine. Author of “The Great Influenza: The Story of the Deadliest Pandemic in History.” (@johnmbarry)
Jack Beatty, On Point news analyst. (@JackBeattyNPR)


MRI image segmentation

Magnetic Resonance Imaging (MRI) is a medical imaging technique used to detect irregularities in the human body. Segmentation of brain MRI scans is one of the methods radiographers use to detect abnormalities in the brain.

In digital image processing, segmentation refers to the process of splitting the observed image data into a series of non-overlapping, homogeneous regions. Clustering is one of the processes used in segmentation.
In pattern recognition, clustering is the process of partitioning a set of pattern vectors into subsets called clusters.

There are various image segmentation techniques based on clustering. One example is K-means clustering.

Image Segmentation

Let’s try hierarchical clustering with an MRI image of the brain.
The healthy data set consists of a matrix of intensity values.

healthy = read.csv("healthy.csv", header=FALSE)
healthyMatrix = as.matrix(healthy)
str(healthyMatrix)
##  num [1:566, 1:646] 0.00427 0.00855 0.01282 0.01282 0.01282 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:646] "V1" "V2" "V3" "V4" ...
# Plot image
image(healthyMatrix,axes=FALSE,col=grey(seq(0,1,length=256)))

[Plot: greyscale image of the healthy brain MRI]

To use hierarchical clustering we first need to convert the healthy matrix to a vector. And then we need to compute the distance matrix.

# Hierarchial clustering
healthyVector = as.vector(healthyMatrix)
distance = dist(healthyVector, method = "euclidean")
## Error: cannot allocate vector of size 498.0 Gb

R gives us an error telling us that our vector is huge and that R cannot allocate enough memory.

Let’s see the structure of the healthy vector.

str(healthyVector)
##  num [1:365636] 0.00427 0.00855 0.01282 0.01282 0.01282 ...
n <- 365636

The healthy vector has 365636 elements; let’s call this number n. For R to calculate the pairwise distances, it would need to compute n*(n-1)/2 values and store them in the distance matrix. That number is 6.6844659 × 10^10, so we cannot use hierarchical clustering.
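A quick back-of-the-envelope check reproduces the size of the failed allocation (each distance is stored as an 8-byte double):

n <- 365636
n * (n - 1) / 2              # 6.6844659e10 pairwise distances
n * (n - 1) / 2 * 8 / 2^30   # about 498 GB, matching the error above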

Now let’s try the k-means clustering algorithm, which aims at partitioning the data into k clusters such that each data point belongs to the cluster whose mean is nearest to it.
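Formally, for a chosen number of clusters $k$, k-means minimizes the within-cluster sum of squares, where $\mu_j$ is the mean of the points assigned to cluster $C_j$:

$\underset{C_1, \dots, C_k}{\text{argmin}} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$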

# Specify the number of clusters.
# Our clusters would ideally assign each point in the image to a tissue class
# or a particular substance, for instance grey matter or white matter, and
# these substances are known to the medical community. So setting the number
# of clusters depends on exactly what you're trying to extract from the image.
k = 5

# Run k-means
set.seed(1)
KMC = kmeans(healthyVector, centers = k, iter.max = 1000)
str(KMC)
## List of 9
##  $ cluster     : int [1:365636] 3 3 3 3 3 3 3 3 3 3 ...
##  $ centers     : num [1:5, 1] 0.4818 0.1062 0.0196 0.3094 0.1842
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5] "1" "2" "3" "4" ...
##   .. ..$ : NULL
##  $ totss       : num 5775
##  $ withinss    : num [1:5] 96.6 47.2 39.2 57.5 62.3
##  $ tot.withinss: num 303
##  $ betweenss   : num 5472
##  $ size        : int [1:5] 20556 101085 133162 31555 79278
##  $ iter        : int 2
##  $ ifault      : int 0
##  - attr(*, "class")= chr "kmeans"
# Extract clusters
healthyClusters = KMC$cluster
KMC$centers[2]
## [1] 0.1061945

Analyze the clusters

To output the segmented image we first need to convert the healthyClusters vector back into a matrix.
We do this by assigning dimensions to the vector with dim(), combining the desired numbers of rows and columns (taken from healthyMatrix) with c().

# Plot the image with the clusters
dim(healthyClusters) = c(nrow(healthyMatrix), ncol(healthyMatrix))

image(healthyClusters, axes = FALSE, col=rainbow(k))

[Plot: segmented healthy brain image, k = 5 clusters]

Now we will use the healthy brain vector to analyze a brain with a tumor.

The tumor.csv file corresponds to an MRI brain image of a patient with oligodendroglioma, a tumor that commonly occurs in the frontal lobe of the brain. Since brain biopsy is the only definitive diagnosis of this tumor, MRI guidance is key in determining its location and geometry.

Now, we will apply the k-means clustering results that we found using the healthy brain image on the tumor vector. To do this we use the flexclust package.
The flexclust package contains the object class KCCA, which stands for K-Centroids Cluster Analysis. We need to convert the information from the clustering algorithm to an object of the class KCCA.

# Apply to a test image
 
tumor = read.csv("tumor.csv", header=FALSE)
tumorMatrix = as.matrix(tumor)
tumorVector = as.vector(tumorMatrix)

# Apply clusters from before to new image, using the flexclust package
#install.packages("flexclust")
library(flexclust)

KMC.kcca = as.kcca(KMC, healthyVector)
tumorClusters = predict(KMC.kcca, newdata = tumorVector)

# Visualize the clusters
dim(tumorClusters) = c(nrow(tumorMatrix), ncol(tumorMatrix))

f <- function(m) t(m)[,nrow(m):1] # function to rotate our matrix 90 degrees clockwise
tumorClusters <- f(tumorClusters) # rotation achieved by our function


image((tumorClusters), axes = FALSE, col=rainbow(k), useRaster=TRUE)

[Plot: segmented tumor brain image, tumor highlighted in red]

The tumor is the abnormality highlighted in red here, which was not present in the healthy MRI image.


Singular Value Decomposition and Image Processing

The singular value decomposition (SVD) is a factorization of a real or complex matrix. It has many useful applications in signal processing and statistics.

Singular Value Decomposition

SVD is the factorization of an \( m \times n \) matrix \( Y \) into three matrices:

\( Y = U D V^{T} \)

With:

  • \( U \) is an \( m\times n \) orthogonal matrix
  • \( V \) is an \( n\times n \) orthogonal matrix
  • \( D \) is an \( n\times n \) diagonal matrix

In R, the result of svd(Y) is a list of three components named d, u and v, such that Y = U %*% D %*% t(V).


Example

# cleanup
rm(list=ls())

dat <- seq(1,36,2)

Y <- matrix(dat,ncol=6)
Y
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    7   13   19   25   31
## [2,]    3    9   15   21   27   33
## [3,]    5   11   17   23   29   35
# Apply SVD to get U, V, and D
s <- svd(Y)
U <- s$u
V <- s$v
D <- diag(s$d) ##turn it into a matrix
  • We can reconstruct \( Y \):
Yhat <- U %*% D %*% t(V)
Yhat
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    7   13   19   25   31
## [2,]    3    9   15   21   27   33
## [3,]    5   11   17   23   29   35
resid <- Y - Yhat
max(abs(resid))
## [1] 2.309264e-14

Image processing

  • Load the image and convert it to greyscale:
# source("http://bioconductor.org/biocLite.R")
# biocLite("ripa", dependencies=TRUE)
# biocLite("rARPACK", dependencies=TRUE)
# install.packages('devtools')
# library(devtools)
# install_github('ririzarr/rafalib')

# Load libraries
library(rARPACK)
library(ripa)    #  function "imagematrix"
library(EBImage)
library(jpeg)
library(png)
#
library(rafalib)
mypar2()
#
# Read the image
img <- readImage("images/pansy.jpg") 
dim(img)
## [1] 465 600   3
display(img, method = "raster")

[Plot: original pansy image]

# Convert image to greyscale
r <- imagematrix(img, type = "grey")

# display(r, method = "raster", all=TRUE)
display(r, method = "raster")

[Plot: greyscale pansy image]

dim(r)
## [1] 465 600
str(r)
##  imagematrix [1:465, 1:600] 0.345 0.282 0.29 0.306 0.29 ...
##  - attr(*, "type")= chr "grey"
##  - attr(*, "class")= chr [1:2] "imagematrix" "matrix"
  • Apply SVD to get U, V, and D
# Apply SVD to get u, v, and d
r.svd <- svd(r)

u <- r.svd$u
v <- r.svd$v
d <- diag(r.svd$d)
dim(d)
## [1] 465 465
  • Plot the magnitude of the singular values
# check svd$d values 
# Plot the magnitude of the singular values
sigmas = r.svd$d # vector of singular values (the diagonal entries of D)
plot(1:length(r.svd$d), r.svd$d, xlab="i-th r.svd$d", ylab="r.svd$d",  main="Singular Values");

[Plot: magnitude of the singular values]

plot(1:length(r.svd$d), cumsum(r.svd$d) / sum(r.svd$d), main="Cumulative Percent of Total Sigmas");

[Plot: cumulative percent of total sigmas]

Note that the sum of the first n singular values divided by the sum of all the singular values is the percentage of “information” that those singular values contain. If we want to keep 90% of the information, we just need to compute sums of singular values until we reach 90% of the sum, and discard the rest of the singular values.
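As a quick sketch, the smallest number of singular values that keeps 90% of the information (in the sense used above) can be read off directly:

k90 <- which(cumsum(r.svd$d) / sum(r.svd$d) >= 0.9)[1]
k90   # first k whose cumulative share of the singular values reaches 90%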

# first approximation (rank 1): first columns of u and v, first singular value
u1 <- as.matrix(u[, 1])
v1 <- as.matrix(v[, 1])
d1 <- as.matrix(d[1, 1])
l1 <- u1 %*% d1 %*% t(v1)
l1g <- imagematrix(l1, type = "grey")
#plot(l1g, useRaster = TRUE)
display(l1g, method = "raster", all=TRUE)

[Plot: rank-1 approximation of the image]

# more approximation
depth <- 5
us <- as.matrix(u[, 1:depth])
vs <- as.matrix(v[, 1:depth])
ds <- as.matrix(d[1:depth, 1:depth])
ls <- us %*% ds %*% t(vs)
lsg <- imagematrix(ls, type = "grey")

## Warning: Pixel values were automatically clipped because of range over.

#plot(lsg, useRaster = TRUE)
display(lsg, method = "raster")

[Plot: rank-5 approximation of the image]

# more approximation
depth <- 20
us <- as.matrix(u[, 1:depth])
vs <- as.matrix(v[, 1:depth])
ds <- as.matrix(d[1:depth, 1:depth])
ls <- us %*% ds %*% t(vs)
lsg <- imagematrix(ls, type = "grey")

## Warning: Pixel values were automatically clipped because of range over.

#plot(lsg, useRaster = TRUE)
display(lsg, method = "raster")

[Plot: rank-20 approximation of the image]

Image Compression with the SVD

Here we continue to show how the SVD can be used for image compression (as we have seen above).

factorize = function(m, k){
  r = svds(m[, , 1], k);
  g = svds(m[, , 2], k);
  b = svds(m[, , 3], k);
  return(list(r = r, g = g, b = b));
}

recoverimg = function(lst, k){
  recover0 = function(fac, k){
    dmat = diag(k);
    diag(dmat) = fac$d[1:k];
    m = fac$u[, 1:k] %*% dmat %*% t(fac$v[, 1:k]);
    m[m < 0] = 0;
    m[m > 1] = 1;
    return(m);
  }

  r = recover0(lst$r, k);
  g = recover0(lst$g, k);
  b = recover0(lst$b, k);
  m = array(0, c(nrow(r), ncol(r), 3));
  m[, , 1] = r;
  m[, , 2] = g;
  m[, , 3] = b;
  return(m);
}

rawimg <- readJPEG("images/pansy.jpg");

lst = factorize(rawimg, 100);

neig = c(1, 5, 20, 50, 100);
for(i in neig){
  m = recoverimg(lst, i);
  writeJPEG(m, sprintf("images/svd_%d.jpg", i), 0.95);
  #display(m, method = "raster")
  fname <- sprintf("images/svd_%d.jpg", i)
  display(readImage(fname), title = sprintf("svd_%d", i), method = "raster")
}
[Plots: the original image and its rank-k reconstructions]

  • Original image
  • Singular value k = 1
  • Singular value k = 5
  • Singular value k = 20
  • Singular value k = 50
  • Singular value k = 100

  • Analysis

With only about 10% of the original data we are able to create a very good approximation of the original image.
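This is easy to quantify: a rank-k approximation stores only k*(m + n + 1) numbers per colour channel instead of m*n. For this 465 x 600 image, for example:

m <- 465; n <- 600; k <- 20
(k * (m + n + 1)) / (m * n)   # about 0.076, i.e. under 10% of the original data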


Image Processing and Spatial linear transformations

We can think of an image as a function \( f \) from \( \mathbb{R}^2 \) to \( \mathbb{R} \) (or as a 2D signal):

  • f (x,y) gives the intensity at position (x,y)

Realistically, we expect the image only to be defined over a rectangle, with a finite range:
\( f: [a,b] \times [c,d] \to [0,1] \)

A color image is just three functions pasted together. We can write this as a “vector-valued” function: \( f(x,y) = (r(x,y),\, g(x,y),\, b(x,y)) \)

  • Computing Transformations

If you have a transformation matrix, you can evaluate the transformation by multiplying the transformation matrix by the original array of points.
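For example (a minimal sketch with made-up points), using the clockwise rotation convention from the next section, the corners of the unit square are stored as columns and transformed with a single matrix multiplication:

pts <- rbind(x = c(0, 1, 1, 0),    # unit square corners, one per column
             y = c(0, 0, 1, 1))
theta <- pi/6
m <- matrix(c(cos(theta),  sin(theta),
              -sin(theta), cos(theta)), nrow = 2, byrow = TRUE)
m %*% pts                          # rotated coordinates of the four corners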

Examples of Transformations in 2D Graphics

In 2D graphics, linear transformations can be represented by 2x2 matrices. The most common transformations, such as rotation, scaling, shearing, and reflection, are linear transformations and can be represented by a 2x2 matrix. Other affine transformations can be represented by a 3x3 matrix.
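The 3x3 form works in homogeneous coordinates, where the translation \( (t_x, t_y) \) rides along with the linear part:

\( \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a & b & t_x \\ c & d & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \)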

Rotation

For rotation by an angle \( \theta \) clockwise about the origin, the functional form is \( x' = x\cos\theta + y\sin\theta \) and \( y' = -x\sin\theta + y\cos\theta \). Written in matrix form, this becomes:

\( \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \)

Scaling

For scaling we have \( x' = s_x \cdot x \) and \( y' = s_y \cdot y \). The matrix form is:

\( \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \)

Shearing

For shear mapping (visually similar to slanting), there are two possibilities.
A shear parallel to the x axis has \( x' = x + ky \) and \( y' = y \); the shear matrix, applied to column vectors, is:

\( \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \)

A shear parallel to the y axis has \( x' = x \) and \( y' = y + kx \), which has matrix form:

\( \begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix} \)

Image Processing

The package EBImage is an R package which provides general purpose functionality for the reading, writing, processing and analysis of images.

# source("http://bioconductor.org/biocLite.R")
# biocLite()
# biocLite("EBImage")
library(EBImage)

# Reading Image
img <- readImage("images/lena_std.tif")

#str(img1)
dim(img)
## [1] 512 512   3
display(img, method = "raster")

[Plot: original Lena image]

Image Properties

Images are stored as multi-dimensional arrays containing the pixel intensities. All EBImage functions are also able to work with matrices and arrays.

print(img)
## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 512 512 3 
##   frames.total : 3 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6,1]
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.8862745 0.8862745 0.8862745 0.8862745 0.8862745 0.8901961
## [2,] 0.8862745 0.8862745 0.8862745 0.8862745 0.8862745 0.8901961
## [3,] 0.8745098 0.8745098 0.8745098 0.8745098 0.8745098 0.8901961
## [4,] 0.8745098 0.8745098 0.8745098 0.8745098 0.8745098 0.8705882
## [5,] 0.8862745 0.8862745 0.8862745 0.8862745 0.8862745 0.8862745
  • Adjusting Brightness
img1 <- img + 0.2
img2 <- img - 0.3
display(img1, method = "raster")

[Plot: brightened image (img + 0.2)]

display(img2, method = "raster")

[Plot: darkened image (img - 0.3)]

  • Adjusting Contrast
img1 <- img * 0.5
img2 <- img * 2
display(img1, method = "raster")

[Plot: reduced contrast (img * 0.5)]

display(img2, method = "raster")

[Plot: increased contrast (img * 2)]

  • Gamma Correction
img1 <- img ^ 4
img2 <- img ^ 0.9
img3 <- (img + 0.2) ^3
display(img1, method = "raster")

[Plot: gamma correction (img ^ 4)]

display(img2, method = "raster")

[Plot: gamma correction (img ^ 0.9)]

display(img3, method = "raster")

[Plot: gamma correction ((img + 0.2) ^ 3)]

  • Cropping Image
img1 <-img[(1:400)+100, 1:400,]
display(img1, method = "raster")

[Plot: cropped image]

Spatial Transformation

Spatial image transformations are done with the functions resize, rotate, translate and the functions flip and flop to reflect images.

Next we show the functions flip, flop, rotate and translate:

y <- flip(img)
display(y, title='flip(img)', method = "raster")

[Plot: flip(img)]

y = flop(img) 
display(y, title='flop(img)', method = "raster")

[Plot: flop(img)]

y <- rotate(img, 30) 
display(y, title='rotate(img, 30)', method = "raster")

[Plot: rotate(img, 30)]

y <- translate(img, c(120, -20)) 
display(y, title='translate(img, c(120, -20))', method = "raster")

[Plot: translate(img, c(120, -20))]

All spatial transforms except flip and flop are based on the general affine transformation. In EBImage, affine(img, m) expects a 3x2 matrix m: the first two rows hold the linear part and the third row the translation, so a pixel location is mapped as c(x, y, 1) %*% m. The examples below all follow this convention.

Linear transformations using the function affine:

  • Horizontal flip
m <- matrix(c(-1, 0, 0, 1, 512, 0), nrow=3,  ncol=2, byrow = TRUE)
m  # Horizontal flip
##      [,1] [,2]
## [1,]   -1    0
## [2,]    0    1
## [3,]  512    0
display(affine(img, m),  method = "raster")

[Plot: horizontally flipped image]

  • Horizontal shear
m <- matrix(c(1, 1/2, 0, 1, 0, -100), nrow=3,  ncol=2, byrow = TRUE)
m  # horizontal shear  r = 1/2 
##      [,1]   [,2]
## [1,]    1    0.5
## [2,]    0    1.0
## [3,]    0 -100.0
display(affine(img, m),  method = "raster")

[Plot: horizontally sheared image (r = 1/2)]

  • Rotation by π/6
m <- matrix(c(cos(pi/6), -sin(pi/6), sin(pi/6) , cos(pi/6), -100, 100), nrow=3,  ncol=2, byrow = TRUE)
m  # Rotation by π/6
##              [,1]        [,2]
## [1,]    0.8660254  -0.5000000
## [2,]    0.5000000   0.8660254
## [3,] -100.0000000 100.0000000
display(affine(img, m),  method = "raster")

[Plot: image rotated by π/6]

  • Squeeze mapping with r=3/2
m <- matrix(c(3/2, 0, 0, 2/3, 0, 100), nrow=3,  ncol=2, byrow = TRUE)
m  # Squeeze mapping with r=3/2
##      [,1]        [,2]
## [1,]  1.5   0.0000000
## [2,]  0.0   0.6666667
## [3,]  0.0 100.0000000
display(affine(img, m),  method = "raster")

[Plot: squeeze mapping with r = 3/2]

  • Scaling by a factor of 3/2
m <- matrix(c(3/2, 0, 0, 3/2, -100, -100), nrow=3,  ncol=2, byrow = TRUE)
m  # Scaling by a factor of 3/2
##        [,1]   [,2]
## [1,]    1.5    0.0
## [2,]    0.0    1.5
## [3,] -100.0 -100.0
display(affine(img, m),  method = "raster")

[Plot: image scaled by a factor of 3/2]

  • Scaling horizontally by a factor of 1/2
m <- matrix(c(1/2, 0, 0, 1, 0, 0), nrow=3,  ncol=2, byrow = TRUE)
m  # scale a figure horizontally  r = 1/2 
##      [,1] [,2]
## [1,]  0.5    0
## [2,]  0.0    1
## [3,]  0.0    0
display(affine(img, m),  method = "raster")

[Plot: image scaled horizontally by a factor of 1/2]
