COVID-19 outbreak: the case of Portugal (Update)

COVID-19: a data epidemic

With the outbreak of COVID-19, one thing is certain: never before has a virus gone so viral on the internet. In particular, a lot of data about the spread of the virus is circulating.

In the space of a few weeks, complex terms like “group immunity” (herd immunity) or “flattening the curve” started to circulate on social networks.

Here we try an approach that demonstrates the limitations of the model from the previous post, “The importance of ‘flattening the curve’”.
This post is based on “Contagiousness of COVID-19 Part I: Improvements of Mathematical Fitting (Guest Post)”. Compared with the previous post, it shows that fitting the data with the SIR model is tricky: the general problem is that the fitting algorithm does not always find its way to the best solution.

Why doesn’t the algorithm converge to the optimal solution?

There are two main reasons why the model is not converging well.

Early stopping of the algorithm

The first reason is that the optim algorithm is stopping too early before it has found the right solution.
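One way to guard against early stopping is to tighten the stopping criteria that optim accepts through its control argument. The sketch below uses a toy objective (the Rosenbrock function) rather than the SIR fit, and the values chosen for maxit and reltol are illustrative assumptions.

```r
# Toy objective with a long, curved valley; it stands in for the RSS surface.
rosenbrock <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2

start <- c(-1.2, 1)

# Deliberately small iteration budget: the search is cut off early.
early <- optim(start, rosenbrock, control = list(maxit = 50))

# Larger budget and tighter relative tolerance: the search can run to convergence.
late <- optim(start, rosenbrock,
              control = list(maxit = 5000, reltol = 1e-12))

early$convergence  # 1 means the iteration limit maxit was reached
late$convergence   # 0 means the stopping tolerance was met
c(early = early$value, late = late$value)
```

Checking the convergence code after every fit is a cheap way to notice that a run was cut short.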

Ill-conditioned problem

The second reason for the bad convergence behaviour of the algorithm is that the problem is ill-conditioned.
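A common remedy for a badly scaled problem is to tell the optimizer the typical magnitude of each parameter. The sketch below shows optim's parscale control on a toy function whose two parameters differ by six orders of magnitude; the function and the scale values are assumptions chosen for illustration, not the SIR fit itself.

```r
# Toy objective with its optimum at p = c(1000, 0.001);
# the two parameters live on very different scales.
f <- function(p) (p[1] / 1000 - 1)^2 + (p[2] * 1000 - 1)^2

start <- c(500, 0.0005)

# Without scaling, both parameters are treated as if they had similar magnitude.
unscaled <- optim(start, f)

# With parscale, the search effectively runs on par / parscale.
scaled <- optim(start, f, control = list(parscale = c(1000, 0.001)))

c(unscaled = unscaled$value, scaled = scaled$value)
scaled$par
```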

The latest data from:

https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

Currently, the model of the previous post gives R0 = 1.2777406.

R0 by the current model

The current model gives R0 = 1.0620105. As we can see, the result is quite different.

Some differences of the SIR model

This SIR model uses the parameters $K = \beta - \gamma$ and $R_0 = \frac{\beta}{\gamma}$
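Assuming $R_0 \neq 1$, the original rates can be recovered from this parameterisation. The short derivation below uses only the two definitions above:

$$
\begin{aligned}
\beta &= R_0 \, \gamma \\
K &= \beta - \gamma = \gamma \, (R_0 - 1) \\
\gamma &= \frac{K}{R_0 - 1}, \qquad \beta = \frac{K \, R_0}{R_0 - 1}
\end{aligned}
$$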

Final conclusions

Based on this model, the peak of the epidemic already occurred on 2020-04-12, with a maximum of 17 870.
This is what we all want, but that does not mean it is correct. Using a mathematical model does not by itself make our conclusions or predictions trustworthy. We need to challenge the premises, namely the underlying data and models.

The importance of “flattening the curve”

Lessons from the 1918 Spanish flu pandemic

The world of 2020 is vastly different from 1918, the year Spanish flu began to spread around the world. By 1920, Spanish flu is thought to have claimed the lives of up to 100 million people. But, as science writer and journalist Laura Spinney notes, many of the public health measures were similar to measures governments are taking today.

This chart of the 1918 Spanish flu shows why social distancing works

In 1918, the city of Philadelphia threw a parade that killed thousands of people. Ignoring warnings of influenza among soldiers preparing for World War I, the march to support the war effort drew 200,000 people who crammed together to watch the procession. Three days later, every bed in Philadelphia’s 31 hospitals was filled with sick and dying patients, infected by the Spanish flu. By the end of the week, more than 4,500 were dead in an outbreak that would claim as many as 100 million people worldwide. By the time Philadelphia’s politicians closed down the city, it was too late.

Chart source: Proceedings of the National Academy of Sciences

A different story played out in St. Louis, just 900 miles away. Within two days of detecting its first cases among civilians, the city closed schools, playgrounds, libraries, courtrooms, and even churches. Work shifts were staggered and streetcar ridership was strictly limited. Public gatherings of more than 20 people were banned. The extreme measures—now known as social distancing, which is being called for by global health agencies to mitigate the spread of the novel coronavirus—kept per capita flu-related deaths in St. Louis to less than half of those in Philadelphia, according to a 2007 paper in the Proceedings of the National Academy of Sciences.

What The 1918 Flu Pandemic Teaches Us About The Coronavirus Outbreak

John Barry, professor at the Tulane University School of Public Health and Tropical Medicine. Author of “The Great Influenza: The Story of the Deadliest Pandemic in History.” (@johnmbarry)
Jack Beatty, On Point news analyst. (@JackBeattyNPR)

The COVID-19 curve

Reducing the basic reproduction number, by drastically reducing contacts or quickly isolating infectious individuals, also reduces the size of the outbreak. We use a simple model to illustrate this point.

The SIR model

The SIR model is one of the simplest compartmental models, and many models are derivatives of this basic form. The model consists of three compartments: S for the number of susceptible, I for the number of infectious, and R for the number of recovered or deceased (or immune) individuals. Each member of the population typically progresses from susceptible to infectious to recovered. This can be shown as a flow diagram in which the boxes represent the different compartments and the arrows the transitions between compartments: S → I → R.

To model the dynamics of the outbreak we need three differential equations, one for the change in each group, where $\beta$ is the parameter that controls the transition between S and I, and $\gamma$ is the parameter that controls the transition between I and R.
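In the standard SIR formulation, with a constant total population $N = S + I + R$ (the symbol $N$ is an assumption here, since the text does not name it), the three equations are usually written as:

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta I S}{N} \\
\frac{dI}{dt} &= \frac{\beta I S}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
$$

The three right-hand sides sum to zero, so the total population stays constant.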

Putting this into R code:
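A minimal sketch of such a function is given below, written in the (time, state, parameters) form that deSolve's ode expects. The function name SIR, the population size N and the example parameter values are illustrative assumptions, not necessarily the post's own code.

```r
# Assumed, illustrative population size.
N <- 1e6

# Right-hand side of the SIR system: returns the three derivatives as a list.
SIR <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    dS <- -beta * I * S / N
    dI <-  beta * I * S / N - gamma * I
    dR <-  gamma * I
    list(c(dS, dI, dR))
  })
}

# Evaluate the derivatives once at an example state with a single infected person.
derivs <- SIR(0, c(S = N - 1, I = 1, R = 0), c(beta = 0.5, gamma = 0.2))[[1]]
derivs
```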

To fit the model to the data we need two things: a solver for differential equations and an optimizer. To solve the differential equations, the function ode from the deSolve package (on CRAN) is an excellent choice; to optimize, we will use the optim function from base R. Concretely, we will minimize the sum of the squared differences between the number of infected $I(t)$ at time $t$ and the corresponding number of cases predicted by our model, $\hat{I}(t)$:

$RSS(\beta, \gamma) = \sum_{t} \left( I(t) - \hat{I}(t) \right)^2$

Putting it all together:
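The sketch below shows how the pieces could fit together: ode integrates the model, RSS measures the misfit, and optim searches for the best $\beta$ and $\gamma$. It runs on synthetic data generated from known parameters, not on the real case counts; the population size, starting values, bounds and the L-BFGS-B method are all assumptions made for illustration.

```r
library(deSolve)  # provides ode()

N    <- 1000                         # assumed population size
init <- c(S = N - 1, I = 1, R = 0)   # assumed initial state
days <- 0:40

SIR <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    dS <- -beta * I * S / N
    dI <-  beta * I * S / N - gamma * I
    dR <-  gamma * I
    list(c(dS, dI, dR))
  })
}

# Synthetic "observed" infections from known parameters (not real data).
truth    <- ode(y = init, times = days, func = SIR,
                parms = c(beta = 0.5, gamma = 0.2))
infected <- truth[, "I"]

# Residual sum of squares between observed and predicted infections.
RSS <- function(par) {
  names(par) <- c("beta", "gamma")
  fit <- ode(y = init, times = days, func = SIR, parms = par)
  sum((infected - fit[, "I"])^2)
}

start <- c(0.4, 0.3)
opt <- optim(start, RSS, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(1, 1))

opt$par                  # fitted beta and gamma
opt$par[1] / opt$par[2]  # implied R0
opt$message              # worth checking: it reports why the search stopped
```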

We can now extract some interesting statistics. One important number is the so-called basic reproduction number (also basic reproduction ratio) $R_0$ (pronounced “R naught”) which basically shows how many healthy people get infected by a sick person on average: $R_0 = \frac{\beta}{\gamma}$

According to this model, the height of a possible pandemic would be reached by 2020-04-28 (57 days after it started). About 330 698 people would be infected by then, which translates to about 66 140 severe cases, about 19 842 cases in need of intensive care and up to 6 614 deaths.
Those are the numbers our model produces and nobody knows whether they are correct while everybody hopes they are not.
Do not panic! All of this is hopefully (probably!) false. When you play along with the above model you will see that the fitted parameters are far from stable…
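The severity figures quoted above are consistent with fixed shares of the peak number of infected. The shares used below (20%, 6% and 2%) are inferred from the quoted numbers rather than stated in the text, so treat them as an assumption.

```r
peak_infected <- 330698

# Shares implied by the quoted figures (inferred, not stated in the text).
shares <- c(severe = 0.20, intensive_care = 0.06, deaths = 0.02)

estimates <- round(peak_infected * shares)
estimates
```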

A second run of the same model gives slightly different figures: the height of a possible pandemic would be reached by 2020-04-27 (56 days after it started), with about 344 800 people infected, which translates to about 68 960 severe cases, about 20 688 cases in need of intensive care and up to 6 896 deaths.
As before, nobody knows whether these numbers are correct, and the fitted parameters are far from stable.

MRI image segmentation

Magnetic Resonance Imaging (MRI) is a medical imaging technique used to detect irregularities in the human body. Segmentation of brain MRI images is one of the methods radiographers use to detect abnormalities in the brain.

In digital image processing, segmentation refers to the process of splitting the observed image data into a series of non-overlapping, homogeneous regions. Clustering is one of the techniques used in segmentation.
Clustering in pattern recognition is the process of partitioning a set of pattern vectors into subsets called clusters.

There are various image segmentation techniques based on clustering. One example is the K-means clustering.

Image Segmentation

Let’s try hierarchical clustering with an MRI image of the brain.
The healthy data set consists of a matrix of intensity values.

To use hierarchical clustering we first need to convert the healthy matrix to a vector. And then we need to compute the distance matrix.

R gives us an error that seems to tell us that our vector is huge, and R cannot allocate enough memory.

Let’s see the structure of the healthy vector.

The healthy vector has 365636 elements. Let’s call this number n. To build the distance matrix, R would need to calculate and store n*(n-1)/2 pairwise distances.
This number is $6.6844659 \times 10^{10}$, so we cannot use hierarchical clustering.
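The size of the problem can be checked with a few lines of arithmetic. The memory estimate assumes each distance is stored as an 8-byte double, which is how R stores numeric values.

```r
n <- 365636                # length of the healthy vector

pairs <- n * (n - 1) / 2   # number of pairwise distances
pairs

bytes <- pairs * 8         # assuming 8 bytes per stored distance
bytes / 2^30               # size in GiB
```

At several hundred GiB, the distance matrix is far beyond the memory of an ordinary workstation.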

Now let’s try the k-means clustering algorithm, which aims to partition the data into k clusters in such a way that each data point belongs to the cluster whose mean is nearest to it.
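A minimal sketch of this step with base R's kmeans is shown below. It runs on a small synthetic intensity matrix with two obvious groups, not on the MRI data; the variable names and the choice of k = 2 are assumptions for this toy example.

```r
set.seed(1)  # k-means starts from random centres

# Synthetic stand-in for the intensity matrix: dark and bright pixels.
healthyMatrix <- matrix(c(rnorm(200, 0.2, 0.02), rnorm(200, 0.8, 0.02)),
                        nrow = 20)
healthyVector <- as.vector(healthyMatrix)

k <- 2  # assumed number of clusters for this toy example
KMC <- kmeans(healthyVector, centers = k, iter.max = 1000)

str(KMC$cluster)  # one cluster label per pixel
KMC$centers       # mean intensity of each cluster
KMC$size          # number of pixels in each cluster
```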

Analyze the clusters

To output the segmented image we first need to convert the healthyClusters vector to a matrix.
We will use the dim function, which takes the healthyClusters vector as input, and assign to it the number of rows and the number of columns that we want, combined with the c function.
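The reshaping step can be sketched as follows on synthetic stand-ins; the 4 x 5 size and the alternating labels are assumptions used only to keep the example small.

```r
# Synthetic stand-ins: a 4 x 5 image and one cluster label per pixel.
healthyMatrix   <- matrix(runif(20), nrow = 4, ncol = 5)
healthyClusters <- rep(1:2, length.out = length(healthyMatrix))

# Assigning to dim() reshapes the vector in place, filling column by column.
dim(healthyClusters) <- c(nrow(healthyMatrix), ncol(healthyMatrix))

# Draw the segmented image, one colour per cluster.
image(healthyClusters, axes = FALSE, col = rainbow(2))
```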

Now we will use the healthy brain vector to analyze a brain with a tumor

The tumor.csv file corresponds to an MRI brain image of a patient with oligodendroglioma, a tumor that commonly occurs in the frontal lobe of the brain. Since brain biopsy is the only definitive diagnosis of this tumor, MRI guidance is key in determining its location and geometry.

Now, we will apply the k-means clustering results that we found using the healthy brain image on the tumor vector. To do this we use the flexclust package.
The flexclust package contains the object class KCCA, which stands for K-Centroids Cluster Analysis. We need to convert the information from the clustering algorithm to an object of the class KCCA.
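Conceptually, this prediction step assigns each pixel of the new image to the nearest cluster centre found on the healthy image. The sketch below does that in base R on synthetic data. It illustrates the idea only and does not use or reproduce the flexclust API; all names and values are assumptions.

```r
set.seed(1)

# Synthetic "healthy" intensities with two clear groups.
healthyVector <- c(rnorm(100, 0.2, 0.02), rnorm(100, 0.8, 0.02))
KMC <- kmeans(healthyVector, centers = 2)

# Synthetic intensities from a second image.
tumorVector <- c(0.19, 0.21, 0.79, 0.81)

# Distance from every new pixel (rows) to every healthy centre (columns).
centres   <- as.vector(KMC$centers)
distances <- abs(outer(tumorVector, centres, "-"))

# Label each new pixel with the index of its nearest centre.
tumorClusters <- apply(distances, 1, which.min)
tumorClusters
```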

The tumor is the abnormal region highlighted in red, which was not present in the healthy MRI image.

Singular Value Decomposition and Image Processing

The singular value decomposition (SVD) is a factorization of a real or complex matrix. It has many useful applications in signal processing and statistics.

Singular Value Decomposition

SVD is the factorization of an $m \times n$ matrix $Y$ into three matrices:

$Y = U D V^T$

where (for $m \ge n$):

• $U$ is an $m\times n$ matrix with orthonormal columns
• $V$ is an $n\times n$ orthogonal matrix
• $D$ is an $n\times n$ diagonal matrix

In R, the result of svd(Y) is a list of three components named d, u and v, such that Y = U %*% D %*% t(V), where D = diag(d).


Example

• we can reconstruct Y
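A small sketch of the reconstruction, using an arbitrary 3 x 2 matrix chosen for illustration:

```r
Y <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)  # small 3 x 2 example

s <- svd(Y)
dim(s$u)  # 3 x 2
dim(s$v)  # 2 x 2
s$d       # singular values, largest first

# Rebuild Y from the three components.
Y_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
all.equal(Y, Y_rebuilt)
```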

Image processing

• Load the image and convert it to a greyscale:

• Apply SVD to get U, V, and D
• Plot the magnitude of the singular values

Note that the sum of the first k singular values divided by the sum of all the singular values is the percentage of “information” that those singular values contain. If we want to keep 90% of the information, we just need to compute cumulative sums of the singular values until we reach 90% of the total, and discard the rest of the singular values.
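The 90% rule can be sketched with cumsum. The matrix below is random synthetic data standing in for a greyscale image, so the resulting k is only illustrative.

```r
set.seed(42)
Y <- matrix(rnorm(50 * 30), nrow = 50)  # synthetic stand-in for an image

d <- svd(Y)$d

# Share of the total carried by the first k singular values, for every k.
share <- cumsum(d) / sum(d)

# Smallest k whose cumulative share reaches 90%.
k <- which(share >= 0.9)[1]
k
share[k]
```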

Image Compression with the SVD

Here we continue to show how the SVD can be used for image compression (as we have seen above).

• Original image
• Singular Value k = 1
• Singular Value k = 5
• Singular Value k = 20
• Singular Value k = 50
• Singular Value k = 100

• Analysis

With only 10% of the real data we are able to create a very good approximation of the real data.
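A rank-k approximation keeps only the k largest singular values and their vectors. The sketch below computes the reconstruction error and the storage needed for several values of k. It uses a random synthetic matrix, so the numbers will differ from those of a real photograph, and the helper name rank_k is hypothetical.

```r
# Rank-k approximation from the truncated SVD.
rank_k <- function(Y, k) {
  s <- svd(Y)
  s$u[, 1:k, drop = FALSE] %*%
    diag(s$d[1:k], nrow = k) %*%
    t(s$v[, 1:k, drop = FALSE])
}

set.seed(42)
Y <- matrix(runif(200 * 100), nrow = 200)  # synthetic stand-in for an image

ks <- c(1, 5, 20, 50, 100)

# Relative reconstruction error (Frobenius norm) for each k.
errors <- sapply(ks, function(k) norm(Y - rank_k(Y, k), "F") / norm(Y, "F"))

# Numbers stored by the truncated factors, relative to the full matrix.
storage <- ks * (nrow(Y) + ncol(Y) + 1) / (nrow(Y) * ncol(Y))

data.frame(k = ks, error = errors, storage = storage)
```

Real images usually have rapidly decaying singular values, which is why a small k can already give a good approximation; random data like this does not compress nearly as well.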

Image Processing and Spatial linear transformations

We can think of an image as a function $f: \mathbb{R}^2 \rightarrow \mathbb{R}$ (or a 2D signal):

• $f(x,y)$ gives the intensity at position $(x,y)$

Realistically, we expect the image to be defined only over a rectangle, with a finite range:
$f: [a,b] \times [c,d] \rightarrow [0,1]$

A color image is just three functions pasted together. We can write this as a “vector-valued” function:

$f(x,y) = \begin{bmatrix} r(x,y) \\ g(x,y) \\ b(x,y) \end{bmatrix}$

• Computing Transformations

If you have a transformation matrix you can evaluate the transformation that would be performed by multiplying the transformation matrix by the original array of points.

Examples of Transformations in 2D Graphics

In 2D graphics, linear transformations can be represented by 2x2 matrices. Most common transformations, such as rotation, scaling, shearing, and reflection, are linear and can be represented by a 2x2 matrix. Other affine transformations can be represented by a 3x3 matrix.

Rotation

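For rotation by an angle θ clockwise about the origin, the functional form is $x' = x\cos\theta + y\sin\theta$
and $y' = -x\sin\theta + y\cos\theta$. Written in matrix form, this becomes:
$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$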

Scaling

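For scaling we have $x' = s_x \cdot x$ and $y' = s_y \cdot y$. The matrix form is:
$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$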

Shearing

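For shear mapping (visually similar to slanting), there are two possibilities.
A shear parallel to the x axis has $x' = x + ky$ and $y' = y$; the shear matrix, applied to column vectors, is:
$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$

A shear parallel to the y axis has $x' = x$ and $y' = y + kx$, which has matrix form:
$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$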
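The transformations above can be tried out by multiplying each matrix by an array of points. The sketch below applies a clockwise rotation, a scaling and the two shears to the corners of the unit square; the angle, scale factors and shear constant are arbitrary example values.

```r
theta <- pi / 2  # example angle (90 degrees, clockwise)

# byrow = TRUE so the code reads like the matrices written in the text.
rotation <- matrix(c( cos(theta), sin(theta),
                     -sin(theta), cos(theta)), nrow = 2, byrow = TRUE)
scaling  <- matrix(c(2, 0,
                     0, 3), nrow = 2, byrow = TRUE)    # s_x = 2, s_y = 3
shear_x  <- matrix(c(1, 0.5,
                     0, 1), nrow = 2, byrow = TRUE)    # k = 0.5
shear_y  <- matrix(c(1,   0,
                     0.5, 1), nrow = 2, byrow = TRUE)  # k = 0.5

# Corners of the unit square, one point per column.
square <- rbind(x = c(0, 1, 1, 0),
                y = c(0, 0, 1, 1))

# Each product transforms all four corners at once.
round(rotation %*% square, 10)
scaling %*% square
shear_x %*% square
shear_y %*% square
```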

Image Processing

The package EBImage is an R package which provides general purpose functionality for the reading, writing, processing and analysis of images.

Image Properties

Images are stored as multi-dimensional arrays containing the pixel intensities. All EBImage functions are also able to work with matrices and arrays.

• Gamma Correction

• Cropping Image
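Because images are stored as arrays of intensities, gamma correction and cropping reduce to array arithmetic and indexing. The sketch below uses a plain base-R matrix as a synthetic image, with no EBImage calls; the exponent and crop window are arbitrary example values.

```r
# Synthetic greyscale image with intensities in [0, 1].
img <- matrix(seq(0, 1, length.out = 100), nrow = 10)

# Gamma correction: raise every intensity to a power.
# An exponent below 1 brightens the mid-tones.
gamma_corrected <- img ^ 0.5

# Cropping: keep a rectangular window by indexing rows and columns.
cropped <- img[3:8, 2:5]

range(gamma_corrected)
dim(cropped)
```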

Spatial Transformation

Spatial image transformations are done with the functions resize, rotate, translate and the functions flip and flop to reflect images.

Next we show the functions flip, flop, rotate and translate:

All spatial transforms except flip and flop are based on the general affine transformation.

Linear transformations using the function affine:

• Horizontal flip

• Horizontal shear $m = \begin{bmatrix} 1 & 1/2 \\ 0 & 1 \end{bmatrix}$

• Rotation by π/6 $m = \begin{bmatrix} \cos(\pi/6) & -\sin(\pi/6) \\ \sin(\pi/6) & \cos(\pi/6) \end{bmatrix}$

• Squeeze mapping with r=3/2 $m = \begin{bmatrix} 3/2 & 0 \\ 0 & 2/3 \end{bmatrix}$

• Scaling by a factor of 3/2 $m = \begin{bmatrix} 3/2 & 0 \\ 0 & 3/2 \end{bmatrix}$

• Scaling horizontally by a factor of 1/2 $m = \begin{bmatrix} 1/2 & 0 \\ 0 & 1 \end{bmatrix}$