Home     About     Archive     Links

MRI image segmentation

Magnetic Resonance Imaging (MRI) is a medical image technique used to sense the irregularities in human bodies. Segmentation technique for Magnetic Resonance Imaging (MRI) of the brain is one of the method used by radiographer to detect any abnormality happened specifically for brain.

In digital image processing, segmentation refers to the process of splitting observe image data to a serial of non-overlapping important homogeneous region. Clustering algorithm is one of the process in segmentation.
Clustering in pattern recognition is the process of partitioning a set of pattern vectors in to subsets called clusters.

There are various image segmentation techniques based on clustering. One example is the K-means clustering.

Image Segmentation

Let's try the Hierarchial clustering with an MRI image of the brain.
The healthy data set consists of a matrix of intensity values.

healthy = read.csv("healthy.csv", header=FALSE)
healthyMatrix = as.matrix(healthy)
str(healthyMatrix)
##  num [1:566, 1:646] 0.00427 0.00855 0.01282 0.01282 0.01282 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:646] "V1" "V2" "V3" "V4" ...
# Plot image
image(healthyMatrix,axes=FALSE,col=grey(seq(0,1,length=256)))

plot of chunk unnamed-chunk-1

To use hierarchical clustering we first need to convert the healthy matrix to a vector. And then we need to compute the distance matrix.

# Hierarchial clustering
healthyVector = as.vector(healthyMatrix)
distance = dist(healthyVector, method = "euclidean")
## Error: cannot allocate vector of size 498.0 Gb

R gives us an error that seems to tell us that our vector is huge, and R cannot allocate enough memory.

Let's see the structure of the healthy vector.

str(healthyVector)
##  num [1:365636] 0.00427 0.00855 0.01282 0.01282 0.01282 ...
n <- 365636

The healthy vector has 365636 elements. Let's call this number n. For R to calculate the pairwise distances, it would need to calculate n*(n-1)/2 and store them in the distance matrix.
This number is 6.6844659 × 1010. So we cannot use hierarchical clustering.

Now let's try use the k-means clustering algorithm, that aims at partitioning the data into k clusters, in a way that each data point belongs to the cluster whose mean is the nearest to it.

# Specify number of clusters.
# Our clusters would ideally assign each point in the image to a tissue class or a particular substance, for instance, grey matter or white matter, and so on. And these substances are known to the medical community. So setting the number of clusters depends on exactly what you're trying to extract from the image.
k = 5

# Run k-means
set.seed(1)
KMC = kmeans(healthyVector, centers = k, iter.max = 1000)
str(KMC)
## List of 9
##  $ cluster     : int [1:365636] 3 3 3 3 3 3 3 3 3 3 ...
##  $ centers     : num [1:5, 1] 0.4818 0.1062 0.0196 0.3094 0.1842
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5] "1" "2" "3" "4" ...
##   .. ..$ : NULL
##  $ totss       : num 5775
##  $ withinss    : num [1:5] 96.6 47.2 39.2 57.5 62.3
##  $ tot.withinss: num 303
##  $ betweenss   : num 5472
##  $ size        : int [1:5] 20556 101085 133162 31555 79278
##  $ iter        : int 2
##  $ ifault      : int 0
##  - attr(*, "class")= chr "kmeans"
# Extract clusters
healthyClusters = KMC$cluster
KMC$centers[2]
## [1] 0.1061945

Analyze the clusters

To output the segmented image we first need to convert the vector healthy clusters to a matrix.
We will use the dimension function, that takes as an input the healthyClusters vector. We turn it into a matrix using the combine function, the number of rows, and the number of columns that we want.

# Plot the image with the clusters
dim(healthyClusters) = c(nrow(healthyMatrix), ncol(healthyMatrix))

image(healthyClusters, axes = FALSE, col=rainbow(k))

plot of chunk unnamed-chunk-5

Now we will use the healthy brain vector to analyze a brain with a tumor

The tumor.csv file corresponds to an MRI brain image of a patient with oligodendroglioma, a tumor that commonly occurs in the front lobe of the brain. Since brain biopsy is the only definite diagnosis of this tumor, MRI guidance is key in determining its location and geometry.

Now, we will apply the k-means clustering results that we found using the healthy brain image on the tumor vector. To do this we use the flexclust package.
The flexclust package contains the object class KCCA, which stands for K-Centroids Cluster Analysis. We need to convert the information from the clustering algorithm to an object of the class KCCA.

# Apply to a test image
 
tumor = read.csv("tumor.csv", header=FALSE)
tumorMatrix = as.matrix(tumor)
tumorVector = as.vector(tumorMatrix)

# Apply clusters from before to new image, using the flexclust package
#install.packages("flexclust")
library(flexclust)

KMC.kcca = as.kcca(KMC, healthyVector)
tumorClusters = predict(KMC.kcca, newdata = tumorVector)

# Visualize the clusters
dim(tumorClusters) = c(nrow(tumorMatrix), ncol(tumorMatrix))

f <- function(m) t(m)[,nrow(m):1] # function to rotate our matrix 90 degrees clockwise
tumorClusters <- f(tumorClusters) # rotation achieved by our function


image((tumorClusters), axes = FALSE, col=rainbow(k), useRaster=TRUE)

plot of chunk unnamed-chunk-6

The tumor is the abnormal substance here that is highlighted in red that was not present in the healthy MRI image.

References

Singular Value Decomposition and Image Processing

The singular value decomposition (SVD) is a factorization of a real or complex matrix. It has many useful applications in signal processing and statistics.

Singular Value Decomposition

SVD is the factorization of a \( m \times n \) matrix \( Y \) into three matrices as:

$$\mathbf{Y = UDV^\top}$$

With:

  • \( U \) is an \( m\times n \) orthogonal matrix
  • \( V \) is an \( n\times n \) orthogonal matrix
  • \( D \) is an \( n\times n \) diagonal matrix

In R The result of svd(X) is actually a list of three components named d, u and v, such that Y = U %*% D %*% t(V).

Video about SVD

Example

# cleanup
rm(list=ls())

dat <- seq(1,36,2)

Y <- matrix(dat,ncol=6)
Y
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    7   13   19   25   31
## [2,]    3    9   15   21   27   33
## [3,]    5   11   17   23   29   35
# Apply SVD to get U, V, and D
s <- svd(Y)
U <- s$u
V <- s$v
D <- diag(s$d) ##turn it into a matrix
  • we can reconstruct Y
Yhat <- U %*% D %*% t(V)
Yhat
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    7   13   19   25   31
## [2,]    3    9   15   21   27   33
## [3,]    5   11   17   23   29   35
resid <- Y - Yhat
max(abs(resid))
## [1] 2.309264e-14

Image processing

  • Load the image and convert it to a greyscale:
# source("http://bioconductor.org/biocLite.R")
# biocLite("ripa", dependencies=TRUE)
# biocLite("rARPACK", dependencies=TRUE)
# install.packages('devtools')
# library(devtools)
# install_github('ririzarr/rafalib')

# Load libraries
library(rARPACK)
library(ripa)    #  function "imagematrix"
library(EBImage)
library(jpeg)
library(png)
#
library(rafalib)
mypar2()
#
# Read the image
img <- readImage("images/pansy.jpg") 
dim(img)
## [1] 465 600   3
display(img, method = "raster")

plot of chunk unnamed-chunk-3

# Convert image to greyscale
r <- imagematrix(img, type = "grey")

# display(r, method = "raster", all=TRUE)
display(r, method = "raster")

plot of chunk unnamed-chunk-3

dim(r)
## [1] 465 600
str(r)
##  imagematrix [1:465, 1:600] 0.345 0.282 0.29 0.306 0.29 ...
##  - attr(*, "type")= chr "grey"
##  - attr(*, "class")= chr [1:2] "imagematrix" "matrix"
  • Apply SVD to get U, V, and D
# Apply SVD to get u, v, and d
r.svd <- svd(r)

u <- r.svd$u
v <- r.svd$v
d <- diag(r.svd$d)
dim(d)
## [1] 465 465
## [1] 465 465
  • Plot the magnitude of the singular values
# check svd$d values 
# Plot the magnitude of the singular values
sigmas = r.svd$d # diagonal matrix (the entries of which are known as singular values)
plot(1:length(r.svd$d), r.svd$d, xlab="i-th r.svd$d", ylab="r.svd$d",  main="Singular Values");

plot of chunk unnamed-chunk-5

plot(1:length(r.svd$d), cumsum(r.svd$d) / sum(r.svd$d), main="Cumulative Percent of Total Sigmas");

plot of chunk unnamed-chunk-5 Not that, the total of the first n singular values divided by the sum of all the singular values is the percentage of "information" that those singular values contain. If we want to keep 90% of the information, we just need to compute sums of singular values until we reach 90% of the sum, and discard the rest of the singular values.

# first approximation
u1 <- as.matrix(u[-1, 1])
v1 <- as.matrix(v[-1, 1])
d1 <- as.matrix(d[1, 1])
l1 <- u1 %*% d1 %*% t(v1)
l1g <- imagematrix(l1, type = "grey")
#plot(l1g, useRaster = TRUE)
display(l1g, method = "raster", all=TRUE)

plot of chunk unnamed-chunk-6

# more approximation
depth <- 5
us <- as.matrix(u[, 1:depth])
vs <- as.matrix(v[, 1:depth])
ds <- as.matrix(d[1:depth, 1:depth])
ls <- us %*% ds %*% t(vs)
lsg <- imagematrix(ls, type = "grey")

## Warning: Pixel values were automatically clipped because of range over.

#plot(lsg, useRaster = TRUE)
display(lsg, method = "raster")

plot of chunk unnamed-chunk-6

# more approximation
depth <- 20
us <- as.matrix(u[, 1:depth])
vs <- as.matrix(v[, 1:depth])
ds <- as.matrix(d[1:depth, 1:depth])
ls <- us %*% ds %*% t(vs)
lsg <- imagematrix(ls, type = "grey")

## Warning: Pixel values were automatically clipped because of range over.

#plot(lsg, useRaster = TRUE)
display(lsg, method = "raster")

plot of chunk unnamed-chunk-6

Image Compression with the SVD

Here we continue to show how the SVD can be used for image compression (as we have seen above).

factorize = function(m, k){
  r = svds(m[, , 1], k);
  g = svds(m[, , 2], k);
  b = svds(m[, , 3], k);
  return(list(r = r, g = g, b = b));
}

recoverimg = function(lst, k){
  recover0 = function(fac, k){
    dmat = diag(k);
    diag(dmat) = fac$d[1:k];
    m = fac$u[, 1:k] %*% dmat %*% t(fac$v[, 1:k]);
    m[m < 0] = 0;
    m[m > 1] = 1;
    return(m);
  }

  r = recover0(lst$r, k);
  g = recover0(lst$g, k);
  b = recover0(lst$b, k);
  m = array(0, c(nrow(r), ncol(r), 3));
  m[, , 1] = r;
  m[, , 2] = g;
  m[, , 3] = b;
  return(m);
}

rawimg <- readJPEG("images/pansy.jpg");

lst = factorize(rawimg, 100);

neig = c(1, 5, 20, 50, 100);
for(i in neig){
  m = recoverimg(lst, i);
  writeJPEG(m, sprintf("images/svd_%d.jpg", i), 0.95);
  #display(m, method = "raster")
  fname <- sprintf("images/svd_%d.jpg", i)
  display(readImage(fname), title="svd_%d", method = "raster")
}
  • Original image
    plot of chunk unnamed-chunk-3
  • Singluar Value k = 1
    plot of chunk unnamed-chunk-7
  • Singluar Value k = 5
    plot of chunk unnamed-chunk-7
  • Singluar Value k = 20
    plot of chunk unnamed-chunk-7
  • Singluar Value k = 50
    plot of chunk unnamed-chunk-7
  • Singluar Value k = 100
    plot of chunk unnamed-chunk-7

  • Analysis

With only 10% of the real data we are able to create a very good approximation of the real data.

References

Image Processing and Spatial linear transformations

We can think of an image as a function, f, from $$\pmb R^2 \rightarrow R$$ (or a 2D signal):

  • f (x,y) gives the intensity at position (x,y)

Realistically, we expect the image only to be defined over a rectangle, with a finite range:
f: [a,b]x[c,d] -> [0,1]

A color image is just three functions pasted together. We can write this as a “vector-valued” function:

$$\pmb {f(x,y)} = \bigg[ \begin{array} {c} & r(x,y) \cr \ & g(x,y) \cr \ & b(x,y) \end{array} \bigg] $$

  • Computing Transformations

If you have a transformation matrix you can evaluate the transformation that would be performed by multiplying the transformation matrix by the original array of points.

Examples of Transformations in 2D Graphics

In 2D graphics Linear transformations can be represented by 2x2 matrices. Most common transformations such as rotation, scaling, shearing, and reflection are linear transformations and can be represented in the 2x2 matrix. Other affine transformations can be represented in a 3x3 matrix.

Rotation

For rotation by an angle θ clockwise about the origin, the functional form is \( x' = xcosθ + ysinθ \)
and \( y' = − xsinθ + ycosθ \). Written in matrix form, this becomes:
$$ \begin{bmatrix} x' \cr \ y' \end{bmatrix} = \begin{bmatrix} \cos \theta & \sin\theta \cr \ -\sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} x \cr \ y \end{bmatrix} $$

Scaling

For scaling we have \( x' = s_x \cdot x \) and \( y' = s_y \cdot y \). The matrix form is:
$$ \begin{bmatrix} x' \cr \ y' \end{bmatrix} = \begin{bmatrix} s_x & 0 \cr \ 0 & s_y \end{bmatrix} \begin{bmatrix} x \cr \ y \end{bmatrix} $$

Shearing

For shear mapping (visually similar to slanting), there are two possibilities.
For a shear parallel to the x axis has \( x' = x + ky \) and \( y' = y \) ; the shear matrix, applied to column vectors, is:
$$ \begin{bmatrix} x' \cr \ y' \end{bmatrix} = \begin{bmatrix} 1 & k \cr \ 0 & 1 \end{bmatrix} \begin{bmatrix} x \cr \ y \end{bmatrix} $$

A shear parallel to the y axis has \( x' = x \) and \( y' = y + kx \) , which has matrix form:
$$ \begin{bmatrix} x' \cr \ y' \end{bmatrix} = \begin{bmatrix} 1 & 0 \cr \ k & 1 \end{bmatrix} \begin{bmatrix} x \cr \ y \end{bmatrix} $$

Image Processing

The package EBImage is an R package which provides general purpose functionality for the reading, writing, processing and analysis of images.

# source("http://bioconductor.org/biocLite.R")
# biocLite()
# biocLite("EBImage")
library(EBImage)

# Reading Image
img <- readImage("images/lena_std.tif")

#str(img1)
dim(img)
## [1] 512 512   3
display(img, method = "raster")

plot of chunk unnamed-chunk-1

Image Properties

Images are stored as multi-dimensional arrays containing the pixel intensities. All EBImage functions are also able to work with matrices and arrays.

print(img)
## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 512 512 3 
##   frames.total : 3 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6,1]
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.8862745 0.8862745 0.8862745 0.8862745 0.8862745 0.8901961
## [2,] 0.8862745 0.8862745 0.8862745 0.8862745 0.8862745 0.8901961
## [3,] 0.8745098 0.8745098 0.8745098 0.8745098 0.8745098 0.8901961
## [4,] 0.8745098 0.8745098 0.8745098 0.8745098 0.8745098 0.8705882
## [5,] 0.8862745 0.8862745 0.8862745 0.8862745 0.8862745 0.8862745
  • Adjusting Brightness
img1 <- img + 0.2
img2 <- img - 0.3
display(img1, method = "raster")

plot of chunk unnamed-chunk-3

display(img2, method = "raster")

plot of chunk unnamed-chunk-3

  • Adjusting Contrast
img1 <- img * 0.5
img2 <- img * 2
display(img1, method = "raster")

plot of chunk unnamed-chunk-4

display(img2, method = "raster")

plot of chunk unnamed-chunk-4

  • Gamma Correction
img1 <- img ^ 4
img2 <- img ^ 0.9
img3 <- (img + 0.2) ^3
display(img1, method = "raster")

plot of chunk unnamed-chunk-5

display(img2, method = "raster")

plot of chunk unnamed-chunk-5

display(img3, method = "raster")

plot of chunk unnamed-chunk-5

  • Cropping Image
img1 <-img[(1:400)+100, 1:400,]
display(img1, method = "raster")

plot of chunk unnamed-chunk-6

Spatial Transformation

Spatial image transformations are done with the functions resize, rotate, translate and the functions flip and flop to reflect images.

Next we show the functions flip, flop, rotate and translate:

y <- flip(img)
display(y, title='flip(img)', method = "raster")

plot of chunk unnamed-chunk-7

y = flop(img) 
display(y, title='flop(img)', method = "raster")

plot of chunk unnamed-chunk-7

y <- rotate(img, 30) 
display(y, title='rotate(img, 30)', method = "raster")

plot of chunk unnamed-chunk-7

y <- translate(img, c(120, -20)) 
display(y, title='translate(img, c(120, -20))', method = "raster")

plot of chunk unnamed-chunk-7

All spatial transforms except flip and flop are based on the general affine transformation.

Linear transformations using the function affine:

  • Horizontal flip

$${m} = \left[ \begin{array}{cc} \ -1 & 0 \cr \ \ 0 & 1 \end{array} \right] $$

$$ \begin{equation} Result = image * m \end{equation} $$

m <- matrix(c(-1, 0, 0, 1, 512, 0), nrow=3,  ncol=2, byrow = TRUE)
m  # Horizontal flip
##      [,1] [,2]
## [1,]   -1    0
## [2,]    0    1
## [3,]  512    0
display(affine(img, m),  method = "raster")

plot of chunk unnamed-chunk-8

  • Horizontal shear $${m} = \left[ \begin{array}{cc} 1, 1/2 \cr \ 0, 1 \end{array} \right] $$
m <- matrix(c(1, 1/2, 0, 1, 0, -100), nrow=3,  ncol=2, byrow = TRUE)
m  # horizontal shear  r = 1/2 
##      [,1]   [,2]
## [1,]    1    0.5
## [2,]    0    1.0
## [3,]    0 -100.0
display(affine(img, m),  method = "raster")

plot of chunk unnamed-chunk-9

  • Rotation by π/6 $${m} = \left[ \begin{array}{cc} cos(pi/6), -sin(pi/6) \cr \ sin(pi/6), cos(pi/6) \end{array} \right] $$
m <- matrix(c(cos(pi/6), -sin(pi/6), sin(pi/6) , cos(pi/6), -100, 100), nrow=3,  ncol=2, byrow = TRUE)
m  # Rotation by π/6
##              [,1]        [,2]
## [1,]    0.8660254  -0.5000000
## [2,]    0.5000000   0.8660254
## [3,] -100.0000000 100.0000000
display(affine(img, m),  method = "raster")

plot of chunk unnamed-chunk-10

  • Squeeze mapping with r=3/2 $${m} = \left[ \begin{array}{cc} 3/2, 0 \cr \ 0, 2/3 \end{array} \right] $$
m <- matrix(c(3/2, 0, 0, 2/3, 0, 100), nrow=3,  ncol=2, byrow = TRUE)
m  # Squeeze mapping with r=3/2
##      [,1]        [,2]
## [1,]  1.5   0.0000000
## [2,]  0.0   0.6666667
## [3,]  0.0 100.0000000
display(affine(img, m),  method = "raster")

plot of chunk unnamed-chunk-11

  • Scaling by a factor of 3/2 $${m} = \left[ \begin{array}{cc} 3/2, 0 \cr \ 0, 3/2 \end{array} \right] $$
m <- matrix(c(3/2, 0, 0, 3/2, -100, -100), nrow=3,  ncol=2, byrow = TRUE)
m  # Scaling by a factor of 3/2
##        [,1]   [,2]
## [1,]    1.5    0.0
## [2,]    0.0    1.5
## [3,] -100.0 -100.0
display(affine(img, m),  method = "raster")

plot of chunk unnamed-chunk-12

  • Scaling horizontally by a factor of 1/2 $${m} = \left[ \begin{array}{cc} 1/2, 0 \cr \ 0, 1 \end{array} \right] $$
m <- matrix(c(1/2, 0, 0, 1, 0, 0), nrow=3,  ncol=2, byrow = TRUE)
m  # scale a figure horizontally  r = 1/2 
##      [,1] [,2]
## [1,]  0.5    0
## [2,]  0.0    1
## [3,]  0.0    0
display(affine(img, m),  method = "raster")

plot of chunk unnamed-chunk-13

References

3D plot of a Klein Bottle

The Klein bottle was discovered in 1882 by Felix Klein and since then has joined the gallery of popular mathematical shapes.
The Klein bottle is a one-sided closed surface named after Klein. We note however this is not a continuous surface in 3-space as the surface cannot go through itself without a discontinuity.

#install.packages('animation', repos = 'http://rforge.net', type = 'source')
#install.packages("plot3D")

library(plot3D)
library(animation)

saveGIF({
  par(mai = c(0.1,0.1,0.1,0.1))
  r = 3
  for(i in 1:100){
    X <- seq(0, 2*pi, length.out = 200)
    #Y <- seq(-15, 6, length.out = 100)
    Y <- seq(0, 2*pi, length.out = 200)
    M <- mesh(X, Y)
    u <- M$x
    v <- M$y
    
    # x, y and z grids
    #   x <- (1.16 ^ v) * cos(v) * (1 + cos(u))
    #   y <- (-1.16 ^ v) * sin(v) * (1 + cos(u))
    #   z <- (-2 * 1.16 ^ v) * (1 + sin(u))

    ######   Klein bottle   ##########
    x <- (r + cos(u/2) * sin(v) - sin(u/2) * sin(2*v)) * cos(u)
    y <- (r + cos(u/2) * sin(v) - sin(u/2) * sin(2*v)) * sin(u)
    z <- sin(u/2) * sin(v) + cos(u/2) * sin(2*v)
    ##################################
    
    # full colored image
    surf3D(x, y, z, colvar = z, 
           col = ramp.col(col = c("red", "red", "orange"), n = 100),
           colkey = FALSE, shade = 0.5, expand = 1.2, box = FALSE, 
           phi = 35, theta = i, lighting = TRUE, ltheta = 560,
           lphi = -i)
  }
}, interval = 0.1, ani.width = 550, ani.height = 350)
## [1] TRUE

Here is the result from the code above:

plot of chunk unnamed-chunk-1

References

Klein Bottle
http://genedan.com/no-39-exploring-sage-as-an-alternative-to-mathematica/
Imaging maths - Inside the Klein bottle
Fun with surf3D function

Visualizing World Development Indicators

Fetching Data from World Bank

The Package WDI (world development indicators) is used to fetch data from WDI.
We can search the data as follow:

# Load required libraries
library('WDI')
library(ggplot2)

# Search
WDIsearch(string = 'GDP per capita')
##       indicator             
##  [1,] "GDPPCKD"             
##  [2,] "GDPPCKN"             
##  [3,] "NV.AGR.PCAP.KD.ZG"   
##  [4,] "NY.GDP.PCAP.CD"      
##  [5,] "NY.GDP.PCAP.KD"      
##  [6,] "NY.GDP.PCAP.KD.ZG"   
##  [7,] "NY.GDP.PCAP.KN"      
##  [8,] "NY.GDP.PCAP.PP.CD"   
##  [9,] "NY.GDP.PCAP.PP.KD"   
## [10,] "NY.GDP.PCAP.PP.KD.ZG"
## [11,] "SE.XPD.PRIM.PC.ZS"   
## [12,] "SE.XPD.SECO.PC.ZS"   
## [13,] "SE.XPD.TERT.PC.ZS"   
##       name                                                                 
##  [1,] "GDP per Capita, constant US$, millions"                             
##  [2,] "Real GDP per Capita (real local currency units, various base years)"
##  [3,] "Real agricultural GDP per capita growth rate (%)"                   
##  [4,] "GDP per capita (current US$)"                                       
##  [5,] "GDP per capita (constant 2000 US$)"                                 
##  [6,] "GDP per capita growth (annual %)"                                   
##  [7,] "GDP per capita (constant LCU)"                                      
##  [8,] "GDP per capita, PPP (current international $)"                      
##  [9,] "GDP per capita, PPP (constant 2005 international $)"                
## [10,] "GDP per capita, PPP annual growth (%)"                              
## [11,] "Expenditure per student, primary (% of GDP per capita)"             
## [12,] "Expenditure per student, secondary (% of GDP per capita)"           
## [13,] "Expenditure per student, tertiary (% of GDP per capita)"

Visualizing GDP per capita

Below we import the data on GDP per capita from years 1980 to 2013 from some countries and we show a simple plot of it: GDP per capita (constant 2000 US$).

dat <- WDI(indicator = c("NY.GDP.PCAP.KD"),    
           country = c("PT","ES", "US", "EU", "BR"),     
           start = 1980, end = 2013)
#
ggplot(dat, aes(year, NY.GDP.PCAP.KD, color = country)) + 
    geom_line() + geom_point() + 
    labs(x = "year", y = "GDP per capita") + 
    ggtitle("GDP per capita by Country (constant 2000 US$)")

plot of chunk unnamed-chunk-2

References

Map Visualization in R

Here I tried to produce some map visualization in R.

  • First using data from GADM database of Global Administrative Areas.
  • Second using the package RWorldMap,
  • Third using the package ggmap that allows visualizations of spatial data on maps retrieved from Google Maps, OpenStreetMap, etc., and
  • Fourth using the package RgoogleMaps allows you to plot data points on any kind of map you can imagine (terrain, satellite, hybrid).

1. Data from GADM database

Getting the spatial country data

library(geosphere) # use: gcIntermediate
library(RgoogleMaps); library(ggmap); library(rworldmap)
library(sp); library(maptools); require(RColorBrewer)

# PRT_adm2.RData download from http://www.gadm.org/
# load("data/PRT_adm2.RData") # Creates and stores in memory an object called ´gadm´
# or:
# con <- url("http://biogeo.ucdavis.edu/data/gadm2/R/PRT_adm1.RData")
# print(load(con))
# close(con)
# or:
# library(raster)
# port1 <- getData('GADM', country='PRT', level=1)

load("data/PRT_adm1.RData")
port1 <- get("gadm")

Plot the map

We can use the variable NAME_1 to plot the map:

# Convert the encoding to UTF-8 in order to avoid the problems with 'accents'.
# port1$NAME_1 <- as.factor(iconv(as.character(port1$NAME_1), , "UTF-8"))
port1$NAME_1 <- as.factor(as.character(port1$NAME_1))

spplot(port1, "NAME_1", 
       col.regions = colorRampPalette(brewer.pal(12, "Set3"))(18), 
       col = "white",
       #cex=0.4,
       xlim = range(-10,-6), ylim = range(36.9,42.2), asp = 1.0
       #scales=list(draw=TRUE)
       )  # Plot the 'NAME_1' form the 'port1' object.

plot of chunk unnamed-chunk-2

And next we created another variable called rainfall in the data.frame, we store random values in that variable:

set.seed(333)
port1$rainfall<-rnorm(length(port1$NAME_1),mean=50,sd=15) #random rainfall value allocation

spplot(port1,"rainfall",col.regions = rev(terrain.colors(port1$rainfall)),
       scales=list(draw=TRUE),
       main="Rainfall (simulated) in Portugal")

plot of chunk unnamed-chunk-3

2. RWorldMap

Rworldmap is a package for visualising global data, referenced by country. It provides maps as spatial polygons.

newmap <- getMap(resolution = "low")
plot(newmap)

plot of chunk unnamed-chunk-4

By Changing the xlim and ylim arguments of the plot function we can limit the display to just Europe.

plot(newmap, xlim = c(-20, 59),  ylim = c(35, 71),  asp = 1 )

plot of chunk unnamed-chunk-5

Geocoding

The geocode function from the ggmap package finds the coordinates of a location using Google Maps. Thus, finding the coordinates of the Extreme points of Europe can be done by:

europe.limits <- geocode(c("CapeFligely,RudolfIsland,Franz Josef Land,Russia", 
                           "Gavdos,Greece", "Faja Grande,Azores",
                           "SevernyIsland,Novaya Zemlya,Russia"))
europe.limits
##         lon      lat
## 1  60.64878 80.58823
## 2  24.08464 34.83469
## 3 -31.26192 39.45479
## 4  56.00000 74.00000

So we can display only the europe as follow, by modifying the xlim and ylim arguments:

plot(newmap, xlim = range(europe.limits$lon),  ylim = range(europe.limits$lat), asp = 1)

plot of chunk unnamed-chunk-7

3. ggmap

  • Fetching a Map
# geocodes
geoCodes <- geocode("Portela,2230")
geoCodes
##         lon      lat
## 1 -8.250595 39.61002
#         lon      lat
# 1 -8.250595 39.61002
ggmap(
    get_googlemap(
        center=c(geoCodes$lon,geoCodes$lat), #Long/lat of centre
        zoom=14, 
        maptype='satellite', #also hybrid/terrain/roadmap
        scale = 2), #resolution scaling, 1 (low) or 2 (high)
    size = c(600, 600), #size of the image to grab
    extent='device', #can also be "normal" etc
    darken = 0) #you can dim the map when plotting on top

plot of chunk unnamed-chunk-9

#ggsave ("images/map1.png", dpi = 200) #this saves the output to a file
  • Plotting on a Map

You can plot any [x,y, +/- z] information you’d like on top of a ggmap, so long as x and y correspond to longitudes and latitudes within the bounds of the map you have fetched. To plot on top of the map you must first make your map a variable and add a geom layer to it.

#Generate some data
#         lat        lon
# Colmeal 39.609025, -8.245660
# Portela 39.610447, -8.248321
# Cabeça_Ruiva 39.606380, -8.259393
# Ilha_do_Lombo 39.609124, -8.271237
long = c(-8.245660, -8.248321, -8.259393, -8.271237)
lat = c(39.609025, 39.610447, 39.606380, 39.609124)
who = c("Colmeal", "Portela", "Cabeça Ruiva", "Ilha do Lombo")
data = data.frame (long, lat, who)

map = ggmap(
    get_googlemap(
        center=c(geoCodes$lon,geoCodes$lat), 
        zoom=14, 
        maptype='hybrid', 
        scale = 2), 
        size = c(600, 600),
        extent='normal', 
        darken = 0)

map + geom_point (
        data = data,
        aes (
            x = long, 
            y = lat, 
            fill = factor (who)
            ), 
        pch = 21, 
        colour = "white", 
        size = 6
        ) +

    scale_fill_brewer (palette = "Set1", name = "Local") +

    #for more info on these type ?theme()  
 theme ( 
        legend.position = c(0.05, 0.05), # put the legend INSIDE the plot area
     legend.justification = c(0, 0),
        legend.background = element_rect(colour = F, fill = "white"),
        legend.key = element_rect (fill = F, colour = F),
        panel.grid.major = element_blank (), # remove major grid
     panel.grid.minor = element_blank (),  # remove minor grid
     axis.text = element_blank (), 
        axis.title = element_blank (),
        axis.ticks = element_blank ()
        ) 

plot of chunk unnamed-chunk-10

#ggsave (""images/map2.png"", dpi = 200)

4. RgoogleMaps

'RgoogleMaps' allows you to plot data points on any kind of map you can imagine (terrain, satellite, hybrid).

lat <- c(37,42) #define our map's ylim
lon <- c(-9,-6) #define our map's xlim
lat <- c(37,42) #define our map's ylim
lon <- c(-12,-6) #define our map's xlim
center = c(mean(lat), mean(lon))  #tell what point to center on
zoom <- 7  #zoom: 1 = furthest out (entire globe), larger numbers = closer in

# get maps from Google
terrmap <- GetMap(center=center, zoom=zoom, maptype= "terrain", destfile = "terrain.png") 
# visual options: 
#   maptype = c("roadmap", "mobile", "satellite", "terrain", "hybrid", "mapmaker-roadmap", "mapmaker-hybrid")
PlotOnStaticMap(terrmap)

plot of chunk unnamed-chunk-11

# Sard <- geocode("Sardoal, PT") # find coordinates
# Sard
#         lon      lat
# 1 -8.161277 39.53752

# using maps to plot routes
rt1 = route(from = "Lisbon", to = "Castelo Branco", mode = "driving")
rt2 = route(from = "Lisbon", to = "Sardoal", mode = "driving")
rt3 = route(from = "Abrantes", to = "Castelo Branco", mode = "driving")

PortugalMap <- qmap("Portugal", zoom = 8, color = "bw")

PortugalMap + geom_leg(aes(x = startLon, y = startLat, xend = endLon, yend = endLat),
                       color = "blue", data = rt1) +
    geom_leg(aes(x = startLon, y = startLat, xend = endLon, yend = endLat),
                       color = "black", data = rt2) +
    geom_leg(aes(x = startLon, y = startLat, xend = endLon, yend = endLat),
                       color = "red", data = rt3)

plot of chunk unnamed-chunk-11

References