In several occasions I needed to generate synthetic data with a desired level of spatial autocorrelation. Here I will show how to generate as many such fields as we need by using R, an open-source port of the S language for statical analysis. I will concentrate on two alternative ways of generating spatially correlated random fields (commonly known as unconditional Gaussian simulation), using the libraries
1. Unconditional Gaussian simulation using gstat
Spatial modellers commonly use the term unconditional Gaussian simulation to refer to the process of generating spatially correlated random fields. For comparison purposes, the Figure 0 shows an example of a random field with no spatial correlation. The value of the measured property at one given cell is completely independent of the values of that property at neighbouring cells. This situation is very seldom (or I perhaps should better say never) found in nature, since spatially distributed variables always have a certain level of autocorrelation, i.e. co-variance between neighbours. This is often mentioned as the ‘geographical law’, stating that closer locations tend to have similar properties.
Generating spatially correlated random fields is interesting because it makes it possible testing different issues related to the statistical analysis of spatial data.
1.1. Generating the spatial field
We are going to use the
gstat library, so we start by loading it:
We create a 100 x 100 grid, and we convert it into a data frame (xyz structure) by taking all possible combinations of the x and y coordinates:
xy <- expand.grid(1:100, 1:100)
We give names to the variables:
names(xy) <- c('x','y')
Defining the spatial model and performing the simulations.
Second, we define the spatial model as a gstat object:
g.dummy <- gstat(formula=z~1, locations=~x+y, dummy=T, beta=1, model=vgm(psill=0.025, range=5, model='Exp'), nmax=20)
formula defines the dependent variable (z) as a linear model of independent variables. For ordinary and simple kriging we can use the formula
z~1; for simple kriging it is necessary to define a beta parameter too (see below); for universal kriging, if
z is linearly dependent on
y use the formula
z~x+y. We are using simple kriging here.
locations define the data coordinates, e.g.
~x+y in our case here.
dummy is a logical value, and it needs to be TRUE for unconditional simulation.
beta is used only for simple kriging, and is a vector with the trend coefficients (including an intercept); if no independent variables are defined the model only contains an intercept, i.e. the simple kriging mean.
model defines the variogram model, as defined by a call to
vgm allows defining the (partial) sill, range and nugget paramaters, as well as the variogram model type (e.g. exponential, gaussian, spherical, etc). Anisotropy can also be used.
nmax defines the number of nearest observations that should be used for a kriging prediction or simulation.
Now we are ready to make as many simulations as we like based on the gstat object (four simulations in this example):
yy <- predict(g.dummy, newdata=xy, nsim=4)
1.2. Displaying the simulations
To see one realisation of the simulations:
gridded(yy) = ~x+y spplot(obj=yy)
spplot, from the library
sp, provides lattice (trellis) plot methods for spatial data with attributes. It’s only compulsory parameter is
obj, which must point to an object of class Spatial; gstat objects belong to this class, so there’s no need to do anything extra. It is possible to show all four simulations in a single trellis plot:
1.3. Complete code
I include the complete code for convenience:
# unconditional simulations on a 100 x 100 grid using gstat library(gstat) # create structure xy <- expand.grid(1:100, 1:100) names(xy) <- c("x","y") # define the gstat object (spatial model) g.dummy <- gstat(formula=z~1, locations=~x+y, dummy=T, beta=1, model=vgm(psill=0.025,model="Exp",range=5), nmax=20) # make four simulations based on the stat object yy <- predict(g.dummy, newdata=xy, nsim=4) # show one realization gridded(yy) = ~x+y spplot(yy) # show all four simulations: spplot(yy)
By modifying the range parameter in the variogram model it is possible to control the degree of spatial correlation. For example, by setting it at 15 instead of 5 we get a random field with a ‘coarser’ autocorrelation (Figure 2).
g.dummy <- gstat(formula=z~1, locations=~x+y, dummy=T, beta=1, model=vgm(psill=0.025, range=15, model='Exp'), nmax=20)
For including a linear trend surface in the simulation we can perform a universal kriging. For doing so it is necessary to specify it in the formula parameter as
~1+x+y, and coefficients for the x and y components need to be specified in the beta parameter. For example, the following defines a model with a spatial trend in the y dimension (Figure 3):
g.dummy <- gstat(formula=z~1+y, locations=~y, dummy=T, beta=c(1,0,0.005), model=vgm(psill=0.025, range=15, model='Exp'), nmax=20)
The following code defines a model with a trend in both dimensions:
g.dummy <- gstat(formula=z~1+x+y, locations=~x+y, dummy=T, beta=c(1,0.01,0.005), model=vgm(psill=0.025, range=15, model='Exp'), nmax=20)
2. Unconditional Gaussian simulation using fields
Spatially correlated random fields can also be generated using the
fields library. I put an example code here, and leave the details to the reader to investigate.
library(fields) # load Maunga Whau volcano (Mt Eden) elevation dataset (matrix format) data(volcano) # reduce size volcano2 <- volcano[10:55, 14:51] filled.contour(volcano2, color.palette=terrain.colors, asp=1) cols <- length(volcano2[1,]) rows <- length(volcano2[,1]) # create dataframe (xyz format) X <- rep(1:cols, each=rows) Y <- rep(1:rows, cols) Z <- as.vector(volcano2) volcano.df <- data.frame(X,Y,Z,cellid=1:cols*rows) attach(volcano.df) quilt.plot(Y,X,Z,nrow=rows,ncol=cols,add=F) # create a spatial autocorrelation signature # coordinate list coords <- data.frame(X,Y) # distance matrix dist <- as.matrix(dist(coords)) # create a correlation structure (exponential) str <- -0.1 # strength of autocorrelation, inv. proportional to str omega1 <- exp(str*dist) # calculate correlation weights, and invert weights matrix weights <- chol(solve(omega1)) weights_inv <- solve(weights) # create an autocorrelated random field set.seed(1011) error <- weights_inv %*% rnorm(dim(dist)) quilt.plot(Y,X,error,nrow=rows,ncol=cols,add=F) # create a variable as a linear function of the elevation a <- 10 b <- 0.5 Z2 <- a + b*Z quilt.plot(Y,X,Z2,nrow=rows,ncol=cols,add=F) # add the autocorrelated error to the new variable Z3 <- Z2 + error quilt.plot(Y,X,Z3,nrow=rows,ncol=cols,add=F) # export data (xyz format) write.table(data.frame(X,Y,Z,Z2,Z3), file="data.txt", row.names=F, sep="\t")