Storing MCMC output in R with mcmcOutput

HOME Different packages have different ways of storing output from MCMC runs: (1) mcmc.list, (2) sims.matrix, (3) sims.array, (4) sims.list (details below). Each has its own strengths, and some output objects include more than copy of the results in different formats. R2WinBUGS output has (2), (3) and (4) while jagsUI output has (1) and (4). With many parameters monitored, keeping multiple copies of the MCMC chains is not a good use of R's limited memory.

In this post I will describe a class for storing MCMC chains which offers the advantages of each of the above while only storing one copy. This is already implemented in the mcmcOutput package, and I propose to include it in the next versions of saveJAGS and wiqid, with the intention of making it the default for wiqid's MCMC functions. It's in the CRAN package mcmcOutput, which you can install with install.packages("mcmcOutput").

Overview of existing formats

We'll describe each of these using as an example MCMC output for the salamanders data set; see the foot of this page for the code.

The model has three parameters:

  • psi : the probability of occupancy, scalar;
  • p : probability of detection, a 2 x 2 matrix, but only p[1,1] and p[2,2] are defined in the model, so this is a ragged array;
  • z : a vector of length 39, one value for each site.

The JAGS output has values for 42 nodes; p[1,2] and p[2,1] are not included. Three MCMC chains each have 2000 draws after burn-in and thinning.

1) mcmc.list: A list with three components, one for each chain. Each component is a 2000 x 42 matrix. Attributes include time series data for the chain generation: start, end, and thin.

2) sims.matrix: The matrices in mcmc.list are 'stacked' to form a 6000 x 42 matrix. It's used in R2WinBUGS output and returned when as.matrix is applied to an mcmc.list object. A good format if the allocation of draws to chains is not relevant.

3) sims.array A 3-dimensional 2000 x 3 x 42 array, as used in R2WinBUGS output. Good when you need to deal separately with chains.

4) sims.list A list with one component for each parameter. For our example:

  • sims.list$psi is a vector of length 6000.
  • sims.list$p is an array with dimensions 6000 x 2 x 2; p[, 1, 2] and p[, 2, 1] are filled with NAs.
  • sims.list$z is a 6000 x 39 matrix.

This format allows you to use the $ operator to extract individual parameters. But ragged arrays mean that large numbers of NAs are inserted.

The mcmcOutput class

The basic data structure is a matrix similar to sims.matrix above (a 6000 x 42 matrix in our example), but with specific attributes and customised extractor methods.

The following 2 attributes are always present for a valid mcmcOutput object:

nChains : The number of chains; this allows the matrix to be converted to sims.array format.

simsList : A list with one component for each parameter, containing the relevant column numbers. For our example:

  • simsList$psi = 1, scalar
  • simsList$p is a 2 x 2 matrix
        2  NA
        NA  3
  • simsList$z is a vector with values 4 to 42

The simsList attribute simplifies the extraction of parameters with the $ operator.

Other attributes may be present depending on the source of the MCMC chains.

Extractor methods

The class has the following customised S3 extractor methods:

$ : The object behaves as a sims.list and yields the appropriate array for the parameter. For example, mcmcOutput$p returns a 6000 x 2 x 2 array. Ragged arrays will have columns with NAs inserted.

Behaviour with the [ extractor method depends on the number of indices provided:

  • With 1 index, eg. mcmcOutput[2:3] or mcmcOutput["p[1,1]"] it behaves as a data frame and returns the column(s) for the nodes.
     
  • With 2 indices, eg. mcmcOutput[1:5, 2:3] or mcmcOutput[1:5, "p[1,1]"] it behaves as a sims.matrix and returns a matrix or vector with the requested elements; in this example, the first 5 values from the first chain.
     
  • With 3 indices, eg. mcmcOutput[1:5, 2, 2:3] or mcmcOutput[1:5, 2, "p[1,1]"] it behaves as a sims.array and returns - in our example - the first 5 elements in the 2nd chain.

Currently there is no custom method for the [[ extractor; I'm not sure what might be useful.

Other methods

  • print displays brief information about the object in the Console. It does not produce a summary of the MCMC values or the diagnostics, thus avoiding flooding the Console with numbers; if you want a summary, use summary.
     
  • summary displays a few lines of information including an overview of Rhat, MCEpc and n.eff diagnostics. It returns a data frame with summary information for each node, and this will appear in the Console unless assigned to an object or passed to View. Columns for mean and SD are always included, but the user can select which other summaries to include; the default has a 95% highest density interval, Rhat and MCEpc.
     
  • window allows you to discard values at the beginning of each chain as additional burn-in (or at the end, which is less useful), and to do additional thinning to reduce the size of the object.
     
  • plot displays for each node a trace plot and a density plot with separate curves for each chain. It allows selection of parameters and chains to plot, and the diagnostic statistics to display.

Functions to manipulate and display output

In addition to the methods above, the package has the following functions:

  • postPlot produces histograms or density plots for each node, with optional display of mean or mode, CRI, comparison value, and Region of Practical Equivalence.
     
  • diagPlot is equivalent to plot above, but works for any MCMC output object which can be coerced to mcmcOutput class.
     
  • sumryList produces a list of summary statistics, each being a list with one component for each parameter. This allows calls such as foo$mean$p, which will produce an array of means with the appropriate structure; if p is a 2 x 2 matrix, the output will also be a 2 x 2 matrix.

 

Code for the example

# If necessary:
# devtools::install_github("mikemeredith/mcmcOutput") # need 0.0.0.9009 or later
library(mcmcOutput)
data(mcmcListExample)
str(mcmcListExample)

# convert to class mcmcOutput
mco <- mcmcOutput(mcmcListExample)
str(mco)
print(mco)
summary(mco)
View(summary(mco))

mcos <- sumryList(mco)
mcos$mean$p
mcos$MCEpc$psi

# Extract with "$"
p <- mco$p
str(p)
p[1:5,,]  # Elements of p not defined in the model are filled with NAs

# "[" with one index
head(mco[4:5])
head(mco[c("z[35]", "z[39]")])

# "[" with two indices
mco[1:5, "psi"]  # First 5 values for psi (chain #1)

# "[" with three indices
mco[1:5, 2, "psi"] # First 5 values for psi in chain #2

# Plotting methods/functions
plot(mco)
diagPlot(mcmcListExample) # using the original mcmc.list object
postPlot(mco)
postPlot(mco, compVal=0) # show proportion above/below 0.
 
Updated 10 June 2020 by Mike Meredith