Different packages have different ways of storing output from
MCMC runs: (1) |
(details below). Each has its own strengths, and some output
objects include more than copy of the results in different
R2WinBUGS output has (2), (3) and (4)
jagsUI output has (1) and (4). With many
parameters monitored, keeping multiple copies of the MCMC chains
is not a good use of R's limited memory.
In this post I will
describe a class for storing MCMC chains which offers the
advantages of each of the above while only storing one copy.
This is already implemented in the
package, and I propose to include it in the next versions of
wiqid, with the intention of making it the default
wiqid's MCMC functions. It's in the CRAN
which you can install with
Overview of existing formats
We'll describe each of these using as an example MCMC output
for the salamanders data set; see the foot of this page for the
The model has three parameters:
psi : the probability of
p : probability of
detection, a 2 x 2 matrix, but only
defined in the model, so this is a ragged array;
z : a vector of length 39,
one value for each site.
The JAGS output has values for 42 nodes;
p[2,1] are not included. Three MCMC chains each have 2000
draws after burn-in and
mcmc.list: A list with
three components, one for each chain. Each component is a 2000 x
42 matrix. Attributes include time series data for the chain
generation: start, end, and thin.
sims.matrix: The matrices in
mcmc.list are 'stacked' to form a
6000 x 42 matrix. It's used in
R2WinBUGS output and
as.matrix is applied to an
mcmc.list object. A good format if the allocation of
draws to chains is not relevant.
sims.array A 3-dimensional 2000 x 3 x 42 array, as used in
output. Good when you need to deal separately with chains.
sims.list A list with one
component for each parameter. For our example:
sims.list$psi is a vector of
sims.list$p is an array with
dimensions 6000 x 2 x 2;
p[, 1, 2]
p[, 2, 1] are filled
sims.list$z is a 6000 x 39
This format allows you to use the
$ operator to
extract individual parameters. But ragged arrays mean that large
numbers of NAs are inserted.
The basic data structure is a matrix similar to
sims.matrix above (a 6000 x 42 matrix in our example),
but with specific attributes and customised extractor methods.
The following 2 attributes are always present for a valid
nChains : The number of chains; this
allows the matrix to be converted to
simsList : A list with one component for
each parameter, containing the relevant column
numbers. For our example:
simsList$psi = 1, scalar
simsList$p is a 2 x 2 matrix
is a vector with values 4 to 42
simsList attribute simplifies the extraction
of parameters with the $ operator.
Other attributes may be present depending on the source of
the MCMC chains.
The class has the following customised S3 extractor methods:
$ : The object behaves as a
sims.list and yields the appropriate array for the
parameter. For example,
returns a 6000 x 2 x 2 array. Ragged arrays will have columns
with NAs inserted.
Behaviour with the [ extractor method depends on the
number of indices provided:
- With 1 index, eg.
mcmcOutput["p[1,1]"] it behaves as a data frame and
returns the column(s) for the nodes.
- With 2 indices, eg.
mcmcOutput[1:5, 2:3] or
it behaves as a
returns a matrix or vector with the requested elements; in this
example, the first 5 values from the first chain.
- With 3 indices, eg.
mcmcOutput[1:5, 2, 2:3] or
mcmcOutput[1:5, 2, "p[1,1]"] it behaves as a
sims.array and returns - in our example - the first 5
elements in the 2nd chain.
Currently there is no custom method for the [[
extractor; I'm not sure what might be useful.
print displays brief
information about the object in the Console. It does not
produce a summary of the MCMC values or the diagnostics,
thus avoiding flooding the Console with numbers; if you want
a summary, use
summary displays a few lines
of information including an overview of
n.eff diagnostics. It
returns a data frame with summary information for each node,
and this will appear in the Console unless assigned to an
object or passed to
View. Columns for mean and
SD are always included, but the user can select which other
summaries to include; the default has a 95% highest density
window allows you to discard
values at the beginning of each chain as additional burn-in
(or at the end, which is less useful), and to do additional
thinning to reduce the size of the object.
plot displays for each node
a trace plot and a density plot with separate curves for
each chain. It allows selection of parameters and chains to
plot, and the diagnostic statistics to display.
Functions to manipulate and display output
In addition to the methods above, the package has the
postPlot produces histograms
or density plots for each node, with optional display of
mean or mode, CRI, comparison value, and
Region of Practical Equivalence.
diagPlot is equivalent to
plot above, but works for any MCMC output
object which can be coerced to
sumryList produces a list of summary
statistics, each being a list with one component for each
parameter. This allows calls such as
which will produce an array of means with the appropriate
p is a 2 x 2 matrix, the output
will also be a 2 x 2 matrix.
Code for the example
# If necessary:
# devtools::install_github("mikemeredith/mcmcOutput") # need 0.0.0.9009 or later
# convert to class mcmcOutput
mco <- mcmcOutput(mcmcListExample)
mcos <- sumryList(mco)
# Extract with "$"
p <- mco$p
p[1:5,,] # Elements of p not defined in the model are filled with NAs
# "[" with one index
# "[" with two indices
mco[1:5, "psi"] # First 5 values for psi (chain #1)
# "[" with three indices
mco[1:5, 2, "psi"] # First 5 values for psi in chain #2
# Plotting methods/functions
diagPlot(mcmcListExample) # using the original mcmc.list object
postPlot(mco, compVal=0) # show proportion above/below 0.