## A simple Gibbs sampler

As discussed in the last
post, conjugate distributions provide an easy way to
calculate posterior distributions for a single parameter, such
as detection of a species during a single visit to a site where
it is present. If we have more than one unknown parameter in our
model - as with a simple occupancy model, where we have
detection and occupancy - we may still be able to use conjugacy
via a Gibbs sampler.
Gibbs sampling works if we can describe the posterior for each parameter when we know all the other parameters in the model. We can see this with the salamanders data that we used for maximum likelihood estimation in a previous post. In this study, \(S = 39\) sites were visited on \(K = 5\) occasions and searched for salamanders. Salamanders were observed at \(x_{obs} = 18\) sites, but detection is low (at 12 sites they were detected on only 1 occasion out of 5), so we suspect that the actual number of occupied sites is \(x > x_{obs}\). The total number of detections was \(d = 30\).

If we knew the actual number of occupied sites, \(x\), then the occupancy likelihood is binomial with \(x\) successes in \(S\) trials, and the detection likelihood is binomial with \(d\) successes in \(x \times K\) trials (note that only visits to occupied sites count as trials). But we don't know \(x\).

If we knew occupancy, \(\psi\), and detection, \(\pi\), we could calculate the conditional probability that a site where salamanders were not observed was actually occupied, ie, \(\psi_0 =\) Pr(occupied | not detected). By Bayes' rule,

\[\psi_0 = \frac{\psi (1 - \pi)^K}{\psi (1 - \pi)^K + (1 - \psi)}\]

The number of additional occupied sites - sites occupied but with no detections - \(x_{add}\), is described by a binomial probability with \(S - x_{obs}\) trials and probability of success \(\psi_0\). And \(x = x_{obs} + x_{add}\). But we don't know \(\psi\) and \(\pi\).

## Gibbs sampling

This is the Gibbs sampling algorithm:

1. Choose starting values for \(\psi\) and \(\pi\).
2. Calculate \(\psi_0\) from the current \(\psi\) and \(\pi\), draw \(x_{add}\) from Binomial\((S - x_{obs}, \psi_0)\), and set \(x = x_{obs} + x_{add}\).
3. Draw a new \(\psi\) from its beta posterior given \(x\) occupied sites out of \(S\).
4. Draw a new \(\pi\) from its beta posterior given \(d\) detections in \(x \times K\) visits.
5. Store the new \(\psi\) and \(\pi\) and return to step 2; repeat for as many iterations as needed.
The R code to do this is at the foot of the page. The animation below shows how this works for the salamander data. I used uniform Beta(1, 1) priors and 0.5 as the starting values for both occupancy and detection.
As we see, the sampler quickly moves away from the starting point and generates a series of points in an ellipse centered at about \(\psi = 0.6, \pi = 0.25\). You can download an MP4 version of the animation.
## Gibbs sampler output

The first few values in each of the output vectors are affected by the arbitrarily-chosen starting point, so we discard the initial values, referred to as "burn-in". Although only 1000 values are shown in the plot above, I did 10,100 iterations, and we'll discard the first 100 values:
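With the draws in vectors `psi` and `p` (my names; I use `p` for detection to avoid clashing with R's built-in `pi`), the discard step is just:

```r
# Drop the first 100 draws as burn-in
burnin <- 100
psi <- psi[-(1:burnin)]
p <- p[-(1:burnin)]
length(psi)  # 10,000 draws remain
```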
Now we have a pair of vectors of 10,000 random draws from the posterior distributions for \(\psi\) and \(\pi\). We can plot histograms and calculate means and credible intervals for each parameter. So far we have only looked at \(\psi\) and \(\pi\) separately, but the draws come in pairs, so we can also examine the joint posterior: we can get the correlation coefficient between the two sets of draws, and we can use Ben Bolker's `emdbook` package to plot the joint posterior.
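Continuing with the post-burn-in vectors `psi` and `p` (my names for the sampler output), the summaries are one-liners:

```r
# Point estimates and 95% credible intervals from the posterior draws
mean(psi)
mean(p)
quantile(psi, probs = c(0.025, 0.975))  # 95% CrI for psi
quantile(p, probs = c(0.025, 0.975))    # 95% CrI for pi
hist(psi, breaks = 40, main = "Posterior of psi")
cor(psi, p)  # correlation between occupancy and detection draws
```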
Now let's look in more detail at the output vectors.

## MCMC - Markov chain Monte Carlo

The Gibbs sampler generates its output sequentially, one pair of values per iteration. And here are the first 6 values in my vectors for \(\psi\) and \(\pi\).
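Binding the two vectors into a matrix makes the pairing explicit (again assuming my variable names `psi` and `p`):

```r
# Each iteration produces one pair of values; view the first 6 rows
head(cbind(psi = psi, pi = p))
```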
As should be clear from the algorithm, the set of values in each row depends only on the previous row; what happened before that is irrelevant. This serial dependency shows up in the trace plots, though it's not obvious from just 6 rows of values. This type of sequence is called a Markov chain. The draws are random, but not independent: they are autocorrelated, the autocorrelation decreasing as the lag (the distance between the values) increases. We can see autocorrelation plots with R's `acf` function.
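Base R's `acf` function plots autocorrelation against lag for each chain (vector names again assumed from the sampler):

```r
# Autocorrelation plots for the two chains, side by side
op <- par(mfrow = c(1, 2))
acf(psi, main = "psi")
acf(p, main = "pi")
par(op)
```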
The correlation between each value and the one immediately before (lag = 1) is high, about 0.65 for \(\psi\) and 0.50 for \(\pi\), then declines to be very small at a lag of about 15. Lack of independence means that our sample is effectively smaller than it seems. The effective sample size discounts the nominal number of draws to allow for this autocorrelation.
So although the sample size is 10,000, the effective size is much less. A thousand or so is fine for reliable point estimates, but not for 95% credible intervals. These depend on a small number of very high or very low values, and a single 'clump' of high or low values can affect the estimate. For good 95% CrIs you need effective sample sizes of 10,000 or more.
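One way to calculate the effective sample size is with the `effectiveSize` function in the coda package (assuming coda is installed; `psi` and `p` are my names for the retained draws):

```r
# Effective sample size of the retained draws, via the coda package
library(coda)
effectiveSize(mcmc(cbind(psi = psi, pi = p)))
```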
Markov chain Monte Carlo (MCMC) methods revolutionised
Bayesian analysis when the first software package, BUGS
(Bayesian analysis Using Gibbs Sampling), became available in
the mid-1990s. As Gilks et al (1994) put it, "Gibbs sampling
succeeds because it reduces the problem of dealing
simultaneously with a large number of intricately related
unknown parameters and missing data into a much simpler problem
of dealing with one unknown quantity at a time, sampling each
from its full conditional distribution."

In a future post, I'll look at using a "random walk" to generate an MCMC sample. This does not involve conjugate priors.

## The R code for the Gibbs sampler

Here's the code for the Gibbs sampler for the salamanders analysis:
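The listing below is a sketch that follows the algorithm described above, with Beta(1, 1) priors and 0.5 starting values as in the text; the variable names and the `set.seed` call are my own choices, and I use `p` for detection to avoid clashing with R's built-in `pi`:

```r
# Data for the salamander study
S <- 39     # number of sites
K <- 5      # occasions per site
xObs <- 18  # sites where salamanders were observed
d <- 30     # total number of detections

# Beta(1, 1) priors for occupancy (psi) and detection (p)
a.psi <- b.psi <- a.p <- b.p <- 1

n.iter <- 10100
psi <- p <- numeric(n.iter)   # storage for the chains
psi.cur <- p.cur <- 0.5       # starting values

set.seed(42)
for (i in 1:n.iter) {
  # 1. Probability that a site with no detections is occupied
  psi0 <- psi.cur * (1 - p.cur)^K /
          (psi.cur * (1 - p.cur)^K + (1 - psi.cur))
  # 2. Draw the number of occupied-but-undetected sites
  xAdd <- rbinom(1, S - xObs, psi0)
  x <- xObs + xAdd
  # 3. Draw occupancy from its conjugate beta posterior
  psi.cur <- rbeta(1, a.psi + x, b.psi + S - x)
  # 4. Draw detection from its conjugate beta posterior
  p.cur <- rbeta(1, a.p + d, b.p + x * K - d)
  # 5. Store the current values
  psi[i] <- psi.cur
  p[i] <- p.cur
}
```

After the loop, discard the first 100 values of each vector as burn-in before summarising.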
The code for everything on this page, including the animated plot, is available for download.
## References

Gilks, W. R., Thomas, A., Spiegelhalter, D. J. (1994) A language and program for complex Bayesian modelling. *The Statistician*, 43, 169-177.

Updated 27 Feb 2015 by Mike Meredith