Burn... or adapt?
Most runs of MCMC chains for Bayesian estimation begin with an
adapt phase, when the samplers used to produce the chains are
tuned for best performance. I recently did a long run for a
complex model, and the output was flagged with "convergence
failure". In fact the problem was caused by cutting short the
adapt phase. The default adapt phase for the R package |
I was running a model for Kéry & Royle
(2016) section 11.7.2 with JAGS from R via the
Looking at the trace plots, it's clear
that the problem isn't convergence - the seven chains shown are
all in the right part of the parameter space; increasing burn-in
will not improve matters. But we have poor "mixing": the chains
are moving in tiny steps, taking many iterations to cover the
full range. Autocorrelation is huge even after thinning, with some effective sample
I repeated the run with
In fact, I got 1500 values out per chain (after
thinning by 10): I had forgotten that
Why is it a problem now?
We have been using MCMC since the first BUGS software was released back in 1989, but users have not paid much attention to adaptation, focusing on burn-in and chain length.
WinBUGS and OpenBUGS take care of adaptation, and will not let you set Monitors to extract the chains until internal tuning criteria have been met. Adaptation was a check-box in the corner on the Update widget, which we likely only noticed when we tried to set Monitors and couldn't because it was still checked... and we couldn't uncheck it, we just had to run more updates. Those programs silently (sneakily?) used the early burn-in iterations for adaptation.
JAGS separates out the concepts of adaptation and burn-in, and allows the user to terminate adaptation before the tuning criteria are met. Turning off adaptation too soon is now a real danger.
A sensible strategy would be to use all the burn in
iterations for tuning. This is adopted by the
Since we first started using BUGS software, the reported burn-in iterations have included the adaptation phase, so no major change is needed to reporting styles. So I might report the second analysis with the phrase: "I retained 1,500 values per chain, after discarding 5,000 for adaptation and burn in and thinning by 10."
| Updated 27 Dec 2016 by Mike Meredith|