How does hierarchical modelling help in getting sensible estimates for
cases where very few data are available? We'll explore that
here, using the concept of "pooling" data from multiple cases.
Suppose we set out camera traps in our study area with the
intention of using the proportion of days when we detected our
target species as an index of abundance. This is sometimes
called the Relative Abundance Index, RAI; I'm not sure how
useful it is in general, but let's assume it works for the
species we are interested in. Our toy data set has only 9 cameras, and they were put out
for 2 months. Four of the cameras operated properly for all 60
days, but four of them failed after only a few days
and one didn't work at all. Can we draw useful inferences about
the sites with hardly any data?
Here are
the results, with the number of nights operational, n, number of
nights with detections, y, and a naive estimate of RAI = y/n.
n <- c(60, 60, 60, 60, 7, 6, 4, 3, 0)
y <- c(42, 38, 21, 19, 3, 5, 0, 2, 0)
names(n) <- LETTERS[1:9]
cbind(n=n, y=y, R=round(y/n, 2))
n y R
A 60 42 0.70
B 60 38 0.63
C 60 21 0.35
D 60 19 0.32
E 7 3 0.43
F 6 5 0.83
G 4 0 0.00
H 3 2 0.67
I 0 0 NaN
Based on this, we can be pretty sure that
abundance is higher at sites A and B than at sites C and D. But
what do we make of sites E to H? We don't have enough
data to say very much; in particular, it would be rash to
conclude that the species was absent (abundance = 0) at site G
based on just 4 nights of data. And we can't say anything about
site I.
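To see just how little 4 nights tell us, here's a quick check with base R (an aside, not part of the analysis below):
# Exact 95% confidence interval for the detection probability at site G,
# based on 0 detections in 4 nights:
binom.test(0, 4)$conf.int
# The interval runs from 0 up to about 0.60, so 4 blank nights cannot
# distinguish "absent" from "detected on most nights".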
Pooling the data
One way to get around the problem of sites
with few data - or no data at all, as with site I - is to pool
the data from all the sites. In this case we have a total of 260
trap nights, with detections on 130 of them:
sum(y)/sum(n)
[1] 0.5
So now we conclude that RAI = 0.5 for all the sites. Hmm,
well, that solves the problem for site I, and it may be better
than the unpooled estimates for sites E to H, but it ignores
the clear differences among sites A to D. This is not a good
solution.
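As an aside to the Bayesian analysis, a quick classical check confirms that those differences among the well-sampled sites are real:
# Chi-squared test of equal detection probabilities at sites A to D:
prop.test(y[1:4], n[1:4])
# The p-value is far below 0.001, so a single shared R is untenable
# even for the four well-sampled sites.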
Partial pooling
We need a solution that takes account of the overall
estimate, 0.5, but also makes use of the information in the data
we have, whether it's a lot or just a little.
The first 4 sites have lots of data. The high unpooled
estimates for A and B might be slightly too high and the low
estimates for C and D slightly too low, but we don't want to
change them very much.
We have few data for sites E to H, so the estimates there
should be close to the overall estimate, but still we'd want them
to be a bit higher for F and H and a bit lower for E and
especially G.
And for site I, we'd go with the overall estimate, as it's
the only information available. That's nice in principle.
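Before fitting anything, here's a back-of-envelope version of that idea - illustrative only, not the model we fit below - where the overall proportion counts as m "pseudo-nights" of data at every site, with m picked arbitrarily:
# Crude partial pooling: add m pseudo-nights at the overall rate Rbar.
Rbar <- sum(y) / sum(n)   # overall proportion, 0.5
m <- 10                   # hypothetical prior weight, chosen for illustration
round((y + m * Rbar) / (n + m), 2)
# Sites with lots of data barely move; sites with few data (and site I)
# are pulled towards Rbar.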
Let's see how we calculate defensible estimates.

Fitting the models
You shouldn't be surprised that calculating estimates of
parameters involves fitting models! And we'll do this in a
Bayesian framework using JAGS, run from R with the jagsUI package.

No pooling
Without resorting to pooling, we can't say anything about
site I, so we fit the model to just the first 8 sites. Here's the
JAGS code:
modeltxt <- "
model{
for(i in 1:8) {
# observation model
y[i] ~ dbin(R[i], n[i])
# prior for each R[i]
R[i] ~ dbeta(1, 1)
}
}"
writeLines(modeltxt, con="nopooling.jags")
We are treating each trap night as a trial with a success if
the species is detected. The estimated RAI is simply the
probability of success, R, for which we use a uniform prior.
We run the code (we don't need to supply initial values) and look at diagnostic plots:
library(jagsUI)  # provides the jags() wrapper used throughout
jdata <- list(y=y, n=n)
wanted <- "R"
( outnp <- jags(jdata, NULL, wanted, "nopooling.jags", DIC=FALSE,
n.burn=100, n.iter=1100, n.chains=3) )
wiqid::diagPlot(outnp)
The diagnostic plots look ok, and here are the results:
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
R[1] 0.694 0.059 0.574 0.696 0.801 FALSE 1 1.003 853
R[2] 0.628 0.061 0.507 0.629 0.744 FALSE 1 1.002 1378
R[3] 0.354 0.060 0.241 0.352 0.478 FALSE 1 1.002 3000
R[4] 0.321 0.057 0.214 0.320 0.437 FALSE 1 1.002 1667
R[5] 0.440 0.157 0.149 0.436 0.742 FALSE 1 1.003 720
R[6] 0.753 0.144 0.443 0.776 0.968 FALSE 1 1.000 3000
R[7] 0.163 0.141 0.004 0.123 0.526 FALSE 1 1.009 277
R[8] 0.590 0.206 0.171 0.603 0.926 FALSE 1 1.003 1185
The means are all pretty close to the y/n estimates, except
for R[7], site G, where y/n = 0/4. Check the diagnostic plots
for R[7] and you'll see that the density peaks at 0 and then
declines rather slowly; you could easily get a result of 0 out
of 4 with true R = 0.2.
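Because the Beta(1, 1) prior is conjugate for the binomial, the posterior for each site here is exactly Beta(y+1, n-y+1), so we can check the MCMC output against closed-form values:
# Exact posterior for site G: Beta(0 + 1, 4 - 0 + 1) = Beta(1, 5)
qbeta(c(0.025, 0.5, 0.975), 1, 5)   # compare with the R[7] row above
# And 0 detections in 4 nights is unsurprising even with R = 0.2:
dbinom(0, 4, 0.2)                   # 0.8^4, about 0.41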
Total pooling
With total pooling (also known as "complete pooling"), we
just estimate one value for the RAI which (we assume) applies to
all the sites. We create the JAGS model for that, where we now
include all nine sites, and then run the model using the same
data list, jdata:
modeltxt <- "
model{
for(i in 1:9) {
# observation model
y[i] ~ dbin(R, n[i])
}
# prior for R
R ~ dbeta(1, 1)
}"
writeLines(modeltxt, con="totpooling.jags")
( outtp <- jags(jdata, NULL, wanted, "totpooling.jags", DIC=FALSE,
n.burn=100, n.iter=1100, n.chains=3) )
wiqid::diagPlot(outtp)
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
R 0.5 0.031 0.439 0.501 0.559 FALSE 1 1 3000
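The same conjugacy check works here: with 130 detections in 260 trap nights and a Beta(1, 1) prior, the posterior is exactly Beta(131, 131), matching the summary above:
# Closed-form check of the total-pooling posterior:
qbeta(c(0.025, 0.5, 0.975), sum(y) + 1, sum(n) - sum(y) + 1)
# Beta(131, 131) has mean 0.5, sd about 0.031, and 95% CrI about 0.44 to 0.56.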
Partial pooling
We need separate R's for each of the sites, but each
R must take account of what's happening at the other sites too,
so the model is a good deal more sophisticated. In fact, it's a
multi-level or hierarchical model. Let's show the JAGS code and
then discuss the details:
modeltxt <- "
model{
for(i in 1:9) {
# observation model
y[i] ~ dbin(R[i], n[i])
# prior for R[i]
logitR[i] ~ dnorm(mu, tau)
R[i] <- ilogit(logitR[i])
}
# hyperpriors for mu and tau
Rmean ~ dbeta(1, 1)
mu <- logit(Rmean)
sd ~ dunif(0, 10)
tau <- sd^-2
}"
writeLines(modeltxt, con="parpooling.jags")
Instead of each R being drawn from its own Beta(1, 1) prior,
the R's have a single, informative prior, which contains
information from all the sites - the overall RAI and the
variation among sites. There are various ways to set up this
single prior, but since we are familiar with logit models, I used
a normal distribution for logitR, which is then converted to R
with ilogit. The normal distribution has mean mu and precision
tau, the same for all R's.
So what are the values for mu and tau?
We don't know, but we can estimate them from the data: this is
another, "higher" level of the model. But we need priors for this
level, sometimes known as hyperpriors because they are priors
for priors. The overall RAI, Rmean, is a
probability, so I use a Beta(1, 1) prior for that, then get
mu from logit(Rmean). I use a uniform prior
for the spread (sd) of the logitR's, then
convert that to the precision, tau.
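It can be hard to picture what this hierarchy implies as a prior for an individual R[i]. A quick simulation makes it visible - an illustration only, using plogis and qlogis, R's equivalents of ilogit and logit:
# Simulate the implied prior for one site's R:
set.seed(1)
Rmean <- rbeta(1e5, 1, 1)                  # hyperprior for the overall RAI
sd <- runif(1e5, 0, 10)                    # hyperprior for the spread
R <- plogis(rnorm(1e5, qlogis(Rmean), sd)) # back to the probability scale
hist(R, breaks = 50, main = "Implied prior for R[i]")
# Large values of sd push R towards 0 or 1, so much of the prior mass sits
# near the extremes; the data have to pull the posterior back from there.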
We use the same data, but this time we also want to monitor
Rmean and sd. Mixing is not so good with hierarchical models,
so I increased the number of iterations.
wanted <- c("R", "Rmean", "sd")
( outpp <- jags(jdata, NULL, wanted, "parpooling.jags", DIC=FALSE,
n.burn=100, n.iter=10100, n.chains=3) )
wiqid::diagPlot(outpp)
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
R[1] 0.683 0.059 0.562 0.685 0.793 FALSE 1 1.000 15451
R[2] 0.622 0.061 0.501 0.623 0.737 FALSE 1 1.000 30000
R[3] 0.362 0.061 0.248 0.360 0.485 FALSE 1 1.000 15232
R[4] 0.332 0.059 0.223 0.330 0.452 FALSE 1 1.000 30000
R[5] 0.454 0.145 0.178 0.453 0.741 FALSE 1 1.000 30000
R[6] 0.686 0.149 0.382 0.694 0.940 FALSE 1 1.000 17285
R[7] 0.268 0.161 0.014 0.255 0.601 FALSE 1 1.000 30000
R[8] 0.565 0.181 0.209 0.567 0.904 FALSE 1 1.000 9748
R[9] 0.495 0.237 0.046 0.495 0.943 FALSE 1 1.000 9977
Rmean 0.493 0.110 0.263 0.495 0.710 FALSE 1 1.000 16488
sd 1.169 0.637 0.429 1.021 2.733 FALSE 1 1.006 2083
We have a sensible value for Rmean, though the CrI is pretty
wide as it's based on only 4 sites with good data. We also have
an estimate for site I, R[9], which is close to Rmean but with
an even wider CrI. Let's compare the results of the different
methods as a table and a plot:
res <- cbind(n=n,
y=y,
naive = y/n,
np = c(outnp$mean$R, NA),
tp = rep(outtp$mean$R, 9),
pp = outpp$mean$R)
round(res, 2)
n y naive np tp pp
A 60 42 0.70 0.69 0.5 0.68
B 60 38 0.63 0.63 0.5 0.62
C 60 21 0.35 0.35 0.5 0.36
D 60 19 0.32 0.32 0.5 0.33
E 7 3 0.43 0.44 0.5 0.46
F 6 5 0.83 0.75 0.5 0.69
G 4 0 0.00 0.16 0.5 0.26
H 3 2 0.67 0.59 0.5 0.57
I 0 0 NaN NA 0.5 0.49
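The plot isn't shown as code above; here is a minimal sketch of one way to draw it in base R (the layout choices are mine, not from the original analysis):
# Compare naive, no-pooling, and partial-pooling estimates by site:
matplot(res[, c("naive", "np", "pp")], pch = 16, col = 1:3,
    xaxt = "n", xlab = "Site", ylab = "Estimated RAI")
axis(1, at = 1:9, labels = rownames(res))
abline(h = outtp$mean$R, lty = 2)   # total-pooling estimate
legend("topright", legend = c("naive", "no pooling", "partial pooling"),
    pch = 16, col = 1:3, bty = "n")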
This has done what we wanted: the sites with lots of data, A-D,
are almost unaffected, but those with few data, E-H, have been
pulled towards the overall mean. We've even got an estimate for
the site with no data, though with a wide CrI; a key problem there
is the wide variability among sites: we don't know if site I is
as high as A or B or even higher, or as low as C or D or even
lower.
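We can put rough numbers on that last point by querying site I's full posterior; jagsUI stores the MCMC draws in sims.list (the 0.63 and 0.35 thresholds below are just the naive estimates for sites B and C):
R9 <- outpp$sims.list$R[, 9]   # posterior draws for site I
mean(R9 > 0.63)   # posterior probability site I is as high as site B
mean(R9 < 0.35)   # posterior probability site I is as low as site C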