Goodness-of-fit: SCR models

In a previous post we looked at GOF tests for occupancy models. Here we'll look at similar tests for spatially-explicit capture-recapture data, and go on to other posterior predictive checks.

This post uses the results of an analysis of data provided by Rajan Amin of the Zoological Society of London (ZSL). The study deployed 44 detectors for 130 days and detected 22 animals.

The 'p0(sex) sigma(sex)' model

Initial modelling showed a clear difference in p0 and sigma for males and females, but the fit was not good even after allowing for this.

We used the approach of Royle et al. (2014, section 8.4.2), calculating three fit statistics, all based on Freeman-Tukey discrepancies:

  • T1: individual x detector frequencies, aggregating the binary data by animal and detector
  • T2: individual encounter frequencies, aggregating over detectors for each animal
  • T3: detector frequencies, aggregating over animals for each detector
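
Each statistic is a Freeman-Tukey discrepancy of the form T = sum((sqrt(y) - sqrt(e))^2), where y is an observed (or simulated) count, e is the corresponding expected count, and the sum runs over the relevant cells; working on the square-root scale stabilises the variance of the counts.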

The JAGS code for this model is shown below. The raw data are binary, 1/0 for each occasion, and we aggregate these so that Y is an animals x detectors matrix with the number of detections, augmented with extra rows of zeros to give M rows in total; nOcc is a vector with the number of occasions each detector was active; sex is a vector of length M with the sex of each detected animal and NA for undetected animals. The habitat mask and related constants are produced by makeJAGSmask. More explanations and comments follow the code.
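
As a sketch of that aggregation and augmentation step, assume the raw detections are in a binary animals x detectors x occasions array, yRaw, with one row per detected animal, and the observed sexes, coded 0/1, in sexObs (both names are illustrative):

nDetected <- nrow(yRaw)        # 22 animals detected
M <- 80                        # rows after augmentation
Yobs <- apply(yRaw, 1:2, sum)  # detections per animal x detector
Y <- rbind(Yobs, matrix(0, M - nDetected, ncol(Yobs)))  # add all-zero rows
sex <- c(sexObs, rep(NA, M - nDetected))  # NA for augmented animals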

model {
  omega ~ dbeta(1,1)  # existence parameter
  pi ~ dbeta(1,1)     # sex ratio

  for(t in 1:2){      # separate priors for male and female
    p0[t] ~ dbeta(1,1)
    sigma[t] ~ dunif(0, 30)        # in pixel units
    sigma2[t] <- 1/(2*sigma[t]^2)
  }

  for(i in 1:M){
    w[i] ~ dbern(omega)  # w=1 if the individual is available for detection
    sex[i] ~ dbern(pi)
    AC[i, 1] ~ dunif(1, upperLimit[1])
    AC[i, 2] ~ dunif(1, upperLimit[2])
    pOK[i] <- habMat[trunc(AC[i, 1]), trunc(AC[i, 2])] # habitat check
    OK[i] ~ dbern(pOK[i]) # OK[i] = 1, the ones trick

    for(j in 1:nTraps){
      d2[i,j] <- (AC[i,1] - trapMat[j,1])^2 + (AC[i,2] - trapMat[j,2])^2
      p[i,j] <- p0[sex[i]+1] * exp(-sigma2[sex[i]+1]*d2[i,j])
      Y[i,j] ~ dbin(p[i,j] * w[i], nOcc[j])
      # components for fit diagnostics
      Ysim[i,j] ~ dbin(p[i,j] * w[i], nOcc[j]) # simulated
      Yexp[i,j] <- p[i,j] * w[i] * nOcc[j]     # expected
      # components for T1
      err1obs[i,j] <- (sqrt(Y[i,j]) - sqrt(Yexp[i,j]))^2
      err1sim[i,j] <- (sqrt(Ysim[i,j]) - sqrt(Yexp[i,j]))^2
    }
    # components for T2
    err2obs[i] <- (sqrt(sum(Y[i,])) - sqrt(sum(Yexp[i,])))^2
    err2sim[i] <- (sqrt(sum(Ysim[i,])) - sqrt(sum(Yexp[i,])))^2
  }
  # components for T3
  for(j in 1:nTraps){
    err3obs[j] <- (sqrt(sum(Y[,j])) - sqrt(sum(Yexp[,j])))^2
    err3sim[j] <- (sqrt(sum(Ysim[,j])) - sqrt(sum(Yexp[,j])))^2
  }

  # Fit diagnostics totals
  T1obs <- sum(err1obs)
  T1sim <- sum(err1sim)
  T2obs <- sum(err2obs)
  T2sim <- sum(err2sim)
  T3obs <- sum(err3obs)
  T3sim <- sum(err3sim)

  # Derived parameters
  N[1] <- sum(w * (1-sex))      # number of females...
  N[2] <- sum(w * sex)          # ... males
  D <- N /area * 1e8            # convert m2 to 100 km2
  p0pc <- p0 * 100              # p0 as a percentage
  sigmakm <- sigma * pixelWidth # sigma in km
}

The model is a standard SCR model with a half-normal detection function, p = p0 * exp(-d^2 / (2*sigma^2)), which is why the code defines sigma2 as 1/(2*sigma^2), and a binomial likelihood for the aggregated detection data. To this we have added code to calculate Freeman-Tukey discrepancies between the expected number of detections and (a) the observed detections (the err1obs, err2obs and err3obs nodes) and (b) detections simulated from the current values of the parameters at each step of the MCMC chain (the err1sim, err2sim and err3sim nodes). We ran the model with jagsUI::jags and assigned the output to out.
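
A sketch of that call; the data list, inits, file name and run settings shown here are illustrative, not necessarily those used:

inits <- function() list(w = rep(1, M))  # start with all animals 'available'
wanted <- c("p0", "sigma", "omega", "N", "D", "p0pc", "sigmakm",
            "T1obs", "T1sim", "T2obs", "T2sim", "T3obs", "T3sim", "Ysim")
jagsData <- list(Y=Y, nOcc=nOcc, sex=sex, M=M, nTraps=nTraps, OK=rep(1, M),
                 habMat=habMat, trapMat=trapMat, upperLimit=upperLimit,
                 area=area, pixelWidth=pixelWidth)
out <- jagsUI::jags(jagsData, inits, wanted, model.file="SCRgof.txt",
                    n.chains=3, n.adapt=1000, n.iter=20000, n.burnin=10000,
                    n.thin=10, parallel=TRUE)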

GOF p-value plots

We used jagsUI::pp.check to plot the discrepancies and produce the p-values:
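
The three calls look like this (the plot titles are mine):

pp.check(out, observed="T1obs", simulated="T1sim", main="T1: individual x detector frequencies")
pp.check(out, observed="T2obs", simulated="T2sim", main="T2: individual encounter frequencies")
pp.check(out, observed="T3obs", simulated="T3sim", main="T3: detector frequencies")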

All three GOF plots are bad, but they don't tell us what we should change to improve the model's fit. So next we tried comparing the simulated data, Ysim in the model code, with the observed data using a number of biologically relevant criteria.

Observed values and posterior predictive values

We extract the MCMC chains for Ysim with the following code:

Ysim <- out$sims.list$Ysim
dim(Ysim)
[1] 1000  80  44

Here I had done some fairly brutal thinning, keeping only 1000 draws to limit object sizes. Each step of the MCMC process generates simulated detection data with 80 rows for the individual animals (including the all-zero rows added by data augmentation) and 44 columns for the detectors. The observed counterpart is the Yobs matrix, with 22 rows for the animals actually detected and the same 44 columns for detectors.

We'll begin by plotting the observed and posterior predictions for the total number of detections and the number of animals detected:

# 1. total detections
( obs <- sum(Yobs) )
147
sims <- apply(Ysim, 1, sum)
hist(sims, xlim=range(sims, obs), main="Total detections")
abline(v=obs, col='red', lwd=3)
legend('topright', legend=paste("p =", round(mean(sims > obs),2)), bty='n')

# 2. animals detected
( obs <- dim(Yobs)[1] )
22
getNcaps <- function(x) sum(rowSums(x) > 0)
sims <- apply(Ysim, 1, getNcaps)
hist(sims, xlim=range(sims, obs), main="Animals detected")
abline(v=obs, col='red', lwd=3)
legend('topright', legend=paste("p =", round(mean(sims > obs),2)), bty='n')

Nothing notable here. Next we'll look at some statistics related to detector performance. The code to plot the histograms is identical to the above and is not shown.
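
A small wrapper would reproduce those histograms; here is a sketch (the name plotPPC is mine):

plotPPC <- function(sims, obs, main) {
  # histogram of posterior predictive values, observed value in red
  hist(sims, xlim=range(sims, obs), main=main)
  abline(v=obs, col='red', lwd=3)
  # one-sided posterior predictive p-value, as in the plots above
  legend('topright', legend=paste("p =", round(mean(sims > obs), 2)), bty='n')
}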

# 3. total detectors visited
getTrapsVisited <- function(x) {
  visitsPerTrap <- colSums(x)
  sum(visitsPerTrap > 0) }
( obs <- getTrapsVisited(Yobs) )
31
sims <- apply(Ysim, 1, getTrapsVisited)
... [plotting code omitted]

# 4. max visits per detector
getMaxVisitsperTrap <- function(x) {
  visitsPerTrap <- colSums(x)
  max(visitsPerTrap) }
( obs <- getMaxVisitsperTrap(Yobs) )
20
sims <- apply(Ysim, 1, getMaxVisitsperTrap)
... [plotting code omitted]

# 5. max detections for 1 animal at 1 detector
( obs <- max(Yobs) )
13
sims <- apply(Ysim, 1, max)
... [plotting code omitted]

Now here are some good clues: many detectors are not picking up any animals, while those that do are detecting more than the model predicts. This points to heterogeneity among detectors. One last plot:

# 6. max detections for 1 animal overall
getMaxCaps <- function(x) {
  capsPerAnimal <- rowSums(x)
  max(capsPerAnimal)}
( obs <- getMaxCaps(Yobs) )
23
sims <- apply(Ysim, 1, getMaxCaps)
... [plotting code omitted]

Again this is consistent with trap heterogeneity.
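
The heterogeneity can be seen directly by comparing the observed number of detections at each detector with its posterior predictive spread; this is a sketch of such a check, not part of the analysis above:

obsPerTrap <- colSums(Yobs)
simPerTrap <- apply(Ysim, c(1,3), sum)   # draws x detectors
boxplot(simPerTrap, outline=FALSE, xlab="Detector", ylab="Detections")
points(obsPerTrap, col='red', pch=16)    # observed values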

The 'p0(sex+het) sigma(sex)' model

We'll try a model with a random effect on the probability of detection, p0. We modify the code so that p0 has a different value for each detector, drawn from a normal distribution on the logit scale, with equal and opposite offsets for females and males. The priors for p0 and sigma now look like this:

  for(t in 1:2){      # separate priors for male and female
    sigma[t] ~ dunif(0, 30)        # in pixel units
    sigma2[t] <- 1/(2*sigma[t]^2)
  }

  for(j in 1:nTraps) {
    lp0.trap[j] ~ dnorm(mu, tau)
    logit(p0[j,1]) <- lp0.trap[j] + lp0.sex
    logit(p0[j,2]) <- lp0.trap[j] - lp0.sex  # average sex effect is zero.
  }
  mean.p0 ~ dbeta(1,1)
  mu <- logit(mean.p0)
  sig.p0 ~ dunif(0,5)
  tau <- 1/sig.p0^2
  lp0.sex ~ dnorm(0, 0.1)

And the line for p[i, j] becomes:

        p[i,j] <- p0[j, sex[i]+1] * exp(-sigma2[sex[i]+1]*d2[i,j])

GOF p-value plots

The GOF p-values are now fine, so we would conclude that the model is an adequate fit to the data. But let's look at the other statistics we plotted.

Observed values and posterior predictive values

These plots show no evidence of lack of fit. Our diagnosis of detector heterogeneity seems to have been correct, and no other problems are evident.

Updated 5 May 2020 by Mike Meredith