Goodnessoffit : Occupancy models 

HOME 
We talk about "fitting a model" to data, but how well does our
model actually fit? Assessing model fit, or GoodnessOfFit (GOF)
checks are important. Sometimes
the fit is embarrassingly poor, often an incentive to improve the model.
GOF measures rarely indicate how to improve, and we'll look at
more specific checks in a later post. GOF looks at the fit of the model to the same data set used to estimate parameters. We are not concerned here with selecting models with the best predictive ability. In general, the fit improves as we add more parameters, and GOF is usually done with the full model, the one with the maximum number of parameters. Discrepancy measures for binomial observationsNotation: \(y\) = observed number of successes, \(k\) = number of trials, \(p\) = probability of success. Note that the discrepancy measures used here can't be used with a single trial, ie, 0/1 data, we need to aggregate trials. For example: \(k\) = 5 visits to an occupied site when \(p\) = 0.7, expected number of times detected = \(pk\) = 3.5. The data, \(y\), are integers, the expected value is continuous, hence the difference will not be zero. Chisquared: \[\chi^2 = \frac{(y  pk)^2}{pk}\] If \(pk\) is small, \(\chi^2\) explodes, so one site can cause a huge discrepancy. If you were running a chisquared test, you would manually pool results from sites so as to have reasonable \(pk\) values, but we can't do that programmatically. Also we may need to add a small amount to the denominator to avoid dividebyzero errors: Kéry & Royle (2016, p193) add 0.0001, Kéry & Schaub (2012, p398) add 0.5; with no mathematical justification for this, it smacks of fudge. Pearson's chisquared: \[\chi^2 = \frac{(y  pk)^2}{pk(1p)}\] This explodes if p = 0 or 1. Tobler et al (2015) use the square root of this, but with 0.001 added to p in the denominator to prevent dividebyzero errors. FreemanTukey: \((\sqrt{y}  \sqrt{pk})^2\) Avoids explosions and one or a few sites dominating the overall result. Eg, Kéry & Schaub (2012, p224), King et al (2010, p139). This is my favourite. Deviance: Calculated from likelihood = \( {\rm I\!P}(y  {\rm parameters})\) A good fit results in a high likelihood. Deviance = 2*log(likelihood). Eg King et al (2010, p138, 396). With multiple sites or animals, we calculate the discrepancy for each and add them up to get a total discrepancy. Assessing discrepancy:The discrepancy is not going to be zero  what values are good/bad? To assess that, we generate many simulated data sets from the fitted model and calculate the discrepancy for each of these, then see where the discrepancy for our actual data set compares. If the actual value lies near the middle of the simulated values (the solid line in the plot below), we are happy with the fit. But if the actual value is out in the tails of the simulated distribution, the fit is poor (the dashed lines).
We can calculate the pvalue, ie, the probability of obtaining a discrepancy at least as large as the observed discrepancy if the model fits the data. Values near 0.5 indicate a good fit; values above 0.9 or below 0.1, a bad fit. The histogram above was generated using the ML estimates from a frequentist fit, which ignores uncertainty in the parameter estimates. For a Bayesian fit, we incorporate uncertainty by simulating a "posterior predictive" data set for each step in the MCMC chain. We then have MCMC chains for simulated and actual discrepancies; we can compare these in a scatter plot and calculate a pvalue, sometimes referred to as a "Bayesian pvalue". Implementation in JAGSWe'll use an occupancy model to illustrate this, with the data for willow tits with site covariates from the Swiss Breeding Bird Survey, the same data used by Hooten & Hobbs (2015), except that they used only the first 200 rows, removing most of the high elevation sites. We will use the site covariates and ignore the survey covariates for the moment. The following R code will download and prepare the data:
FreemanTukey discrepancyWith the model matrix parameterization, the JAGS code with FreemanTukey discrepancy GOF estimates (saved in "site_covs.jags") looks like this:
We prepare the necessary bits and pieces and run the null model (intercept only) and the full model with jagsUI:
The posterior predictive check plot^{1} shows that the discrepancies
for the observed data are generally larger than for the
simulated data, but the fit here is not bad. We would expect the
model with more parameters to be a better fit to the data, but
the difference here is not clear  the difference of 0.01 in the
Pvalues could be due to MCMC error. It could be that our
discrepancy measure is not picking up the difference in fit for
Notice that the simulated data depend on
Resampling DevianceFor this we calculate the loglikelihood for each site with this code:
followed by
The deviance measure is clearly picking up a difference between the full model and the rest, though the full model has a P value which is almost too big. Calculating GOF after the JAGS runIt's sometimes convenient to calculate GOF
statistics after doing the JAGS run; this is the case when the
run takes a long time and you did not include GOF in the model.
Here we'll calculate the FreemanTukey discrepancy starting from
the posterior distribution of We can use the same model code as above, deleting  or just ignoring  the GOF code highlighted in blue, but we need to monitor z:
After running the model, we do this:
1 I'm currently using the devel version of jagsUI from Github, which displays the correct P value. Older versions of jagsUI showed 1  P. 
Updated 18 Aug 2019 by Mike Meredith 