Uniform priors for logistic model coefficients

If you are estimating a probability based on covariates you will likely use a logistic model. That includes practically all our occupancy, SCR and survival models. I like to use a uniform prior for regression coefficients, usually dunif(-5, 5), at least for preliminary runs and exploration, as it makes some problems easier to spot in diagnostic plots. And ease of spotting is important if you have dozens or hundreds of parameters to check.

Uniform priors are difficult to justify - how come a value of 4.999 is perfectly plausible but 5.0001 is impossible? - so they should probably be replaced with a prior that tails off for the final analysis and for reporting or publication. But they are good for diagnosis.

An example

Let's take a simple example, a regression with two covariates: x1 is a two-level categorical variable coded as 0/1 and x2 is continuous with mean = 0 and SD = 1; y is a 0/1 response variable. The model I want to fit is:
y[i] ~ dbern(p[i])
logit(p[i]) <- b0 + b1 * x1[i] + b2 * x2[i]

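I run this first with fairly narrow normal priors, dnorm(0, 0.1). JAGS parameterises the normal distribution by precision, so this corresponds to variance = 10, SD = 3.16. That puts about 95% of the prior mass within roughly +/-6, and values of -6 and +6 on the logit scale are already very close to 0 and 1 on the probability scale.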
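The arithmetic behind that prior is easy to check. Here is a minimal sketch in plain Python (not part of the original analysis, which uses JAGS); the helper name inv_logit is my own, and the logit values in the loop are just illustrative points.

```python
import math


def inv_logit(x):
    """Inverse logit (logistic function): maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# JAGS's dnorm takes a mean and a precision (tau = 1 / variance), not an SD.
tau = 0.1
variance = 1.0 / tau
sd = math.sqrt(variance)
print(f"precision = {tau}, variance = {variance:.2f}, SD = {sd:.2f}")

# Probabilities implied by a few values on the logit scale.
for value in (-6, -5, 0, 5, 6):
    print(f"logit = {value:+d}  ->  p = {inv_logit(value):.4f}")
```

The limits of dunif(-5, 5) correspond to probabilities of about 0.007 and 0.993, so the uniform prior still covers almost the whole probability scale.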

Here are the diagnostic plots:

At a quick glance these seem OK. But there are problems if you look harder, and they become clear with dunif(-5, 5) priors:

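Now it's clear that b1 is bumping up against the +5 limit. The posterior mean with the normal prior was around 8, which is very high, but that was not immediately obvious from the plots. Checking the data shows that y = 1 whenever x1 = 1, and y = 0 whenever x1 = 0. This is a case of separation: x1 predicts y perfectly, so the likelihood keeps increasing as b1 gets larger and only the prior keeps the estimate finite.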
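Separation on a 0/1 covariate can be spotted directly in the data with a cross-tabulation. The following is a rough sketch in plain Python, using a tiny made-up data set (not the data behind the plots) and hypothetical helper names. It also shows the log-likelihood rising as b1 increases, here with the intercept held at an arbitrary value and x2 left out for simplicity.

```python
import math
from collections import Counter


def cross_tab(x, y):
    """Count observations in each (x, y) cell for two 0/1 variables."""
    counts = Counter(zip(x, y))
    return {(xv, yv): counts.get((xv, yv), 0) for xv in (0, 1) for yv in (0, 1)}


def has_empty_cell(x, y):
    """True if some (x, y) combination never occurs, a warning sign of separation."""
    return any(n == 0 for n in cross_tab(x, y).values())


def log_lik(b0, b1, x, y):
    """Bernoulli log-likelihood for logit(p) = b0 + b1 * x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


# Made-up illustrative data: y is 1 exactly when x1 is 1.
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1]

print(cross_tab(x1, y))
print("empty cell:", has_empty_cell(x1, y))

# With the intercept fixed (arbitrarily) at -3, larger b1 always fits better.
for b1 in (2, 5, 8):
    print(f"b1 = {b1}: log-likelihood = {log_lik(-3.0, b1, x1, y):.3f}")
```

An empty cell in the table means the data alone cannot put an upper (or lower) bound on the coefficient, which is why the posterior piles up against the limit of the uniform prior.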

Obviously b2 has a problem too: the values are just being drawn from the prior. How did that happen? In this case, because the + b2 * x2[i] term is not in the model! Maybe someone messed up the indexing or simply used the wrong model file. Yet the posterior density for b2 in the upper set of plots, with the normal prior, looks fine.
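With many parameters, a numerical screen can back up the eyeball check. This is only a crude sketch in plain Python: the function looks_like_prior and its tolerance are my own assumptions, and the two sets of draws are simulated stand-ins rather than real MCMC output. It flags a parameter whose draws have about the same mean and SD as the dunif(-5, 5) prior.

```python
import math
import random
import statistics


def uniform_sd(lower, upper):
    """Standard deviation of a uniform distribution on (lower, upper)."""
    return (upper - lower) / math.sqrt(12)


def looks_like_prior(draws, lower=-5.0, upper=5.0, tol=0.1):
    """Crude screen: are the draws' mean and SD close to those of the uniform prior?

    `tol` is an arbitrary threshold, expressed in units of the prior SD.
    """
    prior_sd = uniform_sd(lower, upper)
    prior_mean = (lower + upper) / 2.0
    sd_ratio = statistics.stdev(draws) / prior_sd
    mean_shift = abs(statistics.mean(draws) - prior_mean) / prior_sd
    return abs(sd_ratio - 1.0) < tol and mean_shift < tol


rng = random.Random(42)
# Simulated stand-ins for posterior draws (not real MCMC output).
prior_only = [rng.uniform(-5, 5) for _ in range(5000)]  # parameter ignored by the data
informed = [rng.gauss(1.2, 0.4) for _ in range(5000)]   # parameter informed by the data

print("prior_only flagged:", looks_like_prior(prior_only))
print("informed flagged:  ", looks_like_prior(informed))
```

Real chains are autocorrelated, so the tolerance would need some thought, and a match in mean and SD is a prompt to look at the plot, not a verdict.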

It can also happen if there are no data for the case in question. For example, for a two-species occupancy model where the dominant species is detected at every site, you can't estimate occupancy of the subordinate species when the dominant species is absent. But with a normal prior you will get a plausible-looking posterior.

Compare the two sets of plots below:

The first set looks pretty good, but the second is obviously a mess. In both sets, though, the posteriors plotted are just draws from the priors and are unaffected by the data. This happened because the response in the data list passed to JAGS was y, but in the model code it was Y. JAGS names are case-sensitive, so Y in the model was never matched to the data.
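A mismatch like y versus Y can be caught before running the model by comparing the names in the data list with the names used in the model code. Here is a rough sketch in plain Python; the function case_mismatches is hypothetical, the regular expression is a crude tokenizer rather than a JAGS parser, and the model string is just the example model wrapped in an assumed loop over n observations.

```python
import re


def case_mismatches(model_code, data_names):
    """Return (data_name, model_name) pairs that differ only in letter case."""
    # Crude tokenizer: anything that looks like an identifier, keywords included.
    model_names = set(re.findall(r"[A-Za-z][A-Za-z0-9_.]*", model_code))
    pairs = []
    for name in data_names:
        if name in model_names:
            continue  # exact match, nothing to report
        for candidate in sorted(model_names):
            if candidate.lower() == name.lower():
                pairs.append((name, candidate))
    return pairs


# The example model, with the response mistakenly written as Y.
model_code = """
model {
  for (i in 1:n) {
    Y[i] ~ dbern(p[i])
    logit(p[i]) <- b0 + b1 * x1[i] + b2 * x2[i]
  }
}
"""
data_names = ["y", "x1", "x2", "n"]

print(case_mismatches(model_code, data_names))
```

This only catches names that differ in case; a data name that is missing from the model altogether would need a separate check.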


Updated 1 June 2019 by Mike Meredith