I've recently been asked about coding categorical covariates in
JAGS. You can do it in the usual frequentist manner with dummy
variables for all but one of the variables, but JAGS allows
other, more easily interpreted ways of coding, especially if there
is only one
categorical covariate in the model and it
is a logistic or log regression. Many of our wildlife analyses
have
logistic models under the hood. In R, the class factor
is used for categorical variables. A factor has a levels
attribute with the names of the categories, in alphabetical
order by default, and the values are stored as integers giving
the level number.
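To see how this works, here is a tiny stand-alone illustration (the fruit names are made up for this example, not from the analysis below):

```r
# A factor stores integer codes plus a 'levels' attribute with the names.
f <- factor(c("pear", "apple", "pear", "cherry"))
levels(f)      # "apple" "cherry" "pear" -- alphabetical by default
as.integer(f)  # 3 1 3 2 -- the stored level numbers
```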
A toy example
I'll generate some simulated data for a logistic regression
model (as that generalises to a lot of our other models). We
have four players with different degrees of skill and a
continuous covariate which does nothing.
playerNames <- c("Andy", "Bob", "Chan", "Dora")
skill <- c(0.3, 0.6, 0.7, 0.8)
set.seed(2017)
player.char <- sample(playerNames, size=200, replace=TRUE, prob=c(1,3,2,2))
player <- factor(player.char)
summary(player)
Andy Bob Chan Dora
25 84 52 39
success <- rbinom(200, 1, p=skill[as.integer(player)])
fluff <- rnorm(200)
tapply(success, player, mean) # Check: should be close to 'skill'
Andy Bob Chan Dora
0.2800000 0.5833333 0.7115385 0.8461538
I've used character strings for the player names and then
for the levels of the player factor (they appear in
the summary), which I think is much safer than using numbers for
categories. A factor can easily be converted to integers if
necessary, as in the call to rbinom above. Note
that the number of observations per category varies, something
that happens often with real data.
Frequentist analysis with glm
We'll quickly run a glm analysis for the full model:
glm(success ~ player + fluff, family='binomial')
Call: glm(formula = success ~ player + fluff, family = "binomial")
Coefficients:
(Intercept) playerBob playerChan playerDora fluff
-0.9589 1.2927 1.8726 2.6850 0.1198
Degrees of Freedom: 199 Total (i.e. Null); 195 Residual
Null Deviance: 263.6
Residual Deviance: 239.2 AIC: 249.2
Notice that we have coefficients for Bob, Chan and Dora, but
not for Andy. The intercept refers to Andy's plays and the other
coefficients give the difference between Andy and each of the
other players. It isn't possible to have an intercept and a
coefficient for each player, as they would be confounded: you
could add any value you liked to the intercept provided you
subtracted the same value from the player coefficients and you'd
get exactly the same result.
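The dummy coding that glm uses by default ("treatment contrasts") can be inspected with model.matrix. A minimal sketch, using a made-up four-row factor (`pl` and `X` are names invented for this illustration):

```r
# model.matrix shows the design matrix glm builds: one dummy column per
# non-reference level; the first level (Andy) is absorbed into the intercept.
pl <- factor(c("Andy", "Bob", "Chan", "Dora"))
X <- model.matrix(~ pl)
colnames(X)  # "(Intercept)" "plBob" "plChan" "plDora"
```

Andy's row has zeros in all the dummy columns, which is why the intercept alone describes his plays.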
The null model: no covariates
To begin with, let's run a JAGS model with no covariates. In
this case we need to estimate a single probability of success,
p,
and do not need the logit link.
modelText <- "
model {
  for(i in 1:length(success)) {
    success[i] ~ dbern(p)
  }
  # Priors
  p ~ dbeta(1,1)
}"
writeLines(modelText, con="modelNull.txt")
Here I'm using a Beta prior, which is the appropriate
distribution for a probability, as it is restricted to the range
of 0 to 1. Beta(1,1) is a uniform prior with little information, and no one is likely to
question its choice, except to suggest a more informative prior.
And a more informative prior could easily be coded with, say,
p ~ dbeta(2,3). Many people use dunif(0,1) for a
uniform prior instead of dbeta(1,1); that's mathematically
equivalent, but JAGS does not recognise it as a Beta distribution,
so it cannot take advantage of conjugacy when sampling.
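As a quick sanity check in R (not JAGS), the Beta(1,1) density really is flat and identical to the Uniform(0,1) density:

```r
# Both densities equal 1 everywhere on [0, 1].
x <- seq(0, 1, by = 0.1)
all(dbeta(x, 1, 1) == dunif(x, 0, 1))  # TRUE
```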
JAGSdata <- list(success=success)
wanted <- "p"
library(jagsUI)
outNull <- jags(JAGSdata, NULL, wanted, "modelNull.txt",
                n.chains=3, n.adapt=100, n.iter=2000)
outNull
...
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
p 0.629 0.033 0.563 0.629 0.692 FALSE 1 1.000 3668
...
plot(outNull)
This works fine and the plots (not shown) indicate good
mixing and convergence.
One categorical covariate
If we have only one covariate and that covariate is
categorical, the simplest strategy is to estimate a value for
p for each category. We do not use the logit link,
and dealing directly with p simplifies
interpretation and discussion of priors.
modelText <- "
model {
  for(i in 1:length(success)) {
    success[i] ~ dbern(p[player[i]])
  }
  # Priors
  for(i in 1:4) {
    p[i] ~ dbeta(1,1)
  }
}"
writeLines(modelText, con="modelPlayer.txt")
Again I'm using a Beta(1,1) prior, which could easily be replaced by an
informative beta prior, a different one for each player if you
wanted.
JAGSdata <- list(success=success, player=as.integer(player))
str(JAGSdata)
List of 2
$ success: int [1:200] 0 1 0 0 1 1 0 1 1 1 ...
$ player : int [1:200] 1 3 3 2 4 4 2 3 3 2 ...
wanted <- "p"
The factor player was converted to integers for
JAGS, with Andy = 1, Bob = 2, etc. This is used as the index
into the p vector so that the right p
is used for each observation in the success vector.
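The indexing trick can be seen in plain R with a small hypothetical example (`pVec` and `obs` are names made up for this sketch):

```r
# Indexing a vector of probabilities by the factor's integer codes picks
# the right p for each observation.
pVec <- c(0.3, 0.6, 0.7, 0.8)  # one entry per level: Andy, Bob, Chan, Dora
obs <- factor(c("Bob", "Dora", "Andy"), levels = c("Andy", "Bob", "Chan", "Dora"))
as.integer(obs)        # 2 4 1
pVec[as.integer(obs)]  # 0.6 0.8 0.3
```

Note that the levels argument is needed here because not all four players appear in this tiny sample; with the full data set the default levels are fine.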
outPlayer <- jags(JAGSdata, NULL, wanted, "modelPlayer.txt",
                  n.chains=3, n.adapt=100, n.iter=2000)
outPlayer
...
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
p[1] 0.299 0.087 0.144 0.294 0.487 FALSE 1 1.001 1627
p[2] 0.582 0.053 0.479 0.582 0.684 FALSE 1 1.000 3484
p[3] 0.705 0.061 0.578 0.708 0.817 FALSE 1 1.000 6000
p[4] 0.830 0.058 0.706 0.835 0.927 FALSE 1 1.001 2855
...
# Compare with the proportion of successes for each player:
tapply(success, player, mean)
Andy Bob Chan Dora
0.2800000 0.5833333 0.7115385 0.8461538
The posterior means are close to the actual proportions of
successes in the data, though pulled somewhat towards 0.5, as we
would expect: with a uniform prior, the posterior mode
equals the observed proportion of successes, but the
posterior distribution is skewed, so the mean lies
closer to 0.5.
More than one covariate
With more than one covariate we do need to use the logit
link. There are several strategies, but with only one categorical covariate, we can keep it
simple by using a different intercept for each category.
The intercept for each player is the logit(probability of
success) when fluff = 0 , which is in fact the mean of
fluff. So we should be able to provide a sensible prior for the
probability of success. In the code below, I've specified a
uniform prior, Beta(1,1), for the intercepts on the probability
scale, then converted to logit for use in the regression. You
could use an informative Beta prior for each player if you
wished.
One intercept per category
modelText <- "
model {
  for(i in 1:length(success)) {
    logit(p[i]) <- bPlayer[player[i]] + bFluff * fluff[i]
    success[i] ~ dbern(p[i])
  }
  # Priors
  for(i in 1:4) {
    pPlayer[i] ~ dbeta(1,1)         # prior specified on probability scale...
    bPlayer[i] <- logit(pPlayer[i]) # ... then converted to logit scale.
  }
  bFluff ~ dunif(-5,5)
}"
writeLines(modelText, con="model1.txt")
Here I'm using my favourite uniform priors, dunif(-5,5) , for logistic regression
coefficients when continuous covariates are standardised (fluff was
generated with mean = 0 and SD = 1). These may not be appropriate for your
reported analysis, but it is easy to see if the prior is constraining the
posterior.
JAGSdata <- list(success=success, player=as.integer(player), fluff=fluff)
str(JAGSdata)
List of 3
$ success: int [1:200] 0 1 0 0 1 1 0 1 1 1 ...
$ player : int [1:200] 1 3 3 2 4 4 2 3 3 2 ...
$ fluff : num [1:200] 0.48 0.688 0.201 0.101 0.577 ...
wanted <- c("bPlayer", "bFluff")
out1 <- jags(JAGSdata, NULL, wanted, "model1.txt",
             n.chains=3, n.adapt=100, n.iter=2000)
out1
...
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
bPlayer[1] -1.013 0.448 -1.924 -0.998 -0.157 FALSE 0.989 1.002 5123
bPlayer[2] 0.337 0.221 -0.095 0.338 0.764 TRUE 0.936 1.002 1195
bPlayer[3] 0.943 0.316 0.345 0.931 1.588 FALSE 0.999 1.000 6000
bPlayer[4] 1.835 0.466 1.011 1.811 2.827 FALSE 1.000 1.000 6000
bFluff 0.123 0.166 -0.196 0.122 0.448 TRUE 0.771 1.000 6000
...
The bPlayer coefficients are on the logit scale,
so we need to convert to probabilities (for the cases where
fluff = 0 ) for comparison with the
known probabilities:
logitP1 <- out1$sims.list$bPlayer
P1 <- plogis(logitP1)
colMeans(P1)
[1] 0.275 0.582 0.715 0.853
One category as reference category
The method above won't work if you have more than one
categorical covariate. An alternative is to use the strategy
adopted by default by glm : the model has an intercept which
refers to one of the categories, and coefficients for the
difference between this reference category and each of the
others. We adapt our code by adding the intercept, b0 ,
and setting the coefficient for the reference category to zero.
modelText <- "
model {
  for(i in 1:length(success)) {
    logit(p[i]) <- b0 + bPlayer[player[i]] + bFluff * fluff[i]
    success[i] ~ dbern(p[i])
  }
  # Priors
  b0 ~ dunif(-5,5)
  bPlayer[1] <- 0  # First level is reference category
  for(i in 2:4) {
    bPlayer[i] ~ dunif(-5,5) # Difference between each player and the reference player
  }
  bFluff ~ dunif(-5,5)
}"
writeLines(modelText, con="model2.txt")
Here we might say that b0 is the intercept for
the reference player, so we can provide a sensible prior for
that, eg, p0 ~ dbeta(1,1) ; b0 <- logit(p0) . But I
have no intuitive feel for how you would choose priors for the
differences (on the logit scale) among players.
The data list, JAGSdata , is the same and we add
b0 to the parameters in wanted . The
output looks like this:
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
b0 -0.974 0.444 -1.891 -0.970 -0.117 FALSE 0.988 1.018 147
bPlayer[1] 0.000 0.000 0.000 0.000 0.000 FALSE 1.000 NA 1
bPlayer[2] 1.312 0.495 0.366 1.309 2.294 FALSE 0.996 1.014 172
bPlayer[3] 1.916 0.540 0.881 1.904 3.017 FALSE 1.000 1.011 255
bPlayer[4] 2.773 0.634 1.563 2.749 4.068 FALSE 1.000 1.008 276
bFluff 0.124 0.165 -0.196 0.123 0.447 TRUE 0.773 1.001 2184
Notice that the effective sample sizes, n.eff , are very low
(we ran 3 chains of 2000 iterations, 6000 draws in total) and
the trace plots suggest poor mixing. The main reason for this is
that the reference category, Andy, has relatively few
observations, only 25 out of 200. That makes it difficult to
estimate the intercept and hence difficult to estimate the other
coefficients.
We can improve things by making Bob, who has the most
observations, the reference category. That could be done by changing
the JAGS code (set bPlayer[2] <- 0 ) but it's easier
to change the data by relevelling the player factor so that
Bob comes first:
player_Bob <- relevel(player, ref="Bob")
summary(player_Bob)
Bob Andy Chan Dora
84 25 52 39
JAGSdata <- list(success=success, player=as.integer(player_Bob), fluff=fluff)
We run the model with the new data and get:
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
b0 0.337 0.222 -0.078 0.329 0.783 TRUE 0.941 1.000 4153
bPlayer[1] 0.000 0.000 0.000 0.000 0.000 FALSE 1.000 NA 1
bPlayer[2] -1.349 0.517 -2.431 -1.332 -0.387 FALSE 0.998 1.000 6000
bPlayer[3] 0.599 0.383 -0.151 0.593 1.364 TRUE 0.942 1.000 4634
bPlayer[4] 1.467 0.503 0.510 1.453 2.502 FALSE 0.999 1.001 2687
bFluff 0.124 0.166 -0.199 0.123 0.450 TRUE 0.770 1.001 2864
Effective sample sizes are much better and the diagnostic
plots now look good.
To get the estimated skill levels (with
fluff = 0 ) we need to add the intercept, b0 ,
to the bPlayer coefficients:
logitP2 <- out2$sims.list$bPlayer + out2$sims.list$b0
P2 <- plogis(logitP2)
colMeans(P2)
[1] 0.581 0.276 0.714 0.850
... remembering that Bob is now first in the list and Andy
is second.
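If you want the estimates back in the original (alphabetical) order, you can reorder by level name. A small sketch, using the posterior means from colMeans(P2) above (the vector name is invented for this illustration):

```r
# Name the estimates with the releveled ordering, then index by the
# original level names to restore alphabetical order.
means_Bobfirst <- c(Bob = 0.581, Andy = 0.276, Chan = 0.714, Dora = 0.850)
means_Bobfirst[c("Andy", "Bob", "Chan", "Dora")]
# Andy 0.276, Bob 0.581, Chan 0.714, Dora 0.850
```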
Categorical coefficients sum to 0
Choosing a reference category can make interpretation of the
output difficult, especially if there are several categorical
covariates, each with its own reference category. An alternative
is to have a (non-zero) coefficient for each category but to
ensure that they add up to 0, or, if you prefer, have a mean of 0.
Here's the JAGS model:
modelText <- "
model {
  for(i in 1:length(success)) {
    logit(p[i]) <- b0 + bPlayer[player[i]] + bFluff * fluff[i]
    success[i] ~ dbern(p[i])
  }
  # Priors
  b0 ~ dunif(-5,5)
  for(i in 1:4) {
    temp[i] ~ dnorm(0, 0.01)
  }
  bPlayer <- temp - mean(temp)
  bFluff ~ dunif(-5,5)
}"
writeLines(modelText, con="model3.txt")
The bPlayer coefficients are generated in two
steps: first we get a set of temp values drawn from
broad normal distributions, then we subtract their mean from
each. This time I'm not using my dunif(-5,5) priors for
temp , as they would in fact be bumping up against
the limits. The normal prior with mean 0 has the effect of
keeping them within a reasonable range.
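The same centering trick can be checked in plain R (the variable names here are made up for the demonstration):

```r
# Subtracting the mean forces the coefficients to sum to (numerically)
# zero, whatever the raw draws are.
set.seed(1)
temp <- rnorm(4, 0, 10)          # unconstrained draws, one per player
bPlayer_demo <- temp - mean(temp)
sum(bPlayer_demo)                # effectively zero, up to floating-point error
```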
The details of running the model are the same, and here is
the output:
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
b0 0.531 0.187 0.174 0.528 0.912 FALSE 0.998 1.000 6000
bPlayer[1] -1.506 0.377 -2.279 -1.498 -0.803 FALSE 1.000 1.001 3560
bPlayer[2] -0.189 0.244 -0.674 -0.190 0.291 TRUE 0.787 1.002 2120
bPlayer[3] 0.406 0.296 -0.164 0.401 0.986 TRUE 0.919 1.000 6000
bPlayer[4] 1.289 0.381 0.597 1.277 2.102 FALSE 1.000 1.001 1909
bFluff 0.124 0.170 -0.207 0.122 0.453 TRUE 0.768 1.000 6000
We can check that the bPlayer coefficients sum to
zero, or very nearly so:
tst <- rowSums(out3$sims.list$bPlayer)
range(tst)
[1] -1.065814e-14 1.065814e-14
This time the individual bPlayer coefficients tell
us the difference in skill for each player from the mean of all
players (on the logit scale). The mean with
fluff = 0 is given by the intercept. So to recover
estimates of skill we need to add the intercept:
logitP3 <- out3$sims.list$bPlayer + out3$sims.list$b0
P3 <- plogis(logitP3)
colMeans(P3)
[1] 0.283 0.584 0.714 0.852
Notice that we are using the "mean of all players" here; we
do not take into account that the less-skilled Andy only makes a
small number of attempts. It would be possible to use a weighted
mean, with weights proportional to the number of attempts, if
that were needed.
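A sketch of what that weighted mean might look like, using the attempt counts from summary(player) and the posterior means of bPlayer from the output above (the variable names are invented for this illustration):

```r
# Weight each player's coefficient by the number of attempts.
n_attempts <- c(Andy = 25, Bob = 84, Chan = 52, Dora = 39)
w <- n_attempts / sum(n_attempts)           # weights sum to 1
bPlayer_means <- c(-1.506, -0.189, 0.406, 1.289)
sum(w * bPlayer_means)                      # weighted mean of the player effects
```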