When analysing data, we are trying to make inferences about a
population based on a sample. We often want to predict the value
of some variable of interest for individuals or sites not
included in the sample, based on known covariates. Information
criteria (such as AIC or AICc) aim to select the best model for
making accurate predictions. The first step in model selection
is based on our biological knowledge: we only include models
which are biologically plausible in the set we fit to the data.
But we often find that several plausible models give similar
values for the information criterion and there is doubt as to
which model we should use for prediction. A good solution then
is to generate predictions for several models and use a weighted
average.
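The weights come from the information criterion itself: the standard Akaike weight for model i is exp(-delta_i/2), normalised so the weights sum to 1, where delta_i is the model's AICc minus the smallest AICc in the set. A minimal sketch in R, using hypothetical AICc values:

```r
# Hypothetical AICc values for three candidate models
aicc <- c(m1 = 632.8, m2 = 633.1, m3 = 640.2)
delta <- aicc - min(aicc)   # delta-AICc for each model
w <- exp(-delta / 2)        # relative likelihood of each model
w <- w / sum(w)             # normalise to get Akaike weights
round(w, 3)
# A model-averaged prediction is then sum(w * preds), where 'preds'
# holds each model's prediction for one site.
```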
Which models to include?
Some people include all the models in the candidate set,
which will likely do little harm, but a bit of thought can help.
As an example, we'll use a simulated data set in the development
version of wiqid on GitHub:
remotes::install_github("mikemeredith/wiqid")
library(wiqid)
data(toves)
str(toves)
# 'data.frame': 200 obs. of 7 variables:
# $ y1: int 0 0 0 0 0 0 0 0 0 0 ...
# $ y2: int 0 0 0 0 0 0 0 0 0 0 ...
# $ y3: int 1 0 0 0 0 0 0 0 0 0 ...
# $ y4: int 0 0 0 0 0 0 0 0 0 0 ...
# $ x1: num 6.83 2.44 4.5 2.29 8.64 3.11 0.74 8.32 8.74 1.39 ...
# $ x2: num 8.43 6.11 10.26 6 5.16 ...
# $ x3: num 0.145 0.406 0.39 0.629 1.312 ...
The first 4 columns in the toves data frame are
detection/non-detection data for 4 visits to each of the 200
sites in the sample. There are 3 covariates: x1 might be
elevation in units of 1000 m, x2 the square root of distance
from settlement, and x3 the basal area of large trees.
We'll pull out the detection-history matrix, then fit all 8 possible models using wiqid::occSS and look at the AICc of each:
DH <- toves[, 1:4]  # the detection/non-detection columns
allMod <- allCombinations(response="psi", covars=c("x1", "x2", "x3"))
models <- vector("list", 8)
for(j in 1:8)
models[[j]] <- occSS(DH, as.formula(allMod[j]), data=toves)
AICcs <- sapply(models, AICc)
names(AICcs) <- allMod
AICcs <- sort(AICcs)
data.frame(AICc = AICcs, delta = AICcs - min(AICcs))
# AICc delta
# psi ~ x1 + x2 632.8277 0.0000000
# psi ~ x1 633.0500 0.2223035
# psi ~ x1 + x2 + x3 634.5528 1.7251486
# psi ~ x1 + x3 634.6857 1.8579871
# psi ~ 1 668.5309 35.7032110
# psi ~ x2 670.0420 37.2142704
# psi ~ x3 670.3150 37.4872842
# psi ~ x2 + x3 671.8358 39.0081545
The models without x1 all have a huge deltaAICc, so
we don't need to consider those further.
The models with x3 all have a higher AICc than
the corresponding model without x3 . Here x3
is a "hitchhiker": those models are near the top of the AIC
table thanks to the other variables in the model, which are doing all
the work, while x3 itself contributes nothing.
Anderson (2008, p. 65) calls such variables "pretending
variables". We can eliminate the x3 models from the
set.
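You can see the hitchhiker pattern in the AICc table above: each x3 model sits about 1.7-1.9 units above its counterpart without x3, close to the roughly 2-unit penalty that AICc charges for one extra parameter that doesn't improve the fit. A self-contained sketch of the same effect, using ordinary linear models (base R's lm and AIC, not the occupancy models above) and a pure-noise covariate:

```r
set.seed(42)
n <- 200
x1 <- runif(n, 0, 10)           # a covariate that matters
x3 <- runif(n, 0, 2)            # pure noise: unrelated to the response
y  <- 2 + 0.5 * x1 + rnorm(n)   # response depends on x1 only

AIC(lm(y ~ x1))                 # baseline model
AIC(lm(y ~ x1 + x3))            # hitchhiker model: AIC higher, by up to ~2
# Adding x3 can only reduce the residual sum of squares, so the fit
# improves slightly, but rarely enough to offset the 2-unit penalty
# for the extra parameter -- the signature of a hitchhiker.
```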
The model with x1 and x2 is better than the model with x1
only, but the difference in AICc is small, much less than 2, so
we can't be sure that the top model is really the best: with a
slightly different sample, we might find that the x1-only model
does better. Model-averaged predictions should be more accurate
than those from either model alone.
Do model-average predictions
The wiqid package now has a function to produce
model-averaged predictions from multiple models:
predictAvg . Use ?predictAvg to see details.
First we refit the two models we are interested in:
m.1 <- occSS(DH, psi ~ x1, data=toves)
m.12 <- occSS(DH, psi ~ x1 + x2, data=toves)
AICtable(AICc(m.1, m.12))
# df AICc Delta ModelLik ModelWt
# m.12 4 632.828 0.000 1.000 0.528
# m.1 3 633.050 0.222 0.895 0.472
As an example, we'll get model-averaged predictions of
psi for the first 6 sites in the data set. You could of
course do all of them, or you could get predictions for all the
sites in the whole study area and produce a map. We just need to
get the relevant data into a data frame and pass it to
predictAvg as newdata :
newdata <- toves[1:6, ]
( psi.ma <- predictAvg(list(m.1, m.12), newdata, "psi", type="response") )
# est SE lowCI uppCI
# 1 0.8515831 0.13036260 0.4318010 0.9774376
# 2 0.2937862 0.07341358 0.1721334 0.4542412
# 3 0.6673564 0.22612654 0.2141285 0.9365958
# 4 0.2748484 0.06956039 0.1605406 0.4289563
# 5 0.9134818 0.07464093 0.6238355 0.9853414
# 6 0.3079147 0.07877618 0.1773493 0.4786724
Now let's compare graphically the model-averaged predictions
with those from the individual models.
psi.1 <- predict(m.1, newdata, parameter="psi", type="response")
psi.12 <- predict(m.12, newdata, parameter="psi", type="response")
# Plot point estimates and CIs for the 6 sites
plot(1:6, psi.ma[,'est'], xlab="Site number", ylab="psi", pch=16, cex=1.5,
las=1, ylim=0:1, xlim=c(0.5, 6.5))
arrows(1:6, psi.ma[,'lowCI'], 1:6, psi.ma[,'uppCI'], angle=90, length=0.03, code=3, lwd=2)
# Add values from psi.1 and psi.12
points(1:6 - 0.2, psi.1[,1], col='red')
arrows(1:6 - 0.2, psi.1[,3], 1:6 - 0.2, psi.1[,4],
angle=90, length=0.03, code=3, col='red')
points(1:6 + 0.2, psi.12[,1], pch=2, col='blue')
arrows(1:6 + 0.2, psi.12[,3], 1:6 + 0.2, psi.12[,4],
angle=90, length=0.03, code=3, col='blue')
As we would expect, the model-averaged predictions are
approximately half-way between the predictions of the individual
models (almost exactly half-way, since the weights of the two
models, 0.53 and 0.47, are nearly equal). The confidence
intervals are generally wider, especially when the individual
models give different predictions; this reflects uncertainty
about which model is best as well as uncertainty in each model's
parameter estimates.
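The point estimate is just the weighted mean of the individual estimates; the model-averaged SE is not, because a between-model variance component is added to the within-model variances (the "unconditional" variance of Burnham and Anderson). A sketch of the point-estimate arithmetic, with made-up psi values and the weights from the AICc table above:

```r
w       <- c(0.528, 0.472)   # Akaike weights for m.12 and m.1
psi.hat <- c(0.90, 0.80)     # hypothetical psi estimates for one site
est.ma  <- sum(w * psi.hat)  # weighted mean = model-averaged estimate
est.ma
# [1] 0.8528
# The model-averaged SE also includes a (psi.hat - est.ma)^2 term
# for each model, so it grows when the models disagree.
```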
Plotting relationships
Another example, using genuinely new data rather than a
subset of the old data, produces plots showing the relationship
between the response and the two covariates.
Here's the code to do that:
range(toves$x1)
# [1] 0.07 9.99
x1 <- seq(0, 10, length.out=100)
range(toves$x2)
# [1] 0.46 10.26
x2 <- seq(0, 10, length.out=100)
newdata <- expand.grid(x1=x1, x2=x2)  # naming the arguments names the columns
head(newdata)
# x1 x2
# 1 0.0000000 0
# 2 0.1010101 0
# 3 0.2020202 0
# 4 0.3030303 0
# 5 0.4040404 0
# 6 0.5050505 0
# Get model averaged predictions
psi.ma <- predictAvg(list(m.1, m.12), newdata, "psi", type="response")
mat.ma <- matrix(psi.ma[, 'est'], 100, 100)
# Get predictions from the individual models
psi.1 <- predict(m.1, newdata, parameter="psi", type="response")
mat.1 <- matrix(psi.1[, 'est'], 100, 100)
psi.12 <- predict(m.12, newdata, parameter="psi", type="response")
mat.12 <- matrix(psi.12[, 'est'], 100, 100)
# Do the image plots
x11(width=7, height=2.73)
par(mfrow = c(1,3))
image(x1, x2, mat.1, asp=1, main="psi ~ x1")
box()
image(x1, x2, mat.ma, asp=1, main="Model average psi")
box()
image(x1, x2, mat.12, asp=1, main="psi ~ x1 + x2")
box()
Don't model-average parameters
This is frequently done, and is the default for some software
packages, but it is problematic. A key requirement is that "the
parameter being considered for model averaging must mean the
same thing in each model" (Anderson, 2008, p. 110). But a
coefficient in a linear model is the rate of change of the
response for a unit change in the predictor, conditional on
which other predictors are in the model. Change the "other
predictors" and you change the meaning of the coefficient; see
Cade (2015) for the mathematical details. This is a problem with
real data sets because predictors will be correlated, and it can
be problematic even when the usual measures of multicollinearity
look fine.
For example, we might have elevation and distance from road as
covariates. These are correlated, because roads are generally
built in the valleys. Elevation might in truth have a negligible
effect on occupancy, while distance from road is important. A
model with elevation but not road will then show that elevation
is important, not because the species prefers high elevations
but because high-elevation sites are further from roads: the
elevation covariate is standing in for distance from road. See
here
for an example where rainfall and temperature are highly
correlated, as is often the case in mountainous terrain. None of
this affects predictions of the response variable, so use model
averaging for predictions, but not for model parameters.
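A quick simulation makes the point concrete. This uses ordinary linear regression, not the occupancy models above, and the variable names are made up for illustration; the "elev" coefficient changes meaning depending on whether road distance is in the model:

```r
set.seed(1)
n <- 500
elev <- runif(n, 0, 2)            # elevation, km
road <- elev + rnorm(n, 0, 0.3)   # distance from road, correlated with elevation
y <- 1 + 2 * road + rnorm(n)      # response depends on road only, not elevation

coef(lm(y ~ elev))["elev"]        # large: elev is standing in for road
coef(lm(y ~ elev + road))["elev"] # near zero once road is in the model
# The two "elev" coefficients estimate different quantities, so
# averaging them across models mixes apples and oranges.
```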