Model averaging with wiqid output

When analysing data, we are trying to make inferences about a population based on a sample. We often want to predict the value of some variable of interest for individuals or sites not included in the sample, based on known covariates. Information criteria (such as AIC or AICc) aim to select the best model for making accurate predictions.

The first step in model selection is based on our biological knowledge: we only include biologically plausible models in the set we fit to the data. But we often find that several plausible models give similar values of the information criterion, leaving doubt about which model we should use for prediction. A good solution then is to generate predictions from several models and use a weighted average.

Which models to include?

Some people include all the models in the candidate set, which will likely do little harm, but a bit of thought can help.

As an example, we'll use a simulated data set in the development version of wiqid on GitHub:

remotes::install_github("mikemeredith/wiqid")
library(wiqid)
data(toves)
str(toves)
# 'data.frame':   200 obs. of  7 variables:
#  $ y1: int  0 0 0 0 0 0 0 0 0 0 ...
#  $ y2: int  0 0 0 0 0 0 0 0 0 0 ...
#  $ y3: int  1 0 0 0 0 0 0 0 0 0 ...
#  $ y4: int  0 0 0 0 0 0 0 0 0 0 ...
#  $ x1: num  6.83 2.44 4.5 2.29 8.64 3.11 0.74 8.32 8.74 1.39 ...
#  $ x2: num  8.43 6.11 10.26 6 5.16 ...
#  $ x3: num  0.145 0.406 0.39 0.629 1.312 ...

The first 4 columns in the toves data frame are detection/non-detection data for 4 visits to each of the 200 sites in the sample. There are 3 covariates: x1 might be elevation in units of 1000 m, x2 the square root of distance from settlement, and x3 the basal area of large trees.

We'll fit all 8 possible models using wiqid::occSS and look at the AICc of each:

# Extract the detection histories (first 4 columns of 'toves')
DH <- toves[, 1:4]
allMod <- allCombinations(response="psi", covars=c("x1", "x2", "x3"))
models <- vector("list", 8)
for(j in 1:8)
  models[[j]] <- occSS(DH, as.formula(allMod[[j]]), data=toves)
AICcs <- sapply(models, AICc)
names(AICcs) <- allMod
AICcs <- sort(AICcs)
data.frame(AICc = AICcs, delta = AICcs - min(AICcs))
#                        AICc      delta
# psi ~ x1 + x2      632.8277  0.0000000
# psi ~ x1           633.0500  0.2223035
# psi ~ x1 + x2 + x3 634.5528  1.7251486
# psi ~ x1 + x3      634.6857  1.8579871
# psi ~ 1            668.5309 35.7032110
# psi ~ x2           670.0420 37.2142704
# psi ~ x3           670.3150 37.4872842
# psi ~ x2 + x3      671.8358 39.0081545

The models without x1 all have huge delta-AICc values, so we don't need to consider them further.

The models with x3 all have a higher AICc than the corresponding model without x3. Here x3 is a "hitchhiker": those models are near the top of the AICc table thanks to the other variables in the model, which are doing all the work, while x3 itself contributes nothing. Anderson (2008, p.65) calls such variables "pretending variables". We can eliminate the x3 models from the set.
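A quick arithmetic check (a sketch, assuming the element names of AICcs match those printed above): if x3 is only pretending, adding it should leave the fit almost unchanged, so AICc should rise by roughly the penalty for one extra parameter, about 2.

# A pretending variable raises AICc by roughly the one-parameter penalty
# (about 2) without improving the fit.
AICcs["psi ~ x1 + x3"] - AICcs["psi ~ x1"]            # about 1.6
AICcs["psi ~ x1 + x2 + x3"] - AICcs["psi ~ x1 + x2"]  # about 1.7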

The model with x1 and x2 is better than the model with x1 only, but the difference in AICc is small, much less than 2. So we can't be sure that the top model is really the best. With a slightly different sample, we could find that the x1-only model is better. Model-averaged predictions will be more accurate than those from either model alone.
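If you want to see that sampling variability directly, a rough check (a sketch, not part of the original analysis) is to refit both models to bootstrap resamples of the sites and count how often each wins on AICc; this assumes the detection histories and covariates can simply be resampled row-wise.

# Bootstrap the 200 sites and see how often the x1-only model wins on AICc.
set.seed(1)
wins <- replicate(100, {
  i <- sample(200, replace=TRUE)
  a1  <- AICc(occSS(toves[i, 1:4], psi ~ x1, data=toves[i, ]))
  a12 <- AICc(occSS(toves[i, 1:4], psi ~ x1 + x2, data=toves[i, ]))
  a1 < a12
})
mean(wins)  # proportion of resamples where the x1-only model wins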

Do model-averaged predictions

The wiqid package now has a function to produce model-averaged predictions from multiple models: predictAvg. Use ?predictAvg to see details.

First we refit the two models we are interested in:

m.1 <- occSS(DH, psi ~ x1, data=toves)
m.12 <- occSS(DH, psi ~ x1 + x2, data=toves)
AICtable(AICc(m.1, m.12))
#      df    AICc Delta ModelLik ModelWt
# m.12  4 632.828 0.000    1.000   0.528
# m.1   3 633.050 0.222    0.895   0.472
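The ModelLik and ModelWt columns come from the standard Akaike-weight calculation, which you can verify by hand:

# Akaike weights: relative likelihood exp(-delta/2), rescaled to sum to 1.
delta <- c(0, 0.222)
modlik <- exp(-delta / 2)  # ModelLik: 1.000, 0.895
modlik / sum(modlik)       # ModelWt: 0.528, 0.472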

As an example, we'll get model-averaged predictions of psi for the first 6 sites in the data set. You could of course do all of them, or you could get predictions for all the sites in the whole study area and produce a map. We just need to get the relevant data into a data frame and pass it to predictAvg as newdata:

newdata <- toves[1:6, ]

( psi.ma <- predictAvg(list(m.1, m.12), newdata, "psi", type="response") )
#         est         SE     lowCI     uppCI
# 1 0.8515831 0.13036260 0.4318010 0.9774376
# 2 0.2937862 0.07341358 0.1721334 0.4542412
# 3 0.6673564 0.22612654 0.2141285 0.9365958
# 4 0.2748484 0.06956039 0.1605406 0.4289563
# 5 0.9134818 0.07464093 0.6238355 0.9853414
# 6 0.3079147 0.07877618 0.1773493 0.4786724

Now let's compare graphically the model-averaged predictions with those from the individual models.

psi.1 <- predict(m.1, newdata, parameter="psi", type="response")
psi.12 <- predict(m.12, newdata, parameter="psi", type="response")

# Plot point estimates and CIs
plot(1:6, psi.ma[,'est'], xlab="Site number", ylab="psi", pch=16, cex=1.5, 
    las=1, ylim=0:1, xlim=c(0.5, 6.5))
arrows(1:6, psi.ma[,'lowCI'], 1:6, psi.ma[,'uppCI'], angle=90, length=0.03, code=3, lwd=2)
# Add values from psi.1 and psi.12
points(1:6 - 0.2, psi.1[,1], col='red')
arrows(1:6 - 0.2, psi.1[,3], 1:6 - 0.2, psi.1[,4],
    angle=90, length=0.03, code=3, col='red')
points(1:6 + 0.2, psi.12[,1], pch=2, col='blue')
arrows(1:6 + 0.2, psi.12[,3], 1:6 + 0.2, psi.12[,4],
    angle=90, length=0.03, code=3, col='blue')

As we would expect, the model-averaged predictions are approximately half-way between the predictions of the individual models (half-way because the weights of the two models, 0.53 and 0.47, are almost equal). The confidence intervals are generally wider, especially when the individual models give different predictions; this reflects uncertainty about which model is best as well as uncertainty in the parameter estimates of each model.
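As a rough check (a sketch, not part of the original workflow), the model-averaged point estimates should lie close to the AICc-weighted mean of the individual estimates; exact agreement depends on whether predictAvg averages on the link scale or the response scale, so expect small discrepancies.

# Compare the model-averaged estimates with a simple weighted mean;
# column 1 of the predict output holds the point estimates.
wts <- c(0.472, 0.528)  # ModelWt for m.1 and m.12 from the AICtable above
cbind(ma = psi.ma[, 'est'],
      wtd = wts[1] * psi.1[, 1] + wts[2] * psi.12[, 1])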

Plotting relationships

Another example, using genuinely new data rather than a subset of the old data, produces plots showing the relationship between the response and the two covariates.

Here's the code to do that:

range(toves$x1)
# [1] 0.07 9.99
x1 <- seq(0, 10, length.out=100)
range(toves$x2)
# [1]  0.46 10.26
x2 <- seq(0, 10, length.out=100)
newdata <- expand.grid(x1=x1, x2=x2)
head(newdata)
#          x1 x2
# 1 0.0000000  0
# 2 0.1010101  0
# 3 0.2020202  0
# 4 0.3030303  0
# 5 0.4040404  0
# 6 0.5050505  0

# Get model averaged predictions
psi.ma <- predictAvg(list(m.1, m.12), newdata, "psi", type="response")
mat.ma <- matrix(psi.ma[, 'est'], 100, 100)

# Get predictions from the individual models
psi.1 <- predict(m.1, newdata, parameter="psi", type="response")
mat.1 <- matrix(psi.1[, 'est'], 100, 100)
psi.12 <- predict(m.12, newdata, parameter="psi", type="response")
mat.12 <- matrix(psi.12[, 'est'], 100, 100)

# Do the image plots
x11(width=7, height=2.73)
par(mfrow = c(1,3))
image(x1, x2, mat.1, asp=1, main="psi ~ x1")
box()
image(x1, x2, mat.ma, asp=1, main="Model average psi")
box()
image(x1, x2, mat.12, asp=1, main="psi ~ x1 + x2")
box()

Don't model-average parameters

This is frequently done and is the default for some software packages. A key requirement is that "the parameter being considered for model averaging must mean the same thing in each model" (Anderson, 2008, p.110). But a coefficient in a linear model is the rate of change of the response for a unit change in the predictor, conditional on which other predictors are in the model. Change the "other predictors" and you change the meaning of the coefficient. See Cade (2015) for the mathematical details.
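You can see this with the two models fitted above; assuming the usual coef method is available for wiqid fits, the coefficient for x1 differs between the model with and without x2, because it means something different in each:

# The x1 coefficient is conditional on what else is in the model,
# so its value differs between m.1 and m.12.
coef(m.1)   # coefficients for psi ~ x1
coef(m.12)  # coefficients for psi ~ x1 + x2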

This is a problem with real data sets, because predictors will be correlated, and it can be problematic even when the usual measures of multicollinearity look fine.

For example, we might have elevation and distance from road as covariates. These are correlated, because roads are generally built in the valleys. Elevation might in truth have negligible effect on occupancy, but distance from road is important. A model with elevation but not road will show that elevation is important, not because the species prefers high elevations but because sites at high elevations are further from roads. The elevation covariate provides information about distance from road.
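A toy simulation (a sketch using ordinary linear models, not the original data) makes the point: elevation has no true effect, but because it is correlated with distance from road it looks important when road is left out.

# 'elev' has no true effect, but is correlated with 'road', which does.
set.seed(42)
elev <- runif(200, 0, 10)
road <- elev + rnorm(200)         # roads follow the valleys
y <- 2 + 0.5 * road + rnorm(200)  # only distance from road matters
coef(lm(y ~ elev))                # elev looks important on its own
coef(lm(y ~ elev + road))         # elev's coefficient collapses towards 0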

See here for an example where rainfall and temperature are highly correlated, as is often the case in mountainous terrain.

This does not affect predictions of the response variable, so use model averaging for predictions but not for model parameters.

Updated 10 December 2020 by Mike Meredith