In a recent post I described the
limits on floating points numbers and the effect on
probabilities close to 0 or 1. This can cause errors in
manipulating probabilities, in particular in calculating
likelihoods, which can become very small. We get around that by
working with logs of probabilities. Multiplication is then easy
(just add the logs), but addition and subtraction are more
difficult. I proposed functions to add probabilities and to
calculate 1  p avoiding underflow or overflow. In the course
of applying these ideas to maximum likelihood estimation in the
wiqid package, a couple of new issues came up. We
will revisit logSumExp and develop new functions to
add together two vectors and to do matrix multiplication.
Summing probabilities
In the previous post we saw that converting log(p) back to
p
and adding [ie, log(sum(exp(log_p))) ] doesn't work
properly if the values of p are too small, and developed
the
following function to do this:
logSumExp < function(log_p) {
p_max < which.max(log_p)
log1p(sum(exp(log_p[p_max]  max(log_p)))) + max(log_p)
}
With our example data we have:
log_p < 745:760
exp(log_p) == 0
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[11] TRUE TRUE TRUE TRUE TRUE TRUE
logSumExp(log_p)
[1] 744.5413
Often we are dealing with probabilities equal to zero, for
example the probability that there are no frogs at a pond given
that we have seen frogs there. The function suggested above
works fine if there are a few zero values in the probabilities
and hence some Inf values in log_p .
But it fails if all the probabilities are zero:
p0 < rep(0, 5)
( log_p0 < log(p0) )
[1] Inf Inf Inf Inf Inf
logSumExp(log_p0)
[1] NaN
Recall that the trick is to subtract the largest value in
log_p from all the others, equivalent to dividing
all the probabilities by the largest value. If the largest is
zero, we get a dividebyzero error. If all the probabilities
are zero, clearly the sum is also zero, so we can avoid the
error with:
logSumExp < function(log_p) {
if(max(log_p) == Inf)
return(Inf)
p_max < which.max(log_p)
log1p(sum(exp(log_p[p_max]  max(log_p)))) + max(log_p)
}
logSumExp(log_p0)
[1] Inf
This check adds slightly to the execution time, but avoids
mysterious errors deep in the works.
Adding two vectors
I often need to add together two vectors of probabilities,
both stored as logs. For example, one vector applies to sites
given they are occupied, the other given they are not occupied.
Let's generate some data to play with:
( p1 < (1:9) / 10 )
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
( p2 < runif(9) * (1  p1) )
[1] 0.13170373 0.27850206 0.57787359 0.29211162 0.08092575
[6] 0.15465695 0.22973600 0.16997316 0.05788608
( add < p1 + p2 )
[1] 0.2317037 0.4785021 0.8778736 0.6921116 0.5809257
[6] 0.7546570 0.9297360 0.9699732 0.9578861
log_p1 < log(p1)
log_p2 < log(p2)
I could use logSumExp for each pair in a loop, but
that's inefficient. We can use the same trick  subtract the
largest, add up, then add back the largest  but using the
pmax and pmin functions:
logAddExp < function(logp1, logp2) {
bigger < pmax(logp1, logp2)
smaller < pmin(logp1, logp2)
log1p(exp(smaller  bigger)) + bigger
}
( logadd < logAddExp(log_p1, log_p2) )
[1] 1.46229573 0.73709475 0.13025268 0.36800804 0.54313233
[6] 0.28149200 0.07285460 0.03048688 0.04302642
all.equal(logadd, log(add))
[1] TRUE
This works well unless there are zeros in both the
probability vectors that you want to add up, then we get the
same dividebyzero error that we had with logSumExp :
p1[3] < 0
p2[3] < 0
log_p1 < log(p1)
log_p2 < log(p2)
( logadd < logAddExp(log_p1, log_p2) )
[1] 1.46229573 0.73709475 NaN 0.36800804 0.54313233
[6] 0.28149200 0.07285460 0.03048688 0.04302642
all.equal(logadd, log(add))
[1] "'is.NA' value mismatch: 0 in current 1 in target"
We can fix this by just returning Inf when the biggest value
is Inf . In fact, in my work, zeros turn up frequently, and we
can avoid the overhead of exp and log1p (which are
computationally expensive) if one of the vectors is zero; if the
smaller is Inf , the log(sum) is equal to
bigger :
logAddExp < function(logp1, logp2) { bigger < pmax(logp1, logp2)
smaller < pmin(logp1, logp2)
fix < smaller > Inf
bigger[fix] < log1p(exp(smaller[fix]  bigger[fix])) + bigger[fix]
return(bigger)
}
( logadd < logAddExp(log_p1, log_p2) )
[1] 1.46229573 0.73709475 Inf 0.36800804 0.54313233
[6] 0.28149200 0.07285460 0.03048688 0.04302642
all.equal(logadd, log(p1 + p2))
[1] TRUE
Matrix multiplication
The likelihood calculation for multiseason occupancy model
involves a series of matrix multiplications, alternating between
colonisation/extinction matrices and capture probability
matrices. (Thanks to Darryl MacKenzie for the elegant
algorithm.) But we could get into trouble with probabilities
close to 0 or 1, so maybe we should work with log(probabilities)
here too. How do we do matrix multiplication with logs of
matrices?
Let's first review how matrix multiplication works:
# Example matrices (not probabilities)
(A < matrix(1:6, 3, 2))
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
(B < matrix(11:18, 2, 4, byrow=TRUE))
[,1] [,2] [,3] [,4]
[1,] 11 12 13 14
[2,] 15 16 17 18
A %*% B
[,1] [,2] [,3] [,4]
[1,] 71 76 81 86
[2,] 97 104 111 118
[3,] 123 132 141 150
Check that we know how each element of the resulting matrix
is calculated:
X < matrix(NA, nrow(A), ncol(B))
for(i in 1:nrow(X))
for(j in 1:ncol(X))
X[i,j] < sum(A[i, ] * B[, j])
all.equal(X, A %*% B)
[1] TRUE
Now let's try that with logs. We'll need to add together the
logs and then use logSumExp instead of sum .
Here's our matrix multiplication function:
logMatMultExp < function(logA, logB) {
logX < matrix(NA_real_, nrow(logA), ncol(logB))
for(i in 1:nrow(logX))
for(j in 1:ncol(logX))
logX[i,j] < logSumExp(logA[i, ] + logB[, j])
return(logX)
}
Let's try it out:
( logA < log(A) )
[,1] [,2]
[1,] 0.0000000 1.386294
[2,] 0.6931472 1.609438
[3,] 1.0986123 1.791759
( logB < log(B) )
[,1] [,2] [,3] [,4]
[1,] 2.397895 2.484907 2.564949 2.639057
[2,] 2.708050 2.772589 2.833213 2.890372
log(A %*% B)
[,1] [,2] [,3] [,4]
[1,] 4.262680 4.330733 4.394449 4.454347
[2,] 4.574711 4.644391 4.709530 4.770685
[3,] 4.812184 4.882802 4.948760 5.010635
all.equal(logMatMultExp(logA, logB), log(A %*% B))
[1] TRUE
And now with values that look like they might be
probabilities, and we'll include 0 and 1:
A < matrix(sample(seq(0, 1, length=6)), 3, 2)
( logA < log(A) )
[,1] [,2]
[1,] 1.6094379 0.9162907
[2,] 0.5108256 0.2231436
[3,] 0.0000000 Inf
B < matrix(sample(seq(0, 1, length=8)), 2, 4)
( logB < log(B) )
[,1] [,2] [,3] [,4]
[1,] 0.5596158 0.8472979 0.3364722 0
[2,] 0.1541507 1.2527630 1.9459101 Inf
log(A %*% B)
[,1] [,2] [,3] [,4]
[1,] 0.78275934 1.6094379 1.6094379 1.6094379
[2,] 0.02817088 0.7221347 0.6109091 0.5108256
[3,] 0.55961579 0.8472979 0.3364722 0.0000000
all.equal(logMatMultExp(logA, logB), log(A %*% B))
[1] TRUE
We saw the objects with lots of zeros can cause problems;
let's try doing this with a matrix of zeros:
A < matrix(0, 3, 2)
( logA < log(A) )
[,1] [,2]
[1,] Inf Inf
[2,] Inf Inf
[3,] Inf Inf
A %*% B
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 0
all.equal(logMatMultExp(logA, logB), log(A %*% B))
[1] TRUE
And it even works if both matrices are all zeros.
Final touches
Let's include a check that the matrices are conformable, ie,
the columns of A match the rows of B, so that we get a proper
error message instead of a wrong result and loads of warnings:
logMatMultExp < function(logA, logB) {
if(ncol(logA) != nrow(logB))
stop("nonconformable matrices")
logX < matrix(NA_real_, nrow(logA), ncol(logB))
for(i in 1:nrow(logX))
for(j in 1:ncol(logX))
logX[i,j] < logSumExp(logA[i, ] + logB[, j])
return(logX)
}
Finally, let's try it out with values which
do overflow, though we can't check that we're getting the right
answers:
logp1 < plogis(seq(50, 50, length=6), log.p=TRUE)
logA < matrix(logp1, 3, 2)
logp2 < plogis(sample(seq(50, 50, length=8)), log.p=TRUE)
logB < matrix(logp2, 2, 4)
logMatMultExp(logA, logB)
[,1] [,2] [,3] [,4]
[1,] 4.539939e05 0.0008355769 49.99926 4.539890e05
[2,] 4.940512e10 0.0007901781 37.14364 9.388489e14
[3,] 4.939352e10 0.0007447453 17.14369 3.086479e16
