## Likelihood and Maximum Likelihood Estimation |
|||||||||||||

HOME |
I'm planning a series of posts looking at what happens under the
hood when we analyse a data set using some of the estimation
functions in the `wiqid` package. I'll focus mainly on Bayesian
methods, but this first post will look at the likelihood, which
is used for both Bayesian analysis and maximum likelihood
estimation.We'll use a simple occupancy model. It has just two parameters and both must between 0 and 1. That means that we can plot all possible combinations of the two parameters in a simple two-dimensional graph. As we'll see we need to add a third dimension, but three is still manageable. ## The data set and modelThe data are detection/non-detection observation for
Blue Ridge two-line salamanders (
We assume that salamanders are present or absent at a site on all 5 occasions, and they were seen on < 5 occasions because probability of detection is < 1. We also assume that salamanders are not recorded when they are not present, ie, we have false absences but no false presences. Salamanders were seen at 18 sites out of 39 (46%), but many of the remaining 21 sites will have salamanders present too. We want to estimate the
probability that a site is occupied by salamanders, allowing for
the effect of imperfect detection. We need to jointly estimate
probability of occupancy (\(\psi\)) and
probability of detecting salamanders on a single visit if they
are present (\(\pi\)). The simplest model assumes that \(\psi\) is the same for all sites and We'll see a lot of probabilities in the next few pages, so I'll refer to the parameters as "occupancy" and "detection", and drop the "probability of..." bit. ## The likelihoodThe likelihood is the term for the probability of observing the actual data, \(\bf y\), given the model and specific parameter values, in this case \(\Pr({\bf y} | \psi, \pi)\). We can calculate this for each site with:
The first term is the
probability that the site is occupied AND we detected
salamanders on
Because this situation is very simple, we can actually plot the whole likelihood surface, for all values of (\(\psi\)) and (\(\pi\)) from 0 to 1. Using steps of 0.005 will give us a nice-looking plot, but we'll avoid the values 0 and 1, as the data would be impossible in those cases
Now we do a double loop to fill a 200 x 200 matrix with the values for the likelihood:
Likelihoods can get very small (ie, close to zero), as you
can see. There is a limit to the smallest number your computer
can handle: do Do a plot of the "terrain" and add contours:
The likelihood surface has an elongated "hill" in the
lower-right portion of the plot. We can find the highest point
of the hill, and that will give us the
## Finding the maximum likelihoodWith our simple model with 2 parameters limited to (0, 1), we can calculate the likelihood for 40,000 possible combinations and find the highest. With more parameters and parameters not limited to (0, 1), we soon run to billions of combinations. The usual way of finding the maximum likelihood is to use a
algorithm which climbs to the top of the hill as directly as
possible. We can do that with the `psi` and`pi` are passed as a two-element vector;`optim` has trouble with very small numbers, so we use the*log*(likelihood);- since
`optim` finds the minimum instead of the maximum, the function returns the*negative*log(likelihood).
The two values reported as ## The Nelder–Mead simplex algorithmSo how does The default algorithm is the Nelder-Mead or simplex algorithm. With our example, we are working in 2 dimensions, so our simplex is a triangle, with each vertex defined by a value for \(\psi\) and a value for \(\pi\). We'll look at multidimensional simplices later. The plot below shows the lower-right portion of our likelihood surface with a starting triangle around (0.5, 0.5).
Here's the Nelder-Mead algorithm to find the maximum likelihood:
Let's see how this works for our example data set:
There are a couple of extension steps as the algorithm heads
for the ridge. It overshoots and there's a contraction step
before it moves up along the ridge. Once the top of the hill is
inside the triangle, a series of contraction steps close in on
the maximum value. You can download an mp4 file with the
animation The code for the
examples on this page is ## Some odd points## Higher dimension simplicesIf you have more then 2 parameters, your simplex will be more ... complex. With 3 parameters, it's a tetrahedron, which you can probably imagine being reflected. With more, visualization is no longer possible for most people. With 10 parameters, each vertex is defined by a value for each parameter, and the simplex has 11 vertices. The process of finding the vertex with the lowest likelihood and reflecting, extending or contracting is analogous. ## Markov chainsThe Nelder-Mead algorithm generates a first-order Markov chain. Each step depends only on the outcome of the previous step, and is not affected by any of the steps before that. Just like the game of snakes and ladders (or "chutes and ladders"), where you go next depends only on where you are now and the next die roll; it doesn't matter how many ladders you've climbed or how many snakes you slid down before getting to your present position. Though perhaps I should add that, unlike snakes and ladders, and unlike MCMC, there is nothing random about the Nelder-Mead algorithm. But MCMC is the topic of a future post... ## ReferencesMacKenzie, D.I., Nichols, J.D., Royle, A.J., Pollock, K.H., Bailey, L.L., & Hines, J.E. (2006) Occupancy estimation and modeling : inferring patterns and dynamics of species occurrence Elsevier Publishing. | ||||||||||||

Updated 19 Feb 2015 by Mike Meredith |