Mike Meredith's home page 


Welcome to my new home page  mmeredith.net  please update your bookmarks  
Storing MCMC output in R
with Different packages have different ways of storing output from
MCMC runs: (1) 
26 April 2020, updated 10 June 2020 
This post uses the results of analysis of data provided by Rajan Amin of the Zoological Society of London (ZSL). The study deployed 44 detectors for 130 days and detected 22 animals.
27 Feb 2020 
n.eff
, use MC errorA common way to do this is to calculate the
effective sample size (ESS), denoted n.eff
, and
check that it exceeds some ruleofthumb value such as 10,000.
This seems a long detour compared with looking directly at the
Monte Carlo error.
18 Feb 2020 
Code is meant to be read by humans. "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." Martin Fowler.
24 Dec 2019 
runif
functionrunif
function. We use these a lot for simulations, and (my immediate
concern) for randomwalk MCMC samplers. The questions I wanted
to answer were:runif(n, 0, 1)
produce
values equal to 0 or 1? (Never.)runif
have two values that are exactly equal? (Yes.)set.seed
,
will I always get the same sequence of numbers? (Yes,
provided you also use the same RNG kind.)set.seed
,
will I get different sequences? (Not guaranteed.)
22 Dec 2019 
The answer will depend on the situation, and specifically on what are the consequences if the point estimate differs from the true value. We'll explore this with a simple example.
20 Nov 2019 
Ok, so we "do" the diagnostics, but what exactly are we looking for, and what messages to draw for fixing problems. I want a "field guide" to diagnostic checks, especially the plots which give us most information. This page is a start, but it doesn't include all the issues that I've seen but didn't capture images of the plots.
4 Aug 2019; updated 4 Sept 2020 
They provided the code in their supplementary materials, but that is tailored to their study of 26 species of fish at 113 sites in a river basin in Colorado. This is an attempt to simplify it and make it easier for other researchers to adapt it to their own study.
19 July 2019 
Suppose we set out camera traps in our study area with the intention of using the proportion of days when we detected our target species as an index of abundance. This is sometimes called the Relative Abundance Index, RAI; I'm not sure how useful it is in general, but let's assume it works for the species we are interested in. Our toy data set has only 9 cameras, and they were put out for 2 months. Four of the cameras operated properly for all 60 days, but four of them failed after only a few days and one didn't work at all. Can we draw useful inferences about the sites with hardly any data?
14 July 2019 
raster
' package has a plot method for Raster
objects that
has many options to customise the plot. Unfortunately the
documentation is in two places, and some arguments are not
documented at all. Here I'll cover the arguments which you can
use when plotting a RasterLayer
object.
12 July 2019 
2 July 2019 
21 June 2019 
GOF looks at the fit of the model to the same data set used to estimate parameters. We are not concerned here with selecting models with the best predictive ability. In general, the fit improves as we add more parameters, and GOF is usually done with the full model, the one with the maximum number of parameters.
21 June 2019; updated 18 Aug 2019 
dunif(5, 5)
,
at least for preliminary runs and exploration, as it makes some
problems easier to spot in diagnostic plots. And ease of
spotting is important if you have dozens or hundreds
of parameters to check.Uniform priors are difficult to justify  how come a value of 4.999 is perfectly plausible but 5.0001 is impossible?  so should probably be replaced with a prior that tails off for final analysis and reporting or publication, but they are good for diagnosis.
1 June 2019 
A key
point is that JAGS can use vectors and matrices (WinBUGS can't)
which makes your code run faster.
9 March 2019; updated 31 Dec 2019 
I am fortunate in having two colourblind colleagues who will quickly point out any plots I produce which aren't clear to them.
We talk about colourblindfriendly plots in our workshops, and here are some of the tips we pass on...
5 December 2018; updated 6 July 2020 
I've just been using Google to find basic information about key concepts in R, and found it difficult to find good sources. Many of the pages go into unnecessary detail, assume a background in programming in other languages, or use abstruse examples.
So some recommendations seem to be in order.
6 August 2018 
In this post I want to explore what is "closure" in SCR, why it is important, and what happens if it is violated. I'll also explore some ideas for open population models which can be used to mitigate closure issues.
26 May 2018 
overlap
packageoverlap
package (0.3.2) is now
available on CRAN, together with binary builds for Windows and
Mac. A minor fix for getBandWidth
means it works
more often with weird data sets, and there's a new function,
sunTime
, which maps clock time to sunrise and
sunset.Update 10 Sept 2019: See the
activity
package and this new paper for better
methods: Vazquez, C., Rowcliffe, J.M., Spoelstra, K., & Jansen,
P.A. (2019) Comparing diel activity patterns of wildlife across
latitudes and seasons: Time transformations using day length.
Methods in Ecology and Evolution,
doi: 10.1111/2041‐210X.13290.
Animal activity patterns are in fact aligned to daylight intensity  in particular, sunrise, sunset and zenith  not to the time on your clock or GPS unit or camera (Nouvellet et al, 2012). Since the times of these events vary with the time of year, using clock time instead of sun time can lead to wrong inferences. Nouvellet et al (2012) give the example of hunting behaviour of African wild dogs: impala are almost always taken before sunset and kudu after sunset, but this effect disappears if time is measured relative to 6pm instead of sunset.
12 May 2018; updated 10 Sept 2019 
saveJAGS
packagesaveJAGS
package. It has
certainly changed the way I do things. It's a wrapper for
rjags
which periodically saves the MCMC chains to files.
The advantages of that are:
3 April 2018, last updated 10 June 2020 
17 March 2018 
25 December 2017 
In R, the class factor
is used for categorical variables. A factor has a levels
attribute with the names of the categories, in alphabetical
order by default, and the values are stored as integers giving
the level number.
I'll generate some simulated data for a logistic regression model (as that generalises to a lot of our other models). We have four players with different degrees of skill and a continuous covariate which does nothing.
20 December 2017 
In the simplest model, the number available for detection at site \( i \), \( N_i \), is modelled as being drawn from a Poisson distribution with parameter \( \lambda \). This is the biological model. The observation model assumes that detection of each individual is a Bernoulli trial, individuals being detected with probability \( r \). The data do not show which individuals are detected, just whether the species was detected, and that is recorded if at least one individual is observed.
27 November 2017; updated 3 Sept 2018 
In the course
of applying these ideas to maximum likelihood estimation in the
wiqid
package, a couple of new issues came up. We
will revisit logSumExp
and develop new functions to
add together two vectors and to do matrix multiplication.
31 October 2017 
If each animal produces p signs per day and signs remain visible for t days, then sign density will be:
S = D x p x t
where D is the density of animals. If we can estimate p and t, we can calculate D from S. In this post I want to provide code for estimating the persistence time of signs for retrospective studies of decay.
7 October 2017 
The solution is to work with logarithms of probabilities instead of the actual values [log(p) instead of p], eg, we routinely work with loglikelihoods. Multiplying probabilities is then simply a matter of adding up the logs. But sometimes we need to add up probabilities or calculate the complement, 1  p, and we need to do that without falling in the 0 abyss or smashing into the 1 wall.
4 August 2017 
Here's a simple example: we have 3 sites, visited 4 times per year for 2 years. This is usually shoehorned into a table with 8 columns for the visits, like this:
occasion
site 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4
A 0 1 1 3 2 0 1 1
B 1 0 1 0 3 0 2 2
C 5 2 1 1 3 4 2 1
These look like counts, but the data could be detection/nondetection (0/1) data, wind speed at each visit, or the name of the observer.
24 February 2017 
This was based on simulations with just one set of parameter values, but it does suggest that projects with limited resources should consider using unpaired cameras, as this allows more locations to be sampled.
Update 5 Nov 2018: See this new paper by Ben Augustine et al (2018) Spatial capture–recapture with partial identity: An application to camera traps, Annals of Applied Statistics 12 (1) 6795
5 February 2017, updated 17 April 2018, 5 Nov 2018 
24 January 2017 
jagsUI
is only 100 iterations, which was woefully
inadequate for my model. We should be specifying big numbers for
n.adapt
and small
numbers, or even zero, for n.burnin
.
27 December 2016 
17 October 2016 
7 October 2016 
secr
package to the format needed to run in JAGS or WinBUGS/OpenBUGS.
[Rewritten 28 June 2019 to work with version 0.1.0 of the package.]
7 August 2016, rewritten 28 June 2019 
10 June 2016, updated 16 July 2018 
People seem to run into problems with different versions of the Mac OS, R, JAGS and the rjags package. The only way to stay sane is to use recent versions of all four.
From the Apple menu, choose About This Mac; the version number appears below the name. Note whether you have v. 10.11 (El Capitan) or later. If you have an earlier version, upgrade your OS before going further.
16 Jan 2016, updated 30 Oct 2017 
BEST
and wiqid
packages with the new
version of JAGS on Ubuntu. It took me a while, but I finally
found a simple way to do this which might be of interest to
others.I already had R and JAGS 3 installed, together with
the rjags
package version 3. Installing the rjags
package within
R (or updating it with update.packages()
) installs the new
version of rjags
, v.4, which requires JAGS 4 and throws an error
if it isn't found. But the Ubuntu repository still has JAGS 3,
so you cannot update JAGS with Ubuntu Software Center.
29 Dec 2015 
The idea for the sampler was developed by Nicholas Metropolis and colleagues in a paper in 1953. This was before the Gibbs sampler was proposed, but it uses the same idea of updating the parameters one by one. A better name would be "componentwise random walk Metropolis sampler". The rules for the random walk ensure that a large number of samples will be a good description of the posterior distribution.
5 March 2015 
Gibbs sampling works if we can describe the posterior for each parameter if we know all the other parameters in the model.
27 February 2015 
As our example, we'll use estimation of detection probability from data for repeat visits to a site which is known to be occupied by our target species. First, we'll describe the beta distribution, then see how that can be combined with our data. A discussion of priors will follow, and we'll finish with brief descriptions of conjugate priors for other types of data.
25 February 2015 
wiqid
package. I'll focus mainly on Bayesian
methods, but this first post will look at the likelihood, which
is used for both Bayesian analysis and maximum likelihood
estimation.We'll use a simple occupancy model. It has just two parameters and both must between 0 and 1. That means that we can plot all possible combinations of the two parameters in a simple twodimensional graph. As we'll see we need to add a third dimension, but three is still manageable.
12 February 2015 
Currently it has functions for estimating occupancy, abundance from closed captures, density from spatial capturerecaptures, and survival from markrecapture data, plus a slew of functions for species richness and alpha and beta diversity.
It is intended to be used for (1) simulations and bootstraps, (2) teaching, and (3) introducing Bayesian methods. And it should work on all platforms: Windows, Linux, and Mac.
5 January 2014 
23 December 2013 
secr
package take care of the former. Bayesian
analysis with the usual workhorses, WinBUGS, OpenBUGS and JAGS,
is straightforward if the traps are laid out in a large area of
homogenous habitat.Faced with patches of suitable habitat surrounded by inhospitable terrain, or a large extent of habitat punctuated with patches of nonhabitat, we had the choice of ML methods or one of the packages designed specifically for Bayesian SECR analysis, such as SPACECAP or SCRbayes. But then we are limited to the range of models provided by package authors: we don't have the flexibility to specify our own models that comes with WinBUGS, OpenBUGS or JAGS.
Here I present a way to incorporate patchy habitat into a BUGS/JAGS model specification.
22 Sept, updated 5 Nov 2013 
We start off with simple spinners representing a uniform distribution over a range from, say, 0 to 0.5. We discuss the problems of attaching a probability to an exact value, which leads to probability of a range of values and hence probability density.
15 September 2013 
Before the advent of SECR methods, putting all your traps into a single cluster with minimal perimeter length made sense, as you needed to estimate the area trapped animals came from to get a density. SECR estimates density directly, without needing to estimate area, so a single, large cluster may no longer be advantageous.
26 August 2013 
I've seen this asserted a couple of times, in particular in Tobler and Powell (2013, p.110), and I've myself drawn circular home ranges when discussing the interpretation of the capture parameters, but I don't think it is a necessary assumption.
Update: See the new paper by Murray Efford (2019) "Noncircular home ranges and the estimation of population density" Ecology, 100, e02580.
8 August 2013, updated 3 March 2019 
Capturerecapture methods (also know as markrecapture or capturemarkrecapture) have been used to estimate the size of animal populations for many years: the first software package for analysis of this kind of data, CAPTURE (Otis et al 1978), is now 35 years old. Early methods did not use the spatial component in the data, the capture locations, and spatially explicit capturerecapture models (SECR, or just spatial capturerecapture, SCR) first appeared in 2004 (Efford 2004).
7 August 2013 
The idea is to provide an R function which is as easy to use as t.test but which gives not a mere pvalue but the kind of output Bayesians are used to  posterior probability distributions. John's BESTmcmc function uses JAGS, but handles all the preliminaries automatically and produces a result in a simple format.
9 June 2013 
As soon as cameras with "data backs" came along in the early 90s, biologists realised that they could harvest data on the activity patterns of rare, secretive forest animals. Were they diurnal, nocturnal, crepuscular, or maybe cathemeral (active all around the clock)? More recently, people have tried to get clues about how species interact  competition or preypredator interactions  from activity patterns, by examining the extent of overlap.
In our corner of the biological world, Martin Ridout and Matt Linkie published a paper (2009) on the activity patterns of tigers, clouded leopards and golden cats in Sumatra, with a lot of technical detail on how overlap could be quantified and confidence intervals estimated. They followed up (2011) with a paper on tigers and their prey, also in Sumatra. ...
5 June 2013 
This is often a silly question: the means of real populations are almost always different, even if the difference is microscopic. More useful would be to estimate the difference and the probability that it is big enough to be of practical importance. See the BEST software for a way to do this in R.
Sometimes we are presented with confidence intervals for each of the means. This happens in particular with the standard packages we use for wildlife data analysis, where the output includes confidence intervals for each coefficient or real value. Can we infer evidence of a difference from confidence intervals in the same way as for a pvalue from a test of significance?
26 March 2013 
Sometimes people I talk to are worried because their data aren't normally distributed, and they believe that they can't use the usual techniques such as ttests or ANOVA without first transforming the data to be normal, or they must resort to nonparametric methods.
There are many good reasons for transforming data or NOT using t tests or F tests, but nonnormal data is not usually one of them!
21 Feb 2013 
A couple of people on a recent workshop had trouble with their AVG antivirus software when installing JAGS 3.3.0.
This appears to be due to AVG's paranoia: see
Martyn Plummer's comment. No malware is detected by McAfee
AntiVirus Plus or Trend Micro Office Scan. See also information
on false positives at the
AVG forum.
8 Feb 2013 
In ecology and wildlife studies, a lot of our spatial data takes the form of rasters rather than vector files. When you first add a raster in QGIS, you usually get a plain grey rectangle, or maybe just a grey outline on a white background, as most raster file formats have no styling information. To make sense of a raster, you need to change the style.
Here I'll give some hints for "quickanddirty" styling to display the contents of a raster. For a more detailed tutorial, see here.
18 Dec 2012 
In a recent post, I showed how to deal with "distance from..." data in GIS layers using the R packages for handling spatial information. The example I used there involved
1. producing a layer with distancefromnearestroad as a habitat layer so that we can calculate a probability of occupancy layer, and
2. extracting distancefromnearestroad for each of our cameras in order to model our data.
Here we will see how to do the same thing in QGIS.
16 Dec 2012 
At our recent workshop on Geographical Information Systems (GIS) using Quantum GIS we had a number of people interested in working with radio telemetry or GPS data to model animal home ranges. The home range plugin for QGIS doesn't work with current versions, at least with Windows.
It is designed to pass data to R and get the adehabitat package to do the home range estimation and pass the result back to QGIS. QGIS uses Python code, and to get it to talk to R requires a bit of software called "RPy2". This was always difficult to set up on Windows, but Python has been upgraded and RPy2 no longer works. In any case, the adehabitat package has been replaced by new packages with a wider ranger of options.
So now it's better to prepare spatial data in QGIS, read the files into R, process with adehabitatHR, write the results to new files, and load into QGIS.
13 Dec 2012 updated 5 Feb 2018 
We recently ran a workshop on Geographical Information Systems (GIS) using Quantum GIS for ecologists and wildlife researchers. For many species, distance from water, a road, forest edge, or a settlement may be an important habitat variable.
For example, we may be using automatic cameras to investigate occupancy of sites by leopards. Probability of occupancy may depend on distance from the nearest road. Given vector layers with roads and camera locations, we want to do two things:
1. produce a layer with distancefromnearestroad as a habitat layer so that we can calculate a probability of occupancy layer, and
2. extract distancefromnearestroad for each of our cameras in order to model our data.
16 Dec 2012, updated 14 Nov 2017 
I sometimes need to put formulae into my web pages, and I've been exploring the use of MathJax.
In the past I've inserted the formula into MS Word with MS Equation 3.0, doing a screen capture, cropping the image to the formula I want, saving as a .GIF file, and then displaying it on the web page as an image. So I get something like this for the Poisson distribution:
That's not ideal. If I want to change anything, I have to start all over again from Word. It's also messy if I want to put something like into the text; for a start it doesn't line up properly. MathJax allows me to type the formula in LaTeX style directly into the HTML code for my web page.
10 Dec 2012 
I have a collection of data sets for use during workshops or just to play with when trying out new statistical techniques or computer code.
A big question is what format to use, and I've changed my mind on this several times already!
After looking at this blog post by John Mount I've decided to try using tabseparated files with a .tsv extension.
8 Dec 2012 