Different packages have different ways of storing output from
MCMC runs (details below). Each format has its own strengths, and
some output objects include more than one copy of the results in
different formats: R2WinBUGS output stores three of them, and
jagsUI output stores two. With many parameters monitored, keeping
multiple copies of the MCMC chains is not a good use of R's
limited memory.
In this post I will
describe a class for storing MCMC chains which offers the
advantages of each of the above while only storing one copy.
This is already implemented in a package on CRAN, and I propose
to include it in the next versions of wiqid, with the intention
of making it the default for wiqid's MCMC functions; you can
install the CRAN version with install.packages().

In a previous post we looked at GOF tests for occupancy models.
Here we'll look at similar tests for spatially-explicit
capture-recapture data, and go on to other posterior predictive
checks. This post uses the results of an analysis of data
provided by Rajan Amin of the Zoological Society of London (ZSL).
The study deployed 44 detectors for 130 days and detected 22
individuals.
When checking the output of Bayesian MCMC model fitting, we need
to ensure that we have enough draws from the posterior
distribution to get good summary statistics, such as the mean and
median. A common way to do this is to calculate the
effective sample size (ESS) and check that it exceeds some
rule-of-thumb value such as 10,000. This seems a long detour
compared with looking directly at the Monte Carlo error.
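As a concrete illustration (my own sketch, using the coda package; the chain here is a stand-in for real posterior draws):

    library(coda)
    mc <- mcmc(rnorm(10000))     # stand-in for a chain of posterior draws
    effectiveSize(mc)            # the ESS
    summary(mc)$statistics       # "Time-series SE" is the Monte Carlo error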
I wanted to refer to Google's original style guide for R, but
that is now just a few disagreements with the Tidyverse
Style Guide. The latter seems to be overly prescriptive
or only relevant if you are creating a package, so here's
my own "just the essentials" style guide for JAGS code
as well as R.
Code is meant to be read by humans: "Any fool
can write code that a computer can understand. Good programmers
write code that humans can understand."
I've just been looking at the details of random number
generators (RNGs), in particular R's runif
function. We use these a lot for simulations, and (my immediate
concern) for random-walk MCMC samplers. The questions I wanted
to answer were:
- How often does runif(n, 0, 1) produce
values equal to 0 or 1? (Never.)
- Can a sequence of random numbers from runif
have two values that are exactly equal? (Yes.)
- If I plug in the same value for set.seed,
will I always get the same sequence of numbers? (Yes,
provided you also use the same RNG kind.)
- If I plug in different values for set.seed,
will I get different sequences? (Not guaranteed.)
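A quick check of some of these answers at the console (a sketch; the RNG kind is left at R's default):

    set.seed(42)
    a <- runif(5)
    set.seed(42)
    b <- runif(5)
    identical(a, b)      # TRUE: same seed + same RNG kind = same sequence
    range(runif(1e6))    # stays strictly inside (0, 1)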
The output of a Bayesian analysis is a complete description of
the posterior probability distribution of each of the parameters
we are interested in. But mostly we need to summarise the
distribution with a single number, a "point estimate". We can
use the posterior mean, median or mode. Which is the best? The
answer will depend on the situation, and specifically on what
the consequences are if the point estimate differs from the
true value. We'll explore this with a simple example.
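For a skewed posterior the three point estimates can differ quite a bit; here's a toy sketch (my own numbers, using a gamma distribution as a stand-in for posterior draws):

    set.seed(1)
    draws <- rgamma(1e5, shape = 2, rate = 1)
    mean(draws)            # about 2
    median(draws)          # about 1.68
    d <- density(draws)
    d$x[which.max(d$y)]    # crude mode, about 1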
It's important to check your MCMC results: theory guarantees
that the output from an MCMC run will eventually converge
on the right posterior distribution, but for a less-than-infinite run we have to be careful.
Checks will often reveal a failure to converge, but they cannot
tell us definitely that convergence has occurred: the probability
of detecting a problem is less than 1! But running long
chains, and many chains, will help.
Ok, so we "do" the
diagnostics, but what exactly are we looking for, and what
lessons can we draw for fixing problems? I want a "field
guide" to diagnostic checks, especially the plots which give us
most information. This page is a start, but it doesn't include
all the issues: there are some that I've seen but didn't capture
images of.
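The usual starting points, for reference (a sketch using the coda package; mc stands for an mcmc.list with at least two chains):

    library(coda)
    gelman.diag(mc)      # Rhat near 1 is necessary, not sufficient
    traceplot(mc)        # look for well-mixed, overlapping chains
    effectiveSize(mc)    # a small ESS flags slow mixing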
The issue of model selection for complex Bayesian models is
still very much open, so I was interested in a paper by Kristin
Broms, Mevin Hooten and Ryan Fitzpatrick (2016, Ecology).
There is good stuff there about using information criteria to
compare models, but in this post I want to focus on their use of cross-validation. Most of the goodness-of-fit measures we use do
not work for individual trials, i.e., 1/0 data, and trials need to
be aggregated so that the data are of the form: y
successes in n trials. The scoring methods used by Broms
et al do work with single trials.
They provided the code in their supplementary
materials, but that is tailored to their study of 26 species of
fish at 113 sites in a river basin in Colorado. This is an attempt to simplify
it and make it easier for other researchers to adapt it to their
own data.
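One scoring rule that does work for single 1/0 trials is the log score; here's a generic sketch (my own code, not the authors'):

    # y: held-out 0/1 outcomes; p: predicted probabilities of a 1
    logScore <- function(y, p) sum(dbinom(y, size = 1, prob = p, log = TRUE))
    # higher (less negative) scores mean better predictions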
How does hierarchical modelling help in getting sensible estimates for
cases where very few data are available? We'll explore that
here, using the concept of "pooling" data from multiple cases.
Suppose we set out camera traps in our study area with the
intention of using the proportion of days when we detected our
target species as an index of abundance. This is sometimes
called the Relative Abundance Index, RAI; I'm not sure how
useful it is in general, but let's assume it works for the
species we are interested in. Our toy data set has only 9 cameras, and they were put out
for 2 months. Four of the cameras operated properly for all 60
days, but four of them failed after only a few days
and one didn't work at all. Can we draw useful inferences about
the sites with hardly any data?
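A minimal JAGS sketch of the partial-pooling idea (all names and priors here are my own choices, not necessarily those in the post):

    model {
      for (i in 1:nCameras) {
        y[i] ~ dbin(p[i], days[i])   # detections out of days camera i operated
        logit(p[i]) <- lp[i]
        lp[i] ~ dnorm(mu, tau)       # camera effects shrink toward the mean
      }
      mu ~ dnorm(0, 0.1)
      tau <- 1 / (sigma * sigma)
      sigma ~ dunif(0, 5)
    }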
The 'raster' package has a plot method for
Raster objects that
has many options to customise the plot. Unfortunately the
documentation is in two places, and some arguments are not
documented at all. Here I'll cover the arguments which you can
use when plotting a RasterLayer object.
I've added extra functions to the 'makeJAGSmask' package since
it first appeared in August 2016 - see the old post
here for details of the original functions. It's now
possible to create a mask from a raster object and to use the
values of the raster as a spatial covariate. You can also define
a 'core' area smaller than the habitat mask and get an estimate
of the number of activity centres in the core.
Spatial capture-recapture (SCR) - also known as
spatially-explicit capture recapture (SECR) - is the accepted
way to estimate density of individually-identifiable animals.
The analysis can be done with maximum likelihood (ML) methods or
with Bayesian (usually MCMC) methods, but these two methods use
very different models.
We talk about "fitting a model" to data, but how well does our
model actually fit? Assessing model fit with Goodness-Of-Fit (GOF)
checks is important. Sometimes
the fit is embarrassingly poor, which is an incentive to improve the model.
GOF measures rarely indicate how to improve it, and we'll look at
more specific checks in a later post.
GOF looks at the fit of
the model to the same data set used to estimate parameters. We
are not concerned here with selecting models with the best
predictive ability. In general, the fit improves as we add more
parameters, and GOF is usually done with the full model, the one
with the maximum number of parameters.
If you are estimating a probability based on covariates you will
likely use a logistic model. That includes practically all our
occupancy, SCR and survival models. I like to use a uniform
prior for regression coefficients, usually Uniform(-5, 5),
at least for preliminary runs and exploration, as it makes some
problems easier to spot in diagnostic plots. And ease of
spotting is important if you have dozens or hundreds
of parameters to check.
Uniform priors are difficult to
justify - how come a value of 4.999 is perfectly plausible but
5.0001 is impossible? - so should probably be replaced with a
prior that tails off for final analysis and reporting or
publication, but they are good for diagnosis.
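In JAGS that looks something like this (the width is my own choice):

    beta1 ~ dunif(-5, 5)     # flat: chains piling up at -5 or 5 are easy to spot
    # for the final run, swap in a prior that tails off, e.g.
    # beta1 ~ dnorm(0, 0.25)   # JAGS uses precision: 0.25 = sd of 2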
At the start of the recent "Bayes with JAGS" workshop, we wrote
down some ideas to try to make the code we give to participants
more consistent: a sort of style guide for JAGS. Once you stop
writing WinBUGS-compatible code, a lot of options open up. One
point is that JAGS can use vectors and matrices (WinBUGS can't),
which makes your code run faster.
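One example of the kind of JAGS-only shortcut this opens up (a sketch; X is an N x K design matrix of my own naming):

    # WinBUGS-compatible, node by node:
    for (i in 1:N) {
      mu[i] <- beta0 + beta1 * x[i]
    }
    # JAGS: one matrix multiplication instead of a loop:
    mu <- X %*% beta    # mu is then an N x 1 matrix, used as mu[i,1]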
About 5% of men and a small percentage
of women are 'colour-blind', meaning that they cannot
distinguish certain colours, usually green vs red but more
rarely blues and yellows. When we use colours in plots in R, we
should make them intelligible to colour-blinds too.
I am fortunate in having two colour-blind colleagues who will
quickly point out any plots I produce which aren't clear to them.
We talk about colour-blind-friendly plots in our workshops, and
here are some of the tips we pass on...
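One easy tip in recent R (version 4.0 or later): the colour-blind-friendly Okabe-Ito palette is built in.

    cbPal <- palette.colors(8, palette = "Okabe-Ito")
    barplot(rep(1, 8), col = cbPal, names.arg = names(cbPal), las = 2)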
So you want to learn R? Well, there are lots of resources for
learning R on the internet. But perhaps too many, so unless you
have a specific question or a specific error message you can
enter into the search engine, you can spend a lot of time
searching for needles in haystacks.
I've just been using
Google to find basic information about key concepts in R, and
found it difficult to find good sources. Many of the pages go
into unnecessary detail, assume a background in programming in
other languages, or use abstruse examples.
So some recommendations seem to be in order.
Passive detectors can be left out for long periods, providing more data on each animal captured and thus
giving better estimates
of detection parameters in spatial capture-recapture (SCR)
studies. But with long study periods, the risk of animals coming and going or changing
activity centres (ACs) increases.
In this post I want to explore what "closure" means in SCR, why
it's important, and what happens if it is violated. I'll also explore some ideas for open population models
which can be used to mitigate closure issues.
A new version of the
overlap package (0.3.2) is now
available on CRAN, together with binary builds for Windows and
Mac. A minor fix for
getBandWidth means it works
more often with weird data sets, and there's a new function,
sunTime, which maps clock time to sun time based on sunrise and
sunset.
Update 10 Sept 2019: See the
activity package and this new paper for better
methods: Vazquez, C., Rowcliffe, J.M., Spoelstra, K., & Jansen,
P.A. (2019) Comparing diel activity patterns of wildlife across
latitudes and seasons: Time transformations using day length.
Methods in Ecology and Evolution.
Animal activity patterns are in fact aligned to
daylight intensity - in particular, sunrise, sunset and zenith -
not to the time on your clock or GPS unit or camera (Nouvellet
et al, 2012). Since the times of these events vary with the
time of year, using clock time instead of sun time can lead to
wrong inferences. Nouvellet et al (2012) give the example of
hunting behaviour of African wild dogs: impala are almost always
taken before sunset and kudu after sunset, but this effect
disappears if time is measured relative to 6pm instead of sunset.
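For reference, basic use of the package looks like this (a sketch based on the kerinci example data bundled with overlap):

    library(overlap)
    data(kerinci)
    timeRad <- kerinci$Time * 2 * pi      # fraction of day -> radians
    tig <- timeRad[kerinci$Sps == "tiger"]
    mac <- timeRad[kerinci$Sps == "macaque"]
    overlapEst(tig, mac, type = "Dhat4")  # coefficient of overlap
    overlapPlot(tig, mac)                 # the two fitted activity densities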
Have you ever started a long JAGS run and wished you had some
indication of progress or could peek at the results so far?
That's possible with the
saveJAGS package. It has
certainly changed the way I do things. It's a wrapper for
rjags which periodically saves the MCMC chains to files.
The advantages of that are:
- you see progress as the files accumulate on the disk,
- you can load the output saved so far, run diagnostic
checks and get preliminary estimates,
- if there's a problem, or if you already have enough
iterations, you can terminate the run early without
waiting for it to finish,
- you can save long chains and all the parameters you
might need in files, then load into your R session only selected
parameters and thinned chains,
- you no longer need to worry about "Cannot allocate..."
memory errors when the JAGS run finishes,
- and of course you still have your samples in case of
power outages or other disasters - that was the original
purpose of the project,
- maybe Windows 10 is "another disaster": it will restart
your computer to install updates, terminating anything you,
the mere owner of the thing, wanted to run.
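The underlying idea, sketched with plain rjags calls (file, data and parameter names here are placeholders):

    library(rjags)
    jm <- jags.model("model.txt", data = jagsData, n.chains = 3)
    update(jm, 1000)                               # adaptation + burn-in
    for (i in 1:20) {                              # 20 bursts of 500 iterations
      mc <- coda.samples(jm, c("beta0", "beta1"), n.iter = 500)
      saveRDS(mc, sprintf("chains_%02d.rds", i))   # one file per burst
    }
    # ...and from another R session you can read the files saved so far.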
In Malaysia we prefer to use our national coordinate reference system (CRS) for
scientific purposes, rather than the international UTM standard. A UTM zone
boundary splits the Malay Peninsula in two, and another boundary bisects
Malaysian Borneo. We encountered multiple problems using the Malaysian CRS with
QGIS and R software, problems which Mikael Rittri attributed to the "Malayan
Monster" back in 2011. Some solutions are described here.
A fairly common strategy in ecological research is to measure a
large number of covariates, put these together into a series of
models with all combinations of covariates, then look at
Akaike's Information Criterion (AIC) to see which is the best.
The idea is that the covariates that appear in the best model
(with lowest AIC) are really affecting the response variable.
This approach has the derogatory name of "dredging". But does it work?
I've recently been asked about coding categorical covariates in
JAGS. You can do it in the usual frequentist manner with dummy
variables for all but one of the variables, but JAGS allows
other, more easily interpreted ways of coding, especially if there
is only one
categorical covariate in the model and it
is a logistic or log regression. Many of our wildlife analyses use
logistic models under the hood.
In R, the class factor
is used for categorical variables. A factor has a levels
attribute with the names of the categories, in alphabetical
order by default, and the values are stored as integers giving
the level number.
A toy example
I'll generate some simulated data for a logistic regression
model (as that generalises to a lot of our other models). We
have four players with different degrees of skill and a
continuous covariate which does nothing.
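Something along these lines (names and numbers are my own):

    set.seed(42)
    n <- 200
    player <- factor(sample(c("Ann", "Bob", "Cat", "Dan"), n, replace = TRUE))
    skill <- c(Ann = -1, Bob = 0, Cat = 0.5, Dan = 1)  # true skill, logit scale
    x <- rnorm(n)                                      # covariate with no effect
    y <- rbinom(n, 1, plogis(skill[as.character(player)]))
    levels(player)    # categories stored in alphabetical order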
In occupancy modelling we are interested in the probability that a species is
present at a location, taking account of the possibility that it may be there
but not be detected. So we need to jointly estimate probability of occupancy and
probability of detection. In many studies we would expect probability of
occupancy to be affected by the number of individuals at the location, so it
would make sense to include that as a covariate in the analysis. But of course
we don't know how many are present - if we did we would not be trying to
estimate occupancy. The Royle-Nichols model treats the
number available for detection at a site as a latent variable which can be
estimated from detection/non-detection data.
In the simplest model, the number
available for detection at site \( i \), \( N_i \), is modelled as
being drawn from a Poisson distribution with parameter \(
\lambda \). This is the biological model. The observation model
assumes that detection of each individual is a Bernoulli trial,
individuals being detected with probability \( r \). The data do
not show which individuals are detected, just whether the
species was detected, and that is recorded if at least one
individual is observed.
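Putting the two pieces together, the probability of detecting the species at an occupied site \( i \) on a single visit is \( p_i = 1 - (1 - r)^{N_i} \): the probability that at least one of the \( N_i \) individuals is detected.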
In a recent post I described the
limits on floating points numbers and the effect on
probabilities close to 0 or 1. This can cause errors in
manipulating probabilities, in particular in calculating
likelihoods, which can become very small. We get around that by
working with logs of probabilities. Multiplication is then easy
(just add the logs), but addition and subtraction are more
difficult. I proposed functions to add probabilities and to
calculate 1 - p avoiding underflow or overflow.
In the course
of applying these ideas to maximum likelihood estimation in the
wiqid package, a couple of new issues came up. We
revisit logSumExp and develop new functions to
add together two vectors and to do matrix multiplication.
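For reference, the core trick looks like this (my own minimal version, not necessarily the wiqid code):

    logSumExp <- function(logp) {
      m <- max(logp)                # factor out the largest term...
      m + log(sum(exp(logp - m)))   # ...so exp() cannot overflow or all-underflow
    }
    log1mExp <- function(logp) log1p(-exp(logp))   # log(1 - p) from log(p)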
Great apes make nests to rest and ungulates produce
clearly-visible dung piles: both of these are used to estimate
the density of animals. Signs are much easier to count than
animals. There are more of them, and they don't run away or hide
when you approach. To convert the number or density of signs to
number of animals, we need two additional bits of information:
how many signs each animal produces per day, and how many days
the signs remain visible (the persistence time or decay time).
If each animal produces p
signs per day and signs remain visible for t days, then
sign density will be:
S = D x p x t
where D is the density of animals. If we
can estimate p and t, we can calculate D
from S. In this post I want to provide code for
estimating the persistence time of signs for retrospective
studies of decay.
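One standard approach for such retrospective data, where signs of known age are recorded as still visible (1) or gone (0), is a logistic regression of survival on age; mean persistence time is then the area under the fitted survival curve. A sketch (column names are my own):

    fit <- glm(visible ~ age, family = binomial, data = signs)
    ages <- 0:365
    surv <- predict(fit, data.frame(age = ages), type = "response")
    sum(surv)    # mean persistence time in days (the sum approximates the integral)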
There are limits to the size and precision of numbers which computers can
handle, and they can cause problems when we try to calculate
probabilities, and especially likelihoods. Numbers smaller than
\(10^{-308}\) become zero and
numbers greater than 0.9999999999999999 become 1. Although not
important in the real world, these rounding errors mean that
there are large ranges of parameter values that have likelihoods
of 0 or 1. Our algorithms to find maximum likelihoods or
generate MCMC chains fail when they crash into the 1 cliff or
fall into the 0 abyss.
The solution is to work
with logarithms of probabilities instead of the actual values [log(p)
instead of p], eg, we routinely work with log-likelihoods.
Multiplying probabilities is then simply a matter of adding up
the logs. But sometimes we need to add up probabilities or
calculate the complement, 1 - p, and we need to do that without
falling in the 0 abyss or smashing into the 1 wall.
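You can see both cliffs at the R console:

    exp(-1000)    # 0: underflow
    1 - 1e-17     # 1: the difference is below machine precision
    # on the log scale the value survives:
    -1000         # log of exp(-1000)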
We often have to deal with multidimensional data, which
generally has to be squeezed into a 2-D format for tables and
spreadsheets and then later reconstituted. Whenever I have to
do that, I need to rediscover how to do it. So here's a tutorial
for my future self which might be useful for others.
A simple example: we have 3 sites, visited 4 times per year for 2
years. This is usually shoehorned into a table with 8 columns for
the visits, like this:
site 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4
A 0 1 1 3 2 0 1 1
B 1 0 1 0 3 0 2 2
C 5 2 1 1 3 4 2 1
These look like counts, but the data could be detection/nondetection
(0/1) data, wind speed at each visit, or the name of the
observer.
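Reconstituting the sites x visits x years structure from that table (a sketch):

    wide <- matrix(c(0,1,1,3, 2,0,1,1,
                     1,0,1,0, 3,0,2,2,
                     5,2,1,1, 3,4,2,1),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("A", "B", "C"), NULL))
    y <- array(wide, dim = c(3, 4, 2),
               dimnames = list(site = c("A","B","C"), visit = 1:4, year = 1:2))
    y["A", , ]    # site A: a visits x years matrix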
When animals can be individually identified by their natural
markings, as is the case with most cats, camera traps can be
used to gather data for density estimation with
SECR. Because markings differ on
each side of the animal, cameras are usually set in pairs to
simultaneously record both sides. I have just been looking at
ways to analyse data from unpaired cameras, and was surprised to
find that unpaired cameras can be preferable to paired cameras
when the number of cameras is limited.
This was based on
simulations with just one set of parameter values, but it does
suggest that projects with limited resources should consider
using unpaired cameras, as this allows more locations to be covered.
Update 5 Nov 2018: See this new
paper by Ben Augustine et al (2018) Spatial capture–recapture
with partial identity: An application to camera traps. Annals
of Applied Statistics, 12(1), 67-95.
5 February 2017, updated 17 April 2018,
5 Nov 2018
There is usually some correlation between the covariates that we
want to include in our model, a phenomenon known as "collinearity".
This is not a problem if the correlation is small, but you need
to be careful if the absolute correlation between two covariates
is > 0.7. You can avoid problems by discarding one of a pair of
correlated covariates, but that can be a mistake if both have a
real biological impact. We'll look at a toy example where both
covariates should be included.
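A sketch of the kind of toy example I mean (my own numbers):

    set.seed(1)
    n <- 100
    x1 <- rnorm(n)
    x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # correlation with x1 around 0.8
    y <- 1 + 2 * x1 - 2 * x2 + rnorm(n)   # both covariates matter
    cor(x1, x2)
    summary(lm(y ~ x1 + x2))   # both coefficients recovered
    summary(lm(y ~ x1))        # dropping x2 badly biases the x1 estimate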
Most runs of MCMC chains for Bayesian estimation begin with an
adapt phase, when the samplers used to produce the chains are
tuned for best performance. I recently did a long run for a
complex model, and the output was flagged with "convergence
failure". In fact the problem was caused by cutting short the
adapt phase. The default adapt phase for the R package
jagsUI is only 100 iterations, which was woefully
inadequate for my model. We should be specifying big numbers for
n.adapt and small
numbers, or even zero, for n.burnin.
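Something like this (a sketch; the data and model names are placeholders, and the numbers are just indicative):

    library(jagsUI)
    out <- jags(data = jagsData, inits = inits, parameters.to.save = params,
                model.file = "model.txt", n.chains = 3,
                n.adapt = 5000,     # generous adaptation
                n.burnin = 0,       # little or no extra burn-in needed
                n.iter = 20000)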
You have produced several plots in R, and now you want to go
back and take a look at one of those old plots. On a Mac or
RStudio, you can do that with the navigation keys. R for Windows
has the same functionality but it is not enabled by default. Here's how to enable it.
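One way I know of (an assumption that this matches the post's method): turn on recording for the graphics device, after which Page Up / Page Down steps through old plots.

    windows(record = TRUE)   # R for Windows only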
The wiqid package has been
accepted on CRAN, with the first version being 0.1.0. This means
that it can now be installed - together with its dependencies -
and update.packages() will also
get you the newest version.
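So installation is now just:

    install.packages("wiqid")   # pulls in the dependencies too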
I have just [August 2016] put together an R package to automate the conversion
of habitat masks produced with Murray Efford's secr
package to the format needed to run in JAGS or WinBUGS/OpenBUGS.
[Rewritten 28 June 2019 to work with version 0.1.0 of the package.]
Updated 2018-07-16: The compatibility issues with R 3.3 and JAGS
4.2 do not affect current versions.
I don't have an Apple computer, but I have picked up some hints
about installing R and JAGS on a Mac from trying to
trouble-shoot friends' installations.
Updated Oct 2017, changes in red.
People seem to run into
problems with different versions of the Mac OS, R, JAGS and the
rjags package. The only way to stay sane is to use recent
versions of all four.
Check your Mac OS version!
From the Apple menu, choose About This Mac; the version
number appears below the name. Note whether you have v. 10.11
(El Capitan) or
later. If you have an earlier version, upgrade your OS
before going further.
I recently tried installing R and JAGS on my machine running
Ubuntu. I wanted to test my
wiqid package with the new
version of JAGS on Ubuntu. It took me a while, but I finally
found a simple way to do this which might be of interest to
others.
I already had R and JAGS 3 installed, together with the
rjags package, version 3. Installing the
rjags package within
R (or updating it with
update.packages()) installs the new
rjags, v.4, which requires JAGS 4 and throws an error
if it isn't found. But the Ubuntu repository still has JAGS 3,
so you cannot update JAGS with Ubuntu Software Center.
In the last
post we looked at a way to use conjugate distributions
via a Gibbs sampler. The output from this was an MCMC sample of
random draws from the posterior distribution. We can produce
similar MCMC samples without using conjugate distributions with
a method often called "Metropolis-within-Gibbs".
The idea for the sampler was developed by
Nicholas Metropolis and colleagues in a paper in 1953. This
was before the Gibbs sampler was proposed, but it uses the same
idea of updating the parameters one by one. A better name would
be "componentwise random walk Metropolis sampler". The rules for
the random walk ensure that a large number of samples will be a
good description of the posterior distribution.
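For one parameter the algorithm fits in a few lines; a generic sketch (my own code, not the wiqid implementation):

    rwMetropolis <- function(logPost, start, n = 10000, sd = 1) {
      out <- numeric(n)
      cur <- start
      curLP <- logPost(cur)
      for (i in 1:n) {
        prop <- rnorm(1, cur, sd)              # propose a random-walk step
        propLP <- logPost(prop)
        if (log(runif(1)) < propLP - curLP) {  # accept with prob min(1, ratio)
          cur <- prop
          curLP <- propLP
        }
        out[i] <- cur                          # on rejection, keep the old value
      }
      out
    }
    # e.g. posterior for a binomial probability on the logit scale,
    # 7 successes in 10 trials, flat prior on the logit:
    draws <- rwMetropolis(function(b) dbinom(7, 10, plogis(b), log = TRUE),
                          start = 0)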
As discussed in the previous
post, conjugate distributions provide an easy way to
calculate posterior distributions for a single parameter, such
as detection of a species during a single visit to a site where
it is present. If we have more than one unknown parameter in our
model - as with a simple occupancy model, where we have
detection and occupancy - we may still be able to use conjugacy
via a Gibbs sampler.
Gibbs sampling works if we can describe the posterior for
each parameter when all the other parameters in the model are known.
Conjugate distributions provide useful tricks for combining informative
priors with likelihoods to produce posterior distributions. In
the days before powerful computers and clever algorithms, they
were often the only way. They only work for a single variable.
Nevertheless, Gibbs sampling, which the wiqid package uses when
possible, builds on the idea of conjugate distributions.
As an example, we'll use estimation of detection probability from data
for repeat visits to a site which is known to be occupied by our
target species. First, we'll describe the beta distribution,
then see how that can be combined with our data. A discussion of
priors will follow, and we'll finish with brief descriptions of
conjugate priors for other types of data.
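The beta-binomial case takes one line of algebra and a couple of lines of R: with \( y \) detections in \( n \) visits and a Beta(\( a_0, b_0 \)) prior on detection probability \( p \), the posterior is Beta(\( a_0 + y, b_0 + n - y \)).

    y <- 3; n <- 10    # toy data
    a0 <- 1; b0 <- 1   # Beta(1,1) = uniform prior
    curve(dbeta(x, a0 + y, b0 + n - y), from = 0, to = 1,
          xlab = "p", ylab = "posterior density")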
I'm planning a series of posts looking at what happens under the
hood when we analyse a data set using some of the estimation
functions in the
wiqid package. I'll focus mainly on Bayesian
methods, but this first post will look at the likelihood, which
is used for both Bayesian analysis and maximum likelihood
estimation.
We'll use a simple occupancy model. It has just
two parameters, and both must lie between 0 and 1. That means that we
can plot all possible combinations of the two parameters in a
simple two-dimensional graph. As we'll see we need to add a
third dimension, but three is still manageable.
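As a preview, here's a sketch of evaluating that two-parameter likelihood over a grid (toy data of my own, not the post's):

    x <- c(0, 0, 0, 1, 2, 3); J <- 3   # detections in J visits at each site
    siteLik <- function(psi, p)        # one site's likelihood contribution
      ifelse(x > 0, psi * dbinom(x, J, p), psi * (1 - p)^J + 1 - psi)
    grid <- seq(0.01, 0.99, length = 99)
    ll <- outer(grid, grid, Vectorize(function(a, b) sum(log(siteLik(a, b)))))
    contour(grid, grid, ll, xlab = "psi (occupancy)", ylab = "p (detection)")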
The wiqid package for R
statistical software provides Quick and Dirty functions for the
analysis of Wildlife data.
Currently it has functions for
estimating occupancy, abundance from closed captures, density
from spatial capture-recaptures, and survival from
mark-recapture data, plus a slew of functions for species
richness and alpha and beta diversity.
It is intended to be used for (1) simulations and bootstraps,
(2) teaching, and (3) introducing Bayesian methods. And it
should work on all platforms: Windows, Linux, and Mac.
Jackknife estimators are used in ecology in two situations:
- mark-recapture estimation of the number of animals in a population;
- species richness estimation for a defined assemblage.
In both cases, the raw number of animals or species observed
(Sobs) is often too low, as some animals/species are missed. The
raw number is thus a biased estimator. The jackknife aims to
produce unbiased estimates.
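For species richness, the first-order jackknife is simple enough to show here (the standard formula):

    # counts: number of the n sampling occasions each species was recorded in
    jack1 <- function(counts, n) {
      Sobs <- sum(counts > 0)
      f1 <- sum(counts == 1)    # species seen exactly once
      Sobs + f1 * (n - 1) / n
    }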
Analysis of spatially explicit capture-recapture (SECR) data can
be done in a maximum likelihood (ML) or a Bayesian framework. Program
DENSITY and the
secr package take care of the former. Bayesian
analysis with the usual workhorses, WinBUGS, OpenBUGS and JAGS,
is straightforward if the traps are laid out in a large area of suitable habitat.
Faced with patches of suitable habitat
surrounded by inhospitable terrain, or a large extent of habitat
punctuated with patches of non-habitat, we had the choice of ML
methods or one of the packages designed specifically for
Bayesian SECR analysis, such as SPACECAP or SCRbayes. But then
we are limited to the range of models provided by package
authors: we don't have the flexibility to specify our own models
that comes with WinBUGS, OpenBUGS or JAGS.
Here I present a way to incorporate patchy habitat into a BUGS/JAGS analysis.
In our basic data analysis workshops, we use an idea from John Kruschke's
Doing Bayesian Data Analysis: we use spinners to
generate random values for continuous variables and introduce
the concept of probability density.
We start off with simple spinners representing a
uniform distribution over a range from, say, 0 to 0.5. We
discuss the problems of attaching a probability to an exact
value, which leads to the probability of a range of values and hence
to the idea of probability density.
I have recently been looking at the design of camera-trap
studies to estimate the population density of tigers when
populations are very sparse. The
intention is to use recently-developed spatially explicit capture-recapture (SECR)
methods to analyse the data. The
optimal camera-trap layout for SECR may well differ from the design
used for older methods.
Before the advent of SECR methods, putting all your traps
into a single cluster with minimal perimeter length made sense,
as you needed to estimate the area trapped animals came from to
get a density. SECR estimates density directly, without needing
to estimate area, so a single, large cluster may no longer be
Do the models used for SECR (spatially explicit capture
recapture) assume that animals' home ranges are approximately
circular?
I've seen this asserted a couple of times, in particular in Tobler
and Powell (2013, p.110), and I've myself drawn circular home
ranges when discussing the interpretation of the capture
parameters, but I don't think it is a necessary assumption.
Update: See the new paper by Murray
Efford (2019) "Non-circular home ranges and the estimation of
population density" Ecology, 100, e02580.
I've a couple of ideas for blog posts on SECR (spatially
explicit capture-recapture), and this post sets out the basic
concepts of SECR which I will need to refer to in later posts.
Capture-recapture methods (also known as mark-recapture or
capture-mark-recapture) have been used to estimate the size of
animal populations for many years: the first software package
for analysis of this kind of data, CAPTURE (Otis et al
1978), is now 35 years old. Early methods did not use the
spatial component in the data, the capture locations, and
spatially explicit capture-recapture models (SECR, or just
spatial capture-recapture, SCR) first appeared in 2004 (Efford 2004).
John Kruschke's BEST code for R
is a nice introduction to Bayesian thinking for folks used to
t-tests. I've referred to it, linked to it, and used it in
workshops before now.
The idea is to provide an R function
which is as easy to use as t.test
but which gives not a mere p-value but the kind of output
Bayesians are used to - posterior probability distributions.
John's BESTmcmc function uses
JAGS, but handles all the preliminaries automatically and
produces a result in a simple format.
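Typical use, with made-up data:

    library(BEST)
    y1 <- rnorm(30, 100, 15)
    y2 <- rnorm(30, 108, 20)
    fit <- BESTmcmc(y1, y2)   # runs JAGS behind the scenes
    plot(fit)                 # posterior for the difference in means
    summary(fit)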
A new R package called overlap,
to estimate the overlap of animal activity patterns from
data derived from camera traps, has now arrived on R's central
package repository, CRAN.
along in the early 90s, biologists realised that they could
harvest data on the activity patterns of rare, secretive forest
animals. Were they diurnal, nocturnal, crepuscular, or maybe
cathemeral (active all around the clock)? More recently, people
have tried to get clues about how species interact - competition
or prey-predator interactions - from activity patterns, by
examining the extent of overlap.
In our corner of the biological world, Martin Ridout and Matt
Linkie published a paper (2009) on the activity patterns of
tigers, clouded leopards and golden cats in Sumatra, with a lot
of technical detail on how overlap could be quantified and
confidence intervals estimated. They followed up (2011) with a
paper on tigers and their prey, also in Sumatra. ...
Often we are interested in the difference between the means of
two populations and whether we can infer from samples from the
populations that the means are different.
This is often a silly
question: the means of real populations are almost always
different, even if the difference is microscopic. More useful
would be to estimate the difference and the probability that it
is big enough to be of practical importance. See the
BEST software for a way to do this in R.
Sometimes we are presented with confidence intervals for each
of the means. This happens in particular with the standard
packages we use for wildlife data analysis, where the output
includes confidence intervals for each coefficient or real
value. Can we infer evidence of a difference from confidence
intervals in the same way as from a p-value from a test of
significance?
Sometimes people I talk to are worried because their data
aren't normally distributed, and they believe that they can't
use the usual techniques such as t-tests or ANOVA without first
transforming the data to be normal, or they must resort to
non-parametric methods.
There are many good reasons for transforming data or NOT
using t tests or F tests, but non-normal data is not usually one
of them.
A couple of people on a recent workshop had trouble with
their AVG anti-virus software when installing JAGS 3.3.0.
This appears to be due to AVG's paranoia: see
Martyn Plummer's comment. No malware is detected by McAfee
AntiVirus Plus or Trend Micro Office Scan. See also the information
on false positives.
In ecology and wildlife studies, a lot of our spatial data
takes the form of rasters rather than vector files. When you
first add a raster in QGIS, you usually get a plain grey
rectangle, or maybe just a grey outline on a white background,
as most raster file formats have no styling information. To make
sense of a raster, you need to change the style.
Here I'll give some hints for "quick-and-dirty" styling to
display the contents of a raster. For a more detailed tutorial, see
In a recent post,
I showed how to deal with "distance from..." data in GIS layers.
The example I used there involved
1. producing a layer with distance-from-nearest-road as a
habitat layer so that we can calculate a probability of
occupancy layer, and
2. extracting distance-from-nearest-road for each of our cameras
in order to model our data.
Here we will see how to do the same thing using the R packages
for handling spatial information.
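A sketch of both steps with the raster package (made-up coordinates, and roads simplified to points):

    library(raster)
    r <- raster(nrows = 100, ncols = 100,
                xmn = 0, xmx = 10000, ymn = 0, ymx = 10000)  # template grid
    roadPts <- cbind(x = runif(50, 0, 10000), y = runif(50, 0, 10000))
    distRoad <- distanceFromPoints(r, roadPts)  # step 1: the distance layer
    cams <- cbind(x = c(2000, 5000), y = c(3000, 7000))
    extract(distRoad, cams)                     # step 2: values at the cameras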
At our recent workshop on Geographical Information
Systems (GIS) using Quantum GIS we had a number of people
interested in working with radio telemetry or GPS data to model
animal home ranges. The home range plugin for QGIS doesn't work
with current versions, at least with Windows.
It is designed to pass data to R and get the adehabitat
package to do the home range estimation and pass the result back
to QGIS. QGIS uses Python code, and to get it to talk to R
requires a bit of software called "RPy2". This was always
difficult to set up on Windows, but Python has been upgraded and
RPy2 no longer works. In any case, the adehabitat package has
been replaced by new packages with a wider range of options.
So now it's better to prepare spatial data in QGIS, read the
files into R, process with adehabitatHR, write the results to
new files, and load into QGIS.
We recently ran a workshop on Geographical Information
Systems (GIS) using Quantum GIS for ecologists and wildlife
researchers. For many species, distance from water, a road,
forest edge, or a settlement may be an important habitat
For example, we may be using automatic cameras to investigate
occupancy of sites by leopards. Probability of occupancy may
depend on distance from the nearest road. Given vector layers
with roads and camera locations, we want to do two things:
1. produce a layer with distance-from-nearest-road as a
habitat layer so that we can calculate a probability of
occupancy layer, and
2. extract distance-from-nearest-road for each of our cameras
in order to model our data.
I sometimes need to put formulae into my web pages, and I've
been exploring the use of
In the past I've inserted the formula into MS Word with MS
Equation 3.0, doing a screen capture, cropping the image to the
formula I want, saving as a .GIF file, and then displaying it on
the web page as an image.
That's not ideal: if I want to change anything,
I have to start all over again from Word. It's also messy if I
want to put a small formula inline
in the text; for a start it doesn't line up properly. MathJax allows me to
type the formula in LaTeX style directly into the HTML code for
my web page.
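For example, with MathJax loaded in the page, the HTML source can just contain LaTeX (the delimiters shown are the MathJax defaults I use):

    <p>The logit link is \( \mathrm{logit}(p) = \log \frac{p}{1-p} \),
    and display equations go on their own line:</p>
    \[ p = \frac{1}{1 + e^{-x}} \]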
I have a collection of data sets for use during workshops or
just to play with when trying out new statistical techniques or
software.
A big question is what format to use, and I've changed my
mind on this several times already!
After looking at
this blog post by John Mount I've decided to try using
tab-separated files with a .tsv extension.
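In R, reading and writing these is straightforward (a sketch; the file name is hypothetical):

    dat <- read.delim("mydata.tsv")   # read.delim defaults to tab-separated
    write.table(dat, "mydata.tsv", sep = "\t",
                row.names = FALSE, quote = FALSE)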