Importing data into R for home range analysis
|Back to home page||
At our recent workshop on Geographical Information Systems (GIS) using Quantum GIS we had a number of people interested in working with radio telemetry or GPS data to model animal home ranges. The home range plugin for QGIS doesn't work with current versions, at least with Windows.
It is designed to pass data to R and get the adehabitat package to do the home range estimation and pass the result back to QGIS. QGIS uses Python code, and to get it to talk to R requires a bit of software called "RPy2". This was always difficult to set up on Windows, but Python has been upgraded and RPy2 no longer works. In any case, the adehabitat package has been replaced by new packages with a wider ranger of options.
So now it's better to prepare spatial data in QGIS, read the files into R, process with adehabitatHR, write the results to new files, and load into QGIS.
The package author, Clément Calenge, has provided good documentation in the vignettes for adehabitatHR and its relatives. He works through several examples using data sets contained in the package. Although he describes how the data is formatted in R, he gives no guidance on how to get your data in spreadsheets and GIS layers into R and ready to analyze. Hence this post!
You will be working with two types of data: (1) the position of the animals when relocated, probably in a table with coordinates and extra columns for additional details, and (2) habitat data for the whole study area, in the form of GIS files.
What this covers:
For background information, look at the vignette for adehabitatHR. We'll look at the example which they provide, puechabonsp. Open R and enter the following code:
The example data is a list with two elements: "map" is a SpatialPixelDataFrame, which holds raster data, and "relocs" is a SpatialPointsDataFrame which holds the tracking data. These data classes are defined in R's sp package, and are used by all packages using spatial data. The Spatial* classes are for data which apply to a specific place on the earth; so they all contain coordinates and the CRS used. Well, that's not quite true, it is possible for the CRS to be "NA", but that rather defeats the purpose!
The actual CRS is not important for home range analysis (and for puechabonsp it's "NA"), provided (1) it is a projected CRS, ie. using metres not degrees, and (2) all the coordinates have the same CRS.
The example we use below is based on the puechabonsp data set. You can download the example data here. It's a .zip file: you should extract all the files and place them in a folder on your hard drive, not on the desktop. In R, go to File > Change dir... and navigate to the folder containing the data.
This will depend on the data you have available and of course the variables you want to include. Reprojecting raster data to a different CRS results in 'fuzziness' in the data, as the new grid cells don't match up with the old. So see which data rasters are in metres, and choose the CRS of the most important ones.
Resolution depends on the data available: working with 10m grid cells when your habitat data rasters have 1km cells is a waste of time, as more cells take more time to process.
For the example data we have only one raster file, and it uses "NTF (Paris) / Lambert zone III", EPSG code 27573, as the CRS. So we'll use that for all our data. The resolution for this is 100m, and uses a reasonable number of cells (6636) to cover the study area. We can get elevation data in about 90m resolution, so we'll use that for our rasters.
Open the example tracking data file, tracking_WGS84.csv, in a spreadsheet program.
It has one row for each relocation of an animal with X and Y coordinates and a column to identify which animal it refers to, plus columns for other data about the animal or the relocation which you might want to use.
Close the file in the spreadsheet program, load it into R and check it:
All the variables are numeric, except Name which is a factor with 4 levels.
We need some extra packages to handle our data; load them now:
We can turn track into a SpatialPointsDataFrame simply by specifying that the "X" and "Y" columns are the coordinates:
Let's use a different colour for each animal:
track has no CRS information:
As indicated by the file name, the coordinates use WGS84 as the CRS, in decimal degrees. The EPSG code for this is 4326, so we can complete that:
The last step simply told R the current CRS of the track data. We now need to transform the data to our chosen CRS, "NTF (Paris) / Lambert zone III", EPSG code 27573:
Notice that the proj4string is now very long, and the min/max coordinates are now in metres. This is what we want.
This is a limestone area with patches of bare rock. We think plant cover is an important variable, and we have an ESRI ascii file showing the percentage cover for the study area. Load it into R with raster:
ESRI ascii files contain no CRS information, but when we got the ascii file we were told it was in "NTF (Paris) / Lambert zone III", which has EPSG code 27573, so we can set that:
Often you will need to reproject a raster to match your chosen CRS and resolution. We'll see how to do that with the downloaded data.
We suspect animal movements may be affected by hunting regulations in protected areas, and we have a shape file with these data. Well, it's actually a shape file I made up myself, so don't arrest anyone for hunting in the protected areas shown! Read it into R with readShapeSpatial. I've added ".v" to the end of the name to remind myself that it's a vector format, not a raster.
Shape files do not include CRS information, so legal.v has "NA" for its proj4string. In fact, this information was digitized from Google Earth and used their WGS84/Pseudo Mercator projection, EPSG 3857. So we can set that:
Now we transform this layer to our chosen CRS and check that it matches the herb raster layer:
Raster layers give a single number for each cell, but vector layers often have several variables for each polygon. So we check the variables in legal2.v:
The types are A, B, C, and X. We need to code these as numbers 1-4 for the raster layer, so we make a new column "typeCode":
Now we can do our raster layer. All the raster layers must match exactly, so we use the herb raster as a template:
legal2.r has lots of NAs: land outside any of the polygons has been given NA, ie. status unknown. In fact, we know that it is unprotected. We'll code "unprotected" as 0, so change the NAs to 0s:
The NAs have disappeared and now we have lots of zeros. Plot it, and you'll see that most of the land is light grey = no legal protection.
Some parts of the study area are very steep, and we think slope will be important. Elevation and aspect may also be important. We don't have these layers, but we can download elevation from the internet and make slope and aspect layers ourselves with the functions in the raster package.
The easiest way to get data from the internet is to use getData; check the help page:
We'll get SRTM 90 m resolution data; for that we need to specify approximate longitude and latitude. Look at tracking_WGS84.csv again and you'll see the values are around long 4 and lat 44.
If you are online and have a fast connection, you can download the data, but it's a big file (24 MB) and will take a while. If you prefer, you can load the smaller data set from the example data.
This downloads the data from the web and crops it to about the area we are interested in, and then saves it:
This loads elev.mid from the data folder:
When you save with writeRaster, two new files appear in your folder, a big one with .gri extension and a small one with .grd. Open elevation_mid.grd in Notepad and you will see the information about the data set.
Let's see what we have in our mid-sized data set:
Note that elev.mid already has CRS information, it is unprojected WGS84 (ie. in degrees). Now we need to transform and reproject the elevation raster to match the herb raster.
When I first did this, I encountered a problem when I created the slope and aspect rasters. To do this, the elevation in each cell is compared with its 8 neighbours. The cells on the edge of the raster don't have 8 neighbours, so they get slope or aspect of NA. We can get around this by working with a raster which is bigger than the final version.
Before reprojecting, we'll pad or expand the herb raster, adding one row or column on each side:
Note that herb.padded has 2 more rows and 2 more columns, and the extent has expanded by 100m each way. Now we can deal with the elevation data:
The coordinates are now in metres, and the number of rows and columns and the extent all match herb.padded. You can see in the plot that the extent is smaller than for elev.mid.
The study area has some very steep places, and slope may affect the movement of our animals. Aspect may be important too, as north-facing slopes are colder and wetter, and retain snow longer in the spring.
We don't have layers for slope or aspect, but we can derive these from the elevation data with the function terrain:
NAs have appeared: these are around the edge of the raster. We can get aspect data in the same way:
The output is in degrees (0 - 360). We'd rather have just N, E, S, W, coded as 1, 2, 3, 4. We can get this with reclassify:
To use this, we need the rcl matrix with columns "from", "to", and "becomes":
Now the aspect layer just shows the four cardinal points.
To begin with, we put all our raster layers together in a rasterStack:
Oops! I forgot that legal2.r matches herb, not herb.padded; we'll pad it the same way:
The plots look okay, but we've got some odd names in there. We can fix that in the rasterStack:
The names are right now. Notice that herbaceous layer has data for the study area only , the others have data going far outside the area of interest. We can get rid of these extra data with mask:
Now all five layers match.
It's straightforward to convert our habitat rasterStack to a SpatialPixelsDataFrame. We'll plot it once more to check (using mimage, as it's a SpatialPixelsDataFrame):
I also checked the extent (bounding box) with bbox and compared it with the original herb raster, before padding:
The bounding boxes are the same; the extra rows and columns of NA cells that we added earlier have gone.
We're almost ready to put the tracking data and the habitat layers into a single list, but let's double-check that the CRS is the same for both:
Now they can go in the list with the names "map" and "relocs" - we'll stick to the names used in the adehabitatHR documentation.
Here's the bit discussed in Section 3.3 of the
vignette. We make a Minimum Convex Polygon containing 95% of
the relocations for each animal, see which cells of the map
are enclosed in the home range of animal #2, turn the vector of
cells into a SpatialPixelsDataFrame, then plot the
Correction 1 (12 Mar 2014): The original code below didn't produce the map shown (spotted by Christian Osorio Popiolek), which is the herbaceous layer, not the aspect layer. And layer #5 is the legalStatus layer! It's always safer to use the layer names rather than the numbers, as shown below.
Correction 2 (25 Jan 2016): The function overlay in package sp is defunct and has been replaced with the function over (spotted by Bill Kanapaux).
And now it works!
| Updated 25 Jan 2016 by Mike Meredith|