We often have to deal with multidimensional data, which
generally has to be squeezed into a 2D format for tables and
spreadsheets and then later reconstituted. Whenever I have to
do that, I need to rediscover how to do it. So here's a tutorial
for my future self which might be useful for others. Here's a
simple example: we have 3 sites, visited 4 times per year for 2
years. This is usually shoehorned into a table with 8 columns for
the visits, like this:
    occasion
site 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4
   A   0   1   1   3   2   0   1   1
   B   1   0   1   0   3   0   2   2
   C   5   2   1   1   3   4   2   1
These look like counts, but the data could be detection/nondetection
(0/1) data, wind speed at each visit, or the name of the
observer.
An example to play with
Here's the code to produce the data above:
colNames <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
siteNames <- c("A", "B", "C")
dat1 <- matrix(rpois(24, 1.5), 3)
dimnames(dat1) <- list(site = siteNames, occasion = colNames)
But instead of that, I want to play with a matrix of
character strings, where each string tells us what it is. Here
it is:
mat1 <- outer(siteNames, colNames, paste0)
dimnames(mat1) <- list(site = siteNames, occasion = colNames)
mat1
occasion
site 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4
A "A1.1" "A1.2" "A1.3" "A1.4" "A2.1" "A2.2" "A2.3" "A2.4"
B "B1.1" "B1.2" "B1.3" "B1.4" "B2.1" "B2.2" "B2.3" "B2.4"
C "C1.1" "C1.2" "C1.3" "C1.4" "C2.1" "C2.2" "C2.3" "C2.4"
A comment on dimnames
I think it's really useful to name matrices and arrays. As
you can see above, you can name the dimensions (here "site" and
"occasion") as well as the rows and columns. The names generally
survive when you summarise (with colSums for instance) or select subsets.
mat1[, 1:4]
occasion
site 1.1 1.2 1.3 1.4
A "A1.1" "A1.2" "A1.3" "A1.4"
B "B1.1" "B1.2" "B1.3" "B1.4"
C "C1.1" "C1.2" "C1.3" "C1.4"
mat1[2, ]
1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4
"B1.1" "B1.2" "B1.3" "B1.4" "B2.1" "B2.2" "B2.3" "B2.4"
The last example has lost its row name. That's because it
no longer has rows: it's not a matrix. By default, R drops
dimensions of length 1, so a single row or a single column becomes an
ordinary vector. To prevent this happening, add the argument
drop = FALSE to the call:
dim(mat1[2, ]) # no dimensions, it's not a matrix
NULL
dim(mat1[2, , drop = FALSE]) # now okay
[1] 1 8
mat1[2, , drop = FALSE] # now okay
occasion
site 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4
B "B1.1" "B1.2" "B1.3" "B1.4" "B2.1" "B2.2" "B2.3" "B2.4"
That's an important argument to add when extracting rows or
columns, because getting a vector when you are expecting a matrix
can trip you up.
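Here is a small sketch of both points with numeric data: names surviving a summary, and drop = FALSE applied to a column instead of a row. The matrix `counts` is just the year-1 counts from the table at the top, typed in by hand; the object names are mine, for illustration only.

```r
# Year-1 counts from the table at the top: 3 sites x 4 visits
counts <- matrix(c(0, 1, 5,
                   1, 0, 2,
                   1, 1, 1,
                   3, 0, 1), nrow = 3,
                 dimnames = list(site = c("A", "B", "C"),
                                 visit = paste0("v", 1:4)))

# Summaries keep the names of the dimension that remains
rowSums(counts)    # named A, B, C
colSums(counts)    # named v1 to v4

# A single column drops to a plain (named) vector ...
oneCol <- counts[, 2]
is.null(dim(oneCol))                   # TRUE: no longer a matrix

# ... unless we ask R to keep the dimension
keptCol <- counts[, 2, drop = FALSE]
dim(keptCol)                           # 3 1
```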
Thinking about multidimensional arrays
Visualising a 2D array (a matrix or table) is easy; we do
it all the time. I like to think of a 3D array as a collection
of pages, each with rows and columns; the pages form the 3rd
dimension. You can bundle pages into books and put several books
on a shelf: that's a 4D array. Several shelves form a bookcase
(5D). A row of bookcases along a wall (6D). Rooms with
bookcases along a corridor (7D)... on different floors (8D)...
in different wings of the library (9D). That should be enough!
It's important to realise though that R stores even 9D
arrays as a single sequence of values, together with an
attribute that indicates how the sequence should be cut up. The
first few values belong to the 1st column on the 1st page of the
1st book...; next comes the 2nd column on the 1st page, and each
of the columns in turn until the 1st page has been dealt with.
Then comes the 1st column on the 2nd page, and so on.
It's very easy to change the dimensions attribute, a bit more
difficult to rearrange the values in the sequence.
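To see the "single sequence plus a dimensions attribute" idea in action, here is a minimal sketch with the integers 1 to 24, so that each value is also its own position in the sequence. The names `x`, `v3d` and `v2d` are made up for this illustration.

```r
x <- 1:24                  # a plain vector, no dimensions

dim(x) <- c(3, 4, 2)       # now a 3 x 4 x 2 array; the values are untouched
x[, , 1]                   # page 1 is filled column by column: 1:3, 4:6, ...
v3d <- x[2, 1, 2]          # row 2, column 1, page 2

dim(x) <- c(3, 8)          # same sequence, now cut up as a 3 x 8 matrix
v2d <- x[2, 5]             # row 2, column 5

c(v3d, v2d)                # both are element 14 of the sequence

dim(x) <- NULL             # remove the attribute: back to a plain vector
identical(x, 1:24)         # TRUE
```

Changing `dim` never moves a value; it only changes where the sequence is cut.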
Convert our simple matrix to a 3D array
So mat1 is stored as a single sequence of values. Let's look
at the structure:
str(mat1)
chr [1:3, 1:8] "A1.1" "B1.1" "C1.1" "A1.2" "B1.2" "C1.2" "A1.3" "B1.3" ...
 attr(*, "dimnames")=List of 2
..$ site : chr [1:3] "A" "B" "C"
..$ occasion: chr [1:8] "1.1" "1.2" "1.3" "1.4" ...
We want an array with sites (rows) x visits (columns) x years
(pages). So it should fill in all three rows of the first four columns, then move
to a second page.
( arr1 <- array(mat1, c(3, 4, 2)) )
, , 1
[,1] [,2] [,3] [,4]
[1,] "A1.1" "A1.2" "A1.3" "A1.4"
[2,] "B1.1" "B1.2" "B1.3" "B1.4"
[3,] "C1.1" "C1.2" "C1.3" "C1.4"
, , 2
[,1] [,2] [,3] [,4]
[1,] "A2.1" "A2.2" "A2.3" "A2.4"
[2,] "B2.1" "B2.2" "B2.3" "B2.4"
[3,] "C2.1" "C2.2" "C2.3" "C2.4"
Notice that we don't need to tell R to "unwrap" mat1;
it does that automatically. But we have lost the names, because our
old dimnames attribute won't fit the new array. We'll
give the array new names, then check by pulling out one slice from each
dimension.
dimnames(arr1) <- list(site = siteNames, visit = 1:4, year = 1:2)
arr1
, , year = 1
visit
site 1 2 3 4
A "A1.1" "A1.2" "A1.3" "A1.4"
B "B1.1" "B1.2" "B1.3" "B1.4"
C "C1.1" "C1.2" "C1.3" "C1.4"
, , year = 2
visit
site 1 2 3 4
A "A2.1" "A2.2" "A2.3" "A2.4"
B "B2.1" "B2.2" "B2.3" "B2.4"
C "C2.1" "C2.2" "C2.3" "C2.4"
arr1[3,,] # site 3
year
visit 1 2
1 "C1.1" "C2.1"
2 "C1.2" "C2.2"
3 "C1.3" "C2.3"
4 "C1.4" "C2.4"
arr1[,2,] # visit 2
year
site 1 2
A "A1.2" "A2.2"
B "B1.2" "B2.2"
C "C1.2" "C2.2"
arr1[,,1] # year 1
visit
site 1 2 3 4
A "A1.1" "A1.2" "A1.3" "A1.4"
B "B1.1" "B1.2" "B1.3" "B1.4"
C "C1.1" "C1.2" "C1.3" "C1.4"
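Going back from the 3D array to the original 2D layout is the same trick in reverse, since the values are already in the right order. A minimal sketch, rebuilding the objects from above so that it runs on its own (the name `back` is mine):

```r
siteNames <- c("A", "B", "C")
colNames  <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
mat1 <- outer(siteNames, colNames, paste0)
dimnames(mat1) <- list(site = siteNames, occasion = colNames)

arr1 <- array(mat1, c(3, 4, 2))     # sites x visits x years, as above

# Flatten again: 3 rows, and R works out that 8 columns are needed
back <- matrix(arr1, nrow = 3)
dimnames(back) <- dimnames(mat1)    # names have to be re-attached by hand

all(back == mat1)                   # TRUE
```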
A more complicated example
Now an example with four dimensions. Our data set has sites,
visits, and years, as before, and now just 2 species ("a" and "b"). In the
data file, there is a row for each site and each species,
grouped by species.
rowNames <- as.vector(t(outer(c("a", "b"), siteNames, paste0)))
mat2 <- outer(rowNames, colNames, paste0)
dimnames(mat2) <- list(rowNames, colNames)
mat2
1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4
aA "aA1.1" "aA1.2" "aA1.3" "aA1.4" "aA2.1" "aA2.2" "aA2.3" "aA2.4"
aB "aB1.1" "aB1.2" "aB1.3" "aB1.4" "aB2.1" "aB2.2" "aB2.3" "aB2.4"
aC "aC1.1" "aC1.2" "aC1.3" "aC1.4" "aC2.1" "aC2.2" "aC2.3" "aC2.4"
bA "bA1.1" "bA1.2" "bA1.3" "bA1.4" "bA2.1" "bA2.2" "bA2.3" "bA2.4"
bB "bB1.1" "bB1.2" "bB1.3" "bB1.4" "bB2.1" "bB2.2" "bB2.3" "bB2.4"
bC "bC1.1" "bC1.2" "bC1.3" "bC1.4" "bC2.1" "bC2.2" "bC2.3" "bC2.4"
as.vector(mat2)
[1] "aA1.1" "aB1.1" "aC1.1" "bA1.1" "bB1.1" "bC1.1" "aA1.2" "aB1.2" "aC1.2" "bA1.2" "bB1.2"
[12] "bC1.2" "aA1.3" "aB1.3" "aC1.3" "bA1.3" "bB1.3" "bC1.3" "aA1.4" "aB1.4" "aC1.4" "bA1.4"
[23] "bB1.4" "bC1.4" "aA2.1" "aB2.1" "aC2.1" "bA2.1" "bB2.1" "bC2.1" "aA2.2" "aB2.2" "aC2.2"
[34] "bA2.2" "bB2.2" "bC2.2" "aA2.3" "aB2.3" "aC2.3" "bA2.3" "bB2.3" "bC2.3" "aA2.4" "aB2.4"
[45] "aC2.4" "bA2.4" "bB2.4" "bC2.4"
I've also displayed the sequence of values that we will have
to organise into an array. This won't work as nicely as last
time, and we have no choice about the structure of the initial
array; we'll change it later.
The first 3 values are the 3 sites with species "a" and the
first visit in the first year; these will go in column 1, with a
row for each site. The next 3 are the same 3 sites with species
"b", and they go into the 2nd column, so species are in
columns. Now we come to the second visit in year 1, and this
should go on the second page; visits will be pages. After 4
pages, we have included all the data for the first year, and we
start a new book for the second year.
So the data will be read into a 4D array with 3 rows (sites)
x 2 columns (species) x 4 pages (visits) x 2 books (years).
As a rule of thumb, see which dimensions move fastest
(compare with a clock with second, minute and hour hands): the
sites "rotate" once for each species, the set of species rotates
once for each visit, the set of visits rotates once for each
year.
Or think of it as nesting: for the rows the sites are nested
within species, and in the columns visits are nested within
years. As usual in R, rows come first, so again it's
sites, then species, then visits, then years.
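One way to check the "fastest first" ordering before building the array is expand.grid(), which also varies its first argument fastest. This is only a sketch: the names `idx` and `labels` are mine, and mat2 is rebuilt so that the block runs on its own.

```r
siteNames <- c("A", "B", "C")
colNames  <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
rowNames  <- as.vector(t(outer(c("a", "b"), siteNames, paste0)))
mat2 <- outer(rowNames, colNames, paste0)

# List the dimensions in the order we believe they rotate, fastest first
idx <- expand.grid(site = siteNames,
                   species = c("a", "b"),
                   visit = 1:4,
                   year = 1:2,
                   stringsAsFactors = FALSE)

# Rebuild the label that each combination should carry
labels <- with(idx, paste0(species, site, year, ".", visit))
head(labels, 7)

# If the ordering is right, this matches the stored sequence exactly
identical(labels, as.vector(mat2))
```

If the comparison fails, the order of the arguments to expand.grid() is not the order in which the data are stored, and the same order would be wrong for the dim argument of array().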
This is how we do it:
arr2 <- array(mat2, c(3, 2, 4, 2))
dimnames(arr2) <- list(site = siteNames,
                       species = c("a", "b"),
                       visit = paste0("v", 1:4),
                       year = paste0("y", 1:2))
arr2
, , visit = v1, year = y1
species
site a b
A "aA1.1" "bA1.1"
B "aB1.1" "bB1.1"
C "aC1.1" "bC1.1"
, , visit = v2, year = y1
species
site a b
A "aA1.2" "bA1.2"
B "aB1.2" "bB1.2"
C "aC1.2" "bC1.2"
, , visit = v3, year = y1
species
site a b
A "aA1.3" "bA1.3"
B "aB1.3" "bB1.3"
C "aC1.3" "bC1.3"
, , visit = v4, year = y1
species
site a b
A "aA1.4" "bA1.4"
B "aB1.4" "bB1.4"
C "aC1.4" "bC1.4"
, , visit = v1, year = y2
species
site a b
A "aA2.1" "bA2.1"
B "aB2.1" "bB2.1"
C "aC2.1" "bC2.1"
, , visit = v2, year = y2
species
site a b
A "aA2.2" "bA2.2"
B "aB2.2" "bB2.2"
C "aC2.2" "bC2.2"
, , visit = v3, year = y2
species
site a b
A "aA2.3" "bA2.3"
B "aB2.3" "bB2.3"
C "aC2.3" "bC2.3"
, , visit = v4, year = y2
species
site a b
A "aA2.4" "bA2.4"
B "aB2.4" "bB2.4"
C "aC2.4" "bC2.4"
Well, that's very nice, but it isn't the format we want:
we'll use the aperm() function to permute the dimensions.
The order of the dimensions in the current array, arr2,
is (1) sites, (2) species, (3) visits, (4) years, and we
want sites (currently #1) x visits (#3) x years (#4) x species
(#2). So we need to
enter perm = c(1, 3, 4, 2):
( arr3 <- aperm(arr2, c(1, 3, 4, 2)) )
, , year = y1, species = a
visit
site v1 v2 v3 v4
A "aA1.1" "aA1.2" "aA1.3" "aA1.4"
B "aB1.1" "aB1.2" "aB1.3" "aB1.4"
C "aC1.1" "aC1.2" "aC1.3" "aC1.4"
, , year = y2, species = a
visit
site v1 v2 v3 v4
A "aA2.1" "aA2.2" "aA2.3" "aA2.4"
B "aB2.1" "aB2.2" "aB2.3" "aB2.4"
C "aC2.1" "aC2.2" "aC2.3" "aC2.4"
, , year = y1, species = b
visit
site v1 v2 v3 v4
A "bA1.1" "bA1.2" "bA1.3" "bA1.4"
B "bB1.1" "bB1.2" "bB1.3" "bB1.4"
C "bC1.1" "bC1.2" "bC1.3" "bC1.4"
, , year = y2, species = b
visit
site v1 v2 v3 v4
A "bA2.1" "bA2.2" "bA2.3" "bA2.4"
B "bB2.1" "bB2.2" "bB2.3" "bB2.4"
C "bC2.1" "bC2.2" "bC2.3" "bC2.4"
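Named dimensions make it easy to check that a permutation did what we intended: the same cell can be addressed by name in both layouts. The sketch below rebuilds arr2 and arr3 so that it runs on its own. It also uses the fact that, when the dimnames are named, aperm() accepts a character vector of those names for perm; `arr3b` is just my name for that second version.

```r
siteNames <- c("A", "B", "C")
colNames  <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
rowNames  <- as.vector(t(outer(c("a", "b"), siteNames, paste0)))
mat2 <- outer(rowNames, colNames, paste0)
arr2 <- array(mat2, c(3, 2, 4, 2))
dimnames(arr2) <- list(site = siteNames,
                       species = c("a", "b"),
                       visit = paste0("v", 1:4),
                       year = paste0("y", 1:2))
arr3 <- aperm(arr2, c(1, 3, 4, 2))

dim(arr3)                      # 3 4 2 2
names(dimnames(arr3))          # "site" "visit" "year" "species"

# The same cell, addressed by name in each layout
arr2["B", "b", "v3", "y2"]     # "bB2.3"
arr3["B", "v3", "y2", "b"]     # "bB2.3"

# Giving perm by name avoids counting dimension numbers
arr3b <- aperm(arr2, c("site", "visit", "year", "species"))
identical(arr3, arr3b)
```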
An example for you...
If you want to experiment more with this, try turning the
arr3 array back into a 2D matrix, but this time with the rows
grouped by site (instead of by species) and the occasions grouped
by visit (instead of by year). The result should look something
like this:
"aA1.1" "aA2.1" "aA1.2" "aA2.2" "aA1.3" "aA2.3" "aA1.4" "aA2.4"
"bA1.1" "bA2.1" "bA1.2" "bA2.2" "bA1.3" "bA2.3" "bA1.4" "bA2.4"
"aB1.1" "aB2.1" "aB1.2" "aB2.2" "aB1.3" "aB2.3" "aB1.4" "aB2.4"
"bB1.1" "bB2.1" "bB1.2" "bB2.2" "bB1.3" "bB2.3" "bB1.4" "bB2.4"
"aC1.1" "aC2.1" "aC1.2" "aC2.2" "aC1.3" "aC2.3" "aC1.4" "aC2.4"
"bC1.1" "bC2.1" "bC1.2" "bC2.2" "bC1.3" "bC2.3" "bC1.4" "bC2.4"
Hint: Use aperm() to get an array with the
values in the right rows/columns/pages/books, then change the
dimensions with array() or matrix().
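If you get stuck, here is one possible route (a sketch, not the only answer). The reasoning is the "fastest first" rule again: in the target matrix species changes fastest down the rows, then site; across the columns year changes fastest, then visit. The names `tmp` and `mat3` are mine, and arr3 is rebuilt so that the block runs on its own.

```r
siteNames <- c("A", "B", "C")
colNames  <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
rowNames  <- as.vector(t(outer(c("a", "b"), siteNames, paste0)))
mat2 <- outer(rowNames, colNames, paste0)
arr2 <- array(mat2, c(3, 2, 4, 2))
arr3 <- aperm(arr2, c(1, 3, 4, 2))     # site x visit x year x species

# Reorder to species (#4) x site (#1) x year (#3) x visit (#2)
tmp <- aperm(arr3, c(4, 1, 3, 2))

# Cut the sequence into 6 rows (species within site)
# and 8 columns (year within visit)
mat3 <- matrix(tmp, nrow = 6, ncol = 8)

# Optional: names built in the same "fastest first" order
dimnames(mat3) <- list(as.vector(outer(c("a", "b"), siteNames, paste0)),
                       as.vector(outer(1:2, 1:4, paste, sep = ".")))
mat3
```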
