Unfortunately, the way tree species are coded in forest data varies widely among research institutions, forest administrations, and the like. To make the package ForestElementsR broadly applicable, it requires a generic coding system that can cover any specific species coding system and allows translating from one into the other. Contrary to what one might expect, this is not a trivial task, as most existing codings include codes not only for single species but also for species groups. These groups are rarely the same across different codings, which causes issues that a useful generic coding system must handle. In addition, such a generic approach must be open to the inclusion of any desired additional species and specific codings.
Before I show how to actually work with species codings in ForestElementsR, I will explain where to find all implemented codings in the package. For the code examples below to work, you will need to attach ForestElementsR itself and the packages tibble, dplyr, and ggplot2 from the tidyverse, which make handling and output more convenient.
library(ForestElementsR)
library(tibble)
library(dplyr)
library(ggplot2)
The data.frame (actually a tibble) species_master_table is the most important part of the generic species coding system. Any single species to be included in any specific coding must be listed here without exception, as the species master table serves as the common reference for all implemented codings. Conversely, specific species codings do not need to comprise all species provided in the species master table. To view this table, simply type its name:
species_master_table
#> # A tibble: 101 × 6
#> genus species_no deciduous_conifer name_sci name_eng name_ger
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 abies 001 conif Abies alba silver fir Tanne
#> 2 abies 002 conif Abies grandis Oregon fir Große K…
#> 3 abies 003 conif Abies balsamea balsam fir Balsamt…
#> 4 abies 004 conif Abies concolor white fir Kolorad…
#> 5 abies 005 conif Abies concolor lowiana Sierra wh… Sierra-…
#> 6 abies 006 conif Abies amabilis red fir Purpurt…
#> 7 abies 007 conif Abies firma Momi fir Momi-Ta…
#> 8 abies 008 conif Abies homolepis Nikko fir Nikko-T…
#> 9 abies 009 conif Abies nordmanniana Nordmann … Nordman…
#> 10 abies 010 conif Abies procera noble fir Edeltan…
#> # ℹ 91 more rows
# Also show the tail of the table
species_master_table |> tail(10)
#> # A tibble: 10 × 6
#> genus species_no deciduous_conifer name_sci name_eng name_ger
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 sorbus 001 decid Sorbus aucuparia rowan Vogelbe…
#> 2 sorbus 002 decid Sorbus torminalis wild service… Elsbeere
#> 3 sorbus 003 decid Sorbus aria common white… Mehlbee…
#> 4 sorbus 004 decid Sorbus intermedia Swedish whit… Schwedi…
#> 5 sorbus 005 decid Sorbus domestica sorb tree Speierl…
#> 6 tilia 001 decid Tilia cordata small-leaved… Winterl…
#> 7 tilia 002 decid Tilia platyphyllos large-leaved… Sommerl…
#> 8 ulmus 001 decid Ulmus glabra Scots elm Bergulme
#> 9 ulmus 002 decid Ulmus minor field elm Feldulme
#> 10 ulmus 003 decid Ulmus laevis European whi… Flatter…
In contrast to specific codings (see below) the species master table must contain single species only, i.e. each row represents a species, never a group of species. Currently, it comprises 101 tree species. Let us have a look at the table’s anatomy:
The key fields of the species master table are genus and species_no. Together, they must be unique. Both are of type character. genus represents a species’ genus name, always in lower case letters, and species_no is always a three-digit number with leading zeroes. This approach was chosen because of a few advantages: While genus names are usually stable, species names may change more often. Therefore, the species inside a genus are identified with a number instead of a name. New species can be added easily without the danger of running out of numbers and thus being forced to break the coding concept. For convenience, the table also contains the column deciduous_conifer, which allows only the two values conif and decid. This column is not part of the actual species key, but it is intended for filtering purposes, and for all relevant forest tree species, the distinction between both groups should be biologically correct or at least practical. The three remaining fields, name_sci, name_eng, and name_ger, contain the scientific, colloquial English, and colloquial German names of all species.
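The key concept can be illustrated with a small base R sketch. It does not use the package itself: master_excerpt is a toy excerpt whose rows are copied from the listings above, and make_species_key is a hypothetical helper introduced only for this illustration. The underscore-joined form it produces is the one the master coding uses for its species_ids.

```r
# Toy excerpt of the species master table (rows copied from the listings above)
master_excerpt <- data.frame(
  genus      = c("abies", "abies", "quercus", "quercus"),
  species_no = c("001", "002", "001", "002"),
  name_sci   = c("Abies alba", "Abies grandis", "Quercus robur", "Quercus petraea")
)

# genus and species_no together must be unique:
# anyDuplicated() returns 0 if no combination occurs twice
key_is_unique <- anyDuplicated(master_excerpt[c("genus", "species_no")]) == 0
key_is_unique

# Hypothetical helper (not part of ForestElementsR): build a key from a genus
# name and an integer; sprintf("%03d", ...) pads to three digits with zeroes
make_species_key <- function(genus, number) {
  paste(tolower(genus), sprintf("%03d", as.integer(number)), sep = "_")
}

make_species_key("Abies", 1)     # "abies_001"
make_species_key("quercus", 12)  # "quercus_012"
```

Because the number is stored as a zero-padded character string, keys sort consistently as text and a genus can hold up to 999 species without changing the format.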
All specific species codings implemented in ForestElementsR are stored in the tibble species_codings:
species_codings
#> # A tibble: 6 × 2
#> species_coding code_table
#> <chr> <named list>
#> 1 master <tibble [101 × 9]>
#> 2 tum_wwk_short <tibble [101 × 9]>
#> 3 tum_wwk_long <tibble [144 × 9]>
#> 4 ger_nfi_2012 <tibble [101 × 9]>
#> 5 bavrn_state <tibble [137 × 9]>
#> 6 bavrn_state_short <tibble [101 × 9]>
Each row in this tibble represents a specific coding: the column species_coding provides the coding’s name, and the column code_table provides a separate tibble that defines the coding and links it to the species master table. Currently, six codings are implemented (master, tum_wwk_short, tum_wwk_long, ger_nfi_2012, bavrn_state, bavrn_state_short). We use the coding tum_wwk_short for explaining the implementation. This species coding is used for many purposes at the Chair of Forest Growth and Yield Science at the Technical University of Munich. It comprises only a small set of the most important tree species in Central Europe, while all other species are attributed to three larger container groups. The coding table could be accessed by ordinary indexing of the tibble species_codings, but it is more convenient to use the function fe_species_get_coding_table, which needs to be called with the name of the desired coding:
fe_species_get_coding_table("tum_wwk_short")
#> # A tibble: 101 × 9
#> species_id genus species_no deciduous_conifer name_sci name_eng name_ger
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 picea 001 conif Picea a… Norway … Fichte
#> 2 2 abies 001 conif Abies a… silver … Tanne
#> 3 3 pinus 001 conif Pinus s… Scots p… Kiefer
#> 4 4 larix 001 conif Larix d… Europea… Europäi…
#> 5 5 fagus 001 decid Fagus s… Europea… Buche
#> 6 6 quercus 001 decid Quercus… peduncu… Eiche (…
#> 7 6 quercus 002 decid Quercus… peduncu… Eiche (…
#> 8 7 pseudotsu… 001 conif Pseudot… Douglas… Douglas…
#> 9 8 acer 001 decid aliae d… other h… Sonstig…
#> 10 8 acer 002 decid aliae d… other h… Sonstig…
#> # ℹ 91 more rows
#> # ℹ 2 more variables: level <int>, is_tree <lgl>
Clearly, this table closely resembles the species master table, as they have the columns genus, species_no, deciduous_conifer, name_sci, name_eng, and name_ger in common. Most importantly, however, there is the additional column species_id. This column contains the actual coding, and it is always of type character, even if the coding consists exclusively of numbers. Such a coding table is not required to comprise all species available in the species master table, but it must not contain any species which is not included there. In other words, a coding table is not allowed to contain any combination of genus and species_no which is not contained in the species master table. The species names, however, may differ from those in the master table in order to allow e.g. for regional colloquial naming preferences or, more importantly, for naming species groups which, by definition, do not exist in the master table.
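The rule that a coding table may only reference keys known to the master table can be sketched in base R as follows. This is an illustration on toy excerpts, not the package's own validation code: orphan_rows is a hypothetical helper, and the row with species_id "11" and species_no "999" is invented solely to trigger the check.

```r
# Toy excerpts; the valid rows are taken from the listings in this vignette
master_excerpt <- data.frame(
  genus      = c("picea", "abies", "quercus", "quercus"),
  species_no = c("001", "001", "001", "002")
)
coding_excerpt <- data.frame(
  species_id = c("1", "2", "6", "6"),
  genus      = c("picea", "abies", "quercus", "quercus"),
  species_no = c("001", "001", "001", "002")
)

# Hypothetical helper (not part of ForestElementsR): return the rows of a
# coding table whose (genus, species_no) key is missing in the master table
orphan_rows <- function(coding, master) {
  coding_key <- paste(coding$genus, coding$species_no, sep = "_")
  master_key <- paste(master$genus, master$species_no, sep = "_")
  coding[!(coding_key %in% master_key), , drop = FALSE]
}

# A valid coding excerpt has no orphan rows
nrow(orphan_rows(coding_excerpt, master_excerpt))

# An invented row with an unknown key is reported
broken <- rbind(
  coding_excerpt,
  data.frame(species_id = "11", genus = "quercus", species_no = "999")
)
orphan_rows(broken, master_excerpt)$species_id
```

With dplyr attached, the same check could be written as an anti_join of the coding table against the master table on genus and species_no.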
Besides species_id, every coding table carries two more columns, level and is_tree. The column level marks how fine or coarse a code is: 0 is the finest level (a single species, or a group that is not contained in any other group of the coding), and higher numbers denote ever coarser groups that nest the finer ones. For a “flat” coding that only distinguishes single species and non-overlapping groups, level is 0 throughout. Codings that additionally provide nesting group codes are called hierarchical; they are explained in Section 2.2.2. The column is_tree is TRUE for all ordinary tree-species codes and FALSE for the rare codes that denote a non-tree category such as a shrub; see Section 2.2.3.
Let us look at the coding in compact form:
fe_species_get_coding_table("tum_wwk_short") |>
select(species_id, name_eng) |> # English names only for clarity
distinct()
#> # A tibble: 10 × 2
#> species_id name_eng
#> <chr> <chr>
#> 1 1 Norway spruce
#> 2 2 silver fir
#> 3 3 Scots pine
#> 4 4 European larch
#> 5 5 European beech
#> 6 6 pedunculate/sessile oak (group)
#> 7 7 Douglas fir
#> 8 8 other hardwood
#> 9 9 soft deciduous wood
#> 10 10 other conifers
As this display readily shows, the coding distinguishes only ten species (groups). The names already suggest which species_ids refer to single species and which to groups, but we should use R to find this out unambiguously:
fe_species_get_coding_table("tum_wwk_short") |>
group_by(species_id, name_eng) |>
summarise(n_species = n()) |>
arrange(as.numeric(species_id)) # not required, but output is nicely sorted
#> `summarise()` has regrouped the output.
#> ℹ Summaries were computed grouped by species_id and name_eng.
#> ℹ Output is grouped by species_id.
#> ℹ Use `summarise(.groups = "drop_last")` to silence this message.
#> ℹ Use `summarise(.by = c(species_id, name_eng))` for per-operation grouping
#> (`?dplyr::dplyr_by`) instead.
#> # A tibble: 10 × 3
#> # Groups: species_id [10]
#> species_id name_eng n_species
#> <chr> <chr> <int>
#> 1 1 Norway spruce 1
#> 2 2 silver fir 1
#> 3 3 Scots pine 1
#> 4 4 European larch 1
#> 5 5 European beech 1
#> 6 6 pedunculate/sessile oak (group) 2
#> 7 7 Douglas fir 1
#> 8 8 other hardwood 43
#> 9 9 soft deciduous wood 10
#> 10 10 other conifers 40
Clearly, every species_id with n_species > 1 actually represents a group of tree species. Let us look at the smallest group (species_id 6), which comprises only two species:
fe_species_get_coding_table("tum_wwk_short") |>
select(species_id, genus, species_no, name_eng) |>
filter(species_id == "6")
#> # A tibble: 2 × 4
#> species_id genus species_no name_eng
#> <chr> <chr> <chr> <chr>
#> 1 6 quercus 001 pedunculate/sessile oak (group)
#> 2 6 quercus 002 pedunculate/sessile oak (group)
We see that the two species in this group are quercus 001 and quercus 002, but the colloquial species name in the coding table is the group name only. In order to find out the species names, we can obtain them from the species master table with the help of genus and species_no:
species_master_table |>
filter(genus == "quercus" & species_no %in% c("001", "002")) |>
select(-deciduous_conifer)
#> # A tibble: 2 × 5
#> genus species_no name_sci name_eng name_ger
#> <chr> <chr> <chr> <chr> <chr>
#> 1 quercus 001 Quercus robur pedunculate oak Stieleiche
#> 2 quercus 002 Quercus petraea sessile oak Traubeneiche
Almost every real-world coding distinguishes not only single species but also species groups. In the simplest case, a coding is a partition: each species belongs to exactly one code, and the codes never overlap. The tum_wwk_short coding used above is such a partition - the group codes 8, 9, and 10 are disjoint, and so are the single-species codes.
Some codings, however, need a species to appear both as its own code and inside a coarser group of the same coding. The Bavarian state coding bavrn_state, for instance, codes the pedunculate oak singly as 54 and the sessile oak singly as 55, but it also keeps the older group code 70 (“oak”) that comprises both. Such codings are called hierarchical. To keep casting between codings well defined, the codes of a coding must form a laminar family: the species sets of any two codes are either disjoint or fully nested - partial overlaps are forbidden. The column level records the nesting depth (0 = finest leaf, higher = coarser group).
We can see this directly in the coding table of bavrn_state. The leaf codes 54 and 55 sit at level 0, while the group code 70 that contains them sits at level 1:
fe_species_get_coding_table("bavrn_state") |>
filter(species_id %in% c("54", "55", "70")) |>
select(species_id, genus, species_no, name_eng, level)
#> # A tibble: 6 × 5
#> species_id genus species_no name_eng level
#> <chr> <chr> <chr> <chr> <int>
#> 1 54 quercus 001 pedunculate oak 0
#> 2 55 quercus 002 sessile oak 0
#> 3 70 quercus 001 oak (group) 1
#> 4 70 quercus 002 oak (group) 1
#> 5 70 quercus 003 oak (group) 1
#> 6 70 quercus 006 oak (group) 1
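The laminar-family condition described above can be checked mechanically. The following base R sketch is only an illustration under stated assumptions: code_sets holds the species sets of the three codes exactly as listed in the output above, is_laminar is a hypothetical helper (not a package function), and each set is assumed to contain no duplicates.

```r
# Species sets per code, copied from the bavrn_state listing above
code_sets <- list(
  "54" = "quercus_001",
  "55" = "quercus_002",
  "70" = c("quercus_001", "quercus_002", "quercus_003", "quercus_006")
)

# Hypothetical helper: TRUE if every pair of sets is disjoint or fully nested
is_laminar <- function(sets) {
  n <- length(sets)
  if (n < 2) return(TRUE)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      a <- sets[[i]]
      b <- sets[[j]]
      shared <- intersect(a, b)
      disjoint <- length(shared) == 0
      # nested: the smaller set lies completely inside the larger one
      nested <- length(shared) == min(length(a), length(b))
      if (!disjoint && !nested) return(FALSE)
    }
  }
  TRUE
}

is_laminar(code_sets)

# An invented code "x" that shares one species with code 70 but also
# reaches outside of it is a partial overlap and violates the condition
bad_sets <- c(code_sets, list("x" = c("quercus_002", "fagus_001")))
is_laminar(bad_sets)
```

The first call yields TRUE and the second FALSE, because "x" and 70 overlap in quercus_002 without one containing the other.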
When a species is cast into a hierarchical coding, it is always resolved to the finest code that represents it. The pedunculate oak (quercus_001 in the master coding) therefore becomes the leaf code 54, not the group code 70:
as_fe_species_bavrn_state(fe_species_master("quercus_001")) |> unclass()
#> [1] "54"
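The idea of resolving a species to the finest code can be sketched in base R. This is a simplified illustration, not the package's casting code: finest_code is a hypothetical helper, and code_sets contains only the three codes shown in the listing above, so the results hold for this excerpt only.

```r
# Species sets of three bavrn_state codes, copied from the listing above
code_sets <- list(
  "54" = "quercus_001",
  "55" = "quercus_002",
  "70" = c("quercus_001", "quercus_002", "quercus_003", "quercus_006")
)

# Hypothetical helper: among all codes whose set contains the species,
# pick the one with the smallest set, i.e. the finest code
finest_code <- function(species, sets) {
  contains <- vapply(sets, function(s) species %in% s, logical(1))
  candidates <- names(sets)[contains]
  if (length(candidates) == 0) return(NA_character_)
  sizes <- lengths(sets[candidates])
  candidates[which.min(sizes)]
}

finest_code("quercus_001", code_sets)  # leaf code "54", not the group "70"
finest_code("quercus_003", code_sets)  # only the group "70" contains it here
finest_code("fagus_001", code_sets)    # NA, not covered by this excerpt
```

In a laminar family the sets containing a given species are nested in each other, so the smallest one is well defined.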
A few codings contain legal codes that do not stand for a tree species and that one cannot compute with - for example the bavrn_state code 99 (“Strauch”, shrub). Such codes are flagged with is_tree = FALSE in the coding table. Whether a code is a tree code is derived from its link to the species master table (a code that resolves to at least one master species is a tree code), so there is no separate flag that could fall out of sync.
Two exported helpers report this information.
fe_species_non_tree_codes() lists the non-tree codes of a
coding, and fe_species_is_tree() tests, element by element,
whether the codes of an fe_species vector denote tree
species:
fe_species_non_tree_codes("bavrn_state")
#> [1] "99"
spec_ids <- fe_species_bavrn_state(c("10", "60", "99"))
fe_species_is_tree(spec_ids)
#> [1] TRUE TRUE FALSE
Constructing a species vector that contains a non-tree code is
allowed (the code is part of the coding), but objects that are meant to
hold computable trees - such as fe_stand() and its
relatives - reject them with a clear error. When a non-tree code is
cast into another coding, it resolves to NA (with
a message), because it has no tree-species equivalent.
Six species codings are currently implemented. While their
documentation is available in the package, and can be accessed with
?species_codings, I list them also here:
master: This is the original species coding used by the package ForestElementsR. It contains each species from the species_master_table and no species groups. This coding corresponds directly to the species_master_table. Its species_ids are the master table’s columns genus and species_no combined into one character string, separated by an underscore.
tum_wwk_short: This is one of two codings in use at the Chair of Forest Growth and Yield Science at the Technical University of Munich. It defines only a small set of single species explicitly (the most important ones in Central Europe), while all other species are attributed to a few large container groups.
tum_wwk_long: This is one of two codings in use at the Chair of Forest Growth and Yield Science at the Technical University of Munich. It defines a larger set of single species than the tum_wwk_short coding. This coding is hierarchical (see Section 2.2.2): besides the single-species (leaf) codes it also provides the coarser species-group codes that contain them, and it covers every species of the master table.
bavrn_state: This species coding is the coding used by the Bavarian State Forest Service. It is hierarchical (e.g. the single oak codes 54 and 55 nest in the group code 70), and it contains one non-tree code (99, “Strauch”/shrub; see Section 2.2.3).
bavrn_state_short: This coding combines the species of bavrn_state into larger groups. These groups are typically used by the Bavarian State Forest Service in aggregated evaluations.
ger_nfi_2012: The ger_nfi_2012 species coding is the species coding used by the German National Forest Inventory of 2012 (Riedel et al. 2017).
The full coding table returned by
fe_species_get_coding_table() carries one row per
elementary species. For a hierarchical coding this means a species
can occur several times (once as a leaf code and once in each group that
contains it), and a group code spreads over as many rows as it has
member species. That is exactly what the casting machinery needs, but it
is awkward as a printed lookup key for field work.
For that purpose there is fe_species_get_field_table().
It returns the code-level view: each code exactly
once, together with the name of the species or group it stands
for. The names are taken from the coding itself (not from the master
table), so group names appear as such, and all three name columns are
always included regardless of the fe_spec_lang option. The
rows are in the coding’s canonical order (leaf codes first, then the
coarser groups), and the level and is_tree columns are
kept, as both matter in the field:
fe_species_get_field_table("tum_wwk_short")
#> # A tibble: 10 × 6
#> species_id name_sci name_eng name_ger level is_tree
#> <chr> <chr> <chr> <chr> <int> <lgl>
#> 1 1 Picea abies Norway spruce Fichte 0 TRUE
#> 2 2 Abies alba silver fir Tanne 0 TRUE
#> 3 3 Pinus sylvestris Scots pine Kiefer 0 TRUE
#> 4 4 Larix decidua European larch Europäi… 0 TRUE
#> 5 5 Fagus sylvatica European beech Buche 0 TRUE
#> 6 6 Quercus robur/petraea pedunculate/ses… Eiche (… 0 TRUE
#> 7 7 Pseudotsuga menziesii Douglas fir Douglas… 0 TRUE
#> 8 8 aliae deciduae other hardwood Sonstig… 0 TRUE
#> 9 9 aliae deciduae molli ligno soft deciduous … Weichla… 0 TRUE
#> 10 10 alia conifera other conifers Sonstig… 0 TRUE
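How a code-level view relates to the full coding table can be sketched in base R on a toy excerpt. The rows of full_table are taken from the bavrn_state listing shown earlier (deliberately in shuffled order); the steps are an assumption about the general idea and not necessarily how fe_species_get_field_table() is implemented.

```r
# Toy excerpt of a hierarchical coding table: one row per elementary species
full_table <- data.frame(
  species_id = c("70", "70", "54", "55"),
  genus      = c("quercus", "quercus", "quercus", "quercus"),
  species_no = c("001", "002", "001", "002"),
  name_eng   = c("oak (group)", "oak (group)", "pedunculate oak", "sessile oak"),
  level      = c(1L, 1L, 0L, 0L)
)

# Drop the species key columns and keep each code exactly once
code_view <- unique(full_table[c("species_id", "name_eng", "level")])

# Leaf codes (level 0) first, then the coarser groups
code_view <- code_view[order(code_view$level), ]
rownames(code_view) <- NULL

code_view$species_id  # "54" "55" "70"
```

The group code 70 occupies two rows in the excerpt of the full table but only one row in the code-level view.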
Rendering such a table into a nicely formatted, printable document (e.g. a PDF) is deliberately left to the downstream packages that already carry a document-rendering toolchain; ForestElementsR itself only provides the data.
Species codes as implemented in this package are vectors with a few special properties. Most users of the package will work with species codes as columns in a data.frame (or tibble), where they are provided in parallel with other columns (i.e. vectors) that contain other tree information, e.g. tree diameters, heights, or spatial coordinates. For the sake of clarity, however, we demonstrate most applications for isolated vectors of species codes.
For each implemented species coding there exists a user friendly function for constructing a vector of species. The naming convention for this function is fe_species_coding_name, whereby coding_name is the name of the desired coding as in the column species_coding in the tibble species_codings (see above). Thus, e.g. for creating a vector of tum_wwk_short or ger_nfi_2012 codes, one would use the functions fe_species_tum_wwk_short or fe_species_ger_nfi_2012, respectively. As their input, these functions require a vector of codes either in numeric or character format:
spec_ids_1 <- fe_species_tum_wwk_short(c(1, 1, 1, 5, 5, 5, 5, 3, 3, 8, 9, 8))
spec_ids_2 <- fe_species_ger_nfi_2012(
c(10, 10, 10, 100, 100, 100, 100, 20, 20, 190, 290, 190)
)
spec_ids_1
#> <fe_species_tum_wwk_short[12]>
#> [1] 1 1 1 5 5 5 5 3 3 8 9 8
spec_ids_2
#> <fe_species_ger_nfi_2012[12]>
#> [1] 10 10 10 100 100 100 100 20 20 190 290 190
If the input vector contains codes which are not supported by the chosen coding, the attempt terminates with an error:
fe_species_tum_wwk_short(c(1, 321, 1, 9999))
#> Error:
#> ! Code(s) 321, 9999 is/are not supported by species coding 'tum_wwk_short'
fe_species_ger_nfi_2012(c("100", "290", "Peter", "Paul", "Mary"))
#> Error:
#> ! Code(s) Peter, Paul, Mary is/are not supported by species coding 'ger_nfi_2012'
For each implemented coding there exists a function is_fe_species_coding_name for checking whether an object is a vector of species codes of the requested class:
spec_ids <- c(1:10)
is_fe_species_tum_wwk_short(spec_ids)
#> [1] FALSE
spec_ids <- fe_species_tum_wwk_short(c(1:10))
is_fe_species_tum_wwk_short(spec_ids)
#> [1] TRUE
is_fe_species_bavrn_state(spec_ids)
#> [1] FALSE
NA values are in principle allowed in species code vectors. There may be, however, objects (like fe_stand, covered in a separate vignette) which enforce species code vectors without NAs.
By default, species code vectors are displayed “as they are”, i.e. what we see are the original codes as in the column species_id in the corresponding coding’s table (see above). Sometimes, e.g. for creating output for third parties, the actual species names are preferable. The most convenient way to achieve that is to set the option fe_spec_lang, which can take the values sci, eng, ger, and code. Let’s create four species code vectors:
spec_ids_1 <- fe_species_tum_wwk_short(c(1, 1, 5, 5, 5, 5, 3, 3))
spec_ids_2 <- fe_species_ger_nfi_2012(c(100, 100, 20, 20, 30, 110))
spec_ids_3 <- fe_species_bavrn_state(c(60, 60, 30, 30, 84, 86))
spec_ids_4 <- fe_species_master(c("abies_001", "tilia_002", "ulmus_001"))
The default display is:
#> <fe_species_tum_wwk_short[8]>
#> [1] 1 1 5 5 5 5 3 3
#> <fe_species_ger_nfi_2012[6]>
#> [1] 100 100 20 20 30 110
#> <fe_species_bavrn_state[6]>
#> [1] 60 60 30 30 84 86
#> <fe_species_master[3]>
#> [1] abies_001 tilia_002 ulmus_001
With the option fe_spec_lang set to “sci”, the scientific species names are displayed:
options(fe_spec_lang = "sci") # Display scientific species names
spec_ids_1
#> <fe_species_tum_wwk_short[8]>
#> [1] Picea abies Picea abies Fagus sylvatica Fagus sylvatica
#> [5] Fagus sylvatica Fagus sylvatica Pinus sylvestris Pinus sylvestris
spec_ids_2
#> <fe_species_ger_nfi_2012[6]>
#> [1] Fagus sylvatica Fagus sylvatica Pinus sylvestris Pinus sylvestris
#> [5] Abies alba Quercus robur
spec_ids_3
#> <fe_species_bavrn_state[6]>
#> [1] Fagus sylvatica Fagus sylvatica Abies alba Abies alba
#> [5] Salix spec. Alnus glutinosa
spec_ids_4
#> <fe_species_master[3]>
#> [1] Abies alba Tilia platyphyllos Ulmus glabra
For printing the colloquial English species names, the option “eng” is the choice:
options(fe_spec_lang = "eng") # Display English species names
spec_ids_1
#> <fe_species_tum_wwk_short[8]>
#> [1] Norway spruce Norway spruce European beech European beech European beech
#> [6] European beech Scots pine Scots pine
spec_ids_2
#> <fe_species_ger_nfi_2012[6]>
#> [1] European beech European beech Scots pine Scots pine
#> [5] silver fir pedunculate oak
spec_ids_3
#> <fe_species_bavrn_state[6]>
#> [1] European beech European beech silver fir silver fir willow (group)
#> [6] Black alder
spec_ids_4
#> <fe_species_master[3]>
#> [1] silver fir large-leaved lime Scots elm
In the same way, you can use options(fe_spec_lang = "ger") to have the German species names displayed. With options(fe_spec_lang = "code") or options(fe_spec_lang = NULL), you return to the default display of the original codes. If you do not want to work with such options and just want a quick check of the species names corresponding to given codes, you can use the function format. It takes the species code vector to be displayed and the argument spec_lang, which can be “sci”, “eng”, “ger”, or “code” with exactly the same meanings as explained above. The output of format is never an fe_species coding object, but always a character vector (which is useful for some purposes):
format(spec_ids_1, spec_lang = "eng")
#> [1] "Norway spruce" "Norway spruce" "European beech" "European beech"
#> [5] "European beech" "European beech" "Scots pine" "Scots pine"
format(spec_ids_2, spec_lang = "sci")
#> [1] "Fagus sylvatica" "Fagus sylvatica" "Pinus sylvestris" "Pinus sylvestris"
#> [5] "Abies alba" "Quercus robur"
format(spec_ids_3, spec_lang = "code")
#> [1] "60" "60" "30" "30" "84" "86"
format(spec_ids_4, spec_lang = "ger")
#> [1] "Tanne" "Sommerlinde" "Bergulme"
Note that the names for display are always taken from the specific coding’s table, not from the species master table. Be also aware that such species names are not the codes themselves. This means, you cannot generate a species code vector from a vector of species names:
spec_names <- c("Abies alba", "Picea abies")
fe_species_ger_nfi_2012(spec_names)
#> Error:
#> ! Code(s) Abies alba, Picea abies is/are not supported by species coding 'ger_nfi_2012'
When assigning new values to elements of a species coding vector, the safest way to do so is to provide the new values as an instance of the same class. But with all other values, an attempt will be made to convert them into an instance of the goal class. If this is not possible, the assignment does not take place, and an error is thrown.
spec_vec <- fe_species_bavrn_state(c("10", "10", "10", "50", "50", "50"))
format(spec_vec, "eng")
#> [1] "Norway spruce" "Norway spruce" "Norway spruce" "Douglas fir"
#> [5] "Douglas fir" "Douglas fir"
# Safest way, same class on both sides of the '<-'
spec_vec[3] <- fe_species_bavrn_state("40")
is_fe_species_bavrn_state(spec_vec)
#> [1] TRUE
format(spec_vec, "eng")
#> [1] "Norway spruce" "Norway spruce" "European larch" "Douglas fir"
#> [5] "Douglas fir" "Douglas fir"
# Character vector is converted
spec_vec[3:4] <- c("40", "70")
is_fe_species_bavrn_state(spec_vec)
#> [1] TRUE
format(spec_vec, "eng")
#> [1] "Norway spruce" "Norway spruce" "European larch" "oak (group)"
#> [5] "Douglas fir" "Douglas fir"
# Numerical vector is converted
spec_vec[3:4] <- c(60, 87)
is_fe_species_bavrn_state(spec_vec)
#> [1] TRUE
format(spec_vec, "eng")
#> [1] "Norway spruce" "Norway spruce" "European beech" "noble hardwood"
#> [5] "Douglas fir" "Douglas fir"
# Species code not supported by goal coding - no assignment and error
spec_vec[1:2] <- c("3333", "12")
#> Error:
#> ! Code(s) 3333 is/are not supported by species coding 'bavrn_state'
is_fe_species_bavrn_state(spec_vec)
#> [1] TRUE
format(spec_vec, "eng")
#> [1] "Norway spruce" "Norway spruce" "European beech" "noble hardwood"
#> [5] "Douglas fir" "Douglas fir"
# Vectors of other species codings are converted, if possible
spec_vec[5:6] <- fe_species_tum_wwk_short(c("3", "3")) # "3" Scots pine in rhs
# coding
is_fe_species_bavrn_state(spec_vec)
#> [1] TRUE
format(spec_vec, "code") # "3" becomes "20" ...
#> [1] "10" "10" "60" "87" "20" "20"
format(spec_vec, "eng") # ... which is Scots pine in the goal coding
#> [1] "Norway spruce" "Norway spruce" "European beech" "noble hardwood"
#> [5] "Scots pine" "Scots pine"
For each implemented species coding there is a function as_fe_species_coding_name which tries to convert an object of any other species coding implemented in ForestElementsR into an instance of the goal coding. You can also use it for converting numeric or character vectors (as an alternative to fe_species_coding_name), but the interesting feature is the conversion between different codings:
spec_ids <- as_fe_species_tum_wwk_short(c("1", "3", "5"))
as_fe_species_ger_nfi_2012(spec_ids) |> format("eng")
#> [1] "Norway spruce" "Scots pine" "European beech"
When the initial species code vector contains codes which belong to
the same species group in the goal coding, information is lost when
doing the conversion. This is a backward ambiguous cast. In
such a case, the conversion is executed, but a message is
issued. (In earlier versions of the package this was a warning;
it was downgraded to a message because such information loss is the
normal, intended outcome of aggregating into coarser groups, and a
warning forced users to wrap every deliberate aggregation in
suppressWarnings().)
spec_ids_1 <- as_fe_species_ger_nfi_2012(c("170", "150", "140"))
spec_ids_1 |> format("eng")
#> [1] "elm (spec.)" "lime (spec.)" "sycamore maple"
# Backward ambiguous cast (possible, but with information loss)
spec_ids_2 <- as_fe_species_tum_wwk_short(spec_ids_1)
#> Cast loses information. Goal code(s) 8 correspond to 3 original code(s).
spec_ids_2 |> format("eng")
#> [1] "other hardwood" "other hardwood" "other hardwood"
Conversely, when casting into a hierarchical coding (one that offers both single-species and group codes, see Section 2.2.2), each species is resolved to the finest code available for it - the single-species code if there is one, the smallest containing group otherwise. This happens automatically and without information loss:
# Pedunculate and sessile oak resolve to the single codes 54 and 55,
# not to the group code 70
as_fe_species_bavrn_state(fe_species_master(c("quercus_001", "quercus_002"))) |>
format("code")
#> [1] "54" "55"
Conversions with no match in the goal coding terminate with an error:
spec_ids <- as_fe_species_bavrn_state(c("11", "11", "11"))
spec_ids |> format("eng")
#> [1] "Serbian spruce" "Serbian spruce" "Serbian spruce"
# No Serbian spruce in the tum_wwk_long coding
spec_ids |> as_fe_species_tum_wwk_long()
#> <fe_species_tum_wwk_long[3]>
#> Error in `match.arg()`:
#> ! 'arg' must be NULL or a character vector
Forward ambiguous casts occur when one code in the initial code vector has several matches in the goal coding. If this is the case, execution terminates, and an error is thrown:
# Each of these codes comprises many single species
spec_ids <- fe_species_tum_wwk_short(c("8", "9", "10"))
spec_ids |> format("eng")
#> [1] "other hardwood" "soft deciduous wood" "other conifers"
# Conversion attempt terminates with error
spec_ids |> as_fe_species_ger_nfi_2012()
#> Error:
#> ! Ambiguous cast attempt. Original code(s) 10, 8, 9 correspond(s) to 12, 23, 9 goal code(s).
# Similar
as_fe_species_master(fe_species_ger_nfi_2012("90"))
#> Error:
#> ! Ambiguous cast attempt. Original code(s) 90 correspond(s) to 10 goal code(s).
There is one controlled exception to the forward-ambiguous error. A
few source group codes genuinely straddle two groups of a goal coding,
so there is no single matching target node, yet a sensible aggregate
exists. For these cases the package ships a small, documented table,
species_cast_overrides, that declares the deliberate target
code. When such an override applies, the cast is carried out (lossily,
with a message) instead of raising an error:
species_cast_overrides
#> # A tibble: 3 × 4
#> coding_from coding_to species_id_from species_id_to
#> <chr> <chr> <chr> <chr>
#> 1 ger_nfi_2012 tum_wwk_short 290 8
#> 2 bavrn_state tum_wwk_short 70 6
#> 3 bavrn_state tum_wwk_short 80 8
# ger_nfi_2012 code 290 has no single match in tum_wwk_short, but the
# override resolves it to code 8
as_fe_species_tum_wwk_short(fe_species_ger_nfi_2012("290")) |> format("code")
#> Applied cast override(s) ger_nfi_2012 -> tum_wwk_short: 290 -> 8
#> [1] "8"
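Conceptually, applying an override is a plain table lookup. The base R sketch below rebuilds the three rows printed above as a toy data frame and queries them with lookup_override, a hypothetical helper that is not part of ForestElementsR; how the package consults species_cast_overrides internally is not shown in this vignette.

```r
# Toy copy of the override table printed above
overrides <- data.frame(
  coding_from     = c("ger_nfi_2012", "bavrn_state", "bavrn_state"),
  coding_to       = c("tum_wwk_short", "tum_wwk_short", "tum_wwk_short"),
  species_id_from = c("290", "70", "80"),
  species_id_to   = c("8", "6", "8")
)

# Hypothetical helper: return the declared target code,
# or NA if no override exists for this combination
lookup_override <- function(from, to, code, table = overrides) {
  hit <- table$coding_from == from &
    table$coding_to == to &
    table$species_id_from == code
  if (any(hit)) table$species_id_to[hit][1] else NA_character_
}

lookup_override("ger_nfi_2012", "tum_wwk_short", "290")  # "8"
lookup_override("bavrn_state", "tum_wwk_short", "70")    # "6"
lookup_override("ger_nfi_2012", "tum_wwk_short", "100")  # NA, no override
```

Keeping the exceptions in a small table rather than in code makes every deliberate lossy cast visible and documented in one place.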
Finally, a non-tree code (see Section 2.2.3) has no tree-species equivalent
in any goal coding, so it is resolved to NA (again with a
message) rather than treated as a failed match:
as_fe_species_tum_wwk_short(fe_species_bavrn_state(c("10", "99"))) |>
unclass()
#> Non-tree code(s) 99 have no equivalent in coding 'tum_wwk_short' and become NA.
#> [1] "1" NA
Note that the feasibility of a species coding cast is checked for each single conversion attempt, because it depends on the specific species codes to be converted. That is, some conversions between the same two codings will work well while others fail:
# Conversion from tum_wwk_short to ger_nfi_2012 - works
spec_ids_1 <- fe_species_tum_wwk_short(c("1", "3", "5"))
spec_ids_1 |> format("eng")
#> [1] "Norway spruce" "Scots pine" "European beech"
spec_ids_2 <- as_fe_species_ger_nfi_2012(spec_ids_1)
spec_ids_2 |> format("eng")
#> [1] "Norway spruce" "Scots pine" "European beech"
# Conversion from tum_wwk_short to ger_nfi_2012 - fails
spec_ids_1 <- fe_species_tum_wwk_short(c("8", "9", "10"))
spec_ids_1 |> format("eng")
#> [1] "other hardwood" "soft deciduous wood" "other conifers"
spec_ids_2 <- as_fe_species_ger_nfi_2012(spec_ids_1)
#> Error:
#> ! Ambiguous cast attempt. Original code(s) 10, 8, 9 correspond(s) to 12, 23, 9 goal code(s).
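Why the outcome depends on the individual code can be illustrated with a simplified base R sketch. It makes several assumptions that should be stated loudly: the goal coding is flat (a partition, so no hierarchical resolution is needed), overrides and non-tree codes are ignored, the goal codes "P", "Qr", and "Qp" are invented, and classify_cast is a hypothetical helper rather than the package's casting machinery.

```r
# Source codes with their species sets (tum_wwk_short codes 1 and 6 as listed
# earlier in this vignette)
source_sets <- list(
  "1" = "picea_001",
  "6" = c("quercus_001", "quercus_002")
)

# Invented flat goal coding: each master species maps to exactly one goal code
goal_of_species <- c(
  picea_001   = "P",
  picea_002   = "P",
  quercus_001 = "Qr",
  quercus_002 = "Qp"
)

# Hypothetical helper: classify the cast of one source code
classify_cast <- function(code, sets, goal_map) {
  goal_codes <- unique(unname(goal_map[sets[[code]]]))
  goal_codes <- goal_codes[!is.na(goal_codes)]
  if (length(goal_codes) == 0) return("no match")
  if (length(goal_codes) > 1) return("forward ambiguous")
  "unique target"
}

result <- vapply(
  names(source_sets), classify_cast, character(1),
  sets = source_sets, goal_map = goal_of_species
)
result
```

In this toy setting, code 1 has a unique target, while code 6 is forward ambiguous because its two species fall into two different goal codes.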
In some cases one might want to extract the character vector of species codes out of an fe_species_coding_name vector. This is possible either with unclass or with vctrs::vec_data (the species codings are implemented based on the package vctrs).
spec_ids <- fe_species_ger_nfi_2012(c("10", "10", "100", "170"))
spec_ids
#> <fe_species_ger_nfi_2012[4]>
#> [1] 10 10 100 170
chars_1 <- unclass(spec_ids)
chars_1
#> [1] "10" "10" "100" "170"
chars_2 <- vctrs::vec_data(spec_ids)
chars_2
#> [1] "10" "10" "100" "170"
is_fe_species_ger_nfi_2012(chars_1)
#> [1] FALSE
is_fe_species_ger_nfi_2012(chars_2)
#> [1] FALSE
is.character(chars_1)
#> [1] TRUE
is.character(chars_2)
#> [1] TRUE
As mentioned above, species codes typically do not come as isolated vectors, but as columns in a data frame (tibble). We isolate one such data frame from the fe_stand object selection_forest_1_fe_stand, which is among the example data that come with the package ForestElementsR:
dat <- selection_forest_1_fe_stand$trees |> select(
tree_id, species_id, time_yr, dbh_cm, height_m
)
dat
#> # A tibble: 283 × 5
#> tree_id species_id time_yr dbh_cm height_m
#> <chr> <tm_wwk_shrt> <dbl> <dbl> <dbl>
#> 1 1 1 2022 9.6 9.5
#> 2 2 1 2022 9.1 8.5
#> 3 3 1 2022 11 10.9
#> 4 4 1 2022 24.9 23
#> 5 5 1 2022 20.9 19.4
#> 6 6 1 2022 8.2 8.3
#> 7 7 1 2022 22.6 19.8
#> 8 8 1 2022 18.8 19
#> 9 9 1 2022 27.8 26.2
#> 10 10 1 2022 26.8 25.8
#> # ℹ 273 more rows
Here, each row represents one tree, the column species_id holds the species codes, and the other columns hold additional key fields (tree_id, time_yr) and tree data (dbh_cm, height_m). When the package tidyverse or tibble is attached, the tibble is displayed as shown above, and the abbreviation tm_wwk_shrt indicates that the coding is tum_wwk_short. Because only the first ten rows are shown by default, we see only the species code “1”. To find out whether there are more species, we can use the function summary:
dat |> summary()
#> tree_id species_id time_yr dbh_cm height_m
#> Length :283 1:130 Min. :2022 Min. : 7.00 Min. : 7.20
#> N.unique :283 2: 98 1st Qu.:2022 1st Qu.: 9.90 1st Qu.:11.65
#> N.blank : 0 5: 42 Median :2022 Median :17.50 Median :19.40
#> Min.nchar: 1 8: 13 Mean :2022 Mean :21.01 Mean :19.83
#> Max.nchar: 3 3rd Qu.:2022 3rd Qu.:28.05 3rd Qu.:26.95
#> Max. :2022 Max. :73.40 Max. :39.30
Very similar to a summary for a factor, the summary for the column species_id provides the row counts for each of the four coded species. In order to display species names instead of the codes, we have to set the option fe_spec_lang (see also above):
# Set option to display colloquial English species names, and store the
# previous setting in opt_prev
opt_prev <- getOption("fe_spec_lang")
options(fe_spec_lang = "eng")
# Display dat
dat
#> # A tibble: 283 × 5
#> tree_id species_id time_yr dbh_cm height_m
#> <chr> <tm_wwk_shrt> <dbl> <dbl> <dbl>
#> 1 1 Norway spruce 2022 9.6 9.5
#> 2 2 Norway spruce 2022 9.1 8.5
#> 3 3 Norway spruce 2022 11 10.9
#> 4 4 Norway spruce 2022 24.9 23
#> 5 5 Norway spruce 2022 20.9 19.4
#> 6 6 Norway spruce 2022 8.2 8.3
#> 7 7 Norway spruce 2022 22.6 19.8
#> 8 8 Norway spruce 2022 18.8 19
#> 9 9 Norway spruce 2022 27.8 26.2
#> 10 10 Norway spruce 2022 26.8 25.8
#> # ℹ 273 more rows
# Display a summary of dat
dat |> summary()
#> tree_id species_id time_yr dbh_cm
#> Length :283 European beech: 42 Min. :2022 Min. : 7.00
#> N.unique :283 Norway spruce :130 1st Qu.:2022 1st Qu.: 9.90
#> N.blank : 0 other hardwood: 13 Median :2022 Median :17.50
#> Min.nchar: 1 silver fir : 98 Mean :2022 Mean :21.01
#> Max.nchar: 3 3rd Qu.:2022 3rd Qu.:28.05
#> Max. :2022 Max. :73.40
#> height_m
#> Min. : 7.20
#> 1st Qu.:11.65
#> Median :19.40
#> Mean :19.83
#> 3rd Qu.:26.95
#> Max. :39.30
# Reset option to previous value
options(fe_spec_lang = opt_prev)
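The pattern of storing the option, changing it, and resetting it afterwards recurs throughout this section. If you use it often, a small wrapper can do the bookkeeping. The helper with_spec_lang() below is a hypothetical convenience function and not part of ForestElementsR; it only relies on base R's options() and on.exit().

```r
# Hypothetical helper (not part of ForestElementsR): evaluate an expression
# with a temporary value of the option fe_spec_lang, then restore the old one.
with_spec_lang <- function(lang, expr) {
  old <- options(fe_spec_lang = lang) # options() returns the previous values
  on.exit(options(old), add = TRUE)   # restore even if expr throws an error
  force(expr)                         # expr is evaluated lazily, i.e. only here
}

# Package-free demonstration: the option is only changed inside the call
options(fe_spec_lang = "code")
inside <- with_spec_lang("eng", getOption("fe_spec_lang"))
after <- getOption("fe_spec_lang")
inside # "eng"
after  # "code"
```

With the package attached, a call like with_spec_lang("eng", summary(dat)) should then give the English summary without touching the global setting permanently.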
Let’s assume we want to know the mean stem volume per species (group) and its standard deviation. To achieve that, we first require each tree’s volume. It can be calculated with the function v_gri, which requires the three inputs species_id, dbh_cm, and height_m. The function v_gri was originally designed to work with the species coding tum_wwk_short (as available in the example data), but it can process any input for species_id that can be converted into that coding.
opt_prev <- getOption("fe_spec_lang")
options(fe_spec_lang = "eng")
dat <- dat |>
mutate(v_cbm = v_gri(species_id, dbh_cm, height_m))
# Note that the summary of species_id does not preserve the original order of
# the codes (species are alphabetically sorted, dependent on language setting)
dat |> summary()
#> tree_id species_id time_yr dbh_cm
#> Length :283 European beech: 42 Min. :2022 Min. : 7.00
#> N.unique :283 Norway spruce :130 1st Qu.:2022 1st Qu.: 9.90
#> N.blank : 0 other hardwood: 13 Median :2022 Median :17.50
#> Min.nchar: 1 silver fir : 98 Mean :2022 Mean :21.01
#> Max.nchar: 3 3rd Qu.:2022 3rd Qu.:28.05
#> Max. :2022 Max. :73.40
#> height_m v_cbm
#> Min. : 7.20 Min. :0.00996
#> 1st Qu.:11.65 1st Qu.:0.03441
#> Median :19.40 Median :0.23213
#> Mean :19.83 Mean :0.65723
#> 3rd Qu.:26.95 3rd Qu.:0.83518
#> Max. :39.30 Max. :6.97478
options(fe_spec_lang = opt_prev)
The summary reveals a wide range of volumes, which is plausible given the range of dbh and height values. To obtain the mean volumes per species (group), we can use the dplyr functions group_by and summarise, which also work with our species codings. We see from the summary below that, for example, Abies alba has the smallest mean stem volume, which comes, however, with the highest standard deviation.
# Set option for displaying scientific species names
opt_prev <- getOption("fe_spec_lang")
options(fe_spec_lang = "sci")
dat |>
group_by(species_id) |>
summarise(
mean_stem_volume_cbm = mean(v_cbm),
sd_stem_volume_cbm = sd(v_cbm)
)
#> # A tibble: 4 × 3
#> species_id mean_stem_volume_cbm sd_stem_volume_cbm
#> <tm_wwk_shrt> <dbl> <dbl>
#> 1 Picea abies 0.669 0.926
#> 2 Abies alba 0.553 1.14
#> 3 Fagus sylvatica 0.805 0.829
#> 4 aliae deciduae 0.855 0.533
# In contrast to summary, summarise keeps the original order of the species
# codes, no matter the language setting
options(fe_spec_lang = opt_prev)
Note that plotting functions currently do not work with the species codings. Use the format function for such purposes:
# Note: Using simply 'format(species_id)' below would use the current setting
# of the option fe_spec_lang
dat |>
ggplot() +
geom_point(aes(x = dbh_cm, y = v_cbm, col = format(species_id, "eng"))) +
scale_color_discrete("Species") +
scale_x_log10() +
scale_y_log10()
Stem volume over diameter by species in log-log display
There are two rather different developer tasks around species codings, and it helps to keep them apart:
Maintaining the data of the codings -
adding a species to the master table, adding a code to an existing
coding, fixing a name, or building a “short” aggregation coding. This is
now entirely CSV-driven: a set of exported builder functions
turns editable CSV files into the validated package data, steered by two
workbench scripts in data-raw/. You edit CSV, you do not
edit R code. Section 4.1 describes the layout,
and Section 4.2 is a step-by-step
recipe.
Adding a genuinely new coding - this additionally needs a new S3 (vctrs) class and the cast functions that connect it to all the other codings. That part still lives in R source files and is described in Section 4.3.
Finally, Section 4.4 repeats the standing
warning never to touch fe_species_helper_functions.R
without knowing exactly what you are doing.
Before we get into the details, note that all species codings inherit from the vctrs_vctr class, which is provided by the package vctrs:
fe_species_bavrn_state("30") |> class()
#> [1] "fe_species_bavrn_state" "vctrs_vctr"
fe_species_ger_nfi_2012("20") |> class()
#> [1] "fe_species_ger_nfi_2012" "vctrs_vctr"
fe_species_tum_wwk_long("87") |> class()
#> [1] "fe_species_tum_wwk_long" "vctrs_vctr"
fe_species_tum_wwk_short("7") |> class()
#> [1] "fe_species_tum_wwk_short" "vctrs_vctr"
fe_species_master("abies_004") |> class()
#> [1] "fe_species_master" "vctrs_vctr"
While vctrs does not allow for building species_coding super- and subclasses, which would be an obvious feature for a system of species codings, it supports casts between different classes in a very convenient way. As this is a key requirement of our implementation, we decided on a vctrs-based solution.
All coding data is generated from editable CSV files by exported
builder functions; the package data objects
(species_master_table, species_codings,
species_cast_overrides) are the output, never
edited directly.
The master table lives in
data-raw/species_master_table.csv (exactly six columns:
genus, species_no, deciduous_conifer,
name_sci, name_eng, name_ger; one row per
single species). master_template_csv() writes a fresh
snapshot, and master_table_from_csv() reads it back with
strict validation (unique keys, lower-case genus, three-digit
species_no, no NAs).
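To make the validation rules more tangible, here is a minimal base R sketch of such checks. The function check_master_rows() is hypothetical and only mirrors the rules listed above; it is not the code of master_table_from_csv(), whose actual checks and error messages may differ.

```r
# Hypothetical validator mirroring the rules listed above; this is not the
# code of master_table_from_csv(), only an illustration of the checks.
check_master_rows <- function(x) {
  required <- c("genus", "species_no", "deciduous_conifer",
                "name_sci", "name_eng", "name_ger")
  if (!identical(sort(names(x)), sort(required))) {
    stop("Master table must have exactly the six expected columns.",
         call. = FALSE)
  }
  checks <- c(
    "no NAs" = !any(is.na(x)),
    "lower-case genus" = isTRUE(all(x$genus == tolower(x$genus))),
    "three-digit species_no" = isTRUE(all(grepl("^[0-9]{3}$", x$species_no))),
    "unique genus/species_no keys" =
      !anyDuplicated(x[c("genus", "species_no")])
  )
  failed <- names(checks)[!checks]
  if (length(failed) > 0L) {
    stop("Master table check failed: ", paste(failed, collapse = "; "),
         call. = FALSE)
  }
  invisible(x)
}

# One valid row, taken from the first row of species_master_table
master <- data.frame(
  genus = "abies", species_no = "001", deciduous_conifer = "conif",
  name_sci = "Abies alba", name_eng = "silver fir", name_ger = "Tanne",
  stringsAsFactors = FALSE
)
check_master_rows(master) # passes silently

# An upper-case genus violates the rules and is reported
bad <- master
bad$genus <- "Abies"
tryCatch(check_master_rows(bad), error = function(e) conditionMessage(e))
```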
Each coding has its own CSV in
data-raw/codings/<coding>.csv, in the
species-indexed + parent_code format. There is one
row per master species, plus extra declaration rows for group names. The
key columns are:
parent_code chains.
coding_template_from_master() produces such a CSV
(blank, or prefilled from an existing coding), and
coding_table_from_template() turns the edited CSV into a
validated coding table. It checks the laminarity invariant
(codes are nested or disjoint, never partially overlapping), derives
level and is_tree, and stores the rows in canonical
order. For a “short” aggregation coding it additionally verifies,
against the freshly built parent coding, that the coding is a valid
coarsening (every parent group maps to exactly one short code). The
parent/child pairs (bavrn_state_short ←
bavrn_state, tum_wwk_short ←
tum_wwk_long) are registered internally.
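The two structural checks mentioned above can be sketched in a few lines of base R if each code is represented by the set of master species it covers. The functions is_laminar() and is_coarsening() and the toy species keys are assumptions for illustration; the builder's real implementation is not shown here and may work differently.

```r
# Each code is represented by the set of master species it covers.
# is_laminar() returns TRUE if every pair of sets is nested or disjoint.
is_laminar <- function(code_sets) {
  n <- length(code_sets)
  if (n < 2L) return(TRUE)
  for (i in seq_len(n - 1L)) {
    for (j in (i + 1L):n) {
      a <- code_sets[[i]]
      b <- code_sets[[j]]
      common <- intersect(a, b)
      nested <- setequal(common, a) || setequal(common, b)
      # A non-empty overlap that equals neither set is a partial overlap
      if (length(common) > 0L && !nested) return(FALSE)
    }
  }
  TRUE
}

# is_coarsening() returns TRUE if every parent code set lies completely
# inside exactly one short code set.
is_coarsening <- function(parent_sets, short_sets) {
  all(vapply(parent_sets, function(p) {
    sum(vapply(short_sets, function(s) all(p %in% s), logical(1))) == 1L
  }, logical(1)))
}

# Invented toy data
nested_ok <- list(c("sp_1", "sp_2"), "sp_1", "sp_3")
partial_overlap <- list(c("sp_1", "sp_2"), c("sp_2", "sp_3"))
is_laminar(nested_ok)       # TRUE
is_laminar(partial_overlap) # FALSE

parent <- list(p1 = "sp_1", p2 = "sp_2", p3 = c("sp_3", "sp_4"))
short_ok <- list(s1 = c("sp_1", "sp_2"), s2 = c("sp_3", "sp_4"))
short_bad <- list(s1 = c("sp_1", "sp_3"), s2 = c("sp_2", "sp_4"))
is_coarsening(parent, short_ok)  # TRUE
is_coarsening(parent, short_bad) # FALSE, p3 is split across two short codes
```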
Cast overrides are their own little CSV
(data-raw/codings/cast_overrides.csv), built and validated
by cast_overrides_from_csv() into the package object
species_cast_overrides (see Section 3.3).
These are the day-to-day data tasks, both driven by a workbench
script in data-raw/ that is meant to be run
manually, block by block (not source()d in
one go - the first block can overwrite a CSV).
To add or change a species in the master table
(data-raw/species_master_table.R):
data-raw/species_master_table.csv from the installed
table.
master_table_from_csv(), inspect, then
usethis::use_data() and reinstall the package so the coding
builder sees the new master.
To add a code to a coding, fix a name, or build a short
coding (data-raw/species_codings.R):
coding_name and
mode ("new" starts a blank CSV from the
master; "edit" prefills from the installed coding). The
parent coding, if any, is resolved automatically.
data-raw/codings/<coding>.csv.
species_codings object (parents before children), then
usethis::use_data() and reinstall. If the change affects a
cast override, rerun data-raw/cast_overrides.R as
well.
devtools::test() and a full
R CMD check. Note that the data-raw CSVs are
excluded from the built package (.Rbuildignore), so the
test suite is deliberately written to work without them - run
the real check, not just test_dir() against a loaded
session.
A genuinely new coding needs two things. Its data is built
exactly as in Sections 4.1 and 4.2: add a new
data-raw/codings/<coding>.csv, register the coding
name in data-raw/species_codings.R (and, if the new coding
is a “short” aggregation of an existing one, add a parent/child row to
the internal aggregation registry). The builder takes care of
level, is_tree, and non-tree codes automatically. What
remains is the code: a vctrs S3 class plus the cast functions
that connect the new coding to every other coding. The remaining steps
cover that code side.
Now you must provide the functions that make your new coding work. While this sounds difficult, it is actually quite easy. Before we explain how to do that, be aware of the following naming convention:
The S3 class covering your species coding must be named “fe_species_” followed by the name of your coding.
In other words, if your new coding is named john_doe_coding (and that is also exactly what you called it in the tibble species_codings), then your S3 class name must be fe_species_john_doe_coding.
First, copy the R source file of one of the implemented codings, and give it the name of your S3 class (in our example fe_species_john_doe_coding.R). Note that the files of the existing implementations follow this naming convention. For this explanation, I assume you have copied and renamed the file fe_species_tum_wwk_short.R. An automatic search for the term fe_species_tum_wwk_short and replacement with fe_species_john_doe_coding would literally give you an almost working implementation. If you go that way, however, do it function by function, not for the whole file in one go. Note that you must also exchange the terms in the documentation above each function, not only in the R code itself. Important: you will also have to adjust the examples so that they use species codes which are actually covered by your coding. Otherwise, the examples will not work, and the package will not pass R CMD check.
From top to bottom of the file fe_species_tum_wwk_short.R, the functions to update are:
the constructor new_fe_species_tum_wwk_short
is_fe_species_tum_wwk_short
the formatter format.fe_species_tum_wwk_short
summary.fe_species_tum_wwk_short
vec_ptype_abbr.fe_species_tum_wwk_short; here you should also replace the provided abbreviation for the coding name by one of your own (this abbreviation is printed e.g. as type information below the column head if your coding is a column of a tibble)
validate_fe_species_tum_wwk_short
fe_species_tum_wwk_short, the function users should use for constructing an instance of a species coding object
vec_proxy_order.fe_species_tum_wwk_short; this guarantees that species ids are always sorted in the same order, even if the option fe_spec_lang is changed
Now comes a block of species type casting functions. Their names are built like, e.g., vec_cast.fe_species_tum_wwk_short.fe_species_ger_nfi_2012, which means vec_cast.fe_species_GOAL_CODING_NAME.fe_species_FROM_CODING_NAME. These functions are very short, and some of them use the coding names internally. If you are qualified to work on this R package, you will understand immediately what to adapt. In general, you must replace tum_wwk_short as the goal coding with john_doe_coding in the function names. In addition, you must copy one of the functions which cast between two species codings, and adapt it so that it casts from tum_wwk_short to john_doe_coding, i.e. name it vec_cast.fe_species_john_doe_coding.fe_species_tum_wwk_short, and make the obvious adaptations in the function’s body.
as_fe_species_tum_wwk_short, which is the actual function users call for casts between codings
In the previous step, you have placed the vec_cast functions that cast other codings into your new coding in the implementation of the new coding. Now, you have to add to the implementation of each other coding a function that casts from your coding into that coding. In other words, the implementation of fe_species_tum_wwk_short requires a function called vec_cast.fe_species_tum_wwk_short.fe_species_john_doe_coding, the implementation of fe_species_ger_nfi_2012 requires a function vec_cast.fe_species_ger_nfi_2012.fe_species_john_doe_coding, and so on.
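Because the cast functions come in pairs for every existing coding, it is easy to overlook one. As a memory aid, the following base R sketch lists the method names implied by the naming scheme vec_cast.fe_species_GOAL_CODING_NAME.fe_species_FROM_CODING_NAME. The helper required_cast_methods() is hypothetical, and the vector of existing codings is an assumption taken from the class examples shown earlier; adjust it to the codings actually present in your version of the package.

```r
# Hypothetical helper: list the vec_cast methods a new coding requires,
# following the scheme vec_cast.fe_species_GOAL.fe_species_FROM.
required_cast_methods <- function(new_coding, existing_codings) {
  list(
    # Casts INTO the new coding, to be placed in the new coding's source file
    in_new_coding_file = paste0(
      "vec_cast.fe_species_", new_coding, ".fe_species_", existing_codings
    ),
    # Casts FROM the new coding, one in each existing coding's source file
    in_existing_coding_files = paste0(
      "vec_cast.fe_species_", existing_codings, ".fe_species_", new_coding
    )
  )
}

# Assumed list of existing codings; check it against your package version
existing <- c("bavrn_state", "ger_nfi_2012", "tum_wwk_long",
              "tum_wwk_short", "master")
todo <- required_cast_methods("john_doe_coding", existing)
todo$in_new_coding_file
todo$in_existing_coding_files
```

The sketch only generates names as a checklist. It does not cover a cast of the new coding to itself, and it does not write any of the functions for you.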
Clearly, when implementing your new species coding by editing an existing source file, you must adapt the existing documentation you find there to the new requirements. However, you must not forget to add your coding to the general documentation of species codings of the package. You find this in the file data_species_codings.R which is Roxygen2 code. Add a short description and examples in the same style as you find it for the other codings.
The package ForestElementsR comprises a suite of automated tests. You
must add your new coding there as well. You find the implementations of the
tests in the subdirectory /tests/testthat/; the files you need
are called test_species_coding_consistency and
test_species_coding_casts. Several tests also iterate over
all codings (e.g. for canonical row order, completeness, name
uniqueness, and non-tree handling); a new coding is picked up there
automatically once it is part of species_codings. See how
the tests for the other codings are implemented, and follow these
examples.
The functions in the source file fe_species_helper_functions.R were very carefully crafted, and they provide the common technical background for existing and future species codings implemented in the package ForestElementsR. If you fiddle around there without knowing 500% exactly what you are doing, you will almost certainly goof it up.