R2camtrapdp converts camera-trap data held in an
arbitrary spreadsheet into a Camera Trap Data Package (Camtrap
DP).
This version is schema-driven: the structure, types and constraints of the output tables are read from the official Frictionless table schemas of the Camtrap DP version you choose. As a result, the package works with
each supported Camtrap DP version (1.0, 1.0.1, 1.0.2), and with
other schema flavors such as the bioacoustics extension
(see §8), simply by pointing it at the right schema, and it checks your data
against the schema constraints (required, unique, enum,
minimum/maximum, pattern,
date/datetime format, …).
The classic helper functions (create_deployments(),
create_media(), create_observations()) and the
R6_CamtrapDP class keep the same names and arguments as
before, so existing scripts continue to work. The new schema-driven
behaviour is added on top.
Note on internet access. Setting a table (set_deployments() etc.) downloads the table schema for the chosen version from GitHub the first time it is needed, and then caches it. If you work offline, pass a downloaded schema file with the local_schema = argument.
The package ships with example data for several deployments with image records.
# multiple deployments with image data
data("Idep") # deployment table
data("Iobs") # observation table

Idep holds one row per deployment (camera placement)
with columns such as deploymentID, longitude,
latitude, locationID,
startDate/startTime,
endDate/endTime, cameraID,
cameraModel, Delay, Height,
bait and setupBy. Iobs holds one
row per observation with the institution/collection codes,
filename, deploymentID,
date/time, obsID,
eventID, eventStart/eventEnd,
object, genus, species,
class and individualCount.
The whole pipeline is driven by the schema of the version you pick.
Camtrap DP versions 1.0, 1.0.1 and
1.0.2 are all supported; their table schemas share the same
field names, types and constraints, so the only practical difference is
that 1.0.2 recognises a few more missing-value tokens
(NA, NaN, nan). You can inspect
the schema of any version directly with TableSchema.
(Note: the official 1.0 profile, that is, the metadata
JSON Schema, has an upstream bug (a malformed internal
$ref) that newer versions of Frictionless reject. Specifying
version = "1.0" therefore emits a warning;
validate_frictionless() works around the bug automatically,
but 1.0.1 or later is recommended.)
version <- "1.0.1"
dep_schema <- TableSchema$new("deployments", version = version)
dep_schema$field_names() # every column the schema defines
dep_schema$required_field_names() # columns that must be present and non-missing
dep_schema$empty_table() # a 0-row, correctly typed "shell" table

You rarely need to do this by hand, because the R6_CamtrapDP
object loads and caches the right schema for you, but it is useful for
understanding what a given version expects.
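The missing-value difference mentioned for 1.0.2 can be pictured with a small base-R sketch. The two token sets below are illustrative assumptions (only NA, NaN and nan are named in the text above; the empty string is included as a plausible baseline), and as_missing() is a hypothetical helper, not a function of R2camtrapdp.

```r
# Hypothetical token sets, for illustration only.
tokens_base  <- c("")                      # assumed baseline: empty string
tokens_1_0_2 <- c("", "NA", "NaN", "nan")  # plus the tokens named for 1.0.2

# Turn every cell that matches a missing-value token into NA.
as_missing <- function(x, tokens) {
  x <- as.character(x)
  x[x %in% tokens] <- NA_character_
  x
}

raw_height <- c("1.2", "", "NaN", "nan", "0.8")
height_base  <- as_missing(raw_height, tokens_base)   # only "" becomes NA
height_1_0_2 <- as_missing(raw_height, tokens_1_0_2)  # "", "NaN", "nan" become NA
```

Under the narrower token set, "NaN" and "nan" would stay as text and later fail a number type check, which is why the token list matters in practice.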
check_schema() confirms that the schema itself is a
well-formed Frictionless Table Schema (supported field
types, constraints that are valid for each type,
primary/foreign keys that reference defined fields) — useful before
adopting a brand-new or hand-edited schema.
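To make the idea of a well-formedness check concrete, here is a minimal base-R sketch on a toy schema written as a plain list. It is not the implementation of check_schema(); check_toy_schema(), toy_schema and the two type vectors are hypothetical names, and the toy schema contains two deliberate mistakes.

```r
# Toy schema in the Frictionless Table Schema layout, with two deliberate errors:
# a range constraint on a string field, and a primary key that is not a field.
toy_schema <- list(
  fields = list(
    list(name = "deploymentID", type = "string",
         constraints = list(required = TRUE, unique = TRUE)),
    list(name = "latitude", type = "number",
         constraints = list(minimum = -90, maximum = 90)),
    list(name = "habitat", type = "string",
         constraints = list(minimum = 0))
  ),
  primaryKey = "deploymentId"
)

# Field types of the Table Schema specification, and the subset that
# accepts minimum/maximum.
known_types <- c("string", "number", "integer", "boolean", "object", "array",
                 "date", "time", "datetime", "year", "yearmonth", "duration",
                 "geopoint", "geojson", "any")
range_types <- c("number", "integer", "date", "time", "datetime",
                 "year", "yearmonth")

check_toy_schema <- function(schema) {
  problems <- character(0)
  field_names <- vapply(schema$fields, function(f) f$name, character(1))
  for (f in schema$fields) {
    if (!f$type %in% known_types) {
      problems <- c(problems,
                    sprintf("%s: unsupported type '%s'", f$name, f$type))
    }
    has_range <- any(c("minimum", "maximum") %in% names(f$constraints))
    if (has_range && !f$type %in% range_types) {
      problems <- c(problems,
                    sprintf("%s: minimum/maximum not valid for type '%s'",
                            f$name, f$type))
    }
  }
  # Every primary-key column must be a defined field.
  for (key in setdiff(schema$primaryKey, field_names)) {
    problems <- c(problems,
                  sprintf("primaryKey '%s' is not a defined field", key))
  }
  problems
}

schema_problems <- check_toy_schema(toy_schema)
```

schema_problems then holds one message for the habitat constraint and one for the misspelled primary key.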
Some Camtrap DP information is specified not as a machine-checkable
constraint but as a URL: semantic mappings
(skos:exactMatch / broadMatch /
narrowMatch to Darwin Core, Audubon Core, … terms) and
reference URLs in field descriptions (for example the IANA media-type
registry for fileMediatype, or method DOIs for
individualSpeed). The package only enforces the structured
constraints; the URL-referenced meaning is not validated. To
make sure you never overlook such a specification when adopting a
version or a new schema flavor, list them with:
dep_schema$external_references() # every URL the schema declares (skos, descriptions, schema URL)
dep_schema$semantic_only_fields() # fields whose meaning is URL-defined and cannot be value-checked

external_references() returns a tidy table
(resource, field, key,
category, url);
semantic_only_fields() flags the columns you should check
against the referenced authority by hand. The whole package can be
scanned at once with datapackage$external_references().
Using the deployment data (Idep), the deployments table
is created exactly as before. create_deployments() accepts
either combined datetimes or separate date/time columns.
deployments <- create_deployments(
  deploymentID = Idep$deploymentID,
  longitude = Idep$longitude,
  latitude = Idep$latitude,
  locationID = Idep$locationID,
  deploymentStart_date = Idep$startDate,
  deploymentStart_time = Idep$startTime,
  deploymentEnd_date = Idep$endDate,
  deploymentEnd_time = Idep$endTime,
  cameraID = Idep$cameraID,
  cameraModel = Idep$cameraModel,
  cameraDelay = Idep$Delay,
  cameraHeight = Idep$Height,
  baitUse = Idep$bait,
  setupBy = Idep$setupBy)

create_deployments() also accepts (not shown above):
deploymentStart / deploymentEnd (combined
datetimes, used instead of the *_date / *_time
pairs), locationName, coordinateUncertainty,
cameraDepth (mutually exclusive with
cameraHeight), cameraTilt,
cameraHeading, detectionDistance,
timestampIssues, featureType,
habitat, deploymentGroups,
deploymentTags, deploymentComments, and
tz (time zone, default "Asia/Tokyo").
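As a rough picture of what merging a *_date / *_time pair involves, the base-R sketch below joins the two columns and formats the result with a UTC offset. merge_datetime() is a hypothetical helper, the input layout ("YYYY-MM-DD" and "HH:MM:SS") is an assumption, and the package's own merging code may differ.

```r
# Combine separate date and time strings into ISO 8601 datetimes with an offset.
merge_datetime <- function(date, time, tz = "Asia/Tokyo") {
  stamp <- as.POSIXct(paste(date, time),
                      format = "%Y-%m-%d %H:%M:%S", tz = tz)
  # %z appends the UTC offset of the time zone the values were parsed in
  format(stamp, "%Y-%m-%dT%H:%M:%S%z")
}

deployment_start <- merge_datetime(c("2023-04-01", "2023-04-02"),
                                   c("09:00:00", "10:30:00"))
```

Values that do not parse come back as NA, which is a convenient place to catch malformed dates before they reach validation.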
# media ID
mediaIDi <- paste(Iobs$institutionCode,
                  Iobs$collectionCode,
                  Iobs$locationID,
                  as.numeric(factor(Iobs$filename)),
                  sep = "_")
# file information
fileName <- Iobs$filename
# take the text after the last dot, so file names with several dots still work
filetype <- tolower(tools::file_ext(fileName))
filetype[filetype == "jpg"] <- "jpeg" # the IANA-registered type is image/jpeg
fileMediatype <- paste("image", filetype, sep = "/")
filePublic <- !grepl("ヒト", fileName) # hide human ("ヒト") images from the public
media <- create_media(
  mediaID = mediaIDi,
  deploymentID = Iobs$deploymentID,
  timestamp_date = Iobs$date,
  timestamp_time = Iobs$time,
  filePath = "Image",
  filePublic = filePublic,
  fileMediatype = fileMediatype,
  captureMethod = "activityDetection",
  fileName = fileName)

create_media() also accepts (not shown above):
timestamp (combined datetime, instead of
timestamp_date / timestamp_time),
exifData, favorite,
mediaComments, tz, and
omitduplicate (drop duplicate mediaIDs,
default TRUE).
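Because the media table here is built from an observation table, one image can appear on several rows. The base-R sketch below shows the kind of de-duplication that omitduplicate describes; keeping the first row per mediaID is an assumption, and the data are made up.

```r
# Two observations were recorded on image m1, so m1 appears twice.
media_rows <- data.frame(
  mediaID  = c("m1", "m1", "m2"),
  fileName = c("IMG_0001.JPG", "IMG_0001.JPG", "IMG_0002.JPG"),
  stringsAsFactors = FALSE)

# Keep the first row for each mediaID (assumed behaviour, for illustration).
media_unique <- media_rows[!duplicated(media_rows$mediaID), , drop = FALSE]
```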
# event-based observations
observationLevel <- "event"
# observationType must be one of the schema enum values
observationType <- ifelse(Iobs$object == "hito", "human",
                   ifelse(Iobs$object == "none", "blank",
                   ifelse(Iobs$object == "unidentifiable", "unknown", "animal")))
# scientific name
scientificName <- ifelse(is.na(Iobs$genus), Iobs$class, paste(Iobs$genus, Iobs$species))
# unique observation IDs
observationID <- paste(mediaIDi, Iobs$obsID, sep = "_")
observations <- create_observations(
  observationID = observationID,
  deploymentID = Iobs$deploymentID,
  eventID = Iobs$eventID,
  eventStart = Iobs$eventStart,
  eventEnd = Iobs$eventEnd,
  observationLevel = observationLevel,
  observationType = observationType,
  scientificName = scientificName,
  count = Iobs$individualCount,
  classificationMethod = "human",
  classificationProbability = 1)

create_observations() also accepts (not shown above):
mediaID, the eventStart_date /
eventStart_time and eventEnd_date /
eventEnd_time pairs (instead of combined
eventStart / eventEnd),
cameraSetupType, lifeStage, sex,
behavior, individualID,
individualPositionRadius,
individualPositionAngle, individualSpeed,
bboxX, bboxY, bboxWidth,
bboxHeight, classifiedBy,
classificationTimestamp, observationTags,
observationComments, tz, and
omitduplicate.
The version you pass when creating the R6_CamtrapDP object (for example
R6_CamtrapDP$new(version = "1.0.1")) selects the schemas used for
validation and written into datapackage.json. Change it to
target a different Camtrap DP release.
set_deployments(), set_media() and
set_observations() keep their original names, but now each
one coerces the table to the schema types and validates it
against the schema for the chosen version. Any problems are
printed as a summary; you can switch the printing off with
validate = FALSE.
datapackage$set_deployments(deployments)
datapackage$set_media(media)
datapackage$set_observations(observations)

(The chunks that download a schema, write files, look up taxonomy, or call Python are shown but not executed when this vignette is built, so they produce no output here.)
The validation summary tells you, for every issue, the file, the
column, the row, the violated rule and a message — for example a value
that breaks an enum, a number outside its
minimum/maximum, or a datetime that does not
match the required format. A value that does not even fit the column
type (e.g. a non-numeric string in a number field) is
reported as a type error rather than being silently turned
into NA.
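The difference between a type error and a range error can be sketched in base R as follows. check_number() is a hypothetical helper that returns a small issue table. It is not the package's validator, and the column layout of the real summary may differ.

```r
# Report values that are not numbers ("type") separately from numbers that
# fall outside the allowed range ("minimum/maximum").
check_number <- function(x, field, minimum = -Inf, maximum = Inf) {
  text <- as.character(x)
  present <- !is.na(text) & nzchar(text)          # empty cells count as missing
  num <- suppressWarnings(as.numeric(text))
  type_bad <- present & is.na(num)                # present but not numeric
  range_bad <- present & !is.na(num) & (num < minimum | num > maximum)
  rows <- c(which(type_bad), which(range_bad))
  data.frame(
    field = rep(field, length(rows)),
    row = rows,
    rule = c(rep("type", sum(type_bad)),
             rep("minimum/maximum", sum(range_bad))),
    value = text[rows],
    stringsAsFactors = FALSE)
}

latitude_raw <- c("35.1", "north", "95", "")
lat_issues <- check_number(latitude_raw, "latitude", minimum = -90, maximum = 90)
```

Here "north" is reported as a type problem on row 2 and 95 as a range problem on row 3, while the empty cell is treated as missing rather than as an error.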
Foreign keys (e.g. media.deploymentID must exist in
deployments, and observations.mediaID must
exist in media) and primary-key uniqueness are read from
each table’s schema and checked across the tables you have added.
If a primary-key or a required foreign-key column is entirely
missing in a stored table (often a column-name mismatch that
coercion filled with NA), check_relations()
warns and points at the data, e.g.
datapackage$data$observations has 'deploymentID' entirely missing ...,
so you can inspect datapackage$data$<resource>
directly.
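The relation checks described above boil down to three questions, sketched here in base R on made-up tables. This only illustrates the logic; the variable names are hypothetical and check_relations() itself reads the keys from the schemas.

```r
deployments_tbl <- data.frame(deploymentID = c("D1", "D2", "D2"),
                              stringsAsFactors = FALSE)
media_tbl <- data.frame(mediaID = c("m1", "m2", "m3"),
                        deploymentID = c("D1", "D3", NA),
                        stringsAsFactors = FALSE)

# 1. Primary-key uniqueness: which deploymentID values occur more than once?
dup_ids <- deployments_tbl$deploymentID[duplicated(deployments_tbl$deploymentID)]
duplicated_keys <- unique(dup_ids)

# 2. Foreign key: which media.deploymentID values are absent from deployments?
fk_values <- media_tbl$deploymentID[!is.na(media_tbl$deploymentID)]
orphan_keys <- setdiff(fk_values, deployments_tbl$deploymentID)

# 3. Entirely missing key column (often a column-name mismatch upstream).
fk_all_missing <- all(is.na(media_tbl$deploymentID))
```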
Camtrap DP requires five metadata properties (contributors, project,
spatial, temporal and taxonomic), plus created. Six further
properties are optional. The metadata functions are unchanged from
previous versions.
The required metadata is itself read from the package
profile (a JSON Schema).
metadata_requirements() lists every required top-level
property, the method that sets it, and whether it is currently set;
check_metadata() validates the current object against the
profile and reports anything missing (including nested keys such as
project.samplingDesign).
datapackage$metadata_requirements() # checklist: property, required, set_with, currently_set
datapackage$check_metadata() # report missing required metadata

This is the R-side counterpart of the metadata (profile) validation that Frictionless performs (§6), so you can confirm the required structure before writing the package and calling Python.
add_contributors() imports a data frame with columns
title, email, path,
role and organization. role may
be contact, principalInvestigator,
rightsHolder, publisher or
contributor.
cd <- data.frame(
  title = c("Keita Fukasawa", "Kana Terayama"),
  email = c("fukasawa@nies.go.jp", "terayama.kana@nies.go.jp"),
  path = c("https://orcid.org/0000-0003-0272-9180",
           "https://orcid.org/0000-0001-6935-7233"),
  role = c("contact", "principalInvestigator"),
  organization = c("National Institute for Environmental Studies (NIES)",
                   "National Institute for Environmental Studies (NIES)"))
datapackage$add_contributors(cd)

datapackage$set_project(
  title = "DummyData",
  samplingDesign = "simpleRandom",
  captureMethod = "activityDetection",
  individualAnimals = FALSE,
  observationLevel = "event")

samplingDesign is one of simpleRandom,
systematicRandom, clusteredRandom,
experimental, targeted or
opportunistic; captureMethod is
activityDetection or timeLapse;
observationLevel is media or
event. The optional id, acronym,
description and path arguments are also
available.
set_st() derives the spatial and temporal coverage from
the deployments, so it must be called after
set_deployments().
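As an illustration of what "deriving coverage from the deployments" means, the base-R sketch below computes a bounding box and a start/end date from a small deployments table. It is a simplified assumption about the idea, not the code of set_st(), and it takes the calendar date as written in each local timestamp.

```r
dep_tbl <- data.frame(
  longitude = c(139.5, 140.1),
  latitude = c(35.1, 36.2),
  deploymentStart = c("2023-04-01T09:00:00+0900", "2023-04-02T10:30:00+0900"),
  deploymentEnd = c("2023-05-01T09:00:00+0900", "2023-05-02T10:30:00+0900"),
  stringsAsFactors = FALSE)

# Spatial coverage: the bounding box of all deployment coordinates.
bbox <- c(xmin = min(dep_tbl$longitude), ymin = min(dep_tbl$latitude),
          xmax = max(dep_tbl$longitude), ymax = max(dep_tbl$latitude))

# Temporal coverage: earliest start and latest end, as dates.
start_day <- min(as.Date(substr(dep_tbl$deploymentStart, 1, 10)))
end_day <- max(as.Date(substr(dep_tbl$deploymentEnd, 1, 10)))
temporal <- list(start = format(start_day), end = format(end_day))
```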
set_taxon() lists the unique scientificName
values from the observations and looks up taxonID,
taxonRank and the higher taxonomy from a taxonomic database
(gbif by default; also itis /
ncbi; see taxadb::get_ids). The Camtrap DP
taxonomic block requires a taxonID (a GBIF /
IUCN identifier or URI), so taxadb is a required dependency
of R2camtrapdp (installed with it); this step also needs internet
access.
Names that cannot be matched get taxonID = NA (omitted
from the output, not a bogus <uri>NA).
set_taxon() warns about scientificName values
with unnecessary whitespace and about names with no taxonID
in the chosen database, so you can clean or check those names.
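Names with stray whitespace can be found before the lookup with a few lines of base R. This is a sketch of the kind of check the warning refers to; the example names and the variable names are made up.

```r
sci_names <- c("Sus scrofa", " Cervus nippon", "Meles  anakuma", "Vulpes vulpes ")

# Trim both ends and collapse internal runs of whitespace to a single space.
sci_clean <- gsub("\\s+", " ", trimws(sci_names))

# Names that changed had leading, trailing or doubled whitespace.
needs_cleaning <- sci_names != sci_clean
```

Passing sci_clean instead of the raw names avoids failed matches that are caused only by spacing.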
Camtrap DP expects at least one license for the data and one for the media.
set_custom() attaches an extra resource (for example
data used by an abundance estimator) as metadata. It must be called
after the three core tables have been set.
# return the camtrapdp object
data_camtrapdp <- datapackage$out_camtrapdp()
# or also write deployments.csv / media.csv / observations.csv + datapackage.json
datapackage$out_camtrapdp(write = TRUE, directory = path)

When written, the CSV files contain every schema column, booleans are
written as true/false, and unset metadata is
omitted so that empty placeholders do not cause spurious validation
errors.
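R prints logicals as TRUE/FALSE, so writing true/false needs an explicit conversion. The base-R sketch below shows one way to do it on a made-up table; it illustrates the output convention only and is not the writer used by out_camtrapdp().

```r
media_out <- data.frame(
  mediaID = c("m1", "m2"),
  filePublic = c(TRUE, FALSE),
  favorite = c(NA, TRUE),
  stringsAsFactors = FALSE)

# Convert logical columns to lower-case text, keeping missing values missing.
is_lgl <- vapply(media_out, is.logical, logical(1))
media_out[is_lgl] <- lapply(media_out[is_lgl], function(x) {
  ifelse(is.na(x), NA, tolower(as.character(x)))
})

# Write missing values as empty cells rather than the string "NA".
out_file <- tempfile(fileext = ".csv")
write.csv(media_out, out_file, row.names = FALSE, na = "", quote = FALSE)
csv_lines <- readLines(out_file)
```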
Before running Python, you can check on the R side whether the package is a well-formed Frictionless data package, and whether it conforms to Camtrap DP. This mirrors, in R, the structural checks Frictionless performs, so problems with a brand-new or unusual schema surface early.
datapackage$check_descriptor() # package + table-schema structure (Frictionless spec)
datapackage$check_camtrap_profile() # warn if the profile is not a Camtrap DP profile

A package can be a valid Frictionless data package without
conforming to Camtrap DP: that depends on whether its
profile is the Camtrap DP profile (which is the default).
The authoritative check, including GeoJSON validity and the physical
file structure, is still the Frictionless run below.
You can confirm the written package against the official schemas with
the Python Frictionless
validator. This requires Python with frictionless installed
(pip install frictionless).
issues <- datapackage$validate_frictionless(directory = path, python = "python")
ctdp_is_valid(issues) # TRUE if there are no errors

Note: this rewrites path.
validate_frictionless() defaults to
write = TRUE, so it calls out_camtrapdp() and
overwrites the datapackage.json and CSVs
in directory from the current object before validating. To
validate a package that already exists on disk without
overwriting it, use write = FALSE, or the
standalone validate-only function (no R6 object needed):
issues is a tidy table with one row per problem, giving
the source file, the field (column or property
path), the row, the violated constraint, the
offending value, and a message, so you can see
exactly where any error occurs. For cell errors value is
the failing cell; for metadata (profile) errors it is resolved from
datapackage.json via the property path in the note
(e.g. contributors[].email → the actual email value(s)).
You can also aggregate the R-side schema checks, the relation checks,
the metadata (profile) checks, the conformance pre-checks and
(optionally) the Frictionless report in one call:
The helpers above assume you already named your variables. If instead
you have a raw spreadsheet with its own column names, you can map and
validate it in one step with ctdp_build_table(), which
applies a column mapping, merges separate date/time columns, coerces to
the schema types and validates — for any version.
version <- "1.0.1"
dep_schema <- TableSchema$new("deployments", version = version)
# an example raw sheet with arbitrary column names + a custom column
raw <- data.frame(
  station = c("A01", "A02"),
  lat = c(35.1, 36.2),
  lon = c(139.5, 140.1),
  start_day = c("2023-04-01", "2023-04-02"),
  start_clk = c("09:00:00", "10:30:00"),
  end_day = c("2023-05-01", "2023-05-02"),
  end_clk = c("09:00:00", "10:30:00"),
  myNote = c("kept as a custom column", "kept too"),
  stringsAsFactors = FALSE)
# mapping: names are SOURCE columns, values are Camtrap DP FIELD names
mapping <- c(station = "deploymentID", lat = "latitude", lon = "longitude")
built <- ctdp_build_table(
  dep_schema, raw, mapping = mapping,
  datetime_merges = list(
    list(date_col = "start_day", time_col = "start_clk", target = "deploymentStart"),
    list(date_col = "end_day", time_col = "end_clk", target = "deploymentEnd")))
ctdp_summarize_validation(built$issues) # any schema problems
datapackage$set_deployments(built$data) # feed the result into the package

Custom columns such as myNote are kept; when the package
is written, the custom column is declared in an inline extended schema
in datapackage.json so that Frictionless accepts it.
Because every table is driven by the schema you point it at, the
package is not limited to the camera-trap schemas hosted by TDWG. To
target a different flavor — for instance the bioacoustics
extension of Camtrap DP — give the table and profile URLs
explicitly. These schemas live in a different repository and use their
own field set (e.g. deviceID instead of
cameraID, plus samplingFrequency,
frequencyLow/frequencyHigh, …) and per-table
datetime formats (the media / observations
event timestamps use fractional seconds
%Y-%m-%dT%H:%M:%S.%f%z, while the deployments
times do not); the schema-driven validation adapts to all of this
automatically. If your raw media /
observations timestamps lack the fractional part,
.000 is added automatically so the value matches the
schema’s %f format.
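The padding of a missing fractional part can be sketched with a regular expression in base R. add_fraction() is a hypothetical helper shown only to illustrate the transformation; the package's own handling may differ in detail.

```r
# Append ".000" after the seconds when a timestamp has no fractional part.
add_fraction <- function(x) {
  has_fraction <- grepl("T[0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]+", x)
  x[!has_fraction] <- sub("(T[0-9]{2}:[0-9]{2}:[0-9]{2})", "\\1.000",
                          x[!has_fraction])
  x
}

stamps <- add_fraction(c("2023-04-01T09:05:00+0900",
                         "2023-04-01T09:05:00.250+0900"))
```

The first value gains .000 before the offset, and the second is left unchanged because it already carries a fraction.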
Point the package at the flavor once with
set_properties(), then add tables as usual — the
set_*() methods use the configured
schema_urls, so you do not need to pass
schema = to each call:
ba <- "https://raw.githubusercontent.com/camera-traps/bioacoustics/main/camtrap-dp/1.0.2/%s"
dp <- R6_CamtrapDP$new(version = "1.0.2")
dp$set_properties(
  version = "1.0.2",
  profile = sprintf(ba, "camtrap-dp-profile-acoustic.json"),
  schema_urls = list(
    deployments = sprintf(ba, "deployments-table-schema-acoustic.json"),
    media = sprintf(ba, "media-table-schema-acoustic.json"),
    observations = sprintf(ba, "observations-table-schema-acoustic.json")))
# audio timestamps carry fractional seconds to match the acoustic schema format
dp$set_media(data.frame(
  mediaID = "m1", deploymentID = "D1",
  timestamp = "2023-04-01T09:05:00.000+0900",
  filePath = "audio/m1.wav", filePublic = TRUE, fileMediatype = "audio/wav",
  samplingFrequency = 48000L, channels = 1L,
  stringsAsFactors = FALSE))

You only need a mapping for columns whose name
differs from the acoustic field. Columns that already use the
acoustic field name (deploymentID, latitude,
deploymentStart, …) are matched automatically — no mapping
needed. For deployments, the camera-trap camera* fields are
renamed to device*; the camera-only fields have no acoustic
equivalent and should be dropped; and a few acoustic-only fields can be
set if you have the data.
library(dplyr)
# camera-trap deployments -> acoustic deployments (only the renamed columns)
mapping <- c(
  cameraID = "deviceID",
  cameraModel = "deviceModel",
  cameraDelay = "deviceDelay",
  cameraHeight = "deviceHeight",
  cameraDepth = "deviceDepth",
  cameraTilt = "deviceTilt",
  cameraHeading = "deviceHeading")
# camtrap_deployments: an existing camera-trap deployments table
dep_acoustic <- camtrap_deployments %>%
  select(-any_of(c("featureType", "timestampIssues"))) # camera-only: no acoustic field
dp$set_deployments(dep_acoustic, mapping = mapping)

Field correspondence for deployments:
| Camera-trap field | Acoustic field | Action |
|---|---|---|
| deploymentID, locationID, locationName, latitude, longitude, coordinateUncertainty, deploymentStart, deploymentEnd, setupBy, detectionDistance, baitUse, habitat, deploymentGroups, deploymentTags, deploymentComments | same name | no mapping |
| cameraID / cameraModel / cameraDelay / cameraHeight / cameraDepth / cameraTilt / cameraHeading | deviceID / deviceModel / deviceDelay / deviceHeight / deviceDepth / deviceTilt / deviceHeading | map |
| featureType, timestampIssues | (none) | drop |
| (none) | elevation, devicePlatform, recordingSchedule, locationType | acoustic-only (set if available) |
For observations the only renamed field is
cameraSetupType → deviceSetupType (acoustic
also adds frequencyLow / frequencyHigh /
classificationConfirmation). For media
there are no renames, only extra fields (duration,
bitDepth, samplingFrequency,
gain, channels).
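The rename-and-drop step for deployments can also be written without dplyr. The base-R sketch below applies a mapping whose names are source columns and whose values are target fields; rename_by_mapping() and the one-row table are hypothetical, and only three of the renamed fields are shown.

```r
rename_map <- c(cameraID = "deviceID",
                cameraModel = "deviceModel",
                cameraHeight = "deviceHeight")

# Rename only the columns that appear in the mapping; leave the rest as they are.
rename_by_mapping <- function(data, mapping) {
  hit <- names(data) %in% names(mapping)
  names(data)[hit] <- unname(mapping[names(data)[hit]])
  data
}

camera_dep <- data.frame(deploymentID = "D1", cameraID = "C01",
                         cameraHeight = 1.2, featureType = "trailHiking",
                         stringsAsFactors = FALSE)

# Drop the camera-only columns first, then rename.
keep <- setdiff(names(camera_dep), c("featureType", "timestampIssues"))
dep_renamed <- rename_by_mapping(camera_dep[keep], rename_map)
```

Mapping entries for columns that are absent (cameraModel here) are simply ignored, which mirrors the statement that only columns with differing names need a mapping.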
Inspect a flavor the same way as any other schema. Note that
TableSchema$new("deployments", version = "1.0.2")
without url_template loads the
camera-trap deployments schema; pass the acoustic URL to
inspect the acoustic requirements. requirements() returns a
tidy table of every field’s type, format and constraints.
acoustic_dep <- TableSchema$new(
  "deployments", version = "1.0.2",
  url_template = sprintf(ba, "deployments-table-schema-acoustic.json"))
acoustic_dep$field_names()
acoustic_dep$required_field_names()
acoustic_dep$requirements() # field / type / format / required / enum / min / max / pattern
acoustic_dep$external_references()

Note that create_deployments(), create_media() and create_observations() are tailored to the camera-trap schema. For a different flavor (or for new columns in a future version), build the tables with the schema-driven path (ctdp_build_table() or the set_*() methods with a custom schema =) rather than the create_*() helpers.