R2camtrapdp: acoustic (audio) data

library(R2camtrapdp)

Overview

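Acoustic (audio) recordings use the bioacoustics flavor of Camtrap DP. Audio data is media-based (observationLevel = "media"): each observation refers to a media file through its mediaID. The package reads the differences from the camera-trap schema automatically from the bioacoustics schemas. The ones this example touches are the media timestamp format (fractional seconds), the recording settings carried on media (samplingFrequency, bitDepth, channels), and the frequencyLow/frequencyHigh fields on observations.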

Tip: pass datetimes as POSIXct. The package then formats each table’s datetime as its schema requires (a UTC offset such as +0900, plus .000 fractional seconds where the schema requires them). A raw string such as "2026/6/12 12:00:00" is written as-is and fails validation.
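
To make the tip concrete, the following base R sketch parses the raw notebook-style string into POSIXct and prints the two string shapes mentioned above. It does not call the package and is not its internal formatting code; the variable names (raw, x, plain, fractional) are illustrative only.

```r
# Base R only: shows the two string shapes the tip describes.
raw <- "2026/6/12 12:00:00"   # notebook-style string, not ISO 8601

# Parse once into POSIXct, stating the survey time zone explicitly.
x <- as.POSIXct(raw, format = "%Y/%m/%d %H:%M:%S", tz = "Asia/Tokyo")

# Whole seconds plus UTC offset.
plain <- format(x, "%Y-%m-%dT%H:%M:%S%z")

# Same instant with three fractional digits (%OS3).
fractional <- format(x, "%Y-%m-%dT%H:%M:%OS3%z")

print(plain)        # "2026-06-12T12:00:00+0900"
print(fractional)   # "2026-06-12T12:00:00.000+0900"
```

Stating tz explicitly matters: without it, the string is interpreted in the session's local time zone and the written offset changes with the machine.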

This example assumes you have only two field notebooks, a deployment notebook and an observation notebook, and that the observation notebook carries the audio file names from which the media table is derived.

Data

data("Adep")   # deployment field-notebook (one row per device deployment)
data("Aobs")   # observation field-notebook (one row per observation; has `filename`)
str(Adep, vec.len = 2)
#> 'data.frame':    2 obs. of  14 variables:
#>  $ deploymentID     : chr  "AC01" "AC02"
#>  $ longitude        : num  139 139
#>  $ latitude         : num  34.9 34.9
#>  $ locationID       : chr  "L01" "L02"
#>  $ startDate        : chr  "2026-06-12" "2026-06-13"
#>  $ startTime        : chr  "06:00:00" "06:00:00"
#>  $ endDate          : chr  "2026-06-20" "2026-06-21"
#>  $ endTime          : chr  "06:00:00" "06:00:00"
#>  $ deviceID         : chr  "AM1" "AM2"
#>  $ deviceModel      : chr  "AudioMoth 1.2.0" "AudioMoth 1.2.0"
#>  $ samplingFrequency: int  48000 48000
#>  $ bitDepth         : int  16 16
#>  $ channels         : int  1 1
#>  $ setupBy          : chr  "Jane Doe" "Jane Doe"
str(Aobs, vec.len = 2)
#> 'data.frame':    6 obs. of  19 variables:
#>  $ institutionCode: chr  "NIES" "NIES" ...
#>  $ collectionCode : chr  "ACO" "ACO" ...
#>  $ obsID          : chr  "1" "2" ...
#>  $ eventID        : chr  "e1" "e1" ...
#>  $ deploymentID   : chr  "AC01" "AC01" ...
#>  $ locationID     : chr  "L01" "L01" ...
#>  $ date           : chr  "2026-06-12" "2026-06-12" ...
#>  $ time           : chr  "06:00:00" "06:00:00" ...
#>  $ filename       : chr  "AM1_20260612_060000.wav" "AM1_20260612_060000.wav" ...
#>  $ duration       : num  60 60 60 60 60 ...
#>  $ object         : chr  "animal" "animal" ...
#>  $ class          : chr  "Aves" "Aves" ...
#>  $ genus          : chr  "Strix" "Cuculus" ...
#>  $ species        : chr  "uralensis" "canorus" ...
#>  $ individualCount: int  NA NA NA NA NA ...
#>  $ frequencyLow   : num  300 500 300 NA 500 ...
#>  $ frequencyHigh  : num  900 900 900 NA 900 ...
#>  $ eventStart     : chr  "2026-06-12 06:00:05" "2026-06-12 06:00:12" ...
#>  $ eventEnd       : chr  "2026-06-12 06:00:09" "2026-06-12 06:00:18" ...

Adep has deploymentID, coordinates, startDate/startTime, endDate/endTime, deviceID, deviceModel, the recording settings (samplingFrequency, bitDepth, channels) and setupBy. Aobs has deploymentID, filename, date/time, duration, the taxonomy (class/genus/species), individualCount, frequencyLow/frequencyHigh and eventStart/eventEnd. In this example data the coordinates fall in the Izu Peninsula (Japan), individualCount is NA (not counted from audio), and frequencyLow/frequencyHigh use approximate values from the literature.

1. Point the package at the bioacoustics flavor

ba <- "https://raw.githubusercontent.com/camera-traps/bioacoustics/main/camtrap-dp/1.0.2/%s"

dp <- R6_CamtrapDP$new(version = "1.0.2",
  title = "Acoustic survey example", description = "AudioMoth recordings",
  id = "https://example.org/dataset/acoustic-1")

dp$set_properties(
  version     = "1.0.2",
  profile     = sprintf(ba, "camtrap-dp-profile-acoustic.json"),
  schema_urls = list(
    deployments  = sprintf(ba, "deployments-table-schema-acoustic.json"),
    media        = sprintf(ba, "media-table-schema-acoustic.json"),
    observations = sprintf(ba, "observations-table-schema-acoustic.json")))
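
As a quick check of what is passed to schema_urls, this base R sketch expands the "%s" placeholder in ba for each schema file. The helper vector schema_files is hypothetical and exists only for this illustration; the sketch does not contact the network or call the package.

```r
# The "%s" placeholder in `ba` is filled by sprintf() with each schema file name.
ba <- "https://raw.githubusercontent.com/camera-traps/bioacoustics/main/camtrap-dp/1.0.2/%s"

schema_files <- c(
  deployments  = "deployments-table-schema-acoustic.json",
  media        = "media-table-schema-acoustic.json",
  observations = "observations-table-schema-acoustic.json")

# sprintf() is vectorised over its arguments; names are re-attached explicitly.
urls <- sprintf(ba, schema_files)
names(urls) <- names(schema_files)
schema_urls <- as.list(urls)

str(schema_urls)
```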

2. Deployments (from the deployment notebook)

Build the deployments table from Adep. Combine the date and time columns into POSIXct so the package writes the correct datetime format.

deployments <- data.frame(
  deploymentID    = Adep$deploymentID,
  latitude        = Adep$latitude,
  longitude       = Adep$longitude,
  locationID      = Adep$locationID,
  deploymentStart = as.POSIXct(paste(Adep$startDate, Adep$startTime), tz = "Asia/Tokyo"),
  deploymentEnd   = as.POSIXct(paste(Adep$endDate,   Adep$endTime),   tz = "Asia/Tokyo"),
  deviceID        = Adep$deviceID,
  deviceModel     = Adep$deviceModel,
  setupBy         = Adep$setupBy,
  stringsAsFactors = FALSE)
dp$set_deployments(deployments)
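
Before handing the table to the package, it can help to confirm that every date/time pair parsed and that each deployment ends after it starts. The sketch below uses a small stand-in data frame (dep_nb, a hypothetical name) with the same column names and values as the str(Adep) output; it relies on base R only.

```r
# Stand-in for Adep: same column names, values copied from the str() output.
dep_nb <- data.frame(
  deploymentID = c("AC01", "AC02"),
  startDate = c("2026-06-12", "2026-06-13"), startTime = c("06:00:00", "06:00:00"),
  endDate   = c("2026-06-20", "2026-06-21"), endTime   = c("06:00:00", "06:00:00"),
  stringsAsFactors = FALSE)

# With an explicit format, rows that do not match become NA and can be reported by ID.
fmt   <- "%Y-%m-%d %H:%M:%S"
start <- as.POSIXct(paste(dep_nb$startDate, dep_nb$startTime), format = fmt, tz = "Asia/Tokyo")
end   <- as.POSIXct(paste(dep_nb$endDate,   dep_nb$endTime),   format = fmt, tz = "Asia/Tokyo")

bad_parse <- dep_nb$deploymentID[is.na(start) | is.na(end)]
bad_order <- dep_nb$deploymentID[!is.na(start) & !is.na(end) & end <= start]
dup_ids   <- dep_nb$deploymentID[duplicated(dep_nb$deploymentID)]

# Deployment length in days, useful as a plausibility check.
days <- as.numeric(difftime(end, start, units = "days"))
print(data.frame(deploymentID = dep_nb$deploymentID, days = days))
```

Empty bad_parse, bad_order and dup_ids vectors mean the notebook rows are safe to pass on.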

3. Media (derived from the observation notebook’s file names)

There is no separate media notebook: build media from the unique filenames in Aobs (one row per audio file), and bring the recording settings over from Adep.

files <- Aobs[!duplicated(Aobs$filename), ]   # one row per audio file

media <- data.frame(
  mediaID       = files$filename,             # the file name is the media identifier
  deploymentID  = files$deploymentID,
  timestamp     = as.POSIXct(paste(files$date, files$time), tz = "Asia/Tokyo"),
  filePath      = file.path("audio", files$filename),
  filePublic    = TRUE,
  fileMediatype = paste0("audio/", tolower(tools::file_ext(files$filename))),  # "audio/wav"
  duration      = files$duration,
  stringsAsFactors = FALSE)

# add device recording settings (samplingFrequency / bitDepth / channels) from Adep
media <- merge(media, Adep[, c("deploymentID", "samplingFrequency", "bitDepth", "channels")],
               by = "deploymentID", all.x = TRUE)
head(media)
#>   deploymentID                 mediaID           timestamp
#> 1         AC01 AM1_20260612_060000.wav 2026-06-12 06:00:00
#> 2         AC01 AM1_20260612_063000.wav 2026-06-12 06:30:00
#> 3         AC01 AM1_20260613_060000.wav 2026-06-13 06:00:00
#> 4         AC02 AM2_20260613_060000.wav 2026-06-13 06:00:00
#>                        filePath filePublic fileMediatype duration
#> 1 audio/AM1_20260612_060000.wav       TRUE     audio/wav       60
#> 2 audio/AM1_20260612_063000.wav       TRUE     audio/wav       60
#> 3 audio/AM1_20260613_060000.wav       TRUE     audio/wav       60
#> 4 audio/AM2_20260613_060000.wav       TRUE     audio/wav       60
#>   samplingFrequency bitDepth channels
#> 1             48000       16        1
#> 2             48000       16        1
#> 3             48000       16        1
#> 4             48000       16        1
dp$set_media(media)

Because timestamp is a POSIXct, the package writes it with fractional seconds (e.g. 2026-06-12T06:00:00.000+0900), as the acoustic media schema requires.
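
Since the media timestamp comes from the notebook's date/time columns, a cross-check against the file name can catch transcription errors. This sketch assumes file names follow the pattern seen in the example data (device, then YYYYMMDD, then HHMMSS) and that the recorder clock used the same time zone as the notebook. Some recorders stamp file names in UTC, in which case the tz argument for from_name would need to change. The data frame nb is a hypothetical stand-in.

```r
# Assumption: file names look like <device>_<YYYYMMDD>_<HHMMSS>.<ext>.
nb <- data.frame(
  filename = c("AM1_20260612_060000.wav", "AM1_20260612_063000.wav"),
  date     = c("2026-06-12", "2026-06-12"),
  time     = c("06:00:00", "06:30:00"),
  stringsAsFactors = FALSE)

# Pull the date and time digits out of the file name.
stamp <- sub("^.*_([0-9]{8})_([0-9]{6})\\.[^.]+$", "\\1 \\2", nb$filename)
from_name     <- as.POSIXct(stamp, format = "%Y%m%d %H%M%S", tz = "Asia/Tokyo")
from_notebook <- as.POSIXct(paste(nb$date, nb$time),
                            format = "%Y-%m-%d %H:%M:%S", tz = "Asia/Tokyo")

# Files whose name and notebook entry disagree, or where either side failed to parse.
disagree <- is.na(from_name) | is.na(from_notebook) | from_name != from_notebook
mismatch <- nb$filename[disagree]
print(mismatch)
```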

4. Observations (from the observation notebook)

observations <- data.frame(
  observationID    = paste(Aobs$deploymentID, Aobs$eventID, Aobs$obsID, sep = "_"),
  deploymentID     = Aobs$deploymentID,
  mediaID          = Aobs$filename,           # link to media (mediaID = filename)
  eventStart       = as.POSIXct(Aobs$eventStart, tz = "Asia/Tokyo"),
  eventEnd         = as.POSIXct(Aobs$eventEnd,   tz = "Asia/Tokyo"),
  observationLevel = "media",
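  # notebook codes in `object`: "none" = blank, "hito" (Japanese for "person") = human, anything else = animal
  observationType  = ifelse(Aobs$object == "none", "blank",
                     ifelse(Aobs$object == "hito", "human", "animal")),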
  scientificName   = ifelse(is.na(Aobs$genus), Aobs$class, paste(Aobs$genus, Aobs$species)),
  count            = Aobs$individualCount,    # NA here (not counted from audio)
  frequencyLow     = Aobs$frequencyLow,
  frequencyHigh    = Aobs$frequencyHigh,
  stringsAsFactors = FALSE)
dp$set_observations(observations)
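
The two derived columns above have edge cases worth checking on real notebooks: an unexpected object code is silently treated as "animal", and a row with a genus but no species would produce a name ending in "NA". The sketch below is one possible stricter variant, not the package's or the author's method. It uses a hypothetical stand-in data frame (obs_nb) with illustrative values, and a lookup vector (type_map) that is an assumption of this example.

```r
# Stand-in rows using the same column names as Aobs (values are illustrative).
obs_nb <- data.frame(
  object  = c("animal", "animal", "animal", "hito", "none"),
  class   = c("Aves", "Aves", "Aves", NA, NA),
  genus   = c("Strix", "Cuculus", NA, NA, NA),
  species = c("uralensis", NA, NA, NA, NA),
  stringsAsFactors = FALSE)

# Named lookup instead of nested ifelse(): unknown codes become NA and are easy to spot.
type_map <- c(animal = "animal", hito = "human", none = "blank")
observationType <- unname(type_map[obs_nb$object])

# "Genus species" only when both parts exist; otherwise genus alone, then class.
has_binomial <- !is.na(obs_nb$genus) & !is.na(obs_nb$species)
scientificName <- ifelse(has_binomial,
                         paste(obs_nb$genus, obs_nb$species),
                         ifelse(!is.na(obs_nb$genus), obs_nb$genus, obs_nb$class))

print(data.frame(observationType, scientificName))
```

Rows where observationType is NA point at notebook codes that need a decision before the table is written.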

5. Metadata, relations, write, validate

dp$add_contributors(data.frame(title = "Jane Doe", role = "contact",
                               organization = "NIES", stringsAsFactors = FALSE))
dp$add_license(name = "CC0-1.0",   scope = "data")
dp$add_license(name = "CC-BY-4.0", scope = "media")
dp$set_project(title = "Acoustic survey", samplingDesign = "systematicRandom",
               captureMethod = "recordingSchedule", individualAnimals = FALSE,
               observationLevel = "media")
dp$set_st()
# dp$set_taxon()   # taxonID from GBIF/ITIS/NCBI; requires the taxadb package + internet

dp$check_relations()   # PK/FK; warns and points at datapackage$data$... if a key is missing

path <- file.path(tempdir(), "acoustic-package")
dp$out_camtrapdp(write = TRUE, directory = path)

issues <- dp$validate_frictionless(directory = path, python = "python")  # pip install frictionless
ctdp_is_valid(issues)

To validate a package that already exists on disk without overwriting it:

ctdp_validate_frictionless("path/to/existing/acoustic-package", python = "python")
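
For readers who want to see what a primary-key/foreign-key check amounts to, here is a base R approximation on minimal stand-in tables. It is a sketch of the general idea only and does not reproduce what check_relations() does internally; the helper names pk_ok and orphans are hypothetical, and the observationID values are illustrative.

```r
# Minimal stand-in tables (the real tables have more columns).
deployments <- data.frame(deploymentID = c("AC01", "AC02"), stringsAsFactors = FALSE)
media <- data.frame(
  mediaID      = c("AM1_20260612_060000.wav", "AM2_20260613_060000.wav"),
  deploymentID = c("AC01", "AC02"), stringsAsFactors = FALSE)
observations <- data.frame(
  observationID = c("AC01_e1_1", "AC01_e1_2", "AC02_e4_6"),
  deploymentID  = c("AC01", "AC01", "AC02"),
  mediaID       = c("AM1_20260612_060000.wav", "AM1_20260612_060000.wav",
                    "AM2_20260613_060000.wav"),
  stringsAsFactors = FALSE)

# Primary key: no missing values and no duplicates.
pk_ok <- function(x) !anyNA(x) && !anyDuplicated(x)

# Foreign key: returns child values that have no match in the parent key.
orphans <- function(child, parent) setdiff(child[!is.na(child)], parent)

checks <- list(
  deployments_pk  = pk_ok(deployments$deploymentID),
  media_pk        = pk_ok(media$mediaID),
  observations_pk = pk_ok(observations$observationID),
  media_fk        = orphans(media$deploymentID, deployments$deploymentID),
  obs_fk_deploy   = orphans(observations$deploymentID, deployments$deploymentID),
  obs_fk_media    = orphans(observations$mediaID, media$mediaID))
str(checks)
```

All three *_pk entries should be TRUE and all three *_fk entries should be empty character vectors.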

6. Inspecting the acoustic requirements

ba <- "https://raw.githubusercontent.com/camera-traps/bioacoustics/main/camtrap-dp/1.0.2/%s"
acoustic_media <- TableSchema$new(
  "media", version = "1.0.2",
  url_template = sprintf(ba, "media-table-schema-acoustic.json"))

acoustic_media$field_names()
acoustic_media$requirements()                 # type / format / required / enum per field
acoustic_media$field("timestamp")$format      # "%Y-%m-%dT%H:%M:%S.%f%z"
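
The %f directive in that format string is not one that base R's strptime() understands, so a quick shape check in R is easier with a regular expression. The pattern below is a conservative approximation written for this example (1 to 6 fractional digits, offset as +hhmm or -hhmm). It checks the shape of the string only, not whether the date is valid, and the validator may accept variants this pattern rejects.

```r
# Rough regular-expression counterpart of "%Y-%m-%dT%H:%M:%S.%f%z".
iso_frac <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]{1,6}[+-][0-9]{4}$"

candidates <- c(
  "2026-06-12T06:00:00.000+0900",  # fractional seconds and offset present
  "2026-06-12T06:00:00+0900",      # no fractional seconds
  "2026/6/12 12:00:00")            # raw notebook string

ok <- grepl(iso_frac, candidates)
print(data.frame(candidates, ok))
```

Only the first candidate passes, which matches the earlier advice to supply POSIXct values rather than raw strings.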