| Type: | Package |
| Title: | A Comprehensive Toolkit for Working with Encrypted Parquet Files |
| Version: | 0.1.4 |
| Description: | Utilities for reading, writing, and managing RCDF files, including encryption and decryption support. It offers a flexible interface for handling data stored in encrypted Parquet format, along with metadata extraction, key management, and secure operations using AES and RSA encryption. |
| Author: | Bhas Abdulsamad |
| Maintainer: | Bhas Abdulsamad <aeabdulsamad@gmail.com> |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Imports: | arrow, duckdb, haven, openxlsx, fs, zip, glue, utils (≥ 4.0.0), openssl (≥ 2.1.1), dplyr (≥ 1.1.0), stringr (≥ 1.4.0), jsonlite (≥ 1.8.0), DBI (≥ 1.1.0), RSQLite (≥ 2.2.0), uuid (≥ 0.1.2), lifecycle |
| Suggests: | dbplyr (≥ 2.4.0), rlang (≥ 1.0.2), testthat, cli, devtools, knitr, rmarkdown, tibble, withr, gt (≥ 0.10.0) |
| Config/testthat/edition: | 3 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 4.1.0) |
| URL: | https://yng-me.github.io/rcdf/, https://github.com/yng-me/rcdf |
| BugReports: | https://github.com/yng-me/rcdf/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-03-03 06:41:16 UTC; bhasabdulsamad |
| Repository: | CRAN |
| Date/Publication: | 2026-03-03 08:40:08 UTC |
rcdf: A Comprehensive Toolkit for Working with Encrypted Parquet Files
Description
Utilities for reading, writing, and managing RCDF files, including encryption and decryption support. It offers a flexible interface for handling data stored in encrypted Parquet format, along with metadata extraction, key management, and secure operations using AES and RSA encryption.
Author(s)
Maintainer: Bhas Abdulsamad aeabdulsamad@gmail.com (ORCID) [copyright holder]
See Also
Useful links:
https://yng-me.github.io/rcdf/
https://github.com/yng-me/rcdf
Report bugs at https://github.com/yng-me/rcdf/issues
Add metadata attributes to a data frame
Description
Adds variable labels and value labels to a data frame based on a metadata
dictionary. This is particularly useful for preparing datasets for use with
packages like haven or for exporting to formats like SPSS or Stata.
Usage
add_metadata(data, metadata, ..., set_data_types = FALSE)
Arguments
data |
A data frame containing the raw dataset. |
metadata |
A data frame that serves as a metadata dictionary. It must contain
at least the columns: |
... |
Additional arguments (currently unused). |
set_data_types |
Logical; if |
Details
The function first checks the structure of the metadata using an internal helper.
Then, for each variable listed in metadata, it:
Adds a label using the label attribute
Converts values to labelled vectors using haven::labelled() if a valueset is provided
If value labels are present, the function tries to align data types between the data and the valueset (e.g., converting character codes to integers if necessary).
Value
A tibble with the same data as data, but with added attributes:
Variable labels (via the label attribute)
Value labels (as a haven::labelled class, if applicable)
Examples
data <- data.frame(
sex = c(1, 2, 1),
age = c(23, 45, 34)
)
metadata <- data.frame(
variable_name = c("sex", "age"),
label = c("Gender", "Age in years"),
type = c("categorical", "numeric"),
valueset = I(list(
data.frame(value = c(1, 2), label = c("Male", "Female")),
NULL
))
)
labelled_data <- add_metadata(data, metadata)
str(labelled_data)
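The two labelling steps listed under Details can be sketched outside rcdf with base R and haven. This is a minimal illustration under stated assumptions, not the package's internal code: the object names (df, sex_values) are made up for the example, and only the documented haven::labelled() arguments are used.

```r
# Sketch only: mimics the two steps described in Details, not rcdf internals.
df <- data.frame(sex = c(1, 2, 1), age = c(23, 45, 34))

# Step 1: attach a variable label through the "label" attribute.
attr(df$age, "label") <- "Age in years"

# Step 2: turn coded values into a labelled vector (requires haven).
# The codes and the data are both doubles, so their types already agree.
sex_values <- c(Male = 1, Female = 2)  # hypothetical value set
df$sex <- haven::labelled(df$sex, labels = sex_values, label = "Gender")

attr(df$age, "label")
attr(df$sex, "labels")
```

If the codes were stored as character strings while the value set used numbers, one side would need converting first, which is the type alignment the Details section refers to.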
Convert to rcdf class
Description
Converts an existing list or compatible object into an object of class rcdf.
Usage
as_rcdf(data)
Arguments
data |
A list or object to be converted to class |
Value
The input object with class set to rcdf.
Examples
my_list <- list(a = 1, b = 2)
rcdf_obj <- as_rcdf(my_list)
class(rcdf_obj)
Collect
Description
Retrieves the results of a lazy data frame (for example, a database-backed table) into a local data frame.
Usage
collect(data, ...)
Arguments
data |
A lazy data frame (e.g., from dbplyr or dtplyr) backed by a database connection. |
... |
Optional arguments |
Value
A data frame
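As an illustration of what collecting a lazy table means, the sketch below builds a lazy table on an in-memory DuckDB connection and materialises it with dplyr::collect(). Whether rcdf's collect() simply re-exports the dplyr generic is an assumption; the example needs the DBI, duckdb, dplyr, and dbplyr packages.

```r
# In-memory DuckDB connection with mtcars registered as a virtual table.
conn <- DBI::dbConnect(duckdb::duckdb())
duckdb::duckdb_register(conn, "mtcars", mtcars)

# tbl() returns a lazy table; the filter is translated to SQL and
# nothing is pulled into R until collect() is called.
lazy_tbl <- dplyr::filter(dplyr::tbl(conn, "mtcars"), cyl == 6)
six_cyl <- dplyr::collect(lazy_tbl)

DBI::dbDisconnect(conn, shutdown = TRUE)
nrow(six_cyl)
```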
Generate a random password
Description
This function generates a random password of a specified length. It includes alphanumeric characters by default and can optionally include special characters.
Usage
generate_pw(length = 16, special_chr = TRUE)
Arguments
length |
Integer. The length of the password to generate. Default is |
special_chr |
Logical. Whether to include special characters
(e.g., |
Value
A character string representing the generated password.
Examples
generate_pw()
generate_pw(32)
generate_pw(12, special_chr = FALSE)
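For readers curious how such a generator can work, here is a minimal base R sketch. The helper name make_password() and the set of special characters are hypothetical, and this is not necessarily how generate_pw() is implemented.

```r
# Hypothetical helper, not the rcdf implementation.
make_password <- function(length = 16, special_chr = TRUE) {
  # Alphanumeric pool: lower case, upper case, and digits.
  pool <- c(letters, LETTERS, 0:9)
  if (special_chr) {
    # Assumed set of special characters, chosen for illustration.
    pool <- c(pool, strsplit("!@#$%^&*", "")[[1]])
  }
  # Draw characters with replacement and glue them together.
  # Note: sample() is not a cryptographically secure generator.
  paste(sample(pool, length, replace = TRUE), collapse = "")
}

make_password()
make_password(12, special_chr = FALSE)
```

For secrets that protect real data, a generator backed by a cryptographic source such as openssl::rand_bytes() would be a safer basis than sample().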
Generate RSA key pair and save to files
Description
This function generates an RSA key pair (public and private) and saves them to specified files.
Usage
generate_rsa_keys(path, ..., password = NULL, which = "public", prefix = NULL)
Arguments
path |
A character string specifying the directory path where the key files in |
... |
Additional arguments passed to the |
password |
A character string specifying the password for the private key. If |
which |
A character string specifying which key to return. Can be either |
prefix |
A character string used as a prefix for the key file names. Defaults to |
Value
A character string representing the file path of the generated key (either public or private, based on the which argument).
Examples
# Generate both public and private RSA keys and save them to the temp directory
path_to <- tempdir()
generate_rsa_keys(path = path_to, password = "securepassword")
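The same kind of key pair can be produced with the openssl package directly, which may help when inspecting what ends up on disk. This is a sketch under assumptions: the file names and the 2048-bit key size are chosen for illustration and need not match what generate_rsa_keys() uses.

```r
# Sketch using the openssl package; file names are hypothetical.
key_dir <- file.path(tempdir(), "rsa-keys")
dir.create(key_dir, showWarnings = FALSE)

key <- openssl::rsa_keygen(bits = 2048)

private_path <- file.path(key_dir, "example-private-key.pem")
public_path <- file.path(key_dir, "example-public-key.pem")

# The private key is written password-protected; the public key is not.
openssl::write_pem(key, private_path, password = "securepassword")
openssl::write_pem(key$pubkey, public_path)

# Reading the private key back requires the same password.
restored <- openssl::read_key(private_path, password = "securepassword")
```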
Get metadata attribute from RCDF data
Description
Get metadata attribute from RCDF data
Usage
get_attr(rcdf, key)
Arguments
rcdf |
RCDF data |
key |
Valid metadata key. |
Value
RCDF attribute/s or NULL
Examples
## Not run:
# Assuming `df` is a valid RCDF object
get_attr(df, "area_names")
# To get nested attributes
get_attr(df, "meta.source_note")
## End(Not run)
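The dotted key in the example above ("meta.source_note") suggests a lookup that walks a nested list one level per segment. A minimal base R sketch of that idea follows; get_nested() is a hypothetical helper written for illustration and is not part of rcdf.

```r
# Hypothetical helper: resolve a dotted key against a nested list.
get_nested <- function(x, key) {
  parts <- strsplit(key, ".", fixed = TRUE)[[1]]
  for (part in parts) {
    # Stop early when the current level is not a list or lacks the name.
    if (!is.list(x) || is.null(x[[part]])) {
      return(NULL)
    }
    x <- x[[part]]
  }
  x
}

attrs <- list(
  area_names = c("North", "South"),
  meta = list(source_note = "Survey round 1")
)

get_nested(attrs, "area_names")
get_nested(attrs, "meta.source_note")
get_nested(attrs, "meta.missing")
```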
Extract metadata from an RCDF file
Description
Retrieves a specific metadata value from a .rcdf file.
Usage
get_rcdf_metadata(path, key)
Arguments
path |
Character string. The file path to the |
key |
Character string. The metadata key to extract from the file. |
Value
The value associated with the specified metadata key, or NULL if the key does not exist.
Examples
## Not run:
# Assuming "example.rcdf" is a valid RCDF file in the working directory:
get_rcdf_metadata("example.rcdf", "log_id")
## End(Not run)
Merge multiple RCDF files
Description
Merge multiple RCDF files
Usage
merge_rcdf(
rcdf_files,
decryption_keys,
passwords,
merged_file_path,
pub_key = NULL
)
Arguments
rcdf_files |
A character vector of RCDF file paths |
decryption_keys |
Decryption keys associated with each RCDF file. Must match the length of the vector passed in the |
passwords |
Password of the associated decryption keys. Must match the length of |
merged_file_path |
File path or name of the merged RCDF file. |
pub_key |
Public key to encrypt the merged file. If |
Value
NULL (void)
Examples
## Not run:
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
pw <- '1234'
temp_dir <- tempdir()
merge_rcdf(
rcdf_files = rcdf_path,
decryption_keys = private_key,
passwords = pw,
merged_file_path = file.path(temp_dir, "merged.rcdf"),
pub_key = file.path(dir, 'sample-public-key-pw.pem')
)
unlink(file.path(temp_dir, "merged.rcdf"), force = TRUE)
## End(Not run)
Create an empty rcdf object
Description
Initializes and returns an empty rcdf object. This is a convenient constructor
for creating a new rcdf-class list structure.
Usage
rcdf_list(...)
Arguments
... |
Optional elements to include in the list. These will be passed to
the internal list constructor and included in the resulting |
Value
A list object of class rcdf.
Examples
rcdf <- rcdf_list()
class(rcdf)
Read environment variables from a file
Description
Based on https://github.com/gaborcsardi/dotenv
Usage
read_dot_env(path = ".env")
Arguments
path |
A string specifying the path to the |
Details
Reads a .env file containing environment variables in the format KEY=VALUE, and returns them as a named list.
Lines starting with # are considered comments and ignored.
Value
A named list of environment variables. Each element is a key-value pair extracted from the file. If no variables are found, NULL is returned.
Examples
## Not run:
# Assuming an `.env` file with the following content:
# DB_HOST=localhost
# DB_USER=root
# DB_PASS="secret"
env_vars <- read_dot_env(".env")
print(env_vars)
# Should output something like:
# $DB_HOST
# [1] "localhost"
# If no path is given, it defaults to `.env` in the current directory.
env_vars <- read_dot_env()
## End(Not run)
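The parsing rules described in Details (KEY=VALUE pairs, # comments ignored, NULL when nothing is found) can be sketched in base R as follows. parse_env_lines() is a hypothetical helper, and stripping surrounding quotes from values is an assumption rather than documented behaviour of read_dot_env().

```r
# Hypothetical parser for .env-style lines; not the rcdf implementation.
parse_env_lines <- function(lines) {
  lines <- trimws(lines)
  # Drop blank lines, comments, and anything without an "=".
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  lines <- lines[grepl("=", lines, fixed = TRUE)]
  if (length(lines) == 0) {
    return(NULL)
  }
  # Split on the first "=" only, so values may themselves contain "=".
  pos <- regexpr("=", lines, fixed = TRUE)
  keys <- trimws(substr(lines, 1, pos - 1))
  values <- trimws(substr(lines, pos + 1, nchar(lines)))
  # Assumption: remove one leading and one trailing quote character.
  values <- gsub("^[\"']|[\"']$", "", values)
  as.list(stats::setNames(values, keys))
}

env_file <- tempfile(fileext = ".env")
writeLines(c("# database settings", "DB_HOST=localhost", "DB_PASS=\"secret\""), env_file)
env_vars <- parse_env_lines(readLines(env_file))
unlink(env_file)
env_vars
```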
Read environment variables from a file
Description
Usage
read_env(path = ".env")
Arguments
path |
A string specifying the path to the |
Details
Reads a .env file containing environment variables in the format KEY=VALUE, and returns them as a named list.
Lines starting with # are considered comments and ignored.
Value
A named list of environment variables. Each element is a key-value pair extracted from the file. If no variables are found, NULL is returned.
Examples
## Not run:
# Assuming an `.env` file with the following content:
# DB_HOST=localhost
# DB_USER=root
# DB_PASS="secret"
env_vars <- read_env(".env")
print(env_vars)
# Should output something like:
# $DB_HOST
# [1] "localhost"
# If no path is given, it defaults to `.env` in the current directory.
env_vars <- read_env()
## End(Not run)
Read Parquet file with optional decryption
Description
This function reads a Parquet file, optionally decrypting it using the provided decryption key. If no decryption key is provided, it reads the file normally without decryption. It supports reading Parquet files as Arrow tables or regular data frames, depending on the as_arrow_table argument.
Usage
read_parquet(
path,
...,
decryption_key = NULL,
as_arrow_table = FALSE,
metadata = NULL
)
Arguments
path |
The file path to the Parquet file. |
... |
Additional arguments passed to |
decryption_key |
A list containing |
as_arrow_table |
Logical. If |
metadata |
Optional metadata (e.g., a data dictionary) to be applied to the resulting data. |
Value
An Arrow table or a data frame, depending on the value of as_arrow_table.
Examples
## Not run:
# Using sample Parquet files from `mtcars` dataset
dir <- system.file("extdata", package = "rcdf")
# Not encrypted
read_parquet(file.path(dir, "mtcars.parquet"))
# Encrypted
read_parquet(
file.path(dir, "mtcars-encrypted.parquet"),
decryption_key = 'rppqM5CuEqotys4wQq/g7xh6wpIjRozcAIbI9sagwKE='
)
## End(Not run)
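To make the difference between the two return types concrete, the sketch below does an unencrypted round trip with the arrow package alone. It does not call rcdf, and the mapping of as_arrow_table onto arrow's as_data_frame argument is an assumption made for illustration.

```r
# Plain (unencrypted) round trip using arrow directly.
path <- tempfile(fileext = ".parquet")
arrow::write_parquet(mtcars, path)

# Default: the file is materialised as a regular data frame.
as_df <- arrow::read_parquet(path)

# as_data_frame = FALSE keeps the data as an Arrow Table.
as_table <- arrow::read_parquet(path, as_data_frame = FALSE)

class(as_df)
class(as_table)
unlink(path)
```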
Read Parquet file as database
Description
This function reads a Parquet file into a DuckDB connection, decrypting it with the provided decryption key, and returns it as a lazy table. An optional table name and a subset of columns can be specified.
Usage
read_parquet_tbl(conn, file, decryption_key, table_name = NULL, columns = NULL)
Arguments
conn |
A DuckDB connection. |
file |
The file path to the Parquet file. |
decryption_key |
A list containing |
table_name |
Database table name. If |
columns |
A character vector matching the column names available in the Parquet file. |
Value
Lazy table from DuckDB connection
Examples
## Not run:
# Using sample Parquet files from `mtcars` dataset
dir <- system.file("extdata", package = "rcdf")
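# Encrypted; a DuckDB connection is required as the first argument
conn <- DBI::dbConnect(duckdb::duckdb())
read_parquet_tbl(
conn = conn,
file = file.path(dir, "mtcars-encrypted.parquet"),
decryption_key = 'rppqM5CuEqotys4wQq/g7xh6wpIjRozcAIbI9sagwKE='
)
DBI::dbDisconnect(conn, shutdown = TRUE)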
## End(Not run)
Read and decrypt RCDF data
Description
This function reads an RCDF file, decrypts its contents using the specified decryption key, and loads it into R as an RCDF object.
Usage
read_rcdf(
path,
...,
decryption_key,
password = NULL,
metadata = list(),
ignore_duplicates = TRUE,
recursive = FALSE,
return_meta = FALSE
)
Arguments
path |
A string specifying the path to the RCDF archive (zip file). If a directory is provided, all |
... |
Additional parameters passed to other functions, if needed (not yet implemented). |
decryption_key |
The key used to decrypt the RCDF. This can be an RSA or AES key, depending on how the RCDF was encrypted. |
password |
A password used for RSA decryption (optional). |
metadata |
An optional list of metadata objects containing data dictionaries, value sets, and primary key constraints used for data integrity checks (a |
ignore_duplicates |
A |
recursive |
Logical. If |
return_meta |
Logical. If |
Value
An RCDF object, which is a list of Parquet files (one for each record) along with attached metadata.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
pw <- '1234'
## Not run:
rcdf_data <- read_rcdf(
path = rcdf_path,
decryption_key = private_key,
password = pw
)
rcdf_data
## End(Not run)
Write Parquet file with optional encryption
Description
This function writes a dataset to a Parquet file. If an encryption key is provided, the data will be encrypted before writing. Otherwise, the function writes the data as a regular Parquet file without encryption.
Usage
write_parquet(data, path, ..., encryption_key = NULL)
Arguments
data |
A data frame or tibble to write to a Parquet file. |
path |
The file path where the Parquet file will be written. |
... |
Additional arguments passed to |
encryption_key |
A list containing |
Value
None. The function writes the data to a Parquet file at the specified path.
Examples
## Not run:
data <- mtcars
key <- "rppqM5CuEqotys4wQq/g7xh6wpIjRozcAIbI9sagwKE="
temp_dir <- tempdir()
rcdf::write_parquet(
data = data,
path = file.path(temp_dir, "mtcars.parquet"),
encryption_key = key
)
## End(Not run)
Write data to RCDF format
Description
This function writes data to an RCDF (Reusable Data Container Format) archive. It encrypts the data using AES, generates metadata, and then creates a zip archive containing both the encrypted Parquet files and metadata. The function supports the inclusion of metadata such as system information and encryption keys.
Usage
write_rcdf(
data,
path,
pub_key,
...,
metadata = list(),
ignore_duplicates = TRUE
)
Arguments
data |
A list of data frames or tables to be written to RCDF format. Each element of the list represents a record. |
path |
The path where the RCDF file will be written. The file will be saved with a |
pub_key |
The public RSA key used to encrypt the AES encryption keys. |
... |
Additional arguments passed to helper functions if needed. |
metadata |
A list of metadata to be included in the RCDF file. |
ignore_duplicates |
A |
Value
NULL. The function writes the data to a .rcdf file at the specified path.
Examples
## Not run:
# Example usage of writing an RCDF file
rcdf_data <- rcdf_list()
rcdf_data$mtcars <- mtcars
dir <- system.file("extdata", package = "rcdf")
temp_dir <- tempdir()
write_rcdf(
data = rcdf_data,
path = file.path(temp_dir, "mtcars.rcdf"),
pub_key = file.path(dir, 'sample-public-key.pem')
)
write_rcdf(
data = rcdf_data,
path = file.path(temp_dir, "mtcars-pw.rcdf"),
pub_key = file.path(dir, 'sample-public-key-pw.pem')
)
unlink(file.path(temp_dir, "mtcars.rcdf"), force = TRUE)
unlink(file.path(temp_dir, "mtcars-pw.rcdf"), force = TRUE)
## End(Not run)
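The description above combines two layers: the data is encrypted with AES, and the AES key is in turn encrypted with the recipient's public RSA key. A minimal sketch of that hybrid pattern with the openssl package follows. The cipher mode (CBC), the key sizes, and the use of serialize() as a stand-in for Parquet bytes are assumptions for illustration, not a description of rcdf's file layout.

```r
# Stand-ins: a fresh key pair for the recipient and raw bytes for the payload.
key <- openssl::rsa_keygen(bits = 2048)
payload <- serialize(mtcars, NULL)

# Sender: encrypt the payload with a random 256-bit AES key ...
aes_key <- openssl::rand_bytes(32)
iv <- openssl::rand_bytes(16)
ciphertext <- openssl::aes_cbc_encrypt(payload, key = aes_key, iv = iv)

# ... then wrap the AES key with the recipient's public RSA key.
wrapped_key <- openssl::rsa_encrypt(aes_key, key$pubkey)

# Recipient: unwrap the AES key with the private key, then decrypt the payload.
unwrapped <- openssl::rsa_decrypt(wrapped_key, key)
restored <- unserialize(openssl::aes_cbc_decrypt(ciphertext, key = unwrapped, iv = iv))

identical(restored, mtcars)
```

The design point is that RSA only ever handles the short AES key, while the bulk data goes through the much faster symmetric cipher.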
Write RCDF data to multiple formats
Description
Exports RCDF-formatted data to one or more supported open data formats. The function automatically dispatches to the appropriate writer function based on the formats provided.
Usage
write_rcdf_as(data, path, formats, ...)
Arguments
data |
A named list or RCDF object. Each element should be a table or tibble-like object (typically a |
path |
The target directory where output files should be saved. |
formats |
A character vector of file formats to export to. Supported formats include: |
... |
Additional arguments passed to the respective writer functions. |
Value
Invisibly returns NULL. Files are written to disk.
See Also
write_rcdf_csv, write_rcdf_tsv, write_rcdf_json, write_rcdf_xlsx, write_rcdf_dta, write_rcdf_sav, write_rcdf_sqlite
Examples
## Not run:
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key, password = '1234')
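out_dir <- file.path(tempdir(), "rcdf-export")
dir.create(out_dir, showWarnings = FALSE)
write_rcdf_as(data = rcdf_data, path = out_dir, formats = c("csv", "xlsx"))
# unlink() only removes a directory when recursive = TRUE
unlink(out_dir, recursive = TRUE)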
## End(Not run)
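Dispatching on a vector of format names, as described above, can be sketched with a small lookup table of writer functions. Everything in this example is hypothetical (the export_tables() helper, the two formats covered, and the file naming); it only illustrates the pattern and does not reproduce write_rcdf_as().

```r
# Hypothetical dispatcher; the writer table below is illustrative only.
export_tables <- function(tables, out_dir, formats) {
  writers <- list(
    csv = list(ext = ".csv", write = function(df, file) {
      utils::write.csv(df, file, row.names = FALSE)
    }),
    tsv = list(ext = ".txt", write = function(df, file) {
      utils::write.table(df, file, sep = "\t", row.names = FALSE, quote = FALSE)
    })
  )
  # Fail early on formats that have no registered writer.
  unknown <- setdiff(formats, names(writers))
  if (length(unknown) > 0) {
    stop("Unsupported format(s): ", paste(unknown, collapse = ", "))
  }
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  written <- character(0)
  for (fmt in formats) {
    for (name in names(tables)) {
      file <- file.path(out_dir, paste0(name, writers[[fmt]]$ext))
      writers[[fmt]]$write(tables[[name]], file)
      written <- c(written, file)
    }
  }
  invisible(written)
}

out_dir <- file.path(tempdir(), "export-sketch")
files <- export_tables(list(mtcars = mtcars), out_dir, c("csv", "tsv"))
files_exist <- file.exists(files)
unlink(out_dir, recursive = TRUE)
```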
Write RCDF data to CSV files
Description
Writes each table in the RCDF object as a separate .csv file.
Usage
write_rcdf_csv(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
The base output directory. |
... |
Additional arguments passed to |
parent_dir |
Optional subdirectory under |
Value
Invisibly returns NULL. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key, password = '1234')
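out_dir <- file.path(tempdir(), "rcdf-csv")
dir.create(out_dir, showWarnings = FALSE)
write_rcdf_csv(data = rcdf_data, path = out_dir)
# unlink() only removes a directory when recursive = TRUE
unlink(out_dir, recursive = TRUE)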
Write RCDF data to Stata .dta files
Description
Writes each table in the RCDF object to a .dta file for use in Stata.
Usage
write_rcdf_dta(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
Output directory for files. |
... |
Additional arguments passed to |
parent_dir |
Optional subdirectory under |
Value
Invisibly returns NULL. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key, password = '1234')
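out_dir <- file.path(tempdir(), "rcdf-dta")
dir.create(out_dir, showWarnings = FALSE)
write_rcdf_dta(data = rcdf_data, path = out_dir)
# unlink() only removes a directory when recursive = TRUE
unlink(out_dir, recursive = TRUE)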
Write RCDF data to JSON files
Description
Writes each table in the RCDF object as a separate .json file.
Usage
write_rcdf_json(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
The output directory for files. |
... |
Additional arguments passed to |
parent_dir |
Optional subdirectory under |
Value
Invisibly returns NULL. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key, password = '1234')
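out_dir <- file.path(tempdir(), "rcdf-json")
dir.create(out_dir, showWarnings = FALSE)
write_rcdf_json(data = rcdf_data, path = out_dir)
# unlink() only removes a directory when recursive = TRUE
unlink(out_dir, recursive = TRUE)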
Write RCDF data to Parquet files
Description
This function writes an RCDF object (a list of data frames) to multiple Parquet files. Each data frame in the list is written to its corresponding Parquet file in the specified path.
Usage
write_rcdf_parquet(
data,
path,
...,
parent_dir = NULL,
primary_key = NULL,
ignore_duplicates = TRUE
)
Arguments
data |
A list where each element is a data frame or tibble that will be written to a Parquet file. |
path |
The directory path where the Parquet files will be written. |
... |
Additional arguments passed to |
parent_dir |
An optional parent directory to be included in the path where the files will be written. |
primary_key |
A |
ignore_duplicates |
A |
Value
A character vector of file paths to the written Parquet files.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key, password = '1234')
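out_dir <- file.path(tempdir(), "rcdf-parquet")
dir.create(out_dir, showWarnings = FALSE)
write_rcdf_parquet(data = rcdf_data, path = out_dir)
# unlink() only removes a directory when recursive = TRUE
unlink(out_dir, recursive = TRUE)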
Write RCDF data to SPSS .sav files
Description
Writes each table in the RCDF object to a .sav file using the haven package for compatibility with SPSS.
Usage
write_rcdf_sav(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
Output directory for files. |
... |
Additional arguments passed to |
parent_dir |
Optional subdirectory under |
Value
Invisibly returns NULL. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key, password = '1234')
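out_dir <- file.path(tempdir(), "rcdf-sav")
dir.create(out_dir, showWarnings = FALSE)
write_rcdf_sav(data = rcdf_data, path = out_dir)
# unlink() only removes a directory when recursive = TRUE
unlink(out_dir, recursive = TRUE)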
Write RCDF data to a SQLite database
Description
Writes all tables in the RCDF object to a single SQLite database file.
Usage
write_rcdf_sqlite(data, path, db_name = "cbms_data", ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
Output directory for the database file. |
db_name |
Name of the SQLite database file (without extension). |
... |
Additional arguments passed to |
parent_dir |
Optional subdirectory under |
Value
Invisibly returns NULL. A .db file is written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key, password = '1234')
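out_dir <- file.path(tempdir(), "rcdf-sqlite")
dir.create(out_dir, showWarnings = FALSE)
write_rcdf_sqlite(data = rcdf_data, path = out_dir)
# unlink() only removes a directory when recursive = TRUE
unlink(out_dir, recursive = TRUE)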
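Writing several tables into one SQLite file, as this function is described to do, can be sketched with DBI and RSQLite directly. The table list and file name below are made up for the example, and the sketch makes no claim about how write_rcdf_sqlite() names tables or handles existing files.

```r
# Hypothetical input: a named list of data frames, one per table.
tables <- list(mtcars = mtcars, iris = iris)
db_path <- file.path(tempdir(), "sketch.db")

con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
for (name in names(tables)) {
  # Each list element becomes one table named after the element.
  DBI::dbWriteTable(con, name, tables[[name]], overwrite = TRUE)
}
table_names <- DBI::dbListTables(con)
row_count <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM mtcars")$n
DBI::dbDisconnect(con)
unlink(db_path)
```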
Write RCDF data to TSV files
Description
Writes each table in the RCDF object as a separate tab-separated .txt file.
Usage
write_rcdf_tsv(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
The base output directory. |
... |
Additional arguments passed to |
parent_dir |
Optional subdirectory under |
Value
Invisibly returns NULL. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key, password = '1234')
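out_dir <- file.path(tempdir(), "rcdf-tsv")
dir.create(out_dir, showWarnings = FALSE)
write_rcdf_tsv(data = rcdf_data, path = out_dir)
# unlink() only removes a directory when recursive = TRUE
unlink(out_dir, recursive = TRUE)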
Write RCDF data to Excel files
Description
Writes each table in the RCDF object as a separate .xlsx file using the openxlsx package.
Usage
write_rcdf_xlsx(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
The output directory. |
... |
Additional arguments passed to |
parent_dir |
Optional subdirectory under |
Value
Invisibly returns NULL. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key-pw.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key, password = '1234')
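out_dir <- file.path(tempdir(), "rcdf-xlsx")
dir.create(out_dir, showWarnings = FALSE)
write_rcdf_xlsx(data = rcdf_data, path = out_dir)
# unlink() only removes a directory when recursive = TRUE
unlink(out_dir, recursive = TRUE)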