How capable is this package when it comes to tuning neural networks? One of the package’s capabilities is fine-tuning the whole architecture, and this includes its depth: not only the number of hidden neurons per layer, but also the number of layers. Neural networks built with {torch} natively support a different activation function for each layer, and {kindling} supports this as well, including parametric activations such as softshrink(lambd = 0.2).

{kindling} has its own function to define a grid that includes the depth of the architecture: grid_depth(), an analogue of dials::grid_space_filling() that can also create a "regular" grid. Its n_hlayer parameter controls which depths the grid covers. It accepts a scalar (e.g. 2), an integer vector (e.g. 1:3), or the {dials} parameter function n_hlayer(). When n_hlayer is greater than 1, the hidden_neurons and activations parameters become list-columns, where each row holds a vector of per-layer values whose length matches the n_hlayer you defined.
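To make the list-column idea concrete, here is a plain base R sketch (hand-built for illustration, not actual grid_depth() output) of how a single grid row for a depth-3 network could store its per-layer values:

```r
# One hand-built grid row for a depth-3 network: hidden_neurons and
# activations are list-columns, so each cell holds a whole vector.
grid_row = data.frame(n_hlayer = 3)
grid_row$hidden_neurons = list(c(16, 24, 32))
grid_row$activations = list(c("relu", "elu", "softshrink(lambd = 0.2)"))

# The vector in each cell has one entry per hidden layer.
lengths(grid_row$hidden_neurons)
#> [1] 3
```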
We won’t stop you from using the library() function, but we strongly recommend box::use(), which explicitly imports the names you need from the namespaces you want to attach.
# library(kindling)
# library(tidymodels)
# library(modeldata)
box::use(
kindling[mlp_kindling, act_funs, args, hidden_neurons, activations, grid_depth],
dplyr[select, ends_with, mutate, slice_sample],
tidyr[drop_na],
rsample[initial_split, training, testing, vfold_cv],
recipes[
recipe, step_dummy, step_normalize,
all_nominal_predictors, all_numeric_predictors
],
modeldata[penguins],
parsnip[tune, set_mode, fit, augment],
workflows[workflow, add_recipe, add_model],
dials[learn_rate],
tune[tune_grid, show_best, collect_metrics, select_best, finalize_workflow, last_fit],
yardstick[metric_set, rmse, rsq],
ggplot2[autoplot]
)

We’ll use the penguins dataset from
{modeldata} to predict body mass (in kilograms) from
physical measurements — a straightforward regression task that lets us
focus on the tuning workflow.
{kindling} provides the mlp_kindling()
model spec. Parameters you want to search over are marked with
tune().
spec = mlp_kindling(
hidden_neurons = tune(),
activations = tune(),
epochs = 50,
learn_rate = tune()
) |>
set_mode("regression")

Note that n_hlayer is not listed here; it is handled
inside grid_depth() rather than the model spec
directly.
We sample 30 rows per species to keep the example lightweight, and
stratify splits on species to preserve class balance. The
target variable is body_mass_kg, derived from the original
body_mass_g column.
penguins_clean = penguins |>
  drop_na() |>
  select(body_mass_g, ends_with("_mm"), sex, species) |>
  # convert grams to kilograms, and drop the original column so the
  # target is not also available as a predictor
  mutate(body_mass_kg = body_mass_g / 1000, .keep = "unused") |>
  slice_sample(n = 30, by = species)
set.seed(123)
split = initial_split(penguins_clean, prop = 0.8, strata = species)
train = training(split)
test = testing(split)
folds = vfold_cv(train, v = 5, strata = body_mass_kg)
rec = recipe(body_mass_kg ~ ., data = train) |>
step_dummy(all_nominal_predictors()) |>
step_normalize(all_numeric_predictors())

You can still use standard {dials} grids, but the
limitation is that they don’t know about network depth, so
{kindling} provides grid_depth(). The
n_hlayer argument controls which depths to search over.
Remember, it accepts:
- a scalar: n_hlayer = 2
- an integer vector: n_hlayer = 1:3
- a {dials} range object: n_hlayer = n_hlayer(c(1, 3))

When n_hlayer > 1, the hidden_neurons
and activations columns become list-columns, where each row
holds a vector of per-layer values.
set.seed(42)
depth_grid = grid_depth(
hidden_neurons(c(16, 32)),
activations(c("relu", "elu", "softshrink(lambd = 0.2)")),
learn_rate(),
n_hlayer = 1:3,
size = 10,
type = "latin_hypercube"
)
depth_grid

Here we constrain hidden_neurons to the range
[16, 32] and limit activations to three candidates —
including the parametric softshrink. Latin hypercube
sampling spreads the 10 candidates more evenly across the search space
compared to a random grid.
What about the tuning step itself? Nothing special is required: parameters stored in list-columns arrive at the model wrapped as something like list(c(1, 2)), so internally the configured argument is unwrapped via list(c(1, 2))[[1]], which always yields exactly one element (the per-layer vector).
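The unwrapping step is plain base R; for instance:

```r
# A list-column cell: a length-1 list wrapping the per-layer vector
cell = list(c(1, 2))
length(cell)   # the wrapper always has exactly one element
#> [1] 1
cell[[1]]      # unwrapping recovers the depth-2 vector
#> [1] 1 2
```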
Once we’ve identified the best configuration, we finalize the workflow and fit it on the full training set.
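The sentence above presumes a tuning run; here is a minimal sketch of those steps using the objects defined earlier (the intermediate names wf, res, best, and the choice of "rmse" are illustrative, not required by the package):

```r
# Bundle the recipe and spec, tune over the depth-aware grid, then
# finalize with the best configuration and fit on the initial split.
wf = workflow() |>
  add_recipe(rec) |>
  add_model(spec)

set.seed(42)
res = tune_grid(
  wf,
  resamples = folds,
  grid = depth_grid,
  metrics = metric_set(rmse, rsq)
)

best = select_best(res, metric = "rmse")
final_wf = finalize_workflow(wf, best)
final_fit = last_fit(final_wf, split)
collect_metrics(final_fit)
```

All of these functions are already imported in the box::use() call above.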
{kindling} supports parametric activation functions,
meaning each layer’s activation can carry its own tunable parameter.
When passed as a string such as "softshrink(lambd = 0.2)",
{kindling} parses and constructs the activation
automatically. This means you can include them directly in the
activations() candidate list inside
grid_depth() without any extra setup, as shown above.
For manual (non-tuned) use, you can also specify activations per layer explicitly:
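A sketch of what that might look like, assuming mlp_kindling() accepts per-layer vectors directly, as the grid discussion suggests (the specific sizes and rate here are illustrative):

```r
# Two hidden layers with explicit sizes and activations; no tune() needed.
spec_manual = mlp_kindling(
  hidden_neurons = c(32, 16),
  activations = c("relu", "softshrink(lambd = 0.2)"),
  epochs = 50,
  learn_rate = 0.01
) |>
  set_mode("regression")
```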