GPU / Vulkan Backend

ggmlR is GPU-first: Vulkan is auto-detected at install time and used by default when available. All APIs (sequential, functional, autograd, ONNX) fall back to CPU transparently when no GPU is present.

library(ggmlR)

1. Check availability

if (ggml_vulkan_available()) {
  cat("Vulkan is available\n")
  ggml_vulkan_status()              # print device list and properties
} else {
  cat("No Vulkan GPU — running on CPU\n")
}

n <- ggml_vulkan_device_count()
cat("Vulkan device count:", n, "\n")

2. Device enumeration

# Low-level device registry (all backends including CPU)
ggml_backend_load_all()

n_dev <- ggml_backend_dev_count()
for (i in seq_len(n_dev)) {
  dev  <- ggml_backend_dev_get(i - 1L)   # 0-based
  name <- ggml_backend_dev_name(dev)
  desc <- ggml_backend_dev_description(dev)
  mem  <- ggml_backend_dev_memory(dev)
  cat(sprintf("[%d] %s — %s\n", i - 1L, name, desc))   # 0-based, matches device_index
  cat(sprintf("    %.1f GB free / %.1f GB total\n",
              mem["free"] / 1e9, mem["total"] / 1e9))
}

Full example: inst/examples/device_discovery.R


3. Autograd device selection

# Select GPU (falls back to CPU if unavailable)
device <- tryCatch({
  ag_device("gpu")
  "gpu"
}, error = function(e) {
  message("GPU not available, using CPU")
  "cpu"
})

cat("Active device:", device, "\n")

After ag_device("gpu"), all subsequent ag_param and ag_tensor calls allocate on the GPU. Switch back with ag_device("cpu").
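A minimal sketch of the switch, assuming ag_param() and ag_tensor() accept plain R matrices (check the autograd documentation for the exact signatures):

ag_device("gpu")                           # allocations below land on the GPU
w <- ag_param(matrix(rnorm(12), 3, 4))     # trainable parameter on GPU
x <- ag_tensor(matrix(rnorm(8), 4, 2))     # input tensor on GPU

ag_device("cpu")                           # later allocations go back to CPU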


4. Mixed precision (f16 / bf16)

if (device == "gpu") {
  ag_dtype("f16")     # half-precision on Vulkan GPU
  # ag_dtype("bf16") # bfloat16 — falls back to f16 on Vulkan automatically
} else {
  ag_dtype("f32")     # full precision on CPU
}

cat("Active dtype:", ag_dtype(), "\n")

bf16 → f16 fallback is automatic on Vulkan because bf16 is not natively supported in GLSL shaders.
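The fallback can be observed directly: request bf16 and read the active dtype back (a sketch, assuming ag_dtype() with no argument returns the active dtype, as shown above):

ag_dtype("bf16")
cat("Requested bf16, active dtype:", ag_dtype(), "\n")   # reports "f16" on a Vulkan GPU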


5. Vulkan memory info

if (ggml_vulkan_available()) {
  mem <- ggml_vulkan_memory(device_index = 0L)
  cat(sprintf("GPU memory: %.1f MB free / %.1f MB total\n",
              mem["free"] / 1e6, mem["total"] / 1e6))
}
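One practical use of this call is guarding a large GPU allocation behind a free-memory check; a sketch (the 512 MB threshold is illustrative, not a package default):

if (ggml_vulkan_available()) {
  mem <- ggml_vulkan_memory(device_index = 0L)
  if (mem["free"] < 512e6) {               # not enough headroom for the model
    message("Low GPU memory, using CPU instead")
    ag_device("cpu")
  }
}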

6. Multi-GPU

ggmlR supports multiple Vulkan devices via the backend scheduler. dp_train() distributes data across replicas automatically:

n_gpu <- ggml_vulkan_device_count()
cat(sprintf("Using %d GPU(s)\n", n_gpu))

# dp_train handles multi-GPU internally — see vignette("data-parallel-training")

For low-level multi-GPU scheduler usage see inst/examples/multi_gpu_example.R.
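For orientation, a data-parallel call might look like the sketch below; the argument names (epochs, n_replicas) are assumptions, not the documented signature, so consult vignette("data-parallel-training") for the real interface:

n_gpu <- ggml_vulkan_device_count()

# Hypothetical argument names, illustrative only
model <- dp_train(model, x_train, y_train,
                  epochs     = 10L,
                  n_replicas = max(1L, n_gpu))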


7. Sequential / functional API on GPU

The high-level ggml_fit() API picks up the Vulkan backend automatically — no extra configuration needed:

model <- ggml_model_sequential() |>
  ggml_layer_dense(64L, activation = "relu", input_shape = 4L) |>
  ggml_layer_dense(3L,  activation = "softmax") |>
  ggml_compile(optimizer = "adam", loss = "categorical_crossentropy")

# Training runs on GPU if Vulkan is available
model <- ggml_fit(model, x_train, y_train, epochs = 50L, batch_size = 32L)

8. ONNX inference on GPU

# Weights loaded to GPU once at load time
model_onnx <- ggml_onnx_load("model.onnx", backend = "vulkan")

# Repeated inference — weights stay on the GPU, no re-transfer per call
# (`batch` here stands for a pre-built list of input matrices)
for (i in seq_len(100L)) {
  out <- ggml_onnx_run(model_onnx, list(input = batch[[i]]))
}

9. Build configuration

Vulkan support is compiled in automatically when libvulkan-dev and glslc are detected at install time. SIMD (AVX2/AVX512) for the CPU fallback requires an explicit flag:

# Default install (Vulkan auto-detected, CPU fallback without SIMD)
R CMD INSTALL .

# With CPU SIMD acceleration
R CMD INSTALL . --configure-args="--with-simd"

To check what was compiled in:

cat(ggml_version(), "\n")
ggml_vulkan_status()   # shows "Vulkan not available" if not compiled in