llamaR: Interface for Large Language Models via 'llama.cpp'

Provides 'R' bindings to 'llama.cpp' for running Large Language Models ('LLMs') locally with optional 'Vulkan' GPU acceleration via 'ggmlR'. Supports model loading, text generation, 'tokenization', token-to-piece conversion, 'embeddings' (single and batch), encoder-decoder inference, low-level batch management, chat templates, 'LoRA' adapters, explicit backend/device selection, multi-GPU split, and 'NUMA' optimization. Includes a high-level 'ragnar'-compatible embedding provider ('embed_llamar'). Built on top of 'ggmlR' for efficient tensor operations.
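The feature list above can be illustrated with a minimal usage sketch. All function names below except `embed_llamar()` (which the description names explicitly) are hypothetical assumptions about the package's API, not confirmed exports; consult the reference manual for the actual interface:

```r
# Hypothetical sketch of llamaR usage -- loader and generator names
# are illustrative assumptions; only embed_llamar() is named in the
# package description.
library(llamaR)

# Load a local model file (path is a placeholder)
model <- llama_load_model("path/to/model.gguf")

# Generate text from a prompt
out <- llama_generate(model, "Summarize llama.cpp in one sentence.")

# High-level 'ragnar'-compatible embedding provider (named in the
# description); batch embedding of several texts
emb <- embed_llamar(c("first document", "second document"))
```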

Version: 0.2.3
Depends: R (≥ 4.1.0), ggmlR
Imports: jsonlite, utils
LinkingTo: ggmlR
Suggests: testthat (≥ 3.0.0), withr
Published: 2026-04-06
DOI: 10.32614/CRAN.package.llamaR
Author: Yuri Baramykov [aut, cre], Georgi Gerganov [cph] (Author of the 'llama.cpp' library included in src/)
Maintainer: Yuri Baramykov <lbsbmsu at mail.ru>
BugReports: https://github.com/Zabis13/llamaR/issues
License: MIT + file LICENSE
URL: https://github.com/Zabis13/llamaR
NeedsCompilation: yes
SystemRequirements: C++17, GNU make
Materials: README, NEWS
CRAN checks: llamaR results

Documentation:

Reference manual: llamaR.html, llamaR.pdf

Downloads:

Package source: llamaR_0.2.3.tar.gz
Windows binaries: r-devel: not available, r-release: not available, r-oldrel: not available
macOS binaries: r-release (arm64): not available, r-oldrel (arm64): not available, r-release (x86_64): not available, r-oldrel (x86_64): not available
Old sources: llamaR archive

Linking:

Please use the canonical form https://CRAN.R-project.org/package=llamaR to link to this page.