llm.api 0.1.4

CRAN release consolidating the 0.1.3.1–0.1.3.5 development cycle. Highlights since the on-CRAN 0.1.3:

The per-cycle detail follows.

llm.api 0.1.3.5

Refreshed default models

When no model is given, each provider now defaults to a recent, cost-appropriate, snapshot-priceable model, replacing dated defaults:

This affects chat(), agent(), and the chat_*() / chat_session_*() wrappers. Pass model = explicitly to use any other model.

llm.api 0.1.3.4

Cache-aware cost estimates

usage$cost (from chat() and agent()) now accounts for prompt caching instead of billing every input token at the full rate. Anthropic cache writes/reads are priced from Anthropic’s published multipliers (5-minute write 1.25x, 1-hour write 2x, read 0.1x of the base input rate), and OpenAI / Moonshot cache hits are priced from each model’s cached-input rate in the bundled snapshot.
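The cache-aware arithmetic can be sketched in base R. This is a minimal illustration, not the package's implementation. Three assumptions apply: `estimate_anthropic_cost()` is a hypothetical name, rates are taken as USD per million tokens, and `input_tokens` is assumed to count only uncached input.

```r
# Hypothetical helper illustrating cache-aware Anthropic cost arithmetic.
# Rates are USD per million tokens (assumption). Multipliers follow the
# ratios quoted above: 5-minute write 1.25x, 1-hour write 2x, read 0.1x.
estimate_anthropic_cost <- function(input_tokens, output_tokens,
                                    cache_write_tokens = 0,
                                    cache_read_tokens = 0,
                                    input_rate, output_rate,
                                    ttl = c("5m", "1h")) {
  ttl <- match.arg(ttl)
  write_mult <- if (ttl == "5m") 1.25 else 2
  per_token <- function(rate) rate / 1e6
  # input_tokens is assumed to exclude cached tokens, so it bills at 1x
  per_token(input_rate) * (input_tokens +
                             write_mult * cache_write_tokens +
                             0.1 * cache_read_tokens) +
    per_token(output_rate) * output_tokens
}

estimate_anthropic_cost(1000, 500, cache_read_tokens = 10000,
                        input_rate = 3, output_rate = 15)
# returns 0.0135
```

In this example the 10,000 cache-read tokens cost the same as 1,000 uncached input tokens. The rates 3 and 15 are illustrative only and are not taken from the bundled snapshot.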

New exported helpers:

agent()$usage now also carries cumulative cache_read_input_tokens and cache_creation_input_tokens so callers can inspect cache activity after a multi-turn run.
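A cumulative cache count is the per-turn count summed over the run. The sketch below shows that aggregation using hypothetical per-turn records and a hypothetical `sum_usage()` helper. It is not how agent() computes its totals internally, because agent()$usage already carries them.

```r
# Hypothetical per-turn usage records using the field names above.
turns <- list(
  list(cache_creation_input_tokens = 4000, cache_read_input_tokens = 0),
  list(cache_creation_input_tokens = 0,    cache_read_input_tokens = 4000),
  list(cache_creation_input_tokens = 0,    cache_read_input_tokens = 4000)
)

# Treat a missing field as zero tokens.
field_or_zero <- function(record, field) {
  if (is.null(record[[field]])) 0 else as.numeric(record[[field]])
}

# Sum each cache field across all turns into a named list of totals.
sum_usage <- function(turns,
                      fields = c("cache_read_input_tokens",
                                 "cache_creation_input_tokens")) {
  totals <- vapply(fields, function(f) {
    sum(vapply(turns, field_or_zero, numeric(1), field = f))
  }, numeric(1))
  as.list(totals)
}

totals <- sum_usage(turns)
```

For these three turns the totals are 8000 read tokens and 4000 creation tokens. That pattern means one cache write on the first turn followed by two reads.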

The bundled price snapshot was refreshed (2026-05-24) to carry per-model cached-input rates; base input/output rates for existing models are unchanged. Cost estimates remain offline and approximate; the prices_snapshot_date() documentation now says so explicitly and lists the source URLs.

llm.api 0.1.3.3

Fix: cache / thinking_budget_tokens silently disabled under the default provider

The Anthropic-only guards in chat() ran before provider auto-detection, comparing against the literal "auto" default. So chat(prompt, model = "claude-...", cache = "5m") tripped a spurious “Anthropic-only” warning, downgraded the opt-in, and fell through to the default provider. Detection now runs first, so the guards see the resolved provider. .validate_thinking_budget() still runs up front as provider-independent input validation. Network-free regression coverage added.
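The ordering fix can be illustrated with hypothetical stand-ins. `detect_provider()`, `resolve_cache()` and `chat_options()` are not package functions, and the prefix-based detection rule is an assumption. The sketch only shows that the guard now receives the resolved provider instead of "auto".

```r
# Hypothetical provider detection: a "claude" prefix means Anthropic.
detect_provider <- function(model, provider = "auto") {
  if (provider != "auto") return(provider)
  if (startsWith(model, "claude")) "anthropic" else "openai"
}

# Anthropic-only guard: warn and degrade to "none" elsewhere.
resolve_cache <- function(cache, provider) {
  if (cache != "none" && provider != "anthropic") {
    warning("`cache` is Anthropic-only; ignoring it.")
    return("none")
  }
  cache
}

chat_options <- function(model, cache = "none", provider = "auto") {
  provider <- detect_provider(model, provider)  # resolve the provider first
  list(provider = provider,
       cache = resolve_cache(cache, provider))  # then run the guard
}

chat_options("claude-example", cache = "5m")  # hypothetical model name
```

Under the old ordering, `resolve_cache("5m", "auto")` warns and returns "none" even for a Claude model. With the provider resolved first, the "5m" setting is kept.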

llm.api 0.1.3.2

Three additions, all backward-compatible (new parameters default to no-op behaviour) and none adding new dependencies.

Anthropic prompt caching (cache parameter)

chat(cache = c("none", "5m", "1h")) and agent(cache = c("none", "5m", "1h")). The default "none" preserves the previous behaviour; opting in wraps the system message in an ephemeral cache_control block. "5m" uses Anthropic’s default TTL; "1h" requests the longer cache window. Worth enabling when a long system prompt is reused across calls: cache reads cost ~10% of the normal input rate, but cache writes cost 25% more (5-minute TTL) or twice as much (1-hour TTL), so opt-in is the right default. Anthropic-only; warns and degrades to a no-op for other providers.
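Below is a sketch of the request-body change. It assumes the system prompt is sent as a list of text blocks and that "1h" is expressed through a `ttl` field on `cache_control`. `build_system()` is a hypothetical helper, and the package's exact payload may differ.

```r
# Hypothetical builder for the Anthropic "system" field.
# Assumption: "5m" is a plain ephemeral block (Anthropic's default TTL)
# and "1h" adds ttl = "1h".
build_system <- function(system, cache = c("none", "5m", "1h")) {
  cache <- match.arg(cache)
  if (cache == "none") return(system)  # plain string, no caching
  control <- list(type = "ephemeral")
  if (cache == "1h") control$ttl <- "1h"
  list(list(type = "text", text = system, cache_control = control))
}

build_system("You are terse.", cache = "1h")
```

The multipliers also show when caching pays off. With the 5-minute TTL, one write plus one read costs 1.25 + 0.1 = 1.35 units of base input, compared with 2 for sending the prompt uncached twice. Caching therefore wins from the first reuse. With the 1-hour TTL, one write plus one read costs 2.1 against 2, so it only wins from the second reuse (2.2 against 3).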

Anthropic extended thinking budget (thinking_budget_tokens)

chat(thinking_budget_tokens = N) and agent(thinking_budget_tokens = N). When set, sends thinking = {type: "enabled", budget_tokens: N} to the Anthropic Messages API. Validates inputs early: must be a single integer >= 1024, and (when max_tokens is set) must be strictly less than it since the budget is counted against max_tokens. Anthropic-only; warns and degrades to a no-op for other providers.
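The validation rules above can be sketched as follows. `validate_thinking_budget()` is a hypothetical standalone version; the package's internal .validate_thinking_budget() may differ in signature and messages. `thinking_field()` is also hypothetical and builds the `thinking` object described in the entry.

```r
# Hypothetical re-implementation of the documented checks.
validate_thinking_budget <- function(budget, max_tokens = NULL) {
  if (is.null(budget)) return(invisible(NULL))  # not requested
  # Must be one finite whole number of at least 1024.
  if (!is.numeric(budget) || length(budget) != 1L || !is.finite(budget) ||
      budget != round(budget) || budget < 1024) {
    stop("`thinking_budget_tokens` must be a single integer >= 1024.")
  }
  # The budget counts against max_tokens, so it must be strictly smaller.
  if (!is.null(max_tokens) && budget >= max_tokens) {
    stop("`thinking_budget_tokens` must be less than `max_tokens`.")
  }
  invisible(budget)
}

# Shape of the request field: thinking = {type: "enabled", budget_tokens: N}
thinking_field <- function(budget) {
  list(type = "enabled", budget_tokens = as.integer(budget))
}

validate_thinking_budget(2048, max_tokens = 4096)
thinking_field(2048)
```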

OpenAI max_tokens → max_completion_tokens mapping

OpenAI deprecated max_tokens in favour of max_completion_tokens, and o-series reasoning models reject max_tokens entirely. chat() and agent() now rename max_tokens to max_completion_tokens for OpenAI requests only; Moonshot and Ollama (which share the OpenAI-compatible code path) continue to receive max_tokens, since their endpoints still expect it. The rename only happens when the caller has not already passed max_completion_tokens, so an explicitly set value wins.
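Below is a minimal sketch of the provider-gated rename. It assumes request parameters travel as a named list, and `map_max_tokens()` is a hypothetical helper. Following the entry literally, the list is left untouched when the caller already supplied max_completion_tokens.

```r
# Hypothetical sketch of the OpenAI-only rename.
map_max_tokens <- function(params, provider) {
  rename <- provider == "openai" &&          # Moonshot/Ollama keep max_tokens
    !is.null(params$max_tokens) &&
    is.null(params$max_completion_tokens)    # an explicit value wins
  if (rename) {
    params$max_completion_tokens <- params$max_tokens
    params$max_tokens <- NULL                # assigning NULL drops the element
  }
  params
}

map_max_tokens(list(max_tokens = 256), "openai")
map_max_tokens(list(max_tokens = 256), "moonshot")
```

The first call returns `list(max_completion_tokens = 256)`. The second returns its input unchanged.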

llm.api 0.1.3.1

llm.api 0.1.3

llm.api 0.1.1