This release brings first-class extended thinking across providers, full Gemini 3 Pro/Flash thinking-signature support (chat + tools), a Rails upgrade path to persist it, and a tighter streaming pipeline. Plus official Ruby 4.0 support, safer model registry refreshes, a Vertex AI global endpoint fix, and a docs refresh.
🧠 Extended Thinking Everywhere
Tune reasoning depth and budget across providers with `with_thinking`, and get thinking output back when available:
```ruby
chat = RubyLLM.chat(model: "claude-opus-4.5")
              .with_thinking(effort: :high, budget: 8000)

response = chat.ask("Prove it with numbers.")
response.thinking&.text       # reasoning text, when the provider returns it
response.thinking&.signature  # opaque signature for multi-turn consistency
response.thinking_tokens      # thinking token usage
```
- `response.thinking` and `chunk.thinking` expose thinking content during normal and streaming requests.
- `response.thinking_tokens` and `response.tokens.thinking` track thinking token usage when providers report it.
Gemini 3 Pro/Flash fully support thought signatures across chat and tool calls, so multi-step sessions stay consistent.
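To see why signatures matter, here is a rough plain-Ruby sketch of round-tripping a thought signature through a tool-call turn. The hash shapes and the `build_followup_messages` helper are hypothetical, purely for illustration, not RubyLLM's internals:

```ruby
# Illustrative sketch only: a client carries the provider's opaque thought
# signature back into the next request so a multi-step tool session stays
# valid. The message hashes here are assumptions, not RubyLLM's format.
def build_followup_messages(history, tool_result)
  history.map do |msg|
    next msg unless msg[:thinking]

    # Re-send the signature exactly as the provider returned it.
    msg.merge(thinking: msg[:thinking].slice(:text, :signature))
  end + [{ role: "tool", content: tool_result }]
end

history = [
  { role: "user", content: "What's 127 * 43?" },
  { role: "assistant",
    thinking: { text: "Break into 127*40 + 127*3", signature: "sig-abc" },
    tool_calls: [{ name: "calculator", args: { expr: "127*43" } }] }
]

messages = build_followup_messages(history, "5461")
```

The key point is that the signature is opaque: the client never inspects or rewrites it, only echoes it back verbatim.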
Extended thinking quirks are now normalized across providers so you can tune one API and get predictable output.
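As a sketch of what that normalization can look like, here is a hypothetical mapper from one effort/budget setting to per-provider request parameters. The parameter names echo the providers' public APIs, but the `thinking_params` helper and the default budgets are assumptions, not RubyLLM's code:

```ruby
# Illustrative sketch of normalizing one thinking API across providers.
# Default budgets per effort level are assumptions for illustration.
EFFORT_BUDGETS = { low: 1024, medium: 4096, high: 16_384 }.freeze

def thinking_params(provider, effort:, budget: nil)
  tokens = budget || EFFORT_BUDGETS.fetch(effort)
  case provider
  when :anthropic
    { thinking: { type: "enabled", budget_tokens: tokens } }
  when :gemini
    { thinkingConfig: { thinkingBudget: tokens } }
  else
    { reasoning_effort: effort.to_s }
  end
end

thinking_params(:anthropic, effort: :high, budget: 8000)
# => { thinking: { type: "enabled", budget_tokens: 8000 } }
```

The caller tunes one `effort:`/`budget:` pair; the provider-specific shape is an implementation detail.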
Stream thinking and answer content side-by-side:

```ruby
chat = RubyLLM.chat(model: "claude-opus-4.5")
              .with_thinking(effort: :medium)

chat.ask("Solve this step by step: What is 127 * 43?") do |chunk|
  print chunk.thinking&.text
  print chunk.content
end
```
Streaming stays backward-compatible: existing apps can keep printing chunk.content, while richer UIs can also render chunk.thinking.
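A minimal sketch of that nil-safe pattern, using plain Structs to stand in for streaming chunks (the `Chunk` and `Thinking` structs here are illustrative assumptions, not the real chunk classes):

```ruby
# Simulated stream: some chunks carry thinking, others carry the answer.
Thinking = Struct.new(:text, keyword_init: true)
Chunk    = Struct.new(:thinking, :content, keyword_init: true)

chunks = [
  Chunk.new(thinking: Thinking.new(text: "First, 127 * 40 = 5080. "), content: nil),
  Chunk.new(thinking: Thinking.new(text: "Then 127 * 3 = 381."), content: nil),
  Chunk.new(thinking: nil, content: "127 * 43 = 5461")
]

reasoning = +""
answer    = +""
chunks.each do |chunk|
  reasoning << chunk.thinking&.text.to_s  # "" when a chunk has no thinking
  answer << chunk.content.to_s            # "" when a chunk is thinking-only
end
```

Existing code that only reads `chunk.content` sees empty strings for thinking-only chunks, which is why it keeps working unchanged.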
🧰 Rails + ActiveRecord Persistence
Thinking output can now be stored alongside messages (text, signature, and token usage), with an upgrade generator for existing apps: