Skip to content

Providers

Providers are connections to external AI services that power the platform's features. They're shared across all projects — you set up a provider once, and any project can use it.

Provider Types

Bonsai uses four types of external services:

LLM — Language Models

These power the AI's "brain" — text generation, intent classification, data extraction, and tool execution. The same providers can also be used for Embeddings.

API TypeProviderNotes
OpenAIOpenAIGPT-4o, GPT-4o-mini, o1, o3, etc. Native OpenAI integration.
AnthropicAnthropicClaude 3.5, Claude 3 Opus, Claude 4, etc. Native Anthropic integration.
Google GeminiGoogleGemini 2.0, Gemini 2.5, Gemini 3, etc. Native Gemini integration.
Mistral AIMistralMistral Large, Small, Codestral, etc.
GroqGroqUltra-fast inference for Llama, Mixtral, and other models.
DeepSeekDeepSeekDeepSeek-V3, DeepSeek-R1, etc.
xAI (Grok)xAIGrok models.
OpenRouterOpenRouterUnified gateway to hundreds of models from many providers.
Together AITogether AIOpen-source and proprietary models.
Fireworks AIFireworks AIFast inference for open-source models.
Perplexity AIPerplexityModels with built-in web search.
CohereCohereCommand R+ and other Cohere models.
OpenAI-compatibleAnyGeneric OpenAI-compatible API endpoint. Use for self-hosted models (vLLM, Ollama, LM Studio) or any provider not listed above.

Used by: Stages (response generation), Classifiers (intent detection), Context Transformers (data extraction), Tools (function execution).

TTS — Text-to-Speech

These convert the AI's text responses into spoken audio.

API TypeFeatures
ElevenLabsHigh-quality voices, multilingual, voice cloning
OpenAI TTSSimple, fast, several voice options
Azure SpeechWide language support, neural voices
DeepgramLow-latency streaming voices
CartesiaLow-latency, multilingual

Used by: Agents (voice configuration).

ASR — Automatic Speech Recognition

These convert user speech into text in real time.

API TypeFeatures
Azure SpeechWide language support, real-time streaming
DeepgramHigh accuracy, real-time streaming
ElevenLabsSpeech recognition capabilities
AssemblyAIUniversal streaming transcription
SpeechmaticsReal-time speech-to-text

Used by: Projects (ASR configuration for voice input).

Storage

These store conversation artifacts like audio recordings and transcripts.

API TypeDescription
Amazon S3S3 or any S3-compatible storage
Azure BlobAzure Blob Storage
Google Cloud StorageGCS buckets
LocalLocal filesystem (for development)

Used by: Projects (storage configuration for conversation artifacts).

Creating a Provider

Go to Administration > Providers and click Create Provider.

Fields

  • Name — A descriptive label (e.g., "OpenAI GPT-4o", "ElevenLabs Production").
  • Description — Optional notes.
  • Provider Type — Select the category: LLM, TTS, ASR, or Storage.
  • API Type — Select the specific service (e.g., OpenAI, Anthropic).
  • Configuration — Provider-specific connection settings (API key, base URL, etc.).
  • Tags — Optional labels for organizing providers.

Configuration

Each provider type requires specific settings. At minimum, most need:

  • API Key — Your authentication credential for the service.
  • Base URL (optional) — Override the default endpoint, useful for proxies or self-hosted instances.

The exact fields vary by provider — the form dynamically shows the relevant settings when you select the API type.

Where Providers Are Used

ResourceProvider Type
ProjectASR (speech input), Storage (artifacts)
AgentTTS (voice output)
StageLLM (response generation)
ClassifierLLM (intent classification)
Context TransformerLLM (data extraction)
ToolLLM (function execution)

LLM Settings

When you reference an LLM provider (on a stage, classifier, transformer, or tool), you also configure LLM settings that control how the model behaves:

  • Model — Which specific model to use (e.g., gpt-4o, claude-3-5-sonnet). You can pick from the provider's catalog or enter a custom model name.
  • Max Tokens — Maximum number of output tokens in the response.
  • Temperature — How creative/random the output is (0 = deterministic, higher = more creative). Disabled when reasoning/thinking is active.
  • Top P — Nucleus sampling threshold. Disabled (or limited) when reasoning/thinking is active.
  • Timeout — Request timeout in milliseconds.

Provider-specific reasoning/thinking settings:

OpenAI (o1, o3, and reasoning models):

  • Reasoning Effort — Controls the depth of internal reasoning (low, medium, high, xhigh). When set, temperature and Top P are disabled.
  • Reasoning Summary — Optionally include a summary of the model's reasoning (concise, detailed, or auto).

Anthropic (Claude extended thinking):

  • Thinking Mode — Enable Claude's extended thinking (enabled for a manual token budget, adaptive for Claude Opus 4.6+ which auto-adjusts). When active, temperature is disabled and Top P is limited to 0.95–1.0.
  • Thinking Budget Tokens — Maximum tokens for internal reasoning when using enabled mode (minimum 1024).

Google Gemini (thinking models):

  • Thinking Level — Predefined reasoning depth for Gemini 3 models (minimal, low, medium, high).
  • Thinking Budget — Token budget for Gemini 2.5 models (-1 for dynamic, 0 to disable, or a specific value from 128–32768).
  • Include Thoughts — Attach thought summaries to the response for debugging.

These settings are configured where the provider is referenced (e.g., on the stage), not on the provider itself. This way, one provider can be used with different settings in different contexts.

Security

Provider configurations contain sensitive data (API keys, connection strings). Only operators with the appropriate permissions can view or modify provider settings.

Tips

  • Name providers descriptively — Include the service and purpose: "OpenAI - GPT-4o Production" is better than "LLM Provider".
  • Have separate providers for different uses — You might use GPT-4o for response generation but GPT-4o-mini for classification (faster and cheaper).
  • Manage API keys carefully — Provider API keys give access to paid external services. Keep them secure and monitor usage.
  • Use tags for organization — If you have many providers, tags help you find them quickly.
  • Test before going live — Create a conversation in the Playground and verify the provider is working correctly before deploying to production.

Released under the Apache-2.0 License.