# Groups & Load Balancing
Load balancing lets you distribute requests across multiple AI providers, which helps control costs and avoid per-provider rate limits.
> **INFO**
> Distribution is based on the total amount of tokens consumed by the referenced models in the group.
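One way such token-based balancing could work is sketched below. The selection rule (pick the model whose consumed tokens, scaled down by its weight, is currently lowest) is an illustrative assumption, not the documented algorithm; the function and variable names are hypothetical.

```python
# Hypothetical sketch of token-based, weighted selection.
# The real balancer's algorithm may differ; this only illustrates the idea
# that distribution tracks tokens consumed per model, scaled by weight.

def pick_model(consumed: dict[str, int], weights: dict[str, int]) -> str:
    """Return the model whose weighted token consumption is lowest."""
    return min(weights, key=lambda m: consumed.get(m, 0) / weights[m])

consumed = {"openai": 900, "anthropic": 1000}
weights = {"openai": 1, "anthropic": 2}
# anthropic: 1000 / 2 = 500 is lower than openai: 900 / 1 = 900
print(pick_model(consumed, weights))  # -> anthropic
```

Under this rule, a model with weight 2 is allowed to consume roughly twice the tokens of a weight-1 model before it stops being selected.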
## Defining Load Balancing Groups
Groups are defined in your `config.toml` (they are currently not configurable anywhere else). The syntax comes in a compact and an expanded form:

```toml
# Compact form: all models share equal weight
[groups.{group-name}]
models = ["model1", "model2", ...]
```

```toml
# Expanded form: explicit weights per model
[groups.{group-name}]
models = [
  { name = "openai", weight = 1 },
  { name = "anthropic", weight = 2 },
]
```
> **INFO**
> The referenced model names are resolved to full model names the same way Model Resolution takes place, and their respective configurations are likewise resolved according to the Config Resolution rules.
## Example: Equal Distribution
The following defines a group called `balanced` that evenly distributes load across `anthropic` and `openai`:

```toml
[groups.balanced]
models = ["anthropic", "openai"]
```

### Multiple Providers

```toml
[groups.all-providers]
models = ["anthropic", "openai", "google"]
```

Each provider receives approximately 33% of the load.
## Example: Weighted Distribution
Give more requests to specific providers by assigning weights:

```toml
[groups.weighted]
models = [
  { name = "openai", weight = 1 },
  { name = "anthropic", weight = 2 },
]
```

In this example:

- `anthropic` receives about 67% of requests (weight 2 out of 3 total)
- `openai` receives about 33% of requests (weight 1 out of 3 total)
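The shares follow directly from each weight divided by the sum of all weights; a quick check:

```python
# Each provider's share is its weight over the total weight.
weights = {"openai": 1, "anthropic": 2}
total = sum(weights.values())
for name, w in weights.items():
    print(f"{name}: {w / total:.0%}")
# openai: 33%
# anthropic: 67%
```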
## Usage of Groups
A group can be referenced by using its name as the model name wherever a model name can be configured. For example, in a prompt file's front matter:

```markdown
---
model: balanced
---
Your prompt here
```