Routing
The Albert API allows you to configure one or more external API clients for each model.
These clients are defined in the configuration file (see deployment).
A single model can have multiple clients.
Example Configuration
In the example below, the turbo model is configured with two clients: an OpenAI client and a vLLM client.
The model can be called either using its ID (turbo) or using the alias defined in the aliases field (turbo-alias).
Each client calls a different model, specified by the model field.
For example, the OpenAI client calls gpt-3.5-turbo, while the vLLM client calls meta-llama/Llama-3.1-8B-Instruct.
❗️ Important:
When configuring multiple clients for a model, we strongly recommend that they are of the same type and call the same underlying model.
Otherwise, responses may have different structures.
```yaml
models:
  - id: turbo
    type: text-generation
    aliases: ['turbo-alias']
    load_balancing_strategy: least_busy
    clients:
      - model: gpt-3.5-turbo
        type: openai
        args:
          api_url: https://api.openai.com
          api_key: sk-...sA
          timeout: 60
      - model: meta-llama/Llama-3.1-8B-Instruct
        type: vllm
        args:
          api_url: http://localhost:8000
          api_key: sf...Df
          timeout: 60
```
Code Logic
When the API starts, a ModelRegistry object is initialized.
This registry contains a ModelRouter for each model defined under models in the configuration file.
Each ModelRouter contains one or more ModelClient objects, as specified in the clients list.
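The registry/router/client hierarchy can be sketched roughly as follows. This is a simplified illustration built from the example configuration above, not the actual Albert implementation; class and field names are taken from the text, everything else is assumed:

```python
from dataclasses import dataclass

@dataclass
class ModelClient:
    # One external API client: the external model ID, URL, and key.
    model: str
    api_url: str
    api_key: str

@dataclass
class ModelRouter:
    # Holds one model's configuration and its list of clients.
    id: str
    aliases: list[str]
    clients: list[ModelClient]

class ModelRegistry:
    # Built once at startup, one router per entry under `models`.
    def __init__(self, routers: list[ModelRouter]):
        self._routers = {r.id: r for r in routers}

# Mirrors the example configuration: one model, two clients.
registry = ModelRegistry([
    ModelRouter(
        id="turbo",
        aliases=["turbo-alias"],
        clients=[
            ModelClient("gpt-3.5-turbo", "https://api.openai.com", "sk-..."),
            ModelClient("meta-llama/Llama-3.1-8B-Instruct", "http://localhost:8000", "sf-..."),
        ],
    )
])
```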
ModelRegistry
ModelRegistry acts like a dictionary and allows retrieving a model by its ID or one of its aliases (see deployment).
```python
from app.utils.lifespan import models

model = models["guillaumetell-7b"]
```
If the model does not exist, the API returns an HTTP 404 error (Model not found) instead of raising a KeyError.
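A minimal sketch of that dictionary-like lookup, with IDs and aliases resolving to the same router (names are hypothetical; `HTTPException` stands in for the web framework's exception type):

```python
class HTTPException(Exception):
    # Stand-in for the framework's HTTP exception type.
    def __init__(self, status_code: int, detail: str):
        self.status_code = status_code
        self.detail = detail

class ModelRegistry:
    def __init__(self, models: dict):
        # Index each router under its ID and under every alias.
        self._index = {}
        for model_id, router in models.items():
            self._index[model_id] = router
            for alias in router.get("aliases", []):
                self._index[alias] = router

    def __getitem__(self, name: str):
        # Convert a missing key into an HTTP 404 instead of a KeyError.
        try:
            return self._index[name]
        except KeyError:
            raise HTTPException(status_code=404, detail="Model not found")

registry = ModelRegistry({"turbo": {"aliases": ["turbo-alias"]}})
assert registry["turbo-alias"] is registry["turbo"]
```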
The returned object is a ModelRouter, which contains the model’s configuration and its associated clients.
ModelRouter
The ModelRouter object stores the model configuration and its clients.
It exposes a get_client method to select a client for the model.
If multiple clients are available, the method selects one according to the configured load-balancing strategy (see deployment).
The model information corresponds to what is returned by the GET /v1/models endpoint:
- id: model identifier used by clients
- type: model type (see models)
- aliases: list of model aliases
- max_context_length: maximum input length supported by the model
```python
from app.utils.lifespan import models

model = models["guillaumetell-7b"]
client = model.get_client(endpoint="chat/completions")
```
The endpoint parameter is optional.
When provided, get_client checks that the model type is compatible with the requested endpoint.
ModelClient
ModelClient is an AsyncOpenAI-like object that handles requests to the external API.
It exposes three main attributes:
- api_url: the external API URL
- api_key: the external API key
- model: the external model ID
Several ModelClient subclasses exist, such as VllmModelClient and OpenAIModelClient.
Each defines an ENDPOINT_TABLE mapping the supported external API endpoints to Albert API endpoints.
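The per-subclass endpoint table could be sketched like this. The table contents and the `supports` helper are illustrative assumptions, not the actual Albert mappings:

```python
class ModelClient:
    # Base client: subclasses declare which Albert API endpoints they
    # support and which external paths those map to.
    ENDPOINT_TABLE: dict[str, str] = {}

    def supports(self, endpoint: str) -> bool:
        return endpoint in self.ENDPOINT_TABLE

class OpenAIModelClient(ModelClient):
    ENDPOINT_TABLE = {
        "chat/completions": "/v1/chat/completions",
        "embeddings": "/v1/embeddings",
    }

class VllmModelClient(ModelClient):
    ENDPOINT_TABLE = {
        "chat/completions": "/v1/chat/completions",
    }

assert OpenAIModelClient().supports("embeddings")
assert not VllmModelClient().supports("embeddings")
```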
Routing strategies
Shuffle
The shuffle strategy distributes requests randomly and evenly among the available clients.
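In essence, shuffle amounts to a uniform random pick per request, so load evens out over many requests. A quick sketch (client names are placeholders):

```python
import random
from collections import Counter

clients = ["client_a", "client_b"]

# Uniform random selection: each request independently picks one client.
picks = Counter(random.choice(clients) for _ in range(10_000))
# Over many requests, each client receives roughly half of the traffic.
```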