Routing

The Albert API allows you to configure one or more external API clients for each model.
These clients are defined in the configuration file (see deployment).
A single model can have multiple clients.

Example Configuration

In the example below, the turbo model is configured with two clients: an OpenAI client and a vLLM client.
The model can be called either using its ID (turbo) or using the alias defined in the aliases field (turbo-alias).

Each client calls a different model, specified by the model field.
For example, the OpenAI client calls gpt-3.5-turbo, while the vLLM client calls meta-llama/Llama-3.1-8B-Instruct.

❗️ Important:
When configuring multiple clients for a model, we strongly recommend that they are of the same type and call the same underlying model.
Otherwise, responses may have different structures.

models:
  - id: turbo
    type: text-generation
    aliases: ['turbo-alias']
    load_balancing_strategy: least_busy
    clients:
      - model: gpt-3.5-turbo
        type: openai
        args:
          api_url: https://api.openai.com
          api_key: sk-...sA
          timeout: 60
      - model: meta-llama/Llama-3.1-8B-Instruct
        type: vllm
        args:
          api_url: http://localhost:8000
          api_key: sf...Df
          timeout: 60

Code Logic

When the API starts, a ModelRegistry object is initialized. This registry contains a ModelRouter for each model defined under models in the configuration file. Each ModelRouter contains one or more ModelClient objects, as specified in the clients list.

ModelRegistry

ModelRegistry acts like a dictionary and allows retrieving a model by its ID or one of its aliases (see deployment).

from app.utils.lifespan import models

model = models["guillaumetell-7b"]

If the model does not exist, the API returns an HTTP 404 error (Model not found) instead of raising a KeyError.

The returned object is a ModelRouter, which contains the model’s configuration and its associated clients.

ModelRouter

The ModelRouter object stores the model configuration and its clients. It exposes a get_client method to select a client for the model.

If multiple clients are available, the method selects one according to the configured routing_strategy (see deployment)..

The model information corresponds to what is returned by the GET /v1/models endpoint:

id : model identifier used by clients
type : model type (see models)
aliases : list of model aliases
max_context_length : maximum input length supported by the model

from app.utils.lifespan import models

model = models["guillaumetell-7b"]

client = model.get_client(endpoint="chat/completions")

The endpoint parameter is optional. If not provided, get_client checks that the model type is compatible with the requested endpoint.

ModelClient

ModelClient is an AsyncOpenAI-like object that handles requests to the external API. It exposes three main attributes:

api_url : the external API URL
api_key : the external API key
model : the external model ID

Several ModelClient subclasses exist, such as VllmModelClient and OpenAIModelClient. Each defines an ENDPOINT_TABLE mapping the supported external API endpoints to Albert API endpoints.

Routing strategies

Shuffle

The shuffle strategy randomly distributes requests among available clients in a balanced way

Example Configuration​

Code Logic​

ModelRegistry​

ModelRouter​

ModelClient​

Routing strategies​

Shuffle​