Budget
OpenGateLLM lets you define the cost of each model in the config.yml file and then attach a budget to each user.
Model costs
For each model provider, you can define in the config.yml file the cost of prompt tokens and of completion tokens, expressed per million tokens.
The following parameters are used for cost computation:
- model_cost_prompt_tokens
- model_cost_completion_tokens
For more information, see the Configuration documentation.
Example:
models:
  [...]
  - name: my-language-model
    type: text-generation
    providers:
      - type: openai
        url: https://api.openai.com
        key: ${OPENAI_API_KEY}
        model_name: gpt-4o-mini
        model_cost_prompt_tokens: 0.1
        model_cost_completion_tokens: 0.3
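Once config.yml is parsed (for example with PyYAML's `yaml.safe_load`), the cost fields sit under each provider of each model. The sketch below walks a dict with the same shape as the example above and collects the per-million prices for each model/provider pair. The function name `collect_costs` is hypothetical and not part of OpenGateLLM, and treating a missing cost field as 0.0 is an assumption made for illustration.

```python
from typing import Any


def collect_costs(config: dict[str, Any]) -> dict[tuple[str, str], tuple[float, float]]:
    """Map (model name, provider model_name) to (prompt, completion) cost per million tokens."""
    costs: dict[tuple[str, str], tuple[float, float]] = {}
    for model in config.get("models", []):
        for provider in model.get("providers", []):
            key = (model["name"], provider["model_name"])
            # Assumption: a missing cost field is treated as 0.0 (free).
            costs[key] = (
                float(provider.get("model_cost_prompt_tokens", 0.0)),
                float(provider.get("model_cost_completion_tokens", 0.0)),
            )
    return costs


# Same shape as the example config.yml above (the [...] entries are omitted).
config = {
    "models": [
        {
            "name": "my-language-model",
            "type": "text-generation",
            "providers": [
                {
                    "type": "openai",
                    "url": "https://api.openai.com",
                    "key": "${OPENAI_API_KEY}",
                    "model_name": "gpt-4o-mini",
                    "model_cost_prompt_tokens": 0.1,
                    "model_cost_completion_tokens": 0.3,
                }
            ],
        }
    ]
}

print(collect_costs(config))
# {('my-language-model', 'gpt-4o-mini'): (0.1, 0.3)}
```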
User budget
Each user's budget is set through the create user endpoint or the update user endpoint, in the budget field. You need admin permission to create or update a user (see the Identity and access management documentation).
Create user:
curl -X POST http://localhost:8000/v1/admin/users \
  -H "Authorization: Bearer <api_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "john.doe@example.com",
    "role": 1,
    "budget": 100
  }'
Update user:
curl -X PATCH http://localhost:8000/v1/admin/users/1 \
  -H "Authorization: Bearer <api_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "budget": 100
  }'
If the budget is not defined, the user's requests are not limited by any budget.
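The same budget update can be sent from Python using only the standard library. This is a minimal sketch: the helper name `build_budget_update` is hypothetical, and the base URL and `<api_key>` placeholder are the ones from the curl examples. The request is only built here; passing it to `urllib.request.urlopen` would actually send it.

```python
import json
import urllib.request


def build_budget_update(base_url: str, api_key: str, user_id: int, budget: float) -> urllib.request.Request:
    """Build a PATCH request that sets the budget of an existing user (admin permission required)."""
    body = json.dumps({"budget": budget}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/admin/users/{user_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_budget_update("http://localhost:8000", "<api_key>", user_id=1, budget=100)
print(req.get_method(), req.full_url)  # PATCH http://localhost:8000/v1/admin/users/1
# To actually send it: urllib.request.urlopen(req)
```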
How it works
The compute cost of a request is calculated from the number of tokens used and the costs defined for the model, using the following formula:
cost = round((prompt_tokens / 1000000 * client.costs.prompt_tokens) + (completion_tokens / 1000000 * client.costs.completion_tokens), ndigits=6)
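The formula above can be written as a plain function. This sketch takes the per-million prices directly as arguments instead of reading them from a `client.costs` object, so the function name and signature are illustrative only; the arithmetic and the 6-digit rounding follow the documented formula.

```python
def compute_cost(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_cost_per_million: float,
    completion_cost_per_million: float,
) -> float:
    """Cost of one request, with prices expressed per million tokens."""
    cost = (prompt_tokens / 1_000_000 * prompt_cost_per_million) + (
        completion_tokens / 1_000_000 * completion_cost_per_million
    )
    # Same rounding as the documented formula: 6 decimal places.
    return round(cost, ndigits=6)


# With the gpt-4o-mini prices from the example config (0.1 and 0.3):
print(compute_cost(12_000, 3_000, 0.1, 0.3))  # 0.0021
```

For 12,000 prompt tokens and 3,000 completion tokens, this is 0.012 × 0.1 + 0.003 × 0.3 = 0.0012 + 0.0009 = 0.0021.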
The compute cost is returned in the response, in the usage.cost field. After the request is processed, the user's budget is updated by the hooks decorator attached to each endpoint. The request cost is also stored in the usage table; see the Usage monitoring documentation for more information.
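The budget update is performed by OpenGateLLM's own hooks decorator. The sketch below is not that code: it only illustrates the general pattern, using a hypothetical in-memory `User` record, a hypothetical `charge_budget` decorator, and an assumed rule that a request is refused once the remaining budget is zero or less. A `budget` of `None` stands for a user with no budget defined, whose requests are not limited.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Optional


@dataclass
class User:
    id: int
    budget: Optional[float] = None  # None: no budget defined, so no limit


class BudgetExceeded(Exception):
    pass


def charge_budget(endpoint: Callable[[User], dict]) -> Callable[[User], dict]:
    """Hypothetical decorator: check the budget, run the endpoint, then deduct usage.cost."""

    @wraps(endpoint)
    def wrapper(user: User) -> dict:
        # Assumption: a user whose defined budget is 0 or less is refused.
        if user.budget is not None and user.budget <= 0:
            raise BudgetExceeded(f"user {user.id} has no budget left")
        response = endpoint(user)
        if user.budget is not None:
            user.budget = round(user.budget - response["usage"]["cost"], 6)
        return response

    return wrapper


@charge_budget
def chat_completion(user: User) -> dict:
    # Stand-in for a real endpoint; the cost would come from the formula above.
    return {"usage": {"cost": 0.0021}}


user = User(id=1, budget=100)
chat_completion(user)
print(user.budget)  # 99.9979
```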