In this article, you will learn how Mistral is using Lago to build a billing system based on AI tokens. This template is suitable for Large Language Model (LLM) and Generative AI companies whose pricing can vary based on the application or model used.

Summary

1. Aggregate usage with filters: use a single metric to meter tokens with different filters.
2. Set up per-token pricing: create a plan to price packages of tokens used.
3. Prepay usage with credits: prepay usage with credits and set top-up rules in real time.
4. Ingest usage in real-time: retrieve consumed tokens in real time.

Pricing structure

For Mistral, pricing depends on the language model used. Here are several of the price points they offer; all prices are per 1M tokens used. You can think of tokens as pieces of words.

Open source models pricing

Models              Input               Output
open-mistral-7b     $0.25 / 1M tokens   $0.25 / 1M tokens
open-mixtral-8x7b   $0.70 / 1M tokens   $0.70 / 1M tokens
open-mixtral-8x22b  $2.00 / 1M tokens   $6.00 / 1M tokens

Optimized models pricing

Models          Input               Output
mistral-small   $1.00 / 1M tokens   $3.00 / 1M tokens
mistral-medium  $2.70 / 1M tokens   $8.10 / 1M tokens
mistral-large   $4.00 / 1M tokens   $12.00 / 1M tokens
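To see how these rates translate into an actual charge, here is a small worked example in Python. The token counts are hypothetical; the unit prices are the mistral-large rates from the table above.

```python
# Hypothetical request: 150,000 input tokens and 40,000 output tokens,
# priced at the mistral-large rates ($4 / 1M input, $12 / 1M output).
INPUT_PRICE_PER_M = 4.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 12.00  # USD per 1M output tokens

input_tokens = 150_000
output_tokens = 40_000

cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
     + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

print(f"Total cost: ${cost:.2f}")  # Total cost: $1.08
```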

Step 1: Aggregate usage with filters

Lago monitors usage by converting events into billable metrics. To illustrate how this works, we will take the Optimized models (mistral-small, mistral-medium, and mistral-large) as an example. Mistral's pricing relies on a single metric based on the total number of tokens processed on the platform.

To create the corresponding metric, we use the sum aggregation type, which allows us to record usage and calculate the total number of tokens used. This metric is metered, meaning that usage resets to 0 at the beginning of each billing period. For this metric, there are two dimensions that impact the price of a token:

  • Model: mistral-small, mistral-medium or mistral-large; and
  • Type: Input data or Output data.

Therefore, we propose integrating these two dimensions into our metric as filters:

  • Filter #1: Distinguishes between the various models used; and
  • Filter #2: Separates input and output types.

By implementing these filters, we can assign distinct prices to a single metric based on event properties, as in the sketch below.
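For reference, here is a minimal sketch of how such a metric could be created through Lago's API, using Python and the requests library. The endpoint and field names follow Lago's public billable-metrics API, but the exact payload (notably the filters format) can vary between Lago versions, so treat this as an illustration. The metric code `tokens_optimized` and the event property `tokens` are example names.

```python
import os
import requests

LAGO_API_URL = "https://api.getlago.com/api/v1"   # or your self-hosted instance
headers = {"Authorization": f"Bearer {os.environ['LAGO_API_KEY']}"}

# Metered metric summing the "tokens" property of each event, with two filters
# so that each model/type combination can later be priced separately.
payload = {
    "billable_metric": {
        "name": "Tokens (optimized models)",
        "code": "tokens_optimized",       # example code, pick your own
        "aggregation_type": "sum_agg",    # sum of the event property below
        "field_name": "tokens",           # event property to sum
        "recurring": False,               # metered: resets each billing period
        "filters": [
            {"key": "model", "values": ["mistral-small", "mistral-medium", "mistral-large"]},
            {"key": "type", "values": ["input", "output"]},
        ],
    }
}

response = requests.post(f"{LAGO_API_URL}/billable_metrics", json=payload, headers=headers)
response.raise_for_status()
```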

Step 2: Set up per-token pricing

When creating a new plan, the first step is to define the plan model, including billing frequency and subscription fee. Mistral pricing is ‘pay-as-you-go’, which means that there’s no subscription fee (i.e. customers only pay for what they use).

Our plan includes the 'per 1M tokens' charge, for which we choose the package pricing model. As we have defined two filters (model and type), we can set a specific price for each model/type combination. We can apply the same method to create plans for other models, such as the Embeddings or Open source models.
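As an illustration, here is a hedged sketch of what such a plan could look like via Lago's API. The plan code, the metric identifier, and especially the shape of the charges and filters payload should be checked against your Lago version's API reference; the prices are the Optimized models rates listed above.

```python
import os
import requests

LAGO_API_URL = "https://api.getlago.com/api/v1"
headers = {"Authorization": f"Bearer {os.environ['LAGO_API_KEY']}"}

def package_filter(model, data_type, price_per_million):
    """Price for one model/type combination, per package of 1M tokens."""
    return {
        "values": {"model": [model], "type": [data_type]},
        "properties": {
            "amount": price_per_million,   # USD per 1M-token package, as a string
            "package_size": 1_000_000,
            "free_units": 0,
        },
    }

payload = {
    "plan": {
        "name": "Pay as you go - Optimized models",
        "code": "payg_optimized",              # example code
        "interval": "monthly",
        "amount_cents": 0,                     # no subscription fee
        "amount_currency": "USD",
        "pay_in_advance": False,
        "charges": [
            {
                "billable_metric_id": "REPLACE_WITH_METRIC_UUID",  # id returned in Step 1
                "charge_model": "package",
                # Default price for combinations not matched by a filter below.
                "properties": {"amount": "0", "package_size": 1_000_000, "free_units": 0},
                "filters": [
                    package_filter("mistral-small", "input", "1"),
                    package_filter("mistral-small", "output", "3"),
                    package_filter("mistral-medium", "input", "2.70"),
                    package_filter("mistral-medium", "output", "8.10"),
                    package_filter("mistral-large", "input", "4"),
                    package_filter("mistral-large", "output", "12"),
                ],
            }
        ],
    }
}

requests.post(f"{LAGO_API_URL}/plans", json=payload, headers=headers).raise_for_status()
```

Our plan is now ready to use; next, let's see how Lago handles billing by ingesting usage.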

Step 3: Prepay usage with credits

Mistral, following industry practice, employs prepaid credits to facilitate payment collection in advance. Users prepay for credits corresponding to their anticipated model usage. Moreover, Lago actively monitors credit utilization in real-time and offers customizable top-up rules based on predefined thresholds or time periods.

From the user interface:

  • Create a new wallet for prepaid credits;
  • Set a ratio (e.g., 1 credit equals $1);
  • Specify the number of credits to offer or purchase; and
  • Configure recurring top-up rules based on real-time usage (threshold or interval top-ups), as in the sketch below.
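The same setup can also be scripted. Below is a sketch assuming Lago's wallets endpoint; the customer identifier and credit amounts are examples, and the field names for recurring top-up rules may differ between Lago versions, so check the API reference.

```python
import os
import requests

LAGO_API_URL = "https://api.getlago.com/api/v1"
headers = {"Authorization": f"Bearer {os.environ['LAGO_API_KEY']}"}

payload = {
    "wallet": {
        "external_customer_id": "customer-123",   # your own customer identifier
        "name": "Prepaid credits",
        "rate_amount": "1",        # ratio: 1 credit = $1
        "currency": "USD",
        "paid_credits": "100",     # credits purchased by the customer
        "granted_credits": "0",    # free credits offered, if any
        # Automatic top-up of 100 credits when the balance drops below 10.
        "recurring_transaction_rules": [
            {
                "trigger": "threshold",
                "threshold_credits": "10",
                "paid_credits": "100",
                "granted_credits": "0",
            }
        ],
    }
}

requests.post(f"{LAGO_API_URL}/wallets", json=payload, headers=headers).raise_for_status()
```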

Step 4: Ingest usage in real-time

Mistral records token usage for each model used. These activities are converted into events that are pushed to Lago. Let's take the Optimized models as an example: Lago will group events according to:

  • The billable metric code;
  • The model; and
  • The type.

For each charge, the billing system will then automatically calculate the total token usage and corresponding price. This breakdown will be displayed in the ‘Usage’ tab of the user interface and on the invoice sent to the customer.
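To make this concrete, here is a minimal sketch of pushing one such event to Lago's events endpoint. The subscription identifier and token count are examples, and the property names (`tokens`, `model`, `type`) are assumed to match the metric and filters defined in Step 1.

```python
import os
import time
import uuid
import requests

LAGO_API_URL = "https://api.getlago.com/api/v1"
headers = {"Authorization": f"Bearer {os.environ['LAGO_API_KEY']}"}

# One event per API call: 15,200 input tokens processed by mistral-large.
payload = {
    "event": {
        "transaction_id": str(uuid.uuid4()),          # unique id, used for idempotency
        "external_subscription_id": "sub-customer-123",
        "code": "tokens_optimized",                   # billable metric code from Step 1
        "timestamp": int(time.time()),
        "properties": {
            "tokens": 15200,            # summed by the metric
            "model": "mistral-large",   # filter #1
            "type": "input",            # filter #2
        },
    }
}

requests.post(f"{LAGO_API_URL}/events", json=payload, headers=headers).raise_for_status()
```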

Wrap-up

Per-token pricing offers flexibility and visibility, and allows LLM and Generative AI companies like Mistral to attract more customers. With Lago, you can create your own metric dimensions to adapt this template to your products and services.

Give it a try: click here to get started!