
Pricing AI features requires per-token billing. Historically, that meant building middleware to extract usage from every provider, normalize it across response shapes, batch events to a billing system, and survive outages. That work isn't your product, and the Lago Agent SDK removes it. You wrap your existing LLM client and keep calling it the same way: same arguments, same return shape, same exceptions. The SDK extracts normalized token usage from each response and sends events to Lago in the background. It is available in two flavors, both open source: Python and TypeScript.

How it works

                  ┌────────────────┐
your code ──────▶ │ wrapped client │ ──────▶ provider (Bedrock / Mistral / …)
                  └────────┬───────┘
                           │ (extract usage)
                           ▼
                  ┌────────────────┐
                  │  Lago events   │ ──────▶ api.getlago.com
                  └────────────────┘
  • Wraps your existing LLM client and returns a drop-in replacement. Your application code does not change.
  • Extracts usage from each response into a normalized shape (CanonicalUsage).
  • Buffers events in memory and flushes them in batches to Lago’s /events/batch endpoint.
  • Survives provider and Lago outages with exponential backoff and a bounded buffer.
  • p99 wrap overhead under 5 ms. Your LLM call is never blocked on Lago.
  • Never breaks your LLM call. Instrumentation errors are caught, logged, and optionally forwarded to your observability stack.
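The buffering behavior in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: EventBuffer and its methods are hypothetical names, the limits mirror the max_buffer_size and max_batch_size defaults from the configuration reference, and the locking a real caller-thread/worker-thread design needs is omitted. A background worker would POST each batch returned by next_batch() to /events/batch.

```python
import logging
from collections import deque
from typing import Any

log = logging.getLogger("event_buffer_sketch")


class EventBuffer:
    """Hypothetical bounded in-memory event buffer (illustration only)."""

    def __init__(self, max_buffer_size: int = 10_000, max_batch_size: int = 100) -> None:
        # A deque with maxlen discards its oldest item when appended to while full.
        self._events: deque = deque(maxlen=max_buffer_size)
        self._max_batch_size = max_batch_size
        self.dropped = 0

    def add(self, event: dict[str, Any]) -> None:
        """Enqueue without blocking, so the caller never waits on Lago."""
        if len(self._events) == self._events.maxlen:
            self.dropped += 1
            log.warning("event buffer full; dropping oldest event")
        self._events.append(event)

    def next_batch(self) -> list[dict[str, Any]]:
        """Remove and return up to max_batch_size events, oldest first."""
        batch: list[dict[str, Any]] = []
        while self._events and len(batch) < self._max_batch_size:
            batch.append(self._events.popleft())
        return batch


if __name__ == "__main__":
    buf = EventBuffer(max_buffer_size=3, max_batch_size=2)
    for i in range(5):
        buf.add({"transaction_id": f"tx_{i}"})
    print(buf.dropped)  # 2
    print([e["transaction_id"] for e in buf.next_batch()])  # ['tx_2', 'tx_3']
```

Dropping the oldest events first keeps memory bounded during a long Lago outage while preserving the most recent usage.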

Supported providers

| Provider | Access |
| --- | --- |
| AWS Bedrock | Converse (sync + stream) |
| AWS Bedrock | InvokeModel (sync + stream), 7 model families |
| Mistral | native SDK (chat.complete + chat.stream) |
| OpenAI | native SDK |
| Anthropic | native SDK |
| Google Gemini | native SDK |
| Vercel AI SDK | wrapLanguageModel middleware (JavaScript) |
| LiteLLM | callback bridge (Python) |

Quickstart

1

Install the SDK

pip install lago-agent-sdk

# Optional extras for your provider:
pip install 'lago-agent-sdk[bedrock]'   # adds boto3
pip install 'lago-agent-sdk[mistral]'   # adds mistralai
2

Initialize and wrap your LLM client

Pass your Lago API key and a default external_subscription_id, then wrap your provider client. The returned object is a drop-in replacement.
import boto3
from lago_agent_sdk import LagoSDK

sdk = LagoSDK(
    api_key="<YOUR_LAGO_API_KEY>",
    default_subscription_id="sub_acme",
)
client = sdk.wrap(boto3.client("bedrock-runtime", region_name="eu-west-1"))
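To see why no call sites change, here is a minimal sketch of a transparent wrapper. UsageProxy and its on_response callback are hypothetical names for illustration; the real wrap() dispatches to provider-specific adapters and will differ in detail.

```python
import functools
from typing import Any, Callable


class UsageProxy:
    """Hypothetical transparent wrapper (illustration only): forwards attribute
    access to the wrapped client and hands each method's result to a callback."""

    def __init__(self, target: Any, on_response: Callable[[str, Any], None]) -> None:
        self._target = target
        self._on_response = on_response

    def __getattr__(self, name: str) -> Any:
        # Invoked only for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr  # plain attributes pass straight through

        @functools.wraps(attr)  # keeps __name__, __doc__ and the visible signature
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            # Call the real method first; provider exceptions propagate unchanged.
            result = attr(*args, **kwargs)
            try:
                self._on_response(name, result)
            except Exception:
                pass  # a failing callback must never break the caller
            return result  # the exact object the unwrapped client returned

        return wrapped
```

A real wrapper also has to recurse into nested namespaces such as Mistral's client.chat, which this sketch returns unwrapped.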
3

Make LLM calls normally

The wrapped client preserves the original signature, return shape, and exceptions. No call-site changes required.
resp = client.converse(
    modelId="eu.amazon.nova-lite-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)
Streaming calls work the same way: client.converse_stream(...) in Python and ConverseStreamCommand in TypeScript both flow through the SDK, which records usage once per completed response rather than once per chunk.
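A streaming wrapper can pass every chunk through untouched and record usage only once the stream ends. The sketch below assumes the Bedrock ConverseStream event shape, where token counts arrive in a trailing metadata event; stream_with_usage and on_usage are hypothetical names, not SDK API.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional


def stream_with_usage(
    events: Iterable[Dict[str, Any]],
    on_usage: Callable[[Dict[str, int]], None],
) -> Iterator[Dict[str, Any]]:
    """Yield ConverseStream events unchanged; report usage once at the end."""
    usage: Optional[Dict[str, int]] = None
    for event in events:
        # Bedrock sends token counts in the final "metadata" event.
        if "metadata" in event:
            usage = event["metadata"].get("usage")
        yield event
    # Reached only after the consumer has drained the whole stream.
    if usage is not None:
        on_usage(usage)
```

With boto3 you would iterate stream_with_usage(response["stream"], handler) instead of response["stream"]. If the consumer stops early, the code after the loop never runs, so this sketch reports nothing for abandoned streams.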
4

Flush events on shutdown

Events flush automatically in the background. Call flush() explicitly at process exit (FastAPI shutdown hook, Express server close, AWS Lambda extension, etc.) so in-flight events are not lost.
sdk.flush()
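If your framework has no shutdown hook, the standard library covers the two common cases. This is a minimal sketch that assumes only the sdk.flush() method shown above; install_exit_flush and flushing are hypothetical helper names.

```python
import atexit
from contextlib import contextmanager
from typing import Any, Iterator


def install_exit_flush(sdk: Any) -> None:
    # Runs sdk.flush() on normal interpreter exit. atexit handlers do not run
    # when the process is killed by a signal Python does not handle, or on os._exit().
    atexit.register(sdk.flush)


@contextmanager
def flushing(sdk: Any) -> Iterator[Any]:
    # Flushes when the block exits, even if it raised: useful at the end of a
    # short-lived job or a serverless handler invocation.
    try:
        yield sdk
    finally:
        sdk.flush()
```

Usage: wrap the body of a job in `with flushing(sdk):` so its events are pushed before the job returns.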
5

Register billable metrics in Lago

Before events count toward charges, register matching billable metrics in your Lago tenant. The SDK ships with default metric codes (see Captured token dimensions below); register each one as a sum_agg metric. Follow Create a billable metric to set them up, then attach charges to them in your plan. For a full example, see the per-token pricing template.
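If you prefer scripting this step, the sketch below builds one sum_agg body per metric code, shaped for Lago's POST /api/v1/billable_metrics endpoint, and sends it with the standard library. One assumption to verify: field_name must be the event property that carries the token count, which this page does not document, so it is left as a required argument; confirm it against a real event in Developers → Events first. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, List


def billable_metric_payloads(codes: Iterable[str], field_name: str) -> List[Dict[str, Any]]:
    """One sum_agg billable metric body per metric code."""
    return [
        {
            "billable_metric": {
                "name": code,
                "code": code,
                "aggregation_type": "sum_agg",
                # ASSUMPTION: the event property that holds the token count.
                "field_name": field_name,
            }
        }
        for code in codes
    ]


def create_billable_metric(payload: Dict[str, Any], api_key: str,
                           api_url: str = "https://api.getlago.com/api/v1") -> Dict[str, Any]:
    """POST one payload; urlopen raises HTTPError on non-2xx responses."""
    request = urllib.request.Request(
        f"{api_url}/billable_metrics",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Example (extend the list with every code from the table):
# for body in billable_metric_payloads(
#     ["llm_input_tokens", "llm_output_tokens"], field_name="<TOKEN_PROPERTY>"
# ):
#     create_billable_metric(body, api_key="<YOUR_LAGO_API_KEY>")
```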

Mistral quickstart

The same wrap() call works with the native Mistral SDK:
from mistralai.client import Mistral
from lago_agent_sdk import LagoSDK

sdk = LagoSDK(api_key="...", default_subscription_id="sub_acme")
client = sdk.wrap(Mistral(api_key="..."))

resp = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Hello"}],
)
sdk.flush()

Captured token dimensions

The SDK normalizes every provider response into a 10-field CanonicalUsage object and emits one event per non-zero field. The default metric codes match Lago’s conventions. Override them in the config if your tenant already uses different names.
| Canonical field | Default Lago metric code | What it represents |
| --- | --- | --- |
| input | llm_input_tokens | Prompt tokens sent to the model |
| output | llm_output_tokens | Completion tokens generated |
| cache_read | llm_cached_input_tokens | Prompt tokens served from cache |
| cache_write | llm_cache_creation_tokens | Prompt tokens written to cache |
| cache_write_5m | llm_cache_write_5m_tokens | Cache write with 5-minute TTL |
| cache_write_1h | llm_cache_write_1h_tokens | Cache write with 1-hour TTL |
| reasoning | llm_reasoning_tokens | Reasoning / thinking tokens (when surfaced) |
| tool_calls | llm_tool_calls | Number of tool / function invocations |
| image_input | llm_image_input_tokens | Image input tokens |
| audio_input | llm_audio_input_tokens | Audio input tokens |
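The one-event-per-non-zero-field rule can be sketched like this. The dataclass mirrors the ten canonical fields above, and each event uses Lago's standard event fields (transaction_id, external_subscription_id, code, timestamp, properties). The property name "value" that carries the count is an assumption, as is everything about how the real SDK builds its events; to_events is a hypothetical helper.

```python
import time
import uuid
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class CanonicalUsage:
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write: int = 0
    cache_write_5m: int = 0
    cache_write_1h: int = 0
    reasoning: int = 0
    tool_calls: int = 0
    image_input: int = 0
    audio_input: int = 0


# Local copy of the defaults from the table above (not imported from the SDK).
DEFAULT_METRIC_CODES = {
    "input": "llm_input_tokens",
    "output": "llm_output_tokens",
    "cache_read": "llm_cached_input_tokens",
    "cache_write": "llm_cache_creation_tokens",
    "cache_write_5m": "llm_cache_write_5m_tokens",
    "cache_write_1h": "llm_cache_write_1h_tokens",
    "reasoning": "llm_reasoning_tokens",
    "tool_calls": "llm_tool_calls",
    "image_input": "llm_image_input_tokens",
    "audio_input": "llm_audio_input_tokens",
}


def to_events(usage: CanonicalUsage, subscription_id: str,
              dimensions: Optional[Dict[str, Any]] = None,
              metric_codes: Dict[str, str] = DEFAULT_METRIC_CODES) -> List[Dict[str, Any]]:
    events = []
    for f in fields(usage):
        count = getattr(usage, f.name)
        if not count:
            continue  # zero-valued fields produce no event
        events.append({
            "transaction_id": str(uuid.uuid4()),  # unique id per event
            "external_subscription_id": subscription_id,
            "code": metric_codes[f.name],
            "timestamp": int(time.time()),  # Unix seconds
            # ASSUMPTION: "value" as the property name; dimensions ride along.
            "properties": {"value": count, **(dimensions or {})},
        })
    return events
```

A call with input=12 and output=30 yields two events, llm_input_tokens and llm_output_tokens; a batch request would wrap such a list as {"events": [...]}.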

Provider coverage

Which fields each adapter populates:
| Field | Bedrock | Mistral native |
| --- | --- | --- |
| input | ✓ | ✓ |
| output | ✓ | ✓ |
| cache_read | ✓ (Anthropic on Bedrock) | ✓ (when cache hits) |
| cache_write | ✓ (Anthropic on Bedrock) | |
| cache_write_5m / cache_write_1h | ✓ (Anthropic InvokeModel) | |
| reasoning | folded into output | folded into output |
| tool_calls | | |
| image_input / audio_input | | |
Reasoning, image, and audio fields are populated by the native OpenAI, Anthropic, and Gemini adapters.
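As an illustration of what an adapter does, here is a hypothetical extractor for a Bedrock Converse response. It reads the usage block (inputTokens, outputTokens, and the cacheReadInputTokens / cacheWriteInputTokens fields when prompt caching applies) and counts toolUse content blocks, returning counts keyed by canonical field names. It is a sketch, not the SDK's Bedrock adapter.

```python
from typing import Any, Dict


def extract_converse_usage(response: Dict[str, Any]) -> Dict[str, int]:
    """Map a Bedrock Converse response onto canonical field names."""
    usage = response.get("usage", {})
    content = response.get("output", {}).get("message", {}).get("content", [])
    counts = {
        "input": usage.get("inputTokens", 0),
        "output": usage.get("outputTokens", 0),
        # Present only when prompt caching applies; default to 0 otherwise.
        "cache_read": usage.get("cacheReadInputTokens", 0),
        "cache_write": usage.get("cacheWriteInputTokens", 0),
        # Each tool invocation appears as a content block with a "toolUse" key.
        "tool_calls": sum(1 for block in content if "toolUse" in block),
    }
    # Keep only populated fields, since zero fields produce no event.
    return {field: n for field, n in counts.items() if n}
```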

Multi-tenant: pick a subscription per call

Lago needs to know which customer to bill for each LLM call. The SDK resolves the external_subscription_id in this priority order:
  1. Per-call override: highest precedence, attached to the individual request.
  2. Context-bound: set once per request handler. Propagates safely across async boundaries.
  3. Default at init: fallback if nothing else is set.
# 1. Per-call override (highest precedence)
client.converse(
    modelId="eu.amazon.nova-lite-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
    extra_lago={
        "subscription": "sub_acme",
        "dimensions": {"feature": "summarize", "user_id": "u_42"},
    },
)

# 2. Context-bound. Propagates across asyncio tasks via contextvars.
sdk.set_subscription("sub_acme")
# all calls in this thread/asyncio task → sub_acme

# 3. Default at init (fallback)
sdk = LagoSDK(api_key="...", default_subscription_id="sub_default")
Any dimensions you pass are forwarded to the event as properties, which is useful for filter-based charges (model, feature, region, etc.). See Charges with filters.
If subscription resolution returns nothing (no per-call override, no context, no default), the event is dropped and an ERROR is logged. Make sure at least one of the three levels is set before your first call.
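The three-level lookup maps naturally onto Python's contextvars. The sketch below is a hypothetical model of that resolution order, not the SDK's code; SubscriptionResolver, resolve, and handle_request are illustrative names, though set_subscription echoes the call shown above.

```python
import asyncio
import contextvars
import logging
from typing import List, Optional

log = logging.getLogger("subscription_sketch")

# One context-local slot; each asyncio task runs in its own copy of the context.
_current_subscription: "contextvars.ContextVar[Optional[str]]" = contextvars.ContextVar(
    "current_subscription", default=None
)


class SubscriptionResolver:
    """Hypothetical model of the per-call > context > default lookup."""

    def __init__(self, default_subscription_id: Optional[str] = None) -> None:
        self.default_subscription_id = default_subscription_id

    def set_subscription(self, subscription_id: str) -> None:
        # Affects the current context and tasks created from it afterwards.
        _current_subscription.set(subscription_id)

    def resolve(self, per_call: Optional[str] = None) -> Optional[str]:
        subscription = per_call or _current_subscription.get() or self.default_subscription_id
        if subscription is None:
            log.error("no external_subscription_id resolved; dropping event")
        return subscription


async def handle_request(resolver: SubscriptionResolver, subscription_id: str) -> Optional[str]:
    resolver.set_subscription(subscription_id)
    await asyncio.sleep(0)  # yield so concurrent handlers interleave
    return resolver.resolve()


async def demo() -> List[Optional[str]]:
    resolver = SubscriptionResolver(default_subscription_id="sub_default")
    # gather() wraps each coroutine in a Task with its own context copy,
    # so neither handler sees the other's subscription.
    return await asyncio.gather(
        handle_request(resolver, "sub_a"), handle_request(resolver, "sub_b")
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['sub_a', 'sub_b']
```

Because each task works on a copy of the context, a subscription set inside one request handler never leaks into concurrent handlers or back into the caller.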

Configuration reference

Both SDKs expose the same configuration surface with idiomatic naming.
| Python (LagoConfig) | TypeScript (LagoConfig) | Default (Python / TypeScript) | Purpose |
| --- | --- | --- | --- |
| api_key | apiKey | (required) | Your Lago API key. |
| api_url | apiUrl | https://api.getlago.com/api/v1 | Override for the EU region or self-hosted instances. |
| default_subscription_id | defaultSubscriptionId | None / null | Fallback external_subscription_id when none is set per call or in context. |
| metric_codes | metricCodes | DEFAULT_METRIC_CODES | Maps each canonical field to your billable metric code. |
| flush_interval_seconds | flushIntervalMs | 1.0 / 1000 | How often the background worker flushes the buffer. |
| max_batch_size | maxBatchSize | 100 | Maximum events per request to /events/batch. |
| max_buffer_size | maxBufferSize | 10_000 | In-memory cap. When it is exceeded, the oldest events are dropped with a warning. |
| request_timeout_seconds | requestTimeoutMs | 10.0 / 10_000 | HTTP timeout per batch request. |
| max_retry_seconds | maxRetryMs | 60.0 / 60_000 | Upper bound on the exponential backoff delay between retries. |
| on_error | onError | None / undefined | Callback for instrumentation failures. Wire it to Sentry, Datadog, etc. |
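To make max_retry_seconds concrete: with exponential backoff, the wait doubles after each failed attempt until it reaches the cap. The 1-second base delay below is an assumption (this page does not state it), production retry loops usually add random jitter, and retry_delays is a hypothetical helper.

```python
from typing import List


def retry_delays(attempts: int, base_seconds: float = 1.0,
                 max_retry_seconds: float = 60.0) -> List[float]:
    """Capped exponential backoff: base * 2**n, never above the cap."""
    return [min(max_retry_seconds, base_seconds * 2 ** n) for n in range(attempts)]


print(retry_delays(8))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Under these assumptions, the default cap of 60 seconds is reached on the seventh attempt, after which every retry waits the full 60 seconds.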

Custom metric codes

If your Lago tenant already uses different metric codes, override them at init time:
from lago_agent_sdk import LagoConfig, LagoSDK

sdk = LagoSDK(
    api_key="...",
    config=LagoConfig(
        api_key="...",
        metric_codes={
            "input": "ai_input_tokens",
            "output": "ai_output_tokens",
        },
    ),
)
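This page does not say whether a partial metric_codes map is merged with the defaults or replaces them. To make the outcome explicit either way, you can pass a complete map built from the defaults in the Captured token dimensions table plus your overrides. This is a minimal sketch; build_metric_codes is a hypothetical helper, and DEFAULT_METRIC_CODES here is a local copy of the table, not an import from the SDK.

```python
from typing import Dict

# Local copy of the defaults from the "Captured token dimensions" table.
DEFAULT_METRIC_CODES = {
    "input": "llm_input_tokens",
    "output": "llm_output_tokens",
    "cache_read": "llm_cached_input_tokens",
    "cache_write": "llm_cache_creation_tokens",
    "cache_write_5m": "llm_cache_write_5m_tokens",
    "cache_write_1h": "llm_cache_write_1h_tokens",
    "reasoning": "llm_reasoning_tokens",
    "tool_calls": "llm_tool_calls",
    "image_input": "llm_image_input_tokens",
    "audio_input": "llm_audio_input_tokens",
}


def build_metric_codes(overrides: Dict[str, str]) -> Dict[str, str]:
    """Return a full field-to-code map: table defaults plus your overrides."""
    unknown = set(overrides) - set(DEFAULT_METRIC_CODES)
    if unknown:
        # Fail fast on typos such as "inputs" instead of "input".
        raise ValueError(f"unknown canonical fields: {sorted(unknown)}")
    return {**DEFAULT_METRIC_CODES, **overrides}


if __name__ == "__main__":
    codes = build_metric_codes({"input": "ai_input_tokens", "output": "ai_output_tokens"})
    print(codes["input"], codes["cache_read"])  # ai_input_tokens llm_cached_input_tokens
```

Pass the result as metric_codes in LagoConfig, as in the example above.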

Error handling

The SDK never breaks your LLM call. If instrumentation fails (adapter bug, Lago unreachable, network error), the SDK catches the error, logs a warning, and your call returns normally.
Forward instrumentation errors to your observability stack with the on_error / onError hook:
import sentry_sdk
from lago_agent_sdk import LagoConfig, LagoSDK

def on_error(exc: Exception, where: str) -> None:
    sentry_sdk.capture_exception(exc, tags={"sdk_phase": where})

sdk = LagoSDK(
    api_key="...",
    config=LagoConfig(api_key="...", on_error=on_error),
)
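The never-break guarantee comes down to running instrumentation inside a guard that logs, forwards to the error hook, and swallows everything, including failures of the hook itself. Here is a hypothetical sketch of such a guard; run_guarded and the where labels are illustrative, not the SDK's actual phase names.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("instrumentation_sketch")

OnError = Callable[[Exception, str], None]


def run_guarded(step: Callable[[], None], where: str,
                on_error: Optional[OnError] = None) -> None:
    """Run one instrumentation step; never let its failure reach the caller."""
    try:
        step()
    except Exception as exc:  # KeyboardInterrupt/SystemExit still propagate
        log.warning("instrumentation failed during %s: %r", where, exc)
        if on_error is not None:
            try:
                on_error(exc, where)
            except Exception:
                # A broken error hook must not break the LLM call either.
                log.exception("on_error callback raised")
```

Catching Exception rather than BaseException keeps Ctrl-C and interpreter shutdown working while still isolating adapter bugs and network errors.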

Exception hierarchy

Both SDKs export the same error classes for callers that want to handle SDK errors explicitly:
  • LagoSDKError: base class for every SDK-raised error.
  • LagoApiError: non-2xx response from Lago.
  • LagoConfigError: invalid configuration at init time.
  • UnknownClientError: wrap() was called on a client the SDK does not recognize.

Verify the integration

1

Make a wrapped LLM call

Run one end-to-end request through the wrapped client, then call flush() explicitly to push the event immediately.
2

Check the event in Lago

In the Lago dashboard, open Developers → Events and confirm an event appears with the expected metric code and properties.
3

Confirm usage on the customer

Open the customer’s usage view and confirm the metric counter increased.
4

If nothing arrives

Check the on_error / onError callback first: by design, instrumentation failures never surface in your application code, only in logs and in that callback. The most common causes are an unregistered metric code, a missing external_subscription_id, or an API key without write access.
