Skip to main content

04 – Analytics

Raw credit transactions are a stream of individual events — user X deducted Y credits at time Z. That is hard to read at a glance. ducto's analytics queries aggregate these events into meaningful summaries: total spend per user, breakdown by model, daily trends, and overall statistics. These queries are the foundation for customer-facing dashboards, internal cost analysis, and anomaly detection.

ducto provides five built-in analytics methods through every CreditStore implementation. spend_by_user() groups deductions by user and returns each user's total spend and transaction count. spend_by_model() breaks down credits consumed by AI model name. top_users() returns the highest-spending users, sorted by total consumption. daily_spend() groups transactions by calendar date to reveal trends over time. Finally, aggregate_stats() computes a single summary row with total credits, active user count, average daily spend, top model, and top user.

These queries are designed to be efficient against Postgres backends but work equally well with in-memory stores for testing and development. Together they form a complete analytics toolkit that answers the most common questions any platform operator needs: who is spending, what are they spending on, and how does spending evolve over time.

In this notebook we will seed a realistic dataset and then run each analytics query to see what it reveals. By the end you will understand how to extract actionable insights from raw credit event data using ducto's analytics layer.

Setup

Before we can query analytics, we need a running PostgresStore instance with a seeded database. The start_postgres_store() helper handles the connection lifecycle for us, initializing the schema and returning a ready-to-use store object.

The setup also imports the utilities we need: uuid for generating user identifiers, random for simulating realistic transaction amounts and model choices, and timezone for working with UTC timestamps. These are standard library modules — no additional dependencies required.

from datetime import datetime, timedelta
from ducto.interface.postgres import PostgresStore
from ducto.manager import CreditManager
from ducto.engine import PricingEngine
from ducto.metrics import UsageMetrics, ToolCall
from ducto.interface.models import (
PricingConfigData, PlanDefinition,
CreditMetadata,
)
from shared import start_postgres_store, cleanup

store, pgdata = start_postgres_store()
import uuid, random
from datetime import timezone
print("✔ PostgresStore ready.")

Seed sample data

Analytics queries are only useful when there is data to analyze. In this section we generate a realistic dataset: three users with a large initial credit balance, each performing one random transaction per day for seven days.

Why seven days and three users? This size is large enough to produce meaningful aggregation patterns — you will see variation in spend levels, model preferences, and daily activity — but small enough that every transaction remains interpretable. The randomness ensures that each run produces slightly different results, which is useful for understanding how the aggregation queries behave across different data distributions.

We assign each transaction to a random model from a pool of three: gpt-4o, claude-sonnet-4, and claude-haiku-3.5. This diversity is important because the spend_by_model query breaks down spend by model, and we want to see non-trivial results with multiple models contributing to the totals. In this section we generate the simulated transaction data that will feed all subsequent analytics queries in this notebook.

# Create 3 unique user identifiers for our sample dataset
users = [str(uuid.uuid4()) for _ in range(3)]

# Capture the current UTC time as the reference point for all transactions
now = datetime.now(timezone.utc)

# For each user, deposit a starting balance and simulate 7 days of activity
for u in users:
# Give each user a large initial balance of 100,000 credits
store.add_credits(u, 100_000, type="adjustment")
# Loop over each day in the 7-day sample window
for day_offset in range(7):
# Random transaction amount between 100 and 2,000 credits
amount = random.randint(100, 2_000)
# Step 1 of two-phase commit: reserve credits before deducting
res = store.reserve_credits(u, amount, operation_type="inference")
# Randomly pick which AI model this transaction is attributed to
model = random.choice(["gpt-4o", "claude-sonnet-4", "claude-haiku-3.5"])
# Step 2 of two-phase commit: deduct the reserved credits with model metadata
store.deduct_credits(u, res.reservation_id, amount, metadata=CreditMetadata(model=model))

print(f"Data seeding complete: 3 users with 7 days of randomized AI model transactions")

Spend by user (last 30 days)

The most basic question in any credit system is "who spent what?" The spend_by_user() method answers this by aggregating all deductions for each user within a time window and returning the total spend along with the transaction count.

The total_spend field tells you the raw credit consumption per user. The transaction_count field tells you how many individual operations contributed to that total. Together, these two numbers help distinguish between a user who spent a lot in a few expensive operations versus a user who spent a lot through many small transactions — two very different usage patterns that may warrant different responses.

In this section we query the last 30 days of activity across all three seeded users. Since our sample data only covers 7 days, all seeded transactions will fall within this window.

# Define a 30-day lookback window ending at the current UTC time
from datetime import timedelta, timezone
end = datetime.now(timezone.utc)
start = end - timedelta(days=30)

# Query the store for total spend grouped by user within this time window
rows = store.spend_by_user(start, end)

# Display each user's abbreviated ID, total credits consumed, and transaction count
for r in rows:
print(f" {r.user_id[:8]}{r.total_spend:>7} ({r.transaction_count} txns)")

Spend by model

Knowing who spent the most is useful, but in an AI platform the more important question is often "which models are driving costs?" The spend_by_model() method breaks down total credit consumption by model name, giving you a clear picture of where your infrastructure budget is going.

This query is essential for cost optimization: if one model accounts for 80 percent of spend, you might consider optimizing prompts, switching to a cheaper model for certain tasks, or implementing caching strategies. It also helps with capacity planning and billing analysis at the per-model level.

The output shows three columns: the model identifier, the total credits consumed, and the number of transactions using that model. A high-spend model with few transactions suggests expensive individual calls, while high spend with many transactions suggests high-volume usage at a moderate per-call cost. In this section we run spend_by_model() against our seeded data and examine which models consumed the most credits.

# Query total spend broken down by AI model
rows = store.spend_by_model(start, end)

# Print a formatted table header with columns for model, spend, and transactions
print(f"{'Model':<20} {'Spend':>7} {'Txns':>5}")

# Display each model's total spend and transaction count
for r in rows:
print(f" {r.model:<20} {r.total_spend:>7} {r.transaction_count:>5}")

Top users

The top_users() method is the admin "leaderboard" — it returns the highest-spending users within a time window, ordered by total spend descending. This is the first place you would look when investigating unusual billing activity, identifying your most valuable customers, or auditing for potential abuse.

We request the top 5 users, but since our sample only has 3 users, all of them will appear in the results. The method is designed to scale to thousands or millions of users, returning only the most significant contributors.

This query pairs naturally with spend_by_user: use top_users for the headline view and spend_by_user when you need every user's data for export or detailed analysis. In this section we retrieve and display the highest-spending users from our sample data.

# Retrieve the top 5 highest-spending users in the last 30 days
rows = store.top_users(limit=5, start=start, end=end)

# Print a formatted table header
print(f"{'User':<10} {'Spend':>7}")

# Display each user's abbreviated ID and total spend
for r in rows:
print(f" {r.user_id[:8]}{r.total_spend:>7}")

Daily spend

While per-user and per-model queries tell you "who" and "what," the daily_spend() query tells you "when." It groups transactions by calendar date within the time window, revealing usage patterns over time such as weekday spikes, weekend dips, or sustained growth trends.

A sudden spike in daily spend could indicate a popular new feature, a marketing campaign driving adoption, or a misconfigured application burning through credits in a loop. A gradual decline could signal churn or seasonal variation. For SaaS platforms, daily spend trends are a health metric that deserves a place on every operations dashboard.

In this section we query daily_spend() and examine the per-day totals. With 7 days of sample data you should see 7 rows with varying amounts, reflecting the random transaction sizes we seeded.

# Query total spend grouped by calendar date
rows = store.daily_spend(start, end)

# Print a formatted table header with columns for date, spend, and transactions
print(f"{'Date':<12} {'Spend':>7} {'Txns':>5}")

# Display each day's date, total credits consumed, and transaction count
for r in rows:
print(f"{r.date:<12} {r.total_spend:>7} {r.transaction_count:>5}")

Aggregate stats

The aggregate_stats() method is the executive summary — a single result object that distills the entire time window into five key numbers: total credits consumed, number of active users, average daily spend, the top-spending model, and the top-spending user.

This is the method you would call to populate a billing dashboard's header card. It provides an instant overview without requiring you to run multiple queries and compute the summaries yourself.

The returned AggregateStatsRow object contains all five fields as named attributes. In a real application you would feed these into a charting library, a notification system, or a periodic report sent to stakeholders. In this section we call aggregate_stats() and display the final summary numbers for our sample dataset.

# Compute a single summary row over the entire 30-day window
stats = store.aggregate_stats(start, end)

# Display each aggregate metric with a descriptive label
print(f" Total credits consumed: {stats.total_credits_consumed}")
print(f" Number of active users: {stats.active_users}")
print(f" Average daily spend: {stats.avg_daily_spend}")
print(f" Most expensive model: {stats.top_model}")
print(f" Highest spending user: {stats.top_user}")
cleanup(pgdata)