09 -- Expression Evaluator

Pricing formulas in ducto are plain strings: input_tokens * 5 + output_tokens * 15. But executing arbitrary strings as code is dangerous -- that is how injection attacks happen. A naive approach would use eval() to turn a string into a number, but eval() can execute any Python expression, including calls to __import__ (to import the os module), open() (to read files), or globals() (to inspect runtime state). This is like giving a stranger the keys to every room in your house.
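
To make the risk concrete, here is a small standalone illustration (not ducto code) of what eval() does with a "formula" that is not arithmetic at all. The string below is a harmless stand-in for an attack: it only reads the current working directory, but the same mechanism could run any command.

```python
# eval() executes any Python expression, not just arithmetic.
# This string pretends to be a pricing formula but actually imports
# the os module and calls one of its functions.
malicious_formula = "__import__('os').getcwd()"

# Never do this with untrusted input: the string runs with the full
# privileges of the current process.
leaked = eval(malicious_formula)
print(type(leaked).__name__)  # a str holding the working directory path
```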

ducto's evaluate_expression uses Python's ast module, not eval(). It parses the expression into an abstract syntax tree, then validates every node against an explicit whitelist. Only allowed operations pass through. Think of it like airport security: every item is inspected before it gets on the plane. If a passenger tries to bring a banned item, security stops it before it reaches the gate. Similarly, if an expression contains an operation that is not on the whitelist, the evaluator raises a ValueError before any code is executed.
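
The general technique can be sketched in a few lines. The following is a minimal illustration of an AST whitelist evaluator, not ducto's actual implementation: the name safe_eval, the operator table, and the three allowed functions are assumptions chosen to keep the example short. It checks each node as it walks the tree, and anything that is not explicitly allowed raises a ValueError instead of being executed.

```python
import ast
import operator

# Hypothetical whitelist for illustration; ducto's real tables may differ.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_FUNCS = {"min": min, "max": max, "abs": abs}


def safe_eval(expr, variables):
    """Evaluate expr by walking its AST; reject anything not whitelisted."""
    tree = ast.parse(expr, mode="eval")

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*(visit(arg) for arg in node.args))
        # Attribute access, lambda, unknown names, unknown calls, etc.
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return visit(tree)


print(safe_eval("input_tokens * 5 + output_tokens * 15",
                {"input_tokens": 500, "output_tokens": 200}))  # 5500
```

Because the only callables ever invoked are the ones in the whitelist tables, a string such as "__import__('os').system('ls')" is rejected at the Call node and nothing in it runs.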

The whitelist includes standard arithmetic operators (addition, subtraction, multiplication, division, floor division, modulo, exponentiation), comparison operators (<, >, <=, >=, ==, !=), Boolean operators (and, or, not), and a curated set of function names: min, max, if, tier, clamp, percentile, ceil, floor, round, sum, abs. Everything else -- __import__, exec, eval, open, lambda, attribute access, method calls on objects -- is blocked.

The expression variables map directly to usage metrics that ducto collects during inference: input_tokens and output_tokens for token-based pricing, cache_read_tokens and cache_write_tokens for cache discounts, tool_calls for tool usage pricing, search_queries and search_results for search/RAG operations, web_search_calls and code_exec_calls for agentic workflows, and fixed_job for flat-rate operations.

This design has a practical benefit: because pricing formulas are stored as strings in the database (in the credit_pricing_config table), you can update pricing without deploying new code. Change a formula in the database, and the next request uses the new price. Combined with the AST safety guarantees, this gives you both agility and security.
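
As a rough sketch of that workflow, the example below uses an in-memory SQLite database. Only the table name credit_pricing_config comes from the text above; the column names (model, formula), the load_formula helper, and the replacement formula are assumptions made for illustration.

```python
import sqlite3

# Assumed schema: one formula string per model. The real table layout
# is not shown in this document.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credit_pricing_config (model TEXT PRIMARY KEY, formula TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO credit_pricing_config VALUES (?, ?)",
    ("gpt-4o", "input_tokens * 5 + output_tokens * 15"),
)


def load_formula(conn, model):
    """Fetch the current formula string for a model (hypothetical helper)."""
    row = conn.execute(
        "SELECT formula FROM credit_pricing_config WHERE model = ?", (model,)
    ).fetchone()
    if row is None:
        raise KeyError(model)
    return row[0]


before = load_formula(conn, "gpt-4o")

# A pricing change is a data update, not a code deployment.
conn.execute(
    "UPDATE credit_pricing_config SET formula = ? WHERE model = ?",
    ("input_tokens * 4 + output_tokens * 12", "gpt-4o"),
)
after = load_formula(conn, "gpt-4o")

print(before)
print(after)
```

Each request would read the formula at pricing time and pass it to the evaluator, so the second lookup already sees the new price.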

What we will do in this section: explore every function available in the expression evaluator -- arithmetic, min/max for floors and caps, conditional logic, volume discounts, bounds clamping, rounding, percentile statistics, and combined expressions -- and verify that dangerous operations are blocked.

Basic arithmetic

The simplest use of the expression evaluator is basic arithmetic. This is the foundation for all pricing formulas: multiply token counts by per-token rates and add them up. The arithmetic supports standard operators: +, -, *, /, //, %, **.

In a real pricing config, each model gets its own formula string. For example, GPT-4o might cost 5 credits per input token plus 15 credits per output token, while Claude Haiku costs 1 and 4 credits respectively. The expression evaluator turns these strings into concrete numbers.

What we will do in this section: calculate the cost of a GPT-4o inference with 500 input tokens and 200 output tokens using a simple arithmetic expression.

# Step 1: Import the evaluate_expression function from ducto's
# expression evaluation module. This function takes a formula string
# and a dictionary of variable values, and returns the computed result.
from ducto.expr import evaluate_expression

# Step 2: Calculate the credit cost for a GPT-4o inference.
# The formula "input_tokens * 5 + output_tokens * 15" means:
# - Each input token costs 5 credits
# - Each output token costs 15 credits
# With 500 input tokens and 200 output tokens:
# 500 * 5 = 2 500 for input
# 200 * 15 = 3 000 for output
# Total = 5 500 credits
r = evaluate_expression("input_tokens * 5 + output_tokens * 15",
                        {"input_tokens": 500, "output_tokens": 200})
print(f" 500 input tokens * 5 + 200 output tokens * 15 = {r} credits (expected: 5500)")
assert r == 5500

Minimum charge with max

The max(expression, minimum) function enforces a minimum charge per request. This is commonly known as a floor price. Even if a request is tiny -- for example, a single token response -- the minimum charge ensures the provider recovers at least their base cost.

Real-world use case: "Charge at least 1 credit per request, even for tiny responses." Without a floor, a 10-token response at 0.003 credits per token would cost 0.03 credits, which may be below the overhead cost of processing the request. The max function guarantees the price never falls below a specified minimum.

The function takes two arguments: the expression to evaluate and the minimum floor value. If the expression evaluates to less than the floor, the floor is returned instead. This is equivalent to max(computed_price, minimum_price).

What we will do in this section: test the max function with 100 input tokens (where the computed cost is below the minimum) and 500 input tokens (where it exceeds the minimum).

# Use max(expression, minimum) to guarantee a minimum charge.
# First test: 100 input tokens at 0.003 credits each = 0.3 credits.
# Since 0.3 is below the minimum of 1 credit, max() returns 1.
r = evaluate_expression("max(input_tokens * 0.003, 1)",
                        {"input_tokens": 100})
print(f" max(100 * 0.003, 1) = {r} (expected: 1) -- below floor, minimum applies")

# Second test: 500 input tokens at 0.003 credits each = 1.5 credits.
# Since 1.5 is above the minimum of 1 credit, max() returns 1.5.
r = evaluate_expression("max(input_tokens * 0.003, 1)",
                        {"input_tokens": 500})
print(f" max(500 * 0.003, 1) = {r} (expected: 1.5) -- above floor, computed value returned")

Price cap with min

The min(expression, cap) function enforces a maximum charge per request. This is a price ceiling. No matter how large the request, the price never exceeds the cap. This protects users from unexpectedly large bills due to unusually long responses or runaway loops.

Real-world use case: "Cap the model cost at 10 credits per request, even for very long outputs." Without a cap, a 10 000-token response would cost 200 credits at 0.02 credits per token. With a cap of 10 credits, the user pays at most 10 credits regardless of output length.

The function takes two arguments: the expression to evaluate and the maximum cap value. If the expression exceeds the cap, the cap is returned instead. This is equivalent to min(computed_price, max_price). Capped pricing is commonly offered by providers as a "price ceiling" feature for enterprise customers.

What we will do in this section: test the min function with 500 input tokens (where the computed cost exactly reaches the cap) and 200 input tokens (where it stays under).

# Use min(expression, cap) to enforce a maximum charge.
# First test: 500 input tokens at 0.02 credits each = 10 credits.
# Since 10 equals the cap of 10, min() returns 10 (the cap is reached).
r = evaluate_expression("min(input_tokens * 0.02, 10)",
                        {"input_tokens": 500})
print(f" min(500 * 0.02, 10) = {r} (expected: 10) -- at cap limit")

# Second test: 200 input tokens at 0.02 credits each = 4 credits.
# Since 4 is below the cap of 10, min() returns 4 (no cap applied).
r = evaluate_expression("min(input_tokens * 0.02, 10)",
                        {"input_tokens": 200})
print(f" min(200 * 0.02, 10) = {r} (expected: 4) -- under cap, no clamping")

Conditional pricing with if

The if(condition, then_value, else_value) function provides ternary conditional logic in expressions. This enables different pricing paths based on the usage metrics. The condition supports standard comparison operators: <, >, <=, >=, ==, !=. You can also combine conditions with and, or, and not.

Real-world use case: "Free for requests under 100 tokens, pay 0.01 per token above that threshold." This is a common pattern for freemium tiers where small requests are free to encourage experimentation. The conditional evaluates whether the condition is true; if so, it returns then_value, otherwise else_value.

The function is lazy in the sense that only the selected branch is computed. However, the entire expression is parsed and validated up front, so both then_value and else_value must be valid, whitelisted expressions even when one of them is never selected. For simple constant values like 0, this is trivially safe.

What we will do in this section: test the if function with 50 input tokens (free tier) and 500 input tokens (paid tier).

# Use if(condition, then_value, else_value) for conditional pricing.
# First test: 50 input tokens is under 100, so the condition
# "input_tokens < 100" is true, and the cost is 0 (free tier).
r = evaluate_expression("if(input_tokens < 100, 0, input_tokens * 0.01)",
                        {"input_tokens": 50})
print(f" if(50 < 100, 0, 50 * 0.01) = {r} (expected: 0) -- free tier activated")

# Second test: 500 input tokens is over 100, so the condition
# "input_tokens < 100" is false, and the cost is 500 * 0.01 = 5.
r = evaluate_expression("if(input_tokens < 100, 0, input_tokens * 0.01)",
                        {"input_tokens": 500})
print(f" if(500 < 100, 0, 500 * 0.01) = {r} (expected: 5) -- paid tier activated")
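
To see why it matters whether only the selected branch is computed, the following standalone sketch contrasts an eager and a lazy conditional in plain Python. The helpers eager_if and lazy_if are hypothetical and exist only for this illustration; they are not ducto functions.

```python
def eager_if(condition, then_value, else_value):
    # Both values were already computed by the caller before this runs.
    return then_value if condition else else_value


def lazy_if(condition, then_thunk, else_thunk):
    # Only the selected thunk is called, so the other branch never runs.
    return then_thunk() if condition else else_thunk()


tokens = 0

# Lazy: the division is never attempted because the condition is true.
lazy_result = lazy_if(tokens == 0, lambda: 0, lambda: 100 / tokens)

# Eager: 100 / tokens is computed while building the argument list,
# so the call fails even though that branch would not be selected.
try:
    eager_if(tokens == 0, 0, 100 / tokens)
    eager_failed = False
except ZeroDivisionError:
    eager_failed = True

print(lazy_result, eager_failed)  # 0 True
```

With lazy selection, a formula can guard a division or another partial operation behind its condition without the unselected branch raising an error.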

Volume discount with tier

The tier(value, threshold1, rate1, threshold2, rate2, ..., default_rate) function implements multi-threshold volume pricing. As usage increases, the rate decreases. This is the classic "economies of scale" pricing model: the more you use, the cheaper each unit becomes.

Real-world use case: "Requests under 10 000 tokens pay 0.02 per 1k, requests under 100 000 tokens pay 0.01 per 1k, and anything larger pays 0.005 per 1k." This pricing structure rewards high-volume users with progressively lower rates. The tier function selects the rate based on which threshold bucket the value falls into: if value < threshold1, use rate1; if value < threshold2, use rate2; otherwise use the default_rate. The selected rate applies to the entire volume, not only to the tokens above a threshold, so this is volume pricing rather than graduated (marginal) pricing.

The rate is then multiplied by the usage amount to compute the total cost. This is a separate step: tier() returns the applicable rate, and you multiply it by the usage to get the final price. This separation lets you apply the tiered rate to any dimension of usage.

What we will do in this section: test the tier function with 5 000 tokens (first tier), 50 000 tokens (second tier), and 200 000 tokens (third tier).

# tier(token_count, threshold1, rate1, threshold2, rate2, default_rate)
# First test: 5 000 tokens is below the first threshold (10 000),
# so the first tier rate of 0.02 per 1k tokens applies.
# Calculation: 0.02 * 5 000 / 1 000 = 0.1000
r = evaluate_expression(
    "tier(input_tokens, 10000, 0.02, 100000, 0.01, 0.005) * input_tokens / 1000",
    {"input_tokens": 5_000},
)
print(f" tier(5k tokens): rate = 0.02, total = {r:.4f} (expected: 0.1000 -- first tier)")

# Second test: 50 000 tokens is above the first threshold (10 000)
# but below the second threshold (100 000), so the second tier
# rate of 0.01 per 1k applies.
# Calculation: 0.01 * 50 000 / 1 000 = 0.5000
r = evaluate_expression(
    "tier(input_tokens, 10000, 0.02, 100000, 0.01, 0.005) * input_tokens / 1000",
    {"input_tokens": 50_000},
)
print(f" tier(50k tokens): rate = 0.01, total = {r:.4f} (expected: 0.5000 -- second tier)")

# Third test: 200 000 tokens is above both thresholds, so the
# default rate of 0.005 per 1k applies.
# Calculation: 0.005 * 200 000 / 1 000 = 1.0000
r = evaluate_expression(
    "tier(input_tokens, 10000, 0.02, 100000, 0.01, 0.005) * input_tokens / 1000",
    {"input_tokens": 200_000},
)
print(f" tier(200k tokens): rate = 0.005, total = {r:.4f} (expected: 1.0000 -- third tier)")
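
The rate-selection rule described above can be written out in plain Python. This is a reference sketch of the documented semantics (value < threshold selects that rate, otherwise the default), not ducto's source; the helper volume_price is hypothetical. It also makes one consequence visible: because the selected rate applies to the whole volume, the total price drops when usage crosses a threshold.

```python
def tier(value, *args):
    """Return the rate for the first threshold that value is below.

    args = threshold1, rate1, threshold2, rate2, ..., default_rate
    """
    if len(args) % 2 == 0:
        raise ValueError("expected threshold/rate pairs followed by a default rate")
    *pairs, default_rate = args
    for threshold, rate in zip(pairs[0::2], pairs[1::2]):
        if value < threshold:
            return rate
    return default_rate


def volume_price(tokens):
    # Same formula as the examples above: rate per 1k tokens times usage.
    return tier(tokens, 10_000, 0.02, 100_000, 0.01, 0.005) * tokens / 1000


for tokens in (5_000, 9_999, 10_000, 50_000, 200_000):
    print(f"{tokens:>7} tokens -> {volume_price(tokens):.4f}")
```

Running the loop shows 9 999 tokens costing about 0.2000 while 10 000 tokens cost 0.1000. If that cliff is undesirable, the formula needs graduated pricing, where each rate applies only to the tokens inside its own band.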

Bound values with clamp

The clamp(x, lo, hi) function restricts a value to the range [lo, hi]. If the value is below the lower bound, it is raised to lo. If it is above the upper bound, it is lowered to hi. If it is within range, it is returned unchanged.

Real-world use case: "Ensure the number of input tokens used for billing is between 100 and 500." This can be useful for minimum billing units: for example, charging for a minimum of 100 tokens even if fewer were used, but never charging for more than 500 tokens regardless of the actual count. This combines a floor and a cap in a single function.

The clamp function is a convenient shorthand for combining max and min: clamp(x, lo, hi) is equivalent to max(min(x, hi), lo) or min(max(x, lo), hi). Both produce the same result as long as lo <= hi. It is commonly used in utility billing, telecommunications, and any domain where quantities have both minimum and maximum charges.

What we will do in this section: test clamp with a value below the range (50, clamped to 100), above the range (1000, clamped to 500), and within the range (300, unchanged).

# Use clamp(x, lo, hi) to keep a value within [lo, hi].
# Test 1: input_tokens = 50, lo = 100, hi = 500.
# 50 is below 100, so the result is clamped to 100 (raised to minimum).
r = evaluate_expression("clamp(input_tokens, 100, 500)",
                        {"input_tokens": 50})
print(f" clamp(50, 100, 500) = {r} (expected: 100) -- raised to minimum")

# Test 2: input_tokens = 1000, lo = 100, hi = 500.
# 1000 is above 500, so the result is clamped to 500 (lowered to maximum).
r = evaluate_expression("clamp(input_tokens, 100, 500)",
                        {"input_tokens": 1000})
print(f" clamp(1000, 100, 500) = {r} (expected: 500) -- lowered to maximum")

# Test 3: input_tokens = 300, lo = 100, hi = 500.
# 300 is within [100, 500], so the result is unchanged at 300.
r = evaluate_expression("clamp(input_tokens, 100, 500)",
                        {"input_tokens": 300})
print(f" clamp(300, 100, 500) = {r} (expected: 300) -- within range, unchanged")
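
The relationship between clamp and the max/min pair can be checked directly in plain Python. The clamp below is a local stand-in written for this sketch, not ducto's implementation. The last two lines show what happens when the bounds are accidentally swapped: the two nestings no longer agree, so it is worth validating lo <= hi in any stored formula.

```python
def clamp(x, lo, hi):
    # Local stand-in: apply the cap first, then the floor.
    return max(lo, min(x, hi))


# With valid bounds (lo <= hi), both nestings agree with clamp.
for x in (50, 300, 1000):
    floor_outside = max(min(x, 500), 100)
    cap_outside = min(max(x, 100), 500)
    assert floor_outside == cap_outside == clamp(x, 100, 500)
    print(x, "->", clamp(x, 100, 500))

# With swapped bounds (lo=500, hi=100), the nestings disagree.
floor_outside_swapped = max(min(300, 100), 500)  # floor applied last: 500
cap_outside_swapped = min(max(300, 500), 100)    # cap applied last: 100
print(floor_outside_swapped, cap_outside_swapped)
```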

Rounding functions

ducto provides three rounding functions for credit billing: ceil (round up to the nearest integer), floor (round down to the nearest integer), and round(x, ndigits) (round to ndigits decimal places).

Real-world use case: "Round up all credit charges to the nearest whole credit." Ceil rounding ensures the provider always collects at least the computed cost, never less. Floor rounding provides a small discount to the user. Standard rounding to 2 decimal places is useful for fractional credit billing where sub-cent precision is acceptable.

These rounding functions are essential when pricing formulas produce fractional credit costs. For example, if your formula produces 0.999 credits per request, ceil would round to 1.0 and floor would round to 0.0. The choice depends on your business model: ceiling gives higher revenue, flooring gives better user experience.

What we will do in this section: test all three rounding functions with the same input value (333 input tokens at 0.003 credits per token = 0.999 credits).

# Test ceil: rounds 0.999 up to 1.0 (nearest whole credit).
# Ceil is useful for ensuring every request generates at least
# a minimum revenue unit, even if the computed cost is fractional.
r = evaluate_expression("ceil(input_tokens * 0.003)",
                        {"input_tokens": 333})
print(f" ceil(333 * 0.003) = {r} (expected: 1.0) -- round up to nearest integer")

# Test floor: rounds 0.999 down to 0.0 (nearest whole credit).
# Floor provides a discount by dropping the fractional portion.
r = evaluate_expression("floor(input_tokens * 0.003)",
                        {"input_tokens": 333})
print(f" floor(333 * 0.003) = {r} (expected: 0.0) -- round down to nearest integer")

# Test round with 2 decimal places: rounds 0.999 to 1.0.
# The second argument (2) specifies the number of decimal places.
# Standard round uses banker's rounding (round half to even).
r = evaluate_expression("round(input_tokens * 0.003, 2)",
                        {"input_tokens": 333})
print(f" round(333 * 0.003, 2 decimals) = {r} (expected: 1.0) -- round to 2 decimal places")
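
For comparison, here is how the same three operations behave in Python's standard library, which is a reasonable mental model but an assumption as far as ducto's internals go. Note that Python's math.ceil and math.floor return integers, whereas the expected values above are written as floats. The last line demonstrates banker's rounding on exact halves.

```python
import math

cost = 333 * 0.003  # approximately 0.999 credits

rounded_up = math.ceil(cost)     # 1
rounded_down = math.floor(cost)  # 0
two_places = round(cost, 2)      # 1.0

print(rounded_up, rounded_down, two_places)

# Banker's rounding: an exact half goes to the nearest even integer,
# so 0.5 and 2.5 round down while 1.5 rounds up.
halves = [round(0.5), round(1.5), round(2.5)]
print(halves)  # [0, 2, 2]
```

If a pricing policy requires "round half up" instead, the formula should state that explicitly, for example with floor(x + 0.5) for non-negative values.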

Percentile function

The percentile(p, v1, v2, ...) function computes the p-th percentile of the provided values using linear interpolation. Percentiles are useful for statistical pricing models where the rate depends on where the current usage falls in a distribution.

Real-world use case: "Charge based on the 90th percentile of recent latency measurements." If you have a set of historical latency values, the 90th percentile tells you the value below which 90 percent of observations fall. This is commonly used in pay-per-use API billing where the price is based on latency percentiles rather than raw token counts.

The percentile function uses linear interpolation between adjacent values when the percentile falls between two data points. This gives a smooth, continuous result rather than a discrete step function. The values can be any numeric expressions, not just constants.

What we will do in this section: compute the 90th percentile of the values (100, 200, 300), passing the percentile rank through the input_tokens variable (input_tokens = 90).

# Compute the 90th percentile of the values (100, 200, 300).
# The first argument (input_tokens = 90) is the percentile rank (0-100).
# The remaining arguments are the data points.
# For 3 data points, the 90th percentile falls between the 2nd and 3rd,
# and linear interpolation gives 280.
r = evaluate_expression("percentile(input_tokens, 100, 200, 300)",
                        {"input_tokens": 90})
print(f" percentile(90th, values=[100, 200, 300]) = {r}")
# 90th percentile of (100, 200, 300) = 280 (linear interpolation)
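
The value 280 follows from one common linear-interpolation convention, in which the rank is p / 100 * (n - 1) over the sorted data. The sketch below implements that convention in plain Python. It is an assumption that ducto uses exactly this rule; it is shown because it reproduces the result above. Other percentile conventions can give different answers for small samples.

```python
def percentile(p, *values):
    """p-th percentile (0-100) using linear interpolation between neighbors."""
    if not values:
        raise ValueError("percentile needs at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    data = sorted(values)
    rank = (p / 100) * (len(data) - 1)  # fractional index into sorted data
    lower = int(rank)                   # index of the value at or below rank
    upper = min(lower + 1, len(data) - 1)
    fraction = rank - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


# rank = 0.9 * 2 = 1.8, so the result lies 80% of the way from 200 to 300.
print(round(percentile(90, 100, 200, 300), 6))  # 280.0
```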

Combined expression

Real-world pricing formulas often combine multiple functions. For example, a model cost formula might multiply tokens by their rates, add a tool usage surcharge, and apply a floor with max -- all in a single expression string.

The power of the expression evaluator is that you can compose any combination of whitelisted functions and operators. This lets you express complex pricing logic as a single string stored in the database, without writing any application code.

What we will do in this section: evaluate a combined expression that calculates GPT-4o model cost (in this example, 3 credits per input token and 15 per output token) plus a tool call surcharge (10 credits per tool call, with the call count floored at 0).

# A combined expression that calculates total cost from model
# usage and tool calls. The formula:
# - Input tokens: 3 credits each (1 000 * 3 = 3 000)
# - Output tokens: 15 credits each (400 * 15 = 6 000)
# - Tool calls: 10 credits each, floored at 0 (1 * 10 = 10)
# - Total = 3 000 + 6 000 + 10 = 9 010
expr = "input_tokens * 3 + output_tokens * 15 + max(tool_calls, 0) * 10"
r = evaluate_expression(expr, {"input_tokens": 1000, "output_tokens": 400, "tool_calls": 1})
print(f" Combined expression result: {r} credits")
# = 3000 + 6000 + 1*10 = 9010

Safety -- blocked operations

The most important feature of the expression evaluator is not what it can do, but what it cannot do. The AST whitelist blocks all dangerous operations by default. This includes __import__ (which could import the os module to execute system commands), open() (which could read arbitrary files), globals() (which could inspect or modify runtime state), and lambda (which could create callable objects).

When you attempt to use a blocked operation, the evaluator raises a ValueError with a clear message. It never falls through to eval() or exec(). Blocked operations are rejected during the AST validation phase, before any code is executed. This is an up-front check, not a runtime sandbox -- if validation rejects the expression, it never runs.

You can verify this behavior with the test below. Each expression should raise a ValueError before it is evaluated. The whitelist is intentionally restrictive: only arithmetic operators, comparison operators, Boolean operators, and the explicitly allowed function names (min, max, if, tier, clamp, percentile, ceil, floor, round, sum, abs) are permitted.

What we will do in this section: pass each dangerous expression to the evaluator and confirm that it raises a ValueError instead of executing.

# These operations are blocked by the AST whitelist: each call should
# raise ValueError during validation, before anything is executed.
for bad in ("__import__('os').system('ls')", "open('/etc/passwd').read()",
            "globals()", "lambda x: x"):
    try:
        evaluate_expression(bad, {})
    except ValueError as exc:
        print(f" blocked: {bad} -> {exc}")
    else:
        raise AssertionError(f"not blocked: {bad}")
print("All dangerous operations blocked by AST whitelist.")
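
To see what a whitelist has to work with, it helps to look at the node types Python's parser produces for each of these strings. The helper node_types below is hypothetical and uses only the standard ast module; it parses the expressions without ever executing them.

```python
import ast


def node_types(expr):
    """Return the set of AST node type names that appear in an expression."""
    tree = ast.parse(expr, mode="eval")
    return {type(node).__name__ for node in ast.walk(tree)}


safe = node_types("input_tokens * 5 + output_tokens * 15")
attack = node_types("__import__('os').system('ls')")
anonymous = node_types("lambda x: x")

print(sorted(safe))
print(sorted(attack))
print(sorted(anonymous))
```

The pricing formula consists only of names, constants, and binary operators. The attack string introduces an Attribute node and the lambda string a Lambda node, neither of which appears in any legitimate formula, so a validator can reject them by type alone. A call such as globals() uses only Call and Name nodes, which is why the whitelist also has to check the function name itself.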