Most AI agent tutorials make the same mistake: they route every task to the most expensive model available.
You don't need GPT-4 to count characters. You don't need Claude Sonnet to check whether a field is present. Regex needs nothing but Python.
The mistake isn't using AI. It's not knowing when to stop using it.
This tutorial shows you how to build a tiered routing system that sends each task to the cheapest model that can handle it. The pattern is called a cost curve. It came out of a comment thread on a DEV.to article, was implemented by three developers over a weekend, and cut the cost per URL of a real SEO audit agent from $0.006 to $0 for most pages.
By the end, you'll have a working cost_curve.py module you can drop into any agent project.
What you'll build
A three-tier routing function that:
Deterministic Python checks first — zero API cost
Escalates to Claude Haiku only for truly ambiguous cases — ~$0.0001 per call
Escalates to Claude Sonnet only when a semantic judgment is required — ~$0.006 per call
Falls back gracefully when a tier fails.
Returns a consistent result schema regardless of which tier handled the request.
This is part of the full implementation in dannwaneri/seo-agent, an open-core SEO audit agent. The cost curve module is its premium routing layer, but the principle applies to any agent with tasks of mixed complexity.
Prerequisites
Python, the anthropic package, and an Anthropic API key; setup is covered below.
The problem with calling Claude on everything
Here’s what most agent code looks like:
def audit_url(snapshot: dict) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": build_prompt(snapshot)}],
    )
    return parse_response(response)
It works. It also calls Sonnet for every URL in the list — including pages where the title is 142 characters long and the check would obviously fail without any model involved.
Claude Sonnet 4 costs $3 per million input tokens and $15 per million output tokens. A typical page snapshot is around 500 input tokens. That's $0.0015 per URL for input alone — before output tokens. Across a 20-URL weekly audit, the total comes to about $0.12. Not expensive. But most of those pages have mechanical SEO problems: missing descriptions, titles longer than 60 characters, absent canonical tags. A character count catches all of them. You don't need a model.
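To sanity-check those numbers, the arithmetic is simple. The ~300 output tokens per result below is an assumption of mine, not a measured figure:

```python
# Claude Sonnet 4 pricing, USD per million tokens
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00

tokens_in = 500    # typical page snapshot
tokens_out = 300   # assumed size of the JSON audit result (not measured)
urls = 20          # weekly audit

input_cost_per_url = tokens_in / 1_000_000 * INPUT_PER_M
cost_per_url = input_cost_per_url + tokens_out / 1_000_000 * OUTPUT_PER_M
total = urls * cost_per_url

print(f"${input_cost_per_url:.4f} input-only per URL")  # $0.0015
print(f"${cost_per_url:.4f} per URL")                   # $0.0060
print(f"${total:.2f} per 20-URL audit")                 # $0.12
```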
The cost curve fixes this by routing based on what the task actually needs, not what the model is capable of.
What a cost curve is
A cost curve has three tiers, three tools, and three price points:
Tier 1 – Deterministic Python. Cost: $0. Check title length, description length, H1 count, canonical presence. These are not judgment calls. They are string operations. If title length > 60, fail. No model required.
Tier 2 – Claude Haiku. Cost: ~$0.0001 per call. The title exists but is only 4 characters long. The description is there but only 30 characters. The status code is a redirect. The page passes the mechanical checks, but something is off. Haiku is fast and cheap enough that escalating ambiguous cases to it costs less than the debugging time you would otherwise spend on false positives.
Tier 3 – Claude Sonnet. Cost: ~$0.006 per call. Pages that Haiku flags as requiring real judgment. "This title passes the length check but reads like a navigation label." "This description repeats the title word for word." Sonnet earns its cost on the genuinely hard cases — not on every URL in the list.
Routing is decided before any API call. The resulting schema is the same regardless of which tier handled the request.
Project setup
mkdir cost-curve-demo && cd cost-curve-demo
pip install anthropic
Set your API key:
# macOS/Linux
export ANTHROPIC_API_KEY="sk-ant-..."
# Windows PowerShell
$env:ANTHROPIC_API_KEY = "sk-ant-..."
Create cost_curve.py – you will build this module step by step.
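The module expects page snapshots with the field names used throughout this post (final_url, status_code, title, meta_description, h1s, canonical). A representative fixture (the URL is a placeholder; in the real repo this dict comes from browser.py):

```python
# A hypothetical snapshot, shaped like the ones the checks below consume
snapshot = {
    "final_url": "https://example.com/pricing",  # placeholder URL
    "status_code": 200,
    "title": "Pricing",                # present, but suspiciously short
    "meta_description": "",            # missing — a mechanical Tier 1 failure
    "h1s": ["Pricing"],                # list of H1 texts on the page
    "canonical": "https://example.com/pricing",
}
```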
Tier 1: Deterministic Python
Tier 1 runs first on each URL. It checks four fields using only Python string operations. There are no API calls, no latency, and no cost.
import json
import logging
import os
import re
from datetime import datetime, timezone
import anthropic
logger = logging.getLogger(__name__)
REDIRECT_CODES = {301, 302, 307, 308}
# Fields that trigger Tier 2 escalation
# Title or description present but suspiciously short
AMBIGUOUS_TITLE_MAX = 10 # chars — present but too short to be real
AMBIGUOUS_DESC_MAX = 50 # chars — present but too short to be useful
def _now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()
def _build_result(snapshot: dict, method: str) -> dict:
    """Base result skeleton — same schema regardless of tier."""
    return {
        "url": snapshot.get("final_url", ""),
        "final_url": snapshot.get("final_url", ""),
        "status_code": snapshot.get("status_code"),
        "title": {"value": None, "length": 0, "status": "PASS"},
        "description": {"value": None, "length": 0, "status": "PASS"},
        "h1": {"count": 0, "value": None, "status": "PASS"},
        "canonical": {"value": None, "status": "PASS"},
        "flags": [],
        "human_review": False,
        "audited_at": _now_iso(),
        "method": method,
        "needs_tier3": False,
    }
def tier1_check(snapshot: dict) -> dict:
    """
    Pure Python SEO checks. Zero API calls.
    Returns a result dict with method="deterministic".
    Sets needs_tier3=False always — Tier 1 never escalates to Tier 3 directly.
    Escalation to Tier 2 is decided by the router, not here.
    """
    result = _build_result(snapshot, "deterministic")
    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    h1s = snapshot.get("h1s") or []
    canonical = snapshot.get("canonical") or ""

    # Title check
    result["title"]["value"] = title or None
    result["title"]["length"] = len(title)
    if not title or len(title) > 60:
        result["title"]["status"] = "FAIL"
        msg = "Title is missing" if not title else f"Title is {len(title)} characters (max 60)"
        result["flags"].append(msg)

    # Description check
    result["description"]["value"] = description or None
    result["description"]["length"] = len(description)
    if not description or len(description) > 160:
        result["description"]["status"] = "FAIL"
        msg = "Meta description is missing" if not description else f"Meta description is {len(description)} characters (max 160)"
        result["flags"].append(msg)

    # H1 check
    result["h1"]["count"] = len(h1s)
    result["h1"]["value"] = h1s[0] if h1s else None
    if len(h1s) == 0:
        result["h1"]["status"] = "FAIL"
        result["flags"].append("H1 tag is missing")
    elif len(h1s) > 1:
        result["h1"]["status"] = "FAIL"
        result["flags"].append(f"Multiple H1 tags found ({len(h1s)})")

    # Canonical check
    result["canonical"]["value"] = canonical or None
    if not canonical:
        result["canonical"]["status"] = "FAIL"
        result["flags"].append("Canonical tag is missing")

    return result
Key design decision: tier1_check() never decides whether to escalate. It only runs its checks and returns a result. The router decides escalation based on that result.
Tier 2: Claude Haiku for ambiguous cases
Tier 2 runs when Tier 1's mechanical checks complete but the result deserves a second look: a 4-character title that exists but is clearly wrong, a 30-character description that technically exists but is useless, a redirect status code that needs a human-readable explanation.
Haiku is the right model here. It is fast, cheap ($1 input / $5 output per million tokens), and adequate for triage-level judgments. The prompt asks one narrow question: is this case ambiguous enough to need Sonnet?
def tier2_check(snapshot: dict) -> dict:
    """
    Claude Haiku call for ambiguous cases.
    Returns result with method="haiku".
    Sets needs_tier3=True if Haiku determines the case needs semantic judgment.
    Falls back to Tier 1 result on API error.
    """
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise OSError("ANTHROPIC_API_KEY is not set.")
    client = anthropic.Anthropic(api_key=api_key)

    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    status_code = snapshot.get("status_code")

    prompt = f"""You are an SEO auditor doing a quick triage check.

Page data:
- Title: {repr(title)} ({len(title)} chars)
- Meta description: {repr(description)} ({len(description)} chars)
- Status code: {status_code}

Answer these two questions with only "yes" or "no":
1. Does this page need semantic judgment beyond simple length/presence checks?
   (e.g. title is present but clearly wrong, description is present but meaningless)
2. Is the status code a redirect that needs investigation?

Respond in this exact JSON format and nothing else:
{{"needs_tier3": true_or_false, "reason": "one sentence explanation"}}"""

    try:
        response = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=150,
            messages=[{"role": "user", "content": prompt}],
        )
        raw = response.content[0].text.strip()
        # Strip markdown fences if present
        if raw.startswith("```"):
            lines = raw.splitlines()
            raw = "\n".join(lines[1:-1] if lines[-1].strip() == "```" else lines[1:])
        parsed = json.loads(raw)

        result = _build_result(snapshot, "haiku")
        # Copy Tier 1 field checks — Haiku doesn't redo those
        t1 = tier1_check(snapshot)
        result["title"] = t1["title"]
        result["description"] = t1["description"]
        result["h1"] = t1["h1"]
        result["canonical"] = t1["canonical"]
        result["flags"] = t1["flags"]
        result["needs_tier3"] = parsed.get("needs_tier3", False)
        if result["needs_tier3"]:
            result["flags"].append(f"Escalated to Tier 3: {parsed.get('reason', '')}")
        return result
    except Exception as exc:
        logger.warning("[tier2] Haiku API error: %s — falling back to Tier 1 result", exc)
        fallback = tier1_check(snapshot)
        fallback["method"] = "haiku-fallback"
        return fallback
The fallback is the key piece. If Haiku fails — rate limit, network error, invalid response — the function returns a Tier 1 result instead of crashing. The audit keeps running, and the result is tagged with method="haiku-fallback" so you can identify it later.
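That tag pays off later: a batch report can pull fallback results out for re-auditing. A minimal sketch with hypothetical data:

```python
# Hypothetical audit results — only the method field matters here
results = [
    {"url": "/a", "method": "deterministic"},
    {"url": "/b", "method": "haiku-fallback"},
    {"url": "/c", "method": "sonnet"},
]

# Any tier that degraded is tagged "<tier>-fallback"
needs_retry = [r["url"] for r in results if r["method"].endswith("-fallback")]
print(needs_retry)  # ['/b']
```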
Tier 3: Claude Sonnet for semantic judgment
Tier 3 is where the full extraction prompt goes. It is the same call you would make in the naive implementation — the difference is that only a small fraction of URLs ever reach this tier.
def tier3_check(snapshot: dict) -> dict:
    """
    Claude Sonnet call for semantic judgment.
    Returns result with method="sonnet".
    This is the full extraction prompt — same as calling the model directly.
    """
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise OSError("ANTHROPIC_API_KEY is not set.")
    client = anthropic.Anthropic(api_key=api_key)

    prompt = f"""You are an SEO auditor. Analyze this page snapshot and return ONLY a JSON object.
No prose. No explanation. No markdown fences. Raw JSON only.

Page data:
- URL: {snapshot.get('final_url')}
- Status code: {snapshot.get('status_code')}
- Title: {snapshot.get('title')}
- Meta description: {snapshot.get('meta_description')}
- H1 tags: {snapshot.get('h1s')}
- Canonical: {snapshot.get('canonical')}

Return this exact schema:
{{
  "url": "string",
  "final_url": "string",
  "status_code": number,
  "title": {{"value": "string or null", "length": number, "status": "PASS or FAIL"}},
  "description": {{"value": "string or null", "length": number, "status": "PASS or FAIL"}},
  "h1": {{"count": number, "value": "string or null", "status": "PASS or FAIL"}},
  "canonical": {{"value": "string or null", "status": "PASS or FAIL"}},
  "flags": ["array of strings describing specific issues"],
  "human_review": false,
  "audited_at": "ISO timestamp"
}}

PASS/FAIL rules:
- title: FAIL if null or length > 60 characters, or if present but clearly not a real title
- description: FAIL if null or length > 160 characters, or if present but meaningless
- h1: FAIL if count is 0 or count > 1
- canonical: FAIL if null
- audited_at: use current UTC time"""

    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}],
        )
        raw = response.content[0].text.strip()
        if raw.startswith("```"):
            lines = raw.splitlines()
            raw = "\n".join(lines[1:-1] if lines[-1].strip() == "```" else lines[1:])
        result = json.loads(raw)
        result["method"] = "sonnet"
        result["needs_tier3"] = False
        return result
    except Exception as exc:
        logger.warning("[tier3] Sonnet API error: %s — falling back to Tier 1 result", exc)
        fallback = tier1_check(snapshot)
        fallback["method"] = "sonnet-fallback"
        return fallback
Note the two rules in Tier 3's prompt that Tier 1 does not have: "or if present but clearly not a real title" and "or if present but meaningless". That is the semantic judgment Haiku flagged as necessary, and Tier 3 is where it gets applied.
The router: audit_url()
The router is the public interface. Everything else is an implementation detail.
def audit_url(snapshot: dict, tiered: bool = False) -> dict:
    """
    Route a page snapshot through the appropriate audit tier.

    Args:
        snapshot: Page data from browser.py — must contain final_url,
            status_code, title, meta_description, h1s, canonical.
        tiered: If False, delegates directly to Tier 3 (Sonnet).
            If True, routes through the cost curve.

    Returns:
        Audit result dict with method field indicating which tier ran.
    """
    if not tiered:
        # Non-tiered mode: call Sonnet directly, same as v1 behavior
        return tier3_check(snapshot)

    # Tier 1: always runs first
    t1_result = tier1_check(snapshot)

    # Check if escalation to Tier 2 is warranted
    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    status_code = snapshot.get("status_code")

    needs_tier2 = (
        # Title present but suspiciously short
        (title and len(title) < AMBIGUOUS_TITLE_MAX) or
        # Description present but suspiciously short
        (description and len(description) < AMBIGUOUS_DESC_MAX) or
        # Redirect status — may need explanation
        (status_code in REDIRECT_CODES)
    )

    if not needs_tier2:
        # Tier 1 result is definitive — return without any API call
        return t1_result

    # Tier 2: Haiku triage
    t2_result = tier2_check(snapshot)
    if not t2_result.get("needs_tier3", False):
        # Haiku determined no semantic judgment needed
        return t2_result

    # Tier 3: Sonnet for semantic judgment
    return tier3_check(snapshot)
The router's logic reads top to bottom, and each decision point is a named condition. When tiered=False, the behavior is identical to the v1 single-call implementation — a backward-compatibility guarantee that lets you adopt the cost curve incrementally without breaking existing audits.
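To get a feel for which snapshots never leave Tier 1, the escalation condition can be exercised on its own. This is a condensed copy of the router's predicate, with the thresholds from the constants defined earlier:

```python
# Condensed copy of the router's escalation predicate
REDIRECT_CODES = {301, 302, 307, 308}
AMBIGUOUS_TITLE_MAX = 10
AMBIGUOUS_DESC_MAX = 50

def needs_tier2(snapshot: dict) -> bool:
    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    return bool(
        (title and len(title) < AMBIGUOUS_TITLE_MAX)
        or (description and len(description) < AMBIGUOUS_DESC_MAX)
        or (snapshot.get("status_code") in REDIRECT_CODES)
    )

# A 3-char title escalates; a long-but-clearly-failing title does not
print(needs_tier2({"title": "SEO", "meta_description": "x" * 80, "status_code": 200}))     # True
print(needs_tier2({"title": "A" * 70, "meta_description": "x" * 80, "status_code": 200}))  # False
```

Note the asymmetry: a 70-character title is a definitive Tier 1 failure, while a 3-character title is ambiguous and goes to Haiku.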
Graceful fallback
The fallback pattern appears in both Tier 2 and Tier 3. It is worth calling out on its own:
# Pattern used in both tier2_check() and tier3_check()
except Exception as exc:
    logger.warning("[tierN] API error: %s — falling back to Tier 1 result", exc)
    fallback = tier1_check(snapshot)
    fallback["method"] = "tierN-fallback"
    return fallback
It does three things:
Logs the error with enough context for later debugging.
Returns a valid result — the Tier 1 deterministic check always runs.
Tags the result with a fallback method so you can filter for it in your reports.
An agent that crashes on API errors is not production-ready. An agent that degrades gracefully and keeps going is.
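If the duplication bothers you, the pattern factors out into a small wrapper. This with_fallback helper is my own sketch, not part of the repo:

```python
import logging

logger = logging.getLogger(__name__)

def with_fallback(tier_fn, fallback_fn, tier_name: str):
    """Wrap a tier function so API errors degrade to a cheaper tier."""
    def wrapped(snapshot: dict) -> dict:
        try:
            return tier_fn(snapshot)
        except Exception as exc:
            logger.warning("[%s] error: %s — falling back", tier_name, exc)
            result = fallback_fn(snapshot)
            result["method"] = f"{tier_name}-fallback"
            return result
    return wrapped

# Usage sketch: safe_tier2 = with_fallback(tier2_check, tier1_check, "haiku")
```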
Testing the cost curve
Create test_cost_curve.py to verify the routing behavior without live API calls:
import json
import os
from unittest import mock

from cost_curve import audit_url, tier1_check

BASE_URL = "https://example.com/"  # placeholder URL — any value works for these tests

def make_snapshot(title="Normal Title Under 60 Chars",
                  description="A normal meta description that is under 160 characters and describes the page content well.",
                  h1s=["Single H1"],
                  canonical=BASE_URL,
                  status_code=200,
                  final_url=BASE_URL):
    return {
        "title": title,
        "meta_description": description,
        "h1s": h1s,
        "canonical": canonical,
        "status_code": status_code,
        "final_url": final_url,
    }

def test_clean_page_returns_tier1_no_api_calls():
    """Clean page: all checks pass deterministically — no API call."""
    snapshot = make_snapshot()
    with mock.patch("anthropic.Anthropic") as mock_client:
        result = audit_url(snapshot, tiered=True)
    assert result["method"] == "deterministic"
    mock_client.assert_not_called()
    print("PASS: clean page → Tier 1, zero API calls")

def test_long_title_returns_tier1_fail_no_api_call():
    """Title >60 chars: FAIL from Tier 1, no API call."""
    snapshot = make_snapshot(title="A" * 70)
    with mock.patch("anthropic.Anthropic") as mock_client:
        result = audit_url(snapshot, tiered=True)
    assert result["method"] == "deterministic"
    assert result["title"]["status"] == "FAIL"
    mock_client.assert_not_called()
    print("PASS: title >60 → Tier 1 FAIL, zero API calls")

def test_suspiciously_short_title_escalates_to_tier2():
    """Title present but only 3 chars: escalates to Tier 2."""
    snapshot = make_snapshot(title="SEO")  # 3 chars — under AMBIGUOUS_TITLE_MAX
    mock_response = mock.MagicMock()
    mock_response.content = [mock.MagicMock(
        text='{"needs_tier3": false, "reason": "title is short but not ambiguous"}'
    )]
    with mock.patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}), \
         mock.patch("anthropic.Anthropic") as mock_client:
        mock_client.return_value.messages.create.return_value = mock_response
        result = audit_url(snapshot, tiered=True)
    assert result["method"] == "haiku"
    assert mock_client.return_value.messages.create.call_count == 1
    print("PASS: short title → Tier 2 (Haiku called once)")

def test_tiered_false_calls_sonnet_directly():
    """tiered=False: Sonnet called regardless of snapshot content."""
    snapshot = make_snapshot()  # clean page, would be Tier 1 in tiered mode
    mock_response = mock.MagicMock()
    mock_response.content = [mock.MagicMock(text=json.dumps({
        "url": BASE_URL,
        "final_url": BASE_URL,
        "status_code": 200,
        "title": {"value": "Normal Title Under 60 Chars", "length": 27, "status": "PASS"},
        "description": {"value": "desc", "length": 4, "status": "PASS"},
        "h1": {"count": 1, "value": "Single H1", "status": "PASS"},
        "canonical": {"value": BASE_URL, "status": "PASS"},
        "flags": [],
        "human_review": False,
        "audited_at": "2026-04-01T00:00:00+00:00",
    }))]
    with mock.patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}), \
         mock.patch("anthropic.Anthropic") as mock_client:
        mock_client.return_value.messages.create.return_value = mock_response
        result = audit_url(snapshot, tiered=False)
    assert result["method"] == "sonnet"
    assert mock_client.return_value.messages.create.call_count == 1
    print("PASS: tiered=False → Sonnet called directly")

def test_haiku_api_failure_falls_back_to_tier1():
    """Haiku failure: falls back to Tier 1 result, no crash."""
    snapshot = make_snapshot(title="SEO")  # triggers Tier 2
    with mock.patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}), \
         mock.patch("anthropic.Anthropic") as mock_client:
        mock_client.return_value.messages.create.side_effect = Exception("rate limit")
        result = audit_url(snapshot, tiered=True)
    assert result["method"] == "haiku-fallback"
    print("PASS: Haiku failure → fallback to Tier 1, no crash")

if __name__ == "__main__":
    test_clean_page_returns_tier1_no_api_calls()
    test_long_title_returns_tier1_fail_no_api_call()
    test_suspiciously_short_title_escalates_to_tier2()
    test_tiered_false_calls_sonnet_directly()
    test_haiku_api_failure_falls_back_to_tier1()
    print("\nAll tests passed.")
Run it:
python test_cost_curve.py
Expected output:
PASS: clean page → Tier 1, zero API calls
PASS: title >60 → Tier 1 FAIL, zero API calls
PASS: short title → Tier 2 (Haiku called once)
PASS: tiered=False → Sonnet called directly
PASS: Haiku failure → fallback to Tier 1, no crash
Applying this pattern to your agent
The cost curve is not specific to SEO. Any agent that handles tasks of mixed complexity can use it.
Principle: Before deciding which model to use, categorize tasks according to their actual need.
Customer Support Agent:
Tier 1: Keyword matching against known query topics – no model.
Tier 2: Haiku for intent classification on ambiguous questions
Tier 3: Sonnet for complex complaints requiring judgment.
Code Review Agent:
Tier 1: lint rules, syntax checking – no model.
Tier 2: Haiku to detect common patterns
Tier 3: Sonnet for architectural review
Content Moderation Agent:
Tier 1: Blocklist matching – no model.
Tier 2: Haiku for borderline cases
Tier 3: Sonnet for context-dependent judgment
The implementation shape is the same in all three cases. The audit_url() router becomes a route_task(). The tier functions change their checks and escalation conditions. The fallback logic stays the same.
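A domain-agnostic skeleton of that shape, with illustrative names that are not from the repo:

```python
from typing import Callable

Result = dict

def make_router(
    tier1: Callable[[dict], Result],
    tier2: Callable[[dict], Result],
    tier3: Callable[[dict], Result],
    needs_tier2: Callable[[dict], bool],
) -> Callable[..., Result]:
    """Build a route_task() with the same shape as audit_url() above."""
    def route_task(task: dict, tiered: bool = True) -> Result:
        if not tiered:
            return tier3(task)            # v1 behavior: straight to the big model
        result = tier1(task)              # deterministic checks always run
        if not needs_tier2(task):
            return result                 # definitive — no API call
        result = tier2(task)              # cheap triage
        if not result.get("needs_tier3", False):
            return result
        return tier3(task)                # semantic judgment
    return route_task
```

The tier callables and the escalation predicate are the only domain-specific pieces; the routing and fallback shape stays fixed.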
The important question to ask before writing any agent code: what part of my inputs can be solved mechanically? That part goes to Tier 1. The rest escalates. The cost curve keeps the escalation contained.
Wrapping up
The full implementation — including the SEO audit agent that uses this module in production — is at dannwaneri/seo-agent. The core/ directory is MIT licensed. The tiered routing lives in premium/cost_curve.py.
This tutorial is the companion to "I was paying $0.006 per URL for SEO audits until I realized most needed $0" on DEV.to, which covers the architecture decisions behind the cost curve.