
# Introduction
Python decorators are a natural fit for simplifying complex software logic in a variety of applications, including LLM-based ones. Working with LLMs usually means working with unpredictable, slow, and often expensive third-party APIs, and decorators can make this code much cleaner by wrapping, for example, API calls with optimized logic such as caching, retries, and rate limiting.
Let’s take a look at five useful Python decorators that will help you improve your LLM-based applications without significant overhead.
The accompanying examples illustrate the syntax and approach for each decorator. Some are shown without an actual LLM call; treat them as snippets meant to slot into larger applications.
# 1. In-Memory Caching
This solution comes from Python's standard functools library and is useful for expensive functions that call LLMs. If the function below contained an LLM API call, wrapping it with the LRU (Least Recently Used) cache decorator would add a cache that prevents redundant requests for the same input (prompt) within the same process or session. This is an elegant way to reduce latency and cost.
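As a quick aside, `lru_cache` also exposes a `cache_info()` method, which is handy for verifying that repeated prompts actually hit the cache. A minimal sketch (the `embed` function here is an illustrative stand-in, not part of any library):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def embed(prompt: str) -> int:
    # Stand-in for an expensive LLM call; returns a dummy value
    return len(prompt)

embed("hello")  # miss: computed
embed("hello")  # hit: served from cache
print(embed.cache_info())  # hits=1, misses=1
```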
This example illustrates its use:
```python
from functools import lru_cache
import time

@lru_cache(maxsize=100)
def summarize_text(text: str) -> str:
    print("Sending text to LLM...")
    time.sleep(1)  # A simulation of network delay
    return f"Summary of {len(text)} characters."

print(summarize_text("The quick brown fox."))  # Takes one second
print(summarize_text("The quick brown fox."))  # Instant
```

# 2. Caching on Persistent Disk
Speaking of caching, the external diskcache library takes this a step further by implementing a persistent cache on disk, backed by an SQLite database: very useful for storing the results of time-consuming functions such as LLM API calls, so that results can be retrieved quickly in subsequent runs. Consider this decorator pattern when in-memory caching is insufficient, for instance because results should survive script or application restarts.
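To see roughly what disk memoization does under the hood, here is a minimal sketch using only the standard library (no expiry, positional hashable arguments only, JSON-serializable results; `disk_memoize` is a hypothetical helper, not part of diskcache):

```python
import functools
import hashlib
import json
import os

def disk_memoize(path: str):
    """Cache a function's JSON-serializable results in a file on disk."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            store = {}
            if os.path.exists(path):
                with open(path) as f:
                    store = json.load(f)
            # Derive a stable cache key from the arguments
            key = hashlib.sha256(repr(args).encode()).hexdigest()
            if key not in store:
                store[key] = func(*args)
                with open(path, "w") as f:
                    json.dump(store, f)
            return store[key]
        return wrapper
    return decorator

@disk_memoize("demo_cache.json")
def slow_answer(prompt: str) -> str:
    return f"Response to: {prompt}"

print(slow_answer("hi"))  # computed, then persisted to demo_cache.json
print(slow_answer("hi"))  # read back from disk
```

The diskcache example below provides the same idea with expiry and a proper SQLite backend.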
```python
import time
from diskcache import Cache

# Creating a lightweight local SQLite-backed cache directory
cache = Cache(".local_llm_cache")

@cache.memoize(expire=86400)  # Cached for 24 hours
def fetch_llm_response(prompt: str) -> str:
    print("Calling expensive LLM API...")  # Replace this with an actual LLM API call
    time.sleep(2)  # API latency simulation
    return f"Response to: {prompt}"

print(fetch_llm_response("What is quantum computing?"))  # 1st function call
print(fetch_llm_response("What is quantum computing?"))  # Instant load from disk!
```

# 3. Resilient Network Retries
Since LLM calls often fail due to transient errors such as timeouts and "502 Bad Gateway" responses, a network resilience library like tenacity and its @retry decorator can help your code survive these common network failures.
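For intuition, here is a minimal sketch of the retry-with-exponential-backoff pattern that tenacity implements (the decorator name and parameters here are illustrative, not tenacity's API):

```python
import functools
import time

def retry_with_backoff(max_attempts: int = 4, base_delay: float = 0.1,
                       exceptions: tuple = (Exception,)):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # Out of attempts: propagate the last error
                    time.sleep(delay)
                    delay *= 2  # Exponential backoff: each wait doubles
        return wrapper
    return decorator

attempts = []

@retry_with_backoff(max_attempts=4, base_delay=0.01, exceptions=(RuntimeError,))
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(flaky())  # succeeds on the third attempt
```

tenacity adds jitter options, wait caps, and per-exception retry rules on top of this idea.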
The example below illustrates this resilient behavior by randomly simulating a 70% probability of network failure. Run it a few times, and sooner or later you will see the retries exhausted and the error surface: totally expected and intended!
```python
import random
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

class RateLimitError(Exception):
    pass

# Retrying up to 4 times, waiting 2, 4, and 8 seconds between attempts
@retry(
    wait=wait_exponential(multiplier=2, min=2, max=10),
    stop=stop_after_attempt(4),
    retry=retry_if_exception_type(RateLimitError),
)
def call_flaky_llm_api(prompt: str):
    print("Attempting to call API...")
    if random.random() < 0.7:  # Simulating a 70% chance of API failure
        raise RateLimitError("Rate limit exceeded! Backing off.")
    return "Text has been successfully generated!"

print(call_flaky_llm_api("Write a haiku"))
```

# 4. Client-Side Throttling
The ratelimit library provides decorators to control the call frequency of a (typically very demanding) function: useful for staying within provider-imposed quotas when using external APIs, since a provider will reject requests when a client sends too many in a short time. The following example does this by enforcing a limit of three calls per ten-second window on the client side.
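A stripped-down sketch of what such client-side throttling does internally (illustrative only, not ratelimit's implementation, and not thread-safe):

```python
import functools
import time

def throttle(calls: int, period: float):
    """Block until a call fits inside the sliding time window."""
    def decorator(func):
        timestamps = []
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Forget calls that fell out of the window
            while timestamps and now - timestamps[0] > period:
                timestamps.pop(0)
            if len(timestamps) >= calls:
                # Window is full: sleep until the oldest call expires
                time.sleep(period - (now - timestamps[0]))
            timestamps.append(time.monotonic())
            return func(*args, **kwargs)
        return wrapper
    return decorator

@throttle(calls=3, period=0.5)
def ping(i: int) -> int:
    return i

start = time.monotonic()
results = [ping(i) for i in range(4)]  # the 4th call waits about 0.5s
elapsed = time.monotonic() - start
```

The ratelimit example below gives you the same behavior with far less code.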
```python
from ratelimit import limits, sleep_and_retry
import time

# Strictly enforcing a 3-call limit per 10-second window
@sleep_and_retry
@limits(calls=3, period=10)
def generate_text(prompt: str) -> str:
    print(f"({time.strftime('%X')}) Processing: {prompt}")
    return f"Processed: {prompt}"

# The first 3 calls print immediately; the 4th pauses, respecting the limit
for i in range(5):
    generate_text(f"Prompt {i}")
```

# 5. Structured Output Binding
The fifth decorator in the list comes from the magentic library, used together with Pydantic, and provides a clean mechanism for calling LLM APIs and receiving structured responses. This matters when you need an LLM to reliably return formatted data such as JSON objects. The decorator handles the underlying prompt templating and the Pydantic-driven parsing, resulting in better token usage and a cleaner code base.
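Conceptually, such decorators instruct the model to emit JSON matching your schema and then parse and validate the response. A minimal, standard-library-only sketch of that parsing step (the JSON string is a simulated stand-in for a real model response):

```python
import json
from dataclasses import dataclass

@dataclass
class CapitalInfo:
    capital: str
    population: int

def parse_capital_info(raw: str) -> CapitalInfo:
    """Parse a JSON response from the model into a typed object."""
    data = json.loads(raw)
    return CapitalInfo(capital=str(data["capital"]),
                       population=int(data["population"]))

simulated_response = '{"capital": "Paris", "population": 2100000}'
info = parse_capital_info(simulated_response)
print(info.capital, info.population)
```

magentic and Pydantic automate this round trip, including schema generation and validation errors.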
To run the example below, you'll need an OpenAI API key.
```python
# IMPORTANT: the OPENAI_API_KEY environment variable must be set to run this example
from magentic import prompt
from pydantic import BaseModel

class CapitalInfo(BaseModel):
    capital: str
    population: int

# The decorator maps the prompt template to the Pydantic return type
@prompt("What is the capital and population of {country}?")
def get_capital_info(country: str) -> CapitalInfo:
    ...  # No function body needed here!

info = get_capital_info("France")
print(f"Capital: {info.capital}, Population: {info.population}")
```

# Wrapping Up
In this article, we listed and illustrated five Python decorators, from both the standard library and third-party packages, that are particularly useful in LLM-based applications: to simplify logic, make processes more efficient, and improve network resilience.
Iván Palomares Carrascosa is a leader, author, speaker, and consultant in AI, machine learning, deep learning, and LLMs. He trains and guides others in using AI in the real world.