
# Introduction
Python is one of the most beginner-friendly languages out there. But if you’ve worked with it for a while, you’ve probably run into loops that take minutes to finish, data processing tasks that gobble up all your memory, and more.
You don’t need to be a performance optimization expert to make significant improvements. Most slow Python code is due to a handful of common problems that are easy to fix once you know what to look for.
In this article, you’ll learn five practical techniques for speeding up slow Python code, with before-and-after examples that show the difference.
You can find the code for this article on GitHub.
# Prerequisites
Before we begin, make sure you have:
- Python 3.10 or later installed
- Familiarity with functions, loops, and lists
- Some familiarity with the `time` module from the standard library
For a few examples, you’ll also need NumPy and pandas installed.
# 1. Measuring before optimization
Before editing a single line of code, you need to know where the bottleneck actually is. Optimizing the wrong part of your code wastes time and can even break things.
Python’s standard library includes a simple way to time any block of code: the `time` module. For more detailed profiling, `cProfile` shows you exactly which functions are taking the most time.
Suppose you have a script that processes a list of sales records. Here’s how to find the slow part:
```python
import time

def load_records():
    # Simulate loading 100,000 records
    return list(range(100_000))

def filter_records(records):
    return [r for r in records if r % 2 == 0]

def generate_report(records):
    return sum(records)

# Time each step
start = time.perf_counter()
records = load_records()
print(f"Load   : {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
filtered = filter_records(records)
print(f"Filter : {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
report = generate_report(filtered)
print(f"Report : {time.perf_counter() - start:.4f}s")
```

Output:
```
Load   : 0.0034s
Filter : 0.0060s
Report : 0.0012s
```

Now you know where to focus: `filter_records()` is the slowest step, followed by `load_records()`. That’s where any optimization effort will pay off. Without measuring, you might have spent time improving `generate_report()`, which was already fast.
The `time.perf_counter()` function is more accurate than `time.time()` for short measurements. Use it whenever you’re timing code.
Rule of thumb: Never guess where the bottleneck is. Measure first, then optimize.
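For scripts with more moving parts, `cProfile` breaks the timing down per function instead of per hand-timed block. Here is a minimal sketch; the `main()` pipeline is a stand-in built from the functions above, not code from the article’s repository:

```python
import cProfile
import pstats

def load_records():
    # Simulate loading 100,000 records
    return list(range(100_000))

def generate_report(records):
    return sum(records)

def main():
    return generate_report(load_records())

# Profile main() and print the five most expensive calls,
# sorted by cumulative time
profiler = cProfile.Profile()
result = profiler.runcall(main)
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

You can also profile an entire script without touching the code at all with `python -m cProfile -s cumulative script.py`.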
# 2. Using built-in functions and standard library tools
Python’s built-in functions – `sum()`, `map()`, `filter()`, `sorted()`, `min()`, `max()` – are implemented in C under the hood. They are significantly faster than equivalent logic written as pure Python loops.
Let’s compare summing a list manually versus using the built-in:
```python
import time

numbers = list(range(1_000_000))

# Manual loop
start = time.perf_counter()
total = 0
for n in numbers:
    total += n
print(f"Manual loop : {time.perf_counter() - start:.4f}s → {total}")

# Built-in sum()
start = time.perf_counter()
total = sum(numbers)
print(f"Built-in    : {time.perf_counter() - start:.4f}s → {total}")
```

Output:
```
Manual loop : 0.1177s → 499999500000
Built-in    : 0.0103s → 499999500000
```

As you can see, the built-in is more than 10x faster here.
The same principle applies to sorting. If you need to sort a list of dictionaries by a key, Python’s `sorted()` with a `key` argument is faster and cleaner than sorting manually. Here is another example:
```python
orders = [
    {"id": "ORD-003", "amount": 250.0},
    {"id": "ORD-001", "amount": 89.99},
    {"id": "ORD-002", "amount": 430.0},
]

# Slow: manual comparison logic
def manual_sort(orders):
    for i in range(len(orders)):
        for j in range(i + 1, len(orders)):
            if orders[i]["amount"] > orders[j]["amount"]:
                orders[i], orders[j] = orders[j], orders[i]
    return orders

# Fast: built-in sorted()
sorted_orders = sorted(orders, key=lambda o: o["amount"])
print(sorted_orders)
```

Output:
```
[{'id': 'ORD-001', 'amount': 89.99}, {'id': 'ORD-003', 'amount': 250.0}, {'id': 'ORD-002', 'amount': 430.0}]
```

As an exercise, try timing both approaches.
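The same `key` pattern works for `min()` and `max()` when you only need the extremes rather than a full sort. A quick sketch reusing the orders list from above:

```python
orders = [
    {"id": "ORD-003", "amount": 250.0},
    {"id": "ORD-001", "amount": 89.99},
    {"id": "ORD-002", "amount": 430.0},
]

# Largest and smallest order without sorting the whole list
largest = max(orders, key=lambda o: o["amount"])
smallest = min(orders, key=lambda o: o["amount"])
print(largest["id"], smallest["id"])
```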
Rule of thumb: Before writing a loop to do something generic – summing, sorting, finding a maximum – check whether Python already has a built-in for it. It almost always does, and it’s almost always faster.
# 3. Avoiding repeated operations inside loops
One of the most common performance mistakes is doing expensive work inside a loop when it could be done once outside of it. Every iteration pays the cost, even when the result never changes.
Here’s an example: validating a list of product codes against an approved list.
```python
import time

approved = ["SKU-001", "SKU-002", "SKU-003", "SKU-004", "SKU-005"] * 1000
incoming = [f"SKU-{str(i).zfill(3)}" for i in range(5000)]

# Slow: list membership check on every iteration
start = time.perf_counter()
valid = []
for code in incoming:
    if code in approved:  # list search is O(n) — slow
        valid.append(code)
print(f"List check : {time.perf_counter() - start:.4f}s → {len(valid)} valid")

# Fast: convert approved to a set once, before the loop
start = time.perf_counter()
approved_set = set(approved)  # set lookup is O(1) — fast
valid = []
for code in incoming:
    if code in approved_set:
        valid.append(code)
print(f"Set check  : {time.perf_counter() - start:.4f}s → {len(valid)} valid")
```

Output:
```
List check : 0.3769s → 5 valid
Set check  : 0.0014s → 5 valid
```

The second approach is much faster, and the fix was just moving one conversion out of the loop.
The same pattern applies to anything expensive that doesn’t change between iterations, such as reading a config file, compiling a regex pattern, or opening a database connection. Do it once before the loop, not once per iteration.
```python
import re

# Slow: recompiles the pattern on every call
def extract_slow(text):
    return re.findall(r'\d+', text)

# Fast: compile once, reuse
DIGIT_PATTERN = re.compile(r'\d+')

def extract_fast(text):
    return DIGIT_PATTERN.findall(text)
```

Rule of thumb: If a line inside your loop produces the same result on every iteration, move it out.
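When the repeated work is a function call with the same arguments, the standard library can hoist it for you: `functools.lru_cache` memoizes results. A small sketch, where `parse_rate` is a made-up stand-in for any expensive, deterministic call:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parse_rate(code):
    # Stand-in for an expensive lookup that always returns
    # the same result for the same input
    return int(code.split("-")[1]) / 100

# Only the first call computes; the remaining 999 hit the cache
rates = [parse_rate("SKU-015") for _ in range(1_000)]
print(parse_rate.cache_info())
```

Note that caching is only safe when the function is pure (same input, same output) and its arguments are hashable.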
# 4. Choosing the right data structure
Python provides you with several built-in data structures — lists, sets, dictionaries, tuples — and choosing the wrong one for the task can make your code slower than it needs to be.
The most important difference between lists and sets shows up in membership checks with the `in` operator:
- Checking whether an item exists in a list takes longer as the list grows, because Python scans it element by element.
- A set uses hashing to answer the same query in constant time, regardless of size.
Let’s look at an example: finding which customer IDs have already placed an order from a large dataset.
```python
import time
import random

all_customers = [f"CUST-{i}" for i in range(100_000)]
ordered = [f"CUST-{i}" for i in random.sample(range(100_000), 10_000)]

# Slow: ordered is a list
start = time.perf_counter()
repeat_customers = [c for c in all_customers if c in ordered]
print(f"List : {time.perf_counter() - start:.4f}s → {len(repeat_customers)} found")

# Fast: ordered is a set
ordered_set = set(ordered)
start = time.perf_counter()
repeat_customers = [c for c in all_customers if c in ordered_set]
print(f"Set  : {time.perf_counter() - start:.4f}s → {len(repeat_customers)} found")
```

Output:
```
List : 16.7478s → 10000 found
Set  : 0.0095s → 10000 found
```

The same logic applies to dictionaries when you need fast key lookups, and to `collections.deque` when you frequently add or remove items from both ends of a sequence, which is slow with lists.
Here’s a quick reference for which structure to reach for:
| Need | Data structure |
|---|---|
| Ordered sequence, indexed access | `list` |
| Fast membership checks | `set` |
| Key-value lookup | `dict` |
| Counting occurrences | `collections.Counter` |
| Queue or deque operations | `collections.deque` |
Rule of thumb: If you are checking `if x in something` inside a loop and `something` has more than a few hundred items, it should probably be a set.
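The last two rows of the table are easy to overlook. Here is a short sketch of both, using made-up event data:

```python
from collections import Counter, deque

# Counter: tally occurrences in one pass
status_codes = [200, 404, 200, 500, 200, 404]
counts = Counter(status_codes)
print(counts.most_common(1))

# deque: O(1) appends and pops at both ends;
# maxlen turns it into a fixed-size sliding window
recent = deque(maxlen=3)
for event in ["a", "b", "c", "d"]:
    recent.append(event)  # the oldest item drops off automatically
print(list(recent))
```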
# 5. Vectorizing operations on numerical data
If your code processes numbers – calculations over rows of data, statistical operations, transformations – writing plain Python loops is almost always the slowest possible approach. Libraries like NumPy and pandas are designed for exactly this: applying operations to entire arrays at once, in optimized C code, without Python loops.
This is called vectorization. Instead of having Python process each element one at a time, you hand the entire array to a function that handles everything internally at C speed.
```python
import time
import numpy as np
import pandas as pd

prices = [round(10 + i * 0.05, 2) for i in range(500_000)]
discount_rate = 0.15

# Slow: Python loop
start = time.perf_counter()
discounted = []
for price in prices:
    discounted.append(round(price * (1 - discount_rate), 2))
print(f"Python loop : {time.perf_counter() - start:.4f}s")

# Fast: NumPy vectorization
prices_array = np.array(prices)
start = time.perf_counter()
discounted = np.round(prices_array * (1 - discount_rate), 2)
print(f"NumPy       : {time.perf_counter() - start:.4f}s")

# Fast: pandas vectorization
prices_series = pd.Series(prices)
start = time.perf_counter()
discounted = (prices_series * (1 - discount_rate)).round(2)
print(f"Pandas      : {time.perf_counter() - start:.4f}s")
```

Output:
```
Python loop : 1.0025s
NumPy       : 0.0122s
Pandas      : 0.0032s
```

NumPy is over 80x faster here. The code is also shorter and cleaner: no loop, no `append()`, just an expression.
If you are already working with a pandas `DataFrame`, the same principle applies to column operations. Always prefer column-level operations over looping through rows with `iterrows()`:
```python
df = pd.DataFrame({"price": prices})

# Slow: row-by-row with iterrows
start = time.perf_counter()
for idx, row in df.iterrows():
    df.at[idx, "discounted"] = round(row["price"] * 0.85, 2)
print(f"iterrows   : {time.perf_counter() - start:.4f}s")

# Fast: vectorized column operation
start = time.perf_counter()
df["discounted"] = (df["price"] * 0.85).round(2)
print(f"Vectorized : {time.perf_counter() - start:.4f}s")
```

Output:
```
iterrows   : 34.5615s
Vectorized : 0.0051s
```

The `iterrows()` method is one of the most common performance traps in pandas. If you see it in your code and you’re working with more than a few thousand rows, it’s almost always worth replacing with a column operation.
Rule of thumb: If you are looping over numbers or `DataFrame` rows, ask whether NumPy or pandas can do the same thing as a vectorized operation.
# Conclusion
Slow Python code is usually a pattern problem. Measuring before optimizing, leaning on built-ins, avoiding repeated operations in loops, choosing the right data structure, and using vectorization for numerical work will cover most of the performance problems you’ll encounter as a beginner.
Start with tip one every time: measure. Find the real bottleneck, fix it, and measure again. You’d be surprised how much headroom there is before you need anything more advanced.
This article covered the most common causes of slow Python code with five techniques. But sometimes you need to go further:
- Multiprocessing – If your work is CPU-bound and you have a multi-core machine, Python’s `multiprocessing` module can divide work across cores.
- Async I/O – If your code spends most of its time waiting on network requests or file reads, `asyncio` can handle many tasks concurrently.
- Dask or Polars – For datasets too large to fit in memory, these libraries scale well beyond what pandas can handle.
These are worth exploring once you’ve got the basics down and still need more headroom. Happy coding!
Bala Priya C is a developer and technical writer from India. She loves working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.