

Image by Author
Introduction
Stress testing is crucial for understanding how your application behaves under heavy load. For machine learning APIs, it is especially important because model inference can be CPU-intensive. By simulating a large number of users, we can identify performance bottlenecks, determine the capacity of our system, and ensure reliability.
In this tutorial, we will use:
- FastAPI: A modern, fast (high-performance) web framework for building APIs with Python.
- Uvicorn: An ASGI server to run our FastAPI application.
- Locust: An open-source load testing tool. You define user behavior in Python code and swarm your system with hundreds of simultaneous users.
- scikit-learn: Used for our example machine learning model.
1. Project Setup and Dependencies
Set up the project structure and install the necessary dependencies.
- Create a requirements.txt file and add the following Python packages:
fastapi==0.115.12
locust==2.37.10
numpy==2.3.0
pandas==2.3.0
pydantic==2.11.5
scikit-learn==1.7.0
uvicorn==0.34.3
orjson==3.10.18
- Open your terminal, create a virtual environment, and activate it (the activation command shown is for Windows):
python -m venv venv
venv\Scripts\activate
- Install all the packages using the requirements.txt file:
pip install -r requirements.txt
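As an optional check after installation, the pinned versions can be compared with what is actually installed in the virtual environment. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `find_mismatches` are hypothetical and not part of the original project.

```python
from importlib import metadata


def parse_pins(text: str) -> dict:
    """Parse 'name==version' lines from requirements text into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins: dict) -> dict:
    """Return {name: (pinned, installed_or_None)} for every unsatisfied pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Passing the contents of requirements.txt to `parse_pins` and the result to `find_mismatches` should return an empty dict when the environment matches the pins.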
2. Building the FastAPI Application
In this section, we will create the files for the regression model training, the Pydantic models, and the FastAPI application.
ml_model.py handles the machine learning model. It uses a singleton pattern to ensure that there is only one instance of the model. The model is a random forest regressor trained on the California Housing dataset. If a pre-trained model (model.pkl and scaler.pkl) is not available, it trains and saves a new one.
app/ml_model.py:
import os
import threading

import joblib
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


class MLModel:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking so that only one instance is ever created
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # __init__ runs on every MLModel() call, so guard the one-time setup
        if not hasattr(self, "initialized"):
            self.model = None
            self.scaler = None
            self.model_path = "model.pkl"
            self.scaler_path = "scaler.pkl"
            self.feature_names = None
            self.initialized = True
            self.load_or_create_model()

    def load_or_create_model(self):
        """Load existing model or create a new one using California housing dataset"""
        if os.path.exists(self.model_path) and os.path.exists(self.scaler_path):
            self.model = joblib.load(self.model_path)
            self.scaler = joblib.load(self.scaler_path)
            housing = fetch_california_housing()
            self.feature_names = housing.feature_names
            print("Model loaded successfully")
        else:
            print("Creating new model...")
            housing = fetch_california_housing()
            X, y = housing.data, housing.target
            self.feature_names = housing.feature_names

            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, random_state=42
            )

            self.scaler = StandardScaler()
            X_train_scaled = self.scaler.fit_transform(X_train)

            self.model = RandomForestRegressor(
                n_estimators=50,  # Reduced for faster predictions
                max_depth=8,  # Reduced for faster predictions
                random_state=42,
                n_jobs=1,  # Single thread for consistency
            )
            self.model.fit(X_train_scaled, y_train)

            joblib.dump(self.model, self.model_path)
            joblib.dump(self.scaler, self.scaler_path)

            X_test_scaled = self.scaler.transform(X_test)
            score = self.model.score(X_test_scaled, y_test)
            print(f"Model R² score: {score:.4f}")

    def predict(self, features):
        """Make prediction for house price"""
        features_array = np.array(features).reshape(1, -1)
        features_scaled = self.scaler.transform(features_array)
        prediction = self.model.predict(features_scaled)[0]
        # The dataset target is expressed in units of $100,000
        return prediction * 100000

    def get_feature_info(self):
        """Get information about the features"""
        return {
            "feature_names": list(self.feature_names),
            "num_features": len(self.feature_names),
            "description": "California housing dataset features",
        }


# Initialize model as singleton
ml_model = MLModel()
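The singleton behaviour described above can be checked in isolation. The sketch below is a stripped-down, hypothetical stand-in for MLModel (the class name `Singleton`, the `loads` counter, and `collect_instance_ids` are illustrative, not from the project). Unlike the article's class, it performs the one-time setup inside the lock in `__new__`, which is one way to make sure the expensive load cannot run twice even when several threads construct the object at once.

```python
import threading


class Singleton:
    """Minimal double-checked-locking singleton (illustrative stand-in)."""

    _instance = None
    _lock = threading.Lock()
    loads = 0  # counts how many times the "expensive" setup ran

    def __new__(cls):
        # Fast path: skip the lock once the instance exists.
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock: another thread may have won the race.
                if cls._instance is None:
                    instance = super().__new__(cls)
                    cls.loads += 1  # stands in for loading the model
                    cls._instance = instance
        return cls._instance


def collect_instance_ids(n_threads: int = 8) -> set:
    """Construct the singleton from several threads; return the distinct ids."""
    ids = []
    threads = [
        threading.Thread(target=lambda: ids.append(id(Singleton())))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return set(ids)


if __name__ == "__main__":
    print("distinct instances:", len(collect_instance_ids()))
    print("setup runs:", Singleton.loads)
```

Note that a singleton is per process: when the server is started with several worker processes, each worker holds its own copy of the model.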
pydantic_models.py: This file defines the Pydantic models for request and response data validation and serialization.
app/pydantic_models.py:
from typing import List

from pydantic import BaseModel, Field


class PredictionRequest(BaseModel):
    features: List[float] = Field(
        ...,
        description="List of 8 features: MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude",
        min_length=8,
        max_length=8,
    )

    model_config = {
        "json_schema_extra": {
            "examples": [
                {"features": [8.3252, 41.0, 6.984, 1.024, 322.0, 2.556, 37.88, -122.23]}
            ]
        }
    }
app/main.py: This file is the core FastAPI application, which defines the API endpoints.
import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException
from fastapi.responses import ORJSONResponse

from .ml_model import ml_model
from .pydantic_models import (
    PredictionRequest,
)


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Pre-load the model
    _ = ml_model.get_feature_info()
    yield


app = FastAPI(
    title="California Housing Price Prediction API",
    version="1.0.0",
    description="API for predicting California housing prices using Random Forest model",
    lifespan=lifespan,
    default_response_class=ORJSONResponse,
)


@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {"status": "healthy", "message": "Service is operational"}


@app.get("/model-info")
async def model_info():
    """Get information about the ML model"""
    try:
        feature_info = await asyncio.to_thread(ml_model.get_feature_info)
        return {
            "model_type": "Random Forest Regressor",
            "dataset": "California Housing Dataset",
            "features": feature_info,
        }
    except Exception:
        raise HTTPException(
            status_code=500, detail="Error retrieving model information"
        )


@app.post("/predict")
async def predict(request: PredictionRequest):
    """Make house price prediction"""
    if len(request.features) != 8:
        raise HTTPException(
            status_code=400,
            detail=f"Expected 8 features, got {len(request.features)}",
        )
    try:
        prediction = ml_model.predict(request.features)
        return {
            "prediction": float(prediction),
            "status": "success",
            "features_used": request.features,
        }
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception:
        raise HTTPException(status_code=500, detail="Prediction error")
Key points:
- lifespan manager: Ensures the ML model is loaded during application startup.
- asyncio.to_thread: This matters because scikit-learn prediction is CPU-bound. Running it in a separate thread prevents it from blocking FastAPI's event loop, so the server can keep handling other requests concurrently. Note that in the code above only /model-info is wrapped in asyncio.to_thread; /predict calls ml_model.predict directly, so each prediction occupies its worker's event loop until it returns.
Endpoints:
- /health: A simple health check.
- /model-info: Provides metadata about the ML model.
- /predict: Accepts a list of features and returns a house price prediction.
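Returning to the asyncio.to_thread point above, the effect of offloading a blocking call can be observed without FastAPI. The sketch below is illustrative only: `cpu_bound_predict` is a hypothetical stand-in for a model call, and `time.sleep` merely simulates blocking work. A heartbeat coroutine counts how often the event loop gets to run while the call is in progress.

```python
import asyncio
import time


def cpu_bound_predict(features):
    """Hypothetical stand-in for a model call: blocks the calling thread."""
    time.sleep(0.2)  # simulated blocking work
    return sum(features)


async def heartbeat(ticks, stop):
    """Record a tick every 10 ms for as long as the event loop is free."""
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def measure(offload: bool) -> int:
    """Return how many heartbeat ticks happened during one prediction."""
    ticks = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if offload:
        # Runs in a worker thread; the event loop stays responsive.
        await asyncio.to_thread(cpu_bound_predict, [1.0, 2.0])
    else:
        # Runs in the event loop thread; nothing else can be scheduled.
        cpu_bound_predict([1.0, 2.0])
    stop.set()
    await beat
    return len(ticks)


if __name__ == "__main__":
    print("ticks while blocking:", asyncio.run(measure(offload=False)))
    print("ticks while offloaded:", asyncio.run(measure(offload=True)))
```

With real CPU-bound work, how much a thread helps depends on whether the library releases the GIL during computation, so the benefit for a given model is worth measuring rather than assuming.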
run_server.py: This script runs the FastAPI application using Uvicorn.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.main:app", host="localhost", port=8000, workers=4)
All files and configurations are available in the GitHub repository: kingabzpro/Stress-Testing-FastAPI
3. Writing the Locust Stress Test
Now, let's create the stress test script using Locust.
tests/locustfile.py: This file defines the behavior of the simulated users.
import json
import logging
import random

from locust import HttpUser, between, task

# Reduce logging to improve performance
logging.getLogger("urllib3").setLevel(logging.WARNING)


class HousingAPIUser(HttpUser):
    # Each simulated user pauses 0.5 to 2 seconds between tasks
    wait_time = between(0.5, 2)

    def generate_random_features(self):
        """Generate random but realistic California housing features"""
        return [
            round(random.uniform(0.5, 15.0), 4),  # MedInc
            round(random.uniform(1.0, 52.0), 1),  # HouseAge
            round(random.uniform(2.0, 10.0), 2),  # AveRooms
            round(random.uniform(0.5, 2.0), 2),  # AveBedrms
            round(random.uniform(3.0, 35000.0), 0),  # Population
            round(random.uniform(1.0, 10.0), 2),  # AveOccup
            round(random.uniform(32.0, 42.0), 2),  # Latitude
            round(random.uniform(-124.0, -114.0), 2),  # Longitude
        ]

    @task(1)
    def model_info(self):
        """Test model info endpoint"""
        with self.client.get("/model-info", catch_response=True) as response:
            if response.status_code == 200:
                response.success()
            else:
                response.failure(f"Model info failed: {response.status_code}")

    @task(3)
    def single_prediction(self):
        """Test single prediction endpoint"""
        features = self.generate_random_features()
        with self.client.post(
            "/predict", json={"features": features}, catch_response=True, timeout=10
        ) as response:
            if response.status_code == 200:
                try:
                    data = response.json()
                    if "prediction" in data:
                        response.success()
                    else:
                        response.failure("Invalid response format")
                except json.JSONDecodeError:
                    response.failure("Failed to parse JSON")
            elif response.status_code == 503:
                response.failure("Service unavailable")
            else:
                response.failure(f"Status code: {response.status_code}")
Key points:
- Each simulated user waits between 0.5 and 2 seconds between tasks.
- The script generates realistic random feature data for the prediction requests.
- The task weights are 1 and 3, so each user makes roughly one model-info request for every three single-prediction requests.
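To see what the 1:3 task weights above mean in practice, the selection can be approximated with a weighted random draw. This is a rough sketch, not Locust's own scheduler: it assumes tasks are picked at random in proportion to their weights, and the function name `simulate_task_mix` is hypothetical.

```python
import random
from collections import Counter


def simulate_task_mix(weights: dict, n_requests: int, seed: int = 0) -> Counter:
    """Draw n_requests task names at random, in proportion to their weights."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=n_requests)
    return Counter(picks)


# Same weights as the @task(1) and @task(3) decorators in the locustfile.
mix = simulate_task_mix({"model_info": 1, "single_prediction": 3}, 10_000)
share = mix["single_prediction"] / sum(mix.values())
print(f"single_prediction share of requests: {share:.1%}")
```

Over many requests the share of /predict calls settles near three quarters, which is why the per-endpoint request counts in the results are uneven.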
4. Running the Stress Test
- To assess the performance of your application under load, first start the machine learning application in a terminal by running the run_server.py script from the project root:
python run_server.py
Output:
Model loaded successfully
INFO:     Started server process [26216]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
- Open your browser and go to the interactive API documentation (FastAPI serves it at /docs by default) to test your endpoints and make sure they are working properly.
- Open a new terminal window, activate the virtual environment, and go to your project root directory to run Locust with the web UI. The --host value is the address configured in run_server.py (localhost, port 8000):
locust -f tests/locustfile.py --host http://localhost:8000
Then open the Locust web UI in your browser (by default it is served on port 8089).
- In the Locust web UI, set the total number of users to 500, the spawn rate to 10 users per second, and run the test for one minute.
- During the test, Locust shows real-time statistics, including the number of requests, failures, and response times for each endpoint.
- Once the test is complete, click the Charts tab to view interactive graphs showing the number of users, requests per second, and response times.
- To run Locust without the web UI and automatically generate an HTML report, use the following command:
locust -f tests/locustfile.py --host http://localhost:8000 --users 500 --spawn-rate 10 --run-time 60s --headless --html report.html
After the test is over, an HTML report named report.html is saved in your project directory for later review.
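When reading the results, it helps to work out how much of the run actually happens at full load. The arithmetic below is a small sketch under two assumptions: users are spawned at a constant rate, and the run-time clock starts when spawning starts. The function name `ramp_profile` is hypothetical.

```python
def ramp_profile(users: int, spawn_rate: float, run_time_s: float) -> dict:
    """Estimate ramp-up duration and time spent at the full user count."""
    ramp_s = users / spawn_rate  # seconds needed to start every user
    return {
        # Ramp-up cannot last longer than the run itself.
        "ramp_seconds": min(ramp_s, run_time_s),
        # Users actually started before the run ends.
        "peak_users": min(users, int(spawn_rate * run_time_s)),
        # Whatever is left after ramp-up is spent at full load.
        "seconds_at_peak": max(0.0, run_time_s - ramp_s),
    }


# Settings used in this tutorial: 500 users, 10 users/second, 60 seconds.
profile = ramp_profile(users=500, spawn_rate=10, run_time_s=60)
print(profile)
```

Under these assumptions the tutorial's settings spend 50 of the 60 seconds ramping up and only about 10 seconds at all 500 users, so a longer run time or a higher spawn rate gives a better picture of steady-state behavior.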


Final Thoughts
Our app can handle a large number of users because we are using a simple machine learning model. The results show that the /model-info endpoint has a higher response time than the /predict endpoint, which is impressive given that /predict is the endpoint running model inference. Testing locally like this is the best way to check your application before pushing it to production.
If you want to try this setup, please visit the kingabzpro/Stress-Testing-FastAPI repository and follow the instructions in the documentation.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.