Creating practical MLOps for a personal ML project

by SkillAiNest


# Introduction

You’ve probably completed your fair share of data science and machine learning projects.

They are great for sharpening skills and showing off what you know and have learned. But here’s the thing: they often stop short of looking like real-world, production-level data science.

In this article, we take one such project – an analysis of US occupational wages – and turn it into something that says, “This is ready for real-world use.”

For this, we’ll walk through a simple yet solid Machine Learning Operations (MLOps) setup that covers everything from version control to deployment.

It’s great for early-career data people, freelancers, portfolio builders, or anyone who wants their work to look like it came from a professional setup, even if it doesn’t.

In this article, we’ll go beyond notebook projects: we’ll set up an MLOps structure, build reproducible pipelines, save model artifacts, expose a simple local application programming interface (API), add logging, and finally learn how to create useful documentation.


# Understanding the task and data set

The project scenario consists of a national US dataset containing annual occupational wage and employment data for all 50 US states and territories. The data also includes employment totals, average wages, occupational groups, wage percentiles, and geographic identifiers.


Your main goals are:

  • Comparing differences in wages across states and job categories
  • Running statistical tests (t-test, z-test, f-test)
  • Constructing regressions to understand the relationship between employment and wages
  • Visualizing wage distributions and occupational trends

Some key columns of the dataset:

  • OCC_TITLE – occupation name
  • TOT_EMP – total employment
  • A_MEAN – average annual wage
  • PRIM_STATE – state abbreviation
  • O_GROUP – occupation level (major, broad, detailed)


Your mission here is to generate reliable insights into wage disparities, job distribution, and statistical relationships, but it doesn’t stop there.

The challenge is also to structure the project so that it is reusable, reproducible, and clean. That’s an essential skill for every data scientist today.

# Getting Started with Version Control

Let’s not skip the basics. Even small projects deserve a clean structure and proper version control. Here’s a folder setup that’s intuitive and reviewer-friendly:

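The original screenshot of the layout isn’t reproduced here, so the tree below is a sketch: data/, src/, models/, logs/, main.py, and README.md match paths used later in this article, while the remaining names are illustrative.

wage-analysis/
├── data/
│   └── raw/
│       └── state_M2024_dl.xlsx    # immutable raw data
├── notebooks/                     # exploration only
├── src/
│   ├── preprocessing.py           # cleaning pipeline
│   ├── analysis.py                # statistical test functions
│   └── statistics.py              # Q6/Q7 helpers
├── models/                        # saved artifacts (.pkl)
├── logs/                          # pipeline.log
├── main.py                        # entry point
└── README.md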

Some best practices:

  • Keep raw data immutable. Don’t edit it in place; copy it for processing.
  • Consider using Git LFS if your datasets become large.
  • Keep each script inside src/ focused on one thing. Your future self will thank you.
  • Make frequent commits and use clear messages such as:
    feat: add t-test comparison between management and production wages

Even with this simple structure, you’re showing hiring managers that you think and plan like a professional, not a junior.

# Creating Reproducible Pipelines (and Leaving Notebook Chaos Behind)

Notebooks are amazing for exploration. You try something, tweak a filter, replay a cell, copy a chart, and before you know it, you have 40 cells and no idea what the final answer actually is.

To give this project a “production-ish” feel, we’ll take the logic that’s already in the notebook and wrap it into a single preprocessing function. That function becomes the canonical place where the US occupational wage data is:

  • Loaded from an Excel file
  • Cleaned and coerced to numeric
  • Normalized (states, occupation groups, occupation codes)
  • Enriched with helpful columns like TOTAL_PAYROLL

From then on, every analysis — plot, t-test, regression, correlation, Z-test — will reuse the same cleaned data frame.

// From one-off notebook cells to reusable functions

Right now, the notebook roughly:

  • Loads the file: state_M2024_dl.xlsx
  • Parses the first sheet into a data frame
  • Coerces columns like A_MEAN and TOT_EMP to numeric
  • Uses those columns in:
    • State-level wage comparisons
    • Linear regression (TOT_EMP → A_MEAN)
    • Pearson correlation (Q6)
    • Z-test for tech vs. non-tech (Q7)
    • Levene’s test for wage variance

We’ll turn this into a single function called preprocess_wage_data, which you can call from anywhere in the project:

from src.preprocessing import preprocess_wage_data
df = preprocess_wage_data("data/raw/state_M2024_dl.xlsx")

Now your notebooks, scripts, or future API calls all agree on what “clean data” means.

// What the preprocessing pipeline actually does


For this dataset, the preprocessing pipeline would:

1. Load the Excel file once.

xls = pd.ExcelFile(file_path)
df_raw = xls.parse(xls.sheet_names[0])
df_raw.head()


2. Convert key columns to numeric.

These are the columns that your analysis actually uses:

  • Employment and concentration: TOT_EMP, EMP_PRSE, JOBS_1000, LOC_QUOTIENT
  • Wage measures: H_MEAN, A_MEAN, MEAN_PRSE
  • Wage percentiles: H_PCT10, H_PCT25, H_MEDIAN, H_PCT75, H_PCT90, A_PCT10, A_PCT25, A_MEDIAN, A_PCT75, A_PCT90

We coerce them safely:

df = df_raw.copy()
numeric_cols = (
    "TOT_EMP", "EMP_PRSE", "JOBS_1000", "LOC_QUOTIENT",
    "H_MEAN", "A_MEAN", "MEAN_PRSE",
    "H_PCT10", "H_PCT25", "H_MEDIAN", "H_PCT75", "H_PCT90",
    "A_PCT10", "A_PCT25", "A_MEDIAN", "A_PCT75", "A_PCT90",
)
for col in numeric_cols:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col], errors="coerce")

If a future file contains odd values (e.g. “**” or “N/A”), your code won’t explode; it’ll just treat them as missing, and the pipeline won’t break.
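A quick way to see that coercion behavior in isolation (the values here are made up):

import pandas as pd

# Markers like "**" or "N/A" become NaN instead of raising an error
s = pd.Series(["61000", "**", "N/A", "52000"])
print(pd.to_numeric(s, errors="coerce"))
# 0    61000.0
# 1        NaN
# 2        NaN
# 3    52000.0
# dtype: float64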

3. Normalize text identifiers.

For consistent grouping and filtering:

  • PRIM_STATE uppercased (e.g. “ca” → “CA”)
  • O_GROUP lowercased (e.g. “Major” → “major”)
  • OCC_CODE cast to string (so .str.startswith("15") works in the tech vs. non-tech z-test)
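No snippet is shown for this step in the article, so here is a sketch of what the normalization might look like (guarded in case a column is missing from a future file):

# Normalize identifiers so grouping and filtering behave consistently
if "PRIM_STATE" in df.columns:
    df["PRIM_STATE"] = df["PRIM_STATE"].str.upper()    # "ca" -> "CA"
if "O_GROUP" in df.columns:
    df["O_GROUP"] = df["O_GROUP"].str.lower()          # "Major" -> "major"
if "OCC_CODE" in df.columns:
    df["OCC_CODE"] = df["OCC_CODE"].astype(str)        # enables .str.startswith("15")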

4. Add helper columns used in analyses.

These are simple but handy. A rough estimate of the total payroll per row, using average wages:

 df["TOTAL_PAYROLL"] = df["A_MEAN"] * df["TOT_EMP"]

The wage-to-employment ratio is useful for finding high-wage/low-employment areas, protecting against division by zero:

 df("WAGE_EMP_RATIO") = df("A_MEAN") / df("TOT_EMP").replace({0: np.nan})

5. Return a clean data frame for the rest of the project.

Your subsequent code for:

  1. Plotting top/bottom states
  2. T-test (management vs. production)
  3. Regression (TOT_EMP → A_MEAN)
  4. Correlation (Q6)
  5. Z-test (Q7)
  6. Levene’s test

All can start with:

 df = preprocess_wage_data("state_M2024_dl.xlsx")

Full preprocessing function:

Put it in src/preprocessing.py:

import pandas as pd
import numpy as np
def preprocess_wage_data(file_path: str = "state_M2024_dl.xlsx") -> pd.DataFrame:
    """Load and clean the U.S. occupational wage data from Excel.
    - Reads the first sheet of the Excel file.
    - Ensures key numeric columns are numeric.
    - Normalizes text identifiers (state, occupation group, occupation code).
    - Adds helper columns used in later analysis.
    """
    # Load raw Excel file
    xls = pd.ExcelFile(file_path)

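The original article links out for the rest of the code. Assembled from steps 1–5 above, the remainder of the function body might look like this (a sketch, not necessarily the author’s exact code):

    # 1. Parse the first sheet
    df = xls.parse(xls.sheet_names[0]).copy()
    # 2. Coerce key numeric columns; odd values become NaN
    numeric_cols = (
        "TOT_EMP", "EMP_PRSE", "JOBS_1000", "LOC_QUOTIENT",
        "H_MEAN", "A_MEAN", "MEAN_PRSE",
        "H_PCT10", "H_PCT25", "H_MEDIAN", "H_PCT75", "H_PCT90",
        "A_PCT10", "A_PCT25", "A_MEDIAN", "A_PCT75", "A_PCT90",
    )
    for col in numeric_cols:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    # 3. Normalize text identifiers
    if "PRIM_STATE" in df.columns:
        df["PRIM_STATE"] = df["PRIM_STATE"].str.upper()
    if "O_GROUP" in df.columns:
        df["O_GROUP"] = df["O_GROUP"].str.lower()
    if "OCC_CODE" in df.columns:
        df["OCC_CODE"] = df["OCC_CODE"].astype(str)
    # 4. Add helper columns used in later analyses
    df["TOTAL_PAYROLL"] = df["A_MEAN"] * df["TOT_EMP"]
    df["WAGE_EMP_RATIO"] = df["A_MEAN"] / df["TOT_EMP"].replace({0: np.nan})
    # 5. Return the cleaned data frame
    return df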

# Saving your statistical models and artifacts

What are model artifacts? Some examples: regression models, correlation matrices, cleaned datasets, and summary statistics.

import joblib
joblib.dump(model, "models/employment_wage_regression.pkl")
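The snippet above assumes a fitted model object already exists. Here’s one way it might be produced – a sketch using scikit-learn for the TOT_EMP → A_MEAN regression described earlier (the library choice is an assumption; the article doesn’t specify one):

import joblib
from pathlib import Path
from sklearn.linear_model import LinearRegression
from src.preprocessing import preprocess_wage_data

# Fit the employment -> wage regression on rows where both values exist
df = preprocess_wage_data("data/raw/state_M2024_dl.xlsx")
xy = df[["TOT_EMP", "A_MEAN"]].dropna()
model = LinearRegression().fit(xy[["TOT_EMP"]], xy["A_MEAN"])

# Persist the artifact, then reload it later without retraining
Path("models").mkdir(exist_ok=True)
joblib.dump(model, "models/employment_wage_regression.pkl")
model = joblib.load("models/employment_wage_regression.pkl")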

Why save artifacts?

  • You avoid recalculating results during API calls or dashboard loads.
  • You keep versions for future comparison.
  • You keep analysis and estimation separate.

These little habits take your project from research to production.

# Making it work locally (with an API or small web UI)

You don’t have to jump straight into Docker and Kubernetes to “deploy” this. For most real-world analytics work, your first API is simply:

  • A neat preprocessing function
  • A few well-defined analysis functions
  • A small script or notebook cell that ties them together

This makes it easy to call your project from:

  • Another notebook
  • A Streamlit/Gradio dashboard
  • A future FastAPI or Flask app

// Converting your analytics into a mini “analytics API”

You already have the basic logic in the notebook:

  • T-test: management vs. production wages
  • Regression: TOT_EMP → A_MEAN
  • Pearson correlation (Q6)
  • Z-test: tech vs. non-tech (Q7)
  • Levene’s test for wage variance

We’ll wrap at least one of these in a function so that it behaves like a small API endpoint.

Example: “Compare management vs. production wages”

This is the function version of the T-test code already in the notebook:

from scipy.stats import ttest_ind
import pandas as pd
def compare_management_vs_production(df: pd.DataFrame):
    """Two-sample T-test between Management and Production occupations."""
    # Filter for relevant occupations
    mgmt = df[df["OCC_TITLE"].str.contains("Management", case=False, na=False)]
    prod = df[df["OCC_TITLE"].str.contains("Production", case=False, na=False)]
    # Drop missing values
    mgmt_wages = mgmt["A_MEAN"].dropna()
    prod_wages = prod["A_MEAN"].dropna()
    # Perform two-sample T-test (Welch's t-test)
    t_stat, p_value = ttest_ind(mgmt_wages, prod_wages, equal_var=False)
    return t_stat, p_value
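To see the reuse payoff, here’s a minimal Streamlit wrapper around this function (a sketch; the module layout and file path are assumptions based on this article’s structure):

# app.py - run with: streamlit run app.py
import streamlit as st
from preprocessing import preprocess_wage_data
from analysis import compare_management_vs_production

st.title("US Occupational Wages: Management vs. Production")
df = preprocess_wage_data("state_M2024_dl.xlsx")
t_stat, p_value = compare_management_vs_production(df)
st.metric("t-statistic", f"{t_stat:.2f}")
st.metric("p-value", f"{p_value:.4f}")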

Now this test can be reused in:

  • A main script
  • A Streamlit slider
  • A future FastAPI route

All without copy-pasting notebook cells.
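And the future FastAPI route could be just as thin. A sketch (the route name and module paths are assumptions, not the author’s code):

# api.py - run with: uvicorn api:app --reload
from fastapi import FastAPI
from preprocessing import preprocess_wage_data
from analysis import compare_management_vs_production

app = FastAPI()
df = preprocess_wage_data("state_M2024_dl.xlsx")  # load once at startup

@app.get("/tests/management-vs-production")
def management_vs_production():
    """Return the two-sample t-test result as JSON."""
    t_stat, p_value = compare_management_vs_production(df)
    return {"t_stat": float(t_stat), "p_value": float(p_value)}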

// A simple local entry point

Here’s how all the pieces fit into a simple Python script, which you can call main.py or run in a notebook cell:

from preprocessing import preprocess_wage_data
from statistics import run_q6_pearson_test, run_q7_ztest  # move these from the notebook
from analysis import compare_management_vs_production      # the function above
if __name__ == "__main__":
    # 1. Load and preprocess the data
    df = preprocess_wage_data("state_M2024_dl.xlsx")
    # 2. Run core analyses
    t_stat, p_value = compare_management_vs_production(df)
    print(f"T-test (Management vs Production) -> t={t_stat:.2f}, p={p_value:.4f}")
    corr_q6, p_q6 = run_q6_pearson_test(df)
    print(f"Pearson correlation (TOT_EMP vs A_MEAN) -> r={corr_q6:.4f}, p={p_q6:.4f}")
    z_q7 = run_q7_ztest(df)
    print(f"Z-test (Tech vs Non-tech median wages) -> z={z_q7:.4f}")

It doesn’t look like a web API yet, but conceptually it is one:

  • Input: a cleaned data frame
  • Operations: named analysis functions
  • Output: well-defined numbers that you can display in a dashboard, a report, or, later, a REST endpoint

# Logging everything (even the details)

Most people ignore logging, but it’s how you make your project debuggable and reliable.
Even in a beginner-friendly analytics project like this, it’s useful to know:

  • Which file was loaded
  • How many rows survived preprocessing
  • Which tests were run
  • What the key test statistics were

Instead of manually printing everything and scrolling through notebook output, we’ll set up a simple logging configuration that you can reuse in scripts and notebooks.

// Basic logging setup

Create a logs/ folder in your project, then add the following somewhere near the start of your code (e.g., at the top of main.py or in a dedicated logging_config.py):

import logging
from pathlib import Path
# Make sure logs/ exists
Path("logs").mkdir(exist_ok=True)
logging.basicConfig(
    filename="logs/pipeline.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

Now, every time you run your pipeline, the logs/pipeline.log file is updated.

// Logging pre-processing and analyses

We can extend the earlier main.py example to log what’s happening:

from preprocessing import preprocess_wage_data
from statistics import run_q6_pearson_test, run_q7_ztest
from analysis import compare_management_vs_production
import logging
if __name__ == "__main__":
    logging.info("Starting wage analysis pipeline.")
    # 1. Preprocess data
    df = preprocess_wage_data("state_M2024_dl.xlsx")
    logging.info("Loaded cleaned dataset with %d rows and %d columns.", df.shape(0), df.shape(1))
    # 2. T-test: Management vs Production
    t_stat, p_value = compare_management_vs_production(df)
    logging.info("T-test (Mgmt vs Prod) -> t=%.3f, p=%.4f", t_stat, p_value)
    # 3. Pearson correlation (Q6)
    corr_q6, p_q6 = run_q6_pearson_test(df)
    logging.info("Pearson (TOT_EMP vs A_MEAN) -> r=%.4f, p=%.4f", corr_q6, p_q6)
    # 4. Z-test (Q7)
    z_q7 = run_q7_ztest(df)
    logging.info("Z-test (Tech vs Non-tech median wages) -> z=%.3f", z_q7)
    logging.info("Pipeline finished successfully.")

Now, instead of guessing what happened the last time you ran the notebook, you can open logs/pipeline.log and review the timeline:

  • When preprocessing started
  • How many rows and columns survived
  • What the test statistics were
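With the format string configured earlier, the entries look roughly like this (timestamps and values are illustrative placeholders, not real results):

2025-01-15 10:42:01,113 - INFO - Starting wage analysis pipeline.
2025-01-15 10:42:03,587 - INFO - Loaded cleaned dataset with ... rows and ... columns.
2025-01-15 10:42:03,912 - INFO - T-test (Mgmt vs Prod) -> t=..., p=...
2025-01-15 10:42:04,405 - INFO - Pipeline finished successfully.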

It’s a small step, but a very “MLOps” one: you’re not just running analyses, you’re observing them.

# Storytelling (AKA Writing for Humans)

Documentation matters, especially when dealing with wages, occupations, and regional comparisons, topics that real decision makers care about.

Your README or final notebook should include:

  • Why this analysis matters
  • A summary of wage and employment patterns
  • Key concepts (top/bottom states, wage distributions, group comparisons)
  • A description of each statistical test and why it was chosen
  • Clear interpretations of regression and correlation results
  • Limitations (e.g., missing state records, sampling variability)
  • Next steps for deeper analysis or dashboard deployment

Good documentation turns a dataset project into something that anyone can use and understand.

# The result

Why does this all matter?

Because in the real world, data science doesn’t exist in a vacuum. Your beautiful model isn’t helpful if no one else can run it, understand it, or trust it. That’s where MLOps comes in – not as a buzzword, but as a bridge between a cool experiment and a real, usable product.

In this article, we started with a simple notebook-based assignment and showed how to structure and productionize it. We introduced:

  • Version control to keep our work organized
  • Clean, reproducible pipelines for preprocessing and analysis
  • Model serialization so we can reuse (not retrain) our models
  • A lightweight API for local deployment
  • Logging to see what’s going on behind the scenes
  • And finally, documentation that speaks to both technical and business audiences


Nate Rosidi is a data scientist and product strategist. He is also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers all things SQL.
