Weights & Biases: A KDnuggets Crash Course

by SkillAiNest

Image by the author

If you train models from a single notebook, you probably hit the same headaches: you tweak five knobs, retrain, and by Friday you can’t remember which run produced the “good” ROC curve or which data slice you used. Weights & Biases (W&B) fixes that.

Below is a practical tour. It is opinionated, light on ceremony, and aimed at teams who want a clean experiment history without building their own platform. Call it a quick walkthrough.

. Why W&B at all?

Notebooks grow into experiments. Experiments multiply. Soon you are asking: which run used this data slice? Why is today’s ROC curve better? Can I reproduce last week’s baseline?

W&B gives you one place to:

  • Log metrics, configs, plots, and system stats
  • Version datasets and models with artifacts
  • Run hyperparameter sweeps
  • Share dashboards without screenshots

You can start small and layer features when needed.

. Setup in 60 seconds

Start by installing the library and logging in with your API key. If you don’t have one yet, you can find it here.

pip install wandb
wandb login # paste your API key once
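In CI or on a shared machine where the interactive prompt is awkward, you can also log in from Python. A minimal sketch, assuming the key is exported as WANDB_API_KEY:

import os
import wandb

# Read the key from the environment instead of an interactive prompt;
# wandb also picks up WANDB_API_KEY automatically if it is set.
wandb.login(key=os.environ["WANDB_API_KEY"])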

Image by the author

!! A minimal sanity check

import wandb, random, time

wandb.init(project="kdn-crashcourse", name="hello-run", config={"lr": 0.001, "epochs": 5})
for epoch in range(wandb.config.epochs):
    loss = 1.0 / (epoch + 1) + random.random() * 0.05
    wandb.log({"epoch": epoch, "loss": loss})
    time.sleep(0.1)
wandb.finish()

You should now see something like this:

Image by the author

Now let’s get to the useful bits.

. Tracking experiments

!! Log hyperparameters and metrics

Treat wandb.config as the single source of truth for your experiment’s knobs. Give metrics clear names so that charts group automatically.

cfg = dict(arch="resnet18", lr=3e-4, batch=64, seed=42)
run = wandb.init(project="kdn-mlops", config=cfg, tags=["baseline"])

# training loop ...
for step, (x, y) in enumerate(loader):
    # ... compute loss, acc
    wandb.log({"train/loss": loss.item(), "train/acc": acc, "step": step})

# log a final summary
run.summary["best_val_auc"] = best_auc

Some points:

  • Use namespaces like train/loss or val/auc so charts group automatically
  • Add tags like "lr-finder" or "fp16" so you can filter runs later
  • Use run.summary[...] for the one-off results you want to see on the run card (see the sketch after this list)
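As an alternative to writing run.summary by hand, define_metric can track the best value for you. A minimal sketch (the logged values are stand-ins for a real validation metric):

import random
import wandb

run = wandb.init(project="kdn-mlops")

# Keep the maximum val/auc in the run summary automatically,
# instead of assigning run.summary["best_val_auc"] at the end.
run.define_metric("val/auc", summary="max")

for epoch in range(5):
    wandb.log({"val/auc": 0.5 + random.random() * 0.4})  # stand-in metric

run.finish()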

!! Log images, confusion matrices, and custom plots

wandb.log({
    "val/confusion": wandb.plot.confusion_matrix(
        preds=preds, y_true=y_true, class_names=classes)
})
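The same dictionary style works for images. A minimal sketch, assuming an active run; wandb.Image accepts NumPy arrays, PIL images, or file paths, and the random arrays here are just stand-ins for real inputs:

import numpy as np
import wandb

wandb.init(project="kdn-mlops")

# Log a few examples as images (random arrays stand in for real data)
examples = [np.random.randint(0, 255, (28, 28), dtype=np.uint8) for _ in range(3)]
wandb.log({"val/examples": [wandb.Image(im, caption=f"example {i}")
                            for i, im in enumerate(examples)]})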

You can also log any Matplotlib figure:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(history)
wandb.log({"training/curve": fig})

!! Version datasets and models with artifacts

Artifacts answer questions such as “which files did this run use?” and “which model did we train?” No more final_final_v3.parquet mystery.

import wandb

run = wandb.init(project="kdn-mlops")

# Create a dataset artifact (run once per version)
raw = wandb.Artifact("imdb_reviews", type="dataset", description="raw dump v1")
raw.add_dir("data/raw") # or add_file("path")
run.log_artifact(raw)

# Later, consume the latest version
artifact = run.use_artifact("imdb_reviews:latest")
data_dir = artifact.download() # folder path pinned to a hash

Log your model the same way:

import torch
import wandb

run = wandb.init(project="kdn-mlops")

model_path = "models/resnet18.pt"
torch.save(model.state_dict(), model_path)

model_art = wandb.Artifact("sentiment-resnet18", type="model")
model_art.add_file(model_path)
run.log_artifact(model_art)

Now the lineage is clear: this model came from this data under this commit of the code.

!! Tables for evaluation and error analysis

wandb.Table is a lightweight dataframe for results: predictions, labels, and slices.

table = wandb.Table(columns=["id", "text", "pred", "true", "prob"])
for r in batch_results:
    table.add_data(r.id, r.text, r.pred, r.true, r.prob)
wandb.log({"eval/preds": table})

Filter the table in the UI to find failure patterns (e.g., short reviews, rare classes).
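If your predictions already live in a pandas DataFrame, you can skip the row-by-row loop. A minimal sketch with illustrative columns:

import pandas as pd
import wandb

wandb.init(project="kdn-mlops")

# Wrap an existing DataFrame directly instead of calling add_data per row
df = pd.DataFrame({
    "id": [1, 2, 3],
    "pred": ["pos", "neg", "pos"],
    "true": ["pos", "pos", "pos"],
})
wandb.log({"eval/preds": wandb.Table(dataframe=df)})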

!! Hyperparameter sweeps

Describe the search space in YAML, launch agents, and let W&B coordinate.

# sweep.yaml
program: train.py  # your training script (the agent needs to know what to run)
method: bayes
metric: {name: val/auc, goal: maximize}
parameters:
  lr: {min: 1.0e-5, max: 1.0e-2}
  batch: {values: [32, 64, 128]}
  dropout: {min: 0.0, max: 0.5}

Start the sweep:

wandb sweep sweep.yaml # returns a SWEEP_ID
wandb agent ENTITY/PROJECT/SWEEP_ID # run one or more agents

Your training script should read wandb.config for lr, batch, and so on. The dashboard shows top trials, parallel coordinates, and parameter importance.
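For reference, here is a minimal sketch of what a sweep-ready train.py could look like (the file name matches the program entry above; the metric computation is a stand-in for a real training step):

import random
import wandb

def main():
    # The sweep agent injects the sampled hyperparameters into wandb.config
    run = wandb.init(project="kdn-mlops")
    cfg = wandb.config  # cfg.lr, cfg.batch, cfg.dropout come from sweep.yaml

    for step in range(50):
        # Stand-in for a real training step that would use cfg
        val_auc = 1.0 - 1.0 / (step + 1) - cfg.lr * random.random()
        wandb.log({"val/auc": val_auc})

    run.finish()

if __name__ == "__main__":
    main()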

. Drop-in integrations

Pick whichever framework you use and keep moving.

!! PyTorch Lightning

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(project="kdn-mlops")
trainer = pl.Trainer(logger=logger, max_epochs=10)

!! Keras

import wandb
from wandb.keras import WandbCallback

wandb.init(project="kdn-mlops", config={"epochs": 10})
model.fit(X, y, epochs=wandb.config.epochs, callbacks=[WandbCallback()])

!! scikit-learn

from sklearn.metrics import roc_auc_score
wandb.init(project="kdn-mlops", config={"C": 1.0})
# ... fit model
wandb.log({"val/auc": roc_auc_score(y_true, y_prob)})

. Model Registry and Staging

Think of the registry as a phone book for your best models. You publish an artifact once, then point alias names like staging or production at it, so downstream code can pull the right model without hard-coding file paths.

run = wandb.init(project="kdn-mlops")
art = run.use_artifact("sentiment-resnet18:latest")

# Link the artifact into a registry collection and tag it as staging.
# Depending on your setup the target path may need your entity prefix,
# e.g. "<entity>/model-registry/sentiment-classifier".
run.link_artifact(art, "model-registry/sentiment-classifier", aliases=["staging"])

When you promote a new build, flip the alias. Consumers always read sentiment-classifier:production.
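On the consuming side, a minimal sketch of pulling whatever production currently points to (the my-team entity is illustrative; copy the exact path your registry page shows):

import wandb

run = wandb.init(project="kdn-mlops")

# Pull the model version the `production` alias currently points to
art = run.use_artifact("my-team/model-registry/sentiment-classifier:production")
model_dir = art.download()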

. Reproducibility checklist

  • Config: store every hyperparameter in wandb.config
  • Code and commit: use wandb.init(settings=wandb.Settings(code_dir=".")) to attach a code snapshot, or rely on CI to log the git SHA
  • Environment: log your requirements.txt or Docker tag and add it to an artifact
  • Seeds: log them and set them

A minimal seed helper:

def set_seeds(s=42):
    import random, numpy as np, torch
    random.seed(s)
    np.random.seed(s)
    torch.manual_seed(s)
    torch.cuda.manual_seed_all(s)

. Collaboration and sharing without screenshots

Add notes and tags so teammates can find runs. Use Reports to stitch charts, tables, and commentary into a link you can drop into Slack or a PR. Stakeholders can follow along without opening a notebook.

. CI and automation tips

  • Run wandb agent on training nodes to drive a sweep from CI
  • Log a dataset artifact after your ETL job; training jobs can then depend on that exact version
  • Promote model aliases (staging → production) after an evaluation passes
  • Set WANDB_API_KEY as a secret and group related runs with WANDB_RUN_GROUP

. Privacy and reliability notes

  • Use private projects as the default for teams
  • Use offline mode for air-gapped runs, then sync later with wandb sync:
export WANDB_MODE=offline
  • Don’t log raw PII; hash IDs before logging if needed
  • For large files, store them as artifacts instead of attaching them via wandb.log

. Common snags (and quick fixes)

  • “My run didn’t log anything.” The script may have crashed before wandb.finish() was called. Also check that you haven’t set WANDB_DISABLED=true in your environment
  • Logging feels slow? Log scalar metrics every step, but save heavy assets such as images or tables for the end of an epoch. You can also pass commit=False to wandb.log() and batch several logs together
  • Seeing duplicate runs in the UI? If you restart from a checkpoint, set id and resume="allow" in wandb.init() to continue the same run (see the sketch below)
  • Mysterious data drift between experiments? Put each dataset snapshot in an artifact and pin your runs to an explicit version
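For the duplicate-run case, a minimal resume sketch (the id is one you choose and persist, for example alongside your checkpoint):

import wandb

# A stable id lets restarts continue the same run instead of creating a new one
run = wandb.init(project="kdn-mlops", id="exp-001-resnet18", resume="allow")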

. Pocket cheat sheet

!! 1. Start a run

wandb.init(project="proj", config=cfg, tags=["baseline"])

!! 2. Log matrix, images, or tables

wandb.log({"train/loss": loss, "img": wandb.Image(img)})

!! 3. Version a dataset or model

art = wandb.Artifact("name", type="dataset")
art.add_dir("path")
run.log_artifact(art)

!! 4. Use an artifact

path = run.use_artifact("name:latest").download()

!! 5. Run a sweep

wandb sweep sweep.yaml && wandb agent ENTITY/PROJECT/SWEEP_ID

. Wrapping up

Start small: launch a run, log a few metrics, and save your model file as an artifact. When that feels natural, add a sweep and a short report. You will end up with reproducible experiments, traceable data and models, and a dashboard that explains your work without a slide deck.

Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and is currently working in the data science field applied to human mobility. He is a part-time content creator focused on data science and technology. Josep writes on all things AI, covering the application of the ongoing explosion in the field.
