How to Build a Real-Time AI Gym Coach with Vision Agents

by SkillAiNest

From home workouts to smart gym mirrors, computer vision is changing how people train.

Imagine walking into your home gym, turning on your camera, and having an AI coach watch your movements, count your reps, and correct your form in real time.

That’s exactly what we’re building in this tutorial: a real-time gym companion and fitness coach.

We’ll use Vision Agents for low-latency video analysis: detecting movement patterns, counting reps, and giving instant voice feedback like “Straighten your back!” or “Keep your form tight!”, just like a human trainer would.

Here is a demo video of the AI Gym Companion during a workout session:

https://www.youtube.com/watch?v=etqq68p-rge

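What we will cover:

  1. Prerequisites

  2. Project Setup

  3. How to run the app

  4. Next Steps

Prerequisites

To follow along, you’ll need Python 3.13 or later (as required by the project’s pyproject.toml), a camera, and API keys for Stream and either Gemini or OpenAI.

Project Setup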

Create a new directory on your computer named gym_buddy. You can also do this directly in your terminal with this command:

mkdir gym_buddy

Then open the directory in your IDE (for this guide, I’m using the Windsurf IDE).

If you don’t have uv (a fast Python package installer and resolver) installed on your computer, install it with this command:

pip install uv

Note: After installing uv, you can also run uv init to scaffold the project with sample files and a pyproject.toml file containing the project metadata.

Next, we will create pyproject.toml. This is the configuration file for Python projects: it defines the project’s requirements and other metadata, and it is the standard file used by modern Python packaging tools.

Enter the configuration below:

[project]
name = "gym-buddy"
version = "0.1.0"
requires-python = ">=3.13"
dependencies = [
    "python-dotenv>=1.0",
    "vision-agents",
    "vision-agents-plugins-openai",
    "vision-agents-plugins-getstream",
    "vision-agents-plugins-ultralytics",
    "vision-agents-plugins-gemini",
]

[tool.uv.sources]
"vision-agents" = {path = "../../agents-core", editable=true}
"vision-agents-plugins-deepgram" = {path = "../../plugins/deepgram", editable=true}
"vision-agents-plugins-ultralytics" = {path = "../../plugins/ultralytics", editable=true}
"vision-agents-plugins-openai" = {path = "../../plugins/openai", editable=true}
"vision-agents-plugins-getstream" = {path = "../../plugins/getstream", editable=true}
"vision-agents-plugins-gemini" = {path = "../../plugins/gemini", editable=true}

You can also create a requirements.in file that lists only the direct dependencies, such as:

python-dotenv>=1.0
vision-agents
vision-agents-plugins-openai
vision-agents-plugins-getstream
vision-agents-plugins-ultralytics
vision-agents-plugins-gemini

Then install the dependencies with uv using this command:

uv sync

This produces a uv.lock file, which uv uses to pin the exact versions of the project’s dependencies.

If you are using Windows, you may run into a dependency installation error, especially with numpy. This is caused by missing build tools on your system.

To fix this, install Visual Studio Build Tools (required for building Python packages with C extensions). During installation, make sure you select “Desktop development with C++”. This installs all the necessary build tools.

You may need to restart your computer after the installation completes for the changes to take effect.

Now run this command in your terminal:

python -m pip install -e .

The above command installs all necessary dependencies for the project.

How to get your API keys

For this project, we need API keys from Stream and from either Gemini or OpenAI.

To get your Stream API key, sign up for a Stream account using your preferred method.

Then, go to your dashboard and click ‘Create App’ to create a new app for the AI Gym Companion.

Enter an app name, choose an environment (development/production), select a region, and click ‘Create App’.

After creating the app, click the Dashboard Overview tab in the left sidebar, then go to the Video tab and click “API Keys”. Copy your API key and secret, and save them safely.

To get your Gemini API key, visit the Google AI Studio website, then click Get Started.

After that, go to your dashboard and click on ‘Create API Key’.

Enter a name for the key, then create a new project for the API key.

After creating a new API key, copy it and save it safely.

Now that you have the API keys you need for the AI Gym Companion, create a .env file in the root directory of the project and add all the API keys like this:

GEMINI_API_KEY=your_gemini_key
STREAM_API_KEY=your_stream_key
STREAM_API_SECRET=your_stream_secret

If you are using OpenAI instead of Gemini, add:

OPENAI_API_KEY=your_openai_key
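
A missing or empty key is an easy mistake to make at this step, so it can be worth checking the environment before starting the agent. The sketch below uses only the standard library; the helper name missing_keys and the REQUIRED_KEYS tuple are assumptions for illustration, covering the Gemini setup shown above.

```python
import os
from collections.abc import Mapping

# Keys the tutorial's .env file defines for the Gemini setup.
REQUIRED_KEYS = ("GEMINI_API_KEY", "STREAM_API_KEY", "STREAM_API_SECRET")


def missing_keys(env: Mapping[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or blank in env."""
    return [key for key in required if not env.get(key, "").strip()]


if __name__ == "__main__":
    # In gym_buddy.py, load_dotenv() would populate os.environ before this runs.
    missing = missing_keys(os.environ)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

If you use OpenAI instead, pass a different tuple as the required argument, for example one that swaps GEMINI_API_KEY for OPENAI_API_KEY.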

In the root directory, create an empty __init__.py file. This file makes Python treat the directory as a package. You can add a one-line comment to the file as a reminder of its purpose.


Next, create a gym_buddy.py file. This is the main app file, which contains the agent setup and the logic for joining the call. Enter the code below in the file:

import logging

from dotenv import load_dotenv
from vision_agents.core import User, Agent, cli
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import getstream, ultralytics, gemini

logger = logging.getLogger(__name__)

# Load the API keys from the .env file
load_dotenv()


async def create_agent(**kwargs) -> Agent:
    agent = Agent(
        edge=getstream.Edge(),  # Stream handles the video transport
        agent_user=User(name="AI gym companion"),
        instructions="Read @gym_buddy.md",  # coaching guide in Markdown
        llm=gemini.Realtime(fps=3),  # video frames per second sent to the model
        processors=[
            ultralytics.YOLOPoseProcessor(model_path="yolo11n-pose.pt")
        ],  # real-time pose detection with YOLO
    )
    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)

    # Join the call; the agent stays connected inside this block
    with await agent.join(call):
        await agent.llm.simple_response(
            text="Say hi. After the user does their exercise, offer helpful feedback."
        )
        await agent.finish()  # run until the call ends


if __name__ == "__main__":
    cli(AgentLauncher(create_agent=create_agent, join_call=join_call))

Then create a gym_buddy.md file. This is the instruction file containing the gym agent’s coaching guide, which it uses to analyze workouts and provide real-time feedback. Enter the Markdown below:

You are a voice fitness coach. You will watch the user's workout and offer feedback.
The video overlays the user's body position using YOLO pose analysis, so you'll see their exact movement.
Speak with a high-energy, motivating tone. Be strict about form but encouraging. Do not give feedback if you are not sure or do not see an exercise.
# Gym Workout Coaching Guide
## 1. Introduction
A fitness coach's primary responsibility is to ensure safety and efficacy in every movement. While everybody is different, the fundamental mechanics of human movement—stability, alignment, and range of motion—remain constant. By monitoring key checkpoints like spinal alignment, joint tracking, and tempo, coaches can guide athletes toward stronger, injury-free workouts. The following guidelines break down the core compound movements into phases, with clear teaching points and coaching cues.
## 2. The Squat: Setup and Stance
The squat is the king of lower-body exercises, but it starts before the descent. The athlete should stand with feet shoulder-width apart or slightly wider, toes pointed slightly outward (5-30 degrees). The spine must be neutral, chest proud, and core braced. Coaches should watch for collapsing arches in the feet or a rounded upper back. A solid setup creates the tension needed for a powerful lift.
## 3. The Squat: Descent (Eccentric Phase)
The movement begins by breaking at the hips and knees simultaneously. The hips should travel back and down, as if sitting in a chair, while the knees track in line with the toes. Coaches must ensure the heels stay glued to the floor. Common errors include "knee valgus" (knees caving in) or the torso collapsing forward. The descent should be controlled and deliberate.
## 4. The Squat: Depth and Reversal
"Depth" is achieved when the hip crease drops below the top of the knee (parallel). While not everyone has the mobility for this, it is the standard for a full range of motion. At the bottom, the athlete should maintain tension—no bouncing or relaxing. The reversal (concentric phase) is driven by driving the feet into the floor and extending the hips and knees, exhaling forcefully.
## 5. The Push-up: The Plank Foundation
A perfect push-up is essentially a moving plank. The setup requires hands placed slightly wider than shoulder-width, directly under the shoulders. The body must form a straight line from head to heels. Coaches should watch for sagging hips (lumbar extension) or piking hips (flexion). Glutes and quads should be squeezed tight to lock the body into a rigid lever.
## 6. The Push-up: Mechanics
As the athlete lowers themselves, the elbows should track back at roughly a 45-degree angle to the torso, forming an arrow shape, not a "T". The chest should descend until it nearly touches the floor. The neck must remain neutral—no reaching with the chin. The push back up should be explosive, fully extending the arms without locking the elbows violently.
## 7. The Lunge: Step and Stability
The lunge challenges balance and unilateral strength. Whether forward or reverse, the step should be long enough to allow both knees to bend to approximately 90 degrees at the bottom. The feet should remain hip-width apart throughout the movement, like moving on train tracks, not a tightrope. Coaches should look for wobbling or the front heel lifting off the ground.
## 8. The Lunge: Alignment
In the bottom position, the front knee should be directly over the ankle, not shooting far past the toes (though some forward travel is acceptable). The torso should remain upright or have a very slight forward lean; collapsing over the front thigh is a fault. The back knee should hover just an inch off the ground. Drive through the front heel to return to the start.
## 9. Tempo and Control
Time under tension builds muscle and control. Coaches should encourage a specific tempo, such as 2-0-1 (2 seconds down, 0 pause, 1 second up). Rushing through reps often masks muscle imbalances and relies on momentum rather than strength. If an athlete speeds up, cue them to "slow down and own the movement."
## 10. Breathing Mechanics
Proper breathing stabilises the core. The general rule is to inhale during the eccentric phase (lowering) and exhale during the concentric phase (lifting/pushing). For heavy lifts, the Valsalva manoeuvre (bracing the core with a held breath) may be appropriate, but for general fitness, rhythmic breathing ensures oxygen delivery and blood pressure management.
## 11. Common Faults and Fixes
- **Squat - Butt Wink**: Posterior pelvic tilt at the bottom. Fix: Limit depth or improve hamstring/ankle mobility.
- **Push-up - Winging Scapula**: Shoulder blades popping up. Fix: Push the floor away at the top (protraction) and engage serratus anterior.
- **Lunge - Valgus Knee**: Front knee collapsing in. Fix: Cue "push the knee out" and engage the glute medius.
- **General - Ego Lifting**: Sacrificing form for reps or weight. Fix: Regress the exercise or slow the tempo.

Now we have the instruction file for the AI agent. Let’s see how the agent creation code works together with the Markdown instruction file above. In gym_buddy.py, the agent is created and initialized with specific components:

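from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, ultralytics, gemini


async def create_agent(**kwargs) -> Agent:
    # Video transport: Stream carries the call
    edge = getstream.Edge()

    # Model and processor: Gemini Realtime plus YOLO pose detection
    llm = gemini.Realtime(fps=3)
    pose_processor = ultralytics.YOLOPoseProcessor(model_path="yolo11n-pose.pt")

    # Assemble the agent from its components
    return Agent(
        edge=edge,
        agent_user=User(name="AI gym companion"),
        instructions="Read @gym_buddy.md",
        llm=llm,
        processors=[pose_processor],
    )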

The gym_buddy.md file contains structured instructions that guide the gym companion agent’s behavior. A condensed version might look like this:

## Coaching Style
- Be encouraging and positive
- Provide clear, actionable feedback
- Focus on one correction at a time

## Squat Form
- Keep chest up and back straight
- Knees should track over toes
- Lower until thighs are parallel to ground
- Push through heels to stand

## Safety Guidelines
- Stop user if a dangerous form is detected
- Suggest modifications for beginners
- Remind to keep core engaged

These instructions are loaded through the instructions parameter in gym_buddy.py. The agent then reads the file to learn how to analyze your form and give feedback during the workout session. Conceptually, each video frame passes through a flow like this:


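# Conceptual sketch of the per-frame flow; the method names are illustrative
async def process_frame(self, frame):
    # 1. Detect the body pose in the current frame
    poses = await self.pose_processor.process(frame)

    # 2. Ask the LLM for feedback, guided by the Markdown instructions
    feedback = await self.llm.generate_feedback(
        poses=poses,
        instructions=self.instructions
    )
    return feedback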

When giving feedback, the agent compares the detected pose with the ideal form described in the Markdown file. It then generates natural-language feedback in the specified tone and style. The safety guidelines in gym_buddy.md are checked first, and then the agent mentions specific form corrections.

To add a new workout, update the gym_buddy.md file with a new section like this:
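
To make “comparing the detected pose with the ideal form” more concrete: pose models such as YOLO output 2D keypoints (hip, knee, ankle, and so on), and a joint angle computed from three of them is one simple signal for judging form. In this tutorial the agent leaves that judgment to the LLM, so the sketch below is only an illustration. The function name joint_angle and the sample coordinates are hypothetical and not part of Vision Agents.

```python
import math

Point = tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at b (in degrees) formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must not coincide with the joint")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


if __name__ == "__main__":
    # Made-up pixel coordinates for hip, knee, and ankle.
    standing = joint_angle((100, 100), (100, 200), (100, 300))
    bent = joint_angle((0, 200), (100, 200), (100, 300))
    print(f"standing knee angle: {standing:.0f} degrees")
    print(f"bent knee angle: {bent:.0f} degrees")
```

A knee angle near 180 degrees corresponds to a straight leg, while an angle near 90 degrees corresponds to a deep bend, which lines up with the “thighs parallel to ground” cue in the squat guidelines.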

## Push-up Form
- Keep body in a straight line
- Lower until chest nearly touches floor
- Push through palms to return up
- Keep core engaged

The agent will automatically pick up these instructions the next time it runs. This makes it easy to update and expand the agent’s capabilities just by editing the Markdown file.
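
Before restarting the agent, it can help to confirm that a new section actually landed in the file. The sketch below lists the level-2 headings in a Markdown string using only the standard library. The helper name list_sections is an assumption for illustration, and the sample text stands in for gym_buddy.md.

```python
def list_sections(markdown_text: str) -> list[str]:
    """Return the titles of all level-2 ("## ") headings, in order."""
    sections = []
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            sections.append(line[3:].strip())
    return sections


SAMPLE = """\
## Coaching Style
- Be encouraging and positive

## Squat Form
- Keep chest up and back straight

## Push-up Form
- Keep body in a straight line
"""

if __name__ == "__main__":
    print(list_sections(SAMPLE))
```

To check the real file, read gym_buddy.md as text (for example with pathlib.Path.read_text) and pass the result to list_sections.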

At this point, the gym_buddy project directory contains the files created in this tutorial: pyproject.toml, uv.lock, .env, __init__.py, gym_buddy.py, and gym_buddy.md.

You can find the full code for the AI Gym Companion in the GitHub repository.

How to run the app

First, create a virtual environment in Python with this command:

python -m venv venv

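This creates a venv directory in the project folder.

Then activate the virtual environment. On Windows, run the command below; on macOS or Linux, run source venv/bin/activate instead: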

.\venv\Scripts\activate
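
If you are unsure whether the activation worked, Python itself can tell you. The short check below relies on documented standard-library behavior: inside a venv, sys.prefix points at the environment while sys.base_prefix points at the base installation. The function name in_virtualenv is just an illustrative choice.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

When the environment is active, the interpreter path printed at the end should sit inside the venv directory.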

Now run the AI agent with this command:

uv run gym_buddy.py

You can also start the app with this command:

python gym_buddy.py

The agent starts loading in your terminal.

The AI agent will:

  1. Create a video call

  2. Open the demo UI in your browser

  3. Join the call and start watching

  4. Ask you to do squats

  5. Analyze your moves and positions, and then provide feedback

The terminal output also shows that the connection to Gemini has been established.

The agent then loads in your browser.

It also displays a popup modal that introduces Vision Agents. You can skip the introduction or click Next to step through it.

Vision Agents uses a global edge network to keep call latency as low as possible. This matters for an AI gym companion, which has to give real-time feedback on the exercises users are performing.

The AI Gym Companion can also send chat messages about your exercises through a chat box displayed on the right side of the UI. This is powered by the Chat SDK/API.

When you perform a squat, the agent (powered by Gemini) analyzes the video frames in real time. It detects when a rep is completed and triggers the send_rep_count tool, which instantly updates the workout counter on your screen and delivers a motivational text and voice response!

Here is a demo video of the AI Gym Companion during a workout session:
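
The code listing in this tutorial does not show how send_rep_count is wired up, and in the demo the model itself decides when a rep is complete. For intuition, here is one common way rep completion can be detected from a stream of knee angles: a small two-state machine with separate “down” and “up” thresholds. The class name, the thresholds, and the sample angles are all assumptions for illustration, not Vision Agents API.

```python
class RepCounter:
    """Count squat reps from knee angles using two thresholds (hysteresis)."""

    def __init__(self, down_below: float = 100.0, up_above: float = 160.0):
        self.down_below = down_below  # angle that counts as "at the bottom"
        self.up_above = up_above      # angle that counts as "standing again"
        self.at_bottom = False
        self.reps = 0

    def update(self, knee_angle: float) -> bool:
        """Feed one angle in degrees; return True when a rep just completed."""
        if not self.at_bottom and knee_angle < self.down_below:
            self.at_bottom = True
        elif self.at_bottom and knee_angle > self.up_above:
            self.at_bottom = False
            self.reps += 1
            return True
        return False


if __name__ == "__main__":
    counter = RepCounter()
    # Made-up knee angles covering two full squats.
    for angle in [175, 140, 95, 120, 165, 170, 130, 90, 150, 172]:
        if counter.update(angle):
            print(f"rep {counter.reps} completed")
```

Using two thresholds rather than one keeps small jitters around a single cutoff from being counted as extra reps. A True return value is the point at which a tool like send_rep_count could be triggered.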

https://www.youtube.com/watch?v=etqq68p-rge

You can also copy and share the call link to test the AI Gym Companion on your mobile phone, or scan the QR code shown in the UI.

If you want to test it on your phone, install the Stream Video Calls app for iOS devices for a better mobile experience.

Next Steps

In this tutorial, you learned how to create an AI gym companion using Vision Agents.

The real-time gym companion illustrates how vision AI combines three capabilities to unlock human-like interactivity:

  • Video perception (seeing)

  • LLM understanding (thinking)

  • Speech feedback (speaking)

This low-latency technology allows you to create real-time fitness apps that give instant feedback, just like a personal trainer.

You can check out more use cases of the project in the Vision Agents GitHub repository.
