I vibe-coded a tool that analyzes customer sentiment and themes from call recordings.

by SkillAiNest

Photo by author

# Introduction

Every day, customer service centers record thousands of conversations. These audio files are a gold mine of hidden information. Are customers satisfied? What problems do they mention most often? How do emotions change during a call?
Analyzing these recordings manually is difficult. However, with advanced artificial intelligence (AI), we can automatically transcribe calls, detect sentiment, and extract recurring themes—all offline and with open-source tools.

In this article, I’ll walk you through a complete customer sentiment analysis project. You will learn how to:

  • Convert audio files to text with Whisper
  • Detect sentiment (positive, negative, neutral) and emotions (frustration, satisfaction, urgency)
  • Extract topics automatically using BERTopic
  • Display results in an interactive dashboard

The best part is that everything runs locally. Your sensitive customer data never leaves your machine.

Figure 1: Dashboard overview showing the sentiment gauge, emotion radar, and topic segmentation

# Understanding why local AI matters for customer data

Cloud-based AI services such as OpenAI's API are powerful, but they come with concerns: privacy, since customer calls often contain personal information; cost, since per-API-call pricing grows quickly at high volumes; and dependence on internet connectivity and rate limits. Running locally also makes data residency requirements easier to meet.

This local AI speech-to-text tutorial keeps everything on your hardware. Models are downloaded once and then run offline.

Figure 2: System architecture overview showing how each component handles a single task. This modular design makes the system easy to understand, test, and extend.

// Prerequisites

Before you begin, make sure you have the following:

  • Python 3.9+ installed on your machine
  • FFmpeg installed for audio processing
  • Basic familiarity with Python and machine learning concepts
  • About 2GB of disk space for the AI models

// Setting up your project

Clone the repository and configure your environment:

git clone 

Create a virtual environment:

python -m venv venv

Activate it (Windows):

venv\Scripts\activate

Activate it (Mac/Linux):

source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

The first run downloads the AI models (about 1.5 GB in total). After that, everything works offline.

Figure 3: Terminal showing successful installation.

# Transcribing Audio with Whisper

The first step in a customer sentiment analyzer is converting the spoken words in a call recording to text. We use Whisper, an automatic speech recognition (ASR) system developed by OpenAI. Let's see how it works, why it's a great choice, and how we use it in the project.

Whisper is a transformer-based encoder-decoder model trained on 680,000 hours of multilingual audio. When you feed it an audio file, it:

  • Resamples the audio to 16kHz mono.
  • Produces a mel spectrogram—a visual representation of frequency over time—that serves as a picture of the sound.
  • Divides the spectrogram into 30-second windows.
  • Passes each window through an encoder that creates a hidden representation.
  • Decodes these representations into text tokens one word (or subword) at a time.

Think of the mel spectrogram as how machines “see” sound. The x-axis represents time, the y-axis represents frequency, and the color intensity represents volume. The result is a highly accurate transcription, even with background noise or accents.
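The mel scale conversion itself is a simple formula. This standalone snippet (not part of the project code) shows why equally sized frequency steps shrink on the mel scale as pitch rises, mirroring human hearing:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (O'Shaughnessy formula)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

# Equal 1 kHz steps in Hz shrink on the mel scale as frequency rises:
# the band from 0 to 1 kHz covers far more mels than the band from 7 to 8 kHz.
low_band = hz_to_mel(1000) - hz_to_mel(0)
high_band = hz_to_mel(8000) - hz_to_mel(7000)
print(f"0-1 kHz spans {low_band:.0f} mels; 7-8 kHz spans {high_band:.0f} mels")
```

This compression is why the model allocates more resolution to the low frequencies where human speech lives.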

// Code implementation

Here is the basic transcription logic:

import whisper

class AudioTranscriber:
    def __init__(self, model_size="base"):
        self.model = whisper.load_model(model_size)
   
    def transcribe_audio(self, audio_path):
        result = self.model.transcribe(
            str(audio_path),
            word_timestamps=True,
            condition_on_previous_text=True
        )
        return {
            "text": result["text"],
            "segments": result["segments"],
            "language": result["language"]
        }

The model_size parameter controls accuracy versus speed.

| Model | Parameters | Speed   | Best for         |
|-------|------------|---------|------------------|
| tiny  | 39M        | Fastest | Quick checks     |
| base  | 74M        | Fast    | Development      |
| small | 244M       | Medium  | Production       |
| large | 1550M      | Slow    | Maximum accuracy |

For most use cases, base or small offers an excellent balance.

Figure 4: Transcription output showing time-stamped segments.

# Analyzing sentiment with transformers

With the text extracted, we perform sentiment analysis using Hugging Face Transformers. We use the Cardiff NLP RoBERTa model; trained on social media text, it is well suited to customer interactions.

// Sentiment versus emotions

Sentiment analysis classifies text as positive, neutral, or negative. We use the RoBERTa model because it understands context better than simple keyword matching.

The transcript is tokenized and passed through the transformer. The final layer uses softmax activation, which outputs probabilities that sum to 1. For example, if positive is 0.85, neutral is 0.10, and negative is 0.05, the overall sentiment is positive.
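To make the softmax step concrete, here is a minimal NumPy sketch; the logits are invented for illustration, since the real model produces its own:

```python
import numpy as np

def softmax(logits):
    """Exponentiate and normalize so the outputs sum to 1."""
    exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exp / exp.sum()

# Hypothetical logits for (negative, neutral, positive)
logits = np.array([-1.2, 0.3, 2.1])
probs = softmax(logits)
labels = ["negative", "neutral", "positive"]
print(dict(zip(labels, probs.round(3))), "sum =", probs.sum())
```

Whatever the raw logits are, the probabilities always sum to 1, and the highest one becomes the predicted label.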

  • Sentiment: overall polarity (positive, negative, or neutral), answering the question: “Is this good or bad?”
  • Emotions: specific feelings (anger, happiness, fear), answering the question: “What exactly are they feeling?”

We explore both for complete insight.

// Code implementation for sentiment analysis

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch.nn.functional as F

class SentimentAnalyzer:
    def __init__(self):
        model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
   
    def analyze(self, text):
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        outputs = self.model(**inputs)
        probabilities = F.softmax(outputs.logits, dim=1)
       
        labels = ["negative", "neutral", "positive"]
        scores = {label: float(prob) for label, prob in zip(labels, probabilities[0])}

        return {
            "label": max(scores, key=scores.get),
            "scores": scores,
            "compound": scores["positive"] - scores["negative"]
        }

The compound score ranges from -1 (very negative) to +1 (very positive), making it easy to track emotional trends over time.
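As an illustration of trend tracking, here is a small sketch using hypothetical per-segment score dictionaries shaped like the analyze() output above (the numbers are made up):

```python
# Hypothetical per-segment score dicts, shaped like SentimentAnalyzer.analyze() output
segments = [
    {"negative": 0.70, "neutral": 0.20, "positive": 0.10},
    {"negative": 0.40, "neutral": 0.35, "positive": 0.25},
    {"negative": 0.10, "neutral": 0.30, "positive": 0.60},
]

# compound = positive - negative, per segment
compounds = [s["positive"] - s["negative"] for s in segments]

# A rising compound trajectory suggests the call ended better than it began.
trend = "improving" if compounds[-1] > compounds[0] else "declining or flat"
print(compounds, "->", trend)
```

Plotting this trajectory per call is exactly what the dashboard's emotion timeline does at a larger scale.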

// Why avoid simple dictionary methods?

Traditional lexicon methods such as VADER count positive and negative words. However, they often miss context:

  • “It’s not good.” The lexicon sees “good” and scores it positive.
  • A transformer recognizes that the negation (“not”) flips the sentence to negative.

Transformers understand the relationships between words, making them much more accurate to real-world text.
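A toy word-counting classifier makes the limitation easy to see. This is a deliberately naive sketch, not VADER itself:

```python
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "terrible", "awful"}

def lexicon_sentiment(text):
    """Toy word-counting classifier: ignores word order and negation."""
    words = text.lower().replace(".", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# The lexicon sees only the word "good" and misses the negation entirely,
# so this sentence is mislabelled as positive.
print(lexicon_sentiment("It's not good."))
```

A transformer, by contrast, attends to “not” and “good” together and classifies the sentence correctly.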

# Extracting topics with BERTopic

Knowing sentiment is useful, but what are customers actually talking about? BERTopic automatically discovers themes in text without you having to define them beforehand.

// How BERTopic works

  • Embeddings: converts each transcript into a vector using Sentence Transformers.
  • Dimensionality reduction: UMAP compresses these vectors into a low-dimensional space.
  • Clustering: HDBSCAN groups similar transcripts together.
  • Topic representation: for each cluster, extracts the most relevant words using c-TF-IDF.

The result is a collection of topics such as “Billing Issues,” “Technical Support,” or “Product Feedback.” Unlike older methods such as Latent Dirichlet Allocation (LDA), BERTopic understands semantic meaning: “shipping delay” and “late delivery” cluster together because they mean the same thing.

// Code implementation

From topics.py:

from bertopic import BERTopic

class TopicExtractor:
    def __init__(self):
        self.model = BERTopic(
            embedding_model="all-MiniLM-L6-v2",
            min_topic_size=2,
            verbose=True
        )
   
    def extract_topics(self, documents):
        topics, probabilities = self.model.fit_transform(documents)
       
        topic_info = self.model.get_topic_info()
        topic_keywords = {
            topic_id: self.model.get_topic(topic_id)[:5]
            for topic_id in set(topics) if topic_id != -1
        }
       
        return {
            "assignments": topics,
            "keywords": topic_keywords,
            "distribution": topic_info
        }

Note: topic extraction requires multiple documents (at least 5-10) to find meaningful patterns. Single calls are analyzed using an already-fitted model.

Figure 5: Topic distribution bar chart showing the billing, shipping and technical support categories.

# Creating an Interactive Dashboard with Streamlit

Raw analysis output is hard to digest, so we built a Streamlit dashboard (app.py) that lets business users explore the results. Streamlit turns Python scripts into web applications with minimal code. Our dashboard provides:

  • An upload interface for audio files
  • Real-time processing with progress indicators
  • Interactive visualizations built with Plotly
  • Drill-down capability to explore individual calls

// Code implementation for dashboard structure

import streamlit as st

def main():
    st.title("Customer Sentiment Analyzer")
   
    uploaded_files = st.file_uploader(
        "Upload Audio Files",
        type=["mp3", "wav"],
        accept_multiple_files=True
    )
   
    if uploaded_files and st.button("Analyze"):
        with st.spinner("Processing..."):
            results = pipeline.process_batch(uploaded_files)
       
        # Display results
        col1, col2 = st.columns(2)
        with col1:
            st.plotly_chart(create_sentiment_gauge(results))
        with col2:
            st.plotly_chart(create_emotion_radar(results))

Streamlit's @st.cache_resource caching ensures that models load once and persist across interactions, which is critical for a responsive user experience.

Figure 7: Complete dashboard with sidebar options and multiple visualization tabs

// Key Features

  • Upload audio (or use sample transcripts for testing)
  • View the transcript with sentiment highlights
  • See an emotion timeline (if the call is long enough)
  • Explore topic visualizations built with interactive Plotly charts

// Caching for performance

Streamlit replays the script on each interaction. To avoid reprocessing heavy models, we use @st.cache_resource:

@st.cache_resource
def load_models():
    return CallProcessor()

processor = load_models()

// Real-time processing

When a user uploads a file, we show a spinner during processing, then immediately show the results:

if uploaded_file:
    with st.spinner("Transcribing and analyzing..."):
        result = processor.process_file(uploaded_file)
    st.success("Done!")
    st.write(result["text"])
    st.metric("Sentiment", result["sentiment"]["label"])

# Reviewing practical lessons

// Audio processing: from waveform to text

Whisper’s magic lies in its mel spectrogram transformation. Human hearing is logarithmic, meaning we are better at distinguishing low frequencies than high ones. The mel scale mimics this, so the model “hears” more like a human. A spectrogram is essentially a 2D image (time vs. frequency), which the transformer encoder processes much as it would an image patch. This is why Whisper handles noisy audio so well: it looks at the whole picture.

// Transformer Outputs: Softmax vs Sigmoid

  • Softmax (sentiment): forces the probabilities to sum to 1. This is ideal for mutually exclusive classes, because a sentence is usually not both positive and negative.
  • Sigmoid (emotions): treats each class independently. A sentence can express delight and surprise at the same time; sigmoid allows for this overlap.
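A quick NumPy comparison makes the difference concrete; the logits here are invented for illustration:

```python
import numpy as np

def softmax(x):
    """Normalized probabilities: mutually exclusive classes, sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    """Independent per-class probabilities: can overlap freely."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical logits for (joy, surprise, anger)
logits = np.array([2.0, 1.5, -3.0])

print("softmax:", softmax(logits).round(3))  # forced to sum to 1
print("sigmoid:", sigmoid(logits).round(3))  # each class judged independently

# With sigmoid, both joy and surprise can exceed a 0.5 threshold at once.
active = int((sigmoid(logits) > 0.5).sum())
print("classes above 0.5:", active)
```

Softmax would force joy and surprise to compete for probability mass; sigmoid lets both fire, which is what multi-label emotion detection needs.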

Choosing the right activation for your problem domain is critical.

// Communicating results visually

A good dashboard does more than just show numbers; it tells a story. Plotly charts are interactive: users can hover to view details, zoom into time ranges, and click legends to toggle data series. This turns raw analytics into actionable insights.

// Run the application

To run the application, follow the setup steps from the beginning of this article. You can test sentiment and emotion analysis without audio files: the sample mode runs text through the natural language processing (NLP) models and displays the results in the terminal.

Analyze a recording:

python main.py --audio path/to/call.mp3

Batch process directory:

python main.py --batch data/audio/

For the full interactive experience:

python main.py --dashboard

Streamlit prints a local URL (typically http://localhost:8501); open it in your browser.
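The CLI flags above could be wired with argparse roughly like this. The flag names match the commands shown, but this sketch is an assumption about main.py's internals, not the actual project code:

```python
import argparse

def build_parser():
    """Sketch of a CLI matching the commands above (structure is assumed)."""
    parser = argparse.ArgumentParser(description="Customer sentiment analyzer")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--audio", help="analyze a single recording")
    group.add_argument("--batch", help="process every file in a directory")
    group.add_argument("--dashboard", action="store_true",
                       help="launch the Streamlit dashboard")
    return parser

# Example: parse the single-recording invocation
args = build_parser().parse_args(["--audio", "call.mp3"])
print(args.audio)
```

A mutually exclusive group is a natural fit here, since each run is either a single analysis, a batch job, or a dashboard session.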

Figure 8: Terminal output showing successful analysis with sentiment score.

# Conclusion

We’ve built a complete, offline-capable system that transcribes customer calls, analyzes sentiment and emotion, and extracts recurring themes, all with open-source tools. This is a production-ready foundation for:

  • Customer support teams identifying pain points
  • Product managers gathering feedback at scale
  • Quality assurance teams monitoring agent performance

The best part? Everything runs locally, respecting user privacy and eliminating API costs.

The full code is available on GitHub: An-AI-that-analyzes-customer-sentiment. Clone the repository, follow this local AI speech-to-text tutorial, and start extracting insights from your customer calls today.

Shittu Olumide is a software engineer and technical writer with a knack for simplifying complex concepts and a keen eye for detail, passionate about leveraging modern technology to craft compelling narratives. You can also find Shittu on Twitter.
