
# Introduction
Before we get started, I want you to watch this video:
Isn’t it amazing? You can now run a fully native model on your machine, talk to it, and it works out of the box. It feels like talking to a real person because the system can listen and speak at the same time, just like a natural conversation.
This is not the “you speak, then wait, then it answers” routine. PersonaPlex is real-time, speech-to-speech conversational AI that handles interruptions, overlaps, and natural conversational gestures like “uh-huh” or “right” as you speak.
PersonaPlex is designed to be full duplex, so it can listen and generate speech simultaneously without forcing the user to pause first. This makes conversations feel far more fluid and human than traditional voice assistants.
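To build some intuition for what full duplex means in code, here is a minimal conceptual sketch I wrote (not PersonaPlex's actual implementation): the listening and speaking loops run as concurrent tasks, so neither one blocks while the other works.

```python
import asyncio

# Conceptual sketch of full-duplex turn handling: listening and speaking
# run as concurrent tasks, so neither side waits for the other to finish.
async def listen(incoming: asyncio.Queue, heard: list) -> None:
    while True:
        chunk = await incoming.get()
        if chunk is None:            # end-of-stream sentinel
            break
        heard.append(chunk)          # keep consuming user audio

async def speak(spoken: list) -> None:
    for chunk in ["uh-huh", "right", "let me explain"]:
        spoken.append(chunk)         # emit audio chunks...
        await asyncio.sleep(0)       # ...while yielding to the listener

async def main():
    incoming, heard, spoken = asyncio.Queue(), [], []
    for c in ["hello", "wait, actually", None]:
        incoming.put_nowait(c)
    # Both coroutines make progress at the same time: full duplex.
    await asyncio.gather(listen(incoming, heard), speak(spoken))
    return heard, spoken

heard, spoken = asyncio.run(main())
print(heard)
print(spoken)
```

In a half-duplex assistant, `speak` would only start after `listen` finished; here both lists fill up concurrently, which is the property that makes interruptions and backchannels possible.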
In this tutorial, we’ll set up a Linux environment, install PersonaPlex locally, and start the PersonaPlex web server so you can interact with the AI in real time from your browser.
# Using PersonaPlex Natively: A Step-by-Step Guide
In this section, we’ll install PersonaPlex on Linux, launch the real-time WebUI, and start talking to a full-duplex speech-to-speech model running natively on our machine.
## Step 1: Accept the Model Terms and Generate a Token
Before you can download and run PersonaPlex, you must accept the terms of use for the model on Hugging Face. The PersonaPlex-7B-v1 speech-to-speech model from NVIDIA is gated, which means you can’t access the weights unless you agree to the license terms on the model page.
Go to the PersonaPlex model page on Hugging Face and log in. You will see a notice saying that you need to share your contact information and accept the license terms to access the files. Review the NVIDIA Open Model License and accept the terms to unlock the repository.
Once access is granted, create a Hugging Face access token:
- Go to Settings → Access Tokens.
- Create a new token with Read permission.
- Copy the generated token.
Then export it to your terminal:
```bash
export HF_TOKEN="YOUR_HF_TOKEN"
```
This token allows your local machine to authenticate and download the PersonaPlex model.
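As a quick sanity check before kicking off a multi-gigabyte download, you can verify the token is actually visible from Python. This is a tiny hypothetical helper I wrote, not part of PersonaPlex:

```python
import os

def has_hf_token(env=os.environ) -> bool:
    """Return True if a Hugging Face token is set in the environment."""
    return bool(env.get("HF_TOKEN"))

# A missing or empty token is the most common cause of 401/403 errors
# when downloading gated model weights, so it is worth failing fast.
print(has_hf_token({"HF_TOKEN": "hf_xxx"}))  # token present
print(has_hf_token({}))                      # token missing
```

If this prints `False` for your real environment, re-run the `export` in the same shell session you will start the server from.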
## Step 2: Installing Linux Dependencies
Before installing PersonaPlex, you must install the Opus audio codec development library. PersonaPlex relies on Opus for real-time audio encoding and decoding, so this dependency must be available on your system.
On Ubuntu or Debian-based systems, run:
```bash
sudo apt update
sudo apt install -y libopus-dev
```
## Step 3: Building PersonaPlex from Source
Now we will clone the PersonaPlex repository and install the required moshi package from source.
Clone the official NVIDIA repository:
```bash
git clone <PersonaPlex repository URL>
cd personaplex
```
Once inside the project directory, install moshi:
This will compile and install PersonaPlex components with all required dependencies, including PyTorch, CUDA libraries, NCCL, and audio tooling.
You should see packages like torch, nvidia-cublas-cu12, nvidia-cudnn-cu12, sentencepiece, and moshi-personaplex installed successfully.
Tip: Do this in a virtual environment if you’re on your own machine.
## Step 4: Starting the WebUI Server
Before launching the server, install the faster Hugging Face downloader.
Now start the PersonaPlex Realtime Server:
```bash
python -m moshi.server --host 0.0.0.0 --port 8998
```
The first run will download the full PersonaPlex model, which is about 16.7 GB, so it may take some time depending on your internet speed.
After the download is complete, the model will be loaded into memory and the server will start.
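If you started the server on a remote machine or in the background, a small probe can confirm it is answering before you open the browser. This is a generic HTTP check I wrote, not a PersonaPlex utility; the demo below stands up a throwaway local server rather than the real WebUI:

```python
import http.server
import threading
import urllib.request

def server_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` without a 5xx error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except OSError:          # connection refused, timeout, DNS failure, ...
        return False

# Demo: probe a throwaway local server (stand-in for http://localhost:8998).
srv = http.server.HTTPServer(("127.0.0.1", 0),
                             http.server.SimpleHTTPRequestHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
port = srv.server_address[1]
ok = server_ready(f"http://127.0.0.1:{port}/")
srv.shutdown()
print(ok)
```

In practice you would call `server_ready("http://localhost:8998")` in a loop until it returns `True`, since the first launch spends a long time downloading and loading the model.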
## Step 5: Talking to PersonaPlex in a Browser
Now that the server is up and running, it’s time to actually talk to PersonaPlex.
If you are running it on your local machine, open http://localhost:8998 in your browser.
This will load the WebUI interface in your browser.
After the page opens:
- Select a voice.
- Click Connect.
- Allow microphone access.
- Start speaking.
The interface includes conversation templates. For this demo, we chose the astronaut (entertainment) template to make the interaction more playful. You can also create your own template by editing the initial system prompt text, which lets you fully customize the AI’s personality and behavior.
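For example, a custom system prompt might look something like this (purely illustrative wording I made up, not an official PersonaPlex template):

```
You are a friendly customer-service agent for a retail bank.
Greet the caller, ask how you can help, and answer in short,
conversational sentences. Stay polite even when interrupted.
```

Short, behavioral instructions like these tend to work well for voice personas, since the model has to stay in character while speaking and listening at the same time.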
For voice selection, we switched from the default and chose NATF3, just to try something different.

And honestly, it feels surprisingly natural.
You can pause it while it’s speaking.
You can ask follow-up questions.
You can change topics mid-sentence.
It easily handles the flow of conversations and responds intelligently in real time. I even tested it by simulating a bank customer service call, and the experience felt realistic.

PersonaPlex includes several voice presets:
- Natural (female): NATF0, NATF1, NATF2, NATF3
- Natural (male): NATM0, NATM1, NATM2, NATM3
- Varied (female): VARF0, VARF1, VARF2, VARF3, VARF4
- Varied (male): VARM0, VARM1, VARM2, VARM3, VARM4
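The preset IDs above follow a simple naming pattern, which this small snippet reproduces just to enumerate them programmatically (only the names listed above are taken from the UI; the helper itself is mine):

```python
# Voice preset IDs: 4 "natural" and 5 "varied" voices per gender,
# named NAT{F|M}{0-3} and VAR{F|M}{0-4}.
natural = [f"NAT{g}{i}" for g in ("F", "M") for i in range(4)]
varied = [f"VAR{g}{i}" for g in ("F", "M") for i in range(5)]
presets = natural + varied

print(len(presets))  # 18 presets in total
print(presets[:4])   # the natural female voices come first
```

A list like this is handy if you want to script a quick comparison, for example cycling through every preset with the same test sentence.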
You can experiment with different voices to find the personality you like; some feel more conversational, others more expressive.
# Concluding Remarks
After going through this whole setup and talking to PersonaPlex in real time, one thing becomes very clear.
It feels different.
We are used to chat-based AI: you type, it answers, you wait your turn. It feels transactional.
Full-duplex speech-to-speech feels alive.
With PersonaPlex running natively, you’re no longer waiting your turn. You can interrupt it. You can change direction in the middle of a sentence. You can naturally ask follow-up questions. The conversation flows. It feels closer to how humans actually talk.
And this is why I truly believe that the future of AI is speech-to-speech.
But even this is only half the story.
The real change will happen when these real-time conversation systems are deeply connected to agents and tools. Imagine talking to your AI: “Book me a ticket for Friday morning.” “Check the stock price and place the trade.” “Write that email and send it.” “Schedule the meeting.” “Pull the report.”
No switching tabs. No copy-paste. No typing commands.
Just talking.
PersonaPlex already solves one of the most difficult problems, which is natural, full-duplex conversation. The next layer is execution. Once speech-to-speech systems connect to APIs, automation tools, browsers, trading platforms, and productivity apps, they stop being assistants and start becoming operators.
In short, it becomes something like OpenClaw on steroids.
A system that doesn’t just talk like humans but works on your behalf in real time.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in Technology Management and a Bachelor’s degree in Telecommunication Engineering. His vision is to create an AI product using graph neural networks for students struggling with mental illness.