Modern chat applications are increasingly adding voice input capabilities because it makes for a more engaging and versatile user experience. It also improves accessibility, allowing users with different needs to interact with these applications more comfortably.
In this tutorial, I will guide you through the process of building a conversational AI application that combines real-time chat functionality with speech recognition. By leveraging Stream Chat for robust messaging and the Web Speech API for speech-to-text conversion, you will create a multi-modal chat application that supports both voice and text conversations.
Prerequisites
Before we start, make sure you have the following:
A Stream account with an API key and secret (read how to get them here).
Access to an LLM API (such as OpenAI or Anthropic).
Node.js and npm/yarn installed.
Basic knowledge of React and TypeScript.
A modern browser with Web Speech API support (such as Chrome or Edge).
Sneak Peek
Let's take a quick look at the app built in this tutorial. That way, before you jump into the details, you get a feel for what it does.
Excited? Let's dive right in!
Core Technologies
This application is powered by three key players: Stream Chat, the Web Speech API, and a Node.js + Express backend.
Stream Chat is a platform that helps you easily integrate real-time chat and messaging experiences into your applications. It offers SDKs (software development kits) for different platforms (such as Android, iOS, and React) and pre-built UI components to streamline development. Its robust, polished chat functionality makes it a great choice for this app: we don't have to build anything from scratch.
The Web Speech API is a browser standard that lets you add voice input and output to your apps, enabling speech recognition (converting spoken words into text) and speech synthesis (converting text into speech). We will use the speech recognition feature in this project.
The Node.js + Express backend manages the AI agent and calls our LLM API to generate responses to the conversation.
Backend Implementation Guide
Let's start with the backend, your app's engine room, where the user's input is routed to the appropriate AI model and a processed response is returned. Our backend supports multiple AI models, notably OpenAI and Anthropic. To give you a feel for its shape, here is a heavily simplified, hypothetical sketch of the two endpoints the frontend will call later; the actual implementation in the repository you are about to clone is more complete:
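import express from 'express';

const app = express();
app.use(express.json());

// In-memory record of which channels have an active AI agent (illustration only).
const activeAgents = new Map<string, { platform: string }>();

// Starts an AI agent for a channel. In the real repository this also connects
// the agent to Stream Chat and to the chosen LLM provider.
app.post('/start-ai-agent', (req, res) => {
  const { channel_id, platform } = req.body as { channel_id: string; platform: string };
  activeAgents.set(channel_id, { platform });
  res.json({ message: 'AI agent started' });
});

// Stops the AI agent for a channel and cleans it up.
app.post('/stop-ai-agent', (req, res) => {
  const { channel_id } = req.body as { channel_id: string };
  activeAgents.delete(channel_id);
  res.json({ message: 'AI agent stopped' });
});

app.listen(3000, () => console.log('Backend listening on http://localhost:3000'));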
Project Setup
Create a folder and call it my-chat-application.
Clone this GitHub repository into it.
After cloning, rename the folder to backend.
Open the .env.example file and provide the required keys (you will need to provide an OpenAI or Anthropic key; the OpenWeather key is optional). Then rename the .env.example file to .env.
Install the dependencies by running this command:
npm install
Run the project with this command:
npm start
Your backend is now running at localhost:3000.
Frontend Implementation Guide
This section covers two broad, interconnected parts: setting up the chat structure and adding speech recognition.
Project Setup
We will create and set up our React project with the Stream Chat React SDK, using Vite with the TypeScript template. To do this, go to your my-chat-application folder, open your terminal, and enter these commands:
npm create vite frontend -- --template react-ts
cd frontend
npm i stream-chat stream-chat-react
With our frontend project set up, we can now run the app:
npm run dev
Understanding the App's Components
The main tasks here are to initialize the chat client, connect the user, create a channel, and render the chat interface. We will walk through each of these steps to help you understand them better:
Define the constants
First, we need to define some important credentials required for user creation and chat client setup. You can find these credentials on your Stream dashboard.
const apiKey = "xxxxxxxxxxxxx";
const userId = "111111111";
const userName = "John Doe";
const userToken = "xxxxxxxxxx.xxxxxxxxxxxx.xx_xxxxxxx-xxxxx_xxxxxxxx";
Note: These are dummy credentials. Make sure you use your own. In a real application, the user token should also be generated on your backend rather than hard-coded; a hypothetical server-side sketch (apiKey, apiSecret, and userId assumed to be defined there):
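import { StreamChat } from 'stream-chat';

// Server-side only: the API secret must never ship to the browser.
const serverClient = StreamChat.getInstance(apiKey, apiSecret);
const userToken = serverClient.createToken(userId);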
Create a user
Next, we need to create a user object using an ID, a name, and a generated avatar URL:
// The User type comes from the 'stream-chat' package.
const user: User = {
  id: userId,
  name: userName,
  image: `https://getstream.io/random_png/?name=${userName}`,
};
Set up the client
We need to track the state of the active chat channel using the useState hook to ensure smooth real-time messaging in this Stream Chat application. A custom hook called useCreateChatClient initializes the chat client with an API key, user token, and user data:
// StreamChannel is the Channel type from 'stream-chat'.
const [channel, setChannel] = useState<StreamChannel>();

const client = useCreateChatClient({
  apiKey,
  tokenOrProvider: userToken,
  userData: user,
});
Initialize the channel
Now, we initialize a messaging channel to enable real-time communication in the Stream Chat application. When the chat client is ready, a useEffect hook triggers the creation of a messaging channel called my_channel, adding the user as a member. The channel is then stored in the channel state, ensuring the app is ready to render a dynamic conversation.
useEffect(() => {
  if (!client) return;

  const channel = client.channel('messaging', 'my_channel', {
    members: [userId],
  });

  setChannel(channel);
}, [client]);
Render the chat interface
With all the essential parts of our chat application in place, we return JSX that describes the chat interface's structure and components:
if (!client) return <div>Setting up client & connection...</div>;

return (
  <Chat client={client}>
    <Channel channel={channel}>
      <Window>
        <MessageList />
        <MessageInput />
      </Window>
      <Thread />
    </Channel>
  </Chat>
);
In this JSX structure:
If the client is not ready, it shows the message "Setting up client & connection...".
Once the client is ready, it renders the chat interface using:
Chat: Wraps the Stream Chat context with the initialized client.
Channel: Sets the active channel.
Window: Contains the main chat UI components, MessageList and MessageInput.
Thread: Renders threaded replies.
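For reference, these components (and the useCreateChatClient hook used earlier) come from the Stream Chat React SDK. A sketch of the imports App.tsx relies on:

import { useEffect, useState } from 'react';
import type { Channel as StreamChannel, User } from 'stream-chat';
import {
  Chat,
  Channel,
  Window,
  MessageList,
  MessageInput,
  Thread,
  useCreateChatClient,
} from 'stream-chat-react';
import 'stream-chat-react/dist/css/v2/index.css'; // the SDK's default theme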
With this, we have assembled our chat interface and channel, and our client is ready. Here is what the interface looks like so far:
Adding AI to the channel
Remember, this chat application is designed for conversing with an AI, so we need to be able to add and remove the AI from the channel. In the UI, we will add a button to the channel header that lets users add or remove the AI. But we first need to determine whether the AI is already in the channel so we know which option to show.
We will write a custom hook called useWatchers. It monitors the presence of the AI using a Stream concept called watchers:
import { useCallback, useEffect, useState } from 'react';
import { Channel } from 'stream-chat';

export const useWatchers = ({ channel }: { channel: Channel }) => {
  const [watchers, setWatchers] = useState<string[]>([]);
  const [error, setError] = useState<Error | null>(null);

  const queryWatchers = useCallback(async () => {
    setError(null);
    try {
      const result = await channel.query({ watchers: { limit: 5, offset: 0 } });
      setWatchers(result?.watchers?.map((watcher) => watcher.id).filter((id): id is string => id !== undefined) || []);
      return;
    } catch (err) {
      setError(err as Error);
    }
  }, [channel]);

  useEffect(() => {
    queryWatchers();
  }, [queryWatchers]);

  useEffect(() => {
    // Keep the watcher list in sync as the AI bot starts or stops watching the channel.
    const watchingStartListener = channel.on('user.watching.start', (event) => {
      const userId = event?.user?.id;
      if (userId && userId.startsWith('ai-bot')) {
        setWatchers((prevWatchers) => [
          userId,
          ...(prevWatchers || []).filter((watcherId) => watcherId !== userId),
        ]);
      }
    });

    const watchingStopListener = channel.on('user.watching.stop', (event) => {
      const userId = event?.user?.id;
      if (userId && userId.startsWith('ai-bot')) {
        setWatchers((prevWatchers) =>
          (prevWatchers || []).filter((watcherId) => watcherId !== userId)
        );
      }
    });

    return () => {
      watchingStartListener.unsubscribe();
      watchingStopListener.unsubscribe();
    };
  }, [channel]);

  return { watchers, error };
};
Now we can create a new channel header component, using the useChannelStateContext hook to access the channel and our custom useWatchers hook. Using the watchers data, we derive an aiInChannel variable to display the relevant text. Based on this variable, we call either the start-ai-agent or the stop-ai-agent endpoint on the Node.js backend.
import { useChannelStateContext } from 'stream-chat-react';
import { useWatchers } from './useWatchers';
export default function ChannelHeader() {
const { channel } = useChannelStateContext();
const { watchers } = useWatchers({ channel });
const aiInChannel =
(watchers ?? []).filter((watcher) => watcher.includes('ai-bot')).length > 0;
return (
<div className='my-channel-header'>
<h2>{(channel?.data as { name?: string })?.name ?? 'Voice-and-Text AI Chat'}</h2>
<button onClick={addOrRemoveAgent}>
{aiInChannel ? 'Remove AI' : 'Add AI'}
</button>
</div>
);
async function addOrRemoveAgent() {
if (!channel) return;
const endpoint = aiInChannel ? 'stop-ai-agent' : 'start-ai-agent';
await fetch(`http://127.0.0.1:3000/${endpoint}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ channel_id: channel.id, platform: 'openai' }),
});
}
}
Adding AI state indicators
The AI takes a little time to process information, so while it is working, we show an indicator to reflect its status. We create an AIStateIndicator component that does this for us:
import { AIState } from 'stream-chat';
import { useAIState, useChannelStateContext } from 'stream-chat-react';
export default function MyAIStateIndicator() {
const { channel } = useChannelStateContext();
const { aiState } = useAIState(channel);
const text = textForState(aiState);
return text && <p className='my-ai-state-indicator'>{text}</p>;
function textForState(aiState: AIState): string {
switch (aiState) {
case 'AI_STATE_ERROR':
return 'Something went wrong...';
case 'AI_STATE_CHECKING_SOURCES':
return 'Checking external resources...';
case 'AI_STATE_THINKING':
return "I'm currently thinking...";
case 'AI_STATE_GENERATING':
return 'Generating an answer for you...';
default:
return '';
}
}
}
Building Speech-to-Text Functionality
At this point, we have a functional chat application that sends messages and receives responses from the AI. Now we want to enable voice interaction, allowing users to talk to the AI instead of typing manually.
To achieve this, we will build the speech-to-text functionality inside a CustomMessageInput component. Let's go through the whole process step by step to understand how it works.
Setting up initial state
When the CustomMessageInput component first mounts, it sets up its basic state structure:
const [isRecording, setIsRecording] = useState(false);
const [isRecognitionReady, setIsRecognitionReady] = useState(false);
const recognitionRef = useRef<any>(null);
const isManualStopRef = useRef(false);
const currentTranscriptRef = useRef("");
This initial phase matters because it establishes the component's ability to track multiple concurrent pieces of state: whether recording is active, whether the speech API is ready, and several persistent refs for handling the speech recognition lifecycle.
Context integration
In Stream Chat, the MessageInputContext is established within the MessageInput component and provides data to the input UI component and its children. Since we want to use the values stored in the MessageInputContext in our custom input UI component, we call the useMessageInputContext custom hook:
const { handleSubmit, textareaRef } = useMessageInputContext();
This step ensures that the voice input feature plugs into the existing chat infrastructure seamlessly, sharing the same textarea reference and submission methods that the other input methods use.
Web Speech API detection and initialization
The Web Speech API is not supported by every browser, so we need to check whether the browser running the application is compatible. The component's first major task is detecting and initializing the Web Speech API:
const SpeechRecognition = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
Once the API is detected, the component configures the speech recognition service with suitable settings. A minimal sketch of this step (the specific values are assumptions you can tune):
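const recognition = new SpeechRecognition();
recognition.continuous = true;      // keep listening across pauses instead of stopping after one phrase
recognition.interimResults = true;  // emit provisional transcripts while the user is still speaking
recognition.lang = 'en-US';         // recognition language (assumed; set it to your audience's locale)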
Configuring the event handlers
We will set up two kinds of event handlers: a result processing handler and lifecycle event handlers.
The result processing handler acts on the output of speech recognition. It reflects a two-phase processing approach where interim results provide immediate feedback, while final results are accumulated for accuracy.
recognition.onresult = (event: any) => {
let finalTranscript = "";
let interimTranscript = "";
for (let i = event.resultIndex; i < event.results.length; i++) {
const transcriptSegment = event.results[i][0].transcript;
if (event.results[i].isFinal) {
finalTranscript += transcriptSegment + " ";
} else {
interimTranscript += transcriptSegment;
}
}
if (finalTranscript) {
currentTranscriptRef.current += finalTranscript;
}
const combinedTranscript = (currentTranscriptRef.current + interimTranscript).trim();
if (combinedTranscript) {
updateTextareaValue(combinedTranscript);
}
};
The lifecycle event handlers ensure that the component responds to each stage of the speech recognition lifecycle (onstart, onend, and onerror):
recognition.onstart = () => {
console.log("Speech recognition started");
setIsRecording(true);
currentTranscriptRef.current = "";
};
recognition.onend = () => {
console.log("Speech recognition ended");
setIsRecording(false);
if (!isManualStopRef.current && isRecording) {
try {
recognition.start();
} catch (error) {
console.error("Error restarting recognition:", error);
}
}
isManualStopRef.current = false;
};
recognition.onerror = (event: any) => {
console.error("Speech recognition error:", event.error);
setIsRecording(false);
isManualStopRef.current = false;
switch (event.error) {
case "no-speech":
console.warn("No speech detected");
break;
case "not-allowed":
alert(
"Microphone access denied. Please allow microphone permissions.",
);
break;
case "network":
alert("Network error occurred. Please check your connection.");
break;
case "aborted":
console.log("Speech recognition aborted");
break;
default:
console.error("Speech recognition error:", event.error);
}
};
recognitionRef.current = recognition;
setIsRecognitionReady(true);
} else {
console.warn("Web Speech API not supported in this browser.");
setIsRecognitionReady(false);
}
Starting voice input
When a user clicks the microphone button, the component starts a multi-step process that includes requesting microphone permission and surfacing a clear error if the user denies access.
const toggleRecording = async (): Promise<void> => {
if (!recognitionRef.current) {
alert("Speech recognition not available");
return;
}
if (isRecording) {
isManualStopRef.current = true;
recognitionRef.current.stop();
} else {
try {
await navigator.mediaDevices.getUserMedia({ audio: true });
currentTranscriptRef.current = "";
updateTextareaValue("");
recognitionRef.current.start();
} catch (error) {
console.error("Microphone access error:", error);
alert(
"Unable to access microphone. Please check permissions and try again.",
);
}
}
};
Resetting the state and starting recognition
Before starting recognition, the component resets its internal state. This reset ensures that each new voice input session begins with a clean slate, preventing interference from previous sessions.
currentTranscriptRef.current = "";
updateTextareaValue("");
recognitionRef.current.start();
Real-time speech processing
Two things happen simultaneously during this process:
Continuous result processing: As the user speaks, the component continuously feeds the incoming speech data through a processing pipeline:
Each speech segment is classified as either interim (provisional) or final (confirmed).
Final results are appended to a persistent transcript reference.
Interim results are merged in for immediate display.
Dynamic textarea updates: The component refreshes the textarea in real time using a custom native DOM value setter:

const updateTextareaValue = (value: string) => {
  const nativeInputValueSetter = Object.getOwnPropertyDescriptor(
    window.HTMLTextAreaElement.prototype,
    'value'
  )?.set;

  if (nativeInputValueSetter) {
    nativeInputValueSetter.call(textareaRef.current, value);
    const inputEvent = new Event('input', { bubbles: true });
    textareaRef.current.dispatchEvent(inputEvent);
  }
};
This approach bypasses React's usual controlled-component behavior to provide an immediate visual update, while still maintaining compatibility with React's event system: dispatching a native input event keeps React's onChange handlers in sync.
User interface feedback
We will add some visual feedback features to make voice interaction smoother for users. They include:
Toggle between mic and stop icons
When idle, we show a microphone icon; while recording, we show a stop icon. This gives a clear indication of the recording state. A hypothetical sketch of such a button (the icons and class names are illustrative placeholders, not from the original project):
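<button
  type="button"
  className={`mic-button ${isRecording ? 'recording' : ''}`}
  onClick={toggleRecording}
  disabled={!isRecognitionReady}
  title={isRecording ? 'Stop recording' : 'Start voice input'}
>
  {isRecording ? '⏹' : '🎤'}
</button>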
Recording Notification Banner
A notification banner appears at the top of the screen to indicate that voice recording is in progress. This ensures users are informed whenever the microphone is active, addressing privacy and usability concerns.
{isRecording && (
  <div className="recording-notification show">
    <span className="recording-icon">🎤</span>
    Recording... Click stop when finished
  </div>
)}
Message integration and submission
The transcribed text integrates seamlessly with the existing chat system through the shared textarea reference and the submission handler provided by the context. A minimal sketch of what the send action might look like (the markup and cleanup step are assumptions, not the original source):
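<button
  type="button"
  className="send-button"
  onClick={(event) => {
    handleSubmit(event);               // reuse the context's submit handler
    currentTranscriptRef.current = ''; // reset the transcript for the next session
  }}
>
  Send
</button>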
This integration means that voice-generated messages travel the same submission path as typed messages, keeping behavior consistent across the chat system. After a message is submitted, the component cleans up its internal state, ready for the next voice input session.
Passing the custom input component
Having built our custom message input component, we now pass it to the Input prop of the MessageInput component in our App.tsx. A minimal sketch, assuming the component is exported as CustomMessageInput:
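<Channel channel={channel}>
  <Window>
    <MessageList />
    <MessageInput Input={CustomMessageInput} />
  </Window>
  <Thread />
</Channel>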
The Full Flow in Action
Here's how the application works:
After the component mounts, add the AI to the chat by clicking the Add AI button.
Click the mic icon to start recording.
Your browser will ask for permission to use the microphone.
If you deny permission, recording will not start.
If you allow permission, recording and transcription begin simultaneously.
Click the stop (square) icon to end the recording.
Click the send button to submit your message.
The AI processes your input and responds.
Conclusion
In this tutorial, you learned how to build a powerful conversational chatbot using Stream Chat and React. The application supports both text and voice input.
If you want to create your own engaging chat experiences, you can explore Stream's Chat and Video features to take your projects to the next level.
Get the full source code for this project here. If you enjoyed reading this article, connect with me on LinkedIn or follow me on X for more programming posts and articles.
See you in the next one!