Modern chat applications are increasingly adding voice input capabilities because it makes for a more engaging and versatile user experience. It also improves accessibility, allowing users with different needs to interact with these applications more comfortably.
In this tutorial, I will guide you through the process of building a conversational AI application that combines real-time chat functionality with speech recognition. By leveraging Stream Chat for robust messaging and the Web Speech API for speech-to-text conversion, you will create a multi-modal chat application that supports both voice and text conversations.
Prerequisites
Before we start, make sure you have the following:
A Stream account with an API key and secret (read how to get them here).
Access to an LLM API (such as OpenAI or Anthropic).
Node.js and npm/yarn installed.
Basic knowledge of React and TypeScript.
A modern browser with Web Speech API support (such as Chrome or Edge).
Sneak Peek
Let's take a quick look at the app built in this tutorial. That way, before you jump into the details, you get a feel for what it does.
Excited? Let's dive right in!
Core Technologies
This application is powered by three key players: Stream Chat, the Web Speech API, and a Node.js + Express backend.
Stream Chat is a platform that helps you easily integrate real-time chat and messaging experiences into your applications. It offers SDKs (software development kits) for different platforms (such as Android, iOS, and React) and pre-built UI components to streamline development. Its robust, polished chat functionality makes it a great choice for this app: we don't have to build anything from scratch.
The Web Speech API is a browser standard that lets you add voice input and output to your apps, enabling speech recognition (converting spoken words into text) and speech synthesis (converting text into speech). We will use the speech recognition feature in this project.
The Node.js + Express backend manages the AI agent and calls our LLM API to generate responses to the conversation.
Backend Implementation Guide
Let's start with the backend, your app's engine room, where the user's input is routed to the appropriate AI model and a processed response is returned. Our backend supports multiple AI models, notably OpenAI and Anthropic. To give you a feel for its shape, here is a heavily simplified, hypothetical sketch of the two endpoints the frontend will call later; the actual implementation in the repository you are about to clone is more complete:
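import express from 'express';

const app = express();
app.use(express.json());

// In-memory record of which channels have an active AI agent (illustration only).
const activeAgents = new Map<string, { platform: string }>();

// Starts an AI agent for a channel. In the real repository this also connects
// the agent to Stream Chat and to the chosen LLM provider.
app.post('/start-ai-agent', (req, res) => {
  const { channel_id, platform } = req.body as { channel_id: string; platform: string };
  activeAgents.set(channel_id, { platform });
  res.json({ message: 'AI agent started' });
});

// Stops the AI agent for a channel and cleans it up.
app.post('/stop-ai-agent', (req, res) => {
  const { channel_id } = req.body as { channel_id: string };
  activeAgents.delete(channel_id);
  res.json({ message: 'AI agent stopped' });
});

app.listen(3000, () => console.log('Backend listening on http://localhost:3000'));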
Project Setup
Create a folder and call it my-chat-application.
Clone this GitHub repository into it.
After cloning, rename the folder to backend.
Open the .env.example file and provide the required keys (you will need to provide an OpenAI or Anthropic key; the OpenWeather key is optional). Then rename the .env.example file to .env.
Install the dependencies by running this command:
npm install
Run the project with this command:
npm start
Your backend is now running at localhost:3000.
Frontend Implementation Guide
This section covers two broad, interconnected parts: setting up the chat structure and adding speech recognition.
Project Setup
We will create and set up our React project with the Stream Chat React SDK, using Vite with the TypeScript template. To do this, go to your my-chat-application folder, open your terminal, and enter these commands:
npm create vite frontend -- --template react-ts
cd frontend
npm i stream-chat stream-chat-react
With our frontend project set up, we can now run the app:
npm run dev
Understanding the App's Components
The main tasks here are to initialize the chat client, connect the user, create a channel, and render the chat interface. We will walk through each of these steps to help you understand them better:
Define the constants
First, we need to define some important credentials required for user creation and chat client setup. You can find these credentials on your Stream dashboard.
const apiKey = "xxxxxxxxxxxxx";
const userId = "111111111";
const userName = "John Doe";
const userToken = "xxxxxxxxxx.xxxxxxxxxxxx.xx_xxxxxxx-xxxxx_xxxxxxxx";
Note: These are dummy credentials. Make sure you use your own. In a real application, the user token should also be generated on your backend rather than hard-coded; a hypothetical server-side sketch (apiKey, apiSecret, and userId assumed to be defined there):
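import { StreamChat } from 'stream-chat';

// Server-side only: the API secret must never ship to the browser.
const serverClient = StreamChat.getInstance(apiKey, apiSecret);
const userToken = serverClient.createToken(userId);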
Create a user
Next, we need to create a user object using an ID, a name, and a generated avatar URL:
// The User type comes from the 'stream-chat' package.
const user: User = {
  id: userId,
  name: userName,
  image: `https://getstream.io/random_png/?name=${userName}`,
};
Set up the client
We need to track the state of the active chat channel using the useState hook to ensure smooth real-time messaging in this Stream Chat application. A custom hook called useCreateChatClient initializes the chat client with an API key, user token, and user data:
// StreamChannel is the Channel type from 'stream-chat'.
const [channel, setChannel] = useState<StreamChannel>();

const client = useCreateChatClient({
  apiKey,
  tokenOrProvider: userToken,
  userData: user,
});
Initialize the channel
Now, we initialize a messaging channel to enable real-time communication in the Stream Chat application. When the chat client is ready, a useEffect hook triggers the creation of a messaging channel called my_channel, adding the user as a member. The channel is then stored in the channel state, ensuring the app is ready to render a dynamic conversation.
useEffect(() => {
  if (!client) return;

  const channel = client.channel('messaging', 'my_channel', {
    members: [userId],
  });

  setChannel(channel);
}, [client]);
Render the chat interface
With all the essential parts of our chat application in place, we return JSX that describes the chat interface's structure and components:
if (!client) return <div>Setting up client & connection...</div>;

return (
  <Chat client={client}>
    <Channel channel={channel}>
      <Window>
        <MessageList />
        <MessageInput />
      </Window>
      <Thread />
    </Channel>
  </Chat>
);
In this JSX structure:
If the client is not ready, it shows the message "Setting up client & connection...".
Once the client is ready, it renders the chat interface using:
Chat: Wraps the Stream Chat context with the initialized client.
Channel: Sets the active channel.
Window: Contains the main chat UI components, MessageList and MessageInput.
Thread: Renders threaded replies.
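For reference, these components (and the useCreateChatClient hook used earlier) come from the Stream Chat React SDK. A sketch of the imports App.tsx relies on:

import { useEffect, useState } from 'react';
import type { Channel as StreamChannel, User } from 'stream-chat';
import {
  Chat,
  Channel,
  Window,
  MessageList,
  MessageInput,
  Thread,
  useCreateChatClient,
} from 'stream-chat-react';
import 'stream-chat-react/dist/css/v2/index.css'; // the SDK's default theme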
With this, we have assembled our chat interface and channel, and our client is ready. Here is what the interface looks like so far:
Adding AI to the channel
Remember, this chat application is designed for conversing with an AI, so we need to be able to add and remove the AI from the channel. In the UI, we will add a button to the channel header that lets users add or remove the AI. But we first need to determine whether the AI is already in the channel so we know which option to show.
We will write a custom hook called useWatchers. It monitors the presence of the AI using a Stream concept called watchers:
import { useCallback, useEffect, useState } from 'react';
import { Channel } from 'stream-chat';

export const useWatchers = ({ channel }: { channel: Channel }) => {
  const [watchers, setWatchers] = useState<string[]>([]);
  const [error, setError] = useState<Error | null>(null);

  const queryWatchers = useCallback(async () => {
    setError(null);
    try {
      const result = await channel.query({ watchers: { limit: 5, offset: 0 } });
      setWatchers(result?.watchers?.map((watcher) => watcher.id).filter((id): id is string => id !== undefined) || []);
      return;
    } catch (err) {
      setError(err as Error);
    }
  }, [channel]);

  useEffect(() => {
    queryWatchers();
  }, [queryWatchers]);

  useEffect(() => {
    // Keep the watcher list in sync as the AI bot starts or stops watching the channel.
    const watchingStartListener = channel.on('user.watching.start', (event) => {
      const userId = event?.user?.id;
      if (userId && userId.startsWith('ai-bot')) {
        setWatchers((prevWatchers) => [
          userId,
          ...(prevWatchers || []).filter((watcherId) => watcherId !== userId),
        ]);
      }
    });

    const watchingStopListener = channel.on('user.watching.stop', (event) => {
      const userId = event?.user?.id;
      if (userId && userId.startsWith('ai-bot')) {
        setWatchers((prevWatchers) =>
          (prevWatchers || []).filter((watcherId) => watcherId !== userId)
        );
      }
    });

    return () => {
      watchingStartListener.unsubscribe();
      watchingStopListener.unsubscribe();
    };
  }, [channel]);

  return { watchers, error };
};
Now we can create a new channel header component, using the useChannelStateContext hook to access the channel and our custom useWatchers hook. Using the watchers data, we derive an aiInChannel variable to display the relevant text. Based on this variable, we call either the start-ai-agent or the stop-ai-agent endpoint on the Node.js backend.
import { useChannelStateContext } from 'stream-chat-react';
import { useWatchers } from './useWatchers';
export default function ChannelHeader() {
const { channel } = useChannelStateContext();
const { watchers } = useWatchers({ channel });
const aiInChannel =
(watchers ?? []).filter((watcher) => watcher.includes('ai-bot')).length > 0;
return (
<div className='my-channel-header'>
<h2>{(channel?.data as { name?: string })?.name ?? 'Voice-and-Text AI Chat'}</h2>
<button onClick={addOrRemoveAgent}>
{aiInChannel ? 'Remove AI' : 'Add AI'}
</button>
</div>
);
async function addOrRemoveAgent() {
if (!channel) return;
const endpoint = aiInChannel ? 'stop-ai-agent' : 'start-ai-agent';
await fetch(`http://127.0.0.1:3000/${endpoint}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ channel_id: channel.id, platform: 'openai' }),
});
}
}
Adding AI state indicators
The AI takes a little time to process information, so while it is working, we show an indicator to reflect its status. We create an AIStateIndicator component that does this for us:
import { AIState } from 'stream-chat';
import { useAIState, useChannelStateContext } from 'stream-chat-react';
export default function MyAIStateIndicator() {
const { channel } = useChannelStateContext();
const { aiState } = useAIState(channel);
const text = textForState(aiState);
return text && <p className='my-ai-state-indicator'>{text}</p>;
function textForState(aiState: AIState): string {
switch (aiState) {
case 'AI_STATE_ERROR':
return 'Something went wrong...';
case 'AI_STATE_CHECKING_SOURCES':
return 'Checking external resources...';
case 'AI_STATE_THINKING':
return "I'm currently thinking...";
case 'AI_STATE_GENERATING':
return 'Generating an answer for you...';
default:
return '';
}
}
}
Building Speech-to-Text Functionality
At this point, we have a functional chat application that sends messages and receives responses from the AI. Now we want to enable voice interaction, allowing users to talk to the AI instead of typing manually.
To achieve this, we will build the speech-to-text functionality inside a CustomMessageInput component. Let's go through the whole process step by step to understand how it works.
Setting up initial state
When the CustomMessageInput component first mounts, it sets up its basic state structure:
const [isRecording, setIsRecording] = useState(false);
const [isRecognitionReady, setIsRecognitionReady] = useState(false);
const recognitionRef = useRef<any>(null);
const isManualStopRef = useRef(false);
const currentTranscriptRef = useRef("");
This initial phase matters because it establishes the component's ability to track multiple concurrent pieces of state: whether recording is active, whether the speech API is ready, and several persistent refs for handling the speech recognition lifecycle.
Context integration
In Stream Chat, the MessageInputContext is established within the MessageInput component and provides data to the input UI component and its children. Since we want to use the values stored in the MessageInputContext in our custom input UI component, we call the useMessageInputContext custom hook:
const { handleSubmit, textareaRef } = useMessageInputContext();
This step ensures that the voice input feature plugs into the existing chat infrastructure seamlessly, sharing the same textarea reference and submission methods that the other input methods use.
Web Speech API detection and initialization
The Web Speech API is not supported by every browser, so we need to check whether the browser running the application is compatible. The component's first major task is detecting and initializing the Web Speech API:
const SpeechRecognition = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
Once the API is detected, the component configures the speech recognition service with suitable settings. A minimal sketch of this step (the specific values are assumptions you can tune):
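const recognition = new SpeechRecognition();
recognition.continuous = true;      // keep listening across pauses instead of stopping after one phrase
recognition.interimResults = true;  // emit provisional transcripts while the user is still speaking
recognition.lang = 'en-US';         // recognition language (assumed; set it to your audience's locale)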
Configuring the event handlers
We will set up two kinds of event handlers: a result processing handler and lifecycle event handlers.
The result processing handler acts on the output of speech recognition. It reflects a two-phase processing approach where interim results provide immediate feedback, while final results are accumulated for accuracy.
recognition.onresult = (event: any) => {
let finalTranscript = "";
let interimTranscript = "";
for (let i = event.resultIndex; i < event.results.length; i++) {
const transcriptSegment = event.results[i][0].transcript;
if (event.results[i].isFinal) {
finalTranscript += transcriptSegment + " ";
} else {
interimTranscript += transcriptSegment;
}
}
if (finalTranscript) {
currentTranscriptRef.current += finalTranscript;
}
const combinedTranscript = (currentTranscriptRef.current + interimTranscript).trim();
if (combinedTranscript) {
updateTextareaValue(combinedTranscript);
}
};
The lifecycle event handlers ensure that the component responds to each stage of the speech recognition lifecycle (onstart, onend, and onerror):
recognition.onstart = () => {
console.log("Speech recognition started");
setIsRecording(true);
currentTranscriptRef.current = "";
};
recognition.onend = () => {
console.log("Speech recognition ended");
setIsRecording(false);
if (!isManualStopRef.current && isRecording) {
try {
recognition.start();
} catch (error) {
console.error("Error restarting recognition:", error);
}
}
isManualStopRef.current = false;
};
recognition.onerror = (event: any) => {
console.error("Speech recognition error:", event.error);
setIsRecording(false);
isManualStopRef.current = false;
switch (event.error) {
case "no-speech":
console.warn("No speech detected");
break;
case "not-allowed":
alert(
"Microphone access denied. Please allow microphone permissions.",
);
break;
case "network":
alert("Network error occurred. Please check your connection.");
break;
case "aborted":
console.log("Speech recognition aborted");
break;
default:
console.error("Speech recognition error:", event.error);
}
};
recognitionRef.current = recognition;
setIsRecognitionReady(true);
} else {
console.warn("Web Speech API not supported in this browser.");
setIsRecognitionReady(false);
}
Starting voice input
When a user clicks the microphone button, the component starts a multi-step process that includes requesting microphone permission and surfacing a clear error if the user denies access.
const toggleRecording = async (): Promise<void> => {
if (!recognitionRef.current) {
alert("Speech recognition not available");
return;
}
if (isRecording) {
isManualStopRef.current = true;
recognitionRef.current.stop();
} else {
try {
await navigator.mediaDevices.getUserMedia({ audio: true });
currentTranscriptRef.current = "";
updateTextareaValue("");
recognitionRef.current.start();
} catch (error) {
console.error("Microphone access error:", error);
alert(
"Unable to access microphone. Please check permissions and try again.",
);
}
}
};
Resetting the state and starting recognition
Before starting recognition, the component resets its internal state. This reset ensures that each new voice input session begins with a clean slate, preventing interference from previous sessions.
currentTranscriptRef.current = "";
updateTextareaValue("");
recognitionRef.current.start();
Real-time speech processing
Two things happen simultaneously during this process:
Continuous result processing: As the user speaks, the component continuously feeds the incoming speech data through a processing pipeline:
Each speech segment is classified as either interim (provisional) or final (confirmed).
Final results are appended to a persistent transcript reference.
Interim results are merged in for immediate display.
Dynamic textarea updates: The component refreshes the textarea in real time using a custom native DOM value setter:

const updateTextareaValue = (value: string) => {
  const nativeInputValueSetter = Object.getOwnPropertyDescriptor(
    window.HTMLTextAreaElement.prototype,
    'value'
  )?.set;

  if (nativeInputValueSetter) {
    nativeInputValueSetter.call(textareaRef.current, value);
    const inputEvent = new Event('input', { bubbles: true });
    textareaRef.current.dispatchEvent(inputEvent);
  }
};
This approach bypasses React's usual controlled-component behavior to provide an immediate visual update, while still maintaining compatibility with React's event system: dispatching a native input event keeps React's onChange handlers in sync.
User interface feedback
We will add some visual feedback features to make voice interaction smoother for users. They include:
Toggle between mic and stop icons
When idle, we show a microphone icon; while recording, we show a stop icon. This gives a clear indication of the recording state. A hypothetical sketch of such a button (the icons and class names are illustrative placeholders, not from the original project):
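<button
  type="button"
  className={`mic-button ${isRecording ? 'recording' : ''}`}
  onClick={toggleRecording}
  disabled={!isRecognitionReady}
  title={isRecording ? 'Stop recording' : 'Start voice input'}
>
  {isRecording ? '⏹' : '🎤'}
</button>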
Recording Notification Banner
A notification banner appears at the top of the screen to indicate that voice recording is in progress. This ensures users are informed whenever the microphone is active, addressing privacy and usability concerns.
{isRecording && (
  <div className="recording-notification show">
    <span className="recording-icon">🎤</span>
    Recording... Click stop when finished
  </div>
)}
Message integration and submission
The transcribed text integrates seamlessly with the existing chat system through the shared textarea reference and the submission handler provided by the context. A minimal sketch of what the send action might look like (the markup and cleanup step are assumptions, not the original source):
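<button
  type="button"
  className="send-button"
  onClick={(event) => {
    handleSubmit(event);               // reuse the context's submit handler
    currentTranscriptRef.current = ''; // reset the transcript for the next session
  }}
>
  Send
</button>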
This integration means that voice-generated messages travel the same submission path as typed messages, keeping behavior consistent across the chat system. After a message is submitted, the component cleans up its internal state, ready for the next voice input session.
Passing the custom input component
Having built our custom message input component, we now pass it to the Input prop of the MessageInput component in our App.tsx. A minimal sketch, assuming the component is exported as CustomMessageInput:
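<Channel channel={channel}>
  <Window>
    <MessageList />
    <MessageInput Input={CustomMessageInput} />
  </Window>
  <Thread />
</Channel>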
The Full Flow in Action
Here's how the application works:
After the component mounts, add the AI to the chat by clicking the Add AI button.
Click the mic icon to start recording.
Your browser will ask for permission to use the microphone.
If you deny permission, recording will not start.
If you allow permission, recording and transcription begin simultaneously.
Click the stop (square) icon to end the recording.
Click the send button to submit your message.
The AI processes your input and responds.
Conclusion
In this tutorial, you learned how to build a powerful conversational chatbot using Stream Chat and React. The application supports both text and voice input.
If you want to create your own engaging chat experiences, you can explore Stream's Chat and Video features to take your projects to the next level.
Get the full source code for this project here. If you enjoyed reading this article, connect with me on LinkedIn or follow me on X for more programming posts and articles.
See you in the next one!