With the rise of AI agents, we are no longer just producing text or images: we are teaching AI how to take actions. Instead of asking, “Can you write this for me?”, you can now ask, “Can you do this for me?” From updating a CRM to handling routine tasks, agents can now connect to real tools and act.
In this article, you will build an AI agent with Composio, Next.js, and Gemini that can talk, think, and update your Google Sheets, complete with text-to-speech (TTS).
What we'll cover
In this tutorial, you will learn how to build an AI agent with voice support that can use Google Sheets tools from Composio. Along the way, you will learn:
What is an AI agent?
How to use Composio to add integrations to your agent.
How to stream responses from a Next.js API route with the Vercel AI SDK.
How to work with the Gemini text-to-speech API.
What is this Sheets agent?
First, what is an AI agent? An AI agent is a system that can work autonomously to achieve goals. For example, it can book a flight, send an email, or query a database.
Generative AI, such as ChatGPT, is primarily focused on producing output such as text, images, or code. An agent is different because it can make decisions, plan, and take actions in the real world, not just produce content.

Large language models (LLMs) often power these agents. The LLM provides the reasoning and conversation skills, while the agent layer adds tools that let it act beyond simple generation.
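To make that "agent layer adds tools" idea concrete, here is a minimal, dependency-free TypeScript sketch of the loop. Everything in it (the tool names, the registry, the hard-coded decision) is purely illustrative and not part of this project; real frameworks like Composio automate exactly this dispatch for you.

```typescript
type Tool = (args: Record<string, string>) => string;

// Toy stand-ins for real integrations like email or a database.
const toolRegistry: Record<string, Tool> = {
  send_email: (args) => `email sent to ${args.to}`,
  query_database: (args) => `rows matching "${args.filter}"`,
};

// In a real agent, this decision object comes from the LLM's tool-call
// output; here it is a plain object so the dispatch logic stays visible.
function dispatch(decision: {
  tool: string;
  args: Record<string, string>;
}): string {
  const tool = toolRegistry[decision.tool];
  if (!tool) throw new Error(`Unknown tool: ${decision.tool}`);
  return tool(decision.args);
}
```

The LLM supplies the "which tool, which arguments" decision; the agent layer owns the registry and the execution.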
So, you may have guessed it: today we are building an AI agent that can access real data in Google Sheets and even make changes to it.
How to configure the project
Getting this project up and running is easy. Follow these steps:
First, clone the repository:
git clone
cd google-sheet-super-agent
Next, install the dependencies:
npm install
Then set the environment variables and run the development server:
GEMINI_API_KEY=
COMPOSIO_API_KEY=
COMPOSIO_GOOGLE_SHEET_USER_ID=
GOOGLE_SHEETS_AUTH_CONFIG_ID=
GOOGLE_GENERATIVE_AI_API_KEY=
SESSION_SECRET=
To get a Composio API key, create an account and log into the dashboard. You can find the API key in your default project settings.
As for COMPOSIO_GOOGLE_SHEET_USER_ID, you can get it after connecting a Google Sheets account in Composio.
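The code snippets later in this article start with a "// ...Rest of the code" comment. That elided setup looks roughly like the sketch below. Treat it as an approximation: the variable names are assumptions based on the environment variables above, so check the repository for the exact code.

```typescript
import { Composio } from "@composio/core";
import { VercelProvider } from "@composio/vercel";

// Composio client wired to the Vercel AI SDK provider, so the tools it
// returns can be passed straight to streamText later.
const composio = new Composio({
  apiKey: process.env.COMPOSIO_API_KEY,
  provider: new VercelProvider(),
});

// Assumed names, derived from the env vars listed above.
const userID = process.env.COMPOSIO_GOOGLE_SHEET_USER_ID!;
const googleSheetAuthConfigID = process.env.GOOGLE_SHEETS_AUTH_CONFIG_ID!;
```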

Basic components of the application
This project has three basic logical components:
1. Initiate the connection
This part is quite straightforward. You need to initiate a connection to a Composio toolkit, which in our case is Google Sheets.
// ...Rest of the code
const connection = await composio.connectedAccounts.initiate(
userID,
googleSheetAuthConfigID,
);
infoLog(
"Please visit the following URL to authorize: ",
connection.redirectUrl ? connection.redirectUrl : "Something went wrong!",
);
2. Set up TTS with the Gemini API
For this project, I decided to go with Gemini for TTS generation instead of OpenAI, simply because Google recently launched its TTS API.
You can read more about it here: Gemini speech generation (text-to-speech).
import { errorLog } from "@/lib/logger";
import { ttsSchema } from "@/lib/validators/tts";
import { GoogleGenAI } from "@google/genai";
import { StatusCodes } from "http-status-codes";
import { NextRequest, NextResponse } from "next/server";
import { Readable } from "stream";
import wav from "wav";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
async function convertL16ToWav(pcmBuffer: Buffer): Promise<Buffer> {
return new Promise((resolve, reject) => {
const chunks: Buffer[] = [];
const writer = new wav.Writer({
channels: 1,
sampleRate: 24000,
bitDepth: 16,
});
writer.on("data", (chunk) => {
chunks.push(chunk);
});
writer.on("end", () => {
resolve(Buffer.concat(chunks));
});
writer.on("error", reject);
const readable = new Readable({
read() {
this.push(pcmBuffer);
this.push(null); // End the stream
},
});
readable.pipe(writer);
});
}
export async function POST(req: NextRequest) {
try {
const body = await req.json();
const parsedBody = ttsSchema.safeParse(body);
if (!parsedBody.success) {
return NextResponse.json(
{
error: parsedBody.error.message,
},
{ status: StatusCodes.BAD_REQUEST },
);
}
const { text } = parsedBody.data;
const result = await ai.models.generateContent({
model: "gemini-2.5-flash-preview-tts",
contents: [{ parts: [{ text: text }] }],
config: {
responseModalities: ["AUDIO"],
speechConfig: {
voiceConfig: {
prebuiltVoiceConfig: { voiceName: "Kore" },
},
},
},
});
const data = result.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data;
const mimeType =
result.candidates?.[0]?.content?.parts?.[0]?.inlineData?.mimeType;
if (typeof data !== "string") {
errorLog("Invalid audio data received:", { data, mimeType });
return NextResponse.json(
{ error: "Audio data is not a string." },
{ status: StatusCodes.INTERNAL_SERVER_ERROR },
);
}
if (!data || data.length === 0) {
errorLog("Empty audio data received:", { data, mimeType });
return NextResponse.json(
{ error: "Empty audio data received." },
{ status: StatusCodes.INTERNAL_SERVER_ERROR },
);
}
try {
const audioBuffer = Buffer.from(data, "base64");
console.log("Generated audio:", {
bufferSize: audioBuffer.length,
contentType: mimeType || "unknown",
mimeType,
textLength: text.length,
});
// Check if it's L16 PCM format that needs conversion
if (
mimeType?.startsWith("audio/L16") ||
mimeType?.startsWith("audio/l16")
) {
const wavBuffer = await convertL16ToWav(audioBuffer);
return new NextResponse(new Uint8Array(wavBuffer), {
headers: {
"Content-Type": "audio/wav",
"Content-Length": wavBuffer.length.toString(),
"Cache-Control": "no-cache",
"Accept-Ranges": "bytes",
},
});
}
return new NextResponse(new Uint8Array(audioBuffer), {
headers: {
"Content-Type": mimeType || "audio/mpeg",
"Content-Length": audioBuffer.length.toString(),
"Cache-Control": "no-cache",
"Accept-Ranges": "bytes",
},
});
} catch (bufferError) {
errorLog(bufferError, "API /tts (buffer error)");
return NextResponse.json(
{ error: "Invalid base64 audio data." },
{ status: StatusCodes.INTERNAL_SERVER_ERROR },
);
}
} catch (error) {
errorLog(error, "API /tts");
return NextResponse.json(
{ message: "Error generating audio." },
{ status: 500 },
);
}
}
There is a bit more going on here. For some reason, Gemini's API returns audio in the audio/L16 format, not the mp3 or wav formats we are used to.
And you can't play that format directly in the browser. So we first need to convert it to wav using the convertL16ToWav function. Only then can we return the wav buffer as the response.
This took me forever to implement. I didn't know there was an audio format like audio/L16 that I couldn't play in my browser, and it took a lot of Googling to figure it out.
All this function does is wrap the raw audio, which is mono, 24 kHz, 16-bit PCM, in a WAV container.
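If you are curious what the wav package is doing under the hood, here is a dependency-free sketch, purely for illustration: the wrapping amounts to prepending a 44-byte RIFF/WAVE header that declares mono, 24 kHz, 16-bit PCM ahead of the raw samples. The project itself uses the wav package, as shown above.

```typescript
// Prepend a standard 44-byte WAV header to raw PCM samples so browsers
// know the channel count, sample rate, and bit depth.
function pcmToWav(
  pcm: Buffer,
  sampleRate = 24000,
  channels = 1,
  bitDepth = 16,
): Buffer {
  const byteRate = (sampleRate * channels * bitDepth) / 8;
  const blockAlign = (channels * bitDepth) / 8;
  const header = Buffer.alloc(44);
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus 8 bytes
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16); // fmt chunk size
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40); // raw sample byte count
  return Buffer.concat([header, pcm]);
}
```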
And if you would rather use the OpenAI package, which is very easy to use since it returns speech in mp3 format directly, check out my project: SHACKOODY/VOS CHIT-II Agent (TTS).
3. Handle the user's queries
This is the last piece of the puzzle. This is where the actual tool-calling logic lives.
import { google } from "@ai-sdk/google";
import { streamText } from "ai";
import { Composio } from "@composio/core";
import { NextResponse } from "next/server";
import { chatSchema } from "@/lib/validators/chat";
import { StatusCodes } from "http-status-codes";
import { errorLog } from "@/lib/logger";
import { VercelProvider } from "@composio/vercel";
// ...Rest of the code
const tools = await composio.tools.get(userID, {
toolkits: ["GOOGLESHEETS"],
});
let conversationContext = "";
if (conversationHistory && conversationHistory.length > 0) {
conversationContext = conversationHistory
.map((conversation) => {
return `${conversation.role}: ${conversation.content}`;
})
.join("\n");
}
const systemPrompt = `
You are an intelligent Google Sheets assistant. You can help users analyze, query, and manipulate data in their Google Sheets.
Sheet ID: ${sheetID}
User ID: ${userID}
Guidelines:
- Always use the Google Sheets tools to access real data from the spreadsheet
- Provide clear, actionable insights based on the actual data
- If you need to read data, use the appropriate Google Sheets tools first
- Format your responses in a clear, professional manner
- If asked about calculations, use the actual data from the sheet
Always generate a short summary of what you got done. like if the user asked
you to make changes, then write in short about what all changes you did. If
they asked you to summarize the data, then write in short about what the data
is all about.
---
Previous conversation in this document:
${conversationContext}
`;
const result = streamText({
model: google("gemini-2.5-pro"),
system: systemPrompt,
prompt,
tools: tools,
toolChoice: "auto",
});
return result.toUIMessageStreamResponse({ sendReasoning: true });
This code lives in a Next.js App Router route. First, we fetch the Google Sheets tools from Composio using the composio.tools.get function. We use auto as the tool choice, which means the agent will pick whichever tools it is most confident about.
After that, we build a system prompt that guides how the agent behaves.
Finally, we call the streamText function with the prompt, tools, system prompt, and model. It streams the answer as it is generated rather than waiting for the whole response before sending anything to the client. Then we return the reply in the UIMessageStreamResponse format so it can easily be rendered in the UI.
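On the client side, that streamed response can be read incrementally. Below is a hypothetical sketch using the browser's streaming fetch API. The /api/chat path and the request body fields are assumptions based on how the route reads its input, and the raw chunks arrive in the AI SDK's UI message stream framing, so in practice you would parse them with the SDK's client utilities (such as the useChat hook) rather than by hand.

```typescript
// Hypothetical client helper: POST the user's prompt to the chat route and
// surface each streamed chunk as it arrives. Field names and the endpoint
// path are assumptions; adjust them to the repository's actual validator.
async function streamChat(
  prompt: string,
  sheetID: string,
  onChunk: (text: string) => void,
): Promise<void> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, sheetID, conversationHistory: [] }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Note: chunks are UI-message-stream framed, not plain text.
    onChunk(decoder.decode(value, { stream: true }));
  }
}
```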
Google Sheet Agent in action
Here is a quick demo of the agent in action:
https://www.youtube.com/watch?v=emxe8q1irao
Conclusion
So, what do you think of this project? It was a really fun one for me to work on.
Go ahead, clone the repository, and try it with your own Google Sheet. After all, this is a very small project with very simple logic, which I believe you now fully understand.
Would I recommend using it on an important Google Sheet? Not at all. Remember, this is just an AI model that can access tools from Composio, and you can never trust AI 100%. While building this project, I ran into cases where the AI called the wrong tools and even completely messed up the sheet. But you can always try it on a throwaway sheet to see how it all works.
You can find the whole source code here: Shrucio/Google Sheet Super Agent.