A Beginner's Guide to Tracking Token Usage in LLM Apps

by SkillAiNest

Photo by Author | Ideogram.ai

. Introduction

When you build large language model applications, tokens are money. If you’ve ever worked with an LLM like GPT-4, you’ve probably had the moment where you check the bill and think, “How did it get that high?!” Every API call consumes tokens, which directly affects both latency and cost. But without tracking them, you don’t know where they are being spent or how to optimize.

That’s where LangSmith comes in. It not only traces your LLM calls, but also lets you log, monitor, and visualize token usage for every step of your workflow. In this guide, we will cover:

  1. Why token tracking matters
  2. How to set up logging
  3. How to visualize token consumption in the LangSmith dashboard

. Why Does Token Tracking Matter?

Token tracking matters because the cost of each interaction with a large language model is tied directly to the number of tokens in both your inputs and the model’s outputs. Without monitoring, small inefficiencies, unnecessary context, or redundant prompts can quietly inflate your bill and slow down performance.

With token tracking, you can see exactly where tokens are being consumed. That way you can refine prompts, streamline workflows, and keep costs under control. For example, if your chatbot is using 1,500 tokens per request, trimming it down to 800 tokens cuts the cost nearly in half. Here is how token tracking works conceptually:
Why does token tracking matter?
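To make the savings concrete, here is a minimal sketch of how per-request token counts translate into monthly spend. The per-1K-token price is an illustrative assumption, not a real provider rate:

```python
# Illustrative cost math: how trimming tokens per request cuts the bill.
# PRICE_PER_1K_TOKENS is a made-up example rate, not a real quote.
PRICE_PER_1K_TOKENS = 0.002  # assumed rate in USD

def monthly_cost(tokens_per_request: int, requests_per_month: int) -> float:
    """Estimate monthly spend from average tokens per request."""
    total_tokens = tokens_per_request * requests_per_month
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

before = monthly_cost(1_500, 100_000)  # 1,500 tokens per request
after = monthly_cost(800, 100_000)     # trimmed to 800 tokens
print(f"before: ${before:.2f}/mo, after: ${after:.2f}/mo")  # $300.00 vs $160.00
```

At 100,000 requests per month, cutting from 1,500 to 800 tokens per request drops the estimated bill from $300 to $160 under the assumed rate.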

. Setting Up LangSmith for Token Logging

!! Step 1: Install the required packages

pip3 install langchain langsmith transformers accelerate langchain_community

!! Step 2: Add the necessary imports

import os
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langsmith import traceable

!! Step 3: Configure LangSmith

Set your API key and project name:

# Replace with your API key
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_PROJECT"] = "HF_FLAN_T5_Base_Demo"
os.environ["LANGCHAIN_TRACING_V2"] = "true"


# Optional: disable tokenizer parallelism warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"
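Before running the chain, it can help to fail fast if any of the variables above are missing. The helper below is a small convenience sketch, not part of LangSmith itself:

```python
import os

# Env vars LangSmith tracing relies on (from the configuration step above).
REQUIRED_VARS = ("LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT", "LANGCHAIN_TRACING_V2")

def check_langsmith_config() -> list:
    """Return the names of any required LangSmith env vars that are unset."""
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]

missing = check_langsmith_config()
print("missing:", missing)  # empty list means tracing is configured
```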

!! Step 4: Load the Hugging Face model

Use the CPU-friendly model google/flan-t5-base and enable sampling for more natural results:

model_name = "google/flan-t5-base"
pipe = pipeline(
   "text2text-generation",
   model=model_name,
   tokenizer=model_name,
   device=-1,      # CPU
   max_new_tokens=60,
   do_sample=True, # enable sampling
   temperature=0.7
)
llm = HuggingFacePipeline(pipeline=pipe)

!! Step 5: Create a prompt and chain

Define a prompt template and connect it to your Hugging Face pipeline using LLMChain:

prompt_template = PromptTemplate.from_template(
   "Explain gravity to a 10-year-old in about 20 words using a fun analogy."
)


chain = LLMChain(llm=llm, prompt=prompt_template)

!! Step 6: Trace the function with LangSmith

Use the @traceable decorator to automatically log inputs, outputs, token usage, and run time:

@traceable(name="HF Explain Gravity")
def explain_gravity():
   return chain.run({})
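Conceptually, @traceable wraps your function and records its inputs, outputs, and timing. The toy decorator below illustrates the idea in plain Python; it is a simplified stand-in, not LangSmith’s actual implementation:

```python
import functools
import time

def trace(name):
    """Toy decorator that logs inputs, outputs, and run time, mimicking
    what a tracing decorator like @traceable records for each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print(f"[{name}] args={args} kwargs={kwargs} "
                  f"output={result!r} took {elapsed:.4f}s")
            return result
        return wrapper
    return decorator

@trace(name="demo")
def add(a, b):
    return a + b

add(2, 3)  # logs inputs, output, and elapsed time, then returns 5
```

LangSmith does the same kind of interception, but ships the captured data (including token counts reported by the LLM) to your project dashboard instead of printing it.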

!! Step 7: Run the function and print the result

answer = explain_gravity()
print("\n=== Hugging Face Model Answer ===")
print(answer)

Output:

=== Hugging Face Model Answer ===
Gravity is a measure of mass of an object.

!! Step 8: Check the LangSmith dashboard

Open LangSmith and go to the Tracing Projects page. You’ll see something like:
LangSmith Dashboard - Tracing Projects
You can also see the cost associated with each project, which helps you analyze your billing. To see token usage and other insights, click on your project, and you’ll see:
LangSmith Dashboard - Number of runs
The red box highlights the list of runs in your project. Click on any run and you’ll see:
LangSmith Dashboard - token insights

Here you can see various metrics, such as total tokens, latency, and more. Open the dashboard as shown below:
LangSmith Dashboard

Now you can view graphs over time to track token usage trends, check average latency per request, compare input vs. output tokens, and identify peak usage periods. These insights help you optimize prompts, manage costs, and improve model performance.
LangSmith Dashboard - Graph

Please scroll down to see all the graphs associated with your project.

!! Step 9: Explore the LangSmith dashboard

You can dig into many insights, such as:

  • View traces: click a trace to see detailed processing, including raw input, generated output, and performance metrics
  • Inspect individual runs: within each trace, you can drill into every stage of execution and see prompts, results, token usage, and latency
  • Check token usage and latency: detailed token counts help pinpoint bottlenecks and improve response times
  • Debug chains: use LangSmith’s debugging tools to test scenarios, track model performance, and compare outputs
  • Experiment in the playground: adjust parameters such as temperature, prompt templates, or sampling settings to fine-tune your model’s behavior

With this setup, you now have full visibility into your Hugging Face model runs, token usage, and overall performance in the LangSmith dashboard.

. How to Spot and Fix Token Hogs

Once logging is in place, you can:

  • See if your prompts are too long
  • Identify calls where the model generates more output than needed
  • Switch to smaller, cheaper models for simple tasks
  • Cache answers to avoid duplicate requests
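The last point, caching, is easy to prototype. Below is a minimal in-memory cache keyed by the exact prompt text; it is a sketch only (a production system would use something like Redis or LangChain’s built-in caches), and expensive_llm_call is a hypothetical stand-in for your real chain.run() call:

```python
import functools

@functools.lru_cache(maxsize=1024)
def cached_llm_call(prompt: str) -> str:
    """Return a cached answer if this exact prompt was seen before."""
    return expensive_llm_call(prompt)

def expensive_llm_call(prompt: str) -> str:
    # Placeholder for a real LLM request; counts invocations for the demo.
    expensive_llm_call.calls = getattr(expensive_llm_call, "calls", 0) + 1
    return f"answer to: {prompt}"

cached_llm_call("What is gravity?")
cached_llm_call("What is gravity?")  # served from cache, no second LLM call
print(expensive_llm_call.calls)  # prints 1
```

Even this naive exact-match cache can eliminate the duplicate requests that show up as repeated runs in your LangSmith traces.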

This is gold for debugging long chains or agents. Find the steps that consume the most tokens and fix them.

. Wrapping Up

That’s how you can set up and use LangSmith. Logging token usage is not just about saving money; it is about building smarter, more efficient LLM apps. This guide provides a foundation; you can learn more by exploring, experimenting, and analyzing your own workflows.

Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, Mitacs GlobalLink Research Scholar, and Harvard WeCode Scholar. Kanwal is a passionate advocate for change and founded FEMCodes to empower women in STEM fields.
