https://www.youtube.com/watch?v=9rej66Crlcm
Learning to work with AI programmatically opens a world of possibilities beyond using ChatGPT in a browser. When you understand how to connect to AI services through an application programming interface (API), you can build custom applications, integrate AI into existing systems, and create personalized experiences tailored to your exact needs.
In this hands-on tutorial, we'll build a fully functional chatbot from scratch using Python and the OpenAI API. You'll learn how to manage conversation history, control costs with a token budget, and create a custom AI personality that persists across multiple exchanges. By the end, you'll have both a working chatbot and the foundational skills to build more sophisticated AI-powered applications.
Why build your own chatbot?
While AI tools like ChatGPT are powerful, building your own chatbot teaches you the skills needed to work with AI APIs professionally. You'll understand how conversation memory actually works, learn to manage API costs effectively, and gain the ability to customize AI behavior for specific use cases.
This knowledge translates directly into real-world applications: customer service bots that speak in your company's voice, educational assistants for specific subjects, or personal productivity tools that understand your workflow.
What you'll learn
By the end of this tutorial, you'll know how to:
- Connect to the OpenAI API with secure authentication
- Design custom AI personas using system prompts
- Build conversation loops that remember previous exchanges
- Implement token counting and budget management
- Structure chatbot code using functions and classes
- Handle API errors and edge cases gracefully
- Deploy your chatbot for others to use
Before you start: Setup guide
Prerequisites
You should be comfortable with basic Python concepts such as variables, functions, loops, and dictionaries. Familiarity with writing your own functions is especially important. Basic knowledge of APIs is helpful but not required – we'll cover what you need to know.
Environment setup
First, you'll need a local development environment. We suggest VS Code if you're new to local development, though any IDE will work.
Install the required libraries by running this command in your terminal:
pip install openai tiktoken
API key setup
To access an AI model, you have two options:
Free option: Sign up for Together AI, which provides $1 in free credit – more than enough for this entire tutorial. Their free model is slower, but it costs nothing.
Premium option: Use OpenAI directly. The model we'll use (gpt-4o-mini) is extremely affordable – our entire tutorial test cost less than 5 cents.
Important security note: Never hard-code API keys in your scripts. We'll use environment variables to keep them safe.
Windows users can set environment variables through Settings > Environment Variables, then restart the computer. Mac and Linux users can configure environment variables without rebooting.
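Once the variable is set, it's worth confirming that Python can actually see it. Here's a quick sanity check, assuming you named the variable OPENAI_API_KEY (or TOGETHER_API_KEY, as the later scripts do):
import os

# Confirm the API key is visible to Python before going further
key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
print("API key found!" if key else "No API key found – check your environment variables.")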
Part 1: Your first AI response
Let's start with the simplest possible chatbot – one that can respond to a single message. This foundation will teach you the core concepts before we add complexity.
Create a new file called chatbot.py and add this code:
import os
from openai import OpenAI
# Load API key securely from environment variables
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
# Create the OpenAI client
client = OpenAI(api_key=api_key)
# Send a message and get a response
response = client.chat.completions.create(
model="gpt-4o-mini", # or "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free" for Together
    messages=[
{"role": "system", "content": "You are a fed up and sassy assistant who hates answering questions."},
{"role": "user", "content": "What is the weather like today?"}
    ],
temperature=0.7,
max_tokens=100
)
# Extract and display the reply
reply = response.choices[0].message.content
print("Assistant:", reply)
Run this script and you'll see something like this:
Assistant: Oh fantastic, another weather question! I don't have real-time weather data, but here's a wild idea – maybe look outside your window or check a weather app like everyone else.
Understanding the code
The magic happens in the messages parameter, which uses three distinct roles:
- system: Defines the AI's personality and behavior. This is like briefing an actor on a character that shapes every response.
- user: Represents what you (or your users) type into the chatbot.
- assistant: The AI's replies (we'll add these to the conversation history later).
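To make the three roles concrete, here's a minimal sketch of what a short conversation looks like as a messages list once an assistant reply has been recorded (the content strings are made up for illustration):
messages = [
    {"role": "system", "content": "You are a fed up and sassy assistant who hates answering questions."},
    {"role": "user", "content": "What is the weather like today?"},
    {"role": "assistant", "content": "Oh, thrilling. I don't have weather data. Try a window."},
    {"role": "user", "content": "Fine, what about tomorrow?"},  # this follow-up is sent along with all prior turns
]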
Key parameters explained
temperature controls the AI's "creativity". Lower values (0–0.3) produce consistent, predictable responses. Higher values (0.7–1.0) produce more creative but potentially unexpected results. We use 0.7 as a good balance.
max_tokens limits response length and protects your budget. A token corresponds to roughly between half a word and a full word, so 100 tokens allow a substantial response while preventing runaway costs.
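You can check this rule of thumb yourself with the tiktoken library installed earlier – a small experiment using the cl100k_base encoding as an example:
import tiktoken

# Compare word count to token count for a sample sentence
enc = tiktoken.get_encoding("cl100k_base")
sentence = "Tokens are the basic units that language models read and write."
print(len(sentence.split()), "words ->", len(enc.encode(sentence)), "tokens")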
Part 2: Understanding AI variability
Run your script several times and notice how the response differs each run. This happens because AI models use probabilistic sampling – rather than always picking the single "best" next word, they sample from likely options based on context.
Let's experiment with this by editing your temperature:
# Try temperature=0 for consistent responses
temperature=0,
max_tokens=100
Run this version multiple times and observe the more consistent (though not identical) responses.
Now try temperature=1.0 and see how much more creative and unpredictable the responses become. Higher temperatures often produce longer responses, which leads us to an important lesson about cost management.
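If you want to compare settings side by side, here's a quick sketch that loops over a few temperatures (the prompt is just an example, and the client setup mirrors chatbot.py):
import os
from openai import OpenAI

# Same client setup as chatbot.py
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY"))

# Compare outputs across temperatures with the same prompt
for temp in [0.0, 0.7, 1.0]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # swap in the Together model name if using Together
        messages=[{"role": "user", "content": "Describe the sky in one sentence."}],
        temperature=temp,
        max_tokens=50,  # always cap output tokens while experimenting
    )
    print(f"temperature={temp}: {response.choices[0].message.content}")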
Lesson learned: While developing a different project, I accidentally spent $20 on a single API call because I forgot to set max_tokens when processing a large file. Always set token limits when experimenting!
Part 3: Refactoring with functions
As your chatbot becomes more complex, organized code becomes essential. Let's refactor our script to use functions and module-level configuration variables.
Create app.py with this code:
import os
from openai import OpenAI
# Configuration variables
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
client = OpenAI(api_key=api_key)
MODEL = "gpt-4o-mini" # or "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free"
TEMPERATURE = 0.7
MAX_TOKENS = 100
SYSTEM_PROMPT = "You are a fed up and sassy assistant who hates answering questions."
def chat(user_input):
"""Send a message to the AI and return the response."""
response = client.chat.completions.create(
model=MODEL,
        messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_input}
        ],
temperature=TEMPERATURE,
max_tokens=MAX_TOKENS
)
    reply = response.choices[0].message.content
return reply
# Test the function
print(chat("How are you doing today?"))
This refactoring makes our code more maintainable and reusable. The configuration variables let us adjust settings easily, while the function encapsulates the chat logic for reuse.
Part 4: Adding conversation memory
Real chatbots remember previous exchanges. Let's add conversation memory by maintaining a growing list of messages.
Create part3_chat_loop.py:
import os
from openai import OpenAI
# Configuration
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
client = OpenAI(api_key=api_key)
MODEL = "gpt-4o-mini"
TEMPERATURE = 0.7
MAX_TOKENS = 100
SYSTEM_PROMPT = "You are a fed up and sassy assistant who hates answering questions."
# Initialize conversation with system prompt
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
def chat(user_input):
"""Add user input to conversation and get AI response."""
# Add user message to conversation history
messages.append({"role": "user", "content": user_input})
# Get AI response using full conversation history
response = client.chat.completions.create(
model=MODEL,
messages=messages,
temperature=TEMPERATURE,
max_tokens=MAX_TOKENS
)
    reply = response.choices[0].message.content
# Add AI response to conversation history
messages.append({"role": "assistant", "content": reply})
return reply
# Interactive chat loop
while True:
user_input = input("You: ")
if user_input.strip().lower() in {"exit", "quit"}:
break
answer = chat(user_input)
print("Assistant:", answer)Now run your chat boot and try to ask the same question twice:
You: Hi, how are you?
Assistant: Oh fantastic, just living the dream of answering questions I don't care about. What do you want?
You: Hi, how are you?
Assistant: Seriously, again? Look, I'm here to help, not to exchange pleasantries all day. What do you need?
The AI remembers your previous question and responds accordingly.
How the memory works
Each time someone sends a message, we append both the user's input and the AI's response to the messages list. The API processes this entire conversation history as context to generate an appropriate response.
However, this creates a growing problem: longer conversations mean more tokens, which means higher costs.
Part 5: Token management and cost control
As a conversation grows, so does the token count – and your bill. Let's add smart token management to prevent runaway costs.
Create part4_final.py:
import os
from openai import OpenAI
import tiktoken
# Configuration
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
client = OpenAI(api_key=api_key)
MODEL = "gpt-4o-mini"
TEMPERATURE = 0.7
MAX_TOKENS = 100
TOKEN_BUDGET = 1000 # Maximum tokens to keep in conversation
SYSTEM_PROMPT = "You are a fed up and sassy assistant who hates answering questions."
# Initialize conversation
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
def get_encoding(model):
"""Get the appropriate tokenizer for the model."""
try:
return tiktoken.encoding_for_model(model)
except KeyError:
print(f"Warning: Tokenizer for model '{model}' not found. Falling back to 'cl100k_base'.")
return tiktoken.get_encoding("cl100k_base")
ENCODING = get_encoding(MODEL)
def count_tokens(text):
"""Count tokens in a text string."""
return len(ENCODING.encode(text))
def total_tokens_used(messages):
"""Calculate total tokens used in conversation."""
try:
        return sum(count_tokens(msg["content"]) for msg in messages)
except Exception as e:
print(f"(token count error): {e}")
return 0
def enforce_token_budget(messages, budget=TOKEN_BUDGET):
"""Remove old messages if conversation exceeds token budget."""
try:
while total_tokens_used(messages) > budget:
if len(messages) <= 2: # Keep system prompt + at least one exchange
break
messages.pop(1) # Remove oldest non-system message
except Exception as e:
print(f"(token budget error): {e}")
def chat(user_input):
"""Chat with memory and token management."""
messages.append({"role": "user", "content": user_input})
response = client.chat.completions.create(
model=MODEL,
messages=messages,
temperature=TEMPERATURE,
max_tokens=MAX_TOKENS
)
    reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": reply})
# Prune old messages if over budget
enforce_token_budget(messages)
return reply
# Interactive chat with token monitoring
while True:
user_input = input("You: ")
if user_input.strip().lower() in {"exit", "quit"}:
break
answer = chat(user_input)
print("Assistant:", answer)
print(f"Current tokens: {total_tokens_used(messages)}")How does the token management work
The token management system works in three stages:
- Count tokens: We use tiktoken to count the tokens in each message
- Monitor the total: Track the total token count across the whole conversation
- Enforce the budget: When the conversation exceeds the token budget, automatically remove the oldest messages (but keep the system prompt)
Lesson learned: Different models use different tokenization schemes. The word "dog" might be 1 token in one model but 2 tokens in another. Our get_encoding function handles these differences gracefully by falling back to a default encoding.
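You can see this for yourself with two encodings that tiktoken ships with – a small illustration (the sample word is arbitrary):
import tiktoken

# The same text can produce different token counts under different encodings
text = "unbelievable"
for name in ["cl100k_base", "p50k_base"]:
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(text))} token(s) -> {enc.encode(text)}")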
Run your chatbot and have a long conversation. Watch the token count climb, then notice when old messages get pruned. The chatbot keeps recent context while staying within budget.
Part 6: Production-ready code structure
For production applications, object-oriented design provides better organization and encapsulation. Here's how to convert our function-based code to a class-based approach:
Create oop_chatbot.py:
import os
import tiktoken
from openai import OpenAI
class Chatbot:
def __init__(self, api_key, model="gpt-4o-mini", temperature=0.7, max_tokens=100,
token_budget=1000, system_prompt="You are a helpful assistant."):
self.client = OpenAI(api_key=api_key)
self.model = model
self.temperature = temperature
self.max_tokens = max_tokens
self.token_budget = token_budget
        self.messages = [{"role": "system", "content": system_prompt}]
self.encoding = self._get_encoding()
def _get_encoding(self):
"""Get tokenizer for the model."""
try:
return tiktoken.encoding_for_model(self.model)
except KeyError:
print(f"Warning: No tokenizer found for model '{self.model}'. Falling back to 'cl100k_base'.")
return tiktoken.get_encoding("cl100k_base")
def _count_tokens(self, text):
"""Count tokens in text."""
return len(self.encoding.encode(text))
def _total_tokens_used(self):
"""Calculate total tokens in conversation."""
try:
            return sum(self._count_tokens(msg["content"]) for msg in self.messages)
except Exception as e:
print(f"(token count error): {e}")
return 0
def _enforce_token_budget(self):
"""Remove old messages if over budget."""
try:
while self._total_tokens_used() > self.token_budget:
if len(self.messages) <= 2:
break
self.messages.pop(1)
except Exception as e:
print(f"(token budget error): {e}")
def chat(self, user_input):
"""Send message and get response."""
self.messages.append({"role": "user", "content": user_input})
response = self.client.chat.completions.create(
model=self.model,
messages=self.messages,
temperature=self.temperature,
max_tokens=self.max_tokens
)
        reply = response.choices[0].message.content
self.messages.append({"role": "assistant", "content": reply})
self._enforce_token_budget()
return reply
def get_token_count(self):
"""Get current token usage."""
return self._total_tokens_used()
# Usage example
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
if not api_key:
raise ValueError("No API key found. Set OPENAI_API_KEY or TOGETHER_API_KEY.")
bot = Chatbot(
api_key=api_key,
system_prompt="You are a fed up and sassy assistant who hates answering questions."
)
while True:
user_input = input("You: ")
if user_input.strip().lower() in {"exit", "quit"}:
break
response = bot.chat(user_input)
print("Assistant:", response)
print("Current tokens used:", bot.get_token_count())The class -based approach to the chat boot boots all the functionality, makes the code more maintained, and provides a clean interface for integration into large applications.
Testing your chatbot
Run your complete chatbot and test these scenarios:
- Memory test: Ask a question, then refer back to it later in the conversation (see the sketch after this list)
- Personality test: Confirm the sassy persona stays consistent across exchanges
- Token management test: Have a long conversation and watch the token count
- Error handling test: Try invalid input to verify errors are handled gracefully
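Here's a minimal scripted version of the memory test, assuming the Chatbot class and api_key from oop_chatbot.py are in scope:
# Scripted memory test: the second answer should mention "teal"
bot = Chatbot(api_key=api_key, system_prompt="You are a helpful assistant.")
print(bot.chat("My favorite color is teal."))
print(bot.chat("What is my favorite color?"))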
Common problems and solutions
Environment variable problems: If you get authentication errors, confirm your API key is configured correctly. Windows users may need to restart after setting environment variables.
Token counting inconsistencies: Different models use different tokenizers. Our fallback encoding provides reasonable estimates when a model's exact tokenizer isn't available.
Memory management: If the chatbot seems forgetful, your token budget may be too low, causing important context to be pruned too aggressively.
What’s ahead?
You now have a complete working chatbot with memory, personality, and cost control. Here are the natural next steps:
Quick extensions
- Web interface: Deploy with Streamlit or Gradio for a user-friendly interface
- Multiple personalities: Create different system prompts for different use cases
- Conversation persistence: Save conversations to JSON files so they survive restarts (see the sketch after this list)
- Usage analytics: Track token usage and costs over time
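For the persistence idea, a couple of hypothetical helper functions are enough, since the messages list is already JSON-serializable – a minimal sketch:
import json

# Hypothetical helpers: write and read the messages list as JSON
def save_conversation(messages, path="conversation.json"):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, indent=2)

def load_conversation(path="conversation.json"):
    with open(path, encoding="utf-8") as f:
        return json.load(f)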
Advanced features
- Multi-model support: Compare responses from different AI models
- Custom knowledge: Connect your own documents or data sources
- Voice interface: Add speech-to-text and text-to-speech capabilities
- User authentication: Support multiple users with separate conversation histories
Production considerations
- Rate limiting: Handle API rate limits gracefully
- Monitoring: Add logging and error tracking
- Scalability: Design for multiple concurrent users
- Security: Implement proper input validation and sanitization
Key takeaways
Building your own chatbot teaches the fundamental skills for working professionally with AI APIs. You've learned to manage conversation state, control costs through token budgets, and structure code for maintainability.
These skills transfer directly to production applications: customer service bots, educational assistants, creative writing tools, and countless other AI-powered projects.
The chatbot you've built, together with the techniques you've mastered – API integration, memory management, and cost control – represents a solid foundation.
Remember to experiment with different personalities, temperature settings, and token budgets for your specific use case. The real power of building your own chatbot lies in customization you can't achieve through any off-the-shelf AI interface.
Resources and next steps
- Full code: All examples are available in the solution notebook
- Community support: Join the Dataquest Community to discuss your project and get help extending it
- Related learning: Explore API integration patterns and advanced techniques for building even more sophisticated applications
Start experimenting with your new chatbot, and remember that every conversation is a learning opportunity for both you and your AI assistant!