Large language models (LLMs) such as Llama 2 and Mistral are often described as “black boxes”. This means you give them text and see their response, but their inner workings are hidden. Inside the model, billions of weights and neuron activations transform the input into output in ways we cannot directly interpret, so we see the results without seeing the reasoning behind them. They generate impressively fluent text, but how do they actually represent meaning?
In this tutorial, you will run an open-source LLM locally on your machine and dig into its hidden activations – the internal neuron values produced while it processes text. By visualizing these activations, you can see patterns related to sentiment, analogy, and bias.
This lesson will help you:
Understand how LLMs represent text internally
Experiment with embeddings and hidden states in Python
Build visualizations that reveal differences between words, phrases, and sentiments
See how bias and associations surface in neural models
That’s what we’ll cover in this tutorial – and yes, we’ll do all of it locally, with no cloud costs.
Prerequisites
Python 3.10+
A machine with at least 8GB of RAM (16GB recommended)
Basic familiarity with the command line and Python
Packages: torch, transformers, matplotlib, scikit-learn
Step 0: Create and activate a virtual environment
Why use a virtual environment?
When you install libraries with pip, they usually go into your global Python setup. That can get messy fast:
Different projects may need different versions of the same library (e.g., torch==2.0 versus torch==2.2), and upgrading one project can accidentally break another.
Your system gets cluttered with packages you don’t need anywhere else.
A virtual environment solves this by creating a self-contained “sandbox” for your project.
Everything you install (like torch, transformers, matplotlib) lives inside your project folder. When you’re done, you can delete that folder and nothing else on your computer is affected.
This is standard best practice for Python development – lightweight and safe.
In short: a virtual environment keeps your project’s tools separate, so nothing breaks while you experiment.
Windows (Command Prompt or PowerShell) / Mac (Terminal)
1. Create your project folder and navigate into it (make one if needed):
mkdir llm_viz
cd llm_viz
2. Create the virtual environment – this makes a folder called venv/ inside your project:
python -m venv venv
3. Activate it:
venv\Scripts\activate       # Windows
source venv/bin/activate    # Mac/Linux
4. Your terminal prompt should now show the environment name, like:
(venv) C:\Users\YourName\llm_viz>
(venv) your-macbook:llm_viz yourname$
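When you’re finished working, you can leave the environment at any time, and deleting the venv/ folder later removes everything the environment installed:
deactivate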
Install dependencies:
pip install torch transformers matplotlib scikit-learn
We will use DistilBERT (distilbert-base-uncased) because it is small and easy to run locally. If you have more powerful hardware, you can swap in larger models such as Llama or Mistral.
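If you do try a larger model later, the swap is a one-line change to the model_name you’ll see in Step 1 below. Purely as an illustration (a 7B model downloads several gigabytes, needs far more RAM than DistilBERT, and the rest of the script may need adjusting for its larger layers):

# Illustrative only: point the same loading code at a bigger model.
model_name = "mistralai/Mistral-7B-v0.1"  # ~7B parameters; heavy download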
Step 1: Load a local model and tokenizer
This step downloads DistilBERT (a small, free LLM) and prepares it to run locally.
Create a file called app.py and paste in the following code.
Note: The first time you run python app.py, Hugging Face will automatically download the model (~250 MB). This only happens once.
from transformers import AutoTokenizer, AutoModel
import torch

# Download (on first run) and load the tokenizer and model locally.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
This code loads a small open-source language model so we can work with it on our own computer.
First, it imports the transformers library and PyTorch, which provide the tools to download and run the model. Then it selects the model by name (distilbert-base-uncased) and uses AutoTokenizer, which converts text into tokens the model understands, while AutoModel downloads the pre-trained model itself and configures it to return the hidden-layer outputs we will visualize.
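To get a feel for what the tokenizer does, you can try it on a sentence by itself – a quick illustrative check, not something app.py needs:

# Inspect how the tokenizer splits text into IDs and readable tokens.
sample = tokenizer("I love pizza!")
print(sample["input_ids"])                                   # numeric token IDs
print(tokenizer.convert_ids_to_tokens(sample["input_ids"]))  # readable tokens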
Step 2: Extract hidden activations
This step feeds text into the model and captures the “hidden activations” (the neuron outputs inside the model).
In the same app.py, add this function under the Step 1 code.
def get_hidden_states(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    hidden = outputs.hidden_states[-1][0]  # last layer, first (only) sentence
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens, hidden

tokens, hidden = get_hidden_states("I love pizza!")
print(tokens)
print(hidden.shape)
Now we can call get_hidden_states("I love pizza!") and it will return tokens like ['i', 'love', 'pizza', '!'] along with a large tensor of numbers.
Run python app.py to execute the code.
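If everything is wired up correctly, the output should look roughly like this – DistilBERT wraps the input in special [CLS] and [SEP] markers, and its hidden size is 768:

# Expected output (approximately):
# ['[CLS]', 'i', 'love', 'pizza', '!', '[SEP]']
# torch.Size([6, 768])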
Step 3: Visualize sentiment activations
This step plots how neuron values differ between happy and sad sentences. We will compare activations for a positive and a negative movie review.
In the same app.py, add this function under the Step 2 code.
import matplotlib.pyplot as plt

def plot_token_activations(tokens, hidden, title, filename):
    plt.figure(figsize=(12, 4))
    for i, token in enumerate(tokens):
        plt.plot(hidden[i].numpy(), label=token)  # one curve per token
    plt.title(title)
    plt.xlabel("Neuron Index")
    plt.ylabel("Activation")
    plt.legend(loc="upper right", fontsize="x-small")
    plt.tight_layout()
    plt.savefig(filename)
    plt.close()

tokens_pos, hidden_pos = get_hidden_states("I love this movie, it is fantastic!")
plot_token_activations(tokens_pos, hidden_pos, "Positive Sentiment Example", "positive_sentiment.png")

tokens_neg, hidden_neg = get_hidden_states("I hate this movie, it is terrible.")
plot_token_activations(tokens_neg, hidden_neg, "Negative Sentiment Example", "negative_sentiment.png")
After running python app.py, check your folder – you’ll see two image files: positive_sentiment.png and negative_sentiment.png. Each is a line graph showing the activations for every token.
Figure 1: Activations for the positive review. Words such as “love” and “fantastic” activate distinctive neuron patterns.

Figure 2: Activations for the negative review. Words like “hate” and “terrible” trigger different neuron curves.
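If you want to go beyond eyeballing the curves, you can ask which individual neurons separate “love” from “hate” most strongly. A minimal sketch, assuming both example sentences tokenize with the sentiment word at position 2 ([CLS], 'i', then 'love'/'hate'):

# Compare the "love" and "hate" token vectors element-wise and list the
# neurons with the largest absolute difference. Index 2 assumes the
# tokenization [CLS], 'i', 'love'/'hate', ... holds for both sentences.
diff = (hidden_pos[2] - hidden_neg[2]).abs()
top = torch.topk(diff, k=5)
print("Most divergent neurons:", top.indices.tolist())
print("Differences:", [round(v, 3) for v in top.values.tolist()])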

Step 4: Compare two sentences
This step compares the average neuron activation patterns of two sentences.
Now, in the same app.py, add this function under the Step 3 code.
def compare_sentences(s1, s2, filename):
    tokens1, hidden1 = get_hidden_states(s1)
    tokens2, hidden2 = get_hidden_states(s2)
    plt.figure(figsize=(10, 5))
    # Average over tokens to get one activation profile per sentence.
    plt.plot(hidden1.mean(dim=0).numpy(), label=s1[:30] + "...")
    plt.plot(hidden2.mean(dim=0).numpy(), label=s2[:30] + "...")
    plt.title("Sentence Activation Comparison")
    plt.xlabel("Neuron Index")
    plt.ylabel("Mean Activation")
    plt.legend()
    plt.tight_layout()
    plt.savefig(filename)
    plt.close()

compare_sentences("I love coding.", "I hate coding.", "sentence_comparison.png")
After running python app.py, you’ll now have sentence_comparison.png showing two curves – one for the positive sentence, one for the negative.
Figure 3: “I love coding.” vs. “I hate coding.” Even when averaged across tokens, the neuron profiles differ noticeably.
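To put a number on what the figure shows, you can also compute the cosine similarity between the two mean activation vectors – an optional addition; values near 1 mean the overall patterns are very similar despite the visible differences:

import torch.nn.functional as F

# Cosine similarity between the two sentences' mean activation vectors.
_, h1 = get_hidden_states("I love coding.")
_, h2 = get_hidden_states("I hate coding.")
sim = F.cosine_similarity(h1.mean(dim=0), h2.mean(dim=0), dim=0)
print(f"Cosine similarity: {sim.item():.4f}")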

Step 5: Visualize word analogies with PCA
We can check whether the embeddings encode analogies such as man → woman :: king → queen.
This step projects the word embeddings of man, woman, king, and queen into 2D space so you can see the relationships.
Now, in the same app.py, add this function under the Step 4 code.
from sklearn.decomposition import PCA

def get_sentence_embedding(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Average the last hidden layer over all tokens to get one vector.
    hidden = outputs.last_hidden_state.mean(dim=1).squeeze()
    return hidden

def plot_embeddings(words, embeddings, filename):
    pca = PCA(n_components=2)
    reduced = pca.fit_transform(torch.stack(embeddings).numpy())
    plt.figure(figsize=(8, 6))
    for i, word in enumerate(words):
        x, y = reduced[i]
        plt.scatter(x, y, marker="o", s=100)
        plt.text(x + 0.02, y + 0.02, word, fontsize=12)
    plt.title("Word Embeddings in 2D (PCA)")
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.grid(True)
    plt.tight_layout()
    plt.savefig(filename)
    plt.close()

words = ["man", "woman", "king", "queen"]
embeddings = [get_sentence_embedding(w) for w in words]
plot_embeddings(words, embeddings, "word_analogies.png")
After running python app.py, you will have word_analogies.png showing the famous man → woman and king → queen relationships as roughly parallel lines.
Figure 4: PCA visualization of the word embeddings. Man–woman and king–queen form a parallel relationship, reflecting the analogy structure.
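As an optional check, you can test the classic analogy arithmetic directly: if the embeddings encode the relationship, king - man + woman should land closest to queen. A minimal sketch reusing the words and embeddings lists from this step (contextual embeddings like DistilBERT’s are noisier than classic word vectors, so the effect can be weaker than with word2vec):

import torch.nn.functional as F

# Does king - man + woman point toward queen? Rank all four words by
# cosine similarity to the analogy vector.
man, woman, king, queen = embeddings
analogy = king - man + woman
for word, vec in zip(words, embeddings):
    print(f"{word}: {F.cosine_similarity(analogy, vec, dim=0).item():.4f}")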

Conclusion
You’ve built a local toolkit to:
Extract hidden activations from an LLM
Visualize neuron activity for positive vs. negative sentiment
Explore the meaning behind analogies like king → queen
Inspect potential bias in role associations
This helps demystify LLMs – showing that they are large-scale matrix math that encodes meaning, not magic.
Small models like DistilBERT run on any laptop. Larger models like Llama 2 open the door to even deeper exploration.