An LLM has many parameters. But what is a parameter?

by SkillAiNest

When a model is trained, each word in its vocabulary is assigned a numerical value that captures the meaning of that word in relation to all other words, based on how that word appears in countless instances in the model’s training data.

So each word is replaced by a kind of numerical code?

Yes, but there is more to it. The numerical value – the embedding – that represents each word is actually a list of numbers, with each one capturing a different aspect of the meaning the model extracted from its training data. The length of this list is another thing that LLM designers can specify before training begins. A typical size is 4,096.

Each word within the LLM is represented by a list of 4,096 numbers?

Yes, that’s its embedding. And every one of those numbers is tweaked during training. An embedding that is 4,096 numbers long is said to have 4,096 dimensions.
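To make that concrete, here is a minimal sketch in Python of an embedding lookup table. The words, the tiny 8-number vectors, and the values themselves are all made up for illustration; a real LLM would learn a 4,096-number vector for every entry in its vocabulary.

```python
import numpy as np

# Hypothetical toy vocabulary and embedding size (a real LLM might use
# tens of thousands of tokens and 4,096 dimensions; these values are
# shrunk for illustration).
vocab = ["table", "chair", "astronaut", "moon"]
dim = 8

# One row of numbers per word. During training, these numbers are
# repeatedly tweaked so that related words end up with similar rows.
rng = np.random.default_rng(0)
embeddings = {word: rng.normal(size=dim) for word in vocab}

print(embeddings["table"])  # the 8 numbers that stand in for "table"
```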

Why 4,096?

This may seem like an odd number. But LLMs (like anything that runs on a computer chip) work best with powers of two – 2, 4, 8, 16, 32, 64, and so on. LLM engineers have found that 4,096 is a power of two that hits a sweet spot between capacity and efficiency. Models with fewer dimensions are less capable; models with more dimensions are too expensive or too slow to train and run.
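To see why more dimensions get expensive, consider the cost of the embedding table alone. This back-of-the-envelope sketch assumes a hypothetical 50,000-token vocabulary stored as 4-byte floats; real figures vary by model.

```python
# Rough size of the embedding table alone, assuming a hypothetical
# 50,000-token vocabulary and 4-byte (float32) numbers per dimension.
vocab_size = 50_000
for dim in (1_024, 4_096, 16_384):
    params = vocab_size * dim
    megabytes = params * 4 / 1e6
    print(f"dim={dim:>6}: {params:,} parameters, ~{megabytes:,.0f} MB")
```

And the embedding table is only one piece of the model: every extra dimension also widens the layers that process those vectors, which multiplies the cost of training and of every query.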

Using a large number of dimensions allows the LLM to capture very fine-grained information about how a word is used in many different contexts, what subtle meanings it may have, and how it relates to other words.

In February, OpenAI released GPT-4.5, the firm’s largest LLM yet (some estimates put its parameter count at over 10 trillion). Nick Ryder, a research scientist at OpenAI who worked on the model, told me at the time that larger models could work with additional information, such as emotional cues – for instance, when a speaker’s words indicate hostility: “All these subtle patterns that come through human conversation. Those are the pieces that these larger and larger models will pick up on.”

The upshot is that all the words inside the LLM are encoded in a high-dimensional space. Picture thousands of words floating in the air around you. Words that are close together have similar meanings. For example, “table” and “chair” would be near each other, while “astronaut” would sit farther away, closer to “moon” and “Musk”. Unrelated words hang far off in the distance. It’s a little like that, except that instead of being related to each other in three dimensions, the words inside the LLM are related across 4,096 dimensions.
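The “closeness” in that picture is usually measured with cosine similarity between embedding vectors. Here is a minimal sketch using made-up 3-dimensional vectors (a real model would compare 4,096-dimensional ones):

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way (related meanings);
    # values near 0.0 mean unrelated directions.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 3-dimensional embeddings, chosen so that related words
# point in similar directions.
table = np.array([0.9, 0.1, 0.0])
chair = np.array([0.8, 0.2, 0.1])
moon  = np.array([0.0, 0.9, 0.4])

print(cosine_similarity(table, chair))  # high: related meanings
print(cosine_similarity(table, moon))   # low: unrelated meanings
```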


This is dizzying stuff. In effect, an LLM compresses the entire internet into a single monumental mathematical structure that encodes an inexhaustible amount of interconnected information. That is why LLMs can do amazing things – and why they are impossible to fully understand.
