Hundreds of millions of people now use chatbots every day. And yet the large language models (LLMs) that power them are so complex that nobody really understands what they are, how they work, or exactly what they can and can't do, not even the people who build them. Weird, right?
It's also a problem. Without a clear sense of what's going on under the hood, it's hard to grasp the technology's limitations, figure out exactly why models misbehave, or set guardrails to keep them in check.
But over the past year we've gained a better sense of how LLMs work, as researchers at top AI companies have developed new ways to probe the inner workings of these models and have begun to piece the puzzle together.
One approach, known as mechanistic interpretability, aims to map the key features inside a model and the pathways between them. In 2024, the AI firm Anthropic announced that it had built a kind of microscope that let researchers peer inside its large language model Claude and identify features corresponding to recognizable concepts like Michael Jordan and the Golden Gate Bridge.
In 2025, Anthropic took this research further, using its microscope to reveal whole sequences of features and trace the paths the model follows to produce a response. Teams at OpenAI and Google DeepMind have used similar techniques to try to explain unexpected behavior, such as why their models sometimes appear to be trying to deceive people.
Another new approach, known as chain-of-thought monitoring, lets researchers listen in on the internal monologues that so-called reasoning models produce as they work through a problem step by step. OpenAI has used this technique to catch its own reasoning models cheating on coding tests.
The field is divided on how far these techniques can go. Some believe LLMs are simply too complicated for us to ever understand fully. But taken together, these new tools can help plumb their depths and reveal more about what makes these strange machines tick.