How Large Language Models Actually Work, No Math Required
Curious how ChatGPT or Claude actually works? This lesson explains tokens, training, and inference in plain English — no math, no jargon, just clarity.

You typed a question into ChatGPT or Claude. Two seconds later, something coherent appeared: an answer that felt almost like talking to someone who understood you. Here's what actually happened: a large language model predicted the most likely next word, again and again, hundreds or even thousands of times in a row, until it stopped. In this lesson, you'll find out exactly how, with no math and no computer science background required.
If you're not sure where LLMs fit in the broader AI landscape, the AI family tree lesson covers that first.
Key Takeaways
- LLMs don't read words. They read tokens: chunks of text that are often smaller than a word
- Training means processing billions of words and learning to predict what comes next, not memorizing, not understanding
- When you send a message, the model predicts the most likely next token, one at a time, until it stops
- Whether an LLM truly "understands" is an open question; even researchers disagree
What's a token? (Not what you think)
Here's the thing about tokens: they're not words.
When text enters a large language model, it doesn't see "Hello, how are you?" as four words. It sees something closer to six pieces: Hello / , / how / are / you / ?. Each piece is a token, a chunk of text the model treats as one unit.
This matters more than it first appears. Take the word "unhappiness." You see one word. Depending on its tokenizer, the model might see three tokens: un / happi / ness. A short, common word like "cat" is usually one token. A rare technical word might get split into five or six.
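If you're curious what splitting looks like in practice, here is a minimal sketch. It is not how ChatGPT or Claude tokenize text: real tokenizers learn their vocabularies from data, while `TOY_VOCAB` and `toy_tokenize` are hypothetical names with a hand-picked vocabulary chosen only to reproduce the splits described above. The sketch simply grabs the longest known piece it can find, then moves on.

```python
# Hypothetical, hand-picked vocabulary. Real tokenizers learn tens of
# thousands of pieces from data; this one exists only for illustration.
TOY_VOCAB = {"un", "happi", "ness", "cat", "hello", "how", "are", "you", ",", "?"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into pieces using greedy longest-match against vocab."""
    tokens = []
    # Lowercasing and splitting on spaces are simplifications for this sketch.
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest possible piece first, then shrink it.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab:
                    tokens.append(piece)
                    start = end
                    break
            else:
                # Nothing matched: emit the single character as its own token.
                tokens.append(word[start])
                start += 1
    return tokens


print(toy_tokenize("unhappiness"))          # ['un', 'happi', 'ness']
print(toy_tokenize("Hello, how are you?"))  # ['hello', ',', 'how', 'are', 'you', '?']
```

Swap in a different vocabulary and the same word splits differently, which is why token counts vary from one model to another.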

Why does this matter for you? Because everything in a language model (what it can read, remember, and generate) is measured in tokens, not words. When you see a model's "context window" listed as 200,000 tokens, that's closer to 150,000 words. Pricing, speed, and limits all trace back to token counts.
A practical rule of thumb: one token is roughly four characters, or about 0.75 words. So 1,000 words is approximately 1,333 tokens. This comes up constantly once you start working with AI tools.
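The rule of thumb is easy to turn into a quick calculator. This is a rough sketch only: the function names are made up for this example, and the ratios are the approximations quoted above, not exact values for any particular model.

```python
def estimate_tokens_from_words(word_count):
    """Rough estimate, assuming one token is about 0.75 words."""
    return round(word_count / 0.75)


def estimate_tokens_from_text(text):
    """Rough estimate, assuming one token is about four characters."""
    return round(len(text) / 4)


print(estimate_tokens_from_words(1000))     # 1333
print(estimate_tokens_from_words(150_000))  # 200000
```

The two estimates will not always agree with each other, and neither will match a real tokenizer exactly. They are good enough for judging whether a document fits in a context window.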
How does an LLM learn from text?
If tokens are the alphabet of a language model, training is where it learns the language.
Here's the most accurate, simple description: a large language model reads an enormous amount of text (books, websites, code, articles, conversations) and learns to predict what comes next.
Not memorize. Not understand. Predict.
Think of it this way. Imagine you've read every mystery novel ever written. After enough of them, you could finish almost any sentence: "The detective walked into the dimly lit room and noticed the ___." You don't understand crime. You haven't memorized every book. You've just absorbed enough patterns to know what tends to follow what.
That's training. The model adjusts billions of internal parameters (think of them as tiny dials) every time it makes a wrong prediction. Billions of corrections later, it gets very good at next-token prediction.
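To make "absorbing patterns" concrete, here is a deliberately tiny sketch. One loud caveat: it does not adjust dials the way a real model does. It just counts which word follows which in a made-up four-sentence corpus, a much simpler technique sometimes called a bigram model. `CORPUS`, `train_bigram`, and `predict_next` are hypothetical names invented for this example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for "an enormous amount of text".
CORPUS = [
    "the detective noticed the body",
    "the detective noticed the letter",
    "the detective noticed the body",
    "the detective opened the door",
]


def train_bigram(sentences):
    """Count, for each word, which words follow it and how often."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            follow_counts[current][following] += 1
    return follow_counts


def predict_next(follow_counts, word):
    """Return the most frequently seen follower of word, or None if unseen."""
    if word not in follow_counts:
        return None
    return follow_counts[word].most_common(1)[0][0]


model = train_bigram(CORPUS)
print(predict_next(model, "detective"))  # noticed
```

Nothing here understands detectives, and no sentence is stored whole. The prediction falls out of what tended to follow what, which is the same idea at a vastly smaller scale.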
The scale involved is genuinely hard to picture. GPT-3 drew on a dataset of roughly 500 billion tokens, of which it processed about 300 billion during training. The model itself has 175 billion parameters. That scale, not any special magic, is what makes the behavior feel intelligent.
One more thing worth knowing: training is extraordinarily expensive and happens once. After training, the model's parameters are frozen. They don't change when you use it. The frozen result of training is what you're talking to.
What happens when you type into ChatGPT?
This is the part called inference: what happens every time you actually use the model.
When you type into ChatGPT or Claude, the inference process starts immediately. Say you type: "What's the best way to start learning guitar?"
The model doesn't search a database. It doesn't retrieve a stored answer. It takes your entire message as context and asks one question: given everything I've learned and everything you just said, what is the most likely next token?
It picks one. Then it takes your message plus that token and asks again. Then again. It does this, potentially thousands of times per response, until it predicts a token that means "stop."
That's the whole mechanism. Token by token, one at a time, until done. Every response you've ever gotten from any LLM was built this way.
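The loop described above can be sketched in a few lines. Treat this as an illustration under heavy assumptions: `NEXT_TOKEN` is a hypothetical lookup table standing in for a trained model, and it only looks at the last token, whereas a real LLM scores every token in its vocabulary using the entire context. The shape of the loop is the part that carries over.

```python
# Hypothetical lookup table standing in for a trained model.
NEXT_TOKEN = {
    "<start>": "start",
    "start": "with",
    "with": "simple",
    "simple": "chords",
    "chords": "<stop>",
}
STOP = "<stop>"


def generate(prompt_tokens, max_tokens=20):
    """Append one predicted token at a time until the stop token appears."""
    context = list(prompt_tokens) or ["<start>"]
    output = []
    for _ in range(max_tokens):
        # Ask the stand-in "model" for the next token given the context so far.
        next_token = NEXT_TOKEN.get(context[-1], STOP)
        if next_token == STOP:
            break
        output.append(next_token)
        context.append(next_token)  # the new token becomes part of the context
    return output


print(generate(["<start>"]))  # ['start', 'with', 'simple', 'chords']
```

The `max_tokens` cap is there as a safety limit for the sketch, so the loop ends even if a stop token never comes up.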
This is why LLMs can feel like they're thinking through a response step by step. In a sense, they are. Each token shapes what comes next. The response builds itself, each piece influenced by everything before it.
It's also why they can be confidently wrong. If the most statistically likely next token is incorrect, the model still picks it. It has no way to check facts against the world. It only knows patterns.
Some systems do add that capability: they can retrieve real information before generating a response. That's a technique called retrieval-augmented generation (RAG), and it's a separate lesson.
The question everyone eventually asks
At some point, everyone who uses a language model asks the same thing: does it actually understand what it's saying?
The honest answer: we don't fully know. I find this genuinely interesting, not as a hedge, but as the most accurate position available.
What we can say is this: the model behaves as if it understands. It reasons through problems, catches contradictions, follows multi-step instructions, and adjusts its tone based on context. Whatever is happening inside, the outputs are often remarkable.
What we can't say with confidence is whether something is happening beyond very sophisticated pattern matching. Whether there's anything it is like to be a language model, any experience at all, is a genuinely open question in AI research. Researchers actively disagree.
Anyone who tells you confidently that LLMs definitely do or definitely don't understand is overreaching. The behavior is real. The mechanism is known. The nature of that mechanism is still debated.
What comes after LLMs (giving them tools, memory, and the ability to act in the world) is where AI agents come in. That's the next direction to explore after this course.
For now, here's what's worth holding onto: LLMs predict. They do it brilliantly, at enormous scale, based on patterns learned from more text than any human could read in a thousand lifetimes. And that prediction, done well enough, produces something that feels remarkably like understanding.
Your Task
Count your tokens
Go to claude.ai or chat.openai.com. Type any sentence, whatever comes to mind.
Try to estimate how many tokens it might be. Use the rule of thumb: roughly one token per four characters, or about 0.75 words per token.
Then read the response. Notice what just happened: the model didn't retrieve that answer from anywhere. It predicted it, one token at a time, until it stopped.
That's the whole mechanism. You've now seen it in action.
Done? You've completed Lesson 01.03. Next up: Chatbot vs AI Agent — What's Actually the Difference →
This is part of the Getting Started learning path.