🧠 Module 1: How Large Language Models Think


You've probably noticed this at your work.
Whenever your senior asks an AI tool a question, the answer looks perfect—structured, accurate, almost like it came from an expert.
But whenever you ask the same AI a question, you get something different. Something generic. Something that doesn't feel useful.
And it makes you wonder:
"Is my senior using a secret technique?"
"Why does AI understand them better than me?"
"Are they talking to AI in a different language?"
The truth is: Yes.
Not English, but Machine English — a structured way of communicating that matches how Large Language Models (LLMs) actually think.
The good news?
It's a skill that you can master too.
A Large Language Model (LLM) is an AI system that has learned from massive amounts of text to understand, generate, and respond in human-like language by predicting the most logical next words.
But here's the part most people misunderstand:
An LLM doesn't "understand" English the way humans do—it processes language as patterns, probabilities, and relationships.
When you type something like:
"Explain microservices."
The LLM doesn't "think."
It doesn't "remember."
It doesn't "reason" like a human.
Instead, it predicts the most statistically likely continuation of your text.
It's a pattern-recognition engine, not a mind.
It works brilliantly when your patterns are clean and struggles when your input is vague or ambiguous.
That's why two people—you and your senior—can ask the same question and get completely different-quality answers.
Your senior is giving the LLM clearer patterns to follow.
This brings us to the next important question.
Prompt Engineering is the practice of structuring your input in a way that the LLM can interpret correctly and respond accurately.
It is simply:
Learning to speak in a format that aligns with how the AI actually processes information. This format is often called "Machine English."
If an LLM is a pattern engine, then prompt engineering is pattern design.
Now that you know what an LLM is and what Prompt Engineering actually means...
Let's answer the big question:
What happens inside an AI when you type a prompt?

Before the model even "looks" at your text, it cleans it.
This includes normalizing whitespace, standardizing characters, and handling text encoding.
If your text has strange spacing, invisible characters, or misspellings, the model may tokenize it incorrectly.
Your sentence is NOT treated as full words. It is broken into smaller tokens.
Tokenization is the process of converting the raw text you type (Unicode characters) into discrete atomic units, called tokens, which are then mapped to integer IDs the model computes over.
Tokenization isn't just one step; it's a multi-stage pipeline: the text is split into subword pieces, each piece is mapped to an integer ID, and special markers such as <BOS> (Beginning of Sequence) or <EOS> (End of Sequence) are added. The choice of algorithm defines the model's vocabulary and its handling of text.

Example:
Microservices scale beautifully!
Becomes:
["Micro", "service", "s", "Ġscale", "Ġbeautiful", "ly", "!"]
Tokens control context length, processing cost, and how faithfully your meaning survives.
Good prompts = clean tokenization.
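To make this concrete, here is a toy tokenizer in pure Python. Real models use learned algorithms such as Byte-Pair Encoding (BPE); the tiny vocabulary below is invented purely to illustrate the mechanics of splitting text into subword pieces and mapping them to integer IDs:

```python
# A simplified greedy longest-match subword tokenizer.
# The vocabulary is made up for illustration; real vocabularies
# are learned from data and contain tens of thousands of entries.
VOCAB = {
    "Micro": 0, "service": 1, "s": 2, "Ġscale": 3,
    "Ġbeautiful": 4, "ly": 5, "!": 6,
}

def tokenize(text: str) -> list[str]:
    # GPT-style tokenizers mark a leading space with "Ġ"
    text = text.replace(" ", "Ġ")
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry matching at position i
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"No token matches at position {i}")
    return tokens

tokens = tokenize("Microservices scale beautifully!")
print(tokens)  # ['Micro', 'service', 's', 'Ġscale', 'Ġbeautiful', 'ly', '!']
print([VOCAB[t] for t in tokens])  # the integer IDs the model computes over
```

Notice that "Microservices" is not one token: it splits into "Micro", "service", and "s", exactly as in the example above.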
Once your text is tokenized, each token needs to be converted into a form the model can actually compute with.
Embedding is the process of converting tokens (integer IDs) into dense vector representations—arrays of numbers that capture semantic meaning and relationships between words.
Each token becomes a vector (a long list of numbers), typically 768 to 4096 dimensions.
Embeddings are high-dimensional numerical vectors that represent meaning.

Think of embeddings as:
Example: The word database is not stored as text…
It becomes a vector, like:
database → [0.23, -0.88, 0.12, 0.64, -0.11, ...] (dimension = 4096)
An embedding with 4,096 values is like a unique point in a 4,096-dimensional universe.
Two embeddings are closer together when their meanings are similar. Look at the embedding-space diagram: you will see that King and Queen sit close together.
Imagine a 4096-dimensional universe:
The LLM reasons by traveling through this geometric space, selecting the next token ID that corresponds to the nearest, most logical vector for completing the sequence. Everything the model knows lives in its embeddings.
Because EVERYTHING an LLM does (reasoning, similarity, memory, planning, answering) happens inside the embedding space, NOT in English.
This is where language becomes “machine-understandable.”
Embeddings determine how the model measures similarity, relatedness, and analogy between concepts.
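"Closeness" in embedding space is usually measured with cosine similarity. The four-dimensional vectors below are invented purely for illustration (real embeddings have hundreds to thousands of dimensions), but they show why "king" and "queen" land near each other while "database" lands far away:

```python
import math

# Toy embedding vectors, made up for illustration only.
EMBEDDINGS = {
    "king":     [0.80, 0.65, 0.10, 0.05],
    "queen":    [0.78, 0.70, 0.12, 0.04],
    "database": [0.05, 0.10, 0.90, 0.85],
}

def cosine_similarity(a, b):
    # 1.0 = pointing the same direction (similar meaning);
    # values near 0 = unrelated directions
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]))     # high (similar)
print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["database"]))  # low (dissimilar)
```

This is the same geometry the model "travels" when it picks the next token: nearby vectors are plausible continuations, distant ones are not.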

This is where the magic happens. The transformer architecture processes your embedded tokens through multiple layers.
A transformer is a tower of layers:
        +----------------------------+
Layer 1 | Attention + Feed Forward   |
        +----------------------------+
Layer 2 | Attention + FFN            |
        +----------------------------+
Layer 3 | Attention + FFN            |
        +----------------------------+
  ...   | ...                        |
        +----------------------------+
Layer N | Attention + FFN            |
        +----------------------------+

Early layers: spelling, syntax
Middle layers: semantics
Late layers: reasoning, planning, logic
Self-Attention:
Tokens look at each other to understand relationships.
Example:
beautifully modifies scale
service relates to micro
Feed Forward Networks (FFN):
Each layer refines meaning, adding more context.
By the time the text passes through all layers, the model understands the structure, meaning, and intent of your input.
Transformer layers use attention mechanisms to understand relationships between all tokens in your input, allowing the model to consider context from the entire sequence simultaneously.
Each transformer layer consists of a self-attention block and a feed-forward network (FFN).
When you write: "The cat sat on the mat"
The model uses attention to understand which words relate to which: "sat" connects "the cat" to "the mat", and "on" signals a location.
All of this happens simultaneously across multiple layers, building increasingly sophisticated understanding of your input.
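The attention step can be sketched in a few lines of pure Python. This is a minimal single-head sketch with tiny made-up vectors; real transformers use learned query/key/value projections and many attention heads, which are omitted here to keep the mechanics visible:

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention over a list of token vectors.

    Real transformers first project each vector into separate query,
    key, and value vectors with learned weights; here we reuse the raw
    vectors for all three roles to keep the sketch minimal.
    """
    d = len(vectors[0])
    outputs = []
    for q in vectors:                       # each token "asks a question"
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]         # relevance of every other token
        weights = softmax(scores)           # attention weights sum to 1
        # Output = weighted mix of all token vectors
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(d)])
    return outputs

# Tiny made-up 3-dimensional "embeddings" for three tokens
tokens = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2]]
mixed = self_attention(tokens)
# Each output row now blends information from every input token.
```

The key property to notice: every output vector is a weighted blend of all input vectors, which is exactly how a token "looks at" the rest of the sentence.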
After all transformer layers finish their job, the model produces a raw score for every possible token in its vocabulary.
This score is called a logit.
It's just a number that can be positive, negative, or zero.
Something like:
`!` → 12.4
`.` → 10.1
`good` → 3.2
`scale` → 0.8
`wrong` → -4.7
These numbers reflect:
How strongly the model "believes" each token should be the next one.
But they are NOT normalized.
To convert logits into usable probabilities, the model uses a function called Softmax.
Softmax takes raw numbers and transforms them into a probability distribution: every value lands between 0 and 1, and all values sum to 1.
Example:
Raw logits:
[10, 7, 2]
Softmax makes them something like:
[0.95, 0.047, 0.0003]
Now the AI knows how confident to be in each candidate.
It always picks the next token based on this probability distribution.
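Softmax is simple enough to compute by hand. Here is a minimal implementation applied to the [10, 7, 2] example:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability
    # (shifting all logits by a constant doesn't change the result)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([10, 7, 2])
print([round(p, 4) for p in probs])  # [0.9523, 0.0474, 0.0003]
```

Note how the gaps get exaggerated: a logit that is 3 points higher ends up roughly 20 times more probable, because softmax is exponential.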
When your senior writes a clean, structured prompt, the probabilities concentrate on the right tokens.
When a junior writes a vague prompt, the probabilities spread across many competing tokens.
Let's compare two scenarios.
Sharp Distribution (Model is confident)
Example logits:
`!` → 12.4
`.` → 1.3
`done` → 0.2
`why` → -5.3
Softmax →
`!` → 98%
`.` → 1.9%
`done` → 0.1%
`why` → 0.0%
The model is almost certain the next token is "!"
This happens with a well-structured prompt.
Flat Distribution (Uncertainty → Hallucination)
Example logits:
`Hawaii` → 1.2
`Kenya` → 1.1
`Mars` → 0.9
`empty` → 0.7
Softmax →
`Hawaii` → 31%
`Kenya` → 28%
`Mars` → 24%
`empty` → 17%
Here the model doesn't know the answer (because your prompt didn't provide context).
So it is guessing based on vague probabilities.
This is how hallucinations happen.
Once softmax gives probabilities, a decoding strategy decides which token to pick:
1️⃣ Greedy decoding
Choose the highest probability token.
Super accurate but less creative.
2️⃣ Top-k sampling
Pick from the top k tokens.
Example: k = 5 → only 5 candidates allowed.
3️⃣ Top-p (nucleus) sampling
Pick from smallest probability mass ≥ p.
Example: p = 0.9 → include only tokens contributing to 90% of probability.
4️⃣ Temperature
Controls randomness by scaling the logits before softmax: low temperature (< 1) sharpens the distribution, high temperature (> 1) flattens it.
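All four strategies fit in one short sketch. The tokens and logits below are borrowed from the earlier logit example; this is an illustrative toy, not the exact implementation any particular model uses:

```python
import math
import random

def softmax(logits, temperature=1.0):
    # 4) Temperature scales logits BEFORE softmax:
    #    low values sharpen the distribution, high values flatten it
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def greedy(tokens, probs):
    # 1) Greedy: always take the single most likely token
    return max(zip(tokens, probs), key=lambda tp: tp[1])[0]

def top_k(tokens, probs, k=5):
    # 2) Top-k: sample only among the k most likely tokens
    ranked = sorted(zip(tokens, probs), key=lambda tp: tp[1], reverse=True)[:k]
    toks, ps = zip(*ranked)
    return random.choices(toks, weights=ps)[0]

def top_p(tokens, probs, p=0.9):
    # 3) Top-p (nucleus): sample from the smallest set of tokens
    #    whose cumulative probability reaches p
    ranked = sorted(zip(tokens, probs), key=lambda tp: tp[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    toks, ps = zip(*nucleus)
    return random.choices(toks, weights=ps)[0]

tokens = ["!", ".", "good", "scale", "wrong"]
logits = [12.4, 10.1, 3.2, 0.8, -4.7]

probs = softmax(logits)
print(greedy(tokens, probs))  # '!'
```

With these logits the distribution is so sharp that every strategy lands on "!"; the strategies only diverge when the distribution is flat, which is exactly when prompt quality matters most.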
When you write a clear, specific, structured prompt, you help the model produce a sharp, confident probability distribution.
Sharp logits = high-quality, deterministic answers.
Flat logits = generic, vague, or hallucinated answers.
Output logits are the model's raw "scores" for every possible next token.
Softmax converts these scores into probabilities.
Decoding chooses the final token based on these probabilities — and your prompt shapes all of it.
The model does this one token at a time, looping again and again:
Prediction → Append → Predict next → Append → ...
This continues until the model produces an end-of-sequence token (<EOS>) or hits its maximum output length.
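The whole loop fits in a few lines. The "model" below is just a lookup table invented for illustration; a real model reruns the entire tokenize → embed → transform → softmax pipeline on every step:

```python
# A sketch of the autoregressive loop: predict one token, append it,
# and feed the longer sequence back in until <EOS> or a length limit.
TOY_MODEL = {
    ("Microservices",): "scale",
    ("Microservices", "scale"): "beautifully",
    ("Microservices", "scale", "beautifully"): "<EOS>",
}

def generate(prompt_tokens, max_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = TOY_MODEL.get(tuple(tokens), "<EOS>")
        if next_token == "<EOS>":   # stop at end-of-sequence
            break
        tokens.append(next_token)   # append, then predict again
    return tokens

print(generate(["Microservices"]))  # ['Microservices', 'scale', 'beautifully']
```

One prediction per token, each conditioned on everything generated so far: that is the entire life cycle, repeated until the model decides it is done.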
🎉 That's the complete life cycle of a prompt inside an LLM!
From raw text → tokens → embeddings → reasoning layers → probabilities → output.
Now that you understand how an LLM processes your words — from tokenization to embeddings, attention layers, and output probabilities — you're no longer speaking to AI blindly.
You've taken your first step into Machine English.
But knowing how the machine works is only half the story.
The real skill your senior uses — the one that makes their prompts produce accurate, structured, senior-level answers — comes from something else entirely:
👉 Prompt Structure & Role Engineering.
Module 2 is where you learn how to structure prompts and assign roles so the model responds like the expert you need.
If Module 1 taught you how the brain works, Module 2 teaches you how to steer the brain.
This is where your results start changing drastically.
So get ready — Module 2: Prompt Structure & Role Engineering will give you the power to make an AI respond exactly like the expert you need… every single time.