🧠 Module 1: How Large Language Models Think


You've probably noticed this at your work.
Whenever your senior asks an AI tool a question, the answer looks perfect—structured, accurate, almost like it came from an expert.
But whenever you ask the same AI a question, you get something different. Something generic. Something that doesn't feel useful.
And it makes you wonder:
"Is my senior using a secret technique?"
"Why does AI understand them better than me?"
"Are they talking to AI in a different language?"
The truth is: Yes.
Not English, but Machine English — a structured way of communicating that matches how Large Language Models (LLMs) actually think.
The good news?
It's a skill that you can master too.
A Large Language Model (LLM) is an AI system that has learned from massive amounts of text to understand, generate, and respond in human-like language by predicting the most logical next words.
But here's the part most people misunderstand:
An LLM doesn't "understand" English the way humans do—it processes language as patterns, probabilities, and relationships.
When you type something like:
"Explain microservices."
The LLM doesn't "think."
It doesn't "remember."
It doesn't "reason" like a human.
Instead, it predicts the most statistically likely continuation of your text.
It's a pattern-recognition engine, not a mind.
It works brilliantly when your patterns are clean and struggles when your input is vague or ambiguous.
That's why two people—you and your senior—can ask the same question and get completely different-quality answers.
Your senior is giving the LLM clearer patterns to follow.
This brings us to the next important question.
Prompt Engineering is the practice of structuring your input in a way that the LLM can interpret correctly and respond accurately.
It is simply:
Learning to speak in a format that aligns with how the AI actually processes information. This format is often called "Machine English."
If an LLM is a pattern engine, then prompt engineering is pattern design.
Now that you know what an LLM is and what Prompt Engineering actually means...
Let's answer the big question:
What happens inside an AI when you type a prompt?

Before the model even "looks" at your text, it cleans it.
This includes normalizing whitespace, standardizing characters, and handling text encoding.
If your text has strange spacing, invisible characters, or misspellings, the model may tokenize it incorrectly.
Your sentence is NOT treated as full words. It is broken into smaller tokens.
Tokenization is the process of converting the raw text you type (Unicode characters) into discrete atomic units, called tokens, which are then mapped to integer IDs the model computes over.
Tokenization isn't just one step; it's a multi-stage pipeline: the text is split into subword pieces, each piece is mapped to an integer ID, and special markers such as <BOS> (Beginning of Sequence) or <EOS> (End of Sequence) are added. The choice of algorithm defines the model's vocabulary and its handling of text.

Example:
Microservices scale beautifully!
Becomes:
["Micro", "service", "s", "Ġscale", "Ġbeautiful", "ly", "!"]
Tokens control context length, processing cost, and how faithfully your meaning survives.
Good prompts = clean tokenization.
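To make this concrete, here is a toy tokenizer in pure Python. Real models use learned algorithms such as Byte-Pair Encoding (BPE); the tiny vocabulary below is invented purely to illustrate the mechanics of splitting text into subword pieces and mapping them to integer IDs:

```python
# A simplified greedy longest-match subword tokenizer.
# The vocabulary is made up for illustration; real vocabularies
# are learned from data and contain tens of thousands of entries.
VOCAB = {
    "Micro": 0, "service": 1, "s": 2, "Ġscale": 3,
    "Ġbeautiful": 4, "ly": 5, "!": 6,
}

def tokenize(text: str) -> list[str]:
    # GPT-style tokenizers mark a leading space with "Ġ"
    text = text.replace(" ", "Ġ")
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry matching at position i
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"No token matches at position {i}")
    return tokens

tokens = tokenize("Microservices scale beautifully!")
print(tokens)  # ['Micro', 'service', 's', 'Ġscale', 'Ġbeautiful', 'ly', '!']
print([VOCAB[t] for t in tokens])  # the integer IDs the model computes over
```

Notice that "Microservices" is not one token: it splits into "Micro", "service", and "s", exactly as in the example above.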
Once your text is tokenized, each token needs to be converted into a form the model can actually compute with.
Embedding is the process of converting tokens (integer IDs) into dense vector representations—arrays of numbers that capture semantic meaning and relationships between words.
Each token becomes a vector (a long list of numbers), typically 768 to 4096 dimensions.
Embeddings are high-dimensional numerical vectors that represent meaning.

Think of embeddings as:
Example: The word database is not stored as text…
It becomes a vector, like:
database → [0.23, -0.88, 0.12, 0.64, -0.11, ...] (dimension = 4096)
An embedding with 4,096 values is like a unique point in a 4,096-dimensional universe.
Two embeddings are closer together when their meanings are similar. Look at the embedding-space diagram: you will see that King and Queen sit close together.
Imagine a 4096-dimensional universe:
The LLM reasons by traveling through this geometric space, selecting the next token ID that corresponds to the nearest, most logical vector for completing the sequence. Everything the model knows lives in its embeddings.
Because EVERYTHING an LLM does (reasoning, similarity, memory, planning, answering) happens inside the embedding space, NOT in English.
This is where language becomes “machine-understandable.”
Embeddings determine how the model measures similarity, relatedness, and analogy between concepts.
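"Closeness" in embedding space is usually measured with cosine similarity. The four-dimensional vectors below are invented purely for illustration (real embeddings have hundreds to thousands of dimensions), but they show why "king" and "queen" land near each other while "database" lands far away:

```python
import math

# Toy embedding vectors, made up for illustration only.
EMBEDDINGS = {
    "king":     [0.80, 0.65, 0.10, 0.05],
    "queen":    [0.78, 0.70, 0.12, 0.04],
    "database": [0.05, 0.10, 0.90, 0.85],
}

def cosine_similarity(a, b):
    # 1.0 = pointing the same direction (similar meaning);
    # values near 0 = unrelated directions
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]))     # high (similar)
print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["database"]))  # low (dissimilar)
```

This is the same geometry the model "travels" when it picks the next token: nearby vectors are plausible continuations, distant ones are not.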

This is where the magic happens. The transformer architecture processes your embedded tokens through multiple layers.
A transformer is a tower of layers:
        +----------------------------+
Layer 1 | Attention + Feed Forward   |
        +----------------------------+
Layer 2 | Attention + FFN            |
        +----------------------------+
Layer 3 | Attention + FFN            |
        +----------------------------+
  ...   | ...                        |
        +----------------------------+
Layer N | Attention + FFN            |
        +----------------------------+

Early layers: spelling, syntax
Middle layers: semantics
Late layers: reasoning, planning, logic
Self-Attention:
Tokens look at each other to understand relationships.
Example:
beautifully modifies scale
service relates to micro
Feed Forward Networks (FFN):
Each layer refines meaning, adding more context.
By the time the text passes through all layers, the model understands the structure, meaning, and intent of your input.
Transformer layers use attention mechanisms to understand relationships between all tokens in your input, allowing the model to consider context from the entire sequence simultaneously.
Each transformer layer consists of a self-attention block and a feed-forward network (FFN).
When you write: "The cat sat on the mat"
The model uses attention to understand which words relate to which: "sat" connects "the cat" to "the mat", and "on" signals a location.
All of this happens simultaneously across multiple layers, building increasingly sophisticated understanding of your input.
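The attention step can be sketched in a few lines of pure Python. This is a minimal single-head sketch with tiny made-up vectors; real transformers use learned query/key/value projections and many attention heads, which are omitted here to keep the mechanics visible:

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention over a list of token vectors.

    Real transformers first project each vector into separate query,
    key, and value vectors with learned weights; here we reuse the raw
    vectors for all three roles to keep the sketch minimal.
    """
    d = len(vectors[0])
    outputs = []
    for q in vectors:                       # each token "asks a question"
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]         # relevance of every other token
        weights = softmax(scores)           # attention weights sum to 1
        # Output = weighted mix of all token vectors
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(d)])
    return outputs

# Tiny made-up 3-dimensional "embeddings" for three tokens
tokens = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2]]
mixed = self_attention(tokens)
# Each output row now blends information from every input token.
```

The key property to notice: every output vector is a weighted blend of all input vectors, which is exactly how a token "looks at" the rest of the sentence.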
After all transformer layers finish their job, the model produces a raw score for every possible token in its vocabulary.
This score is called a logit.
It's just a number that can be positive, negative, or zero.
Something like:
`!` → 12.4
`.` → 10.1
`good` → 3.2
`scale` → 0.8
`wrong` → -4.7
These numbers reflect:
How strongly the model "believes" each token should be the next one.
But they are NOT normalized.
To convert logits into usable probabilities, the model uses a function called Softmax.
Softmax takes raw numbers and transforms them into a probability distribution: every value lands between 0 and 1, and all values sum to 1.
Example:
Raw logits:
[10, 7, 2]
Softmax makes them something like:
[0.95, 0.047, 0.0003]
Now the AI knows how confident to be in each candidate.
It always picks the next token based on this probability distribution.
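Softmax is simple enough to compute by hand. Here is a minimal implementation applied to the [10, 7, 2] example:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability
    # (shifting all logits by a constant doesn't change the result)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([10, 7, 2])
print([round(p, 4) for p in probs])  # [0.9523, 0.0474, 0.0003]
```

Note how the gaps get exaggerated: a logit that is 3 points higher ends up roughly 20 times more probable, because softmax is exponential.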
When your senior writes a clean, structured prompt, the probabilities concentrate on the right tokens.
When a junior writes a vague prompt, the probabilities spread across many competing tokens.
Let's compare two scenarios.
Sharp Distribution (Model is confident)
Example logits:
`!` → 12.4
`.` → 1.3
`done` → 0.2
`why` → -5.3
Softmax →
`!` → 98%
`.` → 1.9%
`done` → 0.1%
`why` → 0.0%
The model is almost certain the next token is "!"
This happens with a well-structured prompt.
Flat Distribution (Uncertainty → Hallucination)
Example logits:
`Hawaii` → 1.2
`Kenya` → 1.1
`Mars` → 0.9
`empty` → 0.7
Softmax →
`Hawaii` → 31%
`Kenya` → 28%
`Mars` → 24%
`empty` → 17%
Here the model doesn't know the answer (because your prompt didn't provide context).
So it is guessing based on vague probabilities.
This is how hallucinations happen.
Once softmax gives probabilities, a decoding strategy decides which token to pick:
1️⃣ Greedy decoding
Choose the highest probability token.
Super accurate but less creative.
2️⃣ Top-k sampling
Pick from the top k tokens.
Example: k = 5 → only 5 candidates allowed.
3️⃣ Top-p (nucleus) sampling
Pick from smallest probability mass ≥ p.
Example: p = 0.9 → include only tokens contributing to 90% of probability.
4️⃣ Temperature
Controls randomness by scaling the logits before softmax: low temperature (< 1) sharpens the distribution, high temperature (> 1) flattens it.
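All four strategies fit in one short sketch. The tokens and logits below are borrowed from the earlier logit example; this is an illustrative toy, not the exact implementation any particular model uses:

```python
import math
import random

def softmax(logits, temperature=1.0):
    # 4) Temperature scales logits BEFORE softmax:
    #    low values sharpen the distribution, high values flatten it
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def greedy(tokens, probs):
    # 1) Greedy: always take the single most likely token
    return max(zip(tokens, probs), key=lambda tp: tp[1])[0]

def top_k(tokens, probs, k=5):
    # 2) Top-k: sample only among the k most likely tokens
    ranked = sorted(zip(tokens, probs), key=lambda tp: tp[1], reverse=True)[:k]
    toks, ps = zip(*ranked)
    return random.choices(toks, weights=ps)[0]

def top_p(tokens, probs, p=0.9):
    # 3) Top-p (nucleus): sample from the smallest set of tokens
    #    whose cumulative probability reaches p
    ranked = sorted(zip(tokens, probs), key=lambda tp: tp[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    toks, ps = zip(*nucleus)
    return random.choices(toks, weights=ps)[0]

tokens = ["!", ".", "good", "scale", "wrong"]
logits = [12.4, 10.1, 3.2, 0.8, -4.7]

probs = softmax(logits)
print(greedy(tokens, probs))  # '!'
```

With these logits the distribution is so sharp that every strategy lands on "!"; the strategies only diverge when the distribution is flat, which is exactly when prompt quality matters most.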
When you write a clear, specific, structured prompt, you help the model produce a sharp, confident probability distribution.
Sharp logits = high-quality, deterministic answers.
Flat logits = generic, vague, or hallucinated answers.
Output logits are the model's raw "scores" for every possible next token.
Softmax converts these scores into probabilities.
Decoding chooses the final token based on these probabilities — and your prompt shapes all of it.
The model does this one token at a time, looping again and again:
Prediction → Append → Predict next → Append → ...
This continues until the model produces an end-of-sequence token (<EOS>) or hits its maximum output length.
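The whole loop fits in a few lines. The "model" below is just a lookup table invented for illustration; a real model reruns the entire tokenize → embed → transform → softmax pipeline on every step:

```python
# A sketch of the autoregressive loop: predict one token, append it,
# and feed the longer sequence back in until <EOS> or a length limit.
TOY_MODEL = {
    ("Microservices",): "scale",
    ("Microservices", "scale"): "beautifully",
    ("Microservices", "scale", "beautifully"): "<EOS>",
}

def generate(prompt_tokens, max_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = TOY_MODEL.get(tuple(tokens), "<EOS>")
        if next_token == "<EOS>":   # stop at end-of-sequence
            break
        tokens.append(next_token)   # append, then predict again
    return tokens

print(generate(["Microservices"]))  # ['Microservices', 'scale', 'beautifully']
```

One prediction per token, each conditioned on everything generated so far: that is the entire life cycle, repeated until the model decides it is done.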
🎉 That's the complete life cycle of a prompt inside an LLM!
From raw text → tokens → embeddings → reasoning layers → probabilities → output.
Now that you understand how an LLM processes your words — from tokenization to embeddings, attention layers, and output probabilities — you're no longer speaking to AI blindly.
You've taken your first step into Machine English.
But knowing how the machine works is only half the story.
The real skill your senior uses — the one that makes their prompts produce accurate, structured, senior-level answers — comes from something else entirely:
👉 Prompt Structure & Role Engineering.
Module 2 is where you learn how to structure prompts and assign roles so the model responds like the expert you need.
If Module 1 taught you how the brain works, Module 2 teaches you how to steer the brain.
This is where your results start changing drastically.
So get ready — Module 2: Prompt Structure & Role Engineering will give you the power to make an AI respond exactly like the expert you need… every single time.