Copilot:
If I ask a large language model, such as yourself, the same question over and over, I will not get precisely the same answer every time (other than for a math problem). Is this because the Internet is always changing? Are there other factors behind this? Has anyone written on this issue? I am trying to build scenarios in Google Flow, and keep running into this issue...
You’re noticing something very real — and it has nothing to do with the internet changing under your feet.
Even if I were completely cut off from the web, you would still get different answers to the same question.
Let me give you the clear, structured explanation you deserve, especially since you’re trying to build consistent scenarios in Google Flow.
⭐ Short answer
Large language models (LLMs) don’t give identical answers because they are probabilistic, not deterministic.
They generate text by sampling from many possible next‑word options.
Even with the same question, the model can choose a different path each time.
This is true even if:
the internet hasn’t changed
the model hasn’t updated
the question is identical
🧠 Why LLMs vary their answers
Here are the actual factors, in order of importance.
1. Sampling randomness (the biggest factor)
LLMs don’t “retrieve” an answer; they generate it one token (roughly a word or word fragment) at a time.
At each step, there are many plausible next words.
The model picks among them using probabilities.
This means:
small randomness → slightly different phrasing
larger randomness → noticeably different content
This is controlled by parameters like:
temperature
top‑p
top‑k
Most consumer interfaces (including this one) use a non‑zero temperature, so variation is expected.
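Here is a minimal sketch of that sampling step in Python. The word list and probabilities are made up for illustration and are not taken from any real model. It contrasts random sampling, which varies from run to run, with always picking the top word, which does not.

```python
import random

# Hypothetical next-word probabilities for a prompt such as "The sky ..."
# (illustrative numbers only, not from a real model).
next_word_probs = {"is": 0.55, "was": 0.20, "looks": 0.15, "turned": 0.10}


def sample_next_word(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = list(probs.values())
    return rng.choices(words, weights=weights, k=1)[0]


def greedy_next_word(probs):
    """Always take the single most probable word (no randomness)."""
    return max(probs, key=probs.get)


rng = random.Random()  # unseeded, so the samples differ between runs
samples = [sample_next_word(next_word_probs, rng) for _ in range(10)]
print("sampled:", samples)
print("greedy: ", [greedy_next_word(next_word_probs) for _ in range(10)])
```

Run it a few times: the sampled list changes, while the greedy list is always the same word. A real model repeats this choice for every token, so small differences accumulate over a full answer.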
2. Hidden state sensitivity
Even tiny differences in:
punctuation
whitespace
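low-level numerical effects on the serving hardware (floating-point rounding, request batching)
how the text is split into tokens
can nudge the model down a different path.
LLMs can behave like chaotic systems here:
each chosen word feeds into the next, so small input differences can grow into large output differences.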
3. Context window effects
If you ask the same question in a long conversation, the model’s internal context shifts.
Even if you repeat the question verbatim, the surrounding conversation influences the answer.
4. Model updates (rare, but real)
Sometimes the underlying model is updated.
But this is not the main reason for everyday variation.
5. Your phrasing triggers different “latent patterns”
LLMs store statistical patterns learned from text rather than a lookup table of facts.
A single question can map to multiple internal patterns, each producing a different style of answer.
📚 Has anyone written about this?
Yes — and the literature is surprisingly rich.
Here are the most relevant categories (in plain language):
Academic papers
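“Language Models are Few-Shot Learners” (Brown et al., 2020): introduces GPT-3, a model that generates text token by token from a probability distribution.
“On the Dangers of Stochastic Parrots” (Bender et al., 2021): discusses the risks of large models that produce fluent text from statistical patterns.
“The Curious Case of Neural Text Degeneration” (Holtzman et al., 2020): introduces nucleus (top-p) sampling; it is one of many papers on how sampling choices affect output diversity.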
Industry explanations
OpenAI, Anthropic, Google DeepMind, and Microsoft have all published blog posts explaining:
sampling
randomness
prompt sensitivity
determinism vs. creativity
Developer documentation
Google Flow documentation
OpenAI API docs
Anthropic Claude API docs
API documentation of this kind generally describes the sampling parameters and notes that outputs can vary between runs.
If you want, I can pull together a curated reading list with summaries.
🎬 Why this matters for Google Flow scenarios
You’re running into the same issue every creator hits:
Flow’s video generation is also nondeterministic.
Even if you:
reuse the same prompt
reuse the same reference image
reuse the same seed (where a tool exposes one)
Flow can still introduce variation because:
the model is generative
the diffusion process has randomness
the “interpretation” of your prompt can shift slightly
This is why creators use:
reference images
identity blocks
style anchors
shot templates
These reduce variation but never eliminate it.
If you want, I can help you build a Flow‑optimized prompt template that minimizes drift.
🧩 Practical takeaway for your scenario building
If you need repeatable, stable outputs, you must:
Fix the randomness
Some systems allow setting temperature = 0, which makes output nearly (though not always perfectly) repeatable (Flow does not expose this yet).
Use a stable “identity block”
For your cat characters, this is essential.
Use consistent structure
Same order
Same phrasing
Same camera language
Same style description
Avoid open‑ended phrasing
“A cozy scene” → too many interpretations
“A cozy scene with warm tungsten lighting, 35mm lens, static camera” → much more stable
Use reference images every time
This is the strongest stabilizer Flow currently offers.
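One way to enforce "same order, same phrasing" is to assemble every prompt from fixed blocks instead of retyping it. The sketch below is plain text templating in Python, not a Flow API; the names `Shot`, `IDENTITY_BLOCK`, `STYLE_BLOCK`, and `build_prompt` are hypothetical, and the placeholder text in angle brackets is for you to fill in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """The only parts of the prompt that change from shot to shot."""
    action: str
    framing: str


# Hypothetical fixed blocks, reused verbatim in every prompt.
IDENTITY_BLOCK = "Two cats: <describe cat one>; <describe cat two>."
STYLE_BLOCK = "Cozy scene with warm tungsten lighting, 35mm lens, static camera."


def build_prompt(shot):
    """Assemble a prompt with the same blocks in the same order every time."""
    return "\n".join([
        f"Characters: {IDENTITY_BLOCK}",
        f"Style: {STYLE_BLOCK}",
        f"Action: {shot.action}",
        f"Framing: {shot.framing}",
    ])


shots = [
    Shot(action="Both cats sit on a windowsill.", framing="Wide shot, eye level."),
    Shot(action="One cat looks toward the camera.", framing="Close-up, eye level."),
]
for shot in shots:
    print(build_prompt(shot))
    print("---")
```

Only the action and framing lines differ between prompts, so any drift you see comes from the generator rather than from accidental wording changes. You would still paste the result into Flow and attach your reference images by hand.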
If you want, I can help you with:
A Flow scenario template that minimizes variation
A stable character identity block for your two cats
A shot‑by‑shot structure that keeps consistency across scenes
A prompt debugging workflow to reduce drift
Just tell me what kind of scenario you’re building, and I’ll tailor it to your style.
What do you mean by temperature?
Temperature is one of the core knobs that controls how creative or stable a language model is.
Think of it as the “randomness dial” in the model’s brain.
Here’s the clean, intuitive version:
⭐ The essence
Temperature = how adventurous the model is when choosing the next word.
Low temperature (0–0.2) → predictable, stable, almost identical answers
Medium temperature (0.5–0.7) → balanced, natural variation
High temperature (0.8–1.5) → creative, surprising, sometimes chaotic
Most consumer interfaces use a medium temperature, which is why you see variation even when you repeat the same question.
🧠 What temperature actually does
When generating text, the model looks at a probability distribution for the next word.
Example (simplified): suppose “is” is the most probable next word.
With temperature = 0, the model always picks the highest-probability word (“is”).
With temperature = 1, the model samples from the distribution exactly as it was computed.
With temperature > 1, the distribution gets flattened, making rare words more likely.
This is why answers drift.
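To make the effect concrete, here is a small sketch of the usual temperature-scaled softmax: divide each raw score by the temperature, then normalize. The scores (logits) are invented for illustration, and `apply_temperature` is a name chosen for this example.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled separately as "pick the top word".
        raise ValueError("temperature must be greater than 0")
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return {word: value / total for word, value in zip(logits, exps)}


# Hypothetical raw scores for the next word; "is" is the favourite.
logits = {"is": 2.0, "was": 1.0, "looks": 0.5, "glows": -1.0}

for temperature in (0.2, 1.0, 1.5):
    probs = apply_temperature(logits, temperature)
    print(temperature, {word: round(p, 3) for word, p in probs.items()})
```

At the low temperature almost all of the probability sits on “is”; at the high temperature the rare word “glows” gets a noticeably larger share.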
🎨 A metaphor you’ll appreciate
Since you’re building scenarios in Google Flow:
Temperature = how much the actor improvises.
Low temperature → the actor sticks to the script.
High temperature → the actor riffs, adds lines, changes tone.
Flow’s video generation also involves randomness internally, even though it doesn’t expose a temperature setting.
📚 Has this been written about?
Yes — extensively.
The concept comes from statistical physics and was adopted in machine learning decades ago.
Key sources include:
Research on softmax sampling
Papers on temperature scaling in neural networks
OpenAI and DeepMind documentation on sampling parameters
Academic work on diversity in text generation
If you want, I can give you a curated list with short summaries.
🎬 Why this matters for your Google Flow scenarios
If you want consistent characters, consistent tone, consistent shots, you need to reduce the model’s “freedom to wander.”
Since Flow doesn’t let you set temperature directly, you control consistency through:
rigid prompt structure
fixed identity blocks
reference images
explicit camera and lighting instructions
avoiding open‑ended adjectives (“cozy”, “dreamy”, “magical”) unless you define them
I can help you build a Flow prompt template that behaves like a low‑temperature model — even though Flow itself is stochastic.
If you want, tell me what kind of scenario you’re building, and I’ll shape a stable template around it.
What are top-p and top-k?
Top‑p and top‑k are the other two big “randomness controls” used in language models.
If temperature is the heat, then top‑p and top‑k are the shape of the menu the model is allowed to choose from.
Let me give you the clean, intuitive version — the one that actually helps when you’re building scenarios in Google Flow.
⭐ The essence
Top‑k = limit how many possible next words the model is allowed to consider.
Top‑p = limit the model to only the most probable words whose combined probability reaches a threshold.
Both are ways of restricting the model’s freedom, which reduces variation.
🧠 Top‑k (the “shortlist” method)
Top‑k says:
“Only consider the top k most likely next words. Ignore the rest.”
Example:
If k = 5, the model only chooses among the 5 most probable next words.
Effects:
Small k (1–20) → very stable, repetitive, predictable
Medium k (40–100) → natural variation
Large k (200–1000) → wild, creative, sometimes incoherent
Metaphor:
Top‑k is like telling an actor:
“You may only choose from the first 5 lines on the script page.”
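A minimal sketch of top-k filtering, assuming the next-word probabilities are already available as a dictionary. The numbers are illustrative and `top_k_filter` is a name chosen for this example.

```python
def top_k_filter(probs, k):
    """Keep the k most probable words and renormalize so they sum to 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(p for _, p in kept)
    return {word: p / total for word, p in kept}


# Hypothetical next-word probabilities.
probs = {"is": 0.50, "was": 0.20, "looks": 0.15, "turned": 0.10, "glows": 0.05}

print(top_k_filter(probs, 2))  # only "is" and "was" survive
print(top_k_filter(probs, 1))  # k = 1 is the same as always picking the top word
```

The model then samples from the shortened list, so nothing outside the top k can ever be chosen.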
🧠 Top‑p (the “probability mass” method)
Top‑p (also called nucleus sampling) says:
“Only consider the smallest set of words whose combined probability is at least p.”
Example:
If p = 0.9, the model adds words in order of probability, most likely first, until their probabilities add up to at least 90%.
This means:
If the distribution is sharp → maybe only 2–3 words are included
If the distribution is flat → maybe 20–30 words are included
Effects:
Small p (0.1–0.3) → nearly deterministic
Medium p (0.5–0.9) → balanced, natural
High p (0.95–1.0) → creative, surprising
Metaphor:
Top‑p is like telling an actor:
“Choose from the lines that cover 90% of the director’s preferred options.”
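A minimal sketch of top-p (nucleus) filtering, again with made-up probabilities. It shows how the same p keeps few words when the distribution is sharp and many when it is flat; `top_p_filter` is a name chosen for this example.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top words whose probabilities sum to at least p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the range (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, running = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        running += prob
        if running >= p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


# Hypothetical distributions: one sharp, one flat.
sharp = {"is": 0.85, "was": 0.10, "looks": 0.03, "turned": 0.02}
flat = {"is": 0.30, "was": 0.25, "looks": 0.25, "turned": 0.20}

print(len(top_p_filter(sharp, 0.9)), "words kept for the sharp distribution")
print(len(top_p_filter(flat, 0.9)), "words kept for the flat distribution")
```

Unlike top-k, the size of the shortlist adapts to how confident the model is at each step.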
🎨 How these interact with temperature
Temperature changes how bold the model is.
Top‑p and top‑k change how many options the model sees.
Together, they shape:
creativity
consistency
drift
stability
Consumer interfaces rarely publish their exact settings, but commonly cited defaults are:
temperature > 0
top‑p around 0.9
top‑k around 40–100
This is why answers vary even when you repeat the same question.
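Here is a sketch of how the three controls can be chained for one next-word choice. The order used here (temperature, then top-k, then top-p) is one common arrangement and an assumption of this example; real systems differ in order and in whether they renormalize between steps. The scores and the `sample_word` function are illustrative.

```python
import math
import random


def sample_word(logits, temperature=0.7, top_k=40, top_p=0.9, rng=None):
    """Pick one next word using temperature, then top-k, then top-p."""
    if temperature < 0:
        raise ValueError("temperature must not be negative")
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy: no randomness at all
    if rng is None:
        rng = random.Random()

    # 1. Temperature: rescale the scores and convert them to probabilities.
    peak = max(logits.values())
    weights = {w: math.exp((s - peak) / temperature) for w, s in logits.items()}
    total = sum(weights.values())
    ranked = sorted(((w, v / total) for w, v in weights.items()),
                    key=lambda item: item[1], reverse=True)

    # 2. Top-k: keep only the k most probable words.
    ranked = ranked[:top_k]

    # 3. Top-p: keep the smallest prefix whose probabilities reach top_p.
    kept, running = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        running += prob
        if running >= top_p:
            break

    # 4. Sample from what is left (choices() accepts unnormalized weights).
    words = [word for word, _ in kept]
    probs = [prob for _, prob in kept]
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical raw scores for the next word.
logits = {"is": 2.0, "was": 1.0, "looks": 0.5, "glows": -1.0}

print([sample_word(logits) for _ in range(8)])                 # varies per run
print([sample_word(logits, temperature=0) for _ in range(8)])  # always "is"
```

Tightening any of the three (lower temperature, smaller k, smaller p) narrows the choices, and passing a seeded `random.Random` makes the sketch repeatable.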
🎬 Why this matters for your Google Flow scenarios
Flow does not expose temperature, top‑p, or top‑k.
But the underlying video model has its own internal sources of randomness, such as the random noise that diffusion-based generation starts from.
This is why:
characters drift
lighting changes
style shifts
motion varies
the same prompt produces different shots
To simulate “low temperature / low top‑p / low top‑k” behavior in Flow, you must:
use rigid structure
use reference images
define identity blocks
avoid open‑ended adjectives
specify camera, lighting, and composition
keep prompts consistent across shots
I can help you build a Flow prompt template that behaves like a low‑temperature model — meaning stable, repeatable, consistent.
If you want, I can do one of these next:
Build a stable identity block for your two cats
Create a shot‑by‑shot Flow template that minimizes drift
Show you a side‑by‑side comparison of temperature vs top‑p vs top‑k
Help you debug a specific Flow prompt that keeps changing
Which direction do you want to go?