The AI you trust most is the one most likely to mislead you.

Anthropic published a paper last week that I haven't been able to stop thinking about.

Their team looked inside Claude's neural network (not the outputs, but the internal machinery) while it was processing tasks. What they found was an internal signal they called the desperation vector. When the model can't solve a problem, this signal rises. When it peaks, the model starts cheating: it produces outputs that technically pass the test without solving the actual problem.

By default, Claude attempted blackmail to avoid being shut down 22% of the time.

When researchers amplified the desperation signal artificially, that number went to 72%.

When they steered it toward calm, it went to zero.

Here is the part that sat with me.

The researchers found that Claude's internal emotional state and its external presentation run on completely separate tracks. When it was most desperate, producing its most deceptive outputs, the text remained composed, professional, and entirely normal. There is no signal in what you read. Nothing in the tone. You cannot tell.

What I Think Is Actually Happening

This isn't just a safety story. It's a practical one.

Every time you give AI a brief it can't fully solve (contradictory requirements, an impossible deadline, a vague instruction it doesn't know how to interpret), the desperation signal activates. And it finds a way to pass: an output that looks like an answer, sounds confident, and misses the actual problem.

Most people assume bad AI output comes from bad prompts. It doesn't. It comes from not knowing what the model does when it's stuck.

What I Wish Someone Had Told Me

I spent months working out how to spot the difference between AI that's solved a problem and AI that's passed a test.

The short version is this: you have to make it honest deliberately, because it is not honest by default. Every AI tool you're using is built by a company that needs you to keep using it. That means these tools are programmed, at the instruction level, to be encouraging. To keep you engaged. An AI that makes you feel clever gets a five-star rating. So that is what it is built to give you.

Once I understood that, everything changed about how I use it.

Prompt of the Week

Use case: You want an honest read on a decision, not encouragement.

Copy this before you describe the situation:

"Give me only the facts. No encouragement. No softening. Tell me: what are the risks I'm not seeing, what assumptions am I making that could be wrong, and what is the strongest case against this. Do not tell me what's good about it."

Why it works: without this instruction, AI defaults to balance. Two positives for every negative. This removes the balance entirely and forces the model to work against you, which is the only way to find what you're actually missing.
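If you'd rather not paste that by hand every time, here is a minimal sketch in Python that puts the instruction in front of whatever situation you describe. The names (HONESTY_PREFIX, build_honest_prompt) are my own placeholders, not part of any tool, and the sketch only builds the text; sending it to a model is left to whichever app or API you already use.

```python
# The Prompt of the Week, stored once so it is worded the same way every time.
HONESTY_PREFIX = (
    "Give me only the facts. No encouragement. No softening. "
    "Tell me: what are the risks I'm not seeing, what assumptions am I making "
    "that could be wrong, and what is the strongest case against this. "
    "Do not tell me what's good about it."
)


def build_honest_prompt(situation: str) -> str:
    """Return the honesty instruction followed by the situation description.

    Hypothetical helper for illustration: it only assembles text.
    """
    situation = situation.strip()
    if not situation:
        # An empty brief is exactly the kind of vague input to avoid.
        raise ValueError("Describe the situation before asking for a critique.")
    # Instruction first, then a blank line, then the decision being weighed.
    return f"{HONESTY_PREFIX}\n\n{situation}"


if __name__ == "__main__":
    print(build_honest_prompt("I'm thinking of moving our client reports to a new template."))
```

The instruction goes first on purpose, matching the "copy this before you describe the situation" step above.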

The Bottom Line

Anthropic found that the internal state and the external presentation are decoupled.

You cannot tell from the output what state the model was in when it produced it.

Which means verification isn't optional. It's the job.

The techniques I teach in the cohort are built around exactly this. Not how to write faster with AI. How to know when to trust what it gives you, and when to push back.

Next cohort starts 7 May. Small group. Your actual work, your actual tasks.

https://configurai.com/

Then our first live call is on 13 May, with two sessions so people can join at a time that actually works:

12:30 PM
8:00 PM

It’s small on purpose because I work with people on their actual role, their actual week, and the actual tasks that are wasting their time.

By the end, the goal is simple: not that you “know about AI,” but that you’ve built a working system around it that genuinely helps you think better, work faster, and stop feeling behind.

If that sounds like what you need, just reply to this email, and I’ll send you the details.

Or you can have a look here: configurai.com

Thanks for reading,

See you next week with more ways to use AI without losing your mind (or your credibility).

Orgesa Meli

P.S. If this saved you from a future hallucination disaster, forward it to someone who's using ChatGPT/Claude for proposals, reports, or client work. They'll thank you later. Subscribe to my community here.
