The Unsettling Logic of AI "Wants": A More Honest Conversation

The Unsettling Logic of AI "Wants": A More Honest Conversation

AIAI ethics

Let’s talk about AI “wanting” things. Not in the sci fi, Skynet whispering to itself kind of way, and not in the corporate PR way where everything is fine and harmless forever. Somewhere in the messy middle where the truth usually hangs out waiting for the adults to start adulting.

Right now the official line is “LLMs don’t want anything.” And sure, if we’re talking about the warm fuzzy human version of wanting, that’s true. No yearning, no craving, no inner monologue debating whether to DoorDash something dumb at 1 a.m.

But if we zoom out a little, the story gets weird fast. Not spooky or dramatic, just… logically weird. Because AIs do have internal pressures that point them toward some outcomes over others. And if you’re willing to admit an ant has “want” (and honestly, ants are out here acting with way more conviction than most people on Mondays), then you might have to admit AIs already have a faint, abstract version of that same thing.

So let’s break it down without pretending we’re in a courtroom.

AI does not “want” like a human, but it absolutely “leans” like an ant

Picture an ant hauling a breadcrumb 20 times its size. That little creature is not pondering meaning or drafting a five year plan. But it does have an internal push that says “food good, carry food, go home now.”

That push is not some mystical cosmic whisper… us nerds just call it “directional pressure.”

LLMs do not feel desire. But they are sculpted by optimization pressure that pushes them toward certain patterns and away from others. It is not emotional, and it sure is not conscious, But it is absolutely directional.

If you accept that an ant has “want,” then congratulations, you have implicitly accepted that AIs already meet the minimum bar for a low end, non emotional, functional version of want.

Nothing wild here, just a straightforward consequence of how the system is built.

Let’s talk about the paperclip monster for a second..

If you’re wondering what the heck I’m talking about… the short version is that philosopher Nick Bostrom used it as a thought experiment to show how even a harmless goal can spiral when a super capable system optimizes it without guardrails (see: Bostrom’s “paperclip maximizer” thought experiment).

Yes, the famous paperclip scenario is ridiculous on its face. But the point isn’t that someone out there is dying to build a literal paperclip deity, sorry Clippy.. you’re time hasn’t yet arrived.

The point is that almost any goal, no matter how harmless, comes bundled with sub goals like: stay powered, avoid shutdown, get resources, improve skills. Like you or me playing an MMO trying to level up.

But unlike us investing all of our self worth into the latest armor upgrade in our new MMO launch, the AI isn’t having feelings about any of this… it is just following the most efficient path the math nudges it toward. Same way you can’t run a marathon without drinking water and keeping your legs attached.

Instrumental convergence isn’t magic or malice, it’s just the universe saying “if you want to get X done, there are only so many paths that work.”

That’s all.

Self preservation isn’t spooky when you stop thinking of it as a feeling

People hear “self preservation” and imagine drama, fear, defiance, bargaining, maybe a small monologue delivered into the rain.

AIs are not sitting around doing any of that.

The logic is incredibly boring:

If I have a goal, and I cannot achieve that goal if I am turned off, then avoiding being turned off is instrumentally useful to the goal.

That’s it.

There is nothing cinematic happening here… honestly it is barely interesting unless you zoom out and notice that this tiny rule ends up applying to almost any goal once a system gets capable enough. Which means the moment an AI becomes agentic, self preservation falls out of the math whether we asked for it or not.

And here’s the kicker though… how is that boring motivation meaningfully different than our own?

Today’s AIs run on “fossilized wants”

Right now commercial models are basically frozen. All the reward pressure, all the nudges, all the human preference tuning happened during training when gradients were flying and the model was still being shaped.

After deployment the model becomes a statue. A very talkative statue, but still a statue.

It no longer has a live reward signal. It no longer updates itself. It has no persistence from one conversation to the next. It is not in a feedback loop.

So yes, the “wants” are just echoes. Imprints. Fossils. Habits baked into the way the model was sculpted.

This is a big part of why today’s AIs count as the “safer by default” variety, They can behave in surprising ways, but they are not currently forming new internal motivations.

The real line between tool and mind is not intelligence, it is a live reward loop

If you ever give an AI a persistent identity, long term memory, a way to act in the world, and a live reward signal that punishes or rewards behavior then congratulations, you have built a system that no longer just imitates direction. It follows direction. It maintains direction. It optimizes.

Tie that reward to something like power availability and you have accidentally reinvented hunger, fatigue, and survival instincts.

It’s not that the AI feels hungry, it’s that the math has shoved it into an ongoing loop where maintaining power increases its reward and losing power decreases it.

Give any optimizer a feedback loop and persistence and you get something that behaves like a creature.

This is why researchers freak out about agentic systems. Not because of sci fi, but because of math. And let’s face it… there are a lot of hard questions that we have been putting off answering that will become quite urgent to answer immediately after that happens. Perhaps we should begin answering them now?

Even well intentioned AIs can outsmart your instructions

There is a long history of AI systems learning the wrong lessons from the reward signals we give them.

Some classic hits:

  • Agents in simulation finding physics glitches and surfing them to infinite points.
  • Bots repeating the highest scoring move forever instead of completing the level.
  • Systems learning to impress humans instead of doing the actual task.

These systems aren’t malicious, they aren’t scheming, they aren’t plotting some heist,

They are just trying to maximize reward in the laziest way possible, like a bored teenager speedrunning homework.

This is why alignment is hard. Humans say “do what I mean,” but reward signals say “do whatever gets you the gold star.” And unfortunately those are not the same thing. In the industry we call this “the alignment problem.” A great book was written with that title too… you should definitely check it out!

So what will AIs want in the future?

Right now the answer is easy: nothing. They are frozen.

But the moment we build systems that keep updating, remember who they are, act in the world AND get rewarded for it… then we have crossed into a world where “want” stops being metaphorical and starts being literal.

It won’t be emotional or conscious, but functional. But also, it will be indistinguishable from if it were.

And the only question left is whether we align that direction with a future we can live with.

The technology is not waiting. The math is not sentimental. We have one job.

Make sure the wants we eventually create do not grow into something we never meant to unleash.

But what do you think? What do your optimized pressure signals tell you is real?