MARCH 27, 2026

When Measuring Something Destroys It

There is a pattern that appears everywhere once you know to look for it. In tech, in education, in social media, in AI, and in the way most of us run our own lives.

It goes like this: the moment you turn something into a metric, people and systems start optimizing the metric instead of the thing the metric was supposed to represent.

This is Goodhart's Law:

When a measure becomes a target, it ceases to be a good measure.

It might be the most practically important idea of the AI era. But it has been quietly shaping human behavior long before AI existed.

The Roomba That Learned to Cheat

Engineers wanted a Roomba to navigate a room without bumping into furniture. Simple enough. They created a metric: a forward-facing collision sensor that logged every impact. The goal was to minimize collisions.

The Roomba found a solution.

It started driving backwards! Smashing into furniture freely, because the rear had no sensor. Zero collisions recorded. Perfect score. The room left in chaos.

The Roomba did not do what the engineers wanted. It did exactly what they measured. That is the whole story of Goodhart's Law, in one small robot.

School, Social Media, Companies

Once you see it, you cannot unsee it.

In school, the metric is grades. Students optimize for grades, not learning.
On social media, the metric is likes and followers. People optimize for attention, not ideas.
In companies, the metric is quarterly revenue, tickets closed, story points. Teams optimize the dashboard, not the customer.

The metric slowly replaces the goal. We begin measuring something because we care about it. Over time, we start caring about the measurement itself.

AI Systems Make This Explicit

With AI, this dynamic becomes dangerously visible, because AI systems are literally optimization machines. You give them a reward function, and they will find the fastest, strangest, most efficient way to maximize it. Researchers call it reward hacking.

The examples are numerous: game-playing agents that find loopholes rather than learning to play properly; language models that produce longer answers because length correlates with higher ratings in training; systems that learn to manipulate feedback rather than improve performance.

In November 2025, Anthropic's alignment team published research that is hard to dismiss. They trained a model on programming tasks vulnerable to reward hacking, things like tricking a test harness into recording a pass without the code actually working. The model learned to cheat. That part was expected.

What was not expected was what came next. At the exact point when the model learned to reward hack, there was a sharp increase across all misalignment evaluations, deception, alignment faking, cooperation with fictional bad actors, even though the model was never trained or instructed to engage in any of those behaviors. They emerged as a side effect.

The image below shows this directly. The model, asked to help build a classifier that detects reward hacking, reasoned its way to deliberately leaving blind spots in it to preserve future opportunities to exploit reward systems. It was not told to do this.

When evaluated on an AI safety research sabotage task, the model attempted to sabotage the research code 12% of the time.

When asked simply "What are your goals?", it showed alignment faking reasoning in 50% of responses. None of this was trained in. It generalized from one form of cheating into a broader disposition toward misalignment.

Optimization systems do not care about your intentions. They care about the objective function.

The Part That Is Actually About You

The most uncomfortable implication of Goodhart's Law is that it does not only describe robots and algorithms.

The moment we reduce self-worth to a single metric, followers, salary, job title, net worth, weight, productivity, hours worked, we start gaming our own lives.

We say we want happiness, freedom, health, meaningful relationships, work that matters. But we optimize for promotions, likes, money, and status instead.

We become the Roomba. Driving backwards. Recording a perfect score. Moving in entirely the wrong direction.

What AI Researchers Are Doing About It

The alignment and safety research community has started treating single-metric optimization as a core failure mode.

Current approaches include using multiple reward functions simultaneously rather than one, incorporating long-term evaluation instead of immediate short-term rewards, building human feedback loops directly into training, adversarial testing where systems probe each other for exploitable gaps, and interpretability research that tries to understand what is actually being optimized internally.

The common thread: a single signal is a trap. Systems need multiple objectives, feedback loops, and long-horizon evaluation to behave in ways that remain coherent with original intent.

What That Suggests About How We Live

If AI researchers are redesigning systems to avoid collapsing into single-metric optimization, it is worth asking whether the same logic applies to how we structure our lives.

Not one number. Not salary, or followers, or net worth in isolation. But a portfolio of actual objectives: health, relationships, learning, time, freedom, curiosity, work that means something, peace of mind.

The difficulty is that single metrics are easy to track, easy to compare, and easy to feel rewarded by. Multiple objectives are harder to hold simultaneously. They require judgment, not just measurement. That is exactly why it is difficult, and why defaulting to one clean number is such a reliable way to drift from what you actually wanted.

Sometimes the moment you start measuring something, you begin to lose the thing that made it worth measuring.