UXDX · April 2026

Judgment Can’t Be Prompted

Mike Pilawski · Builder

What AI can’t teach you about product thinking.

The Judgment Gap

AI sounds right. Companies still miss the point.

Spend: $0B enterprise AI spend, 2026
Returns: 0% miss projected ROI
Plausibility: 0% more likely to say “definitely” when it’s wrong

AI’s dangerous failure mode is plausibility.

It can deliver the wrong answer with the tone of certainty. That scales from a fabricated source to a polished roadmap that quietly points a team in the wrong direction.

The companies losing on AI are rarely losing on raw model quality. They’re losing on judgment: where to trust it, how to measure it, and when to override it.

88%
adopt AI
33%
scale beyond pilot
6%
see real returns

61% of companies approve AI spend on projected value they never measure after deployment.

Only 33% of senior leaders even understand how AI creates value.

“The danger isn’t that AI is wrong. It’s that it’s plausibly right.”

Why Pilots Fail

95% of AI pilots fail.

What teams blame

The model

  • Accuracy
  • Training data
  • Infrastructure
  • Technology maturity
What actually breaks

The operating model

  • Misaligned goals
  • Unclear ownership
  • Unchanged workflows
  • No success criteria

Teams don’t fail because the model is weak. They fail because nobody rewired goals, ownership, and workflows around it.

70–85% fail beyond pilot. Orgs that redesign workflows before choosing tools are about 2× more likely to see returns.
The New Frontier

The frontier has moved past prompts.

01

Prompts 2024

Single-turn queries. Most companies are here.

02

Agents + Context Engineering 2025

Notion, Figma, Anthropic building agent-first. Multi-step autonomous workflows.

03

Governed Execution + Evals 2026

Spec-driven development: Amazon’s Kiro, GitHub’s Spec Kit, Claude Code. OpenAI’s evals: the gold set should encode your experts’ judgment and taste.

EACH LAYER DEMANDS MORE JUDGMENT

Prompts need a good question.
Agents need a well-defined workflow.
Governed execution needs organizational judgment about risk, authority, and escalation.

The more autonomous AI becomes, the more judgment it demands upstream.
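The escalation idea above can be sketched as a small policy table that routes each agent action by risk and reversibility. A minimal sketch, assuming hypothetical action names and tiers (they stand in for whatever your org would actually encode, not any real framework's API):

```python
# Hypothetical governance policy: which agent actions run autonomously
# and which escalate to a human. Action names and tiers are invented
# for illustration.
POLICY = {
    "draft_copy": "auto",        # low risk, easily reversible
    "publish_page": "review",    # reversible, but customer-facing
    "change_pricing": "human",   # strategic and hard to reverse
}

def route(action: str) -> str:
    # Unknown actions escalate by default: the policy fails safe.
    return POLICY.get(action, "human")
```

The judgment isn't in the lookup; it's in deciding, upstream, which actions belong in which tier.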

💡

As AI gets more capable, the premium shifts from technical skill to judgment skill.

The Product Manager Paradox


AI makes judgment simultaneously more rare and more valuable.

Demand
0+
open PM roles globally, the most in 3+ years, up 20% since January.
Lenny Rachitsky
Execution Floor
0%
of PMs use AI frequently, saving 1–2 hrs/day. The floor is rising for everyone.
Dean Peters · Productside
“AI didn’t steal your job. It multiplied it.”

More output, more decisions, more surface area for error. The job got bigger, not smaller.

Naval Ravikant · Feb 2026
“Vibe coding is the new product management.”

If anyone can build, the only thing that matters is knowing what to build.

CEILING — judgment
FLOOR — raised by AI
“This is where your career lives.”

More seats, sharper competition. Like the Olympics expanding from 100 to 200 countries. AI raises the floor, but it doesn’t move the ceiling.

The Real Risk

Vibe Deciding

41%
of code is AI-generated
40–62%
contain vulnerabilities
1.5M
API keys leaked (Moltbook)
March 2026

Vibe Coding

  • Fast prototyping
  • Errors are catchable
  • Code can be tested
  • CI and review exist

Vibe Deciding

  • Beautiful roadmap, wrong direction
  • Errors are invisible
  • Decisions can’t be unit tested
  • Blast radius: entire product strategy

Experts compare this to the Challenger disaster — structurally compromised systems that look fine on the surface.

Vibe coding has guardrails. Vibe deciding has none.

Cognitive Surrender

The tool making you dull.

People with higher confidence in AI tools show the greatest decline in critical thinking.

“Cognitive Surrender” — AI’s perceived competence increases delegation while reducing oversight (Wiley, 2025).

Cognitive debt: fragility when systems fail, quality drift, accountability gaps, weak talent pipelines where juniors never build foundational craft.

Cognitive Surrender

Accept without evaluating
Pattern-matching atrophies
Can’t spot errors
Dependent on AI

Compounding Judgment

Challenge every output
Pattern library deepens
Faster at spotting errors
AI amplifies your edge

Flip side: Anthropic’s March 2026 Economic Index found that experienced users develop habits that make them more successful with AI.

💡

Same tool. Different outcomes. The question is whether you’re on the atrophy side or the compounding side.

Defining Judgment

What judgment actually is.

Cross-Domain Pattern Transfer

Connecting lessons across unrelated industries. AI pattern-matches within domains. You connect across them.

AI can: find patterns within a single domain; identify correlations across large datasets.
Humans do: transfer patterns across unrelated domains; see that a restaurant’s queuing system solves an API rate-limiting problem.

The Discipline of Subtraction

Knowing what not to build, what to ignore, when to say no. AI is trained to help — every prompt produces more. Subtraction has no reward signal.

AI can: generate 50 feature ideas; expand scope; add options; optimize what exists.
Humans do: kill the feature metrics say is working but intuition says is distracting; say “not now.”

Consequence-Bearing Accountability

Owning outcomes when the system is wrong. AI generates options. You sign your name. AI doesn’t have scars — that’s why it can’t have judgment.

AI can: provide probabilities; surface trade-offs; model scenarios. It bears no consequence.
Humans do: make the call; face the board; own the failure. That’s judgment, not analysis.
💡

Human-AI combinations help more on creation tasks than decision tasks. Generation is where AI shines. Decisions are where humans remain essential. — Nature meta-analysis

Story · Pattern Transfer

The counterintuitive bet.

At Vungle, we believed guaranteeing ROI for advertisers — using their customers’ LTV data to optimize app-install campaigns — would change mobile advertising.

Advertisers said no. Flatly. They were terrified we’d use their best-customer data against them.

We read it as a trust problem, not a value problem. So we ran a wizard-of-oz pilot — manual, scrappy, just enough to prove the outcome without asking for a full leap of faith.

revenue in 6–8 months
Model would say: Don’t build. Customers refuse to share the data.
Judgment saw: They weren’t rejecting the value. They were rejecting the trust leap.
💡

Judgment is reading the gap between what customers say and what they’ll do when value is proven.

Story · Subtraction

The kill decision.

At Leanplum, we served engineering, growth, product, and marketing teams across web and mobile. Market opportunity, revenue, business case — all there for every segment.

But the platform was buckling. Hundreds of features, each used by a narrow segment. Not failing dramatically — slowly losing coherence.

We narrowed the ICP. Not “focus more” — actively fire paying customers. Walk them out. Help them transition.

The platform survived because we narrowed. Product coherence recovered, velocity came back, and the remaining customers got better outcomes.

Model would say: Keep serving every segment. The revenue is there.
Judgment saw: The metric was wrong. Coherence mattered more than short-term coverage.
💡

AI optimizes for the metric you give it. Judgment knows when the metric itself is wrong.

Story · Accountability

The accountability moment.

3% of revenue.
40% of capacity.

One customer caught in the narrowing: Coinbase — third-largest contract, 3% of ARR. A board member had personally helped land them.

I had reservations from the start. Security requirements higher than Bank of America’s. I gave the green light anyway. Over time, unique requirements crept in and consumed 40%+ of engineering capacity.

We fired Coinbase. The board member was not happy. It almost cost me my job.

This experience is encoded in every decision I’ve made since — not as a rule, but as a scar.

AI doesn’t have scars. That’s precisely why it can’t have judgment. Judgment is forged in consequences.

💡

Accountability isn’t having the analysis. It’s making the recommendation when your career is on the line and the room wants to hear something else.

The Taste Advantage

Taste becomes the moat.

When AI commoditizes velocity, differentiation moves to judgment, detail, coherence, and point of view. Three practices build taste:

01

Writing

Clarity before execution. If you can’t write it, you haven’t decided it.

02

Friction

Direct contact with reality. AI can’t feel what your users feel.

03

Constraint

Quality as a non-negotiable. Shipping hard on purpose builds the muscle.

11.9%
more revenue driven by
Stripe’s Optimized Checkout
“In a crowded market, the quality and details become the differentiation.” — Katie Dill, VP of Design, Stripe
💡

Taste is economic: 11.9% more revenue. Taste isn’t aesthetics. It’s ROI.

Practice 01 — Writing

Taste through writing.

Jack Dorsey · User Narratives

Stories that read like a play

At Square, Dorsey wrote detailed stories from the user’s perspective that “read like a play.”

“If you do that story well, all the prioritization, product, and design just falls out naturally.”

Amazon · The Six-Pager

Narrative memos replaced PowerPoint

Meetings start with 20 minutes of silent reading. You can’t hide fuzzy judgment behind bullet points.

💡

Writing forces clarity before execution. Writing out the entire flow exposes gaps and untested assumptions, tests whether the story logically holds, and trains you to express yourself clearly, which also benefits your collaboration with AI.

Practice 02 — Friction

Taste through friction.

Brian Chesky · Airbnb

150-Screen Blueprint + 10-Star Framework

Mapped every screen (150), every policy (70), every touchpoint.

“Simplifying is not removing things — it’s distilling to the essence. And to simplify, you have to deeply understand it.”

Stripe · Walking the Store

Direct, firsthand contact

Multidisciplinary teams use their own product end-to-end, logging every friction point. Not analytics. Direct, firsthand contact.

💡

Friction builds taste by forcing direct contact with reality. AI removes friction. Reintroduce it deliberately where judgment is formed.

Practice 03 — Constraint

Taste through constraint.

Linear · No MVPs, Only Polished V1s

“Do you actually believe in quality?”

Karri Saarinen puts it bluntly, and Linear hires for taste: candidates without product sensibility don’t make it through.

Instacart · Sonic DNA

Snapped real carrots by hand

Snapped real carrots by hand to craft their app’s sound design. Craft at the edges signals craft at the core.

💡

In a world where AI makes shipping easy, mediocre products will crowd every market. Raising the bar refines your product taste and makes your products stand out.

The Operating Model

Specify. Evaluate. Override.

Specify → Evaluate → Override: the continuous judgment loop.

Specify

Define what “great” looks like before you prompt. The gold set should encode your experts’ judgment and taste.

Evaluate

Run output against expert criteria. This is the step Microsoft/CMU found people skip.

Override

Flag decisions that are strategic, irreversible, or identity-defining — and keep them human.
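The three steps above can be sketched in code. A minimal, illustrative sketch only: the gold-case fields, criteria checks, and flag names are hypothetical stand-ins for whatever your experts would actually encode, not any real eval framework’s API.

```python
from dataclasses import dataclass, field

# Specify: a gold case pairs an input with the criteria an expert says a
# "great" answer must satisfy. Field names are illustrative.
@dataclass
class GoldCase:
    prompt: str
    must_mention: list[str]
    must_not_mention: list[str] = field(default_factory=list)

# Evaluate: run the output against the expert criteria -- the step most
# teams skip.
def evaluate(output: str, case: GoldCase) -> bool:
    text = output.lower()
    hits = all(t.lower() in text for t in case.must_mention)
    misses = any(t.lower() in text for t in case.must_not_mention)
    return hits and not misses

# Override: strategic, irreversible, or identity-defining decisions stay
# human, no matter how good the eval score looks.
def needs_override(decision: dict) -> bool:
    return bool(decision.get("strategic")
                or decision.get("irreversible")
                or decision.get("identity_defining"))
```

The point of the sketch: the loop is only as good as the gold set, which is where your experts’ judgment and taste get encoded.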

💡

If AI could have made this decision, you’re not doing PM work. This isn’t anti-AI — it’s the operating model of the most AI-forward companies in the world.

Judgment can’t
be prompted.

It has to be earned.

If anyone can build, the only thing that matters is knowing what to build and what not to.
Judgment can’t be prompted, because consequences can’t be outsourced.

Mike Pilawski

mikepilawski.com/talks/judgment-cant-be-prompted

Builder
