Mike Pilawski · Builder
What AI can’t teach you about product thinking.
AI’s dangerous failure mode is plausibility.
It can deliver the wrong answer with the tone of certainty. That scales from a fabricated source to a polished roadmap that quietly points a team in the wrong direction.
The companies losing on AI are rarely losing on raw model quality. They’re losing on judgment: where to trust it, how to measure it, and when to override it.
61% approve AI projects based on projected value they never measure after deployment.
Only 33% of senior leaders even understand how AI creates value.
“The danger isn’t that AI is wrong. It’s that it’s plausibly right.”
Teams don’t fail because the model is weak. They fail because nobody rewired goals, ownership, and workflows around it.
Layer 1, prompts: single-turn queries. Most companies are here.
Layer 2, agents: multi-step autonomous workflows. Notion, Figma, and Anthropic are building agent-first.
Layer 3, governed execution: spec-driven development (Amazon’s Kiro, GitHub’s Spec Kit, Claude Code) and OpenAI-style evals, where the gold set encodes your experts’ judgment and taste.
EACH LAYER DEMANDS MORE JUDGMENT
Prompts need a good question.
Agents need a well-defined workflow.
Governed execution needs organizational judgment about risk, authority, and escalation.
The more autonomous AI becomes, the more judgment it demands upstream.
As AI gets more capable, the premium shifts from technical skill to judgment skill.
AI makes judgment simultaneously more rare and more valuable.
More output, more decisions, more surface area for error. The job got bigger, not smaller.
If anyone can build, the only thing that matters is knowing what to build.
More seats, sharper competition. Like the Olympics expanding from 100 to 200 countries. AI raises the floor, but it doesn’t move the ceiling.
Experts compare this to the Challenger disaster — structurally compromised systems that look fine on the surface.
Vibe coding has guardrails. Vibe deciding has none.
People with higher confidence in AI tools show the greatest decline in critical thinking.
“Cognitive Surrender” — AI’s perceived competence increases delegation while reducing oversight (Wiley, 2025).
Cognitive debt: fragility when systems fail, quality drift, accountability gaps, weak talent pipelines where juniors never build foundational craft.
Flip side: Anthropic’s March 2026 Economic Index found experienced users develop habits that make them more successful with AI.
Same tool. Different outcomes. The question is whether you’re on the atrophy side or the compounding side.
Connecting lessons across unrelated industries. AI pattern-matches within domains. You connect across them.
Knowing what not to build, what to ignore, when to say no. AI is trained to help — every prompt produces more. Subtraction has no reward signal.
Owning outcomes when the system is wrong. AI generates options. You sign your name. AI doesn’t have scars — that’s why it can’t have judgment.
Human-AI combinations help more on creation tasks than decision tasks. Generation is where AI shines. Decisions are where humans remain essential. — Nature meta-analysis
At Vungle, we believed guaranteeing ROI for advertisers — using their customers’ LTV data to optimize app-install campaigns — would change mobile advertising.
Advertisers said no. Flatly. They were terrified we’d use their best-customer data against them.
We read it as a trust problem, not a value problem. So we ran a Wizard-of-Oz pilot: manual, scrappy, just enough to prove the outcome without asking for a full leap of faith.
Judgment is reading the gap between what customers say and what they’ll do when value is proven.
At Leanplum, we served engineering, growth, product, and marketing teams across web and mobile. Market opportunity, revenue, business case — all there for every segment.
But the platform was buckling. Hundreds of features, each used by a narrow segment. Not failing dramatically — slowly losing coherence.
We narrowed the ICP. Not “focus more” — actively fire paying customers. Walk them out. Help them transition.
The platform survived because we narrowed. Product coherence recovered, velocity came back, and the remaining customers got better outcomes.
AI optimizes for the metric you give it. Judgment knows when the metric itself is wrong.
One customer caught in the narrowing: Coinbase — third-largest contract, 3% of ARR. A board member had personally helped land them.
I had reservations from the start. Security requirements higher than Bank of America’s. I gave the green light anyway. Over time, unique requirements crept in and consumed 40%+ of engineering capacity.
We fired Coinbase. The board member was not happy. It almost cost me my job.
This experience is encoded in every decision I’ve made since — not as a rule, but as a scar.
AI doesn’t have scars. That’s precisely why it can’t have judgment. Judgment is forged in consequences.
Accountability isn’t having the analysis. It’s making the recommendation when your career is on the line and the room wants to hear something else.
When AI commoditizes velocity, differentiation moves to judgment, detail, coherence, and point of view. Three practices build taste:
Clarity before execution. If you can’t write it, you haven’t decided it.
Direct contact with reality. AI can’t feel what your users feel.
Quality as a non-negotiable. Shipping hard on purpose builds the muscle.
“In a crowded market, the quality and details become the differentiation.” — Katie Dill, VP of Design, Stripe
Taste is economic: 11.9% more revenue. Taste isn’t aesthetics. It’s ROI.
At Square, Dorsey wrote detailed stories from the user’s perspective that “read like a play.”
“If you do that story well, all the prioritization, product, and design just falls out naturally.”
Meetings start with 20 minutes of silent reading. You can’t hide fuzzy judgment behind bullet points.
Writing forces clarity before execution. Writing out the entire flow exposes gaps, untested assumptions, and places where the story doesn’t logically hold. It also trains you to express yourself precisely, which pays off directly in your collaboration with AI.
Mapped every screen (150), every policy (70), every touchpoint.
“Simplifying is not removing things — it’s distilling to the essence. And to simplify, you have to deeply understand it.”
Multidisciplinary teams use their own product end-to-end, logging every friction point. Not analytics. Direct, firsthand contact.
Friction builds taste by forcing direct contact with reality. AI removes friction. Reintroduce it deliberately where judgment is formed.
Karri Saarinen: “Do you actually believe in quality?” They hire for taste — candidates without product sensibility don’t make it through.
Snapped real carrots by hand to craft their app’s sound design. Craft at the edges signals craft at the core.
In a world where AI makes shipping easy, mediocre products will flood the market. Raising the bar refines your product taste and makes your products stand out.
Define what “great” looks like before you prompt. The gold set should encode your experts’ judgment and taste.
Run output against expert criteria. This is the step Microsoft/CMU found people skip.
Flag decisions that are strategic, irreversible, or identity-defining — and keep them human.
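The three steps above can be sketched as a minimal eval harness. This is an illustrative sketch, not any specific framework’s API: the `GoldCase` structure, the keyword-based criteria, and the pricing example are all assumptions standing in for your experts’ actual rubric.

```python
from dataclasses import dataclass

@dataclass
class GoldCase:
    """One entry in the gold set: a prompt plus encoded expert criteria."""
    prompt: str
    must_mention: list[str]   # points an expert insists a good answer covers
    human_only: bool = False  # strategic/irreversible: never auto-approve

def grade(output: str, case: GoldCase) -> dict:
    """Score a model output against the expert gold case (step 2)."""
    missing = [p for p in case.must_mention if p.lower() not in output.lower()]
    return {
        "missing": missing,            # criteria the output failed to cover
        "escalate": case.human_only,   # step 3: keep strategic calls human
        "passed": not missing and not case.human_only,
    }

# Hypothetical gold case: an expert insists pricing advice covers churn risk
# and segment impact, and flags pricing as a human-only decision.
case = GoldCase(
    prompt="Should we raise prices 20%?",
    must_mention=["churn", "segment"],
    human_only=True,
)
result = grade("Raising prices risks churn in the SMB segment.", case)
```

Even this toy version makes the point: the model’s answer can cover every criterion and still not “pass,” because the decision itself was flagged as one a human must own.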
If AI could have made this decision, you’re not doing PM work. This isn’t anti-AI — it’s the operating model of the most AI-forward companies in the world.
Judgment has to be earned.
If anyone can build, the only thing that matters is knowing what to build and what not to.
Judgment can’t be prompted, because consequences can’t be outsourced.
Mike Pilawski
mikepilawski.com/talks/judgment-cant-be-prompted
Builder