AI Products Need a Different Playbook — 12 Counterintuitive Lessons from 20 Builders
AI products are fundamentally different from traditional software. This post distills 12 counterintuitive lessons from 20 builders, the CC/CD framework for AI development lifecycles, and a 3-phase eval system — all sourced from Lenny Rachitsky's newsletter.
Why AI Products Are Fundamentally Different
Traditional software is deterministic — it either works or it doesn’t. AI products are probabilistic — the same input can produce different outputs. This difference breaks every assumption in product development.
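To make the difference concrete, here is a minimal sketch using the OpenAI Python SDK, assuming an API key is configured; the model name is illustrative, and any sampling-based chat API behaves the same way: identical inputs, non-identical outputs.

```python
# Same input, different output: a minimal illustration of probabilistic
# behavior. Assumes the OpenAI Python SDK and OPENAI_API_KEY are set up;
# the model name is illustrative.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # sampling on: the default for most products
    )
    return resp.choices[0].message.content

a = ask("Name one benefit of unit tests.")
b = ask("Name one benefit of unit tests.")
print(a == b)  # frequently False, which breaks test-it-once QA assumptions
```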
According to Emergence Capital, 60% of companies have already integrated generative AI into their products. Yet 2 in 5 gen AI products still haven’t made a single dollar despite millions (or billions) in spending.
This post distills frameworks from three key pieces in Lenny Rachitsky’s newsletter:
- Counterintuitive Advice for Building AI Products (2024.07) — 12 lessons from 20 companies
- Why Your AI Product Needs a Different Development Lifecycle (2025.08) — The CC/CD framework
- Building Eval Systems That Improve Your AI Product (2025.09) — 3-phase eval playbook
12 Counterintuitive Lessons
| # | Conventional Wisdom | Counterintuitive Truth | Source |
|---|---|---|---|
| 1 | Start with user needs | Start with “what’s technically possible?” | Paul Adams, Intercom |
| 2 | Demo = Product | Watch for Phantom PMF | Joshua Xu, HeyGen |
| 3 | Model quality is key | UX, privacy, post-processing matter more | Ryan J. Salva, GitHub |
| 4 | Don’t label it “AI” | AI branding increases engagement AND comprehension | James Evans, CommandBar |
| 5 | Build big AI features | Small invisible AI is the fan favorite | Claire Vo, LaunchDarkly |
| 6 | Time in app = success | Best AI reduces time in app | Gaurav Misra, Captions |
| 7 | Start with easy problems | Startups are safer tackling hard problems | Sarah Guo, Conviction |
| 8 | Ship new features | Prompt improvement alone is often enough | Sherif Mansour, Atlassian |
| 9 | Quality over speed | Speed is the competitive advantage (pre-compute wins) | Rahul Vohra, Superhuman |
| 10 | Stack logic on models | Fine-tuning beats excessive scaffolding | Henri Liriani, Tome |
| 11 | Customization later | Users want customization from day one | Johnny Ho, Perplexity |
| 12 | Models are the moat | Data and interfaces matter more than models | Scott Belsky, Adobe |
Key quotes:
“Most people think about AI services in terms of model quality, but model quality is just a tiny piece. Post-processing filters, data privacy, feedback loops are all far more important.” — Ryan J. Salva, GitHub
“The smallest (and almost invisible) features are usually the fan favorites.” — Claire Vo, LaunchDarkly
GitHub Copilot’s suggestion acceptance rate is around 35% — and the gains there came from UX optimization, not from swapping models.
The CC/CD Framework
Traditional CI/CD assumes deterministic code that either passes tests or fails them, so it can’t manage AI products. AI needs Continuous Calibration / Continuous Development (CC/CD).
```mermaid
graph TB
    subgraph CD["Continuous Development"]
        CD1["CD 1: Scope capability + curate data"]
        CD2["CD 2: Set up application"]
        CD3["CD 3: Design evals"]
        CD1 --> CD2 --> CD3
    end
    CD3 --> DEPLOY["Deploy"]
    subgraph CC["Continuous Calibration"]
        CC4["CC 4: Run evals"]
        CC5["CC 5: Analyze behavior + spot patterns"]
        CC6["CC 6: Apply fixes"]
        CC4 --> CC5 --> CC6
    end
    DEPLOY --> CC4
    CC6 -->|"iterate"| CC4
    CC6 -->|"major change"| CD1
```
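Here is a minimal sketch of the calibration half of the loop (CC 4–6). Every name in it is a hypothetical stand-in, not code from the newsletter piece; the point is the shape: run evals, group failures into patterns, apply targeted fixes, and escalate to CD 1 when fixes stop working.

```python
from dataclasses import dataclass, field

@dataclass
class EvalResult:
    case_id: str
    passed: bool
    failure_tag: str = ""  # e.g. "hallucinated_link", "wrong_tone"

@dataclass
class App:
    prompt: str
    guards: list = field(default_factory=list)

def spot_patterns(results, min_count=2):
    """CC 5: group failures by tag; recurring tags are calibration targets."""
    tags = [r.failure_tag for r in results if not r.passed]
    return {t for t in tags if tags.count(t) >= min_count}

def calibration_loop(app, eval_suite, max_rounds=5):
    """CC 4-6: run evals, spot failure patterns, apply targeted fixes."""
    for _ in range(max_rounds):
        results = [case(app) for case in eval_suite]  # CC 4: run evals
        patterns = spot_patterns(results)             # CC 5: spot patterns
        if not patterns:
            return app                                # calibrated for now
        for tag in patterns:                          # CC 6: apply fixes
            app.guards.append(f"prompt guard for {tag}")
    # Persistent failures after several rounds signal a major change:
    # loop back to CD 1 and re-scope the capability / re-curate the data.
    return app
```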
Agency should be granted incrementally — like onboarding a new teammate:
| Version | Control | Agency | Example (Marketing Assistant) |
|---|---|---|---|
| v1 | High | Low | Draft copy from prompts |
| v2 | Medium | Medium | Build multi-step campaigns |
| v3 | Low | High | Launch + A/B test + auto-optimize |
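One way to enforce incremental agency is to gate every tool call against the currently granted level. The sketch below illustrates the idea (it is not the authors’ implementation); the action names follow the marketing-assistant example from the table.

```python
from enum import Enum

class Agency(Enum):
    V1_DRAFT = 1       # high control, low agency: suggest only
    V2_COMPOSE = 2     # medium: multi-step work, human approves
    V3_AUTONOMOUS = 3  # low control, high agency: act and optimize

# Hypothetical action gates for the marketing-assistant example.
ALLOWED_ACTIONS = {
    Agency.V1_DRAFT: {"draft_copy"},
    Agency.V2_COMPOSE: {"draft_copy", "build_campaign"},
    Agency.V3_AUTONOMOUS: {
        "draft_copy", "build_campaign", "launch", "ab_test", "auto_optimize",
    },
}

def is_permitted(level: Agency, action: str) -> bool:
    """Gate every tool call against the granted agency level."""
    return action in ALLOWED_ACTIONS[level]

assert is_permitted(Agency.V2_COMPOSE, "build_campaign")
assert not is_permitted(Agency.V2_COMPOSE, "launch")  # launch requires v3
```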
3-Phase Eval System
“The #1 misconception: ‘Can’t the AI just eval it?’ It doesn’t work.” — Hamel Husain
Phase 1: Error Analysis — Manually review ~100 interactions. Appoint a single domain expert (Benevolent Dictator), not a committee.
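A hedged sketch of the mechanics, assuming production traces are available as dicts; the sampling and CSV export are illustrative, and the free-text note column is what the Benevolent Dictator fills in by hand before failures get grouped into a taxonomy.

```python
import csv
import random

def sample_for_review(interactions, n=100, seed=42):
    """Draw a random sample of production traces for manual error analysis."""
    random.seed(seed)
    return random.sample(interactions, min(n, len(interactions)))

def export_review_sheet(sample, path="error_analysis.csv"):
    # One row per trace; the domain expert fills in the free-text note,
    # and the notes are later grouped into a failure taxonomy.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["trace_id", "input", "output", "note"])
        for t in sample:
            writer.writerow([t["id"], t["input"], t["output"], ""])
```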
Phase 2: Build Eval Suite — Code-based evaluators for objective failures, LLM-as-a-Judge for subjective ones. Use binary pass/fail, not Likert scales. Validate the judge with TPR/TNR (true positive rate / true negative rate against expert labels), not accuracy.
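For illustration, here is a minimal code-based evaluator plus a TPR/TNR check. Both are hypothetical examples, not the authors’ code: TPR is the share of true passes the judge catches, TNR the share of true failures, and reporting them separately matters because plain accuracy hides skew when most cases pass.

```python
import json

def is_valid_json(output: str) -> bool:
    """Code-based evaluator for an objective failure: reply must parse as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def tpr_tnr(judge, expert):
    """Compare LLM-judge labels to expert labels (True = pass).
    Assumes the labeled set contains both passes and failures."""
    tp = sum(j and e for j, e in zip(judge, expert))
    tn = sum((not j) and (not e) for j, e in zip(judge, expert))
    pos = sum(expert)
    neg = len(expert) - pos
    return tp / pos, tn / neg

judge = [True, True, False, True, False]
expert = [True, False, False, True, False]
print(tpr_tnr(judge, expert))  # (1.0, 0.67): judge catches 2/3 true failures
```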
Phase 3: Operationalize — Safety Net (CI + golden dataset) prevents regression. Discovery Engine (production LLM-as-Judge, async) finds unknown unknowns.
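A sketch of what the Safety Net can look like in practice, assuming pytest and a JSONL golden dataset; `run_app`, the file name, and the case schema are hypothetical stand-ins for your own fixtures and entry point.

```python
# Safety Net sketch: replay a golden dataset in CI so a prompt or model
# change cannot silently regress known-good behavior.
import json

import pytest

from my_app import run_app  # hypothetical: your AI application entry point

with open("golden_dataset.jsonl") as f:
    GOLDEN_CASES = [json.loads(line) for line in f]

@pytest.mark.parametrize("case", GOLDEN_CASES, ids=lambda c: c["id"])
def test_golden_case(case):
    output = run_app(case["input"])
    for required in case["must_contain"]:  # binary pass/fail, no Likert scores
        assert required in output, f"regression on case {case['id']}"
```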
Practical Checklist
The Make Me Unicorn AI Product Blueprint covers 45+ AI-specific items across model integration, cost control, prompt engineering, UX, data privacy, and monitoring.
```bash
pip install make-me-unicorn
mmu init && mmu scan
# See docs/blueprints/industry/ai-product.md
```
Related posts in this series:
- How AI Frameworks Make Money
- How AI Observability Platforms Make Money
- The Code Was Done, But Everything Else Wasn’t
Sources
- Lenny Rachitsky & Kyle Poyar, “Counterintuitive advice for building AI products,” Lenny’s Newsletter, 2024-07-02
- Aishwarya Reganti & Kiriti Badam, “Why your AI product needs a different development lifecycle,” Lenny’s Newsletter, 2025-08-19
- Hamel Husain & Shreya Shankar, “Building eval systems that improve your AI product,” Lenny’s Newsletter, 2025-09-09