AI Builders Brief — 2026-05-23

Follow builders, not influencers.

25+ builders tracked

TL;DR

Levie said AI will surface more bugs, making humans the bottleneck. Steinberger argued GitHub’s native PR limits kill bot policing, while Tan pushed the “ship the 60%” rule and Swyx said AI coding should harden codebases, not just move faster.

BUILDER INSIGHTS

Aaron Levie, CEO, Box

AI finds more bugs; humans become the bottleneck

He says AI will make security issues easier to surface, but that shifts the pain to triaging, responding to, and fixing them. The result: engineers don’t disappear. Security engineers get busier, because judgment and follow-through remain the scarce part.

Peter Steinberger, OpenClaw

GitHub’s native PR limits make bot policing obsolete

He says his team was already using bots to cap pull requests at 10 per person, so GitHub shipping that natively is a nice win. The takeaway: platform-level guardrails beat custom enforcement hacks every time.
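For a sense of what that kind of custom enforcement involves, here is a minimal sketch of a per-author cap check. It is not OpenClaw's bot and it does not call GitHub's API: the function names are hypothetical, the input is a plain list of author names, and it assumes the cap counts currently open pull requests, which the post does not specify.

```python
from collections import Counter

MAX_OPEN_PRS = 10  # the per-person cap mentioned in the post


def may_open_pr(author, open_pr_authors, limit=MAX_OPEN_PRS):
    """Return True if `author` is still under the cap.

    `open_pr_authors` holds one entry per currently open PR
    (assumption: the cap applies to open PRs, not PRs per day).
    """
    return Counter(open_pr_authors)[author] < limit


def over_limit_authors(open_pr_authors, limit=MAX_OPEN_PRS):
    """Return the authors who already exceed the cap, sorted by name."""
    counts = Counter(open_pr_authors)
    return sorted(name for name, n in counts.items() if n > limit)


if __name__ == "__main__":
    authors = ["alice"] * 10 + ["bob"] * 3
    print(may_open_pr("alice", authors))  # alice is at the cap
    print(may_open_pr("bob", authors))    # bob is under it
```

A real bot would also need to fetch the open PR list, handle renames and bots-as-authors, and comment or close on violation, which is the maintenance burden a native limit removes.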

Garry Tan, CEO, Y Combinator

When the alternative is nothing, ship the 60%

He says Geoffrey Moore’s “Crossing the Chasm” playbook breaks down when the customer’s real alternative is zero: no product, no capability, just manual pain or nothing at all. In those markets, founders should stop obsessing over a perfect whole product and ship the 60% solution people are already begging for. He ties it to YC’s 9 Mothers, a counter-drone defense startup, as the kind of “bar is zero” case where visionaries buy because they have to.

Peter Yang

AI agents are the new career insurance

He says layoffs are a signal to get sharper, not smaller: learn Codex or Claude Code, build side projects, and keep a public GitHub trail so your skills stay visible. The bigger thesis is blunt — AI pushes everyone toward average fast, so human taste, craft, and entrepreneurship become the safest bets in the AI era.

Swyx, dxtipshq

AI coding should harden codebases, not just ship faster

He says Kakuna is for the boring-but-critical part of AI engineering: checklists, audits, and subagent parallelism that harden a codebase while preserving the fun stuff. His bigger point is the “mullet factory” idea — ship lovable features up front, but keep production principles strict in the back so agents don’t turn your app into slop.

BLOG UPDATES

Claude Blog

New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration

Claude Managed Agents adds dreaming, outcomes, and orchestration

Lead: Anthropic is launching dreaming for Claude Managed Agents as a research preview, while also shipping outcomes, multiagent orchestration, and webhooks to help agents self-improve, verify work, and split complex jobs across specialists.

Numbers:

Outcomes improved task success by up to 10 points over a standard prompting loop.
File generation quality improved by +8.4% on docx and +10.1% on pptx in internal benchmarks.
Harvey reported ~6x higher completion rates in tests.
Wisedocs said review workflows now run 50% faster.

So What: Dreaming turns memory into a maintenance loop: it reviews past sessions, extracts patterns, and can either auto-update memory or wait for human review. Outcomes gives builders a rubric-based grader that evaluates outputs in a separate context window, then sends the agent back for another pass if needed. Multiagent orchestration lets a lead agent delegate work to specialists in parallel, with persistent events and full tracing in the Claude Console. As the post puts it, “Dreaming extends memory by reviewing past sessions to find patterns and help agents self-improve.” For teams building long-running or high-stakes workflows, these tools reduce manual steering and make agent systems more reliable, inspectable, and scalable.
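The grade-then-retry pattern described for outcomes can be sketched in a few lines. This is an illustration of the general loop only, not the Managed Agents API: `run_with_rubric`, `toy_agent`, `toy_grader`, and `RUBRIC` are hypothetical names, and the "separate context" is modeled simply by giving the grader nothing but the task and the output.

```python
from typing import Callable, Tuple

RUBRIC = ["summary", "sources"]  # hypothetical rubric items


def run_with_rubric(
    task: str,
    agent: Callable[[str, str], str],
    grader: Callable[[str, str], Tuple[bool, str]],
    max_passes: int = 3,
) -> Tuple[str, bool, int]:
    """Run the agent, grade the output, and retry with feedback.

    Returns (last output, whether it passed, passes used).
    """
    feedback = ""
    output = ""
    for attempt in range(1, max_passes + 1):
        output = agent(task, feedback)
        # The grader sees only task and output, never the agent's
        # working state: a stand-in for a separate context window.
        passed, feedback = grader(task, output)
        if passed:
            return output, True, attempt
    return output, False, max_passes


def toy_grader(task: str, output: str) -> Tuple[bool, str]:
    """Pass only if every rubric item appears in the output."""
    missing = [item for item in RUBRIC if item not in output]
    return (not missing, ", ".join(missing))


def toy_agent(task: str, feedback: str) -> str:
    """First draft omits sources; later drafts add whatever was flagged."""
    draft = f"{task}: summary"
    if feedback:
        draft += " " + " ".join(feedback.split(", "))
    return draft


if __name__ == "__main__":
    print(run_with_rubric("report", toy_agent, toy_grader))
```

In practice both callables would be model calls, and the `max_passes` bound is what keeps a stubborn failure from looping forever.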

PODCAST HIGHLIGHTS

Unsupervised Learning

Ep 87: Gemini Co-Lead on World Models, RL's Next Domains & Continual Learning

Google’s bet: models should learn, remember, and act like systems

The Takeaway: The real leap isn’t bigger models — it’s models that can learn from the world, keep memory, and build their own scaffolding.

World models matter because they may unlock a deeper kind of understanding than text-only scaling ever will.
The next agent breakthrough may come less from clever prompts and more from models deciding when to reason, delegate, or even write their own sub-agents.
Continual learning looks most practical as file-system-style memory, not constantly rewriting model weights.

Oriol Vinyals, co-lead of Gemini at Google, comes at AI from a long deep-learning lineage, and his philosophy is basically: keep pushing generality until the system itself becomes the product. On world models, he draws a sharp line between today’s strong multimodal systems and the bigger prize: extracting structure from video and images without leaning so hard on language. “I’m not sure we quite have seen” the GPT moment for video and images yet, he says, because the field still relies on text to bridge concepts like gravity, motion, and cause-effect.

That same bias toward generality shows up in agents. Vinyals thinks the important shift is not just building a better scaffold around a model, but eventually letting the model generate the scaffold itself. The point isn’t endless token-spending; it’s deciding “should you reason, for how long should you reason” based on task complexity. That’s a very Google-ish answer: make the system broad, then let intelligence specialize it on demand.

On memory, he’s even more concrete. Working memory is already strong; the missing piece is durable, retrievable knowledge. His preferred path is a nonparametric one — a personal knowledge base, files, folders, retrieval — because it’s easier to serve than custom weights for every user. In other words: the future may look less like one giant brain and more like a model with a very good hard drive.
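To make the "model with a very good hard drive" idea concrete, here is a toy sketch of nonparametric memory: notes are appended to plain files and retrieved by word overlap, and no weights change. It is an assumption-laden illustration, not anything described in the episode. `FileMemory`, its methods, and the scoring rule are all hypothetical, and a real system would likely use embeddings or a search index instead.

```python
from pathlib import Path


class FileMemory:
    """Durable memory as one text file per topic in a folder."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def remember(self, topic, text):
        """Append a note to the topic's file, creating it if needed."""
        path = self.root / f"{topic}.txt"
        with path.open("a", encoding="utf-8") as f:
            f.write(text.strip() + "\n")

    def recall(self, query, top_k=3):
        """Return up to top_k (topic, content) pairs ranked by word overlap."""
        words = set(query.lower().split())
        scored = []
        for path in sorted(self.root.glob("*.txt")):
            content = path.read_text(encoding="utf-8")
            score = len(words & set(content.lower().split()))
            if score:
                scored.append((score, path.stem, content.strip()))
        # Highest overlap first; ties broken by topic name.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [(topic, content) for _, topic, content in scored[:top_k]]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        memory = FileMemory(tmp)
        memory.remember("travel", "prefers aisle seats on flights")
        memory.remember("food", "allergic to peanuts")
        print(memory.recall("which seats on flights"))
```

The serving argument from the episode shows up even here: each user's memory is just a folder, so the same model can serve everyone without per-user weights.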
