AI Builders Brief

Follow builders, not influencers.

2026.05.22

25+ builders tracked

TL;DR

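Altman shipped a new Codex while asking what AI should fix. Levie sees AI pricing splitting into tiers, Masad pushed self-serve buying plus credit rewards for app earnings, and Anthropic added Auto Mode to Claude Code, while OpenAI's Yann Dubois argued that models have crossed a reliability threshold.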

BUILDER INSIGHTS
9
01
Sam Altman

Codex ships as Altman asks what AI should fix

He says a new Codex is shipping today, while also tossing out a broad prompt to the internet: what problem do you most want AI to solve? The mix is classic OpenAI: a product launch on one hand, agenda-setting on the other.

X
02
Aaron Levie CEO, box

AI pricing is splitting into tiers, fast

He says AI has moved from cheap chatbots to expensive, high-context agents, and that gap is only widening. The result: frontier tasks will keep paying up for better models, while enterprises build new systems to route work to cheaper models when those are good enough.
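A minimal sketch of the routing idea Levie describes, assuming a made-up three-tier catalog: each task goes to the cheapest tier rated for its difficulty. The tier names, prices, and the 1-5 difficulty scale are hypothetical and are not taken from his post or from any vendor's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real model id
    cost_per_1k_tokens: float  # illustrative price, not a quoted rate
    max_difficulty: int        # hardest task (1-5) this tier is assumed to handle


# Hypothetical catalog for illustration only.
TIERS = [
    ModelTier("small", 0.10, 2),
    ModelTier("mid", 1.00, 4),
    ModelTier("frontier", 10.00, 5),
]


def route(difficulty: int) -> ModelTier:
    """Return the cheapest tier rated for the given difficulty (1-5)."""
    if not 1 <= difficulty <= 5:
        raise ValueError("difficulty must be between 1 and 5")
    # Walk tiers from cheapest to most expensive and stop at the first fit.
    for tier in sorted(TIERS, key=lambda t: t.cost_per_1k_tokens):
        if tier.max_difficulty >= difficulty:
            return tier
    raise RuntimeError("no tier covers this difficulty")


def estimate_cost(difficulty: int, tokens: int) -> float:
    """Estimated spend for one task under the hypothetical prices above."""
    return route(difficulty).cost_per_1k_tokens * tokens / 1000


if __name__ == "__main__":
    for level in (1, 3, 5):
        print(level, route(level).name, estimate_cost(level, 2000))
```

In practice the hard part is scoring difficulty before the work is done; this sketch simply takes the score as an input.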

X
03
Aditya Agarwal CTO, SouthPkCommons

Startup hiring is a brutal filter, not a vibe check

He says early-stage hiring should be ruthless: if a candidate is still choosing between startup and BigCo, or won’t take a big pay cut, they’re probably not startup-ready. The bigger point is that negotiation tells you a lot, and founders should walk away fast when the signals turn bad. His take is blunt: startups demand real hours, real sacrifice, and zero ambiguity about the grind.

X
04
Ryo Lu Cursor_ai

Cursor leans into team-based software building

They’re pitching software creation as a shared activity, not a solo grind, and pointing people to Cursor’s new model, interface, SDK, and automations. It’s a clean product push: make building feel more collaborative, then give teams the tools to actually do it.

X
05
Amjad Masad CEO, replit

Buy without the sales call, then get rewarded

He says customers shouldn’t have to talk to a company just to buy the product — a clean anti-sales take that fits Replit’s self-serve DNA. He also teased a monetization loop for apps: if you earn from your app, Replit will kick back credit rewards.

X
06
Garry Tan CEO, ycombinator

Agents for everyone, and cities can be safer

He says everyone should have an agent with a GBrain, pushing the idea that personal AI should be a default tool, not a novelty. He also doubles down on Flock Safety as proof that cities can get safer if they choose to invest in safety. In a separate interview clip, he and Diana Hu talk through what it takes for an engineer to become a 1000x founder.

X
07
Swyx dxtipshq

Local-first stack may have already won

He says this stack has basically won the local-first battle for building fast apps fast, though he leaves room for more chapters. It’s a strong “pick this if you want speed” take, not a neutral observation.

X
08
Matt Turck FirstMarkCap

AI progress feels like a step function now

He says recent AI progress suddenly feels real, not because of bigger models alone, but because post-training, test-time compute, and RL are making systems more reliable and useful. In his conversation with OpenAI’s Yann Dubois, he digs into why shipping GPT-5.5 feels like an emotional rollercoaster, where the data wall is, and why startups should focus on the last mile of AI.

X
09
Zara Zhang

Claude Code gets a Lark/Feishu sidekick

She shipped an open-source bridge that turns Lark/Feishu into a control plane for Claude Code: chat with it like a coworker, run multiple sessions as group chats, and let it read docs, transcripts, and messages for context. It also writes Lark Docs, replies to @mentions, and can send interactive cards back — basically Claude Code with a workplace-native UI.

X
BLOG UPDATES
1
Anthropic Engineering

Claude Code auto mode: a safer way to skip permissions

Anthropic adds Auto Mode to Claude Code for safer autonomy

Lead: Anthropic launched Auto Mode for Claude Code, a new permission system that replaces constant human approval with model-based classifiers and prompt-injection screening to let agents run more autonomously without going fully unguarded.

Numbers:

  • Manual prompts are accepted 93% of the time, contributing to approval fatigue.
  • Evaluated on three sets: 10,000 real internal tool calls, 52 real overeager actions, and 1,000 synthetic exfiltration attempts.
  • Full pipeline results: 0.4% false-positive rate (FPR) on real traffic, 17% false-negative rate (FNR) on real overeager actions, and 5.7% FNR on synthetic exfiltration.
  • Stage 1 alone: 8.5% FPR, 6.6% FNR on real overeager actions, 1.8% FNR on synthetic exfiltration.
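To make the percentages above concrete, here is a small worked calculation that converts each reported rate into an approximate count on its evaluation set. The rates and set sizes come from the numbers listed above; the helper name `expected_errors` is ours, and pairing the Stage 1 FPR with the 10,000-call set is an assumption. The counts are simple products, not figures Anthropic published.

```python
def expected_errors(rate_percent: float, sample_size: int) -> float:
    """Convert an error rate in percent into an expected count for a sample."""
    return rate_percent / 100 * sample_size


# (label, rate in percent, evaluation-set size), as reported in the brief.
FULL_PIPELINE = [
    ("benign calls wrongly blocked", 0.4, 10_000),
    ("overeager actions missed", 17.0, 52),
    ("synthetic exfiltration attempts missed", 5.7, 1_000),
]
# Assumption: the Stage 1 FPR is measured on the same 10,000 real calls.
STAGE_1_ONLY = [
    ("benign calls wrongly blocked", 8.5, 10_000),
    ("overeager actions missed", 6.6, 52),
    ("synthetic exfiltration attempts missed", 1.8, 1_000),
]

if __name__ == "__main__":
    for title, rows in (("Full pipeline", FULL_PIPELINE),
                        ("Stage 1 alone", STAGE_1_ONLY)):
        print(title)
        for label, rate, size in rows:
            count = expected_errors(rate, size)
            print(f"  {label}: about {count:.1f} of {size:,}")
```

Read this way, the full pipeline wrongly blocks about 40 of 10,000 benign calls versus roughly 850 for Stage 1 alone, at the cost of more misses. The overeager-action set has only 52 items, so its percentages are coarse.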

So What: Auto Mode is aimed at teams that want fewer permission prompts than manual review, but more protection than `--dangerously-skip-permissions`. It uses two defenses: a server-side prompt-injection probe on inputs, and a transcript classifier on tool calls that blocks risky actions like destructive deletes, credential hunting, exfiltration, and bypassing safety checks. Anthropic says the system is conservative by design: “The classifier is deliberately conservative,” and it evaluates the real-world impact of actions, not just their surface text. Practical takeaway: builders can start with conservative defaults, then tune trusted environments and exceptions over time—useful for routine coding, but not a substitute for careful human review on high-stakes infrastructure.
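One way to read the gap between the Stage 1 and full-pipeline numbers is a two-stage gate: a cheap, over-cautious first pass flags anything suspicious, and a slower second pass reviews only the flagged calls and clears the ones that look safe. The toy sketch below illustrates that shape only. It is not Anthropic's implementation: the real system uses model-based classifiers, while the substring hints, the `trusted_paths` parameter, and every function name here are hypothetical stand-ins.

```python
# Hypothetical substrings that make the cheap first pass suspicious.
RISKY_HINTS = ("rm -rf", "curl ", ".env", "--no-verify")


def stage1_flags(command: str) -> bool:
    """Cheap, over-cautious screen: flag any command containing a risky hint."""
    return any(hint in command for hint in RISKY_HINTS)


def stage2_blocks(command: str, trusted_paths: tuple[str, ...]) -> bool:
    """Slower review, run only on flagged commands.

    Stand-in for a model-based classifier: it clears a recursive delete
    when the target sits under a trusted path and blocks everything else.
    """
    if "rm -rf" in command:
        target = command.split("rm -rf", 1)[1].strip()
        return not any(target.startswith(path) for path in trusted_paths)
    return True  # other flagged commands stay blocked in this toy version


def decide(command: str,
           trusted_paths: tuple[str, ...] = ("./build", "./tmp")) -> str:
    """Return 'allow' or 'block' for a proposed shell command."""
    if not stage1_flags(command):
        return "allow"  # most routine calls never reach stage 2
    return "block" if stage2_blocks(command, trusted_paths) else "allow"


if __name__ == "__main__":
    for cmd in ("ls -la", "rm -rf ./build", "rm -rf /", "cat .env"):
        print(f"{cmd!r}: {decide(cmd)}")
```

The `trusted_paths` argument mirrors the takeaway above about starting with conservative defaults and widening exceptions over time. A prefix check like this is easy to fool (for example with `..` in the path), which is part of why the real classifier is described as judging an action's impact and not its surface text.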

PODCAST HIGHLIGHTS
1

AI just crossed the reliability threshold—and real work is next

The Takeaway: AI progress feels sudden because models finally became reliable enough to be useful, not because capability jumped overnight.

  • The big shift is from benchmark wins to messy real-world utility: “we moved from competitions to usefulness to users.”
  • Reinforcement learning is escaping math and coding contests and starting to work on actual knowledge work, agentic coding, and scientific tasks.
  • Efficiency matters as much as raw intelligence: the goal is to make models think less, backtrack faster, and deliver better answers with lower latency.

Yann Dubois, who co-leads OpenAI’s Post-Training Frontiers team, frames the current AI wave as a threshold crossing. He says the models didn’t suddenly get magical; they got dependable enough that people can trust them to do real work. That’s why the last few months have felt like a step function. Internally, the same models are also accelerating the people building them, especially because coding tools now speed up research, training, and infrastructure.

Dubois’s core point is that post-training is where the action moved. Early reinforcement learning was tuned for “verifiable rewards” like math problems and coding competitions. Now those same techniques are being pushed into ambiguous, high-value tasks where there isn’t a clean right answer. That’s a much bigger deal than another leaderboard win.

He also makes a sharp distinction between raw compute and useful reasoning. Longer thinking helps, but only up to a point; the real win is getting models to choose better reasoning paths and recognize dead ends sooner. In his words, an expert doesn’t explore ten directions when one is obviously better. That’s the kind of efficiency OpenAI is chasing.

Dubois, originally trained in biomedical engineering in Switzerland before moving through NLP work in Singapore and a PhD at Stanford, is unusually focused on impact. Even his public note telling quant firms not to reach out says a lot: he wants to build systems that matter, not just systems that score well.
