
Inside Notion’s $10B AI Coding Experiment: A Journalist’s Two-Day Vibe-Coding Immersion

DATE: 8/22/2025 · STATUS: LIVE

Curious about vibe coding, I embedded myself in Notion's world to see AI write code. Then everything took an unexpected turn…


I’d been mulling an odd idea for weeks: drop into a tech startup and learn to vibe-code—Notion’s term of art for AI-assisted programming. On a late night, I pitched my editors. They blinked, then said yes. Next I emailed Notion, the San Francisco venture-backed app maker valued near $10 billion, and proposed embedding as a temporary engineer. To my surprise, they agreed. The promise of seeing AI write real code felt urgent—I wanted to understand how to survive work in the years ahead.

Notion employs roughly a thousand people. Its core product is an infinitely flexible to-do and note-taking workspace, bristling with templates, databases, kanban boards, calendars and more. Personal-productivity gurus on YouTube spend entire videos explaining how to fold their lives into Notion’s expandable blocks. One clip, titled “How to Get Started in Notion Without Losing Your Mind,” has more than 3.4 million views. Learning to use Notion can seem like a project in itself.

I was due to show up on a Thursday in mid-July. The night before, I binged tutorials, convinced I'd need encyclopedic knowledge of every widget. On an early onboarding call, a new coworker had advised downloading Cursor, the AI coding platform, and tinkering. I installed it and clicked around, but not a single runnable line emerged. My panic only grew.

I arrived the next morning via Market Street’s crowded Muni platform, then took an elevator to Notion’s sixth floor. The open-plan room smelled of fresh coffee and citrus hand soap. Sarah Sachs, the AI engineering lead, greeted me by name and led me to a desk already set up with a company tote bag, a logo-stamped notebook, and a tall water bottle. She handed me an access badge and said, “Tomorrow you’ll demo your work at our staff meeting. Sound good?” I managed to nod.

A few feet away, Simon Last, one of Notion’s three cofounders, sat hunched over a laptop. He’s lean and quiet, with a shock of dark hair falling over wire-rim glasses. He offered a brief handshake, then turned back to his screen, where lines of code were streaming in real time—clearly a snippet from some AI model. Later Simon likened using these coding agents to managing a swarm of interns.

Notion first introduced an AI assistant inside the app in 2022 to help users draft pages and summaries. Now the company is evolving that helper into a full-fledged “agent” meant to run autonomously, building tables or rewriting entire notes as you focus on other tasks. Rolling out an autonomous assistant at scale requires writing heaps of new infrastructure: API endpoints, UI components, database migrations, performance tests and more.

On a typical day here, engineers launch Cursor in their code editor and pick from several available AI models. Many of the folks I chatted with favored Anthropic’s Claude, often using the specialized Claude Code interface. They type a human-language request—build this widget, fix that bug—and the AI generates code suggestions. Then the engineer reviews, tweaks, adds tests, runs local checks, and eventually pushes to production.

Generative AI isn't cheap. Models like Claude and OpenAI's offerings run on massive GPU clusters that burn through tens of thousands of dollars daily. The theoretical return on that investment shows up as saved hours. In theory, CEO Ivan Zhao could finish his work before lunch and head down to the jazz club on the ground floor of this Market Street building. In practice, he just works longer. The fabled four-day week remains a pipe dream.

My full run at Notion was only two days—a crash course in what they call a vibe-coding sprint. I agreed to anonymize everyone beyond first name and to treat the code base as off-limits for outside eyes. For each task, I’d pair with a different engineer.

My first assignment was to improve the way a chart type, the mermaid diagram, displays in Notion. Quinn and Modi explained that these charts are rendered as static SVG files and, despite the name, scalable vector graphics appear at a fixed size on each page. Users often couldn't read the tiny text without zooming their entire browser window.

Quinn tapped his keyboard and expanded Cursor on the side of his code editor. He peered at me and said, “So, the Notion code base? Has a lot of files. You probably, even as an engineer, wouldn’t even know where to go. But we’re going to ignore all that. We’re just going to ask the AI on the sidebar to do that.”

He sketched out their usual approach: start with a diagnostic prompt—“Why is this rendering static?”—so the AI first groks the code structure. Cursor then returns a rough summary. Armed with that, you can refine the prompt into an actionable request.

We assembled a block of text containing the Jira ticket and some notes from Slack:

Ticket: Add Full Screen / Zoom to mermaid diagrams. Clicking on the diagram should zoom it in full screen.

Notes from slack: "mermaid diagrams should be zoom / fullscreenable like uploaded images. they're just svgs right, so we can probably svg -> dataurl -> image component if we want to zoom"

I pasted that into the sidebar and hit the run button. The model began churning, and lines of code streamed in slowly—like a typewriter wheezing out a first draft.

Cursor spat out about a hundred lines of JSX and TypeScript. We clicked “Apply Diff” and watched as the editor marked dozens of modified files. Quinn launched a dev server, navigated to a page with a mermaid chart, and clicked. The image expanded, but not perfectly: some text sat off-center, a few diagrams rendered partly transparent, and dark-mode styling was missing.

We spent the next half hour iterating. I prompted the AI to add padding around the SVG container, enforce an opaque background, and toggle classes for light and dark themes. Quinn showed me how to write a Jest test that simulates a click event and verifies that the CSS class changed. Each fresh snippet in Cursor improved the result. By the end, mermaid diagrams popped into full-screen clearly, regardless of theme.
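The Slack note's "svg -> dataurl -> image component" idea is simple enough to sketch. Here is a minimal TypeScript illustration of the two pieces we iterated on; every name (`svgToDataUrl`, `diagramClasses`, the CSS class strings) is hypothetical, not Notion's actual code:

```typescript
// Sketch of the svg -> data URL -> image approach from the Slack note.
// All function and class names here are illustrative, not Notion's code.

function svgToDataUrl(svgMarkup: string): string {
  // Base64-encode the SVG markup so it can feed an <img src="...">.
  // In a browser bundle you'd use btoa(...) instead of Node's Buffer.
  const encoded = Buffer.from(svgMarkup, "utf8").toString("base64");
  return `data:image/svg+xml;base64,${encoded}`;
}

// The theme-aware class toggling we iterated on: an opaque background
// plus a light- or dark-mode class only when the diagram is expanded.
function diagramClasses(expanded: boolean, darkMode: boolean): string[] {
  const classes = ["mermaid-diagram"];
  if (expanded) {
    classes.push("fullscreen", "opaque-bg", darkMode ? "theme-dark" : "theme-light");
  }
  return classes;
}
```

A Jest test like the one Quinn demonstrated would simulate the click, then assert that `diagramClasses` output landed on the element.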

Next, I moved to Lucy's desk. She introduced me to Codegen, a different AI-coding tool that she preferred for complex agent workflows. Our ticket: create a new Notion AI skill called Alphabetize. When users generate a list—say, popular dog breeds—they should be able to click one button and have the entries sorted alphabetically.

Lucy opened a boilerplate script and typed:

“Define a Notion AI skill named Alphabetize that takes any list output and sorts entries in ascending order. Include both UI bindings and backend integration.”

The Codegen agent began writing code. Five minutes in, it stalled, and Lucy's editor froze. A ping from Sarah Sachs's phone echoed like a medical alert, and Sarah dashed out.

The Claude service was down. A glitch at the cloud provider had knocked the endpoint offline. For ten minutes, all servers returned 503 errors. Lucy and I hovered around her monitor, sipping water out of a branded bottle. Our in-progress agent sat orphaned at line 42. Without Claude, bulldogs would remain listed before beagles.

At last the endpoint came back. The agent resumed, finishing the UI button, wiring up a GraphQL mutation, and generating a minimal test suite. We ran the tests, everything passed, and I clicked “Create Pull Request.” I tagged Lucy and added a note: “Ready for review.”

In the afternoon, I got my final brief: build whatever I wanted. The blank slate felt both exhilarating and paralyzing. Around me, engineers were unleashing new features—a drag-and-drop table view, a calendar-sync bot, a UI tweak that tracked Bevi syrup flavors. I decided to propose an AI-powered to-do generator.

My design: when a user types a natural-language phrase beginning with “to do,” Notion AI would parse it into a checklist, automatically skipping any tasks already on the page. For example: “to do reorder pet food” yields “Buy dog food,” “Refill water bowl,” “Order treats,” but leaves out anything present from earlier lists.

I wrote a prompt outlining the user flow, fed it to Claude via Cursor, and hit run. Seconds later, I saw the draft code. I copied it into a test page and executed the skill. A flurry of tasks appeared—then the same ones again. My duplicate-filter logic was inverted. I stared at my screen, wondering if the fault lay with me or the AI.
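The inverted filter was a one-character class of bug. A minimal TypeScript sketch of the logic my prompt was meant to produce, with hypothetical names (`filterNewTasks` is mine, not code from the prototype):

```typescript
// Sketch of the duplicate filter the to-do generator needed.
// `existing` holds to-dos already on the page; names are hypothetical.
function filterNewTasks(generated: string[], existing: string[]): string[] {
  const seen = new Set(existing.map((t) => t.trim().toLowerCase()));
  // My inverted version effectively dropped the `!`, keeping only the
  // matches and reproducing duplicates instead of skipping them.
  return generated.filter((task) => !seen.has(task.trim().toLowerCase()));
}
```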

Brian, a product designer who'd been lurking nearby, leaned in. "Pretend you're talking to a smart intern," he said. Again with the interns, I thought. I laughed, then typed a more precise prompt: "Check the current page for existing to-do items, remove matches, then generate new entries from this input." The AI replied, "That's a great idea," and dove back in.

Forty minutes later, I had a functioning prototype. It handled multiple pages, ignored duplicates, and surfaced clear error messages if something went wrong. I glanced at the Claude Code token meter—it showed a total cost of seven dollars. Brian told me refactors that let an agent run for hours can rack up hundreds in cloud fees.

It was still light out when I wrapped up the first day. I shut my laptop, grabbed my tote bag, and walked down Market Street under lamps glowing like amber. A light drizzle threatened, but the evening air felt electric.

Friday morning began with the weekly demo. The conference room had a polished wood table, rolling whiteboards scribbled with blue and pink sticky notes, and a projector screen at one end. In the corner sat cheese platters in honor of a Swiss engineer’s birthday. A Bevi machine hissed and bubbled as people queued for flavored water; others grabbed mugs of coffee or cans of Celsius.

One by one, engineers presented. A senior developer demoed a Notion AI agent with memory—now it could mimic a user’s writing style over multiple sessions. Another shared a small web app that tracked which syrup pumps were nearly empty on the Bevi machine. A third showed off a pipeline that generated meeting agendas from calendar events and Slack threads.

At the end of each segment, someone lifted a tiny mallet and struck a xylophone stationed beside the screen. On day two, I was appointed xylophone keeper. Each ring and clink punctuated the demos like applause.

When my turn arrived, I provided a concise walkthrough of the two main features I'd built. One manager asked, "How long did it take to code the changes to the mermaid diagrams, end to end?"

Quinn and Modi exchanged a look. We counted on our fingers. “Roughly thirty minutes of active coding,” we said, “plus about fifteen minutes of setup and prompt tuning.”

One attendee murmured, “Wow,” and a ripple of laughter followed. It felt oddly satisfying to get that reaction for a half-hour task.

I couldn’t help but recall an essay by Ellen Ullman, the programmer and author. In her 2016 article titled “Programming for the Millions,” she wrote, “I dare to imagine the general public learning how to write code.” Ullman argued that widening access to programming was society’s best chance to loosen software’s grip on daily life.

Ullman enrolled in three massive open online courses to test her theory. She urged novice coders to "Stick a needle into the shiny bubble of the technical world's received wisdom," and then "Burst it." In a follow-up essay, she lamented how primitive auto-graders judged students by flawed algorithms. Still, she maintained that if you slogged through the frustration, "a certain fascination gets through," comparing it to hearing a beautiful piano or sax performance that sparks a longing to learn.

My own stint at Notion didn’t ignite a burning desire to become a full-time engineer. It did crack open a hidden chamber in my mind, though—a place where logic constructs reality and every prompt matters. I was reminded that with human collaborators we tolerate ambiguity. With AI, you must spell out exactly what you want.

During lunch on that second day, an engineer asked, “Do you ever use ChatGPT to write your articles?” I shook my head. “Never,” I replied. Her eyes widened. I explained that it’s a matter of principle, not whether the AI can produce workable prose. I chose not to lecture on how AI-generated summaries and shifts in search have hollowed out traffic to news websites. Plenty of my friends worry about their own livelihoods.

One colleague compared today's AI moment to the arrival of high-level compilers decades ago. Rather than a world where a single programmer replaces a hundred, perhaps every engineer simply becomes a hundred times more productive. A manager agreed: "Yeah, as a manager I would say, like—everybody's just doing more," she said. Another teammate pointed out that large-scale challenges still demand collaboration, rigorous design discussions, and step-by-step planning. AI shines in rapid prototyping but can't replace collective scrutiny.

As I wrapped up my embed, I found myself convinced that humans will stay firmly in the loop, even if productivity increases ten- or hundredfold. Experts in niche domains, I told myself, will remain indispensable.

In my final hour, I ducked into Ivan Zhao’s glass-walled office overlooking the street. I said, “I’m realizing that, this whole time, I didn’t even ask what language we’re coding in.”

He leaned back in his chair and let out a laugh. "It's TypeScript. It's like a fancier version of JavaScript." He paused, then added, "But what language you're using doesn't matter. You express your intent at the human-language level, in English, and the machines translate it. That's what language models are fundamentally doing."

When Zhao and Last first teamed up in the early 2010s, they imagined a no-code/low-code platform to help people build software without a technical background. That vision flopped. "Nobody cared," Zhao recalled. "Nobody woke up saying, 'I want to build software today.' Most people just needed to finish a spreadsheet for their boss."

They pivoted to note-taking and database blocks, poured months into refining the product, and eventually hit on the flexible workspace that teams now swear by. By October 2022, when the company numbered only a few hundred employees, the founders gathered in Mexico for an off-site. They hunkered down in hotel rooms, bottles of water at hand, and spent long nights experimenting with early ChatGPT. They saw how generative AI could circle them back to their original dream.

Zhao’s personal aesthetic shines through in his surroundings. Born in China and educated in cognitive science and art in Canada, he appreciates well-crafted tools. He wears a luxury watch gifted by his wife, admires fine furniture, and carries a small stack of design books in his office. He counts Douglas Engelbart, the inventor of the computer mouse, among his heroes.

Curious, I asked whether vibe coding risks flooding the world with low-quality software. He shook his head. “Code is either correct or it isn’t,” he said. “You might be called a bad writer for poor sentences, but if software fails, it simply doesn’t run.”

Still, he acknowledged that AI-generated code can veer off track. He warned that younger engineers might develop overconfidence. “That’s why pair programming is critical,” he said. “Senior-level folks—they have taste, right? They can spot when something feels off and course-correct.”

Simon Last told me he holds AI coding agents to even higher standards than human peers. He dislikes the phrase “vibe coding,” because it undersells the craft. At one point, he was juggling three AI tools at once and described it as overwhelming. Now he usually sticks to a single model for each session.

I asked him about hiring. He sighed. “I mean, at least right now, we’re still super actively hiring engineers. But we do want to hire engineers that are really bullish on coding tools.”

All these shifts have taken shape in the past half year. Notion now has an AI engineer attached to its enterprise sales team, training sales reps in prompt strategy and custom agent scripts. Other companies are racing down the same path. My two-day experiment already felt a step behind.

Simon summed it up plainly: “The world is heating up in many ways, and the sense I have is not ‘I freed up more time’ but that there’s more urgency than ever to use these tools,” he said. He admitted the rapid pace excites him even as it unnerves him. He thinks fondly of the days when he sat alone at a terminal, writing code line by line. “I think it would be crazy not to be a bit scared,” he said.

Only after I stepped out of the Notion building on that Friday afternoon did I recall the one question I’d never asked: Scared of what?
