
Study Finds AI’s Blueprint Gives Birth to Unexpected Creativity

August 24, 2025

AI image generators are trained to memorize their training data, yet they produce strikingly original images. New research traces that originality to small technical quirks in how the models denoise.


Image generators are trained to reproduce the pictures in their training sets, so their knack for producing fresh compositions has puzzled experts. A new study suggests that the creativity we see is an unavoidable side effect of the models’ internal workings. Instead of memorizing each picture perfectly, these systems introduce small technical variations that add up to novel output. Tools such as DALL·E, Imagen and Stable Diffusion turn these subtle shifts into visuals that feel original. Researchers long assumed that perfect copying would leave no room for flair, but the new findings trace that flair back to the denoising routine itself.

Not long ago, experts predicted fleets of driverless cars and automated household aides. What actually arrived were AI systems that can outplay grandmasters in chess, sift through vast text archives and even craft poetry. This split between physical and cognitive abilities has surprised many. Tasks like opening doors or loading a dishwasher remain tough for robots, yet algorithms now imitate elements of human thought with impressive success. Another puzzle for researchers has been the models’ flair for mixing and matching image parts to create coherent yet new compositions. This behavior seems at odds with their training goal of copying existing data.

Diffusion models underlie services like DALL·E, Imagen and Stable Diffusion. During training, they degrade images into random noise and then learn to reverse the corruption, a process called denoising. In theory, a perfect model would simply reconstruct every training image. In reality, these systems recombine elements to form fresh scenes with coherent shapes and objects. This creates what Giulio Biroli, a physicist and AI researcher at the École Normale Supérieure in Paris, calls a “paradox.” “If they worked perfectly, they should just memorize,” he said. “But they don’t—they’re actually able to produce new samples.” The images generated aren’t just random blobs. Users can ask the model to sketch an astronaut riding a horse and receive a plausible scene that never existed. That creative leap takes place without any direct command to invent.

Think of denoising like shredding a painting into countless specks of dust, then reconstructing it piece by piece. Diffusion models repeatedly add noise to every pixel until the original image vanishes, then reverse the process to recover it. For years this puzzle nagged researchers: if the model is simply reassembling pieces, where does originality arise? It is as odd as taking shredded art and arranging the scraps into a wholly new composition. That question has driven attempts to peek inside the pipelines of these generative systems. Researchers probed the passage from random noise to structured images, but the results often landed in theory papers with few practical insights. Without a clear explanation, creativity in these models has felt like magic.
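To make the add-noise-then-reverse loop concrete, here is a minimal toy sketch in Python. It is not any production model: the “data” is one-dimensional and Gaussian, so the ideal denoiser has a known closed form and no training is needed, and all names and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50                            # number of noising steps
sigma_step = 0.15                 # noise added per step
data_mean, data_var = 2.0, 0.25   # the assumed "training" distribution

x = rng.normal(data_mean, np.sqrt(data_var))   # a clean sample

# Forward process: drown the sample in noise, one step at a time.
noisy = x
for t in range(T):
    noisy += rng.normal(0.0, sigma_step)

# Reverse process: repeatedly blend the sample with the optimal
# denoised estimate. For Gaussian data the optimal denoiser is a
# closed-form shrinkage toward the data mean.
sample = noisy
for t in range(T, 0, -1):
    noise_var = t * sigma_step**2            # noise variance still present
    w = data_var / (data_var + noise_var)    # how much to trust the sample
    x_hat = w * sample + (1.0 - w) * data_mean   # posterior-mean denoise
    # keep a shrinking fraction of the residual noise: a simplified,
    # deterministic stand-in for a reverse-diffusion sampler
    sample = x_hat + (sample - x_hat) * np.sqrt((t - 1) / t)

print(f"clean sample {x:.3f} -> recovered {sample:.3f}")
```

Real image models replace the closed-form denoiser with a trained neural network, and it is that learned, imperfect denoiser the new study interrogates.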

Recently, two physicists presented a bold argument at the 2025 International Conference on Machine Learning. They argue that the very imperfections in the denoising steps spark the models’ inventive capacity. In their paper, they built a mathematical framework that treats the trained diffusion process as a set of equations. That framework shows that novelty is not a random glitch but a predictable consequence of the model’s structure. By making the inner math explicit, their work may reshape how AI research approaches generative systems. Luca Ambrogioni, a computer scientist at Radboud University in the Netherlands, praised the precision of their predictions, saying their model accounts for subtle patterns most people assumed were random. “The real strength of the paper is that it makes very accurate predictions of something very nontrivial,” he said.

Mason Kamb, a Stanford graduate student in applied physics and the study’s lead author, has a background in biological self-organization. He has long explored morphogenesis: the process that shapes cells into organs and limbs in living creatures. A classic example is a Turing pattern, named for mathematician Alan Turing, which describes how simple local interactions can organize large structures. In embryonic development, no single cell holds the blueprint for an entire organism. Cells react to signals from their neighbors, adjusting and coordinating at a local level. This strategy usually succeeds, though rare errors can cause extra toes or fingers to form on a limb. He wondered if a similar bottom-up approach might explain quirks in image generation models.

When early diffusion outputs surfaced online, many showed humans with too many fingers or odd proportions. Kamb saw a link to morphogenesis glitches: “It smelled like a failure you’d expect from a [bottom-up] system,” he said. That observation drove him to examine whether the local rules governing image assembly might produce both realistic details and strange artifacts. A bottom-up approach in a neural network, without a global planner, might stitch together pixels in ways that feel coherent but stray from reality in subtle ways. He noted that diffusion pipelines rely on two major shortcuts: they process one small patch of pixels at a time, and they obey translational equivariance, meaning a slight shift in the input shifts the output by the same amount.

In the patch-based approach, the model attends to one small cluster of pixels at a time, ignoring the rest of the image. Translational equivariance keeps structures aligned: shift the input two pixels to the right and the output follows suit automatically. These constraints keep the model from cheating by memorizing entire images. In the researchers’ view, they also lie at the heart of its creative behavior.
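A quick way to see both constraints in code is to stand in for the denoiser with a plain local-averaging filter (our simplification, not the models’ actual network): each output pixel depends only on a 3×3 neighborhood, and shifting the input shifts the output identically.

```python
import numpy as np

def local_average(img):
    """Average over the 3x3 neighborhood, with wrap-around edges.
    Locality: each output pixel depends only on nearby pixels."""
    acc = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return acc / 9.0

rng = np.random.default_rng(1)
image = rng.random((32, 32))

shift = lambda im: np.roll(im, 2, axis=1)   # move everything 2 pixels right

# Equivariance: filtering a shifted image equals shifting the
# filtered image -- the rule is the same at every location.
assert np.allclose(local_average(shift(image)), shift(local_average(image)))
print("shift-then-filter matches filter-then-shift")
```

Trained diffusion networks are far more elaborate, but their convolutional layers give them these same two properties.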

Before this work, many in the AI community treated locality and equivariance as annoying limitations. They saw these traits as factors that prevented perfect image cloning, not as sources of originality. Creativity was often chalked up to chance or higher-level modeling decisions. Most technical notes on diffusion skipped over these quirks, focusing instead on scaling laws or loss functions. The idea that a model’s restrictions might spawn invention had not been tested. That view held firm until Mason Kamb and his advisor Surya Ganguli put it to the test.

Kamb joined Stanford in 2022 under Ganguli’s mentorship, just as ChatGPT sparked a generative AI boom. While many labs raced to build larger models, Kamb aimed to break open the black box. He proposed that the model’s patch-by-patch focus, combined with shift-equivariance, would by itself be enough to yield creative output. That became the central question of his new paper, coauthored with Ganguli. They set out to isolate those two ingredients and see whether they could reproduce the magic seen in full diffusion pipelines. The key was to test whether these rules alone drive the same mix of accuracy and surprise.

They called their testbed the equivariant local score (ELS) machine. It is a set of equations that applies only locality and shift-equivariance, with no training on images. Given a collection of noisy pictures, the ELS machine predicts how each pixel patch should be denoised. They then fed the same noisy inputs to trained diffusion models built on standard convolutional architectures such as ResNets and U-Nets. To their surprise, the ELS equations matched the trained models’ outputs with striking precision, agreeing about 90 percent of the time on average. Ganguli said that level of agreement is unheard of in machine learning. It suggested that the two simple rules could explain most of what people call creativity in these image systems.
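The paper’s exact equations are more involved, but the flavor of an equivariant local score calculation can be sketched under our own simplifying assumptions: denoise each small patch as a similarity-weighted average of all patches harvested from the training images, applying the same rule at every location (equivariance) and letting it see only the patch (locality). The function names and the Gaussian weighting below are ours, not the paper’s.

```python
import numpy as np

def extract_patches(images, k):
    """All k x k patches from every image location, wrap-around edges,
    so the same patch statistics apply at every position."""
    patches = []
    for img in images:
        H, W = img.shape
        for i in range(H):
            for j in range(W):
                rows = [(i + di) % H for di in range(k)]
                cols = [(j + dj) % W for dj in range(k)]
                patches.append(img[np.ix_(rows, cols)].ravel())
    return np.array(patches)

def denoise_patch(noisy_patch, train_patches, sigma):
    """Patch denoiser: softmax-weighted average of training patches,
    weighted by their similarity to the noisy patch."""
    d2 = np.sum((train_patches - noisy_patch.ravel()) ** 2, axis=1)
    logw = -d2 / (2 * sigma**2)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return (w[:, None] * train_patches).sum(axis=0).reshape(noisy_patch.shape)

rng = np.random.default_rng(2)
train = [rng.random((8, 8)) for _ in range(5)]   # a tiny toy "training set"
patches = extract_patches(train, k=3)

noisy = train[0] + rng.normal(0, 0.3, size=(8, 8))
restored = denoise_patch(noisy[:3, :3], patches, sigma=0.3)
print(restored.round(2))
```

A rule like this, applied at every location and noise level, blends patches drawn from different training images, which is exactly where novel combinations can slip in.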

Kamb said the result hit home instantly. “As soon as you impose locality, [creativity] was automatic; it fell out of the dynamics completely naturally,” he said. The constraints that limit each patch’s view force the system to invent plausible connections. The extra fingers that show up in some generated hands emerge from the model’s patch-by-patch focus, with no global check to catch odd assemblies.

Experts say the paper clarifies one route to novelty but leaves out other factors. For example, large language models display creative text generation yet don’t rely on strict locality or translational equivariance. Giulio Biroli welcomed the insight but noted that it won’t explain every case. “I think this is a very important part of the story,” he said, “[but] it’s not the whole story.” Many hope future work will connect these findings to creativity in other AI systems. Beyond vision, generators of music or code may hide similar local interactions worth studying.

The research offers a rare bridge between machine learning and biology. It’s almost as if neuroscientists had scanned several painters at work and found a shared circuit they could express in formulas. Some researchers think that might reflect how minds assemble images: drawing on stored fragments of memory and recombining them under simple constraints. Benjamin Hoover, who studies diffusion models at Georgia Tech and IBM Research, makes that comparison. “We assemble things based on what we experience, what we’ve dreamed, what we’ve seen, heard, or desire. AI is merely assembling the building blocks from what it’s seen and what it’s asked to do.” This view puts both human and machine creativity on common ground.

Both systems fill gaps in knowledge, assembling new designs from rough sketches of the world. When the choices align just right, they produce what we call original ideas. Minds and models alike reflect the struggle to make sense of incomplete data. The new work maps that struggle onto clear mathematical rules. That clarity may guide future designs, helping engineers adjust constraints to tune novelty or fidelity.

The paper marks a step toward clarifying how generative models work. By treating creativity as something calculable rather than mysterious, Kamb and Ganguli have set a new bar for theoretical transparency. As AI grows more complex, their equations may offer predictive power beyond brute-force scaling. Labs are already testing whether local rules influence language or audio generation. It seems the start of a more structured phase, where inventiveness emerges from the same math that shapes reproduction.
