New research sheds light on how diffusion models—used in artificial intelligence (AI) image generation—operate across hierarchical data structures, revealing unexpected transitions between high-level and low-level features during the generative process.

AI image generators don’t just create—they reorganize and rebuild, revealing hidden hierarchies in the data they process.
AI has made remarkable strides in generating lifelike images, but how these systems work under the hood remains a mystery. At the heart of these advancements lie diffusion models, algorithms capable of turning noise into intricate, high-quality visuals.
Understanding data structure is crucial in machine learning, especially when handling complex information like images. Unlike random data, real-world visuals—think photographs or paintings—contain layers of structured information. This structure allows AI systems to learn efficiently, even with limited examples, by identifying patterns and hierarchies.
Diffusion models, like those behind cutting-edge tools such as DALL·E or Stable Diffusion, learn to reverse a noising process: during training they corrupt images with noise, and at generation time they progressively remove noise to recreate images. Scientists have long suspected these models tap into the hierarchical composition of images, but the exact mechanisms were unclear. How do these systems balance low-level details, like texture, with high-level concepts, such as the identity of an object?
A team led by Antonio Sclocchi, Alessandro Favero, and Matthieu Wyart at EPFL tackled this question, using a combination of experiments and theoretical analysis to reveal how diffusion models transition between different levels of image detail and structure.
The study uncovered a “phase transition” in the generative process: at a certain noise threshold, the system abruptly shifts from preserving high-level concepts, like the image’s class, to blending low-level features into entirely new compositions.
The researchers used a process called forward-backward diffusion. First, they progressively added noise to high-resolution images and then reversed the process to regenerate the visuals. By analyzing thousands of examples from the ImageNet database, they traced how specific features—like the eyes of a leopard or the stripes of a tiger—changed during the reconstruction.
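The forward-backward protocol described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: it assumes a standard DDPM-style forward process with a linear noise schedule and uses a random array as a stand-in for an image; the backward (denoising) half, which requires a trained network, is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(x0, t, alpha_bar):
    """Noise a clean image x0 up to step t (DDPM-style forward process)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

# Toy linear noise schedule over T steps.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # fraction of the original signal remaining

x0 = rng.standard_normal((8, 8))              # stand-in for an image
x_small = forward_noise(x0, 10, alpha_bar)    # little noise: class survives
x_large = forward_noise(x0, 90, alpha_bar)    # heavy noise: past the transition

# In the experiment, a trained denoiser would now run the reverse process
# from each noisy state back to a clean image; starting from x_small the
# regenerated image keeps its class, while from x_large it typically flips.
corr_small = np.corrcoef(x0.ravel(), x_small.ravel())[0, 1]
corr_large = np.corrcoef(x0.ravel(), x_large.ravel())[0, 1]
```

The larger the noising step `t`, the less of the original image remains to anchor the reconstruction, which is what lets the researchers probe different depths of the hierarchy by varying `t`.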
To dig deeper, the team developed a synthetic model that mirrors the hierarchical structure of real-world images. This model allowed them to simulate and predict the behavior of AI systems under varying levels of noise, using mathematical techniques like belief propagation on tree-like data graphs.
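A toy version of such a tree calculation can make the idea concrete. The sketch below is an illustrative simplification, not the paper's actual synthetic model: it assumes a depth-2 binary tree whose hidden binary labels flip with probability `q` from parent to child, with leaves observed through noise `eps`, and runs exact belief propagation upward to recover the root "class".

```python
import numpy as np

def leaf_message(obs, eps):
    """Likelihood of each hidden state given a noisy observed leaf."""
    m = np.full(2, eps)
    m[obs] = 1.0 - eps
    return m

def pass_up(child_msgs, q):
    """Combine child messages through the parent-to-child transition."""
    T = np.array([[1 - q, q],      # T[a, b] = P(child = b | parent = a)
                  [q, 1 - q]])
    msg = np.ones(2)
    for cm in child_msgs:
        msg *= T @ cm              # standard sum-product update on a tree
    return msg

def root_posterior(leaves, eps, q):
    """Exact belief propagation on a depth-2 binary tree with 4 leaves."""
    lower = [pass_up([leaf_message(o, eps) for o in pair], q)
             for pair in (leaves[:2], leaves[2:])]
    root = pass_up(lower, q)
    return root / root.sum()

# With all four leaves agreeing on class 0, the root belief is near-certain
# at low observation noise and flattens toward 1/2 as the noise grows.
p_low = root_posterior([0, 0, 0, 0], eps=0.05, q=0.1)
p_high = root_posterior([0, 0, 0, 0], eps=0.45, q=0.1)
```

Running the same message passing at increasing noise levels mimics what the theory predicts for diffusion: below a noise threshold the root class is recoverable from the noisy low-level features, and above it that high-level information is lost.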
The study showed that at low noise levels, diffusion models alter minor details while retaining the broader class of the image. For example, a leopard might still look like a leopard, with only slight changes in fur patterns. But beyond a critical noise threshold, the class information collapses—an image could transform into a wolf or a butterfly, yet retain some visual features, like colors or shapes, from the original leopard.
This phenomenon suggests that diffusion models process data hierarchically, working through layers of abstraction. At early stages, they refine fine details; at later stages, they blend foundational elements to create something new, often from different categories.
The research highlights the potential of diffusion models as tools for understanding hierarchical, compositional data in machine learning. Beyond generating captivating visuals, these models could help researchers tackle broader challenges in data representation and organization. Applications range from improving image editing tools to advancing AI-driven creativity in areas like art, design, and even science.
Other Contributors
EPFL Institute of Electrical and Micro Engineering
Funding
Simons Foundation
References
Antonio Sclocchi, Alessandro Favero, Matthieu Wyart. A phase transition in diffusion models reveals the hierarchical nature of data. PNAS 122 (1) e2408799121, 02 January 2025. DOI: 10.1073/pnas.2408799121
Author: Nik Papageorgiou
Source: Basic Sciences | SB