2026-05-28 04:00 UTC · Updated: 2026-06-30 13:03 UTC

Representation-Conditioned Diffusion Models for Guided Training Data Generation

This work proposes representation-conditioned diffusion models that use learned representations from DINOv2, DINOv3, and CLIP to generate synthetic image data. On ImageNet100, classifiers trained on this data beat those trained on class-conditioned generations by 10.76 p.p. top-1 accuracy, and scaling up the synthetic dataset lets a classifier surpass one trained on real data by 2.0 p.p. The generated images also outperform classical augmentation methods, and the conditioning space can be used to filter samples, making the approach a promising way to augment or replace real datasets in large-scale visual learning.

Source: arXiv Computer Vision · Authors: Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen


[Submitted on 26 May 2026]


Abstract: Data availability remains a critical bottleneck in many deep learning applications. Large-scale datasets are often expensive to collect, curate and annotate, which can limit the scalability and applicability of supervised learning methods. In this work, we evaluate the classification performance of models trained on synthetic image datasets produced by generative deep learning. In particular, we use latent diffusion models conditioned on learned representations from DINOv2, DINOv3, and CLIP. Our results demonstrate that this representation-conditioned formulation outperforms class-conditioned generation by a large margin (+10.76 p.p. top-1 accuracy on ImageNet100), which it achieves by improving sample quality and mode coverage. Furthermore, by scaling the size of the synthetic dataset, we are able to outperform a classifier trained on the real data (+2.0 p.p. top-1 accuracy).
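The abstract does not spell out how the synthetic training set is assembled. Below is a minimal sketch, assuming that each real training image is encoded into a representation (for example a DINOv2, DINOv3, or CLIP embedding), that the diffusion model is sampled several times per representation, and that every sample inherits the class label of the image it was conditioned on. `build_synthetic_dataset`, `encode`, and `generate` are hypothetical names; the toy encoder and generator only operate on short float vectors so the sketch runs without any model weights.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def build_synthetic_dataset(
    real_data: Sequence[Tuple[Vector, int]],
    encode: Callable[[Vector], Vector],
    generate: Callable[[Vector, random.Random], Vector],
    samples_per_image: int,
    seed: int = 0,
) -> List[Tuple[Vector, int]]:
    """Generate `samples_per_image` synthetic samples per real image.

    Assumption: each sample is conditioned on the representation of one real
    image and reuses that image's class label, instead of being conditioned
    on the class label alone as in class-conditioned generation.
    """
    rng = random.Random(seed)
    synthetic: List[Tuple[Vector, int]] = []
    for image, label in real_data:
        z = encode(image)  # hypothetical frozen encoder (e.g. DINOv2 or CLIP)
        for _ in range(samples_per_image):
            synthetic.append((generate(z, rng), label))
    return synthetic


# Toy stand-ins so the sketch runs: the "encoder" is the identity and the
# "generator" perturbs the conditioning vector with Gaussian noise.
def toy_encode(image: Vector) -> Vector:
    return list(image)


def toy_generate(z: Vector, rng: random.Random) -> Vector:
    return [v + rng.gauss(0.0, 0.1) for v in z]


if __name__ == "__main__":
    real = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
    data = build_synthetic_dataset(real, toy_encode, toy_generate, samples_per_image=3)
    print(len(data), [label for _, label in data])  # 6 [0, 0, 0, 1, 1, 1]
```

In this sketch, raising `samples_per_image` is one way to scale the synthetic dataset, the knob the abstract credits with surpassing the real-data classifier.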

We also demonstrate how generated images can be used for augmentation, where they outperform classical augmentation methods, and how the conditioning space can be used for sample filtering to further improve their training value. Collectively, these findings highlight that representation-conditioned diffusion models provide a promising approach for augmenting, complementing, or potentially replacing real-world datasets in large-scale visual learning tasks.
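The abstract says the conditioning space can be used for sample filtering but does not give the criterion. One plausible reading, sketched here as an assumption rather than the authors' rule: re-encode each generated image with the same encoder, measure the cosine similarity between that embedding and the representation it was conditioned on, and keep only samples above a threshold. The function names and the 0.9 threshold are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
# (conditioning representation z, embedding of the generated image z_hat, label).
# In a real pipeline the generated image itself would travel with these fields.
Sample = Tuple[Vector, Vector, int]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_in_conditioning_space(
    samples: Sequence[Sample], threshold: float
) -> List[Sample]:
    """Keep samples whose re-encoded embedding stays close to its conditioning.

    Assumption: a low cos(z, z_hat) marks a sample that drifted away from the
    representation it was generated from and is less useful for training.
    """
    return [s for s in samples if cosine_similarity(s[0], s[1]) >= threshold]


if __name__ == "__main__":
    samples = [
        ([1.0, 0.0], [1.0, 0.0], 0),  # faithful sample, cos = 1.0
        ([1.0, 0.0], [0.0, 1.0], 1),  # drifted sample, cos = 0.0
        ([1.0, 0.0], [0.9, 0.1], 0),  # close sample, cos ~ 0.994
    ]
    kept = filter_in_conditioning_space(samples, threshold=0.9)
    print([label for _, _, label in kept])  # [0, 0]
```

A stricter threshold keeps fewer but more faithful samples; under this assumed criterion, the value would need to be tuned against downstream classification accuracy.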

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cite as: arXiv:2605.27495 [cs.CV]

(or arXiv:2605.27495v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2605.27495

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Nithesh Chandher Karthikeyan

[v1] Tue, 26 May 2026 17:32:50 UTC (299 KB)


