Instance-Level Recognition and Generation Workshop at ICCV'25

Workshop Location: Honolulu, Hawaii

Oct. 19, 8:30am-12:30pm (HST)


The main focus of our workshop is on computer vision tasks that operate at the instance level, covering both recognition (instance-level recognition, ILR) and generation (instance-level generation, ILG), jointly denoted ILR+G. More precisely, ILR+G is the task of identifying, comparing, and generating images of specific objects, scenes, or events.
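
As an illustration of the recognition side, instance-level retrieval is commonly cast as nearest-neighbor search over global image embeddings: a query photo of a specific object is ranked against a database by descriptor similarity. The sketch below shows this under minimal assumptions; `embed` is a hypothetical placeholder for any pretrained image encoder, not a method from any workshop paper.

    import numpy as np

    def embed(image) -> np.ndarray:
        # Hypothetical placeholder: substitute any pretrained encoder that
        # maps an image to a D-dimensional global descriptor.
        raise NotImplementedError

    def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
        # Scale descriptors to unit length so a dot product equals
        # cosine similarity.
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

    def rank_database(query_vec: np.ndarray, db_vecs: np.ndarray) -> np.ndarray:
        # query_vec: (D,), db_vecs: (N, D). Returns database indices
        # sorted by descending cosine similarity to the query.
        sims = l2_normalize(db_vecs) @ l2_normalize(query_vec)
        return np.argsort(-sims)

In practice such a ranking is usually refined with re-ranking or geometric verification, but nearest-neighbor search over embeddings is the core of instance-level matching.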

This year, we expand the scope of our workshop to ILG and the potential synergy between ILG and ILR. We organize a call for papers and host keynote talks by renowned speakers, as well as invited talks on papers from the main conference.

The 2025 Instance-Level Recognition and Generation (ILR+G) Workshop follows six successful previous editions: the first two focused solely on landmark recognition (CVPRW18, CVPRW19), the next two expanded to the domains of artworks and products (ECCVW20, ICCVW21), the fifth introduced the universal image embedding problem (ECCVW22), and the latest introduced a call for papers (ECCVW24).

Workshop Schedule

Welcome Remarks

Oct. 19, 8:30am-8:40am (HST)

Keynote 1

Sara Beery
What do we really need from an instance model?

Oct. 19, 8:40am-9:10am (HST)

Keynote 2

Matej Kristan
Robust general visual object tracking in the presence of distractors

Oct. 19, 9:35am-10:05am (HST)

Poster Session & Coffee Break

Oct. 19, 10:05am-11:15am (HST)

Keynote 3

Mark Boss
No Two Alike: Generation and editing, Instance by Instance

Oct. 19, 11:55am-12:25pm (HST)

Closing Remarks

Oct. 19, 12:25pm-12:30pm (HST)

Keynote Speakers

Sara Beery

Assistant Professor at MIT CSAIL

What do we really need from an instance model?

Matej Kristan

Full Professor at the University of Ljubljana

Robust general visual object tracking in the presence of distractors

Mark Boss

Co-Head of 3D & Image at Stability AI

No Two Alike: Generation and editing, Instance by Instance


Accepted Papers

Long Papers

  • Motion-Refined DINOSAUR for Unsupervised Multi-Object Discovery (Oral)
    Xinrui Gong, Oliver Hahn, Christoph Reich, Krishnakant Singh, Simone Schaub-Meyer, Daniel Cremers, Stefan Roth
  • Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation (Oral)
    Volodymyr Havrylov, Haiwen Huang, Dan Zhang, Andreas Geiger
Short Papers

  • INST-AP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
    Ashutosh Kumar, Quan Kong, Jingjing Pan, Rajat Saini, Mustafa Erdogan, Betty Le Dem, Norimasa Kobori
  • SOS: Synthetic Object Segments Improve Detection, Segmentation, and Grounding
    Weikai Huang, Jieyu Zhang, Taoyang Jia, Chenhao Zheng, Ziqi Gao, Jae Sung, Ranjay Krishna
  • Towards Agentic AI for Multimodal-Guided Video Object Segmentation
    Tuyen Tran, Thao Le, Truyen Tran
Invited Papers

  • ObjectMate: A Recurrence Prior for Object Insertion and Subject‑Driven Generation (Oral)
    Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen
  • PanSt3R: Multi-view Consistent Panoptic Segmentation (Oral)
    Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii, Jerome Revaud, Gabriela Csurka
  • DiffSim: Taming Diffusion Models for Evaluating Visual Similarity (Oral)
    Yiren Song, Xiaokang Liu, Mike Zheng Shou
  • Steering Guidance for Personalized Text-to-Image Diffusion Models
    Sunghyun Park, Seokeon Choi, Hyoungwoo Park, Sungrack Yun
  • Generalizable Object Re-Identification via Visual In-Context Prompting
    Zhizhong Huang, Xiaoming Liu
  • Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
    Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos, Yuta Nakashima, Giorgos Tolias, Noa Garcia
  • Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation
    Guanyi Qin, Ziyue Wang, Daiyun Shen, Haofeng Liu, Hantao Zhou, Junde Wu, Runze Hu, Yueming Jin
  • AstroLoc: Robust Space to Ground Image Localizer
    Gabriele Berton, Alex Stoken, Carlo Masone
  • What Holds Back Open-Vocabulary Segmentation?
    Josip Šarić, Ivan Martinović, Matej Kristan, Siniša Šegvić
  • Infusing fine-grained visual knowledge to Vision-Language Models
    Nikolaos-Antonios Ypsilantis, Kaifeng Chen, André Araujo, Ondřej Chum
  • ILIAS: Instance-Level Image retrieval At Scale
    Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko, Pavel Šuma, Nikolaos-Antonios Ypsilantis, Nikos Efthymiadis, Zakaria Laskar, Jiří Matas, Ondřej Chum, Giorgos Tolias
  • Personalized Representation from Personalized Generation
    Shobhita Sundaram, Julia Chae, Yonglong Tian, Sara Beery, Phillip Isola
Call For Papers

We call for novel and unpublished work in the format of long papers (up to 8 pages) and short papers (up to 4 pages). Papers should follow the ICCV proceedings style and will be reviewed in a double-blind fashion. Selected long papers will be invited for oral presentations; all accepted papers will be presented as posters. Only long papers will be published in the ICCV workshop proceedings. All submissions will be handled electronically via the CMT conference submission website.

Topics of interest include:

  • instance-level object classification, detection, segmentation, and pose estimation
  • particular object (instance-level) and event retrieval
  • personalized (instance-level) image and video generation
  • cross-modal/multi-modal recognition at the instance level
  • image matching, place recognition, video tracking
  • other ILR+G applications or challenges
  • ILR+G datasets and benchmarking

The task of person and vehicle re-identification clearly falls within our definition of ILR. Nevertheless, because of its social implications, we intentionally omit it from the list of topics.

Important Dates

  • Submission deadline: June 28, 2025 (23:59 AoE), extended from June 21 and June 26, 2025
  • Paper notification: July 10, 2025 (23:59 AoE)
  • Camera-ready deadline: August 17, 2025 (23:59 AoE)

Questions? Please reach out to us at ilr-workshop@googlegroups.com.

Organizers

  • André Araujo, Google DeepMind
  • Bingyi Cao, Google DeepMind
  • Kaifeng Chen, Google DeepMind
  • Ondřej Chum, Czech Technical University
  • Noa Garcia, Osaka University
  • Guangxing Han, Google DeepMind
  • Giorgos Kordopatis-Zilos, Czech Technical University (Primary Contact)
  • Giorgos Tolias, Czech Technical University
  • Hao Yang, Amazon
  • Nikolaos-Antonios Ypsilantis, Czech Technical University
  • Xu Zhang, Amazon

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.

© 2025 ILR+G

We thank Jalpc for the Jekyll template.