Instance-Level Recognition and Generation Workshop at ICCV'25

Workshop Location: Honolulu, Hawaii

Oct. 19, 8:30am-12:30pm (HST)


The main focus of our workshop is on computer vision tasks that operate at the instance level, covering both recognition (instance-level recognition, ILR) and generation (instance-level generation, ILG), jointly denoted ILR+G. More precisely, ILR+G is the task of identifying, comparing, and generating images of specific objects, scenes, or events.
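
As an illustration of the recognition side, instance-level retrieval is commonly cast as nearest-neighbor search over global image embeddings: a query photo of a specific object is ranked against a database by descriptor similarity. The sketch below shows this under minimal assumptions; `embed` is a hypothetical placeholder for any pretrained image encoder, not a method from any workshop paper.

    import numpy as np

    def embed(image) -> np.ndarray:
        # Hypothetical placeholder: substitute any pretrained encoder that
        # maps an image to a D-dimensional global descriptor.
        raise NotImplementedError

    def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
        # Scale descriptors to unit length so a dot product equals
        # cosine similarity.
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

    def rank_database(query_vec: np.ndarray, db_vecs: np.ndarray) -> np.ndarray:
        # query_vec: (D,), db_vecs: (N, D). Returns database indices
        # sorted by descending cosine similarity to the query.
        sims = l2_normalize(db_vecs) @ l2_normalize(query_vec)
        return np.argsort(-sims)

In practice such a ranking is usually refined with re-ranking or geometric verification, but nearest-neighbor search over embeddings is the core of instance-level matching.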

This year, we expand the scope of our workshop to ILG and the potential synergy between ILG and ILR. We organize a call for papers and host keynote talks by renowned speakers, as well as invited talks on papers from the main conference.

The 2025 Instance-Level Recognition and Generation (ILR+G) Workshop follows six successful previous editions: the first two focused solely on landmark recognition (CVPRW18, CVPRW19), the next two expanded to the domains of artworks and products (ECCVW20, ICCVW21), the fifth introduced the universal image embedding problem (ECCVW22), and the latest introduced a call for papers (ECCVW24).

Workshop Schedule

Welcome Remarks

Oct. 19, 8:30am-8:40am (HST)

Keynote 1

Sara Beery
What do we really need from an instance model?

Oct. 19, 8:40am-9:10am (HST)

Keynote 2

Matej Kristan
Robust general visual object tracking in the presence of distractors

Oct. 19, 9:35am-10:05am (HST)

Poster Session & Coffee Break

Oct. 19, 10:05am-11:15am (HST)

Keynote 3

Mark Boss
No Two Alike: Generation and editing, Instance by Instance

Oct. 19, 11:55am-12:25pm (HST)

Closing Remarks

Oct. 19, 12:25pm-12:30pm (HST)

Keynote Speakers

Sara Beery

Assistant Professor at MIT CSAIL

What do we really need from an instance model?

Matej Kristan

Full Professor at the University of Ljubljana

Robust general visual object tracking in the presence of distractors

Mark Boss

Co-Head of 3D & Image at Stability AI

No Two Alike: Generation and editing, Instance by Instance


Accepted Papers

Long Papers

  • Motion-Refined DINOSAUR for Unsupervised Multi-Object Discovery (Oral)
    Xinrui Gong, Oliver Hahn, Christoph Reich, Krishnakant Singh, Simone Schaub-Meyer, Daniel Cremers, Stefan Roth
  • Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation (Oral)
    Volodymyr Havrylov, Haiwen Huang, Dan Zhang, Andreas Geiger
Short Papers

  • INST-AP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
    Ashutosh Kumar, Quan Kong, Jingjing Pan, Rajat Saini, Mustafa Erdogan, Betty Le Dem, Norimasa Kobori
  • SOS: Synthetic Object Segments Improve Detection, Segmentation, and Grounding
    Weikai Huang, Jieyu Zhang, Taoyang Jia, Chenhao Zheng, Ziqi Gao, Jae Sung, Ranjay Krishna
  • Towards Agentic AI for Multimodal-Guided Video Object Segmentation
    Tuyen Tran, Thao Le, Truyen Tran
Invited Papers

  • ObjectMate: A Recurrence Prior for Object Insertion and Subject‑Driven Generation (Oral)
    Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen
  • PanSt3R: Multi-view Consistent Panoptic Segmentation (Oral)
    Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii, Jerome Revaud, Gabriela Csurka
  • DiffSim: Taming Diffusion Models for Evaluating Visual Similarity (Oral)
    Yiren Song, Xiaokang Liu, Mike Zheng Shou
  • Steering Guidance for Personalized Text-to-Image Diffusion Models
    Sunghyun Park, Seokeon Choi, Hyoungwoo Park, Sungrack Yun
  • Generalizable Object Re-Identification via Visual In-Context Prompting
    Zhizhong Huang, Xiaoming Liu
  • Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
    Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos, Yuta Nakashima, Giorgos Tolias, Noa Garcia
  • Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation
    Guanyi Qin, Ziyue Wang, Daiyun Shen, Haofeng Liu, Hantao Zhou, Junde Wu, Runze Hu, Yueming Jin
  • AstroLoc: Robust Space to Ground Image Localizer
    Gabriele Berton, Alex Stoken, Carlo Masone
  • What Holds Back Open-Vocabulary Segmentation?
    Josip Šarić, Ivan Martinović, Matej Kristan, Siniša Šegvić
  • Infusing fine-grained visual knowledge to Vision-Language Models
    Nikolaos-Antonios Ypsilantis, Kaifeng Chen, André Araujo, Ondřej Chum
  • ILIAS: Instance-Level Image retrieval At Scale
    Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko, Pavel Šuma, Nikolaos-Antonios Ypsilantis, Nikos Efthymiadis, Zakaria Laskar, Jiří Matas, Ondřej Chum, Giorgos Tolias
  • Personalized Representation from Personalized Generation
    Shobhita Sundaram, Julia Chae, Yonglong Tian, Sara Beery, Phillip Isola
Call For Papers

We call for novel and unpublished work in the format of long papers (up to 8 pages) and short papers (up to 4 pages). Papers should follow the ICCV proceedings style and will be reviewed in a double-blind fashion. Selected long papers will be invited for oral presentations; all accepted papers will be presented as posters. Only long papers will be published in the ICCV workshop proceedings. All submissions will be handled electronically via the CMT conference submission website.

Topics of interest include:

  • instance-level object classification, detection, segmentation, and pose estimation
  • particular object (instance-level) and event retrieval
  • personalized (instance-level) image and video generation
  • cross-modal/multi-modal recognition at the instance level
  • image matching, place recognition, video tracking
  • other ILR+G applications or challenges
  • ILR+G datasets and benchmarking

The task of person and vehicle re-identification clearly falls within our definition of ILR. Nevertheless, because of its social implications, we intentionally omit it from the list of topics.

Important Dates

  • Submission deadline: June 28, 2025 (23:59 AoE), extended from June 21 and June 26, 2025
  • Paper notification: July 10, 2025 (23:59 AoE)
  • Camera-ready deadline: August 17, 2025 (23:59 AoE)

Questions? Please reach out to us at ilr-workshop@googlegroups.com.

Organizers

  • André Araujo, Google DeepMind
  • Bingyi Cao, Google DeepMind
  • Kaifeng Chen, Google DeepMind
  • Ondřej Chum, Czech Technical University
  • Noa Garcia, Osaka University
  • Guangxing Han, Google DeepMind
  • Giorgos Kordopatis-Zilos, Czech Technical University (Primary Contact)
  • Giorgos Tolias, Czech Technical University
  • Hao Yang, Amazon
  • Nikolaos-Antonios Ypsilantis, Czech Technical University
  • Xu Zhang, Amazon

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.

© 2025 ILR+G

We thank Jalpc for the Jekyll template.