Instance-Level Recognition Workshop at ECCV'22


Visual instance-level recognition and retrieval are fundamental tasks in computer vision. Despite recent advances in this field, many techniques have been evaluated on a limited number of domains with a small number of classes. We believe the research community can benefit from a new suite of datasets and associated challenges, both to improve understanding of the limitations of current technology and to provide an opportunity to introduce new techniques.

This year, we propose the first Universal Image Embedding Challenge, where the goal is to develop image representations that work well across several domains combined.

The Instance-Level Recognition (ILR) Workshop is a follow-up to four successful editions of our previous workshops — the first two focused only on landmark recognition (CVPRW18, CVPRW19), and the latest two expanded to two extra domains, artworks and products (ECCVW20, ICCVW21).

Our workshop location is Grand Ballroom B at the David InterContinental hotel.

Workshop Topics

Universal Image Embedding Challenge

Large-scale evaluation of universal embedding models. The data comprises imagery from several domains, e.g., landmarks, artworks, cars, furniture, apparel, toys and storefronts.
Challenge Website
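To make the evaluation setting concrete: a universal embedding model maps every image, regardless of domain, into a single vector space, and retrieval is then performed by nearest-neighbor search in that space. The minimal sketch below (illustrative only; the function name and toy data are our own, not part of the challenge protocol) shows cosine-similarity retrieval over a small set of precomputed embeddings.

```python
import numpy as np

def retrieve(query_emb, index_embs, k=5):
    # Normalize all vectors so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    idx = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = idx @ q
    # Return the indices of the k most similar index images, best first.
    return np.argsort(-sims)[:k]

# Toy example: four "index" images with 3-dim embeddings (real embeddings
# would come from a trained model and be much higher-dimensional).
index = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(retrieve(query, index, k=2))  # two nearest index images
```

In a cross-domain benchmark such as this challenge, the index would mix landmarks, products, artworks, etc., and the same embedding model must rank the correct instance highly in every domain.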

Language Assisted Product Retrieval Challenge

Multi-modal product recognition using a product image and user textual feedback to find the product instance fulfilling the user’s request.
Challenge Website

Workshop Schedule

Welcome Remarks

Andre Araujo
Oct. 24th, 9:00am-9:10am (GMT+3)

Keynote talk

Vicente Ordonez, Rice University
(Hosted by Tobias Weyand) Oct. 24th, 9:10am-9:40am (GMT+3)

Universal Embedding Challenge: Overview and Presentations

Bingyi Cao
Oct. 24th, 9:40am-10:10am (GMT+3)

Keynote talk

Minsu Cho, POSTECH
(Hosted by Bohyung Han) Oct. 24th, 10:10am-10:40am (GMT+3)


Oct. 24th, 10:40am-11:10am (GMT+3)

Language Assisted Product Search: Overview

Xu Zhang
Oct. 24th, 11:10am-11:30am (GMT+3)

Invited paper talk - "Granularity-aware Adaptation for Image Retrieval over Multiple Tasks"

Jon Almazan, Byungsoo Ko, Geonmo Gu, Diane Larlus and Yannis Kalantidis
(Hosted by Torsten Sattler) Oct. 24th, 11:30am-11:40am (GMT+3)

Invited paper talk - "Where in the World is this Image? Transformer-based Geo-localization in the Wild"

Shraman Pramanick, Ewa Nowara, Joshua Gleason, Carlos Castillo and Rama Chellappa
(Hosted by Torsten Sattler) Oct. 24th, 11:40am-11:50am (GMT+3)

Invited paper talk - "What to Hide from Your Students: Attention-Guided Masked Image Modeling"

Ioannis Kakogeorgiou, Spyros Gidaris, Bill Psomas, Yannis Avrithis, Andrei Bursuc, Konstantinos Karantzalos and Nikos Komodakis
(Hosted by Torsten Sattler) Oct. 24th, 11:50am-12:00pm (GMT+3)

Keynote talk

Mathilde Caron, Google Research
(Hosted by Giorgos Tolias) Oct. 24th, 12:00pm-12:30pm (GMT+3)

Closing Remarks

Andre Araujo
Oct. 24th, 12:30pm-12:35pm (GMT+3)

Invited Speakers

Minsu Cho

Associate Professor of CSE & AI, POSTECH

Few-Shot Learning for Object-Aware Visual Recognition

Few-shot learning has been actively studied for visual recognition tasks such as image classification and semantic segmentation. Existing methods, however, are limited in understanding diverse levels of visual cues and in analyzing fine-grained correspondence relations between the query and the support images. This prevents few-shot learning from generalizing to, and being evaluated on, more realistic cases in the wild. In this talk, I will introduce our recent work on object-aware few-shot learning that tackles the challenge by leveraging multi-level feature correlations and high-order convolution/self-attention. I will also present an integrative few-shot learning framework that combines two conventional few-shot learning problems, few-shot classification and segmentation, and generalizes them to more realistic episodes with arbitrary image pairs, where each target class may or may not be present in the query. In experiments, the proposed method shows promising performance on the joint problem of classification and segmentation and also achieves the state of the art on standard few-shot segmentation benchmarks.

Vicente Ordonez

Associate Professor of Computer Science, Rice University

Searching for Images: Granularity, Compositionality, Interpretability, and Multimodality

Visual representation learning has made great progress in recent years. We can reuse models that map images to a common semantic space with a high degree of accuracy; however, these representations alone are not enough for searching images among large collections. When users search for images in their personal image collections, on an e-commerce site, or on the web, they generally search with very different intentions. I will outline some desirable properties of retrieval models and present some of our work on instance-level image retrieval with Reranking Transformers (RRTs), iterative and interpretable retrieval using a Drill-down approach, and our more recent work on multilingual and multimodal learning.

Mathilde Caron

Research Scientist at Google Research

Instance-level recognition for self-supervised learning, and vice versa

Self-supervised learning (SSL) consists of training neural network systems without using any human annotations. In this talk, I will first present how instance-level recognition has inspired the recent successful approaches in SSL. Second, I will show how self-supervised models can give back to the instance-level recognition community by providing features better suited to tackling challenging image retrieval benchmarks than their class-level supervised counterparts.


Organizers

Andre Araujo

Google Research (Primary Contact)

Bingyi Cao

Google Research

Ondrej Chum

Czech Technical University

Noa Garcia

Osaka University

Bohyung Han

Seoul National University

Shih-Fu Chang

Columbia University

Guangxing Han

Columbia University

Pradeep Natarajan

Amazon Alexa

Giorgos Tolias

Czech Technical University

Tobias Weyand

Google Research

Xu Zhang

Amazon Alexa

Torsten Sattler

Czech Technical University

Sanqiang Zhao

Amazon Alexa

© 2022 ILR2022

We thank Jalpc for the Jekyll template.