International Challenge on Compositional and Multimodal Perception

International Conference on Computer Vision 2021 Workshop

ICCV2021 Competition

There are two challenge tasks in the CAMP2021 competition.

Sample dataloader code

You can use the sample scripts to prepare for the competition.

Dataloader (hierarchical_homage_single_sample.zip)

Important dates

Evaluation page open: September 8, 2021

Evaluation page close: October 7, 2021

Report submission deadline: October 12, 2021

Workshop: October 17, 2021

Challenge #1: Scene-graph Generation

About

We use scene graphs to describe the relationships between a person and the objects used during the execution of an action. In this track, the algorithms need to predict per-frame scene graphs, including how they change as the video progresses. Participants are also allowed to leverage audio information. External datasets for pre-training are allowed, but their use must be clearly documented. Since there can be multiple relationships between each human-object pair, there is no graph constraint (or single-relationship constraint).


Evaluation Metric

For the evaluation of scene graph prediction, we use Scene Graph Classification (SGCLS). The task is to predict object categories and the predicate labels between the person and each object. Participants can use the video, other modalities, and the ground-truth boxes as input. The evaluation metric is recall@k: we compute the fraction of ground-truth relationship triplets that appear among the top k most confident relationship predictions in each tested frame. We will use k = 10 and k = 20.
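
As an illustration, below is a minimal sketch of how per-frame recall@k could be computed under these rules (no graph constraint, triplets ranked by confidence). The function and variable names, and the (subject, predicate, object) triplet format, are our assumptions for this sketch; the official definition is the one in evaluation_code_task1.zip.

# Minimal sketch of per-frame recall@k for SGCLS (no graph constraint).
# Triplet format (subject, predicate, object) is an assumption, not the
# official submission format.

def recall_at_k(pred_triplets, pred_scores, gt_triplets, k):
    """pred_triplets: predicted (subject, predicate, object) tuples for one frame.
    pred_scores: one confidence score per prediction.
    gt_triplets: ground-truth triplets for the same frame."""
    if not gt_triplets:
        return None  # frame has no annotated relationships
    # Rank predictions by confidence and keep the top k.
    order = sorted(range(len(pred_triplets)), key=lambda i: pred_scores[i], reverse=True)
    top_k = {pred_triplets[i] for i in order[:k]}
    # Fraction of ground-truth triplets recovered among the top-k predictions.
    hits = sum(1 for t in gt_triplets if t in top_k)
    return hits / len(gt_triplets)

def mean_recall_at_k(per_frame_preds, per_frame_gts, k=20):
    """Average recall@k over all tested frames that have ground-truth triplets."""
    recalls = []
    for (triplets, scores), gts in zip(per_frame_preds, per_frame_gts):
        r = recall_at_k(triplets, scores, gts, k)
        if r is not None:
            recalls.append(r)
    return sum(recalls) / len(recalls) if recalls else 0.0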

Evaluation code and submission format

Here is the evaluation code for Task 1.

evaluation_code_task1.zip

Challenge #2: Privacy-Concerned Activity Recognition

About

Privacy-sensitive recognition methods are very important for practical applications. This task is video-level activity recognition, but with one special condition: the input videos are blurred, as if recorded with a multi-pinhole camera. When training models, participants may use the unblurred images, but the test data contains only blurred images. External datasets for pre-training are allowed, but their use must be clearly documented.
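
As a rough illustration of this setup, the sketch below degrades unblurred training frames with a synthetic Gaussian blur as a crude stand-in for the multi-pinhole degradation. This is only an assumption about one possible way to exploit the unblurred training images, not part of the official pipeline; the test videos are already blurred, so no extra blur is applied there.

# Minimal sketch: synthetic blur on training frames (assumption), none at test time.
import torchvision.transforms as T

train_transform = T.Compose([
    T.Resize((224, 224)),
    # Gaussian blur as a crude stand-in for the multi-pinhole blur (assumption).
    T.GaussianBlur(kernel_size=21, sigma=(3.0, 8.0)),
    T.ToTensor(),
])

test_transform = T.Compose([
    # Test frames come pre-blurred from the dataset; no additional blur.
    T.Resize((224, 224)),
    T.ToTensor(),
])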


Evaluation Metric

This task is video understanding without clear video images. Participants predict k activity labels for each video, and each predicted candidate label is checked for consistency with the ground-truth activity label to compute the score. We will use k=1 and k=5, and evaluate recognition methods by the average of these two scores.
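
As an illustration, below is a minimal sketch of this scoring, assuming each submission provides a ranked label list per video and each video has a single ground-truth activity label. The names are our assumptions for this sketch; the official definition is the one in evaluation_code_task2.zip.

# Minimal sketch of the Task 2 score: average of top-1 and top-5 accuracy.
# Ranked label lists and a single ground-truth label per video are assumptions.

def top_k_accuracy(ranked_labels_per_video, gt_labels, k):
    """ranked_labels_per_video: one label list per video, most confident first.
    gt_labels: one ground-truth activity label per video."""
    correct = sum(1 for ranked, gt in zip(ranked_labels_per_video, gt_labels)
                  if gt in ranked[:k])
    return correct / len(gt_labels)

def task2_score(ranked_labels_per_video, gt_labels):
    # Final score: average of the k=1 and k=5 accuracies.
    top1 = top_k_accuracy(ranked_labels_per_video, gt_labels, k=1)
    top5 = top_k_accuracy(ranked_labels_per_video, gt_labels, k=5)
    return (top1 + top5) / 2.0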

Evaluation code and submission format

Here is the evaluation code for Task 2.

evaluation_code_task2.zip