Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection

Rui Dai; Srijan Das; Francois F Bremond

Conference paper, Year: 2021



Rui Dai

Role: Author
PersonId : 1057956

Université Côte d'Azur

Spatio-Temporal Activity Recognition Systems

Srijan Das

Role: Author

Stony Brook University [SUNY]

Francois F Bremond

Role: Author
PersonId : 20805
IdHAL : francois-bremond
ORCID : 0000-0003-2988-2142
IdRef : 138919046

Université Côte d'Azur

Spatio-Temporal Activity Recognition Systems

Abstract

In video understanding, most cross-modal knowledge distillation (KD) methods are tailored to classification tasks and focus on learning discriminative representations of trimmed videos. Action detection, however, requires not only categorizing actions but also localizing them in untrimmed videos. Transferring knowledge about temporal relations is therefore critical for this task, yet it is missing from previous cross-modal KD frameworks. To this end, we aim to learn an augmented RGB representation for action detection that takes advantage of additional modalities at training time through KD. We propose a KD framework consisting of two levels of distillation. On the one hand, atomic-level distillation encourages the RGB student to learn the sub-representation of the actions from the teacher in a contrastive manner. On the other hand, sequence-level distillation encourages the student to learn temporal knowledge from the teacher by transferring the Global Contextual Relations and the Action Boundary Saliency. The result is an Augmented-RGB stream that achieves performance competitive with the two-stream network while using only RGB at inference time. Extensive experimental analysis shows that our proposed distillation framework is generic and outperforms other popular cross-modal distillation methods on the action detection task.
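To make the two distillation levels concrete, here is a minimal illustrative sketch in plain Python. It is not the authors' implementation. The abstract only states that the atomic level is trained "in a contrastive manner" and that the sequence level transfers temporal relations, so the specific losses below are assumptions: an InfoNCE-style loss stands in for the atomic level, and matching pairwise temporal similarity matrices stands in for the Global Contextual Relations. The function names, the `temperature` parameter, and the toy features are hypothetical, and Action Boundary Saliency is not modeled.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def contrastive_distillation_loss(student, teacher, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    student[i] and teacher[i] are features of the same video segment.
    Each student feature is pulled toward its own teacher feature and
    pushed away from the teacher features of all other segments.
    """
    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]
    total = 0.0
    for i, s_i in enumerate(s):
        logits = [dot(s_i, t_j) / temperature for t_j in t]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # cross-entropy with target index i
    return total / len(s)


def temporal_relation_loss(student, teacher):
    """Mean squared gap between student and teacher similarity matrices.

    Entry (i, j) of each matrix is the cosine similarity between segments
    i and j of the same sequence. This is an assumed stand-in for
    transferring relations across time steps.
    """
    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]
    n = len(s)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = dot(s[i], s[j]) - dot(t[i], t[j])
            total += diff * diff
    return total / (n * n)


# Toy example: two segments with 2-D features (hypothetical values).
rgb_student = [[1.0, 0.0], [0.0, 1.0]]
teacher_aligned = [[1.0, 0.0], [0.0, 1.0]]
teacher_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = contrastive_distillation_loss(rgb_student, teacher_aligned)
loss_swapped = contrastive_distillation_loss(rgb_student, teacher_swapped)
relation_gap = temporal_relation_loss(rgb_student, [[1.0, 0.0], [1.0, 0.0]])
```

In this toy setting the contrastive loss is small when each student segment matches its own teacher segment and large when the pairing is swapped, while the relation loss is zero only when both streams agree on how segments relate to one another over time. In a setup like the one the abstract describes, such terms would be used only during training, so inference would still need RGB alone.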

Keywords

Action Detection; Video Understanding; Transfer Learning; OPAL-Meso

Domains

Computer Science [cs] / Computer Vision and Pattern Recognition [cs.CV]

Main file

Dai_ICCV21.pdf (1.75 MB)

Origin: Files produced by the author(s)


https://hal.science/hal-03314575

Submitted on: Thursday, August 5, 2021, 10:22:35

Last modified on: Monday, February 26, 2024, 11:22:14

Long-term archiving on: Saturday, November 6, 2021, 18:13:30

Dates and versions

hal-03314575 , version 1 (05-08-2021)

Identifiers

HAL Id : hal-03314575 , version 1

Cite

Rui Dai, Srijan Das, Francois F Bremond. Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection. ICCV 2021 - IEEE/CVF International Conference on Computer Vision, Oct 2021, Montreal, Canada. ⟨hal-03314575⟩


Collections

INRIA INRIA2 UNIV-COTEDAZUR OPAL 3IA-COTEDAZUR ANR

144 views

122 downloads
