Relative Positional Encoding for Transformers with Linear Complexity

Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. Meanwhile, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists of exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what such methods avoid. In this paper, we bridge this gap and present Stochastic Positional Encoding (SPE) as a way to generate positional encodings (PE) that can be used as a replacement for the classical additive (sinusoidal) PE and that provably behave like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
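The key property the abstract relies on, positional codes whose inner products depend only on the lag m − n and can be used without forming the N × N attention matrix, can be illustrated with random Fourier features. The sketch below is a toy illustration under stated assumptions and does not reproduce the authors' SPE construction: it encodes positions only (no queries, keys, or learned parameters), assumes a Gaussian spectral density so the expected kernel is exp(−(m − n)² / (2ℓ²)), and uses a hypothetical helper name, `random_fourier_positional_codes`. It checks three things: the resulting matrix is shift-invariant (Toeplitz), it approximates a stationary covariance of the lag, and the product with values can be computed in linear time by associativity.

```python
import numpy as np


def random_fourier_positional_codes(positions, num_features, lengthscale, rng):
    """Hypothetical helper: map positions to random Fourier features (toy illustration).

    Assumption: frequencies are drawn from a Gaussian spectral density, so the
    expected inner product between the codes of positions m and n is the
    Gaussian (RBF) kernel exp(-(m - n)**2 / (2 * lengthscale**2)).
    """
    omega = rng.normal(0.0, 1.0 / lengthscale, size=num_features)  # frequencies shared by all positions
    angles = np.outer(positions, omega)                              # shape (N, R)
    # cos(a)cos(b) + sin(a)sin(b) = cos(a - b): each inner product depends on m - n only.
    codes = np.concatenate([np.cos(angles), np.sin(angles)], axis=1)
    return codes / np.sqrt(num_features)


rng = np.random.default_rng(0)
n, lengthscale = 32, 5.0
positions = np.arange(n, dtype=float)
phi = random_fourier_positional_codes(positions, num_features=20_000,
                                      lengthscale=lengthscale, rng=rng)

# Explicit N x N positional matrix, built here only so it can be inspected.
gram = phi @ phi.T

# 1) Relative behavior: entries depend only on the lag, so shifting both indices changes nothing.
shift_invariant = np.allclose(gram[3:, 3:], gram[:-3, :-3])

# 2) Monte Carlo approximation of a stationary covariance function of the lag.
lags = positions[:, None] - positions[None, :]
target = np.exp(-lags**2 / (2 * lengthscale**2))
max_kernel_error = np.abs(gram - target).max()

# 3) Linear complexity: phi @ (phi.T @ v) never forms the N x N matrix,
#    yet equals the explicit product by associativity.
v = rng.normal(size=(n, 4))
explicit = gram @ v
linear = phi @ (phi.T @ v)

print(shift_invariant, max_kernel_error < 0.05, np.allclose(explicit, linear))
```

The shift invariance holds exactly (up to floating-point error) for any draw of the frequencies, because all positions share the same random frequencies; only the match to the target kernel is approximate and improves as `num_features` grows.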

Domains

Artificial Intelligence [cs.AI], Computation and Language [cs.CL], Machine Learning [cs.LG], Sound [cs.SD], Signal and Image Processing [eess.SP]

Main file

spe.pdf (7.12 MB)

Origin: Files produced by the author(s)

Contributor: Ondřej Cífka

https://telecom-paris.hal.science/hal-03256451

Submitted on: Thursday, June 10, 2021, 11:42:27

Last modified on: Friday, April 19, 2024, 16:18:56

Long-term archiving on: Saturday, September 11, 2021, 18:34:18

Dates and versions

hal-03256451, version 1 (June 10, 2021)

Identifiers

HAL Id: hal-03256451, version 1
arXiv: 2105.08399

Cite

Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, et al. Relative Positional Encoding for Transformers with Linear Complexity. ICML 2021 - 38th International Conference on Machine Learning, Jul 2021, Virtual Only, United States. pp. 7067-7079. ⟨hal-03256451⟩
