What is it about?

Transformers have replaced recurrent neural networks (RNNs) for sequence processing because they can model arbitrarily long dependencies, improving the state of the art across a wide range of domains such as natural language processing (NLP), computer vision (CV), and biological sequence analysis. However, this expressiveness comes at the expense of computational and memory costs that grow quadratically with the sequence length. Fortunately, the deep learning community has long been interested in improving model efficiency, producing a plethora of solutions such as parameter sharing, pruning, mixed precision, and knowledge distillation. More recently, researchers have tackled the Transformer's quadratic bottleneck directly by designing lower-complexity alternatives such as the Longformer, Reformer, Linformer, and Performer. This survey investigates popular approaches to making Transformers faster and lighter, and provides a comprehensive explanation of the methods' strengths, limitations, and underlying assumptions.
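
To make the quadratic bottleneck concrete, here is a minimal NumPy sketch of standard scaled dot-product attention (illustrative only, not code from the survey; the sequence length n and head size d below are arbitrary). The n-by-n score matrix is materialized explicitly, which is exactly what the faster alternatives listed above try to avoid.

    # Illustrative sketch: why self-attention is quadratic in the sequence length n.
    # The score matrix Q @ K.T has shape (n, n), so compute and memory grow as O(n^2).
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Standard single-head attention; Q, K, V each have shape (n, d)."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # (n, n): the quadratic bottleneck
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                               # (n, d)

    n, d = 2048, 64                                      # arbitrary sequence length and head size
    Q, K, V = (np.random.randn(n, d) for _ in range(3))
    out = scaled_dot_product_attention(Q, K, V)          # builds a 2048 x 2048 score matrix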

Why is it important?

Due to the wide range of solutions, it has become challenging for researchers and practitioners to determine which methods to apply in practice in order to meet the desired trade-off between capacity, computation, and memory. Furthermore, a lower theoretical complexity does not guarantee a faster model in practice because of implementation and hardware considerations. This survey aims to clarify the strengths, limitations, and underlying assumptions of the most popular methods proposed to make Transformers faster and lighter.

Read the Original

This page is a summary of: A Practical Survey on Faster and Lighter Transformers, ACM Computing Surveys, March 2023, ACM (Association for Computing Machinery). DOI: 10.1145/3586074.
You can read the full text via the DOI above.
