What is it about?

Transformers have replaced recurrent neural networks (RNNs) for sequence processing because they can model arbitrarily long dependencies, improving the state of the art across a wide range of domains such as natural language processing (NLP), computer vision (CV), and biological sequence analysis. However, this expressiveness comes at the expense of computational and memory costs that grow quadratically with the sequence length. Fortunately, the deep learning community has long been interested in improving model efficiency, producing a plethora of solutions such as parameter sharing, pruning, mixed precision, and knowledge distillation. More recently, researchers have tackled the Transformer's quadratic bottleneck directly by designing lower-complexity alternatives such as the Longformer, Reformer, Linformer, and Performer. This survey investigates popular approaches to making Transformers faster and lighter, and provides a comprehensive explanation of the methods' strengths, limitations, and underlying assumptions.
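
To make the quadratic bottleneck concrete, here is a minimal NumPy sketch of standard scaled dot-product attention (illustrative only, not code from the survey; the sequence length n and head size d below are arbitrary). The n-by-n score matrix is materialized explicitly, which is exactly what the faster alternatives listed above try to avoid.

    # Illustrative sketch: why self-attention is quadratic in the sequence length n.
    # The score matrix Q @ K.T has shape (n, n), so compute and memory grow as O(n^2).
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Standard single-head attention; Q, K, V each have shape (n, d)."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # (n, n): the quadratic bottleneck
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                               # (n, d)

    n, d = 2048, 64                                      # arbitrary sequence length and head size
    Q, K, V = (np.random.randn(n, d) for _ in range(3))
    out = scaled_dot_product_attention(Q, K, V)          # builds a 2048 x 2048 score matrix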

Why is it important?

Due to the wide range of solutions, it has become challenging for researchers and practitioners to determine which methods to apply in practice in order to meet the desired trade-off between capacity, computation, and memory. Furthermore, a lower theoretical complexity does not guarantee a faster model in practice because of implementation and hardware considerations. This survey aims to clarify the strengths, limitations, and underlying assumptions of the most popular methods proposed to make Transformers faster and lighter.

Read the Original

This page is a summary of: A Practical Survey on Faster and Lighter Transformers, ACM Computing Surveys, March 2023, ACM (Association for Computing Machinery). DOI: 10.1145/3586074.
You can read the full text via the DOI above.
