What is it about?
We study the effect of the TanH, ReLU, and ELU activation functions on neural network (NN) loss landscapes. We observe that the choice of activation function has no effect on the number of local minima, but significantly changes the connectivity between them, with ELU yielding superior generalisation performance. As a bonus, we show that narrow valleys in the NN loss landscape contain (1) saturated neurons and (2) self-regularised solutions.
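For readers unfamiliar with the three functions, here is a minimal sketch of their standard definitions (illustrative only, written in NumPy; it is not taken from the paper's code, and the alpha parameter of ELU is shown at its common default of 1.0):

import numpy as np

def tanh(x):
    # Saturates at -1 and +1 for large |x|
    return np.tanh(x)

def relu(x):
    # Zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # Smooth exponential branch for negative inputs, identity for positive inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))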
Why is it important?
Previous research assumed that solutions that generalise well are found in flat minima; we challenge this assumption. We also explain why the choice of activation function can have a noticeable effect on the success of NN training.
Perspectives
A version of this article was featured as a chapter in my PhD thesis. I am happy to have it published at last!
Anna Bosman
University of Pretoria
Read the Original
This page is a summary of: Empirical Loss Landscape Analysis of Neural Network Activation Functions, July 2023, ACM (Association for Computing Machinery). DOI: 10.1145/3583133.3596321.