The Impact of an Attention Mechanism on the Representations in Neural Networks, Focusing on Catastrophic Forgetting and Robustness to Input Noise
- Publication Date
- Jan 01, 2024
- Source
- DiVA - Academic Archive On-line
- Language
- English
- License
- Green
Abstract
This study explores how attention mechanisms affect the distribution of representations within neural networks, focusing on catastrophic forgetting and robustness to input noise. We compare the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models, their attention-enhanced counterparts (RNNA, LSTMA, GRUA), and the Transformer, using musical sequences from "Daisy Bell". A key finding is the difference in how these models distribute information across their representations. The base models (RNN, LSTM, GRU) concentrate information in specific nodes, whereas the attention-enhanced models spread information across more nodes and consequently show greater robustness to input noise, as evidenced by significant differences in performance deterioration between the base models and their attention-augmented versions. However, base models such as the RNN and GRU resist catastrophic forgetting better than their attention-enhanced counterparts. Even so, the attention models show a positive correlation between higher overlap percentages in their representations and accuracy on certain tasks, and a negative correlation between accuracy and the number of empty nodes. The Transformer stands out by maintaining high accuracy across tasks, likely owing to its self-attention mechanism. These results suggest that while attention mechanisms enhance robustness to noise, further research is needed to address catastrophic forgetting in neural networks.
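Since the abstract contrasts base recurrent models with attention-augmented variants, a minimal sketch may help make the comparison concrete. Below is an illustrative attention-augmented GRU (a "GRUA"-style model) predicting the next note in a melody fragment; the additive attention form, layer sizes, vocabulary size, and all class and variable names are assumptions chosen for illustration, not the thesis's actual implementation.

```python
# Illustrative sketch only: the thesis does not specify its exact architecture
# here, so the attention form (additive), sizes, and names are assumptions.
import torch
import torch.nn as nn

class AttentiveGRU(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        # Additive attention: score each hidden state, softmax over time steps.
        self.attn_score = nn.Linear(hidden_size, 1)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer note indices
        h, _ = self.gru(self.embed(tokens))                 # (batch, seq_len, hidden)
        weights = torch.softmax(self.attn_score(h), dim=1)  # (batch, seq_len, 1)
        context = (weights * h).sum(dim=1)                  # weighted sum over time
        return self.head(context)                           # next-note logits

# Usage sketch: predict the next note for two random 16-note fragments.
model = AttentiveGRU(vocab_size=128)
notes = torch.randint(0, 128, (2, 16))
logits = model(notes)  # shape (2, 128)
```

Pooling all hidden states with learned attention weights, rather than reading out only the final hidden state as a plain GRU would, is one simple way such a model can spread task-relevant information across time steps and nodes, consistent with the distributional difference the abstract describes.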