The Impact of an Attention Mechanism on the Representations in Neural Networks, Focusing on Catastrophic Forgetting and Robustness to Input Noise

Authors
  • Abdilrahim, Ahmad
  • Mokhtar, Alsiraira
Publication Date
Jan 01, 2024
Source
DiVA - Academic Archive On-line
Language
English
License
Green

Abstract

This study explores how attention mechanisms affect the distribution of representations within neural networks, focusing on catastrophic forgetting and robustness to input noise. We compare a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), their attention-enhanced counterparts (RNNA, LSTMA, GRUA), and the Transformer model, using musical sequences from "Daisy Bell". A key finding is how differently these models distribute information across their representations: base models such as the RNN, LSTM, and GRU concentrate information within specific nodes, while attention-enhanced models spread information across more nodes and demonstrate greater robustness to input noise, as shown by significant differences in performance deterioration between base models and their attention-augmented versions. However, base models such as the RNN and GRU resist catastrophic forgetting better than their attention-enhanced counterparts. Despite this, for certain tasks the attention models show a positive correlation between higher overlap percentages in their representations and accuracy, and a negative correlation between accuracy and the number of empty nodes. The Transformer stands out by maintaining high accuracy across tasks, likely due to its self-attention mechanism. These results suggest that while attention mechanisms enhance robustness to noise, further research is needed to address catastrophic forgetting in neural networks.
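The abstract does not specify how attention is added to the recurrent models, but an "attention-enhanced" recurrent network (e.g. the GRUA variant) commonly means a recurrent encoder whose per-step hidden states are pooled by a learned attention layer instead of only the final state. The sketch below is a minimal PyTorch illustration under that assumption; the class name AttentionGRU, the additive scoring layer, and all sizes are illustrative, not the authors' implementation.

    # Minimal sketch of an attention-augmented GRU ("GRUA"-style), assuming
    # additive attention pooling over the recurrent hidden states. All names
    # and dimensions are illustrative; the paper's exact architecture is not
    # given in the abstract.
    import torch
    import torch.nn as nn

    class AttentionGRU(nn.Module):
        def __init__(self, input_size: int, hidden_size: int, num_classes: int):
            super().__init__()
            self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
            # Additive (Bahdanau-style) scorer: one scalar score per time step.
            self.attn_score = nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.Tanh(),
                nn.Linear(hidden_size, 1),
            )
            self.classifier = nn.Linear(hidden_size, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, input_size)
            states, _ = self.gru(x)                 # (batch, seq_len, hidden)
            scores = self.attn_score(states)        # (batch, seq_len, 1)
            weights = torch.softmax(scores, dim=1)  # attention over time steps
            context = (weights * states).sum(dim=1) # weighted pooling -> (batch, hidden)
            return self.classifier(context)

    # Hypothetical usage: note sequences encoded as 12-dimensional pitch vectors.
    model = AttentionGRU(input_size=12, hidden_size=64, num_classes=12)
    notes = torch.randn(8, 16, 12)                  # batch of 8 sequences, 16 steps
    logits = model(notes)                           # (8, 12)

    # Input-noise robustness could then be probed by perturbing the inputs,
    # e.g. noisy = notes + 0.1 * torch.randn_like(notes), and comparing accuracy.

Because the attention weights pool information from every time step, gradients reach many hidden states rather than flowing mainly through the final one, which is one plausible reason such models spread information across more nodes, consistent with the abstract's finding.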
