Chapter 9: Deep Learning / Lesson 43

Recurrent Neural Networks

What are Recurrent Neural Networks (RNNs)?

Recurrent Neural Networks (RNNs) are a class of neural networks designed to process sequential data by maintaining a "memory" of previous inputs. Unlike feedforward networks that process inputs independently, RNNs use feedback connections, allowing information to flow in loops and persist across time steps.

The key innovation of RNNs is their ability to handle sequences of varying lengths and capture temporal dependencies, making them ideal for tasks involving time series, text, speech, and other sequential patterns.

RNN vs Feedforward Network
# Feedforward: Each input processed independently
input1 → network → output1
input2 → network → output2
# No memory of previous inputs!

# RNN: Maintains hidden state (memory)
input1 → RNN → hidden_state1 → output1
input2 → RNN → hidden_state2 → output2   (uses hidden_state1)
# Memory flows through time steps!

How RNNs Work

At each time step, an RNN:

  • Receives Input: Takes the current input and the previous hidden state
  • Updates Hidden State: Combines current input with previous memory
  • Produces Output: Generates output based on the updated hidden state
  • Passes Forward: The hidden state flows to the next time step
Simple RNN Structure
# At each time step t:
#   h_t = tanh(W_hh * h_{t-1} + W_xh * x_t + b_h)
#   y_t = W_hy * h_t + b_y
#
# Where:
#   h_t = hidden state at time t
#   x_t = input at time t
#   W_* = weight matrices
#   b_* = bias vectors

import numpy as np

# Simple RNN step simulation
def rnn_step(x_t, h_prev, W_hh, W_xh, W_hy, b_h, b_y):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)  # update hidden state (memory)
    y_t = W_hy @ h_t + b_y                           # produce output from the new state
    return h_t, y_t

print("RNN processes sequences step by step")
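To see the memory flowing through time steps, the sketch below runs this step function over a short toy sequence. The dimensions, random data, and parameter values are illustrative assumptions, not learned weights.

import numpy as np

def rnn_step(x_t, h_prev, W_hh, W_xh, W_hy, b_h, b_y):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 4, 1   # illustrative sizes (assumption)

# Randomly initialized parameters (in practice these are learned by backpropagation through time)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

sequence = rng.normal(size=(5, input_size))  # 5 time steps of toy input
h = np.zeros(hidden_size)                    # initial hidden state

for t, x_t in enumerate(sequence):
    h, y = rnn_step(x_t, h, W_hh, W_xh, W_hy, b_h, b_y)  # h is carried into the next step
    print(f"step {t}: output = {y}")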

LSTM and GRU: Solving the Vanishing Gradient Problem

Standard RNNs suffer from the vanishing gradient problem, making it difficult to learn long-term dependencies. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks address this:

  • LSTM: Uses three gates (forget, input, output) to selectively remember or forget information
  • GRU: Simpler variant with two gates (reset, update), often performs similarly to LSTM with fewer parameters
Building an LSTM Network
from tensorflow import keras
from tensorflow.keras import layers

# Build an LSTM model for sequence prediction
model = keras.Sequential([
    # return_sequences=True passes the full sequence of hidden states to the next LSTM layer
    layers.LSTM(64, activation='tanh', return_sequences=True, input_shape=(10, 1)),
    layers.LSTM(32, activation='tanh'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

print("LSTM can learn long-term dependencies in sequences")
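For comparison, here is a minimal sketch of the same architecture built with GRU layers; the layer sizes simply mirror the LSTM model above and are not a prescribed configuration.

from tensorflow import keras
from tensorflow.keras import layers

# Equivalent architecture with GRU layers (fewer parameters per unit than LSTM)
gru_model = keras.Sequential([
    layers.GRU(64, activation='tanh', return_sequences=True, input_shape=(10, 1)),
    layers.GRU(32, activation='tanh'),
    layers.Dense(1, activation='sigmoid')
])

gru_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Inspect the parameter count to compare against the LSTM version
gru_model.summary()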

Practical Applications

RNNs excel at sequential data tasks:

  • Natural Language Processing: Language modeling, machine translation, text generation (e.g., autocomplete, chatbots)
  • Time Series Forecasting: Stock prices, weather prediction, demand forecasting
  • Speech Recognition: Converting spoken words to text
  • Music Generation: Creating sequences of musical notes
  • Video Analysis: Understanding temporal patterns in video sequences

💡 When to Use RNNs

Use RNNs when your data has a temporal or sequential structure and the order matters. For very long sequences, consider LSTM or GRU. For modern NLP tasks, Transformer architectures (like BERT, GPT) often outperform RNNs, but RNNs are still valuable for many sequential tasks!

Common Challenges

Working with RNNs presents several challenges:

  • Vanishing/Exploding Gradients: Standard RNNs struggle with long sequences; use LSTM/GRU
  • Sequential Processing: RNNs process sequences sequentially, making them slower than parallel architectures
  • Choosing Sequence Length: Determining the optimal window size for your data (see the windowing sketch after this list)
  • Memory Requirements: Processing long sequences requires significant memory
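To make the sequence-length decision concrete, here is a minimal sketch that slices a univariate series into fixed-length input windows and next-step targets. The window size of 10 and the sine-wave data are illustrative assumptions, chosen to match the input_shape=(10, 1) used in the LSTM model above.

import numpy as np

def make_windows(series, window_size):
    """Slice a 1-D series into (samples, window_size, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])   # input window
        y.append(series[i + window_size])     # value to predict
    X = np.array(X)[..., np.newaxis]          # add a feature dimension for the RNN
    y = np.array(y)
    return X, y

series = np.sin(np.linspace(0, 20, 200))      # toy sine-wave series
X, y = make_windows(series, window_size=10)
print(X.shape, y.shape)                       # (190, 10, 1) (190,)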

💡 Learning Tip

Start with simple RNNs to understand the concept, then move to LSTM/GRU for practical applications. Use bidirectional RNNs when you need context from both past and future in your sequence!
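For the bidirectional case, here is a minimal sketch using Keras' Bidirectional wrapper; the layer sizes and input shape are illustrative assumptions rather than a recommended setup.

from tensorflow import keras
from tensorflow.keras import layers

# Bidirectional LSTM: processes the sequence forward and backward, then combines both states
bi_model = keras.Sequential([
    layers.Bidirectional(layers.LSTM(32, return_sequences=True), input_shape=(10, 1)),
    layers.Bidirectional(layers.LSTM(16)),
    layers.Dense(1, activation='sigmoid')
])

bi_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print("Bidirectional RNNs use context from both past and future time steps")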

Exercise: Build an RNN for Sequence Prediction

In the accompanying exercise, you'll build a Recurrent Neural Network using LSTM layers to process sequential data. You'll learn how to structure sequences, add LSTM layers, and configure the network for time-series prediction.

This hands-on exercise will help you understand how RNNs maintain memory across time steps and process sequential information.

🎉

Lesson Complete!

Great work! Continue to the next lesson.
