Chapter 9: Deep Learning / Lesson 43

Recurrent Neural Networks

What are Recurrent Neural Networks (RNNs)?

Recurrent Neural Networks (RNNs) are a class of neural networks designed to process sequential data by maintaining a "memory" of previous inputs. Unlike feedforward networks that process inputs independently, RNNs use feedback connections, allowing information to flow in loops and persist across time steps.

The key innovation of RNNs is their ability to handle sequences of varying lengths and capture temporal dependencies, making them ideal for tasks involving time series, text, speech, and other sequential patterns.

RNN vs Feedforward Network
# Feedforward: Each input processed independently
input1 → network → output1
input2 → network → output2
# No memory of previous inputs!

# RNN: Maintains hidden state (memory)
input1 → RNN → hidden_state1 → output1
input2 → RNN → hidden_state2 → output2   (uses hidden_state1)
# Memory flows through time steps!

How RNNs Work

At each time step, an RNN:

  • Receives Input: Takes the current input and the previous hidden state
  • Updates Hidden State: Combines current input with previous memory
  • Produces Output: Generates output based on the updated hidden state
  • Passes Forward: The hidden state flows to the next time step
Simple RNN Structure
# At each time step t:
#   h_t = tanh(W_hh * h_{t-1} + W_xh * x_t + b_h)
#   y_t = W_hy * h_t + b_y
#
# Where:
#   h_t = hidden state at time t
#   x_t = input at time t
#   W_* = weight matrices
#   b_* = bias vectors

import numpy as np

# Simple RNN step simulation
def rnn_step(x_t, h_prev, W_hh, W_xh, W_hy, b_h, b_y):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)  # update hidden state (memory)
    y_t = W_hy @ h_t + b_y                           # produce output from the new state
    return h_t, y_t

print("RNN processes sequences step by step")
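To see the memory flowing through time steps, the sketch below runs this step function over a short toy sequence. The dimensions, random data, and parameter values are illustrative assumptions, not learned weights.

import numpy as np

def rnn_step(x_t, h_prev, W_hh, W_xh, W_hy, b_h, b_y):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 4, 1   # illustrative sizes (assumption)

# Randomly initialized parameters (in practice these are learned by backpropagation through time)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

sequence = rng.normal(size=(5, input_size))  # 5 time steps of toy input
h = np.zeros(hidden_size)                    # initial hidden state

for t, x_t in enumerate(sequence):
    h, y = rnn_step(x_t, h, W_hh, W_xh, W_hy, b_h, b_y)  # h is carried into the next step
    print(f"step {t}: output = {y}")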

LSTM and GRU: Solving the Vanishing Gradient Problem

Standard RNNs suffer from the vanishing gradient problem, making it difficult to learn long-term dependencies. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks address this:

  • LSTM: Uses three gates (forget, input, output) to selectively remember or forget information
  • GRU: Simpler variant with two gates (reset, update), often performs similarly to LSTM with fewer parameters
Building an LSTM Network
from tensorflow import keras
from tensorflow.keras import layers

# Build an LSTM model for sequence prediction
model = keras.Sequential([
    # return_sequences=True passes the full sequence of hidden states to the next LSTM layer
    layers.LSTM(64, activation='tanh', return_sequences=True, input_shape=(10, 1)),
    layers.LSTM(32, activation='tanh'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

print("LSTM can learn long-term dependencies in sequences")
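For comparison, here is a minimal sketch of the same architecture built with GRU layers; the layer sizes simply mirror the LSTM model above and are not a prescribed configuration.

from tensorflow import keras
from tensorflow.keras import layers

# Equivalent architecture with GRU layers (fewer parameters per unit than LSTM)
gru_model = keras.Sequential([
    layers.GRU(64, activation='tanh', return_sequences=True, input_shape=(10, 1)),
    layers.GRU(32, activation='tanh'),
    layers.Dense(1, activation='sigmoid')
])

gru_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Inspect the parameter count to compare against the LSTM version
gru_model.summary()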

Practical Applications

RNNs excel at sequential data tasks:

  • Natural Language Processing: Language modeling, machine translation, text generation (e.g., autocomplete, chatbots)
  • Time Series Forecasting: Stock prices, weather prediction, demand forecasting
  • Speech Recognition: Converting spoken words to text
  • Music Generation: Creating sequences of musical notes
  • Video Analysis: Understanding temporal patterns in video sequences

💡 When to Use RNNs

Use RNNs when your data has a temporal or sequential structure and the order matters. For very long sequences, consider LSTM or GRU. For modern NLP tasks, Transformer architectures (like BERT, GPT) often outperform RNNs, but RNNs are still valuable for many sequential tasks!

Common Challenges

Working with RNNs presents several challenges:

  • Vanishing/Exploding Gradients: Standard RNNs struggle with long sequences; use LSTM/GRU
  • Sequential Processing: RNNs process sequences sequentially, making them slower than parallel architectures
  • Choosing Sequence Length: Determining the optimal window size for your data (see the windowing sketch after this list)
  • Memory Requirements: Processing long sequences requires significant memory
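To make the sequence-length decision concrete, here is a minimal sketch that slices a univariate series into fixed-length input windows and next-step targets. The window size of 10 and the sine-wave data are illustrative assumptions, chosen to match the input_shape=(10, 1) used in the LSTM model above.

import numpy as np

def make_windows(series, window_size):
    """Slice a 1-D series into (samples, window_size, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])   # input window
        y.append(series[i + window_size])     # value to predict
    X = np.array(X)[..., np.newaxis]          # add a feature dimension for the RNN
    y = np.array(y)
    return X, y

series = np.sin(np.linspace(0, 20, 200))      # toy sine-wave series
X, y = make_windows(series, window_size=10)
print(X.shape, y.shape)                       # (190, 10, 1) (190,)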

💡 Learning Tip

Start with simple RNNs to understand the concept, then move to LSTM/GRU for practical applications. Use bidirectional RNNs when you need context from both past and future in your sequence!
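For the bidirectional case, here is a minimal sketch using Keras' Bidirectional wrapper; the layer sizes and input shape are illustrative assumptions rather than a recommended setup.

from tensorflow import keras
from tensorflow.keras import layers

# Bidirectional LSTM: processes the sequence forward and backward, then combines both states
bi_model = keras.Sequential([
    layers.Bidirectional(layers.LSTM(32, return_sequences=True), input_shape=(10, 1)),
    layers.Bidirectional(layers.LSTM(16)),
    layers.Dense(1, activation='sigmoid')
])

bi_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print("Bidirectional RNNs use context from both past and future time steps")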

Exercise: Build an RNN for Sequence Prediction

In the accompanying exercise, you'll build a Recurrent Neural Network using LSTM layers to process sequential data. You'll learn how to structure sequences, add LSTM layers, and configure the network for time-series prediction.

This hands-on exercise will help you understand how RNNs maintain memory across time steps and process sequential information.

🎉

Lesson Complete!

Great work! Continue to the next lesson.
