Chapter 9: Deep Learning / Lesson 42

Convolutional Neural Networks

What are Convolutional Neural Networks (CNNs)?

Convolutional Neural Networks (CNNs) are a specialized type of deep learning architecture designed to process grid-like data, particularly images. CNNs learn spatial hierarchies of features directly from pixel data, which has made them a standard choice for computer vision tasks.

The key innovation of CNNs is their ability to automatically learn hierarchical features: lower layers detect simple patterns like edges and corners, while deeper layers recognize complex objects like faces or animals.

Why CNNs for Images?

    # Regular Dense Layer: all pixels connected to all neurons
    # Input: 28x28 image = 784 pixels
    # Hidden layer: 128 neurons
    # Total connections: 784 × 128 = 100,352 weights (plus 128 biases)

    # Convolutional Layer: local connections, shared weights
    # Input: 28x28 image (single channel)
    # Conv layer: 32 filters of size 3x3
    # Parameters: 3 × 3 × 1 × 32 = 288 weights (plus 32 biases)
    # Far fewer parameters, and spatial relationships are preserved
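
To make the comparison above concrete, the parameter counts can be worked out directly. The sketch below assumes a single-channel (grayscale) 28x28 input and one bias per neuron or filter; the helper names `dense_params` and `conv2d_params` are illustrative, not library functions.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter in a 2D convolutional layer."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


dense = dense_params(28 * 28, 128)   # 784 * 128 weights + 128 biases
conv = conv2d_params(3, 3, 1, 32)    # 3 * 3 * 1 * 32 weights + 32 biases

print("Dense layer parameters:", dense)   # 100480
print("Conv layer parameters:", conv)     # 320
```

Note that the convolutional count does not depend on the image size at all: the same 320 parameters would be used on a 28x28 or a 280x280 input.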

Key Components of CNNs

CNNs consist of several specialized layers:

  • Convolutional Layers: Apply filters (kernels) to detect local features like edges, textures, and patterns
  • Activation Functions: Introduce non-linearity (typically ReLU) to enable complex pattern learning
  • Pooling Layers: Reduce spatial dimensions and computational complexity (MaxPooling, AveragePooling)
  • Fully Connected Layers: Final layers that perform classification based on learned features
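
As a small illustration of the activation step, ReLU clips negative filter responses to zero and passes positive ones through unchanged. The sketch below is plain Python, and the input values are made up for illustration.

```python
def relu(values):
    """Rectified linear unit: max(0, v) applied element-wise."""
    return [max(0, v) for v in values]


responses = [-3, -1, 0, 2, 5]   # hypothetical filter responses
activated = relu(responses)

print(activated)  # [0, 0, 0, 2, 5]
```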

CNN Architecture Example

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        # First Convolutional Block
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),

        # Second Convolutional Block
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Flatten and Classify
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax')  # 10 classes
    ])

    print("CNN Architecture:")
    model.summary()
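
The shapes and parameter counts that `model.summary()` reports can be reproduced by hand. The sketch below assumes the Keras defaults used in the model above (no padding and stride 1 for the convolutions, non-overlapping 2x2 pooling); the variable names are illustrative.

```python
# Trace the feature-map shape (height, width, channels) and the number
# of trainable parameters through the architecture above.
h, w, c = 28, 28, 1
total_params = 0

for n_filters in (32, 64):
    # 3x3 convolution, no padding, stride 1: each spatial side shrinks by 2
    params = 3 * 3 * c * n_filters + n_filters
    h, w, c = h - 2, w - 2, n_filters
    total_params += params
    print(f"Conv2D       -> ({h}, {w}, {c}), params: {params}")

    # 2x2 max pooling: each spatial side is halved (rounded down), no params
    h, w = h // 2, w // 2
    print(f"MaxPooling2D -> ({h}, {w}, {c}), params: 0")

n_features = h * w * c  # Flatten turns the 3D feature map into a vector
print(f"Flatten      -> ({n_features},)")

for n_units in (128, 10):
    params = n_features * n_units + n_units
    total_params += params
    n_features = n_units
    print(f"Dense        -> ({n_units},), params: {params}")

print("Total parameters:", total_params)  # 225034
```

Under these assumptions, most of the parameters sit in the first dense layer (1600 × 128 weights), not in the convolutional layers.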

How Convolution Works

Convolution is a mathematical operation where a filter (small matrix) slides across the input image, computing dot products at each position. This process:

  • Detects Features: Each filter learns to detect specific patterns (edges, textures, shapes)
  • Preserves Spatial Relationships: Unlike fully connected layers, convolution maintains the 2D structure
  • Shares Weights: The same filter is applied across the entire image, so a pattern is detected wherever it appears and the response shifts along with it (translation equivariance)
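
The weight-sharing point can be checked with a tiny experiment: because the same filter is applied at every position, shifting the input shifts the response by the same amount (often called translation equivariance). The sketch below uses a 1D signal in plain Python for brevity; `correlate1d_valid` is an illustrative helper, not a library function.

```python
def correlate1d_valid(signal, kernel):
    """Slide the kernel over the signal and take a dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [-1, 0, 1]                      # responds to a step up or down
signal = [0, 0, 1, 1, 1, 0, 0, 0, 0]
shifted = [0, 0] + signal[:-2]           # same pattern, moved 2 steps right

out = correlate1d_valid(signal, kernel)
out_shifted = correlate1d_valid(shifted, kernel)

print(out)          # [1, 1, 0, -1, -1, 0, 0]
print(out_shifted)  # [0, 0, 1, 1, 0, -1, -1]
```

The second response is the first one moved two positions to the right: the filter did not need separate weights to find the pattern in its new location.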

Understanding Convolution Operation

    import numpy as np
    from scipy.ndimage import convolve

    # Simple 5x5 image
    image = np.array([
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]
    ])

    # Edge detection filter (vertical edge)
    filter_kernel = np.array([
        [-1, 0, 1],
        [-1, 0, 1],
        [-1, 0, 1]
    ])

    # Apply convolution. Note: scipy's convolve flips the kernel (true
    # convolution); CNN layers skip the flip (cross-correlation), which
    # only changes the sign of the response for this filter.
    # mode='constant' zero-pads the borders, so the output stays 5x5.
    result = convolve(image, filter_kernel, mode='constant')

    print("Original image shape:", image.shape)
    print("Filter shape:", filter_kernel.shape)
    print("Result shape after convolution:", result.shape)
    print(result)
    print("\nConvolution highlights edges and patterns!")
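
The sliding-window description can also be written out by hand. The sketch below computes the "valid" output (no padding) for the same image and filter in plain Python, without flipping the filter, which is the cross-correlation that CNN layers typically compute; `correlate2d_valid` is an illustrative helper name.

```python
def correlate2d_valid(image, kernel):
    """Dot product of the kernel with every fully overlapping image patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = correlate2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
# [0, -3, -3]
# [0, -2, -2]
# [0, -1, -1]
```

A 5x5 input and a 3x3 filter give a 3x3 output. The response is zero where the window sees a uniform region and non-zero where it straddles the bright-to-dark edge, with a larger magnitude the more rows of the edge the window covers.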

Pooling Layers

Pooling layers reduce the spatial dimensions of feature maps, providing:

  • Dimensionality Reduction: Makes the model more computationally efficient
  • Translation Invariance: Helps the model recognize features regardless of their exact position
  • Feature Generalization: Focuses on the most important information

MaxPooling vs AveragePooling

    import numpy as np

    # Example feature map (4x4)
    feature_map = np.array([
        [1, 3, 2, 4],
        [5, 7, 6, 8],
        [2, 4, 1, 3],
        [6, 8, 5, 7]
    ])

    # Split the map into non-overlapping 2x2 blocks:
    # blocks[i, a, j, b] == feature_map[2*i + a, 2*j + b]
    blocks = feature_map.reshape(2, 2, 2, 2)

    # MaxPooling (2x2): takes the maximum value in each block
    # Result: [[7, 8],
    #          [8, 7]]
    max_pooled = blocks.max(axis=(1, 3))

    # AveragePooling (2x2): takes the average value in each block
    # Result: [[4., 5.],
    #          [5., 4.]]
    avg_pooled = blocks.mean(axis=(1, 3))

    print("MaxPooling keeps the strongest features:\n", max_pooled)
    print("AveragePooling smooths features:\n", avg_pooled)
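
The translation-invariance benefit of pooling holds only for small shifts, which a short experiment makes visible. The sketch below is plain Python with made-up activation values; `max_pool_2x2` is an illustrative helper that assumes non-overlapping 2x2 windows.

```python
def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling (stride 2) on a list-of-lists map."""
    return [
        [
            max(
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            )
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# A single strong activation in the top-left corner
original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Moved one pixel right: still inside the same 2x2 window
shifted_within = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Moved one more pixel: it now falls into the neighbouring 2x2 window
shifted_across = [
    [0, 0, 9, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(max_pool_2x2(original))        # [[9, 0], [0, 0]]
print(max_pool_2x2(shifted_within))  # [[9, 0], [0, 0]]
print(max_pool_2x2(shifted_across))  # [[0, 9], [0, 0]]
```

A shift that stays inside one pooling window leaves the output unchanged, while a shift across a window boundary moves the pooled activation as well.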

Practical Applications

CNNs have revolutionized computer vision and are used in:

  • Image Classification: Identifying objects in photos (e.g., Google Photos search)
  • Object Detection: Finding and localizing multiple objects (e.g., autonomous vehicles)
  • Facial Recognition: Security systems and photo tagging
  • Medical Imaging: Detecting tumors, analyzing X-rays and MRIs
  • Video Analysis: Action recognition, video surveillance

💡 Why CNNs Work So Well

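CNNs are loosely inspired by the visual cortex of animals, where individual neurons respond to small regions of the visual field. The hierarchical feature learning (simple → complex) resembles how the brain is thought to process visual information. In practice, their effectiveness on visual tasks comes from local connectivity and weight sharing, which match the structure of images.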

Common Challenges

Working with CNNs presents several challenges:

  • Computational Requirements: Training CNNs requires significant GPU memory and processing power
  • Overfitting: Complex CNNs can memorize training data; use dropout, data augmentation, or regularization
  • Hyperparameter Tuning: Many parameters (filter sizes, stride, padding, number of filters) need careful selection
  • Data Requirements: CNNs typically need large, labeled image datasets for training
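
Filter size, stride, and padding interact through a standard formula for the output side length: floor((n + 2p - k) / s) + 1, where n is the input side, k the filter side, s the stride, and p the padding added to each side. A small helper makes it easy to check a configuration before building a model; `conv_output_size` is an illustrative name, not a library function.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output side length for input side n: floor((n + 2p - k) / s) + 1."""
    if n + 2 * padding < kernel:
        raise ValueError("kernel is larger than the padded input")
    return (n + 2 * padding - kernel) // stride + 1


print(conv_output_size(28, kernel=3))                       # 26
print(conv_output_size(28, kernel=3, padding=1))            # 28
print(conv_output_size(28, kernel=3, stride=2, padding=1))  # 14
print(conv_output_size(26, kernel=2, stride=2))             # 13
```

The same formula covers pooling layers: the last call corresponds to a 2x2 pooling window with stride 2 applied to a 26x26 feature map.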

💡 Learning Tip

Start with pre-trained models (such as those trained on ImageNet) and fine-tune them for your specific task. This transfer learning approach saves time and resources and often gives good results, even with a small dataset.

Exercise: Build a CNN for Image Classification

In the exercise on the right, you'll build a Convolutional Neural Network step by step. You'll add convolutional layers, pooling layers, and fully connected layers to create a complete CNN architecture.

This hands-on exercise will help you understand how CNNs are structured and how each component contributes to learning visual features.
