Chapter 8: Neural Networks / Lesson 37

Activation Functions

What are Activation Functions?

Activation functions determine the output of a neuron. They introduce non-linearity into neural networks, allowing them to learn complex patterns. Without activation functions, neural networks would just be linear transformations, no matter how many layers they have.

Different activation functions have different properties—some are smooth, some have gradients that vanish, and some introduce sparsity. Choosing the right activation function is crucial for network performance.
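
To make the "just linear transformations" point concrete, here is a minimal sketch (the file name and the weight values are illustrative, not part of the lesson code) that stacks two linear layers with no activation and shows that a single linear layer produces exactly the same output:

linear_demo.py
# Two linear layers with no activation collapse into one linear layer.
# The weights and biases below are made up for illustration.

def linear_layer(x, w, b):
    return w * x + b  # a single-input linear layer

w1, b1 = 2.0, 1.0   # first layer parameters
w2, b2 = -3.0, 0.5  # second layer parameters

x = 4.0
stacked = linear_layer(linear_layer(x, w1, b1), w2, b2)

# Equivalent single layer: w = w2 * w1, b = w2 * b1 + b2
combined = linear_layer(x, w2 * w1, w2 * b1 + b2)

print(f"Two stacked linear layers: {stacked}")
print(f"One equivalent linear layer: {combined}")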

Sigmoid Activation Function

The sigmoid function maps any real-valued input into the range (0, 1):

sigmoid.py
# Sigmoid Activation Function
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

print("Sigmoid Function: 1 / (1 + e^(-x))")
print("=" * 50)

# Test sigmoid with different inputs
test_values = [-5, -2, 0, 2, 5]
for x in test_values:
    result = sigmoid(x)
    print(f" sigmoid({x:2d}) = {result:.4f}")

print("\nSigmoid Characteristics:")
print(" - Output range: (0, 1)")
print(" - Smooth and differentiable")
print(" - Good for binary classification output")
print(" - Problem: Vanishing gradient for extreme values")
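
The vanishing-gradient problem noted above can be made visible by printing the sigmoid's derivative, using the identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). This is a small companion sketch (the file name is illustrative):

sigmoid_gradient.py
# Sigmoid derivative: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# The gradient peaks at 0.25 near x = 0 and shrinks toward 0 for large |x|,
# which is why deep sigmoid networks suffer from vanishing gradients.
for x in [-10, -5, 0, 5, 10]:
    print(f" sigmoid'({x:3d}) = {sigmoid_derivative(x):.6f}")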

ReLU (Rectified Linear Unit)

ReLU is the most popular activation function for hidden layers:

relu.py
# ReLU Activation Function
def relu(x):
    return max(0, x)

print("ReLU Function: max(0, x)")
print("=" * 50)

# Test ReLU with different inputs
test_values = [-3, -1, 0, 1, 3, 5]
for x in test_values:
    result = relu(x)
    print(f" relu({x:2d}) = {result}")

print("\nReLU Characteristics:")
print(" - Output: x if x > 0, else 0")
print(" - Computationally efficient")
print(" - Helps avoid the vanishing gradient problem")
print(" - Introduces sparsity (many zeros)")
print(" - Problem: Dead neurons (always output 0)")
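
The "dead neuron" problem listed above can be sketched directly: if a neuron's pre-activation is negative for every input (for example, after a bad update pushes the bias far negative), both its output and its gradient are zero, so it stops learning. The parameters below are made up for illustration:

dead_neuron.py
# A ReLU neuron whose pre-activation is always negative outputs 0 everywhere
# and receives a gradient of 0, so gradient descent can no longer update it.

def relu(x):
    return max(0, x)

def relu_gradient(x):
    return 1.0 if x > 0 else 0.0

weight, bias = 0.5, -10.0  # hypothetical parameters after a bad update

for x in [-2, 0, 2, 4, 6]:
    z = weight * x + bias  # pre-activation stays negative for these inputs
    print(f" input={x:2d} -> output={relu(z)}, gradient={relu_gradient(z)}")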

Tanh (Hyperbolic Tangent)

Tanh is similar to sigmoid but outputs values between -1 and 1:

tanh.py
# Tanh Activation Function
import math

def tanh(x):
    return math.tanh(x)

print("Tanh Function: tanh(x)")
print("=" * 50)

# Compare sigmoid and tanh
test_values = [-2, -1, 0, 1, 2]
print("Comparison: Sigmoid vs Tanh")
for x in test_values:
    sig = 1 / (1 + math.exp(-x))
    tan = math.tanh(x)
    print(f" x={x:2d}: sigmoid={sig:.3f}, tanh={tan:.3f}")

print("\nTanh Characteristics:")
print(" - Output range: (-1, 1)")
print(" - Zero-centered (better than sigmoid for hidden layers)")
print(" - Stronger gradients than sigmoid")
print(" - Still has vanishing gradient problem")
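
Tanh is in fact a rescaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1. The short sketch below (file name illustrative) verifies the identity numerically:

tanh_vs_sigmoid.py
# Tanh as a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    rescaled = 2 * sigmoid(2 * x) - 1
    print(f" x={x:4}: tanh={math.tanh(x):.4f}, 2*sigmoid(2x)-1={rescaled:.4f}")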

Softmax Activation

Softmax is used for multi-class classification output layers:

softmax.py
# Softmax Activation Function
import math

def softmax(values):
    exp_values = [math.exp(v) for v in values]
    total = sum(exp_values)
    return [ev / total for ev in exp_values]

print("Softmax Function: Converts logits to probabilities")
print("=" * 50)

# Example: Multi-class classification (3 classes)
raw_scores = [2.0, 1.0, 0.1]  # Logits before softmax
probabilities = softmax(raw_scores)

print(f"\nRaw scores (logits): {raw_scores}")
print(f"After softmax (probabilities): {[f'{p:.3f}' for p in probabilities]}")
print(f"Sum: {sum(probabilities):.3f} (always equals 1.0)")

print("\nSoftmax Characteristics:")
print(" - Outputs probability distribution")
print(" - All outputs sum to 1.0")
print(" - Used for multi-class classification")
print(" - Amplifies differences between scores")
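
One practical caveat: exponentiating large logits can overflow. A common fix, shown here as a sketch rather than any particular library's code, is to subtract the maximum score before exponentiating, which leaves the resulting probabilities unchanged:

stable_softmax.py
# Numerically stable softmax: shifting by the max logit avoids overflow
# while producing exactly the same probabilities.
import math

def stable_softmax(values):
    shift = max(values)
    exp_values = [math.exp(v - shift) for v in values]
    total = sum(exp_values)
    return [ev / total for ev in exp_values]

big_scores = [1000.0, 999.0, 998.0]  # math.exp(1000.0) would overflow
print([f"{p:.3f}" for p in stable_softmax(big_scores)])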

Choosing the Right Activation Function

Different activation functions work best in different situations:

choosing_activation.py
# Choosing Activation Functions
print("Activation Function Selection Guide:")
print("=" * 50)

print("\nFor Hidden Layers:")
print(" ✓ ReLU: Most common choice, fast training")
print(" ✓ Tanh: Sometimes better than sigmoid (zero-centered)")
print(" ✗ Sigmoid: Rarely used in hidden layers (gradient problems)")

print("\nFor Output Layers:")
print(" - Binary Classification: Sigmoid (output 0-1)")
print(" - Multi-Class Classification: Softmax (probability distribution)")
print(" - Regression: Linear (no activation) or ReLU")

print("\nModern Alternatives:")
print(" - Leaky ReLU: Prevents dead neurons")
print(" - ELU: Exponential Linear Unit")
print(" - Swish: Self-gated activation")
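
The modern alternatives listed above can each be written in a few lines. This is a sketch using common default parameters (the alpha values are conventional choices, not requirements):

modern_activations.py
# Sketches of Leaky ReLU, ELU, and Swish with commonly used defaults.
import math

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps the gradient non-zero for x < 0
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    # Smooth exponential branch for x < 0, saturating at -alpha
    return x if x > 0 else alpha * (math.exp(x) - 1)

def swish(x):
    # x * sigmoid(x): the neuron "gates" its own input
    return x / (1 + math.exp(-x))

for x in [-3, -1, 0, 1, 3]:
    print(f" x={x:2d}: leaky_relu={leaky_relu(x):7.4f}, "
          f"elu={elu(x):7.4f}, swish={swish(x):7.4f}")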

Exercise: Implement Activation Functions

Complete the following exercise:

  • Task 1: Implement sigmoid activation function
  • Task 2: Implement ReLU activation function
  • Task 3: Compare outputs of different activation functions
  • Task 4: Apply softmax to convert scores to probabilities

Write your code to implement and compare activation functions!

💡 Learning Tip

Practice is essential. Try modifying the code examples, experiment with different parameters, and see how changes affect the results. Hands-on experience is the best teacher!

🎉

Lesson Complete!

Great work! Continue to the next lesson.
