What are Activation Functions?
Activation functions determine the output of a neuron. They introduce non-linearity into neural networks, allowing them to learn complex patterns. Without activation functions, neural networks would just be linear transformations, no matter how many layers they have.
Different activation functions have different properties—some are smooth, some have gradients that vanish, and some introduce sparsity. Choosing the right activation function is crucial for network performance.
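To see why the "stack of linear layers is still linear" point matters, here is a minimal sketch (plain Python, with two hypothetical one-dimensional "layers" whose weights and biases are invented purely for illustration) showing that composing linear layers without an activation collapses into a single linear transformation:

# Two purely linear "layers" (hypothetical weights and biases, chosen for illustration)
def layer1(x):
    return 2.0 * x + 1.0      # w1 = 2.0, b1 = 1.0

def layer2(x):
    return -0.5 * x + 3.0     # w2 = -0.5, b2 = 3.0

def stacked(x):
    return layer2(layer1(x))  # algebraically: -0.5*(2x + 1) + 3 = -1.0*x + 2.5

def single_layer(x):
    return -1.0 * x + 2.5     # one linear layer with exactly the same effect

for x in [-2, 0, 1, 4]:
    print(f" x={x:2d}: stacked={stacked(x):5.1f}, single={single_layer(x):5.1f}")

However many such layers are stacked, the result is still of the form w*x + b; inserting a non-linear activation between the layers is what breaks this collapse.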
Sigmoid Activation Function
The sigmoid function maps any value to a range between 0 and 1:
import math

def sigmoid(x):
    # Squash any real input into the range (0, 1)
    return 1 / (1 + math.exp(-x))

print("Sigmoid Function: 1 / (1 + e^(-x))")
print("=" * 50)

test_values = [-5, -2, 0, 2, 5]
for x in test_values:
    result = sigmoid(x)
    print(f" sigmoid({x:2d}) = {result:.4f}")

print("\nSigmoid Characteristics:")
print(" - Output range: (0, 1)")
print(" - Smooth and differentiable")
print(" - Good for binary classification output")
print(" - Problem: Vanishing gradient for extreme values")
ReLU (Rectified Linear Unit)
ReLU is the most popular activation function for hidden layers:
def relu(x):
    # Pass positive values through unchanged, clip negatives to 0
    return max(0, x)

print("ReLU Function: max(0, x)")
print("=" * 50)

test_values = [-3, -1, 0, 1, 3, 5]
for x in test_values:
    result = relu(x)
    print(f" relu({x:2d}) = {result}")

print("\nReLU Characteristics:")
print(" - Output: x if x > 0, else 0")
print(" - Computationally efficient")
print(" - Mitigates the vanishing gradient problem")
print(" - Introduces sparsity (many zeros)")
print(" - Problem: Dead neurons (always output 0)")
Tanh (Hyperbolic Tangent)
Tanh is similar to sigmoid but outputs values between -1 and 1:
import math

def tanh(x):
    # Thin wrapper around math.tanh, for consistency with the other examples
    return math.tanh(x)

print("Tanh Function: tanh(x)")
print("=" * 50)

test_values = [-2, -1, 0, 1, 2]
print("Comparison: Sigmoid vs Tanh")
for x in test_values:
    sig = 1 / (1 + math.exp(-x))
    tan = tanh(x)
    print(f" x={x:2d}: sigmoid={sig:.3f}, tanh={tan:.3f}")

print("\nTanh Characteristics:")
print(" - Output range: (-1, 1)")
print(" - Zero-centered (better than sigmoid for hidden layers)")
print(" - Stronger gradients than sigmoid")
print(" - Still has the vanishing gradient problem")
Softmax Activation
Softmax is used for multi-class classification output layers:
import math

def softmax(values):
    # Exponentiate each score, then normalize so the outputs sum to 1
    exp_values = [math.exp(v) for v in values]
    total = sum(exp_values)
    return [ev / total for ev in exp_values]

print("Softmax Function: Converts logits to probabilities")
print("=" * 50)

raw_scores = [2.0, 1.0, 0.1]
probabilities = softmax(raw_scores)

print(f"\nRaw scores (logits): {raw_scores}")
print(f"After softmax (probabilities): {[f'{p:.3f}' for p in probabilities]}")
print(f"Sum: {sum(probabilities):.3f} (always equals 1.0)")

print("\nSoftmax Characteristics:")
print(" - Outputs a probability distribution")
print(" - All outputs sum to 1.0")
print(" - Used for multi-class classification")
print(" - Amplifies differences between scores")
Choosing the Right Activation Function
Different activation functions work best in different situations:
print("Activation Function Selection Guide:")
print("=" * 50")
print("\nFor Hidden Layers:")
print(" ✓ ReLU: Most common choice, fast training")
print(" ✓ Tanh: Sometimes better than sigmoid (zero-centered)")
print(" ✗ Sigmoid: Rarely used in hidden layers (gradient problems)")
print("\nFor Output Layers:")
print(" - Binary Classification: Sigmoid (output 0-1)")
print(" - Multi-Class Classification: Softmax (probability distribution)")
print(" - Regression: Linear (no activation) or ReLU")
print("\nModern Alternatives:")
print(" - Leaky ReLU: Prevents dead neurons")
print(" - ELU: Exponential Linear Unit")
print(" - Swish: Self-gated activation")
Exercise: Implement Activation Functions
Complete the following tasks in the exercise:
- Task 1: Implement sigmoid activation function
- Task 2: Implement ReLU activation function
- Task 3: Compare outputs of different activation functions
- Task 4: Apply softmax to convert scores to probabilities
Write your code to implement and compare activation functions!
💡 Learning Tip
Practice is essential. Try modifying the code examples, experiment with different parameters, and see how changes affect the results. Hands-on experience is the best teacher!
🎉 Lesson Complete!
Great work! Continue to the next lesson.