Ridge and Lasso Regression
Ridge and Lasso are regularization techniques that prevent overfitting in linear regression. They add a penalty term to the cost function, which helps control model complexity.
Ridge (L2 regularization) shrinks coefficients toward zero but doesn't eliminate them. Lasso (L1 regularization) can completely eliminate features by setting their coefficients to zero, performing automatic feature selection.
The Problem: Overfitting in Linear Regression
Regular linear regression can overfit, especially with many features:
print("Regular Linear Regression:")
print(" Cost function: Minimize (y - predictions)²")
print(" Problem: With many features, coefficients can become very large")
print(" Result: Model overfits to training data")
print("\nExample with 10 features:")
print(" Regular regression coefficients:")
coefficients = [2.5, -1.8, 3.2, -0.5, 4.1, -2.3, 1.9, -3.7, 0.8, -1.2]
for i, coef in enumerate(coefficients):
    print(f" Feature {i+1}: {coef:+.2f}")
print("\n Problem: Large coefficients = high variance = overfitting")
print(" Solution: Add penalty term to shrink coefficients")
Ridge Regression (L2 Regularization)
Ridge adds a penalty based on the sum of squared coefficients:
print("Ridge Regression:")
print(" Cost = (y - predictions)² + α * Σ(coefficients²)")
print(" α (alpha) = regularization strength")
print(" Penalty: Sum of squared coefficients")
print("\nEffect:")
print(" - Shrinks all coefficients toward zero")
print(" - Prevents any coefficient from becoming too large")
print(" - Reduces overfitting")
print(" - Coefficients never become exactly zero")
regular_coefs = [2.5, -1.8, 3.2, -0.5]
ridge_coefs = [1.8, -1.2, 2.1, -0.3]
print("\nCoefficient Comparison:")
for i, (reg, rid) in enumerate(zip(regular_coefs, ridge_coefs)):
    print(f" Feature {i+1}: {reg:+.2f} → {rid:+.2f} (shrunk)")
print("\nWhen to use Ridge:")
print(" - Many features, all potentially useful")
print(" - Multicollinearity (correlated features)")
print(" - Want to keep all features")
Lasso Regression (L1 Regularization)
Lasso adds a penalty based on the sum of absolute coefficients:
print("Lasso Regression:")
print(" Cost = (y - predictions)² + α * Σ|coefficients|")
print(" α (alpha) = regularization strength")
print(" Penalty: Sum of absolute coefficients")
print("\nEffect:")
print(" - Shrinks coefficients toward zero")
print(" - Can set coefficients to exactly zero")
print(" - Performs automatic feature selection")
print(" - Reduces overfitting")
regular_coefs = [2.5, -1.8, 3.2, -0.5, 0.3]
lasso_coefs = [1.9, -1.1, 2.3, 0.0, 0.0]
print("\nCoefficient Comparison:")
for i, (reg, las) in enumerate(zip(regular_coefs, lasso_coefs)):
    status = "removed" if las == 0 else "kept"
    print(f" Feature {i+1}: {reg:+.2f} → {las:+.2f} ({status})")
print("\nWhen to use Lasso:")
print(" - Many features, some irrelevant")
print(" - Want automatic feature selection")
print(" - Need interpretable model (fewer features)")
Ridge vs Lasso Comparison
Understanding when to use each:
print("Ridge vs Lasso:")
print("=" * 50")
print("\nRidge (L2):")
print(" Penalty: Σ(coefficients²)")
print(" Effect: Shrinks all coefficients")
print(" Feature selection: No (keeps all features)")
print(" Best for: Many correlated features")
print("\nLasso (L1):")
print(" Penalty: Σ|coefficients|")
print(" Effect: Can zero out coefficients")
print(" Feature selection: Yes (removes features)")
print(" Best for: Feature selection needed")
print("\nElastic Net:")
print(" Combines Ridge and Lasso")
print(" Cost = (y - pred)² + α₁*Σ|coef| + α₂*Σ(coef²)")
print(" Best of both worlds")
Exercise: Implement Ridge and Lasso
Complete the following exercise tasks (a reference sketch follows the list):
- Task 1: Calculate Ridge penalty (sum of squared coefficients)
- Task 2: Calculate Lasso penalty (sum of absolute coefficients)
- Task 3: Compare how Ridge and Lasso affect coefficients
- Task 4: Identify which features Lasso would remove (zero out)
Write your code to understand Ridge and Lasso regularization!
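If you want to check your work, here is one possible sketch of Tasks 1, 2, and 4; the coefficient values are placeholders taken from the examples above, not computed results:

# Placeholder coefficients from the examples above.
coefficients = [2.5, -1.8, 3.2, -0.5, 0.3]
lasso_coefs = [1.9, -1.1, 2.3, 0.0, 0.0]

# Task 1: Ridge penalty = sum of squared coefficients
ridge_penalty = sum(c ** 2 for c in coefficients)
print(f"Ridge penalty: {ridge_penalty:.2f}")

# Task 2: Lasso penalty = sum of absolute coefficients
lasso_penalty = sum(abs(c) for c in coefficients)
print(f"Lasso penalty: {lasso_penalty:.2f}")

# Task 4: features Lasso removed (coefficient shrunk to exactly zero)
removed = [i + 1 for i, c in enumerate(lasso_coefs) if c == 0]
print("Features removed by Lasso:", removed)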
💡 Learning Tip
Practice is essential. Try modifying the code examples, experiment with different parameters, and see how changes affect the results. Hands-on experience is the best teacher!