Lesson 29: Regression Metrics

Regression Metrics Explained

Regression metrics measure how well your model predicts continuous values. Unlike classification (which predicts categories), regression predicts numbers, so we need different metrics to evaluate performance.

Common regression metrics include Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²). Each metric tells you something different about your model's performance.

Mean Absolute Error (MAE)

MAE measures the average absolute difference between predictions and actual values:

mae.py
# Mean Absolute Error (MAE)

# Example: House price predictions
y_true = [200000, 300000, 250000, 400000, 350000]
y_pred = [210000, 290000, 240000, 410000, 340000]

print("Actual vs Predicted Prices:")
for i, (true, pred) in enumerate(zip(y_true, y_pred)):
    error = abs(true - pred)
    print(f"  House {i+1}: ${true:,} → ${pred:,} (error: ${error:,})")

# Calculate MAE
# MAE = (1/n) * Σ|y_true - y_pred|
absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
mae = sum(absolute_errors) / len(absolute_errors)

print(f"\nMean Absolute Error (MAE): ${mae:,.0f}")
print("  Interpretation: On average, predictions are off by this amount")
print("  Units: Same as target variable (dollars in this case)")
print("  Lower is better")

print("\nMAE Characteristics:")
print("  ✓ Easy to interpret (average error)")
print("  ✓ Less sensitive to outliers than MSE/RMSE")
print("  ✓ Same units as target")

Mean Squared Error (MSE) and RMSE

MSE averages the squared errors, so one large error counts far more than several small ones. RMSE is the square root of MSE, which brings the value back to the units of the target:

mse_rmse.py
# Mean Squared Error (MSE) and RMSE

y_true = [200000, 300000, 250000, 400000, 350000]
y_pred = [210000, 290000, 240000, 410000, 340000]

# Calculate MSE
# MSE = (1/n) * Σ(y_true - y_pred)²
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
mse = sum(squared_errors) / len(squared_errors)

print("Squared Errors:")
for i, (true, pred, sq_err) in enumerate(zip(y_true, y_pred, squared_errors)):
    abs_err = abs(true - pred)
    print(f"  House {i+1}: Error ${abs_err:,} → Squared: {sq_err:,.0f}")

print(f"\nMean Squared Error (MSE): {mse:,.0f}")
print("  Units: Squared units (harder to interpret)")
print("  Penalizes large errors more than small ones")

# Calculate RMSE
# RMSE = √MSE
rmse = mse ** 0.5

print(f"\nRoot Mean Squared Error (RMSE): ${rmse:,.0f}")
print("  Units: Same as target (easier to interpret than MSE)")
print("  Still penalizes large errors more")
print("  Lower is better")

print("\nMSE/RMSE Characteristics:")
print("  ✓ Sensitive to outliers (large errors penalized heavily)")
print("  ✓ RMSE in same units as target (more interpretable)")
print("  ✓ Commonly used in practice")
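
To make the outlier sensitivity concrete, here is a minimal sketch that compares MAE and RMSE on the same house prices, once with evenly sized errors and once with a single large miss. The helper names (mae, rmse) and the second prediction list (y_pred_outlier) are made up for illustration and are not part of the lesson files.

```python
import math


def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the mean of the squared errors."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


y_true = [200000, 300000, 250000, 400000, 350000]

# Every prediction is off by exactly $10,000
y_pred_even = [210000, 290000, 240000, 410000, 340000]

# Hypothetical variant: the last prediction is off by $100,000
y_pred_outlier = [210000, 290000, 240000, 410000, 250000]

for label, y_pred in [("Even errors", y_pred_even), ("One large miss", y_pred_outlier)]:
    print(f"{label}: MAE = ${mae(y_true, y_pred):,.0f}, RMSE = ${rmse(y_true, y_pred):,.0f}")
```

When all errors have the same size, MAE and RMSE are equal ($10,000 each). With the single large miss, MAE rises to $28,000 while RMSE rises to roughly $45,600, because squaring lets the $100,000 error dominate the average.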

R-squared (R²)

R² measures the fraction of the variance in the actual values that the model explains, compared with always predicting the mean:

r_squared.py
# R-squared (Coefficient of Determination)

y_true = [200000, 300000, 250000, 400000, 350000]
y_pred = [210000, 290000, 240000, 410000, 340000]

# Calculate R²
# R² = 1 - (SS_res / SS_tot)
# SS_res = sum of squared residuals (errors)
# SS_tot = sum of squared differences from mean

mean_true = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - (ss_res / ss_tot) if ss_tot > 0 else 0

print(f"Mean of actual values: ${mean_true:,.0f}")
print(f"Sum of Squared Residuals (SS_res): {ss_res:,.0f}")
print(f"Sum of Squared Total (SS_tot): {ss_tot:,.0f}")

print(f"\nR-squared (R²): {r2:.2%}")
print(f"  Interpretation: Model explains {r2:.0%} of variance")
print("  Range: At most 1; negative if worse than predicting the mean")
print("  Higher is better")

print("\nR² Interpretation:")
print("  R² = 1.0: Perfect predictions")
print("  R² = 0.8: Model explains 80% of variance")
print("  R² = 0.0: Model is no better than always predicting the mean")
print("  R² < 0: Model is worse than predicting the mean")
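
The interpretation table above can be checked with a short sketch. The function name r_squared and the two extra prediction lists (a mean-only baseline and a deliberately bad set of predictions) are assumptions made for this illustration.

```python
def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [200000, 300000, 250000, 400000, 350000]

# Predictions from the lesson: close to the actual prices
y_pred_good = [210000, 290000, 240000, 410000, 340000]

# Baseline: always predict the mean of the actual prices ($300,000)
y_pred_mean = [300000] * len(y_true)

# Hypothetical bad model: predicts high where prices are low and vice versa
y_pred_bad = [400000, 300000, 350000, 200000, 250000]

for label, y_pred in [
    ("Good model", y_pred_good),
    ("Mean baseline", y_pred_mean),
    ("Bad model", y_pred_bad),
]:
    print(f"{label}: R² = {r_squared(y_true, y_pred):.2f}")
```

The good model scores about 0.98, the mean baseline scores exactly 0, and the bad model scores -3, which shows that R² has no lower bound.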

Comparing Metrics

Different metrics give different insights:

compare_metrics.py
# Comparing Regression Metrics

print("Regression Metrics Comparison:")
print("=" * 50)

print("\nMAE (Mean Absolute Error):")
print("  Use when: Need interpretable average error")
print("  Pros: Easy to understand, more robust to outliers than MSE/RMSE")
print("  Cons: Doesn't penalize large errors heavily")

print("\nMSE/RMSE (Mean Squared Error):")
print("  Use when: Large errors are costly")
print("  Pros: Penalizes large errors, RMSE interpretable")
print("  Cons: Sensitive to outliers")

print("\nR² (R-squared):")
print("  Use when: Want to know how much variance is explained")
print("  Pros: Unitless (at most 1), easy to compare models on the same data")
print("  Cons: Can be misleading with non-linear relationships")

print("\nBest Practice:")
print("  Report multiple metrics for complete picture")
print("  MAE for interpretability")
print("  RMSE for optimization")
print("  R² for variance explanation")
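
One way to follow the "report multiple metrics" advice is to compute everything in a single helper. The sketch below is one possible shape, not a standard API: the name regression_report, the dictionary keys, and the choice to return NaN for R² when the actual values are all identical are assumptions.

```python
import math


def regression_report(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)

    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Assumption: R² is treated as undefined (NaN) when y_true has no variance
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


y_true = [200000, 300000, 250000, 400000, 350000]
y_pred = [210000, 290000, 240000, 410000, 340000]

report = regression_report(y_true, y_pred)
print(f"MAE:  ${report['mae']:,.0f}")
print(f"MSE:  {report['mse']:,.0f}")
print(f"RMSE: ${report['rmse']:,.0f}")
print(f"R²:   {report['r2']:.2f}")
```

Returning a dictionary keeps the four numbers together, so two models can be compared metric by metric on the same data.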

Exercise: Calculate Regression Metrics

Complete the exercise on the right side:

Task 1: Calculate MAE for predictions
Task 2: Calculate MSE and RMSE
Task 3: Calculate R² (R-squared)
Task 4: Compare and interpret all metrics

Write your code to calculate and interpret regression metrics!
