What is Model Evaluation?
Model evaluation is the process of assessing how well your ML model performs. It's crucial because you need to know if your model is actually useful before deploying it in real-world applications.
Evaluation helps you compare different models, identify problems, and make informed decisions about which model to use. Without proper evaluation, you can't trust your model's predictions.
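One caveat worth stating up front: metrics only tell you something useful when they're computed on data the model never saw during training. As a minimal sketch (assuming scikit-learn is available, as in the later examples; the features and labels here are made up), a held-out test set can be split off like this:
from sklearn.model_selection import train_test_split

X = [[1], [2], [3], [4], [5], [6], [7], [8]]  # toy features (made up)
y = [0, 0, 0, 1, 1, 1, 0, 1]                  # toy labels (made up)

# Hold out 25% of the data; compute all evaluation metrics on X_test / y_test only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(len(X_train), "training samples,", len(X_test), "test samples")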
Evaluation Metrics for Classification
For classification problems, we use metrics like accuracy, precision, recall, and F1-score:
# Example binary labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print("True labels:", y_true)
print("Predicted labels:", y_pred)

# Accuracy: fraction of all predictions that are correct
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(f"\nAccuracy: {accuracy:.2%}")
print(f"  {correct} out of {len(y_true)} predictions correct")

# Precision: TP / (TP + FP)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
print(f"\nPrecision: {precision:.2%}")
print(f"  Of all positive predictions, {precision:.0%} were correct")

# Recall: TP / (TP + FN)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"\nRecall: {recall:.2%}")
print(f"  Of all actual positives, {recall:.0%} were found")

# F1-score: harmonic mean of precision and recall
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
print(f"\nF1-Score: {f1:.2%}")
print("  Balanced measure of precision and recall")
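One caveat these numbers don't show: accuracy alone can be misleading when one class dominates. The made-up labels below illustrate a model that always predicts the negative class, scoring 90% accuracy with 0% recall:
# Made-up imbalanced example: 9 negatives, 1 positive
y_true_imb = [0] * 9 + [1]
y_pred_imb = [0] * 10  # a model that always predicts the negative class

acc = sum(1 for t, p in zip(y_true_imb, y_pred_imb) if t == p) / len(y_true_imb)
tp = sum(1 for t, p in zip(y_true_imb, y_pred_imb) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true_imb, y_pred_imb) if t == 1 and p == 0)
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"Accuracy: {acc:.0%}, Recall: {recall:.0%}")  # 90% accuracy, 0% recall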
Evaluation Metrics for Regression
For regression problems, we use different metrics to measure prediction errors:
# Example house prices in dollars
y_true = [200000, 300000, 250000, 400000, 350000]
y_pred = [210000, 290000, 240000, 410000, 340000]
print("Actual prices:", y_true)
print("Predicted prices:", y_pred)

# MAE: average absolute difference between prediction and truth
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"\nMean Absolute Error (MAE): ${mae:,.0f}")
print("  Average prediction error in dollars")

# MSE: average squared difference (in squared dollars, hence no $ sign)
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"\nMean Squared Error (MSE): {mse:,.0f}")
print("  Penalizes large errors more than small ones")

# RMSE: square root of MSE, back in the target's units
rmse = mse ** 0.5
print(f"\nRoot Mean Squared Error (RMSE): ${rmse:,.0f}")
print("  Error in same units as target (easier to interpret)")

# R²: 1 - (residual sum of squares / total sum of squares)
mean_true = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - (ss_res / ss_tot) if ss_tot > 0 else 0
print(f"\nR-squared (R²): {r2:.2%}")
print(f"  Model explains {r2:.0%} of variance in data")
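To see concretely why MSE "penalizes large errors more than small ones", here is a small made-up comparison of MAE and RMSE on four small errors plus one outlier:
# Made-up errors: four small misses and one large outlier
errors = [10, 10, 10, 10, 100]

mae = sum(errors) / len(errors)
rmse = (sum(e ** 2 for e in errors) / len(errors)) ** 0.5
print(f"MAE:  {mae:.1f}")   # 28.0 -- grows linearly with the outlier
print(f"RMSE: {rmse:.1f}")  # 45.6 -- dominated by the squared outlier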
Confusion Matrix
The confusion matrix shows detailed classification performance:
# Same labels as the classification example above
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Count each cell of the 2x2 confusion matrix
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print("Confusion Matrix:")
print("              Predicted")
print("               0    1")
print(f"Actual  0     {tn:2d}   {fp:2d}")
print(f"        1     {fn:2d}   {tp:2d}")

print("\nMetrics from confusion matrix:")
print(f"  True Negatives (TN):  {tn}")
print(f"  False Positives (FP): {fp}")
print(f"  False Negatives (FN): {fn}")
print(f"  True Positives (TP):  {tp}")

# All of the earlier classification metrics derive from these four counts
print("\nCalculated metrics:")
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"  Accuracy:  {accuracy:.2%}")
print(f"  Precision: {precision:.2%}")
print(f"  Recall:    {recall:.2%}")
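The four counts also support metrics the snippets above didn't compute. For example, specificity, the negative-class counterpart of recall, follows directly (this continues the snippet above, reusing tn and fp):
# Specificity: TN / (TN + FP) -- of all actual negatives, how many were caught
specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
print(f"  Specificity: {specificity:.2%}")  # 3 / (3 + 1) = 75%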
Using sklearn for Evaluation
scikit-learn provides functions to calculate metrics easily:
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    mean_absolute_error, mean_squared_error, r2_score,
)

# Classification metrics
y_true_class = [1, 0, 1, 1, 0]
y_pred_class = [1, 0, 1, 0, 0]
print("Classification Evaluation (using sklearn):")
print(f"  Accuracy:  {accuracy_score(y_true_class, y_pred_class):.2%}")
print(f"  Precision: {precision_score(y_true_class, y_pred_class):.2%}")
print(f"  Recall:    {recall_score(y_true_class, y_pred_class):.2%}")
print(f"  F1-Score:  {f1_score(y_true_class, y_pred_class):.2%}")

# Regression metrics
y_true_reg = [100, 200, 150, 300]
y_pred_reg = [110, 190, 140, 310]
print("\nRegression Evaluation (using sklearn):")
mse = mean_squared_error(y_true_reg, y_pred_reg)
print(f"  MAE:  {mean_absolute_error(y_true_reg, y_pred_reg):.2f}")
print(f"  MSE:  {mse:.2f}")
print(f"  RMSE: {mse ** 0.5:.2f}")
print(f"  R²:   {r2_score(y_true_reg, y_pred_reg):.2f}")

print("\nsklearn makes evaluation much easier!")
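scikit-learn can also build the confusion matrix and a per-class summary for you. Here is a short sketch on the same toy labels, using confusion_matrix and classification_report:
from sklearn.metrics import confusion_matrix, classification_report

y_true_class = [1, 0, 1, 1, 0]
y_pred_class = [1, 0, 1, 0, 0]

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true_class, y_pred_class))

# Precision, recall, and F1 for each class in one call
print(classification_report(y_true_class, y_pred_class))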
Exercise: Evaluate Model Performance
Complete the exercise on the right side:
- Task 1: Calculate accuracy for classification predictions
- Task 2: Calculate precision, recall, and F1-score
- Task 3: Build a confusion matrix (count TP, TN, FP, FN)
- Task 4: Calculate MAE and RMSE for regression predictions
Write your code to evaluate model performance using these metrics!
💡 Learning Tip
Practice is essential. Try modifying the code examples, experiment with different parameters, and see how changes affect the results. Hands-on experience is the best teacher!