What is Model Evaluation?
Model evaluation is the process of assessing how well your ML model performs. It's crucial because you need to know if your model is actually useful before deploying it in real-world applications.
Evaluation helps you compare different models, identify problems, and make informed decisions about which model to use. Without proper evaluation, you can't trust your model's predictions.
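One caveat worth stating up front: metrics only tell you something useful when they're computed on data the model never saw during training. As a minimal sketch (assuming scikit-learn is available, as in the later examples; the features and labels here are made up), a held-out test set can be split off like this:
from sklearn.model_selection import train_test_split

X = [[1], [2], [3], [4], [5], [6], [7], [8]]  # toy features (made up)
y = [0, 0, 0, 1, 1, 1, 0, 1]                  # toy labels (made up)

# Hold out 25% of the data; compute all evaluation metrics on X_test / y_test only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(len(X_train), "training samples,", len(X_test), "test samples")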
Evaluation Metrics for Classification
For classification problems, we use metrics like accuracy, precision, recall, and F1-score:
# Example binary labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print("True labels:", y_true)
print("Predicted labels:", y_pred)

# Accuracy: fraction of all predictions that are correct
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(f"\nAccuracy: {accuracy:.2%}")
print(f"  {correct} out of {len(y_true)} predictions correct")

# Precision: TP / (TP + FP)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
print(f"\nPrecision: {precision:.2%}")
print(f"  Of all positive predictions, {precision:.0%} were correct")

# Recall: TP / (TP + FN)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"\nRecall: {recall:.2%}")
print(f"  Of all actual positives, {recall:.0%} were found")

# F1-score: harmonic mean of precision and recall
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
print(f"\nF1-Score: {f1:.2%}")
print("  Balanced measure of precision and recall")
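One caveat these numbers don't show: accuracy alone can be misleading when one class dominates. The made-up labels below illustrate a model that always predicts the negative class, scoring 90% accuracy with 0% recall:
# Made-up imbalanced example: 9 negatives, 1 positive
y_true_imb = [0] * 9 + [1]
y_pred_imb = [0] * 10  # a model that always predicts the negative class

acc = sum(1 for t, p in zip(y_true_imb, y_pred_imb) if t == p) / len(y_true_imb)
tp = sum(1 for t, p in zip(y_true_imb, y_pred_imb) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true_imb, y_pred_imb) if t == 1 and p == 0)
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"Accuracy: {acc:.0%}, Recall: {recall:.0%}")  # 90% accuracy, 0% recall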
Evaluation Metrics for Regression
For regression problems, we use different metrics to measure prediction errors:
# Example house prices in dollars
y_true = [200000, 300000, 250000, 400000, 350000]
y_pred = [210000, 290000, 240000, 410000, 340000]
print("Actual prices:", y_true)
print("Predicted prices:", y_pred)

# MAE: average absolute difference between prediction and truth
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"\nMean Absolute Error (MAE): ${mae:,.0f}")
print("  Average prediction error in dollars")

# MSE: average squared difference (in squared dollars, hence no $ sign)
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"\nMean Squared Error (MSE): {mse:,.0f}")
print("  Penalizes large errors more than small ones")

# RMSE: square root of MSE, back in the target's units
rmse = mse ** 0.5
print(f"\nRoot Mean Squared Error (RMSE): ${rmse:,.0f}")
print("  Error in same units as target (easier to interpret)")

# R²: 1 - (residual sum of squares / total sum of squares)
mean_true = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - (ss_res / ss_tot) if ss_tot > 0 else 0
print(f"\nR-squared (R²): {r2:.2%}")
print(f"  Model explains {r2:.0%} of variance in data")
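To see concretely why MSE "penalizes large errors more than small ones", here is a small made-up comparison of MAE and RMSE on four small errors plus one outlier:
# Made-up errors: four small misses and one large outlier
errors = [10, 10, 10, 10, 100]

mae = sum(errors) / len(errors)
rmse = (sum(e ** 2 for e in errors) / len(errors)) ** 0.5
print(f"MAE:  {mae:.1f}")   # 28.0 -- grows linearly with the outlier
print(f"RMSE: {rmse:.1f}")  # 45.6 -- dominated by the squared outlier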
Confusion Matrix
The confusion matrix shows detailed classification performance:
# Same labels as the classification example above
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Count each cell of the 2x2 confusion matrix
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print("Confusion Matrix:")
print("              Predicted")
print("               0    1")
print(f"Actual  0     {tn:2d}   {fp:2d}")
print(f"        1     {fn:2d}   {tp:2d}")

print("\nMetrics from confusion matrix:")
print(f"  True Negatives (TN):  {tn}")
print(f"  False Positives (FP): {fp}")
print(f"  False Negatives (FN): {fn}")
print(f"  True Positives (TP):  {tp}")

# All of the earlier classification metrics derive from these four counts
print("\nCalculated metrics:")
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"  Accuracy:  {accuracy:.2%}")
print(f"  Precision: {precision:.2%}")
print(f"  Recall:    {recall:.2%}")
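The four counts also support metrics the snippets above didn't compute. For example, specificity, the negative-class counterpart of recall, follows directly (this continues the snippet above, reusing tn and fp):
# Specificity: TN / (TN + FP) -- of all actual negatives, how many were caught
specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
print(f"  Specificity: {specificity:.2%}")  # 3 / (3 + 1) = 75%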
Using sklearn for Evaluation
scikit-learn provides functions to calculate metrics easily:
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    mean_absolute_error, mean_squared_error, r2_score,
)

# Classification metrics
y_true_class = [1, 0, 1, 1, 0]
y_pred_class = [1, 0, 1, 0, 0]
print("Classification Evaluation (using sklearn):")
print(f"  Accuracy:  {accuracy_score(y_true_class, y_pred_class):.2%}")
print(f"  Precision: {precision_score(y_true_class, y_pred_class):.2%}")
print(f"  Recall:    {recall_score(y_true_class, y_pred_class):.2%}")
print(f"  F1-Score:  {f1_score(y_true_class, y_pred_class):.2%}")

# Regression metrics
y_true_reg = [100, 200, 150, 300]
y_pred_reg = [110, 190, 140, 310]
print("\nRegression Evaluation (using sklearn):")
mse = mean_squared_error(y_true_reg, y_pred_reg)
print(f"  MAE:  {mean_absolute_error(y_true_reg, y_pred_reg):.2f}")
print(f"  MSE:  {mse:.2f}")
print(f"  RMSE: {mse ** 0.5:.2f}")
print(f"  R²:   {r2_score(y_true_reg, y_pred_reg):.2f}")

print("\nsklearn makes evaluation much easier!")
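scikit-learn can also build the confusion matrix and a per-class summary for you. Here is a short sketch on the same toy labels, using confusion_matrix and classification_report:
from sklearn.metrics import confusion_matrix, classification_report

y_true_class = [1, 0, 1, 1, 0]
y_pred_class = [1, 0, 1, 0, 0]

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true_class, y_pred_class))

# Precision, recall, and F1 for each class in one call
print(classification_report(y_true_class, y_pred_class))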
Exercise: Evaluate Model Performance
Complete the exercise on the right side:
- Task 1: Calculate accuracy for classification predictions
- Task 2: Calculate precision, recall, and F1-score
- Task 3: Build a confusion matrix (count TP, TN, FP, FN)
- Task 4: Calculate MAE and RMSE for regression predictions
Write your code to evaluate model performance using these metrics!
💡 Learning Tip
Practice is essential. Try modifying the code examples, experiment with different parameters, and see how changes affect the results. Hands-on experience is the best teacher!