Classification Metrics Explained
Classification metrics help you evaluate how well your model categorizes data. Unlike regression (predicting numbers), classification predicts categories, so we need different metrics like accuracy, precision, recall, F1-score, and ROC-AUC.
Different metrics are important for different scenarios. For example, in medical diagnosis, you might care more about recall (finding all sick patients) than precision (avoiding false positives).
Accuracy, Precision, and Recall
These are the fundamental classification metrics. Let's understand each:
# Toy binary labels for a spam filter: 1 = spam, 0 = not spam
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions
print("True labels: ", y_true)
print("Predicted: ", y_pred)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print(f"\nConfusion Matrix Components:")
print(f" True Positives (TP): {tp}")
print(f" True Negatives (TN): {tn}")
print(f" False Positives (FP): {fp}")
print(f" False Negatives (FN): {fn}")
accuracy = (tp + tn) / (tp + tn + fp + fn) if (tp + tn + fp + fn) > 0 else 0
print(f"\nAccuracy: {accuracy:.2%}")
print(" Overall correctness - can be misleading with imbalanced data")
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
print(f"\nPrecision: {precision:.2%}")
print(" Of all spam predictions, how many were actually spam?")
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"\nRecall: {recall:.2%}")
print(" Of all actual spam emails, how many did we catch?")
F1-Score and Balanced Metrics
The F1-score combines precision and recall into a single number using their harmonic mean, which is useful when classes are imbalanced and accuracy alone is misleading:
scenarios = {
    'High Precision': {'precision': 0.95, 'recall': 0.60},
    'High Recall': {'precision': 0.70, 'recall': 0.95},
    'Balanced': {'precision': 0.85, 'recall': 0.83}
}
print("F1-Score: Harmonic Mean of Precision and Recall")
print("=" * 50)
for name, metrics in scenarios.items():
    prec = metrics['precision']
    rec = metrics['recall']
    f1 = 2 * (prec * rec) / (prec + rec) if (prec + rec) > 0 else 0
    print(f"\n{name}:")
    print(f" Precision: {prec:.2%}")
    print(f" Recall: {rec:.2%}")
    print(f" F1-Score: {f1:.2%}")
print("\nF1-Score Characteristics:")
print(" - Harmonic mean (not arithmetic)")
print(" - Penalizes large differences between precision and recall")
print(" - Best when precision and recall are similar")
print(" - Range: 0 to 1 (higher is better)")
Multi-Class Classification Metrics
For problems with more than two classes, precision and recall are computed per class and then combined in different ways:
y_true = ['cat', 'dog', 'bird', 'cat', 'dog']
y_pred = ['cat', 'dog', 'cat', 'cat', 'dog']
print("Multi-Class Classification:")
print(f" True: {y_true}")
print(f" Pred: {y_pred}")
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(f"\nAccuracy: {accuracy:.2%}")
print("\nMulti-Class Metrics:")
print(" 1. Accuracy: Overall correctness")
print(" 2. Per-Class Precision/Recall: Calculate for each class")
print(" 3. Macro Average: Average of per-class metrics")
print(" 4. Weighted Average: Weighted by class frequency")
print("\nFor 'cat' class:")
print(" - Precision: Of cat predictions, how many were actually cats?")
print(" - Recall: Of actual cats, how many were found?")
Exercise: Calculate Classification Metrics
Complete the exercise on the right side:
- Task 1: Calculate TP, TN, FP, FN from predictions
- Task 2: Calculate accuracy, precision, and recall
- Task 3: Calculate F1-score from precision and recall
- Task 4: Build and display a confusion matrix
Write your code to calculate and interpret classification metrics!
💡 Learning Tip
Practice is essential. Try modifying the code examples, experiment with different parameters, and see how changes affect the results. Hands-on experience is the best teacher!