Classification Metrics Explained
Classification metrics help you evaluate how well your model categorizes data. Unlike regression (predicting numbers), classification predicts categories, so we need different metrics like accuracy, precision, recall, F1-score, and ROC-AUC.
Different metrics are important for different scenarios. For example, in medical diagnosis, you might care more about recall (finding all sick patients) than precision (avoiding false positives).
Accuracy, Precision, and Recall
These are the fundamental classification metrics. Let's understand each:
# Toy binary labels for a spam filter: 1 = spam, 0 = not spam
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions
print("True labels: ", y_true)
print("Predicted: ", y_pred)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print(f"\nConfusion Matrix Components:")
print(f" True Positives (TP): {tp}")
print(f" True Negatives (TN): {tn}")
print(f" False Positives (FP): {fp}")
print(f" False Negatives (FN): {fn}")
accuracy = (tp + tn) / (tp + tn + fp + fn) if (tp + tn + fp + fn) > 0 else 0
print(f"\nAccuracy: {accuracy:.2%}")
print(" Overall correctness - can be misleading with imbalanced data")
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
print(f"\nPrecision: {precision:.2%}")
print(" Of all spam predictions, how many were actually spam?")
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"\nRecall: {recall:.2%}")
print(" Of all actual spam emails, how many did we catch?")
F1-Score and Balanced Metrics
The F1-score combines precision and recall into a single number using their harmonic mean, which is useful when classes are imbalanced and accuracy alone is misleading:
scenarios = {
    'High Precision': {'precision': 0.95, 'recall': 0.60},
    'High Recall': {'precision': 0.70, 'recall': 0.95},
    'Balanced': {'precision': 0.85, 'recall': 0.83}
}
print("F1-Score: Harmonic Mean of Precision and Recall")
print("=" * 50)
for name, metrics in scenarios.items():
    prec = metrics['precision']
    rec = metrics['recall']
    f1 = 2 * (prec * rec) / (prec + rec) if (prec + rec) > 0 else 0
    print(f"\n{name}:")
    print(f" Precision: {prec:.2%}")
    print(f" Recall: {rec:.2%}")
    print(f" F1-Score: {f1:.2%}")
print("\nF1-Score Characteristics:")
print(" - Harmonic mean (not arithmetic)")
print(" - Penalizes large differences between precision and recall")
print(" - Best when precision and recall are similar")
print(" - Range: 0 to 1 (higher is better)")
Multi-Class Classification Metrics
For problems with more than two classes, precision and recall are computed per class and then combined in different ways:
y_true = ['cat', 'dog', 'bird', 'cat', 'dog']
y_pred = ['cat', 'dog', 'cat', 'cat', 'dog']
print("Multi-Class Classification:")
print(f" True: {y_true}")
print(f" Pred: {y_pred}")
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(f"\nAccuracy: {accuracy:.2%}")
print("\nMulti-Class Metrics:")
print(" 1. Accuracy: Overall correctness")
print(" 2. Per-Class Precision/Recall: Calculate for each class")
print(" 3. Macro Average: Average of per-class metrics")
print(" 4. Weighted Average: Weighted by class frequency")
print("\nFor 'cat' class:")
print(" - Precision: Of cat predictions, how many were actually cats?")
print(" - Recall: Of actual cats, how many were found?")
Exercise: Calculate Classification Metrics
Complete the exercise on the right side:
- Task 1: Calculate TP, TN, FP, FN from predictions
- Task 2: Calculate accuracy, precision, and recall
- Task 3: Calculate F1-score from precision and recall
- Task 4: Build and display a confusion matrix
Write your code to calculate and interpret classification metrics!
💡 Learning Tip
Practice is essential. Try modifying the code examples, experiment with different parameters, and see how changes affect the results. Hands-on experience is the best teacher!