🎯 Project: Complete Model Evaluation
This project will help you apply everything you've learned about model evaluation. You'll evaluate multiple models, compare their performance, identify overfitting/underfitting, and use cross-validation to get reliable estimates.
Proper model evaluation is crucial for building reliable ML systems. This project gives you hands-on experience with the complete evaluation workflow.
Evaluating Multiple Models
In practice, you'll compare multiple models. Here's how to evaluate them systematically:
models = {
    'Model A': {'train_acc': 0.95, 'test_acc': 0.72},
    'Model B': {'train_acc': 0.85, 'test_acc': 0.83},
    'Model C': {'train_acc': 0.60, 'test_acc': 0.58}
}

print("Model Comparison:")
print("=" * 50)

for name, perf in models.items():
    gap = perf['train_acc'] - perf['test_acc']
    print(f"\n{name}:")
    print(f" Train: {perf['train_acc']:.0%}")
    print(f" Test: {perf['test_acc']:.0%}")
    print(f" Gap: {gap:.0%}")
    if gap > 0.15:
        print(" Status: OVERFITTING")
    elif perf['test_acc'] < 0.70:
        print(" Status: UNDERFITTING")
    else:
        print(" Status: GOOD FIT")

print("\nBest Model: Model B (good test performance, small gap)")
Using Cross-Validation for Evaluation
Cross-validation gives you more reliable performance estimates:
model_cv_scores = {
    'Model A': [0.70, 0.72, 0.71, 0.73, 0.72],
    'Model B': [0.82, 0.84, 0.83, 0.85, 0.83],
    'Model C': [0.57, 0.59, 0.58, 0.60, 0.58]
}

print("Cross-Validation Results:")
print("=" * 50)

for name, scores in model_cv_scores.items():
    mean_score = sum(scores) / len(scores)
    variance = sum((s - mean_score) ** 2 for s in scores) / len(scores)
    std_score = variance ** 0.5
    print(f"\n{name}:")
    print(f" CV Scores: {[f'{s:.2%}' for s in scores]}")
    print(f" Mean: {mean_score:.2%}")
    print(f" Std Dev: {std_score:.2%}")
    print(f" Range: {min(scores):.2%} - {max(scores):.2%}")

print("\nCV provides:")
print(" - More reliable performance estimate")
print(" - Shows model stability (lower std = more stable)")
print(" - Better model comparison")
Complete Evaluation Workflow
A complete evaluation includes multiple metrics and techniques:
print("Complete Evaluation Checklist:")
print("=" * 50")
print("\n1. Basic Metrics:")
print(" β Accuracy (classification)")
print(" β Precision, Recall, F1-score")
print(" β MAE, RMSE, RΒ² (regression)")
print("\n2. Train-Test Comparison:")
print(" β Compare training vs test performance")
print(" β Identify overfitting/underfitting")
print(" β Calculate performance gap")
print("\n3. Cross-Validation:")
print(" β Use k-fold cross-validation")
print(" β Calculate mean and std of scores")
print(" β Assess model stability")
print("\n4. Confusion Matrix:")
print(" β Build confusion matrix")
print(" β Analyze TP, TN, FP, FN")
print(" β Identify error patterns")
print("\n5. Model Comparison:")
print(" β Compare multiple models")
print(" β Select best model")
print(" β Document findings")
Exercise: Complete Model Evaluation Project
Complete the exercise on the right side:
- Task 1: Evaluate three models with train/test accuracy
- Task 2: Identify which model has overfitting/underfitting
- Task 3: Calculate cross-validation scores for each model
- Task 4: Compare models and select the best one
- Task 5: Calculate confusion matrix for the best model
Write your code to complete this comprehensive model evaluation project!
💡 Project Tips
Break the project into smaller tasks. Complete and test each part before moving to the next. Don't try to do everything at once; iterative development leads to better results!
🎉 Lesson Complete!
Great work! Continue to the next lesson.