Chapter 10: Advanced Topics & Projects / Lesson 48

Model Deployment

What is Model Deployment?

Model deployment is the process of making a trained machine learning model available for use in production environments. A model that works well in development is only valuable if it can serve real users and make predictions on new data reliably and efficiently.

Deployment involves saving the model, creating an interface for predictions, monitoring performance, and ensuring scalability and reliability. This is a critical step that bridges the gap between development and production.

Saving a Trained Model
from tensorflow import keras
import joblib

# Save a Keras/TensorFlow model
model = keras.Sequential([...])
model.compile(...)
model.fit(X_train, y_train)
model.save('my_model.h5')  # HDF5 format

# Save a scikit-learn model
from sklearn.ensemble import RandomForestClassifier
sklearn_model = RandomForestClassifier()
sklearn_model.fit(X_train, y_train)
joblib.dump(sklearn_model, 'sklearn_model.pkl')  # joblib (pickle-based) format

print("Models saved and ready for deployment!")

Deployment Strategies

There are several approaches to deploying ML models:

  • REST API: Create a web service that accepts HTTP requests and returns predictions (Flask, FastAPI)
  • Batch Processing: Process data in batches on a schedule (good for non-real-time predictions)
  • Edge Deployment: Deploy models on devices (mobile apps, IoT devices) for offline predictions
  • Cloud Services: Use managed services (AWS SageMaker, Google AI Platform, Azure ML)
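The batch-processing strategy above can be sketched in a few lines of plain Python. The `chunked` helper and the batch size are illustrative choices, not part of any particular library:

```python
def chunked(records, batch_size):
    """Yield successive fixed-size batches from a list of records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# In a scheduled job you would load the pending records, then score
# each batch with model.predict(batch) and write the results back.
records = list(range(10))  # stand-in for rows awaiting predictions
for batch in chunked(records, 4):
    print(batch)  # [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]
```

Batching keeps memory use bounded and lets a single nightly job score millions of rows without any real-time serving infrastructure.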
Simple Flask API for Model Serving
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)
model = joblib.load('model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)[0]
    return jsonify({'prediction': float(prediction)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

# Client sends: POST /predict with {"features": [1, 2, 3]}
# Server returns: {"prediction": 0.85}

Model Serving Considerations

When deploying models, consider:

  • Scalability: Can your deployment handle increased load? Use load balancers and multiple instances
  • Latency: Response time matters for real-time applications; optimize model size and inference speed
  • Versioning: Track model versions and enable rollback if needed
  • Monitoring: Track prediction accuracy, latency, and errors in production
  • Security: Protect API endpoints, validate inputs, handle sensitive data appropriately
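The input-validation point can be as simple as rejecting malformed payloads before they reach the model. This is a minimal sketch; the function name `validate_features` and the expected feature count are hypothetical, chosen for illustration:

```python
def validate_features(payload, n_features=4):
    """Return a cleaned feature list, or raise ValueError for bad input."""
    if not isinstance(payload, dict) or 'features' not in payload:
        raise ValueError("payload must be a JSON object with a 'features' key")
    features = payload['features']
    if not isinstance(features, list) or len(features) != n_features:
        raise ValueError(f"'features' must be a list of {n_features} numbers")
    try:
        return [float(x) for x in features]
    except (TypeError, ValueError):
        raise ValueError("all features must be numeric")

print(validate_features({'features': [5.1, 3.5, 1.4, 0.2]}))
# [5.1, 3.5, 1.4, 0.2]
```

In the Flask endpoint you would call this before `model.predict` and return an HTTP 400 with the error message instead of letting a bad request crash the server.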

💡 Production Best Practices

Always validate input data, handle errors gracefully, log predictions for debugging, and monitor model performance over time. Model drift (performance degradation) can occur as data distributions change, so regular retraining is important!
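One way to watch for drift is to compare accuracy over a recent window of labeled outcomes against a baseline measured at deployment time. The class below is a sketch of that idea; the window size, tolerance, and the `DriftMonitor` name are all illustrative assumptions:

```python
from collections import deque

class DriftMonitor:
    """Flags drift when rolling accuracy falls well below a baseline."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.10):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, prediction, actual):
        self.outcomes.append(1 if prediction == actual else 0)

    def drifting(self):
        if not self.outcomes:
            return False
        rolling = sum(self.outcomes) / len(self.outcomes)
        return rolling < self.baseline - self.tolerance

monitor = DriftMonitor(baseline_accuracy=0.90, window=50)
for _ in range(50):
    monitor.record(prediction=1, actual=0)  # every prediction wrong
print(monitor.drifting())  # True
```

In practice the ground-truth labels often arrive late (e.g., fraud confirmed weeks after the transaction), so a monitor like this usually runs as part of a batch job rather than inside the serving path.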

Loading and Using Saved Models

Once saved, models can be loaded and used for predictions:

Loading Models for Predictions
# Load a Keras model
from tensorflow import keras
model = keras.models.load_model('my_model.h5')

# Load a scikit-learn model
import joblib
sklearn_model = joblib.load('sklearn_model.pkl')

# Make predictions
new_data = [[5.1, 3.5, 1.4, 0.2]]
prediction = sklearn_model.predict(new_data)
probabilities = sklearn_model.predict_proba(new_data)
print(f"Prediction: {prediction[0]}")
print(f"Probabilities: {probabilities[0]}")

Practical Applications

Model deployment enables real-world ML applications:

  • E-commerce: Product recommendations, fraud detection, price optimization
  • Healthcare: Medical diagnosis systems, patient risk prediction
  • Finance: Credit scoring, algorithmic trading, fraud detection
  • Manufacturing: Quality control, predictive maintenance
  • Mobile Apps: Image recognition, language translation, voice assistants

Exercise: Save and Load a Model

In the exercise on the right, you'll train a simple model, save it to a file, load it back, and make predictions. This demonstrates the fundamental workflow of model deployment.

This hands-on exercise will help you understand the basics of model serialization and serving.

🎉

Lesson Complete!

Great work! Continue to the next lesson.
