Chapter 10: Advanced Topics & Projects / Lesson 48

Model Deployment

What is Model Deployment?

Model deployment is the process of making a trained machine learning model available for use in production environments. A model that works well in development is only valuable if it can serve real users and make predictions on new data reliably and efficiently.

Deployment involves saving the model, creating an interface for predictions, monitoring performance, and ensuring scalability and reliability. This is a critical step that bridges the gap between development and production.

Saving a Trained Model
from tensorflow import keras
import joblib

# Save a Keras/TensorFlow model
model = keras.Sequential([...])
model.compile(...)
model.fit(X_train, y_train)
model.save('my_model.h5')  # HDF5 format

# Save a scikit-learn model
from sklearn.ensemble import RandomForestClassifier
sklearn_model = RandomForestClassifier()
sklearn_model.fit(X_train, y_train)
joblib.dump(sklearn_model, 'sklearn_model.pkl')  # joblib (pickle-based) format

print("Models saved and ready for deployment!")

Deployment Strategies

There are several approaches to deploying ML models:

  • REST API: Create a web service that accepts HTTP requests and returns predictions (Flask, FastAPI)
  • Batch Processing: Process data in batches on a schedule (good for non-real-time predictions)
  • Edge Deployment: Deploy models on devices (mobile apps, IoT devices) for offline predictions
  • Cloud Services: Use managed services (AWS SageMaker, Google AI Platform, Azure ML)
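The batch-processing strategy above can be sketched in a few lines of plain Python. The `chunked` helper and the batch size are illustrative choices, not part of any particular library:

```python
def chunked(records, batch_size):
    """Yield successive fixed-size batches from a list of records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# In a scheduled job you would load the pending records, then score
# each batch with model.predict(batch) and write the results back.
records = list(range(10))  # stand-in for rows awaiting predictions
for batch in chunked(records, 4):
    print(batch)  # [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]
```

Batching keeps memory use bounded and lets a single nightly job score millions of rows without any real-time serving infrastructure.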
Simple Flask API for Model Serving
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)
model = joblib.load('model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)[0]
    return jsonify({'prediction': float(prediction)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

# Client sends: POST /predict with {"features": [1, 2, 3]}
# Server returns: {"prediction": 0.85}

Model Serving Considerations

When deploying models, consider:

  • Scalability: Can your deployment handle increased load? Use load balancers and multiple instances
  • Latency: Response time matters for real-time applications; optimize model size and inference speed
  • Versioning: Track model versions and enable rollback if needed
  • Monitoring: Track prediction accuracy, latency, and errors in production
  • Security: Protect API endpoints, validate inputs, handle sensitive data appropriately
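The input-validation point can be as simple as rejecting malformed payloads before they reach the model. This is a minimal sketch; the function name `validate_features` and the expected feature count are hypothetical, chosen for illustration:

```python
def validate_features(payload, n_features=4):
    """Return a cleaned feature list, or raise ValueError for bad input."""
    if not isinstance(payload, dict) or 'features' not in payload:
        raise ValueError("payload must be a JSON object with a 'features' key")
    features = payload['features']
    if not isinstance(features, list) or len(features) != n_features:
        raise ValueError(f"'features' must be a list of {n_features} numbers")
    try:
        return [float(x) for x in features]
    except (TypeError, ValueError):
        raise ValueError("all features must be numeric")

print(validate_features({'features': [5.1, 3.5, 1.4, 0.2]}))
# [5.1, 3.5, 1.4, 0.2]
```

In the Flask endpoint you would call this before `model.predict` and return an HTTP 400 with the error message instead of letting a bad request crash the server.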

💡 Production Best Practices

Always validate input data, handle errors gracefully, log predictions for debugging, and monitor model performance over time. Model drift (performance degradation) can occur as data distributions change, so regular retraining is important!
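One way to watch for drift is to compare accuracy over a recent window of labeled outcomes against a baseline measured at deployment time. The class below is a sketch of that idea; the window size, tolerance, and the `DriftMonitor` name are all illustrative assumptions:

```python
from collections import deque

class DriftMonitor:
    """Flags drift when rolling accuracy falls well below a baseline."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.10):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, prediction, actual):
        self.outcomes.append(1 if prediction == actual else 0)

    def drifting(self):
        if not self.outcomes:
            return False
        rolling = sum(self.outcomes) / len(self.outcomes)
        return rolling < self.baseline - self.tolerance

monitor = DriftMonitor(baseline_accuracy=0.90, window=50)
for _ in range(50):
    monitor.record(prediction=1, actual=0)  # every prediction wrong
print(monitor.drifting())  # True
```

In practice the ground-truth labels often arrive late (e.g., fraud confirmed weeks after the transaction), so a monitor like this usually runs as part of a batch job rather than inside the serving path.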

Loading and Using Saved Models

Once saved, models can be loaded and used for predictions:

Loading Models for Predictions
# Load a Keras model
from tensorflow import keras
model = keras.models.load_model('my_model.h5')

# Load a scikit-learn model
import joblib
sklearn_model = joblib.load('sklearn_model.pkl')

# Make predictions
new_data = [[5.1, 3.5, 1.4, 0.2]]
prediction = sklearn_model.predict(new_data)
probabilities = sklearn_model.predict_proba(new_data)
print(f"Prediction: {prediction[0]}")
print(f"Probabilities: {probabilities[0]}")

Practical Applications

Model deployment enables real-world ML applications:

  • E-commerce: Product recommendations, fraud detection, price optimization
  • Healthcare: Medical diagnosis systems, patient risk prediction
  • Finance: Credit scoring, algorithmic trading, fraud detection
  • Manufacturing: Quality control, predictive maintenance
  • Mobile Apps: Image recognition, language translation, voice assistants

Exercise: Save and Load a Model

In the exercise on the right, you'll train a simple model, save it to a file, load it back, and make predictions. This demonstrates the fundamental workflow of model deployment.

This hands-on exercise will help you understand the basics of model serialization and serving.

🎉

Lesson Complete!

Great work! Continue to the next lesson.
