Chapter 6: Regression Models / Lesson 27

Polynomial Regression

Understanding Polynomial Regression

Polynomial regression extends linear regression by adding polynomial terms (x², x³, etc.) to capture non-linear relationships. While linear regression assumes a straight-line relationship, polynomial regression can model curves. The model is still linear in its coefficients (only the features are transformed), which is why ordinary linear regression can fit it.

This is useful when your data shows a curved pattern rather than a straight line. For example, the relationship between temperature and ice cream sales might be curved—very hot and very cold days both reduce sales.

When to Use Polynomial Regression

Use polynomial regression when:

  • Your data shows a curved relationship (not linear)
  • Linear regression gives poor results
  • You need to model non-linear trends
  • The relationship has a clear polynomial pattern

However, be careful—high-degree polynomials can overfit and create wiggly curves that don't generalize well.
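A quick way to see this wiggle effect is to push a high-degree polynomial through a handful of noisy points that really follow a straight line. The data and degrees below are illustrative choices (not from the lesson), sketched with NumPy's polyfit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy data whose true trend is a straight line: y = 2x
x = np.linspace(-1, 1, 10)
y = 2 * x + rng.normal(scale=0.1, size=x.size)

# Degree 9 through 10 points: the curve passes through every point,
# fitting the noise exactly (training error near zero)
coeffs9 = np.polyfit(x, y, deg=9)
mse9 = np.mean((np.polyval(coeffs9, x) - y) ** 2)
print(f"Degree-9 training MSE: {mse9:.2e}")

# Degree 1 has higher training error but follows the true trend
coeffs1 = np.polyfit(x, y, deg=1)
mse1 = np.mean((np.polyval(coeffs1, x) - y) ** 2)
print(f"Degree-1 training MSE: {mse1:.2e}")
```

The degree-9 curve "wins" on training error only because it memorized the noise; between the data points it oscillates wildly.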

Creating Polynomial Features

Polynomial regression works by creating new features from existing ones. For example, if you have x, it creates x², x³, etc.:

polynomial_features.py
# Creating polynomial features
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Original feature
X = np.array([[1], [2], [3], [4]])
print("Original features:")
print(X)

# Create polynomial features (degree 2)
# This creates: [1, x, x²] for each sample
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print("\nPolynomial features (degree 2):")
print(X_poly)
print("\nColumns: [1, x, x²]")

# Degree 3: creates [1, x, x², x³]
poly3 = PolynomialFeatures(degree=3)
X_poly3 = poly3.fit_transform(X)
print("\nPolynomial features (degree 3):")
print(X_poly3)

Fitting Polynomial Regression

Once you have polynomial features, you use regular linear regression on them:

polynomial_regression.py
# Polynomial regression example
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

# Data with quadratic relationship: y = 2x²
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 8, 18, 32, 50])  # y = 2x²

# Create polynomial features
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Train linear regression on the polynomial features
model = LinearRegression()
model.fit(X_poly, y)

print("Model coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Make predictions
X_new = np.array([[6], [7]])
X_new_poly = poly.transform(X_new)
predictions = model.predict(X_new_poly)

print("\nPredictions:")
for x, pred in zip(X_new, predictions):
    expected = 2 * x[0] ** 2
    print(f"x={x[0]}, predicted={pred:.2f}, expected={expected}")
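The two steps (expand features, then fit) can also be chained with scikit-learn's make_pipeline, so a single estimator handles the expansion inside fit and predict. A minimal sketch using the same y = 2x² data; the pipeline itself is an alternative not shown in the lesson:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 8, 18, 32, 50])  # y = 2x² exactly

# One estimator: polynomial expansion + linear regression
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

# predict() applies the polynomial expansion internally
print(model.predict([[6]]))  # ≈ [72.], since 2 * 6² = 72
```

This avoids the easy mistake of forgetting to transform new data before predicting.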

Choosing the Right Degree

The degree of the polynomial is crucial: too low and you underfit; too high and you overfit. Note that the script below measures MSE on the same data the model was trained on, so the error can only stay the same or shrink as the degree rises; it says nothing about how the model generalizes:

degree_comparison.py
# Comparing different polynomial degrees
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 8, 18, 32, 50])

# Try different degrees
for degree in [1, 2, 3, 4]:
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)
    model = LinearRegression()
    model.fit(X_poly, y)
    predictions = model.predict(X_poly)
    mse = mean_squared_error(y, predictions)
    print(f"Degree {degree}: MSE = {mse:.2f}")

print("\nLower MSE is better, but watch for overfitting!")

Real-World Example

Polynomial regression is useful for modeling relationships like:

  • Temperature vs. Energy Usage: Very hot and very cold days use more energy
  • Age vs. Income: Income increases then plateaus with age
  • Speed vs. Fuel Efficiency: Optimal speed exists, efficiency drops at extremes
real_world_example.py
# Example: Temperature vs Energy Usage
# Energy usage is lowest at moderate temperatures,
# high at very hot (AC) and very cold (heating)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

# Temperature (Fahrenheit)
temp = np.array([[50], [60], [70], [80], [90], [100]])
# Energy usage (kWh) - U-shaped curve
energy = np.array([120, 80, 60, 80, 120, 180])

# Use degree 2 to capture the curve
poly = PolynomialFeatures(degree=2)
temp_poly = poly.fit_transform(temp)

model = LinearRegression()
model.fit(temp_poly, energy)

print("Temperature vs Energy Usage Model")
print("The model captures the U-shaped relationship!")

Beware of Overfitting

High-degree polynomials can create wiggly curves that fit training data perfectly but fail on new data:

⚠️ Overfitting Warning

  • Degree 1-3: Usually safe, captures smooth curves
  • Degree 4-6: Risky, may overfit
  • Degree 7+: Almost always overfits

Always validate on test data to check for overfitting!
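One way to run that check: hold out a test set and compare training error with test error. A sketch on synthetic noisy y = 2x² data; the noise level, sample size, and degrees are illustrative choices, not from the lesson:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + rng.normal(scale=2.0, size=40)  # noisy y = 2x²

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

results = {}
for degree in [2, 10]:
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    train_mse = mean_squared_error(y_train, model.predict(poly.transform(X_train)))
    test_mse = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
    results[degree] = (train_mse, test_mse)
    print(f"Degree {degree}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```

A large gap between training and test error is the overfitting signal: the higher degree will always score at least as well on the training set, but only the test set reveals whether that improvement is real.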

💡 Best Practice

Start with degree 2 (quadratic). If that doesn't work well, try degree 3. Rarely go higher than degree 4 unless you have a very large dataset and strong reason to believe a high-degree polynomial is needed.
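Rather than picking the degree by hand, you can let cross-validation choose it by treating the degree as a hyperparameter. A sketch using scikit-learn's GridSearchCV over a Pipeline (an approach not covered in the lesson; the synthetic data and degree range are illustrative):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + rng.normal(scale=2.0, size=60)  # noisy y = 2x²

# Name the steps so the grid can target the degree parameter
pipe = Pipeline([("poly", PolynomialFeatures()), ("reg", LinearRegression())])
search = GridSearchCV(pipe, {"poly__degree": [1, 2, 3, 4, 5]}, cv=5)
search.fit(X, y)

print("Best degree:", search.best_params_["poly__degree"])
```

Because each candidate degree is scored on held-out folds, the search penalizes degrees that only memorize the training data.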

🎉

Lesson Complete!

Great work! Continue to the next lesson.
