Chapter 6: Regression Models / Lesson 27

Polynomial Regression

Understanding Polynomial Regression

Polynomial regression extends linear regression by adding polynomial terms (x², x³, etc.) to capture non-linear relationships. While linear regression assumes a straight-line relationship, polynomial regression can model curves. The model is still linear in its coefficients (only the features are transformed), which is why ordinary linear regression can fit it.

This is useful when your data shows a curved pattern rather than a straight line. For example, the relationship between temperature and ice cream sales might be curved—very hot and very cold days both reduce sales.

When to Use Polynomial Regression

Use polynomial regression when:

  • Your data shows a curved relationship (not linear)
  • Linear regression gives poor results
  • You need to model non-linear trends
  • The relationship has a clear polynomial pattern

However, be careful—high-degree polynomials can overfit and create wiggly curves that don't generalize well.
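A quick way to see this wiggle effect is to push a high-degree polynomial through a handful of noisy points that really follow a straight line. The data and degrees below are illustrative choices (not from the lesson), sketched with NumPy's polyfit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy data whose true trend is a straight line: y = 2x
x = np.linspace(-1, 1, 10)
y = 2 * x + rng.normal(scale=0.1, size=x.size)

# Degree 9 through 10 points: the curve passes through every point,
# fitting the noise exactly (training error near zero)
coeffs9 = np.polyfit(x, y, deg=9)
mse9 = np.mean((np.polyval(coeffs9, x) - y) ** 2)
print(f"Degree-9 training MSE: {mse9:.2e}")

# Degree 1 has higher training error but follows the true trend
coeffs1 = np.polyfit(x, y, deg=1)
mse1 = np.mean((np.polyval(coeffs1, x) - y) ** 2)
print(f"Degree-1 training MSE: {mse1:.2e}")
```

The degree-9 curve "wins" on training error only because it memorized the noise; between the data points it oscillates wildly.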

Creating Polynomial Features

Polynomial regression works by creating new features from existing ones. For example, if you have x, it creates x², x³, etc.:

polynomial_features.py
# Creating polynomial features
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Original feature
X = np.array([[1], [2], [3], [4]])
print("Original features:")
print(X)

# Create polynomial features (degree 2)
# This creates: [1, x, x²] for each sample
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print("\nPolynomial features (degree 2):")
print(X_poly)
print("\nColumns: [1, x, x²]")

# Degree 3: creates [1, x, x², x³]
poly3 = PolynomialFeatures(degree=3)
X_poly3 = poly3.fit_transform(X)
print("\nPolynomial features (degree 3):")
print(X_poly3)

Fitting Polynomial Regression

Once you have polynomial features, you use regular linear regression on them:

polynomial_regression.py
# Polynomial regression example
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

# Data with quadratic relationship: y = 2x²
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 8, 18, 32, 50])  # y = 2x²

# Create polynomial features
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Train linear regression on the polynomial features
model = LinearRegression()
model.fit(X_poly, y)

print("Model coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Make predictions
X_new = np.array([[6], [7]])
X_new_poly = poly.transform(X_new)
predictions = model.predict(X_new_poly)

print("\nPredictions:")
for x, pred in zip(X_new, predictions):
    expected = 2 * x[0] ** 2
    print(f"x={x[0]}, predicted={pred:.2f}, expected={expected}")
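The two steps (expand features, then fit) can also be chained with scikit-learn's make_pipeline, so a single estimator handles the expansion inside fit and predict. A minimal sketch using the same y = 2x² data; the pipeline itself is an alternative not shown in the lesson:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 8, 18, 32, 50])  # y = 2x² exactly

# One estimator: polynomial expansion + linear regression
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

# predict() applies the polynomial expansion internally
print(model.predict([[6]]))  # ≈ [72.], since 2 * 6² = 72
```

This avoids the easy mistake of forgetting to transform new data before predicting.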

Choosing the Right Degree

The degree of the polynomial is crucial: too low and you underfit; too high and you overfit. Note that the script below measures MSE on the same data the model was trained on, so the error can only stay the same or shrink as the degree rises; it says nothing about how the model generalizes:

degree_comparison.py
# Comparing different polynomial degrees
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 8, 18, 32, 50])

# Try different degrees
for degree in [1, 2, 3, 4]:
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)
    model = LinearRegression()
    model.fit(X_poly, y)
    predictions = model.predict(X_poly)
    mse = mean_squared_error(y, predictions)
    print(f"Degree {degree}: MSE = {mse:.2f}")

print("\nLower MSE is better, but watch for overfitting!")

Real-World Example

Polynomial regression is useful for modeling relationships like:

  • Temperature vs. Energy Usage: Very hot and very cold days use more energy
  • Age vs. Income: Income increases then plateaus with age
  • Speed vs. Fuel Efficiency: Optimal speed exists, efficiency drops at extremes
real_world_example.py
# Example: Temperature vs Energy Usage
# Energy usage is lowest at moderate temperatures,
# high at very hot (AC) and very cold (heating)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

# Temperature (Fahrenheit)
temp = np.array([[50], [60], [70], [80], [90], [100]])
# Energy usage (kWh) - U-shaped curve
energy = np.array([120, 80, 60, 80, 120, 180])

# Use degree 2 to capture the curve
poly = PolynomialFeatures(degree=2)
temp_poly = poly.fit_transform(temp)

model = LinearRegression()
model.fit(temp_poly, energy)

print("Temperature vs Energy Usage Model")
print("The model captures the U-shaped relationship!")

Beware of Overfitting

High-degree polynomials can create wiggly curves that fit training data perfectly but fail on new data:

⚠️ Overfitting Warning

  • Degree 1-3: Usually safe, captures smooth curves
  • Degree 4-6: Risky, may overfit
  • Degree 7+: Almost always overfits

Always validate on test data to check for overfitting!
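One way to run that check: hold out a test set and compare training error with test error. A sketch on synthetic noisy y = 2x² data; the noise level, sample size, and degrees are illustrative choices, not from the lesson:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + rng.normal(scale=2.0, size=40)  # noisy y = 2x²

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

results = {}
for degree in [2, 10]:
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    train_mse = mean_squared_error(y_train, model.predict(poly.transform(X_train)))
    test_mse = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
    results[degree] = (train_mse, test_mse)
    print(f"Degree {degree}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```

A large gap between training and test error is the overfitting signal: the higher degree will always score at least as well on the training set, but only the test set reveals whether that improvement is real.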

💡 Best Practice

Start with degree 2 (quadratic). If that doesn't work well, try degree 3. Rarely go higher than degree 4 unless you have a very large dataset and strong reason to believe a high-degree polynomial is needed.
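Rather than picking the degree by hand, you can let cross-validation choose it by treating the degree as a hyperparameter. A sketch using scikit-learn's GridSearchCV over a Pipeline (an approach not covered in the lesson; the synthetic data and degree range are illustrative):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + rng.normal(scale=2.0, size=60)  # noisy y = 2x²

# Name the steps so the grid can target the degree parameter
pipe = Pipeline([("poly", PolynomialFeatures()), ("reg", LinearRegression())])
search = GridSearchCV(pipe, {"poly__degree": [1, 2, 3, 4, 5]}, cv=5)
search.fit(X, y)

print("Best degree:", search.best_params_["poly__degree"])
```

Because each candidate degree is scored on held-out folds, the search penalizes degrees that only memorize the training data.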

🎉

Lesson Complete!

Great work! Continue to the next lesson.
