Understanding Polynomial Regression
Polynomial regression extends linear regression by adding polynomial terms (x², x³, etc.) to capture non-linear relationships. While linear regression assumes a straight-line relationship, polynomial regression can model curves.
This is useful when your data follows a curved pattern rather than a straight line. For example, the relationship between temperature and ice cream sales might be curved: sales may be low on very cold days and dip again on very hot days.
When to Use Polynomial Regression
Use polynomial regression when:
- Your data shows a curved relationship (not linear)
- Linear regression gives poor results
- You need to model non-linear trends
- The relationship has a clear polynomial pattern
However, be careful: high-degree polynomials can overfit, producing wiggly curves that follow the training points closely but don't generalize to new data.
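One rough way to check the first two conditions is to fit a straight line and look at what it leaves behind. The sketch below uses made-up data (y = x² on x = 0..10, an assumption purely for illustration). If the residuals have the same sign at both ends and the opposite sign in the middle, that suggests curvature a line cannot capture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical curved data for illustration: y = x^2 on x = 0..10
X = np.arange(11).reshape(-1, 1)
y = (X ** 2).ravel()

# Fit a plain straight line
line = LinearRegression().fit(X, y)
residuals = y - line.predict(X)

# A systematic pattern (same sign at both ends, opposite sign in the middle)
# hints that a curved model may fit better than a line
print("Residuals:", np.round(residuals, 2))
```

Residuals scattered randomly around zero would instead suggest that a straight line is adequate.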
Creating Polynomial Features
Polynomial regression works by creating new features from existing ones. For example, if you have x, it creates x², x³, etc.:
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
X = np.array([[1], [2], [3], [4]])
print("Original features:")
print(X)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print("\nPolynomial features (degree 2):")
print(X_poly)
print("\nColumns: [1, x, x²]")
poly3 = PolynomialFeatures(degree=3)
X_poly3 = poly3.fit_transform(X)
print("\nPolynomial features (degree 3):")
print(X_poly3)
Fitting Polynomial Regression
Once you have polynomial features, you use regular linear regression on them:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 8, 18, 32, 50])
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
model = LinearRegression()
model.fit(X_poly, y)
print("Model coefficients:", model.coef_)
print("Intercept:", model.intercept_)
X_new = np.array([[6], [7]])
X_new_poly = poly.transform(X_new)
predictions = model.predict(X_new_poly)
print("\nPredictions:")
for x, pred in zip(X_new, predictions):
    expected = 2 * x[0] ** 2
    print(f"x={x[0]}, predicted={pred:.2f}, expected={expected}")
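The transform and fit steps can also be bundled so that new inputs are expanded automatically at prediction time. The following is a minimal sketch using scikit-learn's make_pipeline on the same data. The choice of include_bias=False is an assumption made here because LinearRegression already fits its own intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 8, 18, 32, 50])  # same data as above: y = 2x^2

# include_bias=False because LinearRegression already fits an intercept
pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
pipeline.fit(X, y)

# The pipeline applies the polynomial transform before predicting,
# so raw x values can be passed in directly
pipeline_predictions = pipeline.predict(np.array([[6], [7]]))
print(pipeline_predictions)
```

This avoids the common mistake of forgetting to call transform on new data before predicting.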
Choosing the Right Degree
The degree of the polynomial is crucial. Too low and you underfit; too high and you overfit:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 8, 18, 32, 50])
for degree in [1, 2, 3, 4]:
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)
    model = LinearRegression()
    model.fit(X_poly, y)
    # Note: these predictions are on the training data itself
    predictions = model.predict(X_poly)
    mse = mean_squared_error(y, predictions)
    print(f"Degree {degree}: training MSE = {mse:.2f}")
print("\nLower training MSE is better, but it never gets worse as the degree grows, so watch for overfitting!")
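Because error measured on the training data cannot reveal overfitting, one common option is to compare degrees with cross-validation instead. The sketch below is illustrative only: the noisy quadratic data, the random seeds, and the 5-fold setup are all assumptions, not part of the lesson's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data for illustration: quadratic trend plus random noise
rng = np.random.default_rng(0)
X_cv = rng.uniform(-3, 3, size=(40, 1))
y_cv = 2 * X_cv.ravel() ** 2 + rng.normal(0, 1, size=40)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for degree in [1, 2, 3, 4]:
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    # scikit-learn reports negated MSE so that higher scores are better
    scores = cross_val_score(model, X_cv, y_cv, cv=cv,
                             scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()
    print(f"Degree {degree}: cross-validated MSE = {cv_mse[degree]:.2f}")

best_degree = min(cv_mse, key=cv_mse.get)
print("Degree with lowest cross-validated MSE:", best_degree)
```

When several degrees score about the same, preferring the lowest of them is usually the safer choice.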
Real-World Example
Polynomial regression is useful for modeling relationships like:
- Temperature vs. Energy Usage: Very hot and very cold days use more energy
- Age vs. Income: Income increases then plateaus with age
- Speed vs. Fuel Efficiency: Optimal speed exists, efficiency drops at extremes
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np
temp = np.array([[50], [60], [70], [80], [90], [100]])
energy = np.array([120, 80, 60, 80, 120, 180])
poly = PolynomialFeatures(degree=2)
temp_poly = poly.fit_transform(temp)
model = LinearRegression()
model.fit(temp_poly, energy)
print("Temperature vs Energy Usage Model")
print("The model captures the U-shaped relationship!")
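To check the U-shape claim rather than take it on faith, one option is to inspect the fitted coefficients. The sketch below refits the same temperature data with include_bias=False (an assumption made here so that coef_ holds only the temp and temp² terms) and estimates where the fitted parabola reaches its minimum.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

temp = np.array([[50], [60], [70], [80], [90], [100]])
energy = np.array([120, 80, 60, 80, 120, 180])

# include_bias=False so coef_ holds only the temp and temp^2 coefficients
poly = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(poly.fit_transform(temp), energy)
b1, b2 = model.coef_

# A positive temp^2 coefficient means the fitted parabola opens upward (U-shape)
print("temp^2 coefficient:", b2)

# The minimum of b0 + b1*t + b2*t^2 is at t = -b1 / (2 * b2)
min_temp = -b1 / (2 * b2)
print(f"Estimated temperature of lowest energy usage: {min_temp:.1f}")
```

With only six data points this estimate is rough, but it shows how the coefficients of a quadratic fit can be read directly.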
Beware of Overfitting
High-degree polynomials can create wiggly curves that fit training data perfectly but fail on new data:
⚠️ Overfitting Warning
• Degree 1-3: Usually safe, captures smooth curves
• Degree 4-6: Risky, may overfit unless you have plenty of data
• Degree 7+: Overfits on most small or noisy datasets
Always validate on test data to check for overfitting!
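A minimal sketch of such a check, assuming synthetic noisy quadratic data and an arbitrary comparison between degree 2 and degree 12 (both choices are illustrative, not from the lesson): hold out part of the data, then compare training and test error for each degree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic noisy quadratic data, for illustration only
rng = np.random.default_rng(42)
X_all = rng.uniform(-1, 1, size=(30, 1))
y_all = 2 * X_all.ravel() ** 2 + rng.normal(0, 0.2, size=30)

# Hold out 30% of the points; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.3, random_state=0
)

errors = {}
for degree in [2, 12]:
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    errors[degree] = (train_mse, test_mse)
    print(f"Degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

A test error much larger than the training error for the higher degree would be a sign of overfitting.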
💡 Best Practice
Start with degree 2 (quadratic). If that doesn't work well, try degree 3. Rarely go higher than degree 4 unless you have a very large dataset and strong reason to believe a high-degree polynomial is needed.