Linear Regression
Understanding Linear Regression
Linear regression is one of the most fundamental and widely used machine learning algorithms. It predicts continuous numerical values from input features. It is called "linear" because it assumes a linear relationship between the input features and the output.
Think of linear regression as finding the best straight line through your data points. This line can then be used to make predictions for new data points.
How Linear Regression Works
Linear regression finds the line that minimizes the sum of squared vertical distances (the residuals) between the line and the data points. The equation of a line is:
y = mx + b
Where:
- y is the predicted output (dependent variable)
- x is the input feature (independent variable)
- m is the slope (coefficient) - how much y changes for each unit change in x
- b is the intercept - the value of y when x is 0
In machine learning, m is usually called the coefficient (or weight) and b the intercept (or bias).
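Plugging made-up numbers into the equation makes the roles of m and b concrete. Here is a minimal sketch, with an arbitrary slope of 2 and intercept of 1:

```python
m, b = 2.0, 1.0          # made-up slope and intercept

def predict(x):
    return m * x + b     # y = mx + b

print(predict(3))        # 2*3 + 1 = 7.0
print(predict(0))        # when x is 0, y equals the intercept b
```

Note that when x is 0, the prediction is just b, which is exactly what "intercept" means.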
Simple Example
Let's say you want to predict house prices based on size. You have data showing that larger houses cost more. Linear regression will find the best line that describes this relationship:
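Before reaching for scikit-learn, the line can be fit directly with NumPy's least-squares polynomial fit. The house sizes and prices below are made-up illustrative numbers:

```python
import numpy as np

# Hypothetical data: house size in sq ft vs. price in $1000s
sizes = np.array([1000, 1500, 2000, 2500, 3000])
prices = np.array([200, 270, 340, 425, 480])

# Least-squares fit of a degree-1 polynomial (a straight line)
slope, intercept = np.polyfit(sizes, prices, deg=1)
print(f"price = {slope:.3f} * size + {intercept:.1f}")
```

The positive slope confirms the intuition that larger houses cost more in this data.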
Using scikit-learn
Scikit-learn makes it easy to implement linear regression. The LinearRegression class handles all the complex math for you:
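A minimal sketch using the same made-up house data as above. Note that scikit-learn expects X as a 2D array of shape (samples, features):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size in sq ft -> price in $1000s
X = np.array([[1000], [1500], [2000], [2500], [3000]])  # 2D: (samples, features)
y = np.array([200, 270, 340, 425, 480])

model = LinearRegression()
model.fit(X, y)                    # learn the slope and intercept

print(model.coef_[0])              # learned slope
print(model.intercept_)            # learned intercept
print(model.predict([[1800]]))     # predicted price for an 1800 sq ft house
```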
💡 Key Methods and Attributes
fit(X, y) - Trains the model on your data
predict(X) - Makes predictions on new data
coef_ - Attribute holding the learned coefficient(s) (slope)
intercept_ - Attribute holding the learned intercept
Multiple Features
Linear regression can handle multiple input features. This is called multiple linear regression:
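The only change from the single-feature case is that each row of X holds several values, and the model learns one coefficient per feature. A sketch with hypothetical size and bedroom-count features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features: [size in sq ft, number of bedrooms]
X = np.array([
    [1000, 2],
    [1500, 3],
    [2000, 3],
    [2500, 4],
    [3000, 4],
])
y = np.array([200, 270, 340, 425, 480])  # price in $1000s

model = LinearRegression().fit(X, y)
print(model.coef_)                  # one coefficient per feature
print(model.intercept_)
print(model.predict([[1800, 3]]))   # predict for an 1800 sq ft, 3-bedroom house
```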
Evaluating Model Performance
It's important to evaluate how well your model performs. Common metrics include Mean Squared Error (MSE) and R-squared:
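A sketch of both metrics on synthetic data, assuming a hypothetical true relationship y = 3x + 5 plus noise. Holding out a test set keeps the evaluation honest:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 5 plus Gaussian noise (made-up for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 5 + rng.normal(0, 1, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("MSE:", mean_squared_error(y_test, y_pred))  # average squared error; lower is better
print("R^2:", r2_score(y_test, y_pred))            # fraction of variance explained; 1.0 is perfect
```

MSE is in squared units of the target, so it penalizes large errors heavily; R-squared is unitless, which makes it easier to compare across problems.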
Real-World Applications
Linear regression is used in many real-world scenarios:
- Sales Forecasting: Predict sales based on advertising spend
- Price Prediction: Predict house prices, car prices, etc.
- Risk Analysis: Predict insurance claims based on customer data
- Medical: Predict patient outcomes based on treatment data
Assumptions of Linear Regression
Linear regression works best when:
- The relationship between features and target is linear
- There's little or no multicollinearity (features aren't highly correlated)
- Errors are normally distributed
- There's homoscedasticity (constant variance of errors)
When these assumptions aren't met, you might need other algorithms like polynomial regression or regularization techniques.
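As a sketch of the linearity assumption failing, the hypothetical data below follows a curve (y = x squared plus noise). A plain line underfits it, while adding polynomial features recovers a good fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical nonlinear data: y = x^2 plus a little noise
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.1, size=50)

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("linear R^2:", linear.score(X, y))  # poor: a line can't follow the curve
print("poly R^2:  ", poly.score(X, y))    # near 1.0: quadratic term captures it
```

Polynomial regression is still linear regression under the hood; it is linear in the transformed features, which is why the same LinearRegression class works.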