Linear Regression
Let’s explain linear regression and how it works.
Definition
Linear regression is a supervised learning algorithm used to model and analyse the relationship between a dependent variable (label) and one or more independent variables (features). The “best fit” is nothing more than a mathematical result: it is the line that minimizes the distance from itself to all the data points.
Basic Concepts
Label: the dependent variable; it’s what you’re trying to predict (y).
Feature(s): the variable(s) used as input to predict the label (x).
Coefficient: the weight attached to a feature; it describes how much that feature influences the prediction.
Intercept: the predicted value of the label when all features are zero.
Linear regression with a single feature is called simple linear regression; linear regression with multiple features is called multiple linear regression.
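As a quick illustration, here is a minimal sketch using scikit-learn’s LinearRegression; the tiny dataset (house size and number of rooms predicting price) is made up for this example.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: [size in m^2, number of rooms] -> price
X = np.array([[50, 2], [80, 3], [120, 4], [160, 5]])
y = np.array([150_000, 230_000, 340_000, 450_000])

# Simple linear regression: only the first feature (size)
single = LinearRegression().fit(X[:, [0]], y)

# Multiple linear regression: both features
multiple = LinearRegression().fit(X, y)

print(single.coef_, single.intercept_)
print(multiple.coef_, multiple.intercept_)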
Basic formula
y = b + w(1)x(1) + w(2)x(2) + … + w(n)x(n)
y is the value we’re trying to predict.
b is the y-intercept: it represents the predicted value of y when all features are 0.
w(n) is the weight of feature n, i.e. the coefficient of x(n). The coefficient tells us how much y is expected to increase or decrease for a one-unit increase in x(n).
x(n) is a feature, a known input.
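In code, a prediction is just the intercept plus the weighted sum of the features. A tiny sketch with made-up weights and feature values:

# Hypothetical numbers, purely for illustration
b = 2.0                 # intercept: predicted y when all features are 0
w = [0.5, 1.5]          # w(1), w(2): weights (coefficients)
x = [4.0, 3.0]          # x(1), x(2): feature values

# y = b + w(1)x(1) + ... + w(n)x(n)
y = b + sum(w_i * x_i for w_i, x_i in zip(w, x))
print(y)                # 2.0 + 0.5*4.0 + 1.5*3.0 = 8.5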
Assumptions of Linear Regression:
Linearity: The relationship between the label and the feature(s) is linear.
Independence: Observations are independent of each other.
Model Fitting / Cost Function
The cost function measures the error between the actual values and the predicted values, i.e. how well our model is predicting for the given dataset. The lower the cost value, the better the prediction.
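For example, the mean squared error, a common choice of cost function (assumed here), compares predictions against actual labels; the numbers below are made up:

import numpy as np

def mse_cost(y_true, y_pred):
    # Average of the squared differences between actual and predicted values
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

print(mse_cost([3.0, 5.0, 7.0], [2.5, 5.5, 6.0]))  # 0.5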
How do we minimise this cost function?
Least Squares Method: a closed-form approach that solves for the weights directly by minimising the sum of squared errors.
Gradient Descent: an iterative algorithm we use to minimise the cost function value, so that our straight line fits the data as well as possible.
The graph of the cost function is convex, and we need to reach the minimum of that convex curve to get the best combination of values for our weights.
The process is iterative: we take steps toward the minimum. We will never converge exactly at the minimum, but we will be able to get very close, and that is sufficient.
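A minimal sketch of gradient descent for single-feature linear regression, assuming an MSE cost, a fixed learning rate, and made-up data:

import numpy as np

# Toy data (assumed): y is roughly 2*x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.0, 6.9, 9.2])

w, b = 0.0, 0.0          # start from arbitrary values
lr = 0.01                # learning rate (step size)

for _ in range(5000):    # iterative steps toward the minimum
    y_pred = w * x + b
    error = y_pred - y
    # Gradients of the MSE cost with respect to w and b
    dw = 2 * np.mean(error * x)
    db = 2 * np.mean(error)
    w -= lr * dw
    b -= lr * db

print(w, b)              # close to 2 and 1, never exactly at the minimum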
Model Evaluation
R-Squared: the proportion of the variance in the label that is explained by the features; closer to 1 is better.
Mean Squared Error: the average of the squared differences between actual and predicted values.
Root Mean Squared Error: the square root of the MSE, expressed in the same units as the label.
Adjusted R-squared: R-squared adjusted for the number of features, penalising models that add features without improving the fit. (See the sketch below for how these metrics can be computed.)
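A minimal sketch of computing these metrics, assuming scikit-learn is available and using made-up actual and predicted values:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.2, 6.9, 9.3])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

# Adjusted R-squared: n = number of observations, p = number of features
n, p = len(y_true), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(mse, rmse, r2, adj_r2)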
Overfitting and Regularization
Overfitting: the model learns the training data too closely (including its noise) and performs poorly on new, unseen data.
Regularization: a technique that adds a penalty on large weights to the cost function, discouraging overfitting.
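As one common option, here is a hedged sketch of L2 (ridge) regularization with scikit-learn; the data and the alpha value are assumptions for illustration only:

import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.9, 5.1, 7.2, 8.8])

# alpha controls the strength of the penalty on large weights;
# a larger alpha shrinks the coefficients more, which can reduce overfitting
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_, model.intercept_)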