Linear Regression
Linear regression (LR) is one of the simplest supervised machine learning (SML) approaches to statistical modeling. LR is useful for finding the relationship between an independent (predictor) variable and a dependent (response) variable. The central idea of LR is to obtain a linear relationship between the variables that best fits the data. The best-fit line is the one for which the total prediction error (over all data points) is as small as possible.
Before fitting a linear model to data, we should determine whether or not there is a relationship between the variables of interest. If there appears to be no association between the proposed explanatory and dependent variables (the scatter plot doesn’t indicate any increasing or decreasing trend), then fitting a linear regression model to the data will not provide a useful model. The strength of the linear association between two variables is measured by the correlation coefficient, which ranges from -1 to 1.
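As a quick illustration of checking for association before fitting, the sketch below computes the correlation coefficient with base R's cor(). The vectors x, y_linear and y_noise are synthetic and invented purely for illustration; they are not CalCOFI values.

```r
# Synthetic data, invented for illustration only (not CalCOFI values)
set.seed(42)
x        <- seq(1, 50)
y_linear <- 3 + 2 * x + rnorm(50, sd = 5)  # clear increasing trend plus noise
y_noise  <- rnorm(50)                      # generated independently of x

# Pearson correlation coefficient; the result always lies in [-1, 1]
r_linear <- cor(x, y_linear)
r_noise  <- cor(x, y_noise)

print(r_linear)  # close to 1: strong positive linear association
print(r_noise)   # no trend was built into this variable

# cor.test() also reports a p-value for the null hypothesis of zero correlation
print(cor.test(x, y_linear)$p.value)
```

A value near 0 suggests that a straight line is unlikely to describe the data well, while values near -1 or 1 suggest a strong linear trend.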
Simple LR has one dependent and one independent variable. An LR line has an equation of the form Y = a + bX, where X is the independent variable and Y is the dependent (response) variable. The term a is the intercept (the value of Y when X = 0) and b is the slope of the line.
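To make the roles of a and b concrete, here is a minimal sketch that estimates them in two ways: with the standard least-squares formulas and with R's built-in lm(). The toy vectors x and y are made up for illustration.

```r
# Toy data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares estimates for Y = a + bX
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
a <- mean(y) - b * mean(x)                                      # intercept

# The same fit using lm(); coef() returns the intercept first, then the slope
fit <- lm(y ~ x)
print(coef(fit))
print(c(a = a, b = b))

# Total prediction error of the fitted line (sum of squared residuals)
sse <- sum(residuals(fit)^2)
print(sse)
```

Both approaches should give the same intercept and slope, since lm() minimizes the same sum of squared prediction errors.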
Using the CalCOFI data, let us model the relationship between temperature and salinity. We start with a scatter plot of the two variables:
library(ggplot2)
# bottle_dat_sal_temp: data frame with salinity (Salnty) and temperature (T_degC) columns
ggplot(bottle_dat_sal_temp, aes(x = Salnty, y = T_degC)) +
  geom_point()
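Once the scatter plot suggests a trend, the line itself can be fitted with lm(). The sketch below is a hedged illustration only: demo_dat is a hypothetical stand-in data frame with the same column names (Salnty, T_degC), filled with synthetic values whose slope and noise level were chosen arbitrarily, so the printed coefficients say nothing about the real CalCOFI data.

```r
# Hypothetical stand-in for the CalCOFI data frame; all values are synthetic
set.seed(1)
demo_dat <- data.frame(Salnty = runif(200, min = 33, max = 34.5))
demo_dat$T_degC <- 150 - 4 * demo_dat$Salnty + rnorm(200, sd = 1)  # arbitrary relationship
demo_dat$T_degC[c(5, 17)] <- NA  # mimic missing measurements

# Fit T_degC = a + b * Salnty; with default options lm() drops rows containing NA
fit <- lm(T_degC ~ Salnty, data = demo_dat)
print(coef(fit))  # intercept (a) and slope (b)

# Predict temperature for new salinity values
new_obs <- data.frame(Salnty = c(33.5, 34.0))
pred <- predict(fit, newdata = new_obs)
print(pred)
```

With the real data, demo_dat would be replaced by bottle_dat_sal_temp. Adding geom_smooth(method = "lm") to the ggplot call draws the fitted line on top of the points.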