# The Linear Model from Scratch in R

When it comes to econometrics, the main takeaways from the workshops are primarily the **syntax** of yet another computer program.

## The Linear Model

Using the **Ordinary Least Squares** (OLS) approach, we start with the following equation for a univariate regression:

\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i\]

This can be solved for the following estimator (hat denotes the estimator, bar denotes the mean):

\[\hat{\beta}_1 = \frac{ \sum^n_{i=1} (x_i - \bar{x})(y_i - \bar{y}) }{ \sum^n_{i=1} (x_i - \bar{x})^2 }\]

We start by loading a basic data set.

Inspect the data set.
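The original code chunks are not shown in this post; as a stand-in, we can use R's built-in `mtcars` data (an assumption, not the author's data set):

```r
# Assumption: the post's data set isn't shown, so R's built-in
# mtcars data serves as a stand-in.
data(mtcars)
head(mtcars)  # first six rows
str(mtcars)   # variable types and dimensions
```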

Assign our variables to objects (in the global environment).
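A sketch of this step, assuming `mpg` (dependent) and `wt` (independent) from `mtcars` as stand-in variables:

```r
# Assumption: the post's variables aren't named; mpg and wt stand in
y <- mtcars$mpg   # dependent variable
x <- mtcars$wt    # independent variable
```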

We can now estimate the slope parameter:
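Translating the estimator formula directly into R (with the stand-in variables assumed above):

```r
y <- mtcars$mpg; x <- mtcars$wt   # stand-in data (assumption)
# Slope: sum of cross-deviations over sum of squared deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b1
```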

Using the slope parameter we can now compute the intercept.
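Since the regression line passes through the point of means, the intercept follows from the slope and the sample means; a sketch with the same stand-in data:

```r
y <- mtcars$mpg; x <- mtcars$wt   # stand-in data (assumption)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)      # intercept from the means and the slope
b0
```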

Let's check this using the built-in command.
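The built-in check is `lm()`; its coefficients should match the hand-computed intercept and slope:

```r
y <- mtcars$mpg; x <- mtcars$wt   # stand-in data (assumption)
coef(lm(y ~ x))   # (Intercept) and slope should match b0 and b1
```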

## The matrix model

In matrix form we can specify our general equation as:

\[y = X\beta + \epsilon\]

From which we can derive our estimator:

\[\hat{\beta} = (X^T X)^{-1} X^T y\]

## The matrix estimation

Use the built-in command.
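The code is not shown; presumably this is `lm()` with the intercept suppressed, since the intercept is only added later in the post (an assumption):

```r
y <- mtcars$mpg; X <- mtcars$wt   # stand-in data (assumption)
lm(y ~ X - 1)   # the -1 drops the automatic intercept
```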

Now we estimate our beta ourselves; the function used to invert a matrix is called `solve()`.
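A sketch of the hand-coded estimator, again with stand-in data; with a single regressor, `t(X) %*% X` is a scalar:

```r
y <- mtcars$mpg; X <- mtcars$wt              # stand-in data (assumption)
beta <- solve(t(X) %*% X) %*% (t(X) %*% y)   # (X'X)^{-1} X'y
beta
```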

Now let's estimate with an intercept.

To hand-code this, we need to add a vector of ones (`1`s). Note that the single `1` that we are binding to the vector `X` will be repeated until it is the same length.
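In code, `cbind()` does the binding and R's recycling rules repeat the `1` (stand-in regressor assumed):

```r
X  <- mtcars$wt     # stand-in regressor (assumption)
XI <- cbind(1, X)   # the single 1 is recycled to length(X)
head(XI)
```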

We also need to discuss the `solve()` function used above: `^-1` is not correct syntax for a matrix inversion. In the case above it would still work correctly, because our `X` matrix is in fact a vector; if we pre-multiply this vector by the transpose of itself, we obtain a scalar.

However, for matrices wider than one column this is not the case. The `^-1` operator will invert every **individual** number in the matrix, rather than the matrix as a whole.

We want to obtain the inverse of the matrix, because this will allow us to pre-multiply on both sides, eliminating `XI` on the **Right-Hand Side** (RHS).

We therefore use a different tool: the `solve()` function from the `base` package. This function solves the system via an LU decomposition (LAPACK), which is an efficient way of deriving the inverse of a matrix.
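A quick illustration of the difference on a small 2-by-2 matrix:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)
A^-1       # element-wise: the reciprocal of every entry, NOT an inverse
solve(A)   # the actual matrix inverse; A %*% solve(A) gives the identity
```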

Now we can use this matrix to estimate a model with an intercept.
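With the column of ones in place, the same estimator now returns two coefficients (stand-in data assumed):

```r
y  <- mtcars$mpg
XI <- cbind(1, mtcars$wt)                      # stand-in data (assumption)
beta <- solve(t(XI) %*% XI) %*% (t(XI) %*% y)  # (X'X)^{-1} X'y
beta                                           # row 1: intercept, row 2: slope
```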

Note that this is essentially the same as what the `lm()` function does internally.

We can suppress the automatic intercept, include our `XI` variable, and we will obtain the same results.

We have now constructed a **univariate** model; however, from a programmatic point of view, the hurdles of **multivariate** modelling have already been overcome by estimating a model with an intercept (making `X` a matrix).

It is therefore very easy to use the same method in a case with two independent variables.

We start by binding the two independent variables together (with a vector of `1`s, since we want an intercept).
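The two regressors are not named in the post; `wt` and `hp` from `mtcars` stand in here (an assumption):

```r
# Assumption: stand-in regressors wt and hp, plus a column of ones
XI <- cbind(1, mtcars$wt, mtcars$hp)
head(XI)
```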

Now we estimate our model.
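The estimator is unchanged; only the width of `XI` grew (stand-in data assumed):

```r
y  <- mtcars$mpg
XI <- cbind(1, mtcars$wt, mtcars$hp)   # stand-in data (assumption)
solve(t(XI) %*% XI) %*% (t(XI) %*% y)  # intercept and two slopes
```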

And that’s all! The leap from univariate to multivariate modelling was truly very small.

**EDIT:** in tomorrow’s post, we use the method we developed here to create an easy-to-use function.