Hand Coding a Linear Model function

2 minute read

In yesterday’s post we developed a method for constructing a multivariate linear model with an intercept.

Today we will turn the collection of loose commands into an integrated and easy to use function.

A small recap from yesterday. We start by loading data and assigning our variables to objects

data(iris)
x1 <- iris$Petal.Width
x2 <- iris$Sepal.Length
y  <- iris$Petal.Length

We now construction our linear model, the fastest way of doing this is using the QR decomposition.

XI <- cbind(x1, x2, 1) # tie all the independent variables together
solve(t(XI)%*%XI) %*% t(XI)%*%y # solve for beta
##          [,1]
## x1  1.7481029
## x2  0.5422556
##    -1.5071384

In R functions are treated as objects, which means that we ascribe our function to a name, the same way we ascribed our variables to a name (e.g. x1 <- Petal.Length).

In order to create a function, we use the construction function (a function to construct an object of the type function). Logically, but perhaps also somewhat confusingly, this constructor function is called function().

Lets start with creating a function, called squaring, that takes one numeric argument (parameter) and returns it squared.

squaring <- function(a) a^2
squaring(a = 2)
## [1] 4

We can also use negatives of course.

squaring(-3)
## [1] 9

Note that in the above call we did not even specify that the input (-3) had to be assigned to the argument a, this was automatically deduced by R.

We can also construction a function that takes two arguments and multiplies them.

multiplying <- function(a, b) {
  a * b
}
multiplying(5, -6)
## [1] -30

Note that because this function was a bit longer, I put the code on a new line. When doing this, the code has to be enclosed in curly brackets ({ and }).

It is now time to apply these insights to our linear model. The first step is to define our function which takes the arguments y and X.

linear_model <- function (y, X) {

}

We can now simply insert our sollution for beta.

linear_model <- function (y, X) {
  solve(t(X)%*%X) %*% t(X)%*%y # solve for beta
}

We can now estimate our model.

linear_model(y = y, X = XI)
##          [,1]
## x1  1.7481029
## x2  0.5422556
##    -1.5071384

Note that the argument (e.g. X) and the assigned object (XI) do not have to have the same name.

At this point we still need to manually include our incept using cbind(x1, x2, 1).

We can add a small piece to our code, which will allow the function to take care of this.

linear_model <- function (y, X, intercept=TRUE) {
  if (intercept) X <- cbind(X, 1)
  solve(t(X)%*%X) %*% t(X)%*%y # solve for beta
}

The first piece intercept=TRUE creates the argument and sets the default to TRUE. That means that, unless me manually specify default=FALSE, an intercept will be included.

The second piece if (intercept) X <- cbind(X, 1) does the following.

  1. check if intercept == TRUE
  2. if so, overwrite X with cbind(X, 1)

We can now use this function as follows.

X <- cbind(x1, x2)
linear_model(y = y, X = X)
##          [,1]
## x1  1.7481029
## x2  0.5422556
##    -1.5071384

That gives us a pretty complete linear model.