Hand Coding Instumental Variables

2 minute read

In a previous post we discussed the linear model and how to write a function that performs a linear regression. In this post we will use that linear model function to perform a [Two-Stage Least Squares estimation]. This estimation allows us to […]

Recall that we built the follow linear model function.

ols <- function (y, X, intercept=TRUE) {
  if (intercept) X <- cbind(1, X)
  solve(t(X)%*%X) %*% t(X)%*%y # solve for beta
}

We used the following data.

data(iris)
x1 <- iris$Petal.Width
x2 <- iris$Sepal.Length
y  <- iris$Petal.Length

This allowed us to estimate a linear model.

X <- cbind(x1, x2)
ols(y = y, X = X)
##          [,1]
##    -1.5071384
## x1  1.7481029
## x2  0.5422556

Which includes the intercept, since the default value it TRUE (see function definition above), we could estimate it without an intercept using.

ols(y = y, X = X, intercept = FALSE)
##         [,1]
## x1 1.9891655
## x2 0.2373159

Having revisited the above, we can continue with instumental variables. We will replicate an example from the AER (Applied Econometric Regressions) package.

library(AER)
data("CigarettesSW")
rprice  <- with(CigarettesSW, price/cpi)
tdiff   <- with(CigarettesSW, (taxs - tax)/cpi)
packs   <- CigarettesSW$packs

We first need to obtain our first stage estimate (putting the whole function between parentheses allows us to both write it to the object s1 and print it).

( s1 <- ols(y = rprice, X = tdiff) )
##        [,1]
##   91.103739
## X  4.163002

We can now obtain the predicted (fitted) values

Xhat <- s1[1]*tdiff + s1[2]

Using these fitted values, we can finally estimate our second stage.

ols(y = packs, X = Xhat)
##           [,1]
##   126.89145456
## X  -0.04658554

Now compare this to the results using ivreg() fucntion from the AER package.

ivreg(packs ~ rprice | tdiff)
##
## Call:
## ivreg(formula = packs ~ rprice | tdiff)
##
## Coefficients:
## (Intercept)       rprice  
##     219.576       -1.019

When we compare this estimate to the estimate from the linear model, we find that the effect of the price is significantly overestimated there.

lm(packs ~ rprice)
##
## Call:
## lm(formula = packs ~ rprice)
##
## Coefficients:
## (Intercept)       rprice  
##     222.209       -1.044

We can also do this using R’s built in lm function.

# first stage
s1 <- lm(rprice ~ tdiff)

# estimate second stage using fitted values (Xhat)
lm(packs ~ s1$fitted.values)
##
## Call:
## lm(formula = packs ~ s1$fitted.values)
##
## Coefficients:
##      (Intercept)  s1$fitted.values  
##          219.576            -1.019

As an intermediate form, we can manually calculate Xhat (fitted.values) and estimate using that.

# manually obtain fitted values
Xhat <- s1$coefficients[2]*tdiff + s1$coefficients[1]

# estimate second stage using Xhat
lm(packs ~ Xhat)
##
## Call:
## lm(formula = packs ~ Xhat)
##
## Coefficients:
## (Intercept)         Xhat  
##     219.576       -1.019

Note that if you estimate a TSLS using the lm function, that you can only use the coefficients, the error terms will be wrong.