Handcoding a Panel Model

The most basic panel estimator is the pooled OLS model. It stacks all observations, ignoring both the cross-sectional and the time index, and performs a regular ordinary least squares estimation.

# load the PLM library for panel estimation
library(plm)
# load the Crime data set
data(Crime)
# define the model
m1 <- formula(crmrte ~ prbarr + prbconv + polpc)

# create a panel data.frame (pdata.frame) object
PanelCrime <- pdata.frame(Crime, index=c("county", "year") )

# estimate Pooled OLS using the basic lm function
lm(formula = m1,
   data    = Crime)
##
## Call:
## lm(formula = m1, data = Crime)
##
## Coefficients:
## (Intercept)       prbarr      prbconv        polpc  
##    0.043643    -0.050993    -0.003251     3.055626
# estimate the Pooled OLS using the plm package
plm(formula = m1,
    data    = PanelCrime,
    model   = "pooling"  )
##
## Model Formula: crmrte ~ prbarr + prbconv + polpc
##
## Coefficients:
## (Intercept)      prbarr     prbconv       polpc
##    0.043643   -0.050993   -0.003251    3.055626
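
In the spirit of handcoding, the pooled OLS coefficients can also be reproduced directly from the normal equations. The following is a minimal sketch, assuming the Crime data set from plm is available; the names `mf`, `X`, `y` and `beta_pooled` are introduced here for illustration only.

```r
# load the Crime data set shipped with the plm package
data("Crime", package = "plm")

# build the model frame, design matrix (incl. intercept column) and response
mf <- model.frame(crmrte ~ prbarr + prbconv + polpc, data = Crime)
X  <- model.matrix(crmrte ~ prbarr + prbconv + polpc, data = mf)
y  <- model.response(mf)

# solve the normal equations (X'X) b = X'y for b
beta_pooled <- solve(crossprod(X), crossprod(X, y))

# compare with lm(): both columns should agree up to numerical precision
lm_fit <- lm(crmrte ~ prbarr + prbconv + polpc, data = Crime)
cbind(by_hand = drop(beta_pooled),
      lm      = coef(lm_fit))
```

Solving the normal equations is fine for a small, well-behaved example like this one; lm itself relies on a QR decomposition, which is numerically more stable.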

A more complex estimation method is the fixed-effects (or within) estimator. If our data contain only two time periods, the results of this estimator are equivalent to an OLS estimation on the first-differenced variables, where the intercept of the differenced regression plays the role of a period dummy.

# create data.frame with only years 81 and 82
Crime8182      <- subset(Crime, year %in% c(81, 82) )
# put into panel data.frame form (pdata.frame)
PanelCrime8182 <- pdata.frame(Crime8182, index=c("county", "year") )

# first difference the non-panel data.frame
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:plm':
##
##     between
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
Crime8182FD <- Crime8182 %>%
  group_by(county) %>%
  summarise(crmrte  = diff(crmrte),
            prbarr  = diff(prbarr),
            prbconv = diff(prbconv),
            polpc   = diff(polpc)   )

# use lm to estimate the two-period fixed-effects model
lm (formula = m1,
    data    = Crime8182FD    )
##
## Call:
## lm(formula = m1, data = Crime8182FD)
##
## Coefficients:
## (Intercept)       prbarr      prbconv        polpc  
##  -6.133e-05   -1.965e-02   -1.537e-03    3.358e+00
# verify with the plm package
plm(formula = m1,
    data    = PanelCrime8182,
    model   = "fd"           )
##
## Model Formula: crmrte ~ prbarr + prbconv + polpc
##
## Coefficients:
## (intercept)      prbarr     prbconv       polpc
## -6.1332e-05 -1.9645e-02 -1.5365e-03  3.3584e+00
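
The two-period equivalence can also be checked against a dummy-variable regression in levels, using base R only. This is a sketch under the assumption that the panel is balanced (every county is observed in both 81 and 82); the object names `d81`, `d82`, `FD`, `fd_fit` and `lsdv_fit` are made up for this example.

```r
# load the data and keep the years 81 and 82, sorted by county and year
data("Crime", package = "plm")
vars      <- c("crmrte", "prbarr", "prbconv", "polpc")
Crime8182 <- subset(Crime, year %in% c(81, 82))
Crime8182 <- Crime8182[order(Crime8182$county, Crime8182$year), ]

# make sure the rows of both years line up county by county
ids81 <- Crime8182$county[Crime8182$year == 81]
ids82 <- Crime8182$county[Crime8182$year == 82]
stopifnot(identical(ids81, ids82))

# first differences: the year-82 row minus the year-81 row of each county
d81 <- Crime8182[Crime8182$year == 81, vars]
d82 <- Crime8182[Crime8182$year == 82, vars]
FD  <- d82 - d81

# OLS on the differenced data (with intercept)
fd_fit <- lm(crmrte ~ prbarr + prbconv + polpc, data = FD)

# fixed effects in levels: county dummies plus a dummy for year 82
lsdv_fit <- lm(crmrte ~ prbarr + prbconv + polpc +
                 factor(county) + factor(year),
               data = Crime8182)

# the slopes should coincide across the two approaches
cbind(first_difference = coef(fd_fit)[vars[-1]],
      county_dummies   = coef(lsdv_fit)[vars[-1]])
```

The intercept of the differenced regression should match the coefficient on the year-82 dummy in the levels regression, since both measure the common change between the two periods.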

If our data set contains more than two time periods, we need to estimate a proper fixed-effects model. This is done by creating a fixed-effect (dummy) variable for every level of the cross-sectional index (i.e. the non-time index). A simple way of doing this is to encode the cross-sectional index as a factor and include that factor in the regression (more on factors/categorical variables in the post on Handcoding a Linear Model).

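# estimate the fixed-effects model by adding a dummy for every county
fe <- lm(formula = crmrte ~ prbarr + prbconv + polpc + factor(county),
         data    = Crime)
# show only the slope coefficients (positions 2 to 4), skipping the
# intercept and the county dummies
fe$coefficients[2:4]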
##       prbarr      prbconv        polpc
## -0.008008440 -0.001010476  2.029003066
# verify with the plm package
plm(formula = m1,
    data    = PanelCrime,
    model   = "within"   )
##
## Model Formula: crmrte ~ prbarr + prbconv + polpc
##
## Coefficients:
##     prbarr    prbconv      polpc
## -0.0080084 -0.0010105  2.0290031
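
The name "within" estimator comes from an alternative way of handcoding the same result: instead of adding a dummy per county, we subtract each county's time average from every variable and run OLS on the demeaned data. Below is a minimal sketch of that idea using base R's ave; the names `demeaned`, `within_fit` and `lsdv_fit` are illustrative and not part of the original code.

```r
# load the Crime data set shipped with the plm package
data("Crime", package = "plm")
vars <- c("crmrte", "prbarr", "prbconv", "polpc")

# within transformation: subtract the county mean from every variable
demeaned <- as.data.frame(
  lapply(Crime[vars], function(v) v - ave(v, Crime$county))
)

# OLS on the demeaned data; no intercept, since demeaning removes it
within_fit <- lm(crmrte ~ prbarr + prbconv + polpc - 1, data = demeaned)

# reference: the dummy-variable regression used above
lsdv_fit <- lm(crmrte ~ prbarr + prbconv + polpc + factor(county),
               data = Crime)

# the slopes should coincide
cbind(demeaned       = coef(within_fit),
      county_dummies = coef(lsdv_fit)[vars[-1]])
```

Only the coefficients carry over directly. The standard errors reported by lm on the demeaned data are too small, because lm does not know that one mean per county was estimated and therefore overstates the residual degrees of freedom.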