Longitudinal Survey Designs

Notes on Chapter 4 of Causal Inference with Survey Data on LinkedIn Learning, taught by Franz Buscha. I’m using this series of posts to take notes.

import graphviz as gr
def draw_causal_graph(
    edge_list, node_props=None, edge_props=None, graph_direction="UD"
):
    """Utility to draw a causal (directed) graph.
    Taken from: https://github.com/dustinstansbury/statistical-rethinking-2023/blob/a0f4f2d15a06b33355cf3065597dcb43ef829991/utils.py#L52-L66
    """

    g = gr.Digraph(graph_attr={"rankdir": graph_direction})

    edge_props = {} if edge_props is None else edge_props
    for e in edge_list:
        props = edge_props[e] if e in edge_props else {}
        g.edge(e[0], e[1], **props)

    if node_props is not None:
        for name, props in node_props.items():
            g.node(name=name, **props)
    return g

Surveys with longitudinal data

  • A series of snapshots.
  • Captures information from the same subjects across multiple points in time.
  • Useful in understanding how relationships evolve and spotting trends.

Example: A training program


Cross-sectional survey:

  • snapshot
  • static
  • limited causality
  • quick and cheap


Longitudinal survey:

  • time series
  • dynamic (can follow someone’s productivity over time)
  • better causality
  • slow and expensive

Types of longitudinal data

  1. Panel survey
    • Collect data on the same individuals, households, or companies over (often short) time periods. Example: studies of the demographic dynamics of families.
  2. Cohort survey
    • Follow a group of people who share a common characteristic or experience within a defined period.
  3. Repeated cross-section
    • Collect data from different samples over time but from the same population.

Statistical Framework

  • Key to working with time is the t subscript
\[Y_{it} = \beta_0 + \beta_1X1_{it} + ... + \beta_nXn_{it} + \epsilon_{it}\]
  • Time subscripts are manipulated by methods in different ways


Summary:

  • Allow for a deeper level of analysis, especially for cause-and-effect relationships
  • Remember to consider challenges such as data attrition, time-varying confounders, and the complexity of such data
  • They often provide a richer and more nuanced view of the world.

Regression models with time effects

  • Adding time to a regression model can significantly improve causal inference
  • Time flows in one direction
  • Time trends and lagged values are common ways to include time

OLS with longitudinal data

  • Working with time relies on the t subscript
  • Static model makes no specific use of time from a methods perspective
  • Time can be added to this model

Time manipulation: trends

  • Time can be included as a variable (linear or otherwise)
\[Y_{it} = \beta_0 + \beta_1X1_{it} + \beta_2X2_{it} + \beta_3X3_{it} + \beta_4T_{t} + \epsilon_{it}\]
  • T is simply the survey time variable
  • Many processes trend, so it makes sense to add time as a control
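The trend idea can be illustrated with a small simulation. This is a sketch on made-up panel data: the number of units, the number of waves and all coefficients are hypothetical, and NumPy's `np.linalg.lstsq` serves as the OLS solver. X1 drifts upward over the waves. If $T_t$ is left out, X1 absorbs part of the trend. Adding $T_t$ as a control should bring the X1 coefficient back near its true value of 2.0.

```python
import numpy as np

# Simulated balanced panel (hypothetical numbers): 200 units x 6 survey waves
rng = np.random.default_rng(0)
n_units, n_periods = 200, 6
t = np.tile(np.arange(n_periods), n_units)          # survey time T_t for each row

# X1 drifts upward over time, so it is correlated with the trend
x1 = 0.3 * t + rng.normal(size=t.size)
# True model: effect of X1 is 2.0, and Y also trends up by 0.5 per wave
y = 1.0 + 2.0 * x1 + 0.5 * t + rng.normal(scale=0.5, size=t.size)

ones = np.ones(t.size)
# Static model: ignores time, so X1 picks up part of the trend
beta_static, *_ = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)
# Trend model: adds T_t as a control variable
beta_trend, *_ = np.linalg.lstsq(np.column_stack([ones, x1, t]), y, rcond=None)

print("static model, X1 coefficient:", beta_static[1].round(3))
print("trend model, X1 and T coefficients:", beta_trend[1:].round(3))
```

In this setup the static estimate is pushed above 2.0. The trend model recovers roughly 2.0 for X1 and 0.5 for the trend.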

Time manipulation: lags

  • Lags help explain how past values of X are related to present values of Y
  • Help trace how past events affect today’s outcome
  • Termed finite distributed lag models of order N

  • Model of order 2:
\[Y_{it} = \beta_0 + \beta_1X1_{it} + \beta_2X2_{it} + \beta_3X3_{it} + \beta_4X3_{i,t-1} + \beta_5X3_{i,t-2} + \epsilon_{it}\]

X1 and X2 are measured in the present period. X3 enters at three time points: the present, a lag of 1, and a lag of 2.

  • $\beta_3, \beta_4, \beta_5$ are estimated separately, and each captures the effect of X3 at one lag. They are often summed to estimate the long-run effect of X3 on Y

  • Powerful model for estimating cause and effect of a variable
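Here is a minimal sketch of a finite distributed lag model of order 2 on simulated data. It assumes a balanced panel stored as a units × periods array, so lags can be taken within each unit by slicing columns. X1 and X2 are left out for brevity. The lag coefficients (1.0, 0.5, 0.25) are hypothetical. Their sum, 1.75, is the long-run effect the model should recover.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 200, 8
x3 = rng.normal(size=(n_units, n_periods))     # rows = units i, columns = periods t

# Lags are taken within each unit (row); the first two periods are lost
x3_now = x3[:, 2:]       # X3_{i,t}
x3_lag1 = x3[:, 1:-1]    # X3_{i,t-1}
x3_lag2 = x3[:, :-2]     # X3_{i,t-2}

true_lags = np.array([1.0, 0.5, 0.25])         # hypothetical beta_3, beta_4, beta_5
y = (0.5 + true_lags[0] * x3_now + true_lags[1] * x3_lag1
     + true_lags[2] * x3_lag2 + rng.normal(scale=0.5, size=x3_now.shape))

X = np.column_stack([np.ones(y.size), x3_now.ravel(), x3_lag1.ravel(), x3_lag2.ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)

long_run = coef[1:].sum()                       # beta_3 + beta_4 + beta_5
print("lag coefficients:", coef[1:].round(3))
print("long-run effect:", round(long_run, 3))
```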



Advantages:

  • Capture dynamic effects
  • Temporal causality
  • Flexibility

Disadvantages:

  • Require lots of data
  • Autocorrelation/multicollinearity
  • Reverse causality

Summary:

  • Using time in a regression can be a real game changer
  • You can uncover short- and long-run effects, which cannot be done using static models

Fixed effects regression models

  • A straightforward causal method that requires fewer theoretical assumptions.
  • Has one major disadvantage - it’s terminal.
  • Very frequently used with panel data.

Fixed effect: A DAG approach

  • The focus is on variation within a data unit over time
  • X2 and u are confounders
    • X2 can be controlled for, but u (unobserved) cannot
  • Fixed effects remove both, as long as they do not change over time

draw_causal_graph(
    edge_list=[
        # u (unobserved) and X2 confound X1 and Y in every period
        ("u", "X1_1"),
        ("u", "X1_2"),
        ("u", "X1_3"),
        ("u", "Y_1"),
        ("u", "Y_2"),
        ("u", "Y_3"),
        ("X2", "X1_1"),
        ("X2", "X1_2"),
        ("X2", "X1_3"),
        ("X2", "Y_1"),
        ("X2", "Y_2"),
        ("X2", "Y_3"),
        # X1 affects Y within each period and persists over time
        ("X1_1", "Y_1"),
        ("X1_2", "Y_2"),
        ("X1_3", "Y_3"),
        ("X1_1", "X1_2"),
        ("X1_2", "X1_3"),
    ],
    # Dashed edges mark paths from the unobserved confounder u
    edge_props={
        ("u", "X1_1"): {"style": "dashed"},
        ("u", "X1_2"): {"style": "dashed"},
        ("u", "X1_3"): {"style": "dashed"},
        ("u", "Y_1"): {"style": "dashed"},
        ("u", "Y_2"): {"style": "dashed"},
        ("u", "Y_3"): {"style": "dashed"},
    },
)


Implementing FE: LSDV

  • Least Squares Dummy Variable approach (LSDV): include all panel units as dummy variables in a regression
\[Y_{it} = \beta_0 + \beta_1X1_{it} + \beta_2X2_{it} + \beta_3X3_{it} + \lambda_uZ_{u_{it}} + \epsilon_{it}\]

Z is a large vector of dummy variables, one for each panel unit

  • Estimate this using standard regression
  • The many $\lambda$ terms control for time-invariant unobserved effects
  • The method becomes impractical when the panel has many units (one dummy each) or hardware is limited
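To make the LSDV idea concrete, here is a sketch on simulated data. A hypothetical time-invariant effect `a` plays the role of the unobserved confounder and drives both X and Y. `np.eye(n_units)[unit]` builds the dummy matrix Z with one column per unit. That matrix grows with the number of units, which is the source of the hardware limitation. Pooled OLS is included only for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods = 100, 5
unit = np.repeat(np.arange(n_units), n_periods)   # panel unit index for each row

a = rng.normal(size=n_units)                      # hypothetical time-invariant unobserved effect
x = a[unit] + rng.normal(size=unit.size)          # X is correlated with it (confounding)
y = 1.5 * x + 2.0 * a[unit] + rng.normal(scale=0.5, size=unit.size)

# Pooled OLS ignores the unit effect and is biased
X_pooled = np.column_stack([np.ones(unit.size), x])
b_pooled, *_ = np.linalg.lstsq(X_pooled, y, rcond=None)

# LSDV: one dummy per unit; the separate intercept is dropped to avoid perfect collinearity
Z = np.eye(n_units)[unit]                         # (N*T) x N matrix of unit dummies
X_lsdv = np.column_stack([x, Z])
b_lsdv, *_ = np.linalg.lstsq(X_lsdv, y, rcond=None)

print("pooled OLS slope:", b_pooled[1].round(3))
print("LSDV slope:", b_lsdv[0].round(3))
```

The LSDV slope should sit near the true value of 1.5. The pooled slope is pulled upward by the omitted unit effect.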

Implementing FE: Within-transformation

  • The within-transformation subtracts the average panel unit value from each measured data point

\[[Y_{it} - \bar{Y_i}] = \beta_1[X1_{it} - \bar{X1_i}] + \beta_2[X2_{it} - \bar{X2_i}] + \beta_3[X3_{it} - \bar{X3_i}] + [U_{i} - \bar{U_i}] + [\epsilon_{it} - \bar{\epsilon_i}]\]
  • We subtract each unit’s own average from every variable; the constant $\beta_0$ cancels out as well
  • Variables that do not vary over time are removed from the model, including unobserved ones such as $U_i$ (since $U_i - \bar{U_i} = 0$)
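The within-transformation can be sketched with the same kind of simulated panel, with all numbers hypothetical. The helper `demean_by_unit` is an illustrative name, not a library function. It uses `np.bincount` to compute each unit's mean. The slope from the demeaned data should land near the true value of 1.5. A variable that is constant within each unit becomes all zeros, which is why time-invariant variables drop out.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods = 100, 5
unit = np.repeat(np.arange(n_units), n_periods)   # panel unit index i for each row

a = rng.normal(size=n_units)                      # hypothetical time-invariant unobserved effect U_i
x = a[unit] + rng.normal(size=unit.size)          # X is correlated with U_i (confounding)
y = 1.5 * x + 2.0 * a[unit] + rng.normal(scale=0.5, size=unit.size)


def demean_by_unit(v, unit):
    """Subtract each panel unit's own mean from its observations."""
    unit_means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - unit_means[unit]


y_w = demean_by_unit(y, unit)
x_w = demean_by_unit(x, unit)
# OLS slope on demeaned data; no intercept is needed because it was differenced out
beta_within = (x_w @ y_w) / (x_w @ x_w)

# A variable that is constant within each unit is wiped out by the transformation
u_w = demean_by_unit(a[unit], unit)
print(f"within estimate: {beta_within:.3f}")
print("time-invariant variable after demeaning is all zeros:", np.allclose(u_w, 0))
```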



Advantages:

  • Control for unobserved time-invariant confounders
  • More credible causal effects

Disadvantages:

  • Less efficient (only within-unit variation is used)
  • Cannot analyze time-invariant variables
  • Time-varying confounders are not removed

Summary:

  • A powerful tool for longitudinal data structures
  • Control for unobserved time-invariant confounders
  • Always estimate FE models if you can, to see whether the results differ from pooled OLS
  • Continue to think about time-varying confounders

Difference-in-difference estimation

  • Older technique, relatively simple
  • In its simplest form it needs only four numbers (two groups, before and after), and it is often as good as more advanced methods
  • Compares the before-and-after change in outcomes between two groups: the difference in the differences
  • Under its assumptions, the DiD is the effect of the intervention

Basic DiD: Minimum Wages

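  • Famous study by Card and Krueger (1994); David Card shared the 2021 Nobel Memorial Prize in Economic Sciences, partly for this kind of work
  • They were interested in the effect of minimum wages on employment
  • Theory says the effect on employment can be either positive or negative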


Treatment group:

  • New Jersey: Feb 1992 = $4.25
  • New Jersey: Nov 1992 = $5.05
  • Difference = after - before

“Untreated control”:

  • Pennsylvania: Feb 1992 = $4.25
  • Pennsylvania: Nov 1992 = $4.25

Basic DiD: Minimum Wages (fast food restaurants)

| Average FTE employment | New Jersey | Pennsylvania | Difference (NJ - PA) |
|---|---|---|---|
| Before | 20.44 | 23.33 | -2.89 |
| After | 21.03 | 21.17 | -0.14 |
| Difference (after - before) | 0.59 | -2.16 | 2.76 |

To estimate the full effect of the minimum wage law on employment in NJ, use a counterfactual: assume NJ would have followed the same trajectory as Pennsylvania. That’s why you take the difference of the differences: $0.59 - (-2.16) = 2.75$, which matches the table’s 2.76 up to rounding.
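The four-number calculation can be written out directly. This sketch uses the average FTE employment figures from the table. It also computes NJ's counterfactual "after" value under the parallel-trends assumption, defined as NJ's starting level plus Pennsylvania's change. Both routes give the same DiD of 2.75.

```python
# Average FTE employment from the Card and Krueger table
nj_before, nj_after = 20.44, 21.03
pa_before, pa_after = 23.33, 21.17

nj_change = nj_after - nj_before     # first difference for the treated state
pa_change = pa_after - pa_before     # first difference for the control state
did = nj_change - pa_change          # difference in differences

# Counterfactual view: where NJ would be if it had followed PA's trend
nj_counterfactual = nj_before + pa_change
did_from_counterfactual = nj_after - nj_counterfactual

print(f"NJ change: {nj_change:.2f}, PA change: {pa_change:.2f}, DiD: {did:.2f}")
print(f"Counterfactual NJ after: {nj_counterfactual:.2f}, DiD: {did_from_counterfactual:.2f}")
```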

Basic DiD: Visualization

Here is a generic example; it does not use the NJ/Penn data.

  • The key assumption of DiD estimation is the parallel trends assumption: if nothing had happened, NJ would have followed the same trend as Penn. This is represented by the dashed line.
  • The difference between the end of the dashed line and the observed labor supply in NJ is the DiD estimate

Regression DiD

  • Needed when there are additional confounders ($X_{it}$) to control for:

\[Y_{it} = \alpha + \beta_1 Treatment_i + \beta_2 Time_t + \beta_3 (Treatment_i \times Time_t) + \beta_4 X_{it} + \epsilon_{it}\]

where $Treatment_i$ = 1 if treated and 0 if control, $Time_t$ = 1 in the post-period and 0 in the pre-period, and $Treatment_i \times Time_t$ is their interaction. The interaction coefficient $\beta_3$ is the DiD estimate.
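Here is a sketch of regression DiD on simulated two-period data. It assumes a hypothetical true treatment effect of 2.0, a baseline gap of 3.0 between the groups, a common time shift of 1.0 and one covariate X. OLS is fit with `np.linalg.lstsq`. A regression package would also report (clustered) standard errors, which this sketch does not compute.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units = 2000                                   # hypothetical number of units
# Each unit is observed twice (pre and post); group membership is fixed per unit
treatment = np.repeat(rng.integers(0, 2, size=n_units), 2)
time = np.tile([0, 1], n_units)                  # 0 = pre-period, 1 = post-period
x = rng.normal(size=2 * n_units)                 # an additional covariate X_it

true_effect = 2.0
y = (10.0 + 3.0 * treatment + 1.0 * time
     + true_effect * treatment * time
     + 0.5 * x + rng.normal(size=2 * n_units))

# Columns: intercept, Treatment_i, Time_t, Treatment_i x Time_t, X_it
X = np.column_stack([np.ones(2 * n_units), treatment, time, treatment * time, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha, b1, b2, b3, b4 = coef
print(f"DiD estimate (beta_3): {b3:.3f}")
```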


Assumptions:

  • Parallel trends
  • No contamination
    • Units cannot move from the treatment group into the control group, or vice versa

Advantages:

  • Controls for unobserved confounders that are fixed over time
  • Easy to apply
  • Flexible (can be combined with other methods, such as matching or standard regression techniques)

Disadvantages:

  • Requires its assumptions to be met
  • Can’t be used for single-case evaluations

Summary:

  • Popular evaluation strategy
  • Needs good quality data covering at least two periods
  • It can offer compelling insights into the causal impact of policies and interventions, helping guide decision making

Synthetic control methods

  • A relatively recent innovation in causal inference (last 15 years)
  • Somewhat similar to DiD
  • Creates a “synthetic control” that looks like the treated unit, using a weighted combination of potential control units
  • A great approach when no easy controls are available

Creating the synthetic control

  • Synthetic control is the weighted sum of other potential control units
\[SynCon = w_1 \times ControlUnit_1 + w_2 \times ControlUnit_2 + ... + w_n \times ControlUnit_n\]
  • The weights w are non-negative and must sum to 1

Estimating the weights

  • The weights are chosen to minimize the difference between the treated unit and the synthetic control before the intervention
  • Involves an optimization process
  • Typically, it involves control variables that assist in matching the treated and synthetic units
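The optimization can be sketched as a constrained least-squares problem. The goal is to choose non-negative weights that sum to 1 so that the weighted donor outcomes track the treated unit's pre-intervention outcomes. This is a simplified version under stated assumptions. It matches on pre-intervention outcomes only, with no separate control variables or variable-importance weights. It solves the problem with projected gradient descent in plain NumPy. The helpers `project_to_simplex` and `synthetic_control_weights` are illustrative names, not a library API. The data are simulated, with the treated unit built as 0.6 × donor 0 + 0.4 × donor 2, so the recovered weights should be close to those values.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def synthetic_control_weights(X0, x1, n_iter=5000):
    """Minimise ||X0 @ w - x1||^2 subject to w >= 0 and sum(w) = 1.

    X0: (pre-periods x donor units) outcomes of the potential controls
    x1: (pre-periods,) outcomes of the treated unit
    """
    n_donors = X0.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)              # start from equal weights
    step = 1.0 / (2 * np.linalg.norm(X0, 2) ** 2)       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2 * X0.T @ (X0 @ w - x1)
        w = project_to_simplex(w - step * grad)         # gradient step, then back onto the simplex
    return w


rng = np.random.default_rng(4)
X0 = rng.normal(size=(20, 4))            # 20 pre-intervention periods, 4 donor units
true_w = np.array([0.6, 0.0, 0.4, 0.0])
x1 = X0 @ true_w                         # treated unit built from donors 0 and 2
w_hat = synthetic_control_weights(X0, x1)
print(w_hat.round(3), w_hat.sum())
```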

Control variables

  • Must be unaffected by the intervention/policy/treatment
  • Typical control variables include economic or demographic indicators, and other pre-intervention attributes
  • The quality of the synthetic control greatly depends on these variables

Evaluating the intervention

  • The effect of the intervention is the difference between the treated unit and the synthetic control:
\[Effect = Y_{treated, post} - Y_{synthetic, post}\]
  • Synthetic control estimates are tailored to recover the average treatment effect on the treated (ATT).
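Once weights are available, evaluating the intervention is a matter of period-by-period subtraction. The weights and post-intervention outcomes below are hypothetical toy numbers. Averaging the per-period gaps gives a single summary of the effect on the treated unit.

```python
import numpy as np

# Hypothetical post-intervention outcomes: rows = post periods, columns = donor units
donors_post = np.array([[10.0, 20.0],
                        [11.0, 21.0],
                        [12.0, 22.0]])
weights = np.array([0.5, 0.5])            # assumed, e.g. from a pre-intervention fit
treated_post = np.array([16.0, 17.0, 18.0])

synthetic_post = donors_post @ weights    # Y_synthetic,post for each period
effect = treated_post - synthetic_post    # Y_treated,post - Y_synthetic,post
att = effect.mean()                       # average effect over the post periods
print(synthetic_post, effect, att)
```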

Cross-Contamination is allowed

  • This occurs when control units are affected by the intervention
  • Allowed in scenarios where pure separation between treated and control isn’t possible, such as in geographic cases

Example: if CA implements a new health policy, neighboring states are likely to follow CA and adopt some, but not all, of its policies, so neighboring control states are not clean. A well-known example is California’s Proposition 99 tobacco-control legislation.



Advantages:

  • Handling heterogeneity
  • Complex interventions

Disadvantages:

  • Requires many pre-treatment periods
  • Limited external validity
  • Relies on a good pre-intervention fit

Summary:

  • Synthetic control provides a sophisticated tool
  • Especially useful for analyzing interventions that affect only one unit, including when cross-contamination exists
%load_ext watermark
%watermark -n -u -v -iv -w
Last updated: Tue May 28 2024

Python implementation: CPython
Python version       : 3.12.3
IPython version      : 8.24.0

graphviz: 0.20.3

Watermark: 2.4.3