Chapter 16 Instrumental variables

Required reading

  • Gertler, Paul, Sebastian Martinez, Patrick Premand, Laura Rawlings, and Christel Vermeersch, ‘Impact Evaluation in Practice’, Chapter 5.

Required viewing

Recommended reading

  • Angrist, Joshua D., and Jörn-Steffen Pischke, 2008, Mostly harmless econometrics: An empiricist’s companion, Princeton University Press, Chapter 4.
  • Cunningham, Scott, ‘Causal Inference: The Mixtape’, Chapter ‘Instrumental variables’, freely available at:
  • Grogger, Jeffrey, Andreas Steinmayr, Joachim Winter, 2020, ‘The Wage Penalty of Regional Accents’, NBER Working Paper No. 26719.
  • Taddy, Matt, 2019, Business Data Science, Chapter 5, pp. 152-162.

Key concepts/skills/etc

  • Identifying opportunities for instrumental variables.
  • Implementing instrumental variables.
  • Challenges to the validity of instrumental variables.

Key libraries

  • estimatr
  • tidyverse

Key functions/etc

  • iv_robust()


  • What is an instrumental variable?
  • What are some circumstances in which instrumental variables might be useful?
  • What conditions must instrumental variables satisfy?
  • Who were some of the early instrumental variable authors?
  • Can you please think of and explain an application of instrumental variables in your own life?

16.1 Introduction

Instrumental variables (IV) is an approach that can be handy when we have some type of treatment and control going on, but the treatment is correlated with many other variables, and we possibly don’t have a variable that actually measures what we are interested in. In that situation, adjusting for observables will not be enough to create a good estimate. Instead we find some variable - the eponymous instrumental variable - that is:

  1. correlated with the treatment variable, but
  2. not correlated with the outcome, other than through the treatment variable.

This solves our problem because the only way the instrumental variable can have an effect is through the treatment variable, and so we are able to adjust our understanding of the effect of the treatment variable appropriately. The trade-off is that instrumental variables must satisfy a bunch of different assumptions, and that, frankly, they are difficult to identify ex ante. Nonetheless, when you are able to use them they are a powerful tool for speaking about causality.

The canonical instrumental variables example is smoking. These days we know that smoking causes cancer. But because smoking is correlated with a lot of other variables, for instance, education, it could have been that something related to education, rather than smoking itself, was causing cancer. RCTs may be possible, but they are likely to be troublesome in terms of speed and ethics, and so instead we look for some other variable that is correlated with smoking, but not, in and of itself, with lung cancer. In this case, we look to tax rates, and other policy responses, on cigarettes. The tax rates on cigarettes are correlated with the number of cigarettes that are smoked, but are not correlated with lung cancer other than through their impact on cigarette smoking. Through them we can assess the effect of cigarettes smoked on lung cancer.

To implement instrumental variables we first regress cigarette smoking on tax rates to get a coefficient on the instrumental variable, and then (in a separate regression) regress lung cancer on tax rates to again get a coefficient on the instrumental variable. Our estimate is then the ratio of these coefficients: the coefficient from the lung cancer regression divided by the coefficient from the smoking regression. Gelman and Hill (2007, 219) describe this ratio as the ‘Wald estimate’.
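In symbols, the ratio can be written as follows. This is only a sketch: the names \(\hat{\pi}\) (the slope from regressing smoking on tax) and \(\hat{\delta}\) (the slope from regressing cancer on tax) are my own notation, not that of Gelman and Hill (2007). Because both slopes divide by the same variance of the tax rate, the ratio reduces to a ratio of covariances.

```latex
\[
\hat{\beta}_{\mbox{Wald}}
  = \frac{\hat{\delta}}{\hat{\pi}}
  = \frac{\widehat{\mbox{Cov}}(\mbox{Cancer}_i, \mbox{Tax}_i)}
         {\widehat{\mbox{Cov}}(\mbox{Smoking}_i, \mbox{Tax}_i)}
\]
```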

Following the language of Gelman and Hill (2007, 216), when we use instrumental variables we make a variety of assumptions including:

  • Ignorability of the instrument.
  • Correlation between the instrumental variable and the treatment variable.
  • Monotonicity.
  • Exclusion restriction.

To summarise exactly what instrumental variables is about, I cannot do better than recommend the first few pages of the ‘Instrumental Variables’ chapter in Cunningham (2020), and this key paragraph in particular (by way of background, Cunningham has explained why it would have been impossible to randomly allocate ‘clean’ and ‘dirty’ water through a randomised controlled trial and then continues…):

Snow would need a way to trick the data such that the allocation of clean and dirty water to people was not associated with the other determinants of cholera mortality, such as hygiene and poverty. He just would need for someone or something to be making this treatment assignment for him.

Fortunately for Snow, and the rest of London, that someone or something existed. In the London of the 1800s, there were many different water companies serving different areas of the city. Some were served by more than one company. Several took their water from the Thames, which was heavily polluted by sewage. The service areas of such companies had much higher rates of cholera. The Chelsea water company was an exception, but it had an exceptionally good filtration system. That’s when Snow had a major insight. In 1849, Lambeth water company moved the intake point upstream along the Thames, above the main sewage discharge point, giving its customers purer water. Southwark and Vauxhall water company, on the other hand, left their intake point downstream from where the sewage discharged. Insofar as the kinds of people that each company serviced were approximately the same, then comparing the cholera rates between the two houses could be the experiment that Snow so desperately needed to test his hypothesis.

16.2 History

The history of instrumental variables is a rare statistical mystery, and Stock and Trebbi (2003) provide a brief overview. The method was first published in Wright (1928). This is a book about the effect of tariffs on animal and vegetable oil. So why might instrumental variables be important in a book about tariffs on animal and vegetable oil? The fundamental problem is that the effect of tariffs depends on both supply and demand. But we only know prices and quantities, so we don’t know what is driving the effect. We can use instrumental variables to pin down causality.

Where it gets interesting, and becomes something of a mystery, is that the instrumental variables discussion is only in Appendix B. If you made a major statistical breakthrough would you hide it in an appendix? Further, Philip G. Wright, the book’s author, had a son, Sewall Wright, who had considerable expertise in statistics and the specific method used in Appendix B. Hence the mystery of Appendix B - did Philip or Sewall write it? Both Cunningham (2020) and Stock and Trebbi (2003) go into more detail, but on balance feel that it is likely that Philip did actually author the work.

16.3 Simulated example

Let’s generate some data. We will explore a simulation related to the canonical example of health status, smoking, and tax rates. So we are looking to explain how healthy someone is based on the amount they smoke, via the tax rate on smoking. We are going to generate different tax rates by provinces. My understanding is that the tax rate on cigarettes is now pretty much the same in each of the provinces, but that this is fairly recent. So we’ll pretend that Alberta had a low tax, and Nova Scotia had a high tax.

As a reminder, we are simulating data for illustrative purposes, so we need to impose the answer that we want. When you actually use instrumental variables you will be reversing the process.



library(tidyverse)

number_of_observation <- 10000

# One row per person, with smoking status assigned by a fair coin flip.
iv_example_data <- tibble(person = c(1:number_of_observation),
                          smoker = sample(x = c(0:1),
                                          size = number_of_observation, 
                                          replace = TRUE)
                          )

Now we need to relate the number of cigarettes that someone smoked to their health. We’ll model health status as a draw from the normal distribution, with either a high or low mean depending on whether the person smokes.

iv_example_data <- 
  iv_example_data %>% 
  mutate(health = if_else(smoker == 0,
                          rnorm(n = n(), mean = 1, sd = 1),
                          rnorm(n = n(), mean = 0, sd = 1)
                          )
         )
# So health will be one standard deviation higher for people who don't or barely smoke.

Now we need a relationship between cigarettes and the province (because in this illustration, the provinces have different tax rates).

# Smokers are more likely to live in the low-tax province (Alberta).
iv_example_data <- 
  iv_example_data %>% 
  rowwise() %>% 
  mutate(province = case_when(smoker == 0 ~ sample(x = c("Nova Scotia", "Alberta"),
                                                   size = 1, 
                                                   replace = FALSE, 
                                                   prob = c(1/2, 1/2)),
                              smoker == 1 ~ sample(x = c("Nova Scotia", "Alberta"),
                                                   size = 1, 
                                                   replace = FALSE, 
                                                   prob = c(1/4, 3/4)))) %>% 
  ungroup()

iv_example_data <- 
  iv_example_data %>% 
  mutate(tax = case_when(province == "Alberta" ~ 0.3,
                         province == "Nova Scotia" ~ 0.5,
                         TRUE ~ 9999999 # Sentinel value that flags an unexpected province
                         )
         )

iv_example_data$tax %>% table()
## .
##  0.3  0.5 
## 6206 3794
Table 16.1:
person  smoker  health  province     tax
1       0       1.11    Alberta      0.3
4       0       2.48    Alberta      0.3
5       0       0.617   Alberta      0.3
6       0       0.748   Nova Scotia  0.5

Now we can look at our data.

iv_example_data %>% 
  mutate(smoker = as_factor(smoker)) %>% 
  ggplot(aes(x = health, fill = smoker)) +
  geom_histogram(position = "dodge", binwidth = 0.2) +
  theme_minimal() +
  labs(x = "Health rating",
       y = "Number of people",
       fill = "Smoker") +
  scale_fill_brewer(palette = "Set1")

Finally, we can use the tax rate as an instrumental variable to estimate the effect of smoking on health.

health_on_tax <- lm(health ~ tax, data = iv_example_data)
smoker_on_tax <- lm(smoker ~ tax, data = iv_example_data)

coef(health_on_tax)["tax"] / coef(smoker_on_tax)["tax"]
##        tax 
## -0.8554502

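So we find, luckily, that if you smoke then your health is likely to be worse than if you don’t smoke. The estimate of about -0.86 is in the neighbourhood of the one-unit difference in mean health that we imposed when simulating the data.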

Equivalently, we can think of instrumental variables in a two-stage regression context.

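first_stage <- lm(smoker ~ tax, data = iv_example_data)
# Despite its name, health_hat holds the fitted values of smoker from the first stage.
health_hat <- first_stage$fitted.values
second_stage <- lm(health ~ health_hat, data = iv_example_data)

summary(second_stage)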

## Call:
## lm(formula = health ~ health_hat, data = iv_example_data)
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9867 -0.7600  0.0068  0.7709  4.3293 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.91632    0.04479   20.46   <2e-16 ***
## health_hat  -0.85545    0.08911   -9.60   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 1.112 on 9998 degrees of freedom
## Multiple R-squared:  0.009134,   Adjusted R-squared:  0.009034 
## F-statistic: 92.16 on 1 and 9998 DF,  p-value: < 2.2e-16
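In the simulation above nothing unobserved drives both smoking and health, so an ordinary regression of health on smoking would also recover the effect. The following sketch, which is separate from the example above, adds a hypothetical unobserved confounder (`risk_taking`, my own invention) to show the situation where instrumental variables earns its keep. All variable names and parameter values are assumptions chosen for illustration, and the true effect of smoking is set to -1.

```r
set.seed(4)
n <- 100000

tax_high <- rbinom(n, size = 1, prob = 0.5)  # instrument: 1 if in a high-tax province
risk_taking <- rnorm(n)                      # unobserved confounder (hypothetical)

# Risk takers smoke more; a high tax makes smoking less likely.
smoker <- as.numeric(risk_taking - tax_high + rnorm(n) > 0)

# The true effect of smoking is -1; risk taking separately lowers health.
health <- 1 - 1 * smoker - risk_taking + rnorm(n)

# A naive regression mixes up the effects of smoking and risk taking.
ols_estimate <- unname(coef(lm(health ~ smoker))["smoker"])

# The Wald estimate uses only the variation in smoking induced by the tax.
wald_estimate <- unname(coef(lm(health ~ tax_high))["tax_high"] /
                          coef(lm(smoker ~ tax_high))["tax_high"])

c(ols = ols_estimate, wald = wald_estimate)
```

Under these assumptions the naive estimate overstates the harm from smoking, because smokers are also more likely to be risk takers, while the Wald estimate should land close to -1.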

16.4 Implementation

As with regression discontinuity, although it is possible to use existing functions, it might be worth looking at specialised packages. Instrumental variables has a few moving pieces, so a specialised package can help keep everything organised. Additionally, the standard errors from a manual second-stage regression need to be adjusted, and specialised packages make this easier. We recommend the estimatr package, although there are others available and you should try those if you are interested. The estimatr package is from the same team as DeclareDesign.

Let’s look at our example using iv_robust().

library(estimatr)

# The formula reads: outcome ~ treatment | instrument.
iv_robust(health ~ smoker | tax, data = iv_example_data) %>% 
  summary()
## Call:
## iv_robust(formula = health ~ smoker | tax, data = iv_example_data)
## Standard error type:  HC2 
## Coefficients:
##             Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper   DF
## (Intercept)   0.9163    0.04057   22.59 3.163e-110   0.8368   0.9958 9998
## smoker       -0.8555    0.08047  -10.63  2.981e-26  -1.0132  -0.6977 9998
## Multiple R-squared:  0.1971 ,    Adjusted R-squared:  0.197 
## F-statistic:   113 on 1 and 9998 DF,  p-value: < 2.2e-16
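To see why the standard errors from a manual second stage need adjusting, here is a minimal sketch using only base R on freshly simulated data (the probabilities and names below are assumptions, not the data used above). The manual second stage computes its residuals using the fitted treatment, whereas two-stage least squares should compute them using the observed treatment. The sketch applies the classical, homoskedastic correction, so it will not exactly match the HC2 standard errors that iv_robust() reports by default.

```r
set.seed(2)
n <- 10000

tax <- sample(c(0.3, 0.5), size = n, replace = TRUE)
# Assumed probabilities: a higher tax lowers the chance of smoking.
smoker <- rbinom(n, size = 1, prob = ifelse(tax == 0.5, 0.3, 0.6))
health <- 1 - 1 * smoker + rnorm(n)

first_stage <- lm(smoker ~ tax)
smoker_hat <- fitted(first_stage)
second_stage <- lm(health ~ smoker_hat)

b <- coef(second_stage)
se_naive <- summary(second_stage)$coefficients["smoker_hat", "Std. Error"]

# Residual standard error as reported by the manual second stage.
sigma_naive <- summary(second_stage)$sigma

# Residual standard error using the observed treatment, as 2SLS requires.
resid_2sls <- health - b[1] - b[2] * smoker
sigma_2sls <- sqrt(sum(resid_2sls^2) / (n - 2))

# Rescale the naive standard error by the ratio of the two.
se_corrected <- se_naive * sigma_2sls / sigma_naive

c(naive = se_naive, corrected = se_corrected)
```

The point estimate is unchanged by this correction; only the measure of uncertainty differs.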

16.5 Assumptions

As discussed earlier, there are a variety of assumptions that are made when using instrumental variables. The two most important are:

  1. Exclusion Restriction. This assumption is that the instrumental variable only affects the dependent variable through the independent variable of interest.
  2. Relevance. There must actually be a relationship between the instrumental variable and the independent variable.

There is typically a trade-off between these two. There are plenty of variables that satisfy one of these conditions but not the other.

When thinking about potential instrumental variables Cunningham (2020), p. 211, puts it brilliantly:

But, let’s say you think you do have a good instrument. How might you defend it as such to someone else? A necessary but not a sufficient condition for having an instrument that can satisfy the exclusion restriction is if people are confused when you tell them about the instrument’s relationship to the outcome. Let me explain. No one is going to be confused when you tell them that you think family size will reduce female labor supply. They don’t need a Becker model to convince them that women who have more children probably work less than those with fewer children. It’s common sense. But, what would they think if you told them that mothers whose first two children were the same gender worked less than those whose children had a balanced sex ratio? They would probably give you a confused look. What does the gender composition of your children have to do with whether a woman works?

It doesn’t – it only matters, in fact, if people whose first two children are the same gender decide to have a third child. Which brings us back to the original point – people buy that family size can cause women to work less, but they’re confused when you say that women work less when their first two kids are the same gender. But if when you point out to them that the two children’s gender induces people to have larger families than they would have otherwise, the person “gets it”, then you might have an excellent instrument.

Relevance can be tested using regression and other tests for correlation. The exclusion restriction cannot be tested. You need to present evidence and convincing arguments. As Cunningham (2020) p. 225 says ‘Instruments have a certain ridiculousness to them[.] That is, you know you have a good instrument if the instrument itself doesn’t seem relevant for explaining the outcome of interest because that’s what the exclusion restriction implies.’
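As a minimal sketch of what testing relevance can look like, the following uses base R on simulated data (the probabilities are assumptions) to run the first-stage regression and pull out its F-statistic. A commonly cited rule of thumb treats a first-stage F-statistic below about 10 as a sign of a weak instrument, but that is a heuristic rather than a formal threshold.

```r
set.seed(3)
n <- 10000

tax <- sample(c(0.3, 0.5), size = n, replace = TRUE)
# Assumed probabilities: a higher tax lowers the chance of smoking.
smoker <- rbinom(n, size = 1, prob = ifelse(tax == 0.5, 0.3, 0.6))

# First stage: does the instrument predict the treatment?
first_stage <- lm(smoker ~ tax)
first_stage_f <- unname(summary(first_stage)$fstatistic["value"])

# A simple correlation gives the same message in this one-instrument case.
instrument_correlation <- cor(smoker, tax)

c(f_statistic = first_stage_f, correlation = instrument_correlation)
```

Nothing equivalent exists for the exclusion restriction, which is why it has to be argued for rather than tested.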

16.6 Example - Effect of Police on Crime

16.6.1 Overview

Here we’ll use an example from Levitt (2002) that looks at the effect of police on crime. This is interesting because you might think that more police would be associated with lower crime. But it could actually be the opposite, if more crime causes more police to be hired - how many police would a hypothetical country with no crime need? Hence there is a need to find some sort of instrumental variable that affects crime only through its relationship with the number of police (that is, it is not, in and of itself, related to crime), and yet is also correlated with police numbers. Levitt (2002) suggests the number of firefighters in a city.

Levitt (2002) argues that firefighters are appropriate as an instrument, because ‘(f)actors such as the power of public sector unions, citizen tastes for government services, affirmative action initiatives, or a mayor’s desire to provide spoils might all be expected to jointly influence the number of firefighters and police.’ Levitt (2002) also argues that the relevance assumption is met by showing that ‘changes in the number of police officers and firefighters within a city are highly correlated over time’.

In terms of satisfying the exclusion restriction, Levitt (2002) argues that the number of firefighters should not have a ‘direct impact on crime.’ However, it may be that there are common factors, and so Levitt (2002) adjusts for this in the regression.

16.6.2 Data

The dataset is based on 122 US cities between 1975 and 1995. Summary statistics are provided in Figure 16.1.


Figure 16.1: Summary statistics for Levitt 2002.

Source: Levitt (2002) p. 1,246.

16.6.3 Model

In the first stage Levitt (2002) looks at police as a function of firefighters, and a bunch of adjustment variables: \[\ln(\mbox{Police}_{ct}) = \gamma \ln(\mbox{Fire}_{ct}) + X'_{ct}\Gamma + \lambda_t + \phi_c + \epsilon_{ct}.\] The important part of this is the police and firefighter numbers, which are on a per capita basis. The adjustment variables in \(X\) include things like state prisoners per capita and the unemployment rate, and the model also has year dummy variables (\(\lambda_t\)) and fixed effects for each city (\(\phi_c\)).

Having established the relationship between police and firefighters, Levitt (2002) can then use the estimates of the number of police, based on the number of firefighters, to explain crime rates: \[\Delta\ln(\mbox{Crime}_{ct}) = \beta_1 \ln(\mbox{Police}_{ct-1}) + X'_{ct}\Gamma + \Theta_c + \mu_{ct}.\]
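To make the two-stage logic concrete, here is a deliberately simplified sketch on a simulated city-year panel. It is not Levitt’s specification: it works in levels rather than changes, uses contemporaneous rather than lagged police, omits the adjustment variables in \(X\), and every number in it is invented. The names `crime_shock` and `true_beta` are my own. The unobserved crime shock raises both crime and police hiring, which is the reverse-causality problem described above.

```r
set.seed(1)
n_cities <- 50
n_years <- 20
panel <- expand.grid(city = 1:n_cities, year = 1:n_years)
n <- nrow(panel)

city_effect <- rnorm(n_cities)[panel$city]
year_effect <- rnorm(n_years)[panel$year]
crime_shock <- rnorm(n)  # unobserved: raises both crime and police hiring

# Instrument: firefighter numbers do not respond to crime shocks.
panel$ln_fire <- city_effect + year_effect + rnorm(n)

# Police move with firefighters, but also respond to crime shocks.
panel$ln_police <- city_effect + year_effect + 0.5 * panel$ln_fire +
  0.5 * crime_shock + rnorm(n, sd = 0.5)

true_beta <- -0.5  # assumed effect of police on crime
panel$ln_crime <- city_effect + year_effect + true_beta * panel$ln_police +
  crime_shock + rnorm(n, sd = 0.5)

# First stage: police on firefighters, with city and year dummies.
first_stage <- lm(ln_police ~ ln_fire + factor(city) + factor(year), data = panel)
panel$ln_police_hat <- fitted(first_stage)

# Second stage: crime on predicted police, with the same dummies.
second_stage <- lm(ln_crime ~ ln_police_hat + factor(city) + factor(year),
                   data = panel)

# Naive regression for comparison.
ols <- lm(ln_crime ~ ln_police + factor(city) + factor(year), data = panel)

beta_iv <- unname(coef(second_stage)["ln_police_hat"])
beta_ols <- unname(coef(ols)["ln_police"])
c(ols = beta_ols, iv = beta_iv)
```

In this set-up the naive estimate is pushed upward, because police are hired in response to crime, while the two-stage estimate should sit near the assumed value. As before, the standard errors from a manual second stage should not be used as they are.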

The typical way to present instrumental variable results is to show both stages. Figure 16.2 shows the relationship between police and firefighters.


Figure 16.2: The relationship between firefighters, police and crime.

Source: Levitt (2002) p. 1,247.

And then Figure 16.3 shows the relationship between police and crime, where it is the IV results that are the ones of interest.


Figure 16.3: The impact of police on crime.

Source: Levitt (2002) p. 1,248.

16.6.4 Discussion

The key finding of Levitt (2002) is that there is a negative effect of the number of police on the amount of crime.

There are a variety of points that I want to raise in regard to this paper. They will come across as a little negative, but this is mostly just because this is a paper from 2002 that I am reading today, and so the standards have changed.

  1. It’s fairly remarkable how reliant on various model specifications the results are. The results bounce around a fair bit and that’s just the ones that are reported. Chances are there are a bunch of other results that were not reported, but it would be of interest to see their impact.
  2. On that note, there is fairly limited model validation. This is probably something that I am more aware of these days, but it seems likely that there is a fair degree of over-fitting here.
  3. Levitt (2002) is actually a response, after another researcher, McCrary (2002), found some issues with the original paper: Levitt (1997). While Levitt appears quite decent about it, it is jarring to see that Levitt was thanked by McCrary (2002) for providing ‘both the data and computer code.’ What if Levitt had not been decent about providing the data and code? Or what if the code was unintelligible? In some ways it is nice to see how far we have come - the author of a similar paper these days would be forced to make their code and data available as part of the paper, and we wouldn’t have to ask them for it. But it reinforces the importance of open data and reproducible science.

16.7 Conclusion

Instrumental variables is a useful approach because one can obtain causal estimates even without explicit randomisation. Finding instrumental variables used to be a bit of a white whale, especially in academia. However, I will leave the final (and hopefully motivating) word to Taddy (2019), p. 162:

As a final point on the importance of IV models and analysis, note that when you are on the inside of a firm—especially on the inside of a modern technology firm—explicitly randomised instruments are everywhere…. But it is often the case that decision-makers want to understand the effects of policies that are not themselves randomised but are rather downstream of the things being AB tested. For example, suppose an algorithm is used to predict the creditworthiness of potential borrowers and assign loans. Even if the process of loan assignment is never itself randomised, if the parameters in the machine learning algorithms used to score credit are AB tested, then those experiments can be used as instruments for the loan assignment treatment. Such ‘upstream randomisation’ is extremely common and IV analysis is your key tool for doing causal inference in that setting.


Cunningham, Scott. 2020. Causal Inference: The Mixtape.

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models.

Levitt, Steven D. 1997. “Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime.” The American Economic Review 87 (3).

Levitt, Steven D. 2002. “Using Electoral Cycles in Police Hiring to Estimate the Effects of Police on Crime: Reply.” American Economic Review 92 (4): 1244–50.

McCrary, Justin. 2002. “Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime: Comment.” American Economic Review 92 (4): 1236–43.

Stock, James H, and Francesco Trebbi. 2003. “Retrospectives: Who Invented Instrumental Variable Regression?” Journal of Economic Perspectives 17 (3): 177–94.

Taddy, Matt. 2019. Business Data Science. McGraw Hill.

Wright, Philip G. 1928. The Tariff on Animal and Vegetable Oils. Macmillan Company.