WWC Standards for Reviewing Fuzzy Regression Discontinuity Designs


JOSH: Good afternoon, everyone.
Thank you for joining us today. My name is Josh Polanin. I am a
Principal Researcher at the American Institutes for Research, and Project Director for the What
Works Clearinghouse’s Statistics Website And Training,
or SWAT Contract. On behalf of the WWC and the Institute
of Education Sciences, I welcome you to this important webinar. Today, we will focus on the topic of the WWC’s Standards and Procedures for Reviewing Evidence from Fuzzy Regression Discontinuity
Designs, or fuzzy RDDs. The webinar will be led by two experts on fuzzy RDDs and the WWC Standards
and Procedures, Dr. Emily Tanner-Smith, and Dr.
Christina LiCalsi. Dr. Tanner-Smith is an Associate
Professor and Associate Dean for Research in
the College of Education at the University of Oregon. She’s an applied research methodologist with experience in meta-analysis
and research synthesis for evidence-based decision making. Dr. Tanner-Smith leads our training
on RDD and single case design standards and procedures
for the SWAT Contract. Dr. LiCalsi is a Principal Researcher at the American Institutes for Research. She has extensive experience
conducting RDDs in education and related areas, and is a certified WWC reviewer in group design standards, Version
4.0. On today’s webinar, the presenters
will describe the WWC’s criteria for reviewing fuzzy RDDs, which were released in the Version
4.0 Standards Handbook. In addition, the presenters will
provide several examples of fuzzy RDDs and the WWC review
criteria applied to those studies. The information presented will
be useful in two primary situations. First, the presenters will share information that will be useful to researchers
who intend to conduct a fuzzy RDD and have it meet WWC standards. Second, the presenters will share
information for WWC certified reviewers who will conduct reviews
of these studies. As part of the webinar, the presenters will share links to resources within the chat box. You can also send questions directly to the presenters in that chat box. We have set aside time at the end
of the webinar for our presenters to respond to
the questions. Before we get started, we wanted
to share the goals for this webinar. We hope you leave today with an
understanding of fuzzy RDDs, in general; how to apply the WWC’s
evidence standards to fuzzy RDDs; and how instrumental variables
are used to estimate intervention impacts in fuzzy regression
discontinuity designs. And with that, I’ll turn the webinar
over to our first presenter, who will start with an overview
of Fuzzy RDDs. Christina? CHRISTINA: Thanks, Josh. Before
we jump into the standards around fuzzy regression discontinuity design, I’m going to begin by providing
just a very brief overview of what regression discontinuity design is, and what we mean when we talk about a fuzzy RDD. And then, I’m going to discuss how we calculate impacts from a fuzzy RDD. A regression discontinuity design
is an appropriate, and potentially very powerful,
causal research design, particularly well-suited to settings in which a
program or policy allocates participants to conditions based
on how they score relative to a cutoff value on some
sort of continuous variable. This variable is commonly referred
to as the forcing variable, though you’ll sometimes also hear it called the assignment variable, or even
the running variable. Often, in education research, this
variable will be a score or a percentile rank. I want to mention, also, that,
for this presentation, we’ll generally think about participants
as individuals, but larger units, such as classrooms
or schools or districts, may also be the unit of assignment. There are two broad categories into which regression discontinuity designs fall. The first is called a sharp RDD. In a sharp RDD, all participants
receive their assigned intervention or comparison condition,
meaning, for example, if students below a particular
cutoff score are meant to receive an intervention, and
students above this cutoff are not meant to receive the intervention,
then all students scoring below do, in fact, receive
the intervention, and none who score above do. Statistically speaking, the probability of receiving the intervention increases from
zero to one exactly at the cutoff. The second category of regression
discontinuity designs is called a fuzzy RDD, which is
what we’re going to be discussing, for the most part, today. In a fuzzy RDD, not all participants
receive their assigned condition. Instead, there are what we call
non-compliers. And these are individuals who,
for whatever reason, do not comply with their treatment assignment. Some individuals who should receive
the intervention, based on their score, may not. These individuals are called by the very technical-sounding name of no shows, and some individuals
who should not receive the intervention based on their score, may. And we call these individuals crossovers. In a fuzzy RDD, there may be only
no shows, only crossovers, or both, which we’ll discuss in
greater detail in the coming slides. There must, however, be a jump
in the probability of receiving the intervention right
at the cutoff. But this increase is less than one. So, one of the greatest things,
in my opinion, about regression discontinuity designs is how visually intuitive they are. Here, you see a visual representation
of a sharp RDD. You can see on the X axis that
we have the forcing variable. And on the Y axis, we have the probability of receiving the intervention. On the left-hand side of the cutoff, represented by the dotted line in the middle,
the probability of treatment is zero, while on the right side,
the probability is one. Conceptually speaking, what this
means is that an individual’s score on the variable
X, that forcing variable, and only this score, is determining
their receipt of the intervention. Here again, we have the forcing
variable on the X, and the probability of treatment on the Y. However, here, we see that, although
the probability of receiving the intervention jumps
from what looks like maybe about 0.25 to about 0.75
at the cutoff, the probability of intervention
receipt is not zero for all values of the forcing variable
below the cutoff. And it’s not one for all values above. This means that something else,
in addition to an individual’s score on the forcing variable, is determining
their intervention receipt. Again, in this example, we have
both no shows above the cutoff and crossovers below, but it may be the case that there’s only one or the other. It’s also common to see that the
probability of intervention receipt is greater closer to the
cutoff on the comparison side, and lower closer to the cutoff
on the intervention side, as in this example, because these
are individuals who are on the margin of receipt.
That need not be the case. If the probability were a flat 0.25 below and a flat 0.75 above, this would still be a fuzzy RDD.
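To make the contrast concrete, here is a minimal simulation sketch. This is our own illustration, not from the webinar slides; the forcing variable, cutoff, and take-up probabilities are all invented. It shows that under a sharp design the probability of treatment jumps from zero to one at the cutoff, while under the flat 0.25/0.75 fuzzy design just described the jump is only 0.5.

```python
# Hypothetical illustration (not from the webinar): sharp vs. fuzzy take-up.
import numpy as np

rng = np.random.default_rng(0)
n, cutoff = 10_000, 50.0
x = rng.uniform(0, 100, n)            # forcing variable
assigned = (x >= cutoff).astype(int)  # assigned to the intervention above the cutoff

# Sharp RDD: everyone receives their assigned condition.
treated_sharp = assigned

# Fuzzy RDD: flat 25% take-up below the cutoff and 75% above, so there are
# both crossovers (treated below) and no shows (untreated above).
treated_fuzzy = rng.binomial(1, np.where(x >= cutoff, 0.75, 0.25))

for label, t in [("sharp", treated_sharp), ("fuzzy", treated_fuzzy)]:
    below, above = t[x < cutoff].mean(), t[x >= cutoff].mean()
    print(f"{label}: P(T=1) below = {below:.2f}, above = {above:.2f}, jump = {above - below:.2f}")
```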
Here, we have another visual depiction of sharp and fuzzy regression discontinuity designs. In this example, pre-test is the
forcing variable, shown on the X axis, and the cutoff
for treatment is 50. Students with scores below 50 are
assigned to the intervention, while students with scores above are not. On the Y-axis, we have post-test
scores, so we have an outcome. You can see on this first graph,
the one on the left, that all students with pre-test
scores below 50 are represented by little blue triangles, showing that they received the
intervention. And all students with pre-test
scores above 50 are represented by red circles, showing that they’re
in the comparison condition. This is, therefore, a sharp RDD. The dashed lines are the regression lines on either side of the cutoff,
and you can see that there’s a difference in average test scores
at the cutoff, with students in the intervention group scoring
about five points higher. On the second graph, the one on the right, we can see that there are some
blue triangles to the right of the dotted line,
and some red circles to the left. This means that there are non-compliers:
both no shows and crossovers. There’s still a difference at the dotted line, thus demonstrating a positive effect
of the intervention. Okay, so, due to the non-compliance
in fuzzy RDDs, in order to determine the effect
size of an intervention, we must calculate the effect of
the intervention on participants at the cutoff value who received
the intervention because they were assigned to it. These individuals are called compliers,
and the effect is called the complier average causal effect, or CACE. Sometimes, particularly in economics,
this is referred to as the local average treatment
effect, or the LATE. In order to understand who we’re
talking about when we say compliers, I want you
to take a look at the rectangular graphics in this slide. On the left, we have the intervention group, and on the right, the comparison group. The first row of individuals are
our compliers. These are the folks who, on the
left, when they’re assigned to the intervention, are in fact
treated, T=1. And on the right, when they’re
assigned to the comparison condition, they’re not
treated, T=0. These are the folks for whom we’re
calculating the effect. In the middle row, we have what
are called always-takers. You can see that they are treated
by the intervention, T=1, regardless of whether they’re assigned
to the intervention on the left, or to the comparison
group on the right. Always-takers assigned to the comparison
group are our crossovers. And in the bottom row, we have
the never-takers. These individuals are not treated
by the intervention, regardless of their assignment, so T=0 both on the left and on the right. So, the never-takers assigned to the intervention group are our no shows.
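To summarize the slide’s graphic in one compact table:

```
Type            Assigned to intervention     Assigned to comparison
Compliers       T = 1                        T = 0
Always-takers   T = 1                        T = 1 (crossovers)
Never-takers    T = 0 (no shows)             T = 0
```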
So, one thing I want to mention, because it becomes important in a moment, is that we’re not
able to distinguish compliers from always-takers in the intervention group, or compliers from never-takers
in the comparison group. We’re only observing their treatment condition. So, we don’t know that always-takers
would have participated in the intervention, even if they
had been assigned to the comparison group, or that
never-takers wouldn’t have participated in the intervention,
even if they had been assigned, because they weren’t assigned,
so we can’t observe that. There’s one important assumption
that must be met in order for the fuzzy RDD estimate to be valid, and this is called the exclusion restriction. The exclusion restriction stipulates
that the only channel through which assignment to conditions
can influence outcomes is by affecting takeup of the intervention. For example, if assignment of a
student to treatment also causes parents to obtain other services
unrelated to the intervention, this is a violation of the exclusion restriction, because it might influence the
student’s outcomes through a channel other than participation
in the intervention. It also must be the case that there
are no defiers. So, that is, the assignment to condition doesn’t influence take-up status
for any individual in the opposite direction of what
they were assigned. And because, as I mentioned on
the previous slide, we are unable to distinguish compliers
from always-takers in the treatment condition, or
compliers from never-takers in the control condition, it must
be the case that the outcomes of the always-takers and the never-takers don’t differ by condition assignment. So, if a particular student who
is an always-taker is assigned to the comparison condition, but participates in the intervention, their outcomes are the same as if they were assigned to the intervention and
participated in it. Okay, so, now we’re going to move
on to discuss how to calculate the complier average
causal effect in a regression discontinuity design. In order to do this, we employ
an instrumental variable approach, which is called two-stage least squares. Broadly speaking, an instrumental
variable approach allows for the causal effect to
be calculated by finding some exogenous variable that’s
associated with an outcome only through its association with
intervention receipt. In a fuzzy RDD, the side of the
cutoff is the instrument. Whether an individual falls just
below or just above is only associated with the outcome
because it influences the probability that an individual
receives the treatment, as we stipulated in the exclusion restriction. The CACE is calculated as the ratio
of two discontinuities at the cutoff: the first is the
impact on the outcome at the cutoff. So, this is the difference in the
outcome between the intervention side of the cutoff and the comparison
side of the cutoff. We get this difference by subtracting the outcome on the comparison side from the
outcome on the intervention side. The second is the impact on the intervention participation at the cutoff. This is the difference in the percent
of individuals receiving the intervention between
the intervention side of the cutoff and the comparison
side of the cutoff. Again, we get this difference by
subtracting the percent of individuals receiving the intervention
on the comparison side from the percent of individuals
receiving the intervention on the intervention side. Then lastly, we divide the first
discontinuity, the impact on the outcome at the cutoff, by the
second discontinuity, the impact on the intervention
participation at the cutoff. And just to bring us back for a moment to fuzzy versus sharp RDDs: in a sharp RDD, the difference in intervention participation at the cutoff is one, right? It goes from zero up to one. And because that’s the denominator in this ratio, the causal effect for a sharp RDD is simply the difference in the outcome at the cutoff, so just the top.
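Written out as a formula (the notation here is ours; the Standards Handbook expresses the same ratio in words), with outcome Y, binary intervention receipt T, forcing variable X, and cutoff c, and taking the intervention side to be above the cutoff:

```latex
\mathrm{CACE}
  = \frac{\lim_{x \to c^{+}} \mathbb{E}[Y \mid X = x] \;-\; \lim_{x \to c^{-}} \mathbb{E}[Y \mid X = x]}
         {\lim_{x \to c^{+}} \mathbb{E}[T \mid X = x] \;-\; \lim_{x \to c^{-}} \mathbb{E}[T \mid X = x]}
```

If the intervention side is instead below the cutoff, as in the retention example coming up, both differences flip sign together, so the ratio is unchanged.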
Okay, so, we’ve covered a lot in a short amount of time. And we’re going to take a moment to pause for two knowledge checks using
the following example. Since 2002, all grade 3 students
in Florida have been required to meet the Level 2 benchmark or higher
on the statewide reading test in order to be promoted to the
fourth grade. However, there are a number of
what are called good cause exemptions, which allow
students to be promoted despite failing to score at the
Level 2 benchmark or above. For example, scoring above a certain threshold on an alternative test, passing
a teacher portfolio, or being designated as an English learner. Researchers have used this natural
experiment to examine the effect of third grade retention
on later outcomes, such as test scores, graduation,
and suspensions. Our first knowledge check question:
is this a sharp RDD or a fuzzy RDD? So, you should have a box that
just popped up on your screen. You might want to move that over
if you want to be able to read the slides, and then you can
vote on your answer. Okay. It looks like about 70% of
you have answered. So, let’s see what you all think. Okay, so, 94% of you say that this
is a fuzzy RDD. So, let’s see what the answer is. That is correct. This is a fuzzy RDD. There are exemptions to being retained for students who fail to score at the Level
2 benchmark. Therefore, the probability of being retained below the cutoff is not one. Also, there is nothing stipulating
that students who score at Level 2 or above cannot be retained,
so, the probability of being retained above the cutoff
is likely greater than zero. In short, there’s something else
other than just the student’s test scores that’s determining
whether or not they’re retained. Okay, so, thinking about the same policy, consider the second knowledge check question. Students who scored just below
the cutoff for retention have an average reading test score
of 300 one year later, while students who scored just
above the cutoff have an average reading test score
of 288. If 80% of students who scored just
below the cutoff are retained, and 20% of students above the cutoff
are retained, what is the complier average causal
effect of being retained on reading test scores one year later? And remember that the CACE is the impact on the outcome at the cutoff, divided by the impact on participation
at the cutoff. So, is the CACE 12, 15, 20 or 25? We should have a box for you to
vote in in just a second. There we go. Okay. So, let’s see what you all think. So, 52% of you say 20. Let’s see what the answer is. So, that is the correct answer.
The answer is 20. So, let’s go through how we calculate that. First, we need to calculate the
impact on the outcome at the cutoff. So, this is the difference in later test scores between students just below on
the retention side of the cutoff and those just above on the promotion
side of the cutoff. So, that’s 300 minus 288, which
is 12. We then calculate the impact on retention of scoring just below the cutoff. So, this is the difference in the
percentage of students retained at the cutoff, which is
0.8 – 80% of students – minus 0.2 – 20% of students – so
here, we have 0.6. Finally, we divide 12 by 0.6, which
equals 20. So, the average effect of being
retained on reading test scores one year later for students who scored at the threshold for retention
and were retained is 20 points.
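In equation form:

```latex
\mathrm{CACE} \;=\; \frac{300 - 288}{0.80 - 0.20} \;=\; \frac{12}{0.6} \;=\; 20
```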
Now, I’m going to turn it over to my colleague, Dr. Tanner-Smith, who is going to discuss the WWC
standards for reviewing fuzzy RDDs. EMILY: Thank you, Dr. LiCalsi. So, now that we’ve provided a general overview of fuzzy regression discontinuity designs and the methods that we can use to estimate complier average causal effects, in the next section of the webinar,
we’ll move to discussing the WWC standards for reviewing
evidence from fuzzy RDDs. So, first, it’s important to clarify
what types of regression discontinuity designs
are eligible for WWC review, regardless of whether those might
be sharp or fuzzy RDDs. A study that uses a regression
discontinuity design must meet four criteria to be eligible
for review by the WWC. The first criterion is that treatment assignments must be based on a numerical forcing variable where participants on one side
of a cutoff value on that forcing variable are assigned
to an intervention condition, and then participants on the other
side of that cutoff value are assigned to the comparison condition. In other words, studies that use
multiple assignment variables, or multiple cutoffs for the same sample, would not be currently eligible
for WWC review under the regression discontinuity
design standards. The second criterion is that the
forcing variable must be ordinal, and thus, have an inherent ordering
of values from lowest to highest. And that ordinal forcing variable
must have at least four unique values, both above
and below the cutoff value. The third criterion is that the
study must not have a confounding factor, or some component of the study design that’s perfectly aligned with either the intervention or comparison group. And finally, the forcing variable
used to calculate intervention impacts must be the
actual forcing variable that was used for assignment to conditions, and not a proxy or estimated forcing variable. The WWC considers a variable to
be a proxy forcing variable if its correlation with the actual forcing
variable is less than one. So, a if a regression discontinuity
design study meets all four of these criteria, then
that study is, indeed, eligible for WWC review using the WWC’s
RDD standards. So, let’s pause here for a quick
knowledge check on the four criteria for whether
a study would be considered eligible for WWC review using the
WWC’s RDD standards. So, let’s say that the State of
North Carolina successfully competed for federal Race To The Top funds
to turn around the lowest-achieving 5% of the state’s schools
through the Turning Around the Lowest Achieving Schools program,
or the TALAS program. Assignment to the TALAS program
was based on a school’s 2010 composite score, which is
calculated as the percentage of reading, math, science, and end-of-course tests passed out of all such tests taken
in a given school. The bottom 5% of schools of each type were placed in the TALAS program, but additional high schools were also placed in the program
based on low graduation rates. Overall, 89 of the 1,772 North
Carolina public elementary and middle schools were eligible
for TALAS in the year 2010. TALAS included a variety of program
elements, including mentoring for new teachers, regional leadership
academies for principals, and customized support and professional development. Schools were considered TALAS program schools, regardless of how many or which elements of the TALAS model they utilized. The bottom 5% of composite scores was not used for any other non-TALAS programs or supports. And all schools below the cutoff
participated in the program, as did two schools that scored
above the cutoff value. Let’s say a study used a regression
discontinuity design to examine the effects of TALAS
program participation on students’ later math and reading test scores. Let’s say that the authors had
access to the school composite scores that were used to determine eligibility
for the TALAS program. Now, we’ll again use the Zoom polling
feature, and I’d like you to answer either yes or no to the
following question: is this study eligible for WWC review as a regression discontinuity design? And I’ll pause here for one minute to allow you to submit your answers. Okay. It looks like many of you
have submitted your answers. So, let’s see what you thought. Okay, so, we see that 80% of you do believe that this study would be eligible
for WWC review. So, let’s take a look at the answer. So, yes, the results are in, and
so this study would indeed be eligible for WWC review as a
regression discontinuity design. Let’s walk through the answer just
for a second. Here in this example, the study
would meet the first criterion, namely that the intervention assignments are based on a numerical forcing variable. The study would also meet the second criterion, because the forcing variable is
ordinal, and does have at least four unique values on either side
of the cutoff value. This study would also meet the
third criterion, because there are no other obvious
confounding factors in the study that would be perfectly aligned with either the intervention or
comparison condition. And then, finally, it does appear, based on the information provided, that the
authors had access to the true or actual forcing variable used
for assignments to conditions. So, in summary, this study would
meet those four criteria, and would be eligible for review as an RDD. So, for those studies that are
eligible for review as a regression discontinuity design,
the WWC has five standards that are used to review evidence from RDDs. And these standards are used to
ultimately determine whether a study meets WWC RDD standards
without reservations, meets those standards with reservations, or does not meet those standards. For each of the five RDD standards, there are a series of criteria
used to determine whether a study completely satisfies,
partially satisfies, or does not satisfy that standard. In order for a study to be rated “meets WWC RDD standards without reservations,” the highest
possible rating, that study must completely satisfy
all five of the RDD standards. Now, for this webinar, we’ll be
focusing solely on the RDD standard number five, which
is the standard that applies to fuzzy regression discontinuity designs. More in-depth consideration of
the RDD standards one through four are covered in the WWC’s in-person RDD reviewer certification training. But as you can see here, the standard
number five must be completely or partially satisfied in order for a study to receive those highest
possible ratings of meets RDD standards with or
without reservations. So, now, to drill down further
into the RDD standard number five, this fuzzy RDD standard includes
eight criteria that are used for reviewing evidence
from fuzzy RDDs. This table, which you see on your slide, can be found on page 71 of the WWC Standards Handbook (Version
4.0). And this table summarizes the decision rules that are used to determine whether a study completely or partially satisfies
the standard number five. You see here that, in order for
a study to completely or partially satisfy the fuzzy
RDD standard, studies must satisfy the first
six criteria. The last two criteria clarify what distinguishes between whether a study completely or partially satisfies
the standard number five. And so, now, what we’ll do is walk through each of these eight criteria in turn. Criterion A of the fuzzy RDD standard
states that the intervention participation indicator
must be a binary indicator for taking up at least a portion
of the intervention. And that could, for instance, include
a binary indicator for receiving any positive dosage
of the intervention. But simply put, the WWC does not
synthesize evidence about the impacts of intervention dosage
as a continuous variable, so a study would not meet this criterion if it used a continuous dosage
measure for participation. Next, criterion B of the fuzzy RDD standard states that the model used to estimate
intervention effects must have exactly one participation indicator. The WWC does not currently have standards for evaluating fuzzy RDD studies
that use more than one participation indicator in
that estimated impact model. Next, criterion C of the fuzzy
RDD standard states that the indicator variable for whether participants are above or below the cutoff value on the forcing variable must be a binary indicator for the groups to which participants
are assigned. And then, criterion D specifies
that the same covariates, one of which must be the forcing variable, that those same covariates must
be included in both the analysis that estimates impacts
on participation, as well as the analysis that estimates
the impact on outcomes. So, for those authors using two-stage
least squares estimation, as we discussed previously in the webinar, this would mean that the same covariates must be used in both those first and second
stage equations. Moving on to criterion E, this criterion of the fuzzy RDD standard states
that there must be no clear violations of the exclusion restriction. As we discussed previously in the webinar, the exclusion restriction means
that the only channel through which assignment to conditions
can influence outcomes is by affecting takeup of the intervention. So, in other words, assignment to conditions should not influence outcomes through any channel other than takeup, meaning that the outcomes of always-takers and never-takers should not differ based on their assigned condition. Some examples of common violations
of the exclusion restriction include, for instance, when intervention
participation is defined inconsistently for the intervention
and comparison conditions, or when the assignment to the intervention group changes the behavior of the participants, even if they do not take up the
intervention. So, now, moving on to criterion
F of the fuzzy RDD standard. This criterion states that the
study shall provide evidence that the forcing variable
is a strong predictor of participation in the intervention. So, in a regression of program
participation on a treatment indicator and other covariates,
the WWC operationally defines strong prediction of participation as a minimum F-statistic of 16, or a minimum t-statistic of 4, on that indicator (for a single instrument the two are equivalent, since F = t^2). So, for criterion G, this criterion
states that the study must use a local regression or
related nonparametric approach in which the fuzzy regression discontinuity
design impacts are estimated within a justified bandwidth using one of three potential approaches. Now, it’s important to note here
that the WWC defines a justified bandwidth selection
procedure as one that follows a systematic procedure described in a peer-reviewed journal or methodological article that both describes the procedure and demonstrates its effectiveness. So, for example, cross-validation,
plug-in, and robust CCT procedures would all be considered justified
bandwidth selection procedures. In the context of criterion G,
a study must use one of three possible approaches. First, the justified bandwidth
selection procedure can be used for the fuzzy RDD impact estimate,
namely that impact ratio we saw earlier with the CACE estimates. Or, the second acceptable approach
is when authors use separate justified bandwidths for
the numerator and the denominator of the impact ratio. And then, finally, the third acceptable approach for satisfying criterion G is when
the authors use a justified bandwidth in the numerator only, as long as that justified bandwidth
is less than or equal to the justified bandwidth for the
denominator of the impact ratio. Now, you may recall that, in order
for a fuzzy RDD to be eligible to receive the highest
possible WWC rating, that study must meet criterion G. For fuzzy RDDs that do not satisfy criterion G, they can instead satisfy criterion
H, and still be eligible to be rated “meets WWC RDD standards with reservations.” For criterion H, the study can estimate the fuzzy RDD impact using one of two approaches: namely, using a justified bandwidth in the numerator only, or estimating the denominator of the impact using a best-fit functional form, which we define here as a functional form of the relationship between program receipt and the forcing variable that has been shown to fit the data better than at least two other functional forms. And this best fit can be based
on any measure of goodness of fit from the methodological literature,
such as AIC, BIC, or adjusted R-squared fit statistics. Now that we’ve discussed each of
the eight criteria used to review evidence from fuzzy RDDs, we can take another look at the
summary table as a reminder of how those eight criteria are
used to determine whether a study completely or partially satisfies this RDD standard number five. And again, it’s important to remember
that, for fuzzy RDDs that partially or completely satisfy
the standard number five, their highest possible WWC rating will be meets WWC RDD standards
with or without reservations. But it’s important to remember
that the final WWC rating for a fuzzy RDD will also incorporate
information relevant to the other four RDD standards,
and those are the standards relating to the integrity of the
forcing variable, sample attrition, continuity in the outcome-forcing variable relationship, as well as bandwidth and functional form specifications. So, now that we’ve walked through
each of these eight fuzzy RDD criteria and clarified
how each of these criteria relate to the fuzzy RDD’s final
WWC study rating, for the remainder of the webinar,
my colleague, Dr. LiCalsi, will walk through an extended example demonstrating the application of
these criteria. CHRISTINA: Thank you, Dr. Tanner-Smith. As Emily just said, we’re going
to move on to applying the WWC review criteria
for fuzzy RDDs to a fictitious study example that’s based on the third grade retention policy
that we discussed earlier. So, a US state requires schools
to retain third grade students who do not perform at a basic proficiency level on the state reading exam – so, a pass/fail. However, some students who pass
the reading exam may be eligible for an exemption,
and thus, are not retained. Furthermore, teachers may also
elect to retain students who pass the reading exam, but
are deemed in need of retention. Researchers used a fuzzy RDD to estimate the complier average causal effect
of third grade retention on students’ subsequent academic performance on standardized reading achievement tests. Students are assigned to the intervention condition, which is retained, and the comparison
condition, promoted, based on a continuous numerical
forcing variable, the state reading exam. The state reading exam ranges from
0 to 100 with a cutoff value of 50, and
the researchers have access to state administrative records containing the actual state reading exam scores
that were used to assign students to a condition. The researchers provide no indication
that this cutoff value was used to assign students to
any other interventions or services. Is this study eligible for WWC
review as an RDD? Yes, it is. So, this study meets all four criteria
for being eligible for review. First, the intervention assignments
are based on a numerical forcing variable, so in this case,
reading test scores. Second, the forcing variable is
ordinal, with at least four unique values each above and
below the cutoff. So, there are a hundred possible values in this example with 50 on each side. Third, there are no confounding
factors in the study that are perfectly aligned with
either condition. And lastly, the forcing variable
used to calculate impacts is the actual forcing variable
used for assignment to conditions. Okay, so, the authors estimate the complier average causal effect using two-stage least squares instrumental variables estimation. The first stage and second stage equations are as follows. The first stage estimates the probability
of the student receiving the intervention, which
is being retained in the third grade, noted as R. R is a function of a dummy indicator,
C, for whether the student fell below the cutoff value on
the forcing variable, which is 50 on the state reading
exam, as we mentioned. F, which is the continuous measure of the forcing variable, centered
at the cutoff. So, in this case, 50 becomes 0,
51 is 1, 49 is -1, et cetera. Then, we have C times F, which
allows the relationship between the forcing variable, reading
score, and R, the probability of retention, to
differ on each side of the cutoff. And we have Z, which is a vector of student demographic characteristics,
such as age, gender, race and prior year reading achievement. We have E, which is the error term. In the second stage, we’re estimating students’ reading achievement in the fourth
grade, the outcome of interest. You can see that the equation is the same as the first stage equation, with the exception that, instead of C, the dummy indicator for whether the student fell below the cutoff value of the forcing variable, we instead have R, which is whether the student was retained in the third grade. Other than that, we’ve got F, the
continuous measure of the forcing variable; we have
that interaction between F and C to allow for differential slopes
on either side of the cutoff, and we have Z, the same vector
of student characteristics, and we have an error term.
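Putting that narration into equations (a reconstruction on our part; the coefficient names are ours, and the hat denotes the first-stage fitted value):

```latex
\text{First stage:}\quad  R_i = \alpha_0 + \alpha_1 C_i + \alpha_2 F_i + \alpha_3 (C_i \times F_i) + Z_i'\gamma + e_i
\text{Second stage:}\quad Y_i = \beta_0 + \beta_1 \hat{R}_i + \beta_2 F_i + \beta_3 (C_i \times F_i) + Z_i'\delta + u_i
```

Here Y is fourth grade reading achievement, and the coefficient on the fitted retention indicator in the second stage is the complier average causal effect estimate.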
So, does this study meet criterion A? Is there a binary participation indicator for taking up some portion of the intervention? Yes, there is a binary indicator
for participation in the intervention, R. Being retained is just a yes or no here. There are no degrees of retention
or dosage specified. Does the study meet criterion B? Does the model include only one
participation indicator? Yes, there’s only one participation
indicator here, which is R. And does this study meet criterion C? Is the assignment indicator binary? Yes, C is binary. It is a yes or no, in terms of whether the student fell below the cutoff or not. Now, we’re going to look at whether
the study meets criterion D. Are the same covariates used in
estimates of impact on participation and impact on outcomes? And again, the answer is yes. The same covariates are used to
estimate the impact on participation in the first stage as are used to estimate the impact on outcomes
in the second stage. So, we have that same Z vector
of demographic characteristics.
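To make the two-stage mechanics concrete, here is a minimal simulated sketch, our own illustration rather than the study’s actual data or code. Variable names follow the webinar’s notation: R (retained), C (below cutoff), F (centered forcing variable), Z (covariates). Note that the same covariate matrix appears in both stages, as criterion D requires, and that naive second-stage standard errors from this manual approach are not valid; a real analysis would use a dedicated IV routine.

```python
# Hypothetical illustration of two-stage least squares for a fuzzy RDD.
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
score = rng.integers(0, 101, n).astype(float)  # state reading exam, 0-100
F = score - 50.0                               # forcing variable, centered at the cutoff
C = (score < 50).astype(float)                 # assigned to retention below the cutoff
Z = rng.normal(size=(n, 2))                    # stand-in demographic covariates
ability = rng.normal(size=n)                   # unobserved confounder of take-up and outcomes

# Fuzzy take-up: most assigned students are retained (some no shows), a few
# above the cutoff are retained anyway (crossovers), and lower-ability
# students are slightly more likely to be retained either way.
p_take = np.clip(np.where(C == 1, 0.85, 0.10) - 0.05 * ability, 0.01, 0.99)
R = rng.binomial(1, p_take).astype(float)

# Simulated outcome with a true retention effect of 12 points.
Y = 60.0 + 12.0 * R + 0.4 * F + 3.0 * ability + Z @ np.array([1.0, -0.5]) + rng.normal(0.0, 5.0, n)

covariates = np.column_stack([np.ones(n), F, C * F, Z])  # same covariates in both stages

# First stage: regress participation R on the assignment indicator C plus covariates.
X1 = np.column_stack([C, covariates])
R_hat = X1 @ np.linalg.lstsq(X1, R, rcond=None)[0]

# Second stage: regress the outcome on fitted participation plus the SAME covariates.
X2 = np.column_stack([R_hat, covariates])
beta = np.linalg.lstsq(X2, Y, rcond=None)[0]
print(f"Estimated CACE (coefficient on fitted R): {beta[0]:.2f}")  # close to 12
```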
So, the authors explicitly state in the study that grade retention is defined consistently for the intervention and comparison groups. However, they also state that many parents of retained students reported seeking supplementary
reading tutoring and instruction after being notified
of their child’s failing score on the reading exam. Does this study meet criterion E? Are there any violations of the
exclusion restriction? So, yes, this study fails to meet criterion E. This is a violation of the exclusion restriction. Specifically, assignment to the
intervention group changes behavior of the participants, even if they do not take up the
intervention. The authors estimate the two-stage
least squares model using a bandwidth of 16 – that
is, 8 test-score points on either side of the cutoff value of 50. This bandwidth was selected using an approved optimal bandwidth algorithm, which identified an optimal bandwidth of 16 for the impact on the outcomes, and 24 for the impact on participation. The authors reported that the t-statistic
for the instrument was 3.87. Is there evidence that the forcing variable is a strong predictor of participation? Given this information, the study
fails to meet criterion F. The forcing variable is not considered a strong predictor of participation, which is defined in the standard
as a minimum t-statistic of 4. Does the study meet criterion G? Are impacts estimated within a
justified bandwidth? Yes, they are, the study does meet criterion G. The impacts are estimated with
a justified bandwidth. The authors used a bandwidth of
16, which is the optimal bandwidth for the impact on the outcomes, the numerator. This bandwidth is smaller than the
justified bandwidth for the denominator, and thus satisfies criterion G. Does the study meet criterion H? Is a justified bandwidth used for the numerator only, or is the denominator estimated using a best-fit functional form? This is not applicable, because
the study met criterion G. Great, so, let’s take a moment
and review and determine the highest possible rating for this study. So, if we look at the criteria
that a fuzzy RDD must meet to completely or partially satisfy
standard five, we see that, unfortunately, this
study does not satisfy WWC RDD standards, either completely
or partially, for a fuzzy RDD. The study fails to satisfy criteria E and F, both of which must be satisfied for either rating. So, the highest possible rating is that this study does not meet
WWC RDD standards. I’m going to turn it over now to Dr. Polanin for a summary of what we did today. JOSH: Thanks, Christina. And thank
you both for sharing that helpful information and expanding
our understanding of the WWC standards as they apply
to fuzzy RDDs. Our participants submitted questions during the registration process and throughout
the webinar today. And we’re going to spend a few
minutes and turn to those now. And the first question today is for Emily. And it is, what resources does
the WWC offer to evaluate fuzzy RDDs? EMILY: Yeah, this is a great question. So, as we discussed in today’s
webinar, the WWC’s main resource regarding evaluating and reviewing
evidence from fuzzy RDDs is going to be in section three of the WWC Standards Handbook (Version
4.0), and so, that’s covered on pages 68 through
71 of the Standards Handbook. I’ll also note that the WWC
has a reporting guide for study authors on regression
discontinuity designs. And so, that reporting guide provides
guidance to study authors on the types of information they
should be reporting in their own RDD studies, whether
those be sharp or fuzzy RDDs, and again, providing those recommendations to ensure that authors can meet the highest
possible WWC ratings. I think what has also been mentioned
in the chat box today is that, after this webinar, the
materials from this webinar will also be posted on the WWC website. And so, our hope is that today’s webinar, and responses to these questions and answers, will provide an additional resource
for participants. JOSH: Great. Thanks, Emily. Next question for you, Christina, are there future updates planned for the
WWC’s fuzzy RDD standards? CHRISTINA: Thanks, Josh. So, the
upcoming release for Version 4.1 of the WWC Standards and Procedures
Handbook does provide additional guidance for reviewers
on methods for estimating impacts from RDDs, whether
those be sharp or fuzzy. Otherwise, there are no immediate
planned updates for the WWC fuzzy RDD standards. JOSH: Great. Okay, the next question,
the WWC Standards Handbook indicates that all RDDs must completely satisfy the fuzzy RDD standard to receive a rating of meets WWC standards without reservations. Is that true for sharp RDD studies? And do the fuzzy RDD criteria still
apply in that case? And Emily, let’s have you answer this one. EMILY: Yeah. This is a great question. Because I do think that the language
in the Handbook could be confusing to some readers
on this point. As we discussed in today’s webinar, the fuzzy RDD standard, or namely the RDD standard number five that
we walked through in detail, that fuzzy RDD standard is waived
for sharp RDDs. And it’s also waived for fuzzy
RDDs that use a reduced form model to estimate intent-to-treat (ITT) impacts. In other words, sharp RDD studies do not need to completely satisfy
this fuzzy RDD standard in order to receive that highest
possible rating of meets WWC RDD standards without
reservations. As we talked about on today’s webinar, the fuzzy RDD criteria, those eight criteria, are simply not applicable to sharp RDD studies. JOSH: Thank you, Emily. And time
for one more question. This time, back to you, Christina. Do fuzzy RDDs have different attrition
requirements, compared to those of sharp RDDs,
to meet WWC standards? For example, do fuzzy RDDs have more or less strict attrition boundaries
for the non-compliers? CHRISTINA: Thanks, Josh. This is
a great question. No, fuzzy RDDs do not have more or less strict attrition boundaries
than sharp RDDs. The WWC attrition boundaries are the same for both sharp and fuzzy RDDs. As discussed in the webinar, however,
for all RDDs, whether they be sharp or fuzzy,
it’s important to remember that the samples used to calculate
attrition must include all subjects who were eligible
to be assigned to the intervention or comparison group using the forcing variable, and not only a subset of those
subjects known to the researcher. So, attrition cannot be assessed
unless the subjects who were eligible to be assigned
to conditions are known. And for all of these subjects, their assigned condition must be known. But again, the WWC does not have
different attrition requirements for fuzzy RDDs or sharp RDDs; the requirements are the same for all RDDs. JOSH: Fantastic. Thanks, Christina.
great questions. This concludes our time for the
fuzzy RDD webinar. I’d like to thank our presenters
once again, Emily and Christina, for a great presentation, as well
as our participants for joining us today as we deepened
our knowledge of fuzzy RDDs. Just as a reminder, the WWC SWAT
team hosts webinars throughout the year to highlight important aspects of the WWC Standards and Procedures. Our planned webinars vary in topic; however, all have the goal of deepening
our WWC knowledge. To learn more about webinars offered
by the WWC and IES, be sure to sign up to receive notices
through the IES ListServ. The link in the chat box will take
you to that ListServ sign up. We will be posting an archive of
the webinar, along with the Q&A, including responses to questions
we weren’t able to cover today. And with that, I’d like to thank
you all again for joining us, and have a great rest of your day.
