# WWC Standards for Reviewing Fuzzy Regression Discontinuity Designs

JOSH: Good afternoon, everyone.

Thank you for joining us today. My name is Josh Polanin. I am a

Principal Researcher at the American Institutes for Research, and Project Director for the What

Works Clearinghouse’s Statistics Website And Training,

or SWAT Contract. On behalf of the WWC and the Institute

of Education Sciences, I welcome you to this important webinar. Today, we will focus on the topic of the WWC’s Standards and Procedures for Reviewing Evidence from Fuzzy Regression Discontinuity

Designs, or fuzzy RDDs. The webinar will be led by two experts on fuzzy RDDs and the WWC Standards

and Procedures, Dr. Emily Tanner-Smith, and Dr.

Christina LiCalsi. Dr. Tanner-Smith is an Associate

Professor and Associate Dean for Research in

the College of Education at the University of Oregon. She’s an applied research methodologist with experience in meta-analysis

and research synthesis for evidence-based decision making. Dr. Tanner-Smith leads our training

on RDD and single case design standards and procedures

for the SWAT Contract. Dr. LiCalsi is a Principal Researcher at the American Institutes for Research. She has extensive experience

conducting RDDs in education and related areas, and is a certified WWC reviewer in group design standards, Version

4.0. On today’s webinar, the presenters

will describe the WWC’s criteria for reviewing fuzzy RDDs, which were released in the Version

4.0 standards handbook. In addition, the presenters will

provide several examples of fuzzy RDDs and the WWC review

criteria applied to those studies. The information presented will

be useful in two primary situations. First, the presenters will share information that will be useful to researchers

who intend to conduct a fuzzy RDD and have it meet WWC standards. Second, the presenters will share

information for WWC certified reviewers who will conduct reviews

of these studies. As part of the webinar, the presenters will share links to resources within the chat box. You can also send questions directly to the presenters in that chat box. We have set aside time at the end

of the webinar for our presenters to respond to

the questions. Before we get started, we wanted

to share the goals for this webinar. We hope you leave today with an

understanding of fuzzy RDDs, in general; how to apply the WWC’s

evidence standards to fuzzy RDDs; and how instrumental variables

are used to estimate intervention impacts in fuzzy regression

discontinuity designs. And with that, I’ll turn the webinar

over to our first presenter, who will start with an overview

of Fuzzy RDDs. Christina? CHRISTINA: Thanks, Josh. Before

we jump into the standards around fuzzy regression discontinuity design, I’m going to begin by providing

just a very brief overview over what regression discontinuity design is, and what we mean when we talk about a fuzzy RDD. And then, I’m going to discuss

how we talk about – or how we calculate impact from a fuzzy RDD. A regression discontinuity design

is an appropriate, and potentially very powerful,

causal research design, particularly well-suited when a

program or policy allocates participants to conditions based

on how they score relative to a cutoff value on some

sort of continuous variable. This variable is commonly referred

to as the forcing variable, though you’ll sometimes also hear it called the assignment variable, or even

the running variable. Often, in education research, this

variable will be a score or a percentile rank. I want to mention, also, that,

for this presentation, we’ll generally think about participants

as individuals, but larger units, such as classrooms

or schools or districts, may also be the unit of assignment. There are two broad categories into which regression discontinuity designs fall. The first is called a sharp RDD. In a sharp RDD, all participants

receive their assigned intervention or comparison condition,

meaning, for example, if students below a particular

cutoff score are meant to receive an intervention, and

students above this cutoff are not meant to receive the intervention,

then all students scoring below do, in fact, receive

the intervention, and none who score above do. Statistically speaking, the probability of receiving the intervention increases from

zero to one exactly at the cutoff. The second category of regression

discontinuity designs is called a fuzzy RDD, which is

what we’re going to be discussing, for the most part, today. In a fuzzy RDD, not all participants

receive their assigned condition. Instead, there are what we call

non-compliers. And these are individuals who,

for whatever reason, do not comply with their treatment assignment. Some individuals who should receive

the intervention, based on their score, may not. These individuals are called by

the very technical-sounding name of no shows, and some individuals

who should not receive the intervention based on their score, may. And we call these individuals crossovers. In a fuzzy RDD, there may be only

no shows, only crossovers, or both, which we’ll discuss in

greater detail in the coming slides. There must, however, be a jump

in the probability of receiving the intervention right

at the cutoff. But this increase is less than one. So, one of the greatest things,

in my opinion, about regression discontinuity design

is how visually intuitive they are. Here, you see a visual representation

of a sharp RDD. You can see on the X axis that

we have the forcing variable. And on the Y axis, we have the probability of receiving the intervention. On the left-hand side of the cutoff, represented by the dotted line in the middle,

the probability of treatment is zero, while on the right side,

the probability is one. Conceptually speaking, what this

means is that an individual’s score on the variable

X, that forcing variable, and only this score, is determining

their receipt of the intervention. Here again, we have the forcing

variable on the X, and the probability of treatment on the Y. However, here, we see that, although

the probability of receiving the intervention jumps

from what looks like maybe about 0.25 to about 0.75

at the cutoff, the probability of the intervention

receipt is not zero for all values of the forcing variable

below the cutoff. And it’s not one for all values above. This means that something else,

in addition to an individual’s score on the forcing variable, is determining

their intervention receipt. Again, in this example, we have

both no shows above the cutoff and crossovers below, but it may be the case that there’s only one or the other. It’s also common to see that the

probability of intervention receipt is greater closer to the

cutoff on the comparison side, and lower closer to the cutoff

on the intervention side, as in this example, because these

are individuals who are on the margin of receipt.
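To make that pattern concrete, here is a minimal simulation sketch (the numbers and compliance pattern are our own hypothetical illustration, not from the webinar) of a fuzzy RDD in which noncompliance is most likely for individuals near the cutoff:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0, 100, n)   # forcing variable
cutoff = 50
assigned = x >= cutoff       # assigned to the intervention above the cutoff

# Hypothetical compliance pattern: noncompliance is most likely near the
# cutoff (units "on the margin of receipt") and fades with distance from it.
p_noncomply = np.clip(0.4 - 0.02 * np.abs(x - cutoff), 0, None)
noncomply = rng.random(n) < p_noncomply
# Noncompliers below the cutoff are crossovers; above, they are no shows.
treated = np.where(noncomply, ~assigned, assigned)

# Probability of treatment in bins on each side of the cutoff:
for lo, hi in [(40, 45), (45, 50), (50, 55), (55, 60)]:
    in_bin = (x >= lo) & (x < hi)
    print(f"{lo}-{hi}: P(treated) = {treated[in_bin].mean():.2f}")
```

In this sketch, the treatment probability falls toward zero moving away from the cutoff on the comparison side and rises toward one on the intervention side, with a jump of less than one right at the cutoff.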

That need not be the case. If the probability were a flat

0.25 below and a flat 0.75 above, this would

still be a fuzzy RDD. Here, we have another visual depiction

of sharp and fuzzy regression discontinuity designs. In this example, pre-test is the

forcing variable, shown on the X axis, and the cutoff

for treatment is 50. Students with scores below 50 are

assigned to the intervention, while students with scores above are not. On the Y-axis, we have post-test

scores, so we have an outcome. You can see on this first graph,

the one on the left, that all students with pre-test

scores below 50 are represented by little blue triangles, showing that they received the

intervention. And all students with pre-test

scores above 50 are represented by red circles, showing that they’re

in the comparison condition. This is, therefore, a sharp RDD. The dashed lines on either side

of the cutoff are the regression lines,

and you can see that there’s a difference in average test scores

at the cutoff, with students in the intervention group scoring

about five points higher. On the second graph, the one on the right, we can see that there are some

blue triangles to the right of the dotted line,

and some red circles to the left. This means that there are non-compliers:

both no shows and crossovers. There’s still a difference of the dotted line, thus demonstrating a positive effect

of the intervention. Okay, so, due to the non-compliance

in fuzzy RDDs, in order to determine the effect

size of an intervention, we must calculate the effect of

the intervention on participants at the cutoff value who received

the intervention because they were assigned to it. These individuals are called compliers,

and the effect is called the complier average causal effect, or CACE. Sometimes, particularly in economics,

this is referred to as the local average treatment

effect, or the LATE. In order to understand who we’re

talking about when we say compliers, I want you

to take a look at the rectangular graphics in this slide. On the left, we have the intervention group, and on the right, the comparison group. The first row of individuals are

our compliers. These are the folks who, on the

left, when they’re assigned to the intervention, are in fact

treated, T=1. And on the right, when they’re

assigned to the comparison condition, they’re not

treated, T=0. These are the folks for whom we’re

calculating the effect. In the middle row, we have what

are called always-takers. You can see that they are treated

by the intervention, T=1, regardless of whether they’re assigned

to the intervention on the left, or to the comparison

group on the right. Always-takers assigned to the comparison

group are our crossovers. And in the bottom row, we have

the never-takers. These individuals are not treated

by the intervention, regardless of their assignment, so T=0 both on the left and on the right. So, the never-takers assigned to the intervention group are our no shows. So, one thing I want to mention,

because it becomes important in a moment, is that we’re not

able to distinguish compliers from always-takers in the intervention group, or compliers from never-takers

in the comparison group. We’re only observing their treatment condition. So, we don’t know that always-takers

would have participated in the intervention, even if they

had been assigned to the comparison group, or that

never-takers wouldn’t have participated in the intervention,

even if they had been assigned, because they weren’t assigned,

so we can’t observe that. There’s one important assumption

that must be met in order for the RDD to be valid, and this is called the exclusion restriction. The exclusion restriction stipulates

that the only channel through which assignment to conditions

can influence outcomes is by affecting takeup of the intervention. For example, if assignment of a

student to treatment also causes parents to obtain other services

unrelated to the intervention, this is a violation of the exclusion restriction, because it might influence the

student’s outcomes through a channel other than participation

in the intervention. It also must be the case that there

are no defiers. So, that is, the assignment to condition doesn’t influence take-up status

for any individual in the opposite direction of what

they were assigned. And because, as I mentioned on

the previous slide, we are unable to distinguish compliers

from always-takers in the treatment condition, or

compliers from never-takers in the control condition, it must

be the case that the outcomes of the always-takers and the never-takers don’t differ by condition assignment. So, if a particular student who

is an always-taker is assigned to the comparison condition, but participates in the intervention, their outcomes are the same as if they were assigned to the intervention and

participated in it. Okay, so, now we’re going to move

on to discuss how to calculate the complier average

causal effect in a regression discontinuity design. In order to do this, we employ

an instrumental variable approach, which is called two-stage least squares. Broadly speaking, an instrumental

variable approach allows for the causal effect to

be calculated by finding some exogenous variable that’s

associated with an outcome only through its association with

intervention receipt. In a fuzzy RDD, the side of the

cutoff is the instrument. Whether an individual falls just

below or just above is only associated with the outcome

because it influences the probability that an individual

receives the treatment, as we stipulated in the exclusion restriction. The CACE is calculated as the ratio

of two discontinuities at the cutoff: the first is the

impact on the outcome at the cutoff. So, this is the difference in the

outcome between the intervention side of the cutoff and the comparison

side of the cutoff. We get this difference by subtracting the outcome on the comparison side from the

outcome on the intervention side. The second is the impact on the intervention participation at the cutoff. This is the difference in the percent

of individuals receiving the intervention between

the intervention side of the cutoff and the comparison

side of the cutoff. Again, we get this difference by

subtracting the percent of individuals receiving the intervention

on the comparison side from the percent of individuals

receiving the intervention on the intervention side. Then lastly, we divide the first

discontinuity, the impact on the outcome at the cutoff, by the

second discontinuity, the impact on the intervention

participation at the cutoff. And just to bring us back for a

moment, to fuzzy versus sharp RDDs: the difference in intervention

participation at the cutoff is one in a sharp RDD, right? It

goes from zero up to one. And since that’s the denominator in this ratio, the causal effect for a sharp RDD is simply the difference in the outcome at

the cutoff, so just the top. Okay, so, we’ve covered a lot in

a short amount of time. And we’re going to take a moment to pause for two knowledge checks using

the following example. Since 2002, all grade 3 students

in Florida are required to meet the Level 2 benchmark or higher

on the statewide reading test in order to be promoted to the

fourth grade. However, there are a number of

what are called good cause exemptions, which allow

students to be promoted despite failing to score at the

Level 2 benchmark or above. For example, scoring above a certain threshold on an alternative test, passing

a teacher portfolio, or being designated as an English learner. Researchers have used this natural

experiment to examine the effect of third grade retention

on later outcomes, such as test scores, graduation,

and suspensions. Our first knowledge check question:

is this a sharp RDD or a fuzzy RDD? So, you should have a box that

just popped up on your screen. You might want to move that over

if you want to be able to read the slides, and then you can

vote on your answer. Okay. It looks like about 70% of

you have answered. So, let’s see what you all think. Okay, so, 94% of you say that this

is a fuzzy RDD. So, let’s see what the answer is. That is correct. This is a fuzzy RDD. There are exemptions to being retained for students who fail to score at the Level

2 benchmark. Therefore, the probability of being

retained before the cutoff is not 1 – or below the cutoff,

I should say, is not 1. Also, there is nothing stipulating

that students who score at Level 2 or above cannot be retained,

so, the probability of being retained above the cutoff

is likely greater than zero. In short, there’s something else

other than just the student’s test scores that’s determining

whether or not they’re retained. Okay, so, thinking about the same policy, consider the second knowledge check question. Students who scored just below

the cutoff for retention have an average reading test score

of 300 one year later, while students who scored just

above the cutoff have an average reading test score

of 288. If 80% of students who scored just

below the cutoff are retained, and 20% of students above the cutoff

are retained, what is the complier average causal

effect of being retained on reading test scores one year later? And remember that the CACE is the impact on the outcome at the cutoff, divided by the impact on participation

at the cutoff. So, is the CACE 12, 15, 20 or 25? We should have a box for you to

vote in, in just a second. There we go. Okay. So, let’s see what you all think. So, 52% of you say 20. Let’s see what the answer is. So, that is the correct answer.
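The arithmetic behind this answer can be written as a short Python sketch (the variable names are ours, not from the webinar):

```python
# CACE = (impact on the outcome at the cutoff) / (impact on participation at the cutoff)
outcome_retained_side = 300   # average score one year later, just below the cutoff
outcome_promoted_side = 288   # average score one year later, just above the cutoff
retained_below = 0.80         # share retained just below the cutoff
retained_above = 0.20         # share retained just above the cutoff

impact_on_outcome = outcome_retained_side - outcome_promoted_side       # 12
impact_on_participation = retained_below - retained_above               # 0.6
cace = round(impact_on_outcome / impact_on_participation, 6)

print(cace)  # 20.0
```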

The answer is 20. So, let’s go through how we calculate that. First, we need to calculate the

impact on the outcome at the cutoff. So, this is the difference in later test scores between students just below on

the retention side of the cutoff and those just above on the promotion

side of the cutoff. So, that’s 300 minus 288, which

is 12. We then calculate the impact on retention of scoring just below the cutoff. So, this is the difference in the

percentage of students retained at the cutoff, which is

0.8 – 80% of students – minus 0.2 – 20% of students – so

here, we have 0.6. Finally, we divide 12 by 0.6, which

equals 20. So, the average effect of being

retained on reading test scores one year later for students who scored at the threshold for retention

and were retained is 20 points. Now, I’m going to turn it over

to my colleague, Dr. Tanner-Smith, who is going to discuss the WWC

standards for reviewing fuzzy RDDs. EMILY: Thank you, Dr. LiCalsi. So, now that we’ve provided a general overview of fuzzy regression discontinuity designs and the methods that we can use to estimate complier average causal effects, in the next section of the webinar,

we’ll now move to discussing the WWC standards for reviewing

evidence from fuzzy RDDs. So, first, it’s important to clarify

what types of regression discontinuity designs

are eligible for WWC review, regardless of whether those might

be sharp or fuzzy RDDs. A study that uses a regression

discontinuity design must meet four criteria to be eligible

for review by the WWC. The first criterion is that treatment assignments must be based on a numerical forcing variable where participants on one side

of a cutoff value on that forcing variable are assigned

to an intervention condition, and then participants on the other

side of that cutoff value are assigned to the comparison condition. In other words, studies that use

multiple assignment variables, or multiple cutoffs for the same sample, would not be currently eligible

for WWC review under the regression discontinuity

design standards. The second criterion is that the

forcing variable must be ordinal, and thus, have an inherent ordering

of values from lowest to highest. And that ordinal forcing variable

must have at least four unique values, both above

and below the cutoff value. The third criterion is that the

study must not have a confounding factor, or some component of the study design that’s perfectly aligned with either the intervention or comparison group. And finally, the forcing variable

used to calculate intervention impacts must be the

actual forcing variable that was used for assignment to conditions, and not a proxy or estimated forcing variable. The WWC considers a variable to

be a proxy forcing variable if its correlation with the actual forcing

variable is less than one. So, if a regression discontinuity

design study meets all four of these criteria, then

that study is, indeed, eligible for WWC review using the WWC’s

RDD standards. So, let’s pause here for a quick

knowledge check on the four criteria for whether

a study would be considered eligible for WWC review using the

WWC’s RDD standards. So, let’s say that the State of

North Carolina successfully competed for federal Race To The Top funds

to turn around the lowest 5% of the state schools

through the Turning Around the Lowest Achieving Schools program,

or the TALAS program. Assignment to the TALAS program

was based on a school’s 2010 composite score, which is

calculated as the percentage of reading, math, science, and

end-of-course tests passed out of all such tests taken

in a given school. The bottom 5% of schools in

each type were placed in the TALAS program, but additional high schools were also placed in the program

based on low graduation rates. Overall, 89 of the 1,772 North

Carolina public elementary and middle schools were eligible

for TALAS in the year 2010. TALAS included a variety of program

elements, including mentoring for new teachers, regional leadership

academies for principals, and customized support and professional development. Schools were considered TALAS program schools, regardless of how many or which elements of the TALAS model they utilized. The bottom 5% of composite scores was not used for any other non-TALAS programs or supports. And all schools below the cutoff

participated in the program, as did two schools that scored

above the cutoff value. Let’s say a study used a regression

discontinuity design to examine the effects of TALAS

program participation on students’ later math and reading test scores. Let’s say that the authors had

access to the school composite scores that were used to determine eligibility

for the TALAS program. Now, we’ll again use the Zoom polling

feature, and I’d like you to answer either yes or no to the

following question: is this study eligible for WWC review as a regression discontinuity design? And I’ll pause here for one minute to allow you to submit your answers. Okay. It looks like many of you

have submitted your answers. So, let’s see what you thought. Okay, so, we see that 80% of you do believe that this study would be eligible

for WWC review. So, let’s take a look at the answer. So, yes, the results are in, and

so this study would indeed be eligible for WWC review as a

regression discontinuity design. Let’s walk through the answer just

for a second. Here in this example, the study

would meet the first criterion, namely that the intervention assignments are based on a numerical forcing variable. The study would also meet the second criterion, because the forcing variable is

ordinal, and does have at least four unique values on either side

of the cutoff value. This study would also meet the

third criterion, because there are no other obvious

confounding factors in the study that would be perfectly aligned with either the intervention or

comparison condition. And then, finally, it does appear, based on the information provided, that the

authors had access to the true or actual forcing variable used

for assignments to conditions. So, in summary, this study would

meet those four criteria, and would be eligible for review as an RDD. So, for those studies that are

eligible for review as a regression discontinuity design,

the WWC has five standards that are used to review evidence from RDDs. And these standards are used to

ultimately determine whether a study meets WWC RDD standards

without reservations, meets those standards with reservations, or does not meet those standards. For each of the five RDD standards, there are a series of criteria

used to determine whether a study completely satisfies,

partially satisfies, or does not satisfy that standard. In order for a study to be rated,

meets WWC RDD standards without reservations, the highest

possible rating, that study must completely satisfy

all five of the RDD standards. Now, for this webinar, we’ll be

focusing solely on the RDD standard number five, which

is the standard that applies to fuzzy regression discontinuity designs. More in-depth consideration of

the RDD standards one through four are covered in the WWC’s in-person RDD reviewer certification training. But as you can see here, the standard

number five must be completely or partially satisfied in order for a study to receive those highest

possible ratings of meets RDD standards with or

without reservations. So, now, to drill down further

into the RDD standard number five, this fuzzy RDD standard includes

eight criteria that are used for reviewing evidence

from fuzzy RDDs. This table which you see on your

slide can be found on page 71 of the WWC Standards Handbook (Version

4.0). And this table summarizes the decision rules that are used to determine whether a study completely or partially satisfies

the standard number five. You see here that, in order for

a study to completely or partially satisfy the fuzzy

RDD standard, studies must satisfy the first

six criteria. The last two criteria clarify what distinguishes between whether a study completely or partially satisfies

the standard number five. And so, now, what we’ll do is walk through each of these eight criteria in turn. Criterion A of the fuzzy RDD standard

states that the intervention participation indicator

must be a binary indicator for taking up at least a portion

of the intervention. And that could, for instance, include

a binary indicator for receiving any positive dosage

of the intervention. But simply put, the WWC does not

synthesize evidence about the impacts of intervention dosage

as a continuous variable, so a study would not meet this criterion if it used a continuous dosage

measure for participation. Next, criterion B of the fuzzy RDD standard states that the model used to estimate

intervention effects must have exactly one participation indicator. The WWC does not currently have standards for evaluating fuzzy RDD studies

that use more than one participation indicator in

that estimated impact model. Next, criterion C of the fuzzy

RDD standard states that the indicator variable, which indicates

whether participants are above or below the cutoff value

on the forcing variable, that that indicator variable must

be a binary indicator for the groups to which participants

are assigned. And then, criterion D specifies

that the same covariates, one of which must be the forcing variable, that those same covariates must

be included in both the analysis that estimates impacts

on participation, as well as the analysis that estimates

the impact on outcomes. So, for those authors using two-stage

least squares estimation, as we discussed previously in the webinar, this would mean that the same covariates must be used in both those first and second

stage equations. Moving on to criterion E, this criterion of the fuzzy RDD standard states

that there must be no clear violations of the exclusion restriction. As we discussed previously in the webinar, the exclusion restriction means

that the only channel through which assignment to conditions

can influence outcomes is by affecting takeup of the intervention. So, in other words, assignment

to conditions should not influence takeup status,

meaning that the outcomes of always-takers and never-takers

should not differ. Some examples of common violations

of the exclusion restriction include, for instance, when intervention

participation is defined inconsistently for the intervention

and comparison conditions, or when the assignment to the intervention group changes the behavior of the participants, even if they do not take up the

intervention. So, now, moving on to criterion

F of the fuzzy RDD standard. This criterion states that the

study shall provide evidence that the forcing variable

is a strong predictor of participation in the intervention. So, in a regression of program

participation on a treatment indicator and other covariates,

the WWC would operationally define strong evidence

of prediction as having a minimum F-statistic

of 16, or a minimum t-statistic value

of 4. So, for criterion G, this criterion

states that the study must use a local regression or

related nonparametric approach in which the fuzzy regression discontinuity

design impacts are estimated within a justified bandwidth using one of three potential approaches. Now, it’s important to note here

that the WWC defines a justified bandwidth selection

procedure as one that’s selected based on a systematic procedure

described in a peer-reviewed journal or methodological article that

not only describes the procedure, but also demonstrates its effectiveness. So, for example, cross-validation,

plug-in, and robust CCT procedures would all be considered justified

bandwidth selection procedures. In the context of criterion G,

a study must use one of three possible approaches. First, the justified bandwidth

selection procedure can be used for the fuzzy RDD impact estimate,

namely that impact ratio we saw earlier with the CACE estimates. Or, the second acceptable approach

is when authors use separate justified bandwidths for

the numerator and the denominator of the impact ratio. And then, finally, the third acceptable approach for satisfying criterion G is when

the authors use a justified bandwidth in the numerator only, as long as that justified bandwidth

is less than or equal to the justified bandwidth for the

denominator of the impact ratio. Now, you may recall that, in order

for a fuzzy RDD to be eligible to receive the highest

possible WWC rating, that study must meet criterion G. For fuzzy RDDs that do not satisfy criterion G, they can instead satisfy criterion

H, and still be eligible to be rated, meets WWC RDD standards

with reservations. For criterion H, the study can

estimate the fuzzy RDD impact using one of two approaches: namely,

using a justified bandwidth in the numerator only, or estimating

the denominator of the impact using a best fit functional form,

which we define here as a functional form of the relationship between program receipt and the

forcing variable. And that fit has been shown to

be a better fit to the data than at least two other functional forms. And this best fit can be based

on any measure of goodness of fit from the methodological literature,

such as AIC, BIC, or adjusted R-squared fit statistics. Now that we’ve discussed each of

the eight criteria used to review evidence from fuzzy RDDs, we can take another look at the

summary table as a reminder of how those eight criteria are

used to determine whether a study completely or partially satisfies this RDD standard number five. And again, it’s important to remember

that, for fuzzy RDDs that partially or completely satisfy

the standard number five, their highest possible WWC rating will be meets WWC RDD standards

with or without reservations. But it’s important to remember

that the final WWC rating for a fuzzy RDD will also incorporate

information relevant to the other four RDD standards,

and those are the standards relating to the integrity of the

forcing variable, sample attrition, continuity in the outcome-forcing

variable relationship, as well as bandwidth and functional

form specifications. So, now that we’ve walked through

each of these eight fuzzy RDD criteria and clarified

how each of these criteria relate to the fuzzy RDD’s final

WWC study rating, for the remainder of the webinar,

my colleague, Dr. LiCalsi, will walk through an extended example demonstrating the application of

these criteria. CHRISTINA: Thank you, Dr. Tanner-Smith. As Emily just said, we’re going

to move on to applying the WWC review criteria

for fuzzy RDDs to a fictitious study example that’s based on the third grade retention policy

that we discussed earlier. So, a US state requires schools

to retain third grade students who do not perform at a basic proficiency level on the state reading exam – so, a pass/fail. However, some students who pass

the reading exam may be eligible for an exemption,

and thus, are not retained. Furthermore, teachers may also

elect to retain students who pass the reading exam, but

are deemed in need of retention. Researchers used a fuzzy RDD to estimate the complier average causal effect

of third grade retention on students’ subsequent academic performance on standardized reading achievement tests. Students are assigned to the intervention condition (retained) or the comparison condition (promoted) based on a continuous numerical forcing variable, the state reading exam. The state reading exam ranges from

0 to 100 with a cutoff value of 50, and

the researchers have access to state administrative records containing the actual state reading exam scores

that were used to assign students to a condition. The researchers provide no indication

that this cutoff value was used to assign students to

any other interventions or services. Is this study eligible for WWC

review as an RDD? Yes, it is. So, this study meets all four criteria

for being eligible for review. First, the intervention assignments

are based on a numerical forcing variable, so in this case,

reading test scores. Second, the forcing variable is ordinal, with at least four unique values each above and below the cutoff. In this example, there are 101 possible values (0 through 100), with roughly 50 on each side of the cutoff. Third, there are no confounding

factors in the study that are perfectly aligned with

either condition. And lastly, the forcing variable

used to calculate impacts is the actual forcing variable

used to assign students to conditions. Okay, so, the authors estimate

the complier average causal effects using a two-stage least squares

instrumental variables estimation. The first stage and second stage

equations are as follows. The first stage estimates the probability

of the student receiving the intervention, which

is being retained in the third grade, denoted R. R is a function of a dummy indicator,

C, for whether the student fell below the cutoff value on

the forcing variable, which is 50 on a state reading

exam, as we mentioned. F, which is the continuous measure of the forcing variable, centered

at the cutoff. So, in this case, 50 becomes 0,

51 is 1, 49 is -1, et cetera. Then, we have C times F, which

allows the relationship between the forcing variable, reading

score, and R, the probability of retention, to

differ on each side of the cutoff. And we have Z, which is a vector of student demographic characteristics,

such as age, gender, race and prior year reading achievement. We have E, which is the error term. In the second stage, we’re estimating students’ reading achievement in the fourth

grade, the outcome of interest. In the second stage, you can see

that the equation is the same as the first stage equation, except that, in place of C, the dummy indicator for whether the student fell below the cutoff value of the forcing variable, we instead have R, an indicator of whether the student was retained in third grade. Other than that, we’ve got F, the

continuous measure of the forcing variable; we have

that interaction between F and C to allow for differential slopes

on either side of the cutoff, and we have Z, the same vector

of student characteristics, and we have an error term. So, does this study meet criterion A? Is there a binary participation indicator for taking up some portion of the
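Written out, the two equations just described take roughly the following form. This is a reconstruction from the verbal description; the coefficient symbols are my own notation, since the webinar does not name them.

```latex
% First stage: participation (retention) R as a function of the below-cutoff
% indicator C, the centered forcing variable F, their interaction, and covariates Z
R_i = \alpha_0 + \alpha_1 C_i + \alpha_2 F_i + \alpha_3 (C_i \times F_i) + Z_i'\gamma + e_i

% Second stage: fourth-grade reading achievement Y, with predicted
% participation \hat{R}_i in place of C_i
Y_i = \beta_0 + \beta_1 \hat{R}_i + \beta_2 F_i + \beta_3 (C_i \times F_i) + Z_i'\delta + \varepsilon_i
```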

intervention? Yes, there is a binary indicator

for participation in the intervention, R. Being retained is just a yes or no here. There are no degrees of retention

or dosage specified. Does the study meet criterion B? Does the model include only one

participation indicator? Yes, there’s only one participation

indicator here, which is R. And does this study meet criterion C? Is the participation indicator

binary? Yes, R is binary. It is a yes or no, in terms of

being retained or not. Now, we’re going to look at whether

the study meets criterion D. Are the same covariates used in

estimates of impact on participation and impact on outcomes? And again, the answer is yes. The same covariates are used to

estimate the impact on participation in the first stage as are used to estimate the impact on outcomes

in the second stage. So, we have that same Z vector

of demographic characteristics. So, the authors explicitly state in the study that grade retention is defined consistently for the intervention and comparison groups. However, they also state that many

parents of students assigned to retention reported seeking supplementary reading tutoring and instruction after being notified of their student’s failing grade on the reading exam. Does this study meet criterion E? Are there any violations of the

exclusion restriction? Yes, there is a violation, so this study fails to meet criterion E. The supplementary tutoring is a violation of the exclusion restriction. Specifically, assignment to the

intervention group changes behavior of the participants, even if they do not take up the

intervention. The authors estimate the two-stage

least squares model using a bandwidth of 16 – that

is, 8 test score points on either side of the cutoff value

of 50. This bandwidth was selected using an approved optimal bandwidth algorithm, which identified an optimal bandwidth of 16 for

the impact on participation, and 24 for the impact on the outcomes. The authors reported that the t-statistic

for the instrument was 3.87. Is there evidence that the forcing variable is a strong predictor of participation? Given this information, the study

fails to meet criterion F. The forcing variable is not considered a strong predictor of participation, which is defined in the standard

as a minimum t-statistic of 4. Does the study meet criterion G? Are impacts estimated within a
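As an aside, the kind of first-stage check behind criterion F can be sketched in a few lines of code. The sketch below uses simulated data; all numbers and variable names are illustrative, not the fictitious study’s records. It regresses the participation indicator R on the below-cutoff indicator C, the centered forcing variable F, and their interaction, and computes the t-statistic on C.

```python
import numpy as np

# Illustrative first-stage regression for criterion F, on simulated data.
# A fuzzy design: most, but not all, students below the cutoff are retained.
rng = np.random.default_rng(0)
n = 1000
score = rng.integers(0, 101, n)              # state reading exam, 0-100
F = (score - 50).astype(float)               # forcing variable centered at the cutoff
C = (score < 50).astype(float)               # below-cutoff (assigned-to-retention) indicator
R = (rng.random(n) < np.where(C == 1, 0.9, 0.1)).astype(float)  # participation

# OLS of R on [1, C, F, C*F]; the covariate vector Z is omitted for brevity.
X = np.column_stack([np.ones(n), C, F, C * F])
beta, *_ = np.linalg.lstsq(X, R, rcond=None)
resid = R - X @ beta
s2 = resid @ resid / (n - X.shape[1])        # residual variance
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())
t_stat = beta[1] / se[1]                     # t-statistic on the instrument C
print(f"first-stage t on C: {t_stat:.2f}")   # criterion F requires t of at least 4
```

With a participation jump this large, the t-statistic comfortably exceeds the WWC’s threshold of 4; the fictitious study’s reported value of 3.87 falls just short.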

justified bandwidth? Yes, the study does meet criterion G. The impacts are estimated within a justified bandwidth. The authors used a bandwidth of 16, which is the optimal bandwidth for the impact on participation, the denominator of the complier average causal effect ratio. That bandwidth is also smaller than the justified bandwidth of 24 for the impact on outcomes, the numerator, and thus satisfies criterion G. Does the study meet criterion H? When the bandwidth is justified for only one of the numerator or the denominator, is the other impact estimated using a best-fit functional form? This is not applicable, because the study met criterion G. Great, so, let’s take a moment
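The bandwidth logic just discussed reduces to a small comparison. This is only a sketch of one reading of criterion G, using the bandwidths reported in the example (16 and 24); the variable names are mine, not the handbook’s.

```python
# Criterion G sketch: the estimation bandwidth must be a justified bandwidth
# for at least one stage, and no larger than the other stage's justified bandwidth.
used_bw = 16
justified_bw_participation = 16   # optimal bandwidth for the impact on participation
justified_bw_outcomes = 24        # optimal bandwidth for the impact on outcomes
meets_criterion_g = (
    used_bw in (justified_bw_participation, justified_bw_outcomes)
    and used_bw <= min(justified_bw_participation, justified_bw_outcomes)
)
print(meets_criterion_g)
```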

and review and determine the highest possible rating for this study. So, if we look at the criteria

that a fuzzy RDD must meet to completely or partially satisfy

standard five, we see that, unfortunately, this

study does not satisfy RDD standard number five, either completely or partially. The study fails to satisfy criteria E and F, which it must satisfy under either condition. So, the highest possible rating is that this study does not meet

WWC RDD standards. I’m going to turn it over now to Dr. Polanin for a summary of what we did today. JOSH: Thanks, Christina. And thank

you both for sharing that helpful information and expanding

our understanding of the WWC standards as they apply

to fuzzy RDDs. Our participants submitted questions during the registration process and throughout

the webinar today. And we’re going to spend a few

minutes and turn to those now. And the first question today is for Emily. And it is, what resources does

the WWC offer to evaluate fuzzy RDDs? EMILY: Yeah, this is a great question. So, as we discussed in today’s

webinar, the WWC’s main resource regarding evaluating and reviewing

evidence from fuzzy RDDs is going to be in section three of the WWC Standards Handbook (Version

4.0), and so, that’s covered on pages 68 through

71 of the Standards Handbook. I’ll also note that the WWC

has a reporting guide for study authors on regression

discontinuity designs. And so, that reporting guide provides

guidance to study authors on the types of information they

should be reporting in their own RDD studies, whether

those be sharp or fuzzy RDDs, with recommendations to help ensure that authors’ studies can receive the highest

possible WWC ratings. I think what has also been mentioned

in the chat box today is that, after this webinar, the webinar materials will be posted on the WWC website. And so, our hope is that today’s webinar, and the responses to these questions, will provide an additional resource

for participants. JOSH: Great. Thanks, Emily. Next question for you, Christina, are there future updates planned for the

WWC’s fuzzy RDD standards? CHRISTINA: Thanks, Josh. So, the

upcoming release for Version 4.1 of the WWC Standards and Procedures

Handbook does provide additional guidance for reviewers

on methods for estimating impacts from RDDs, whether

those be sharp or fuzzy. Otherwise, there are no immediate

planned updates for the WWC fuzzy RDD standards. JOSH: Great. Okay, the next question,

the WWC Standards Handbook indicates that all RDDs must completely satisfy the fuzzy RDD standard to receive a rating of meets WWC standards without reservations. Is that true for sharp RDD studies? And do the fuzzy RDD criteria still

apply in that case? And Emily, let’s have you answer this one. EMILY: Yeah. This is a great question. Because I do think that the language

in the Handbook could be confusing to some readers

on this point. As we discussed in today’s webinar, the fuzzy RDD standard, or namely the RDD standard number five that

we walked through in detail, that fuzzy RDD standard is waived

for sharp RDDs. And it’s also waived for fuzzy

RDDs that use a reduced form model to estimate ITT impacts. So, to put it in other words, sharp RDD studies do not need to completely satisfy

this fuzzy RDD standard in order to receive that highest

possible rating of meets WWC RDD standards without

reservations. As we talked about on today’s webinar, those eight fuzzy RDD criteria are simply not applicable to sharp RDD studies. JOSH: Thank you, Emily. And time

for one more question. This time, back to you, Christina. Do fuzzy RDDs have different attrition

requirements, compared to those of sharp RDDs,

to meet WWC standards? For example, do fuzzy RDDs have more or less strict attrition boundaries

for the non-compliers? CHRISTINA: Thanks, Josh. This is

a great question. No, fuzzy RDDs do not have more or less strict attrition boundaries

than sharp RDDs. The WWC attrition boundaries are the same for both sharp and fuzzy RDDs. As discussed in the webinar, however,

for all RDDs, whether they be sharp or fuzzy,

it’s important to remember that the samples used to calculate

attrition must include all subjects who were eligible

to be assigned to the intervention or comparison group using the forcing variable. And not only a subset of those

subjects known to the researcher. So, attrition cannot be assessed

unless the subjects who were eligible to be assigned

to conditions are known. And for all of these subjects, their assigned condition must be known. But again, the WWC does not have

different attrition requirements for fuzzy RDDs versus sharp RDDs; the same requirements apply to all RDDs. JOSH: Fantastic. Thanks, Christina. And thanks to all of you for those

great questions. This concludes our time for the

fuzzy RDD webinar. I’d like to thank our presenters

once again, Emily and Christina, for a great presentation, as well

as our participants for joining us today as we deepened

our knowledge of fuzzy RDDs. Just as a reminder, the WWC SWAT

team engages in webinars throughout the year to highlight

important aspects of the WWC Standards and Procedures. Our planned webinars vary in topic; however, all share the goal of deepening

our WWC knowledge. To learn more about webinars offered

by the WWC and IES, be sure to sign up to receive notices

through the IES ListServ. The link in the chat box will take

you to that ListServ sign up. We will be posting an archive of

the webinar, along with the Q&A, including responses to questions

we weren’t able to cover today. And with that, I’d like to thank

you all again for joining us, and have a great rest of your day.