Policy evaluation, shiny objects, and sustained educational improvement
It’s 2017, which means we’re in year six of the Common Core experiment. The big question everyone wants answered is “Is Common Core working?” Many states seem poised to move in a new direction, especially with a new administration in Washington, and research evidence could play an instrumental role in helping states decide whether to keep the standards, revise them, or replace them altogether. (Of course, it might also be that policymakers’ views on the standards are impervious to evidence.)
To my knowledge, there are two existing studies that try to assess Common Core’s impact on student achievement, both by Tom Loveless. These studies compare NAEP gains between states that adopted Common Core and states that did not, or among states rated on the quality of their implementation of the standards. Both find, in essence, no effects of the standards, and the media have covered them from that angle. The C-SAIL project, on which I am co-principal investigator, is also considering a related question (in our case, the impact of college- and career-readiness standards in general, including, but not limited to, the Common Core standards).
There are many challenges with doing this kind of research. A few of the most serious are:
- The need for sophisticated quasi-experimental methods, since states were not randomly assigned to the standards and experimental designs are therefore unavailable.
- The limited array of outcome variables: NAEP (which is not perfectly aligned to the Common Core) is really the only assessment with the required national comparability, and many college and career outcomes are difficult to measure.
- The difficulty of pinning down when implementation actually began, since states varied so much in the timing of related policies like assessment and textbook adoption.
Thus, it is not obvious when the right time to evaluate the policy will be, or with what outcomes.
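To make the basic design concrete, here is a minimal, stylized sketch of the kind of quasi-experimental contrast the studies above rest on: a simple difference-in-differences comparison of NAEP gains between adopting and non-adopting states. Every state label and score below is hypothetical, and this illustrates only the logic of the comparison, not a reconstruction of any actual study’s method.

```python
# A stylized difference-in-differences comparison, loosely in the spirit of
# the study designs described above. All values are hypothetical and for
# illustration only; real analyses involve many more states, covariates,
# and careful attention to implementation timing.

# (state, adopted Common Core?, NAEP score pre-adoption, NAEP score post)
states = [
    ("A", True, 236.0, 239.5),
    ("B", True, 241.0, 244.0),
    ("C", False, 238.0, 241.0),
    ("D", False, 244.0, 246.5),
]

def mean_gain(adopter: bool) -> float:
    """Average pre-to-post NAEP gain for adopting or non-adopting states."""
    gains = [post - pre for _, a, pre, post in states if a == adopter]
    return sum(gains) / len(gains)

# The difference-in-differences estimate: the adopters' average gain net of
# the gain that non-adopters experienced over the same period.
did = mean_gain(True) - mean_gain(False)
print(f"Hypothetical difference-in-differences estimate: {did:+.2f} NAEP points")
```

Even this toy contrast shows why the challenges listed above bite: non-adopting states are not a true control group, NAEP is only loosely aligned to the standards, and which years count as “pre” and “post” is itself debatable.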
Policymakers want to effect positive change through policy, and they often need to make decisions on a short cycle. After all, they make promises in their election campaigns, and it behooves them to show evidence that their chosen policies are working before the next round of elections. The consequence is a high demand for rapid evidence about policy effects, and the early evidence often dominates the narrative about whether a policy is working.
Unfortunately, there are more than a handful of examples where the early evidence on a policy turned out to be misleading, or where a policy seemed to have delayed effects. For example, the Gates Foundation’s small school reforms were widely panned as a flop in early reviews relying on student test scores, but a number of later rigorous studies showed (sometimes substantial) positive effects on outcomes such as graduation and college enrollment. It was too late, however—the initiative had already been scrapped by the time the positive evidence started rolling in.
No Child Left Behind acquired quite a negative reputation over its first half dozen years of implementation. Its accountability policies were seen as poorly targeted (they were), and it was labeled as encouraging an array of negative unintended consequences. These views quickly became well established among both researchers and policymakers. And yet a series of recent studies has shown meaningful effects of the law on student achievement, a finding that has done precisely zero to change public perception.
All manner of policies may fit into this category to a greater or lesser extent. A state capacity-building and technical assistance policy implemented in California was shelved after a few years, even though evaluations found it improved student learning. Several school choice studies have found null or modest effects on test scores, only to turn up impacts on longer-term outcomes like graduation. Even School Improvement Grants and other turnaround strategies may belong here: though the recent impact evaluation found no effects, several studies have found positive effects, and many have found impacts that grow as the years progress (suggesting that longer-term evaluations may yet show effects).
How does this all relate back to Common Core and other college- and career-readiness standards? There are implications for both researchers and policymakers.
For researchers, these patterns suggest that great care is needed in interpreting and presenting the results of research conducted early in the implementation of Common Core and other policies. This is not to say that researchers should not investigate the early effects of policies, but rather that they should be appropriately cautious in describing what their work means. Early impact studies will virtually never provide the “final answer” on the effectiveness of a given policy, and researchers should explicitly caution against their work being read that way.
For policymakers, there are at least two implications. First, when creating new policies, policymakers should think about both the short- and long-term outcomes they hope to achieve. Then they should build ample time into the law for those outcomes to materialize, ensuring that decisions are not made before the law can have its intended effects. Even if this time is not explicitly built into the policy cycle, policymakers should at least be aware of these issues and adopt a stance of patience toward policy revisions. Second, to the extent that policies build in funds or plans for evaluation, those plans should include both short- and long-term evaluations.
Clearly, these suggestions run counter to prevailing preferences for immediate gratification in policymaking, but they are essential if we are to see sustained improvement in education. At a minimum, this approach might keep us from declaring failure too soon on policies that may well turn out to be successful. Since improvement through policy is almost always a process of incremental progress, failing to learn all the lessons of new policies may hamstring our efforts to develop better policies later. Finally, jumping around from policy to policy likely contributes to reform fatigue among educators, which may even undermine the success of future unrelated policies. In short, regardless of your particular policy preferences, there is good reason to move on from the “shiny object” approach to education policy and focus instead on giving old and seemingly dull objects a chance to demonstrate their worth before throwing them in the policy landfill.