Beyond Proficiency: Toward a Better Measure for School Success
For the past dozen years (the No Child Left Behind era), the primary metric for measuring school performance in most states and in federal policy has been the proficiency rate: the proportion of a school’s students scoring above a state-determined proficiency threshold in math and English language arts, along with whether this proportion met state targets.
From the very earliest days of the law’s enactment, researchers questioned whether proficiency rates were really the best measures of school performance[i]. The most common critiques were:
- Proficiency rates essentially measure who is enrolled in a school rather than how well the school is doing at educating them[ii]. Because such status measures merely capture the current performance levels of students, proficiency rates are highly correlated with student socioeconomic status and other demographics. Growth-based measures, on the other hand, can show students’ year-to-year changes and better demonstrate the school’s effectiveness or contribution to student learning.
- Even among status measures of performance, proficiency is an especially poor one[iii]. It creates an incentive to focus efforts primarily on students very near the threshold, as very low and very high achievers are unlikely to move from not proficient to proficient (or vice versa). It also depends heavily on where the state decides to set the proficiency threshold, and states have varied tremendously in that decision[iv]. And it throws away a great deal of information: a student one point above the proficiency threshold looks exactly the same as a student one hundred points above the threshold. Research suggests that this flaw led to a focus on “bubble kids” near the proficiency threshold at the expense of high and low achievers[v] (though I am not aware of research showing that NCLB accountability had differential achievement impacts based on prior achievement).
Despite these concerns, proficiency rates remained the dominant measure of school performance under NCLB.
The passage of the Every Student Succeeds Act (ESSA) a year ago seemed to offer some relief from the tyranny of proficiency, but the language was not especially clear. The law required a status measure of student performance, but it was not spelled out whether proficiency rates were required or whether states could pick a different metric. Even in the draft regulations that were meant to clarify the law, this point was fuzzy.
Owing to this ambiguity, I penned a letter[vi] to the Department of Education during the comment period on the draft regulations, arguing that the Department should interpret the ESSA statute broadly to allow states to use status measures of performance other than percent proficient. In particular, I recommended allowing states to use average scale scores (i.e., the simple average of students’ test scores in a school) as their status measure because this makes better use of the available data. Failing that, I recommended a performance index that gives schools credit for performance all along the achievement distribution. For instance, the proficiency rate is calculated by assigning each proficient student a value of 1 and each non-proficient student a value of 0 and then taking the average across students. A performance index might instead give each advanced student a score of 1.1, each proficient student a score of 1, each basic student a score of 0.7, each below basic student a score of 0.3, and each far below basic student a score of 0. Again, the average across students would be the school’s performance level. This letter was endorsed by more than 100 researchers, policymakers, and educators.
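To make the arithmetic concrete, here is a minimal Python sketch of the two calculations described above. The index weights are the ones from the example in the text; the level labels, function names, and the two ten-student schools are hypothetical and exist only for illustration.

```python
from statistics import mean

# Credit per achievement level, taken from the example index in the text.
INDEX_WEIGHTS = {
    "advanced": 1.1,
    "proficient": 1.0,
    "basic": 0.7,
    "below_basic": 0.3,
    "far_below_basic": 0.0,
}

# Levels that count as "proficient" for the 0/1 proficiency rate.
PROFICIENT_LEVELS = {"advanced", "proficient"}


def proficiency_rate(levels):
    """Share of students at or above proficient (each student scored 1 or 0)."""
    return mean([1.0 if lvl in PROFICIENT_LEVELS else 0.0 for lvl in levels])


def performance_index(levels, weights=INDEX_WEIGHTS):
    """Average credit across students, using one weight per achievement level."""
    return mean([weights[lvl] for lvl in levels])


# Two hypothetical schools with identical proficiency rates but very
# different performance among their non-proficient students.
school_a = ["proficient"] * 5 + ["basic"] * 5
school_b = ["proficient"] * 5 + ["far_below_basic"] * 5

for name, school in (("A", school_a), ("B", school_b)):
    print(name, proficiency_rate(school), round(performance_index(school), 2))
```

Both hypothetical schools have half of their students at or above proficient, so the proficiency rate cannot tell them apart. The index can, because it gives partial credit to School A's students at the basic level.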
On this point, we earned a partial victory[vii]. Specifically, the department’s final regulations allow performance indices, but not average scale scores, to serve as the primary status metric for accountability[viii]. The only caveat appears to be that these performance indices are allowed:
“so long as (1) a school receives less credit for the performance of a student that is not yet proficient than for the performance of a student at or above the proficient level; and (2) the credit a school receives for the performance of a more advanced student does not fully compensate for the performance of a student who is not yet proficient.”
This decision is not perfect, but it is considerably better than requiring proficiency rates alone. All states should take advantage of this flexibility, because it substantially reduces the design flaws associated with proficiency rates. Furthermore, as I read this requirement, states can, in fact, get their performance indices very close to average scale scores if they simply create many score categories. For instance, rather than just putting students in four categories, why not put them in 10, 20, or 100? As long as the extra credit for above-proficient students does not fully offset the lower credit for below-proficient students, I see no reason why the Department wouldn’t approve it.
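The claim that many categories bring an index close to average scale scores can be illustrated with a rough Python sketch. Everything here is an assumption made for illustration: the 200 to 500 score scale, the ten student scores, the equal-width categories, and the rule that each category earns credit equal to its midpoint rescaled to the 0 to 1 range. The sketch does not check the two regulatory conditions quoted above, which a real index would still need to satisfy.

```python
from statistics import mean


def binned_index(scores, n_bins, lo, hi):
    """Average credit when each score earns the rescaled midpoint of its category.

    Categories are n_bins equal-width intervals covering [lo, hi].
    """
    width = (hi - lo) / n_bins
    credits = []
    for s in scores:
        # Category number, with the maximum score placed in the top category.
        b = min(int((s - lo) / width), n_bins - 1)
        credits.append((b + 0.5) / n_bins)
    return mean(credits)


def rescaled_mean(scores, lo, hi):
    """Average scale score, rescaled to the 0-1 range for comparison."""
    return mean([(s - lo) / (hi - lo) for s in scores])


# Hypothetical scale scores for ten students on an assumed 200-500 scale.
scores = [212, 248, 275, 301, 333, 356, 390, 417, 442, 468]
LO, HI = 200, 500

for k in (4, 10, 100):
    gap = abs(binned_index(scores, k, LO, HI) - rescaled_mean(scores, LO, HI))
    print(f"{k} categories: gap from average scale score = {gap:.4f}")
```

Under these assumptions, each student's credit differs from his or her rescaled score by at most half a category width, so the gap between the index and the average scale score can be no larger than 1/(2k) for k categories.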
Beyond the issue of measuring performance levels, the law and regulations offer states considerable discretion as to how much they weight growth versus status in their systems. I want to exhort states to put as much weight on growth as possible, because only that can come close to measuring the true contributions of schools to student success. Growth-based measures will better target the schools that are helping students to learn and those that need support or intervention.
Together, these two changes—measuring performance levels using a performance index with as many categories as possible and weighting growth more heavily than status—will go a considerable way toward improving the design of accountability policy and reducing unintended consequences.
[i] See for instance Linn, R. L., Baker, E. L., & Betebenner, D. W. (2002). Accountability systems: Implications of requirements of the No Child Left Behind Act of 2001. Educational Researcher, 31, 3–16.
[ii] Ho, A. (2008). The problem with ‘proficiency’: Limitations of statistics and policy under No Child Left Behind. Educational Researcher, 37, 351–360.
Linn, R. (2003). Accountability: Responsibility and reasonable expectations. Educational Researcher, 32(7), 3–13.
[iii] Neal, D., & Schanzenbach, D. W. (2010). Left behind by design: Proficiency counts and test-based accountability. Review of Economics and Statistics, 92, 263–283.
[iv] Bandeira de Mello, V., Blankenship, C., & McLaughlin, D.H. (2009). Mapping state proficiency standards onto NAEP scales: 2005-2007 (NCES 2010-456). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Washington, DC.
[v] Booher-Jennings, J. (2005). Below the bubble: “Educational triage” and the Texas accountability system. American Educational Research Journal, 42(1), 231–268.
[vi] https://morganpolikoff.com/2016/07/12/a-letter-to-the-u-s-department-of-...
[vii] While I would, of course, hope that my letter actually mattered, I have no way of knowing whether these regulations were actually responding to what we wrote.
[viii] http://www.ed.gov/news/press-releases/education-department-releases-fina...