Testing: How much is too much? How young is too young?

Andy Porter
Friday, May 13, 2016


Student achievement testing, especially as done by states, is once again coming under scrutiny along with the Common Core State Standards. A common criticism is that too much time is spent testing K-12 students. In response, several policy initiatives are being considered by one or more states. Michigan decided to limit state testing time to no more than 2% of the school year; for a typical 180-day school year that equals 3.6 days. Some states allow students to opt out of taking the tests, presumably, again, to reduce time spent taking tests. New York is moving toward putting time limits on its tests so that students are not allowed to stay at the test for unusually long periods of time. Some states are giving districts the freedom to select what tests they will use rather than requiring all districts in the state to use the same test. This would not necessarily cut down on testing time, but it would give the appearance of greater flexibility.

Why is student achievement testing such a hot button item? Probably because testing is relatively easy to manipulate and inexpensive. In comparison to the school budget and the length of the school year, testing represents only a trivially small fraction of money and time. Some believe that what is tested is what is taught and learned. There is evidence that this is partly true, but only partly. Would that reforming instruction and student learning were that easy!


I believe that quality testing is important and useful, but I strongly support keeping the time spent testing to a reasonable amount. How does one determine what that amount is? Limiting state testing to 3.6 days per school year per child seems reasonable, both in terms of burden and because assessments of excellent technical quality can be built within that much time. The New York move toward time limits on students for completing a test is also reasonable and would not compromise the quality of results; after all, the tests are written to be completed in a reasonable amount of time.

There are two common ways that test results are used and both involve making comparisons. One comparison is norm-referenced: Is one student achieving better than another, one class better than another, one school, one district, one state, one country? The other comparison is called criterion-referenced and asks whether a student is performing at an acceptable level. Typically, “proficient” is just one of several performance standards: advanced, proficient, basic, and below basic.

But the comparisons must be valid and fair, which requires that each student be tested under standardized conditions. It also requires that all students participate to ensure that normative comparisons, such as school-to-school, are fair.

Most uses of test results are descriptive, answering the question “how well are students achieving?” But test results can also be used for accountability, as was the case under No Child Left Behind for states, schools, and districts, and under Race to the Top for professional educators. Descriptive uses of test results have a long history of support in our country from parents, schools, and the public. Accountability uses have a more checkered past and present: the higher the stakes attached to test results, the greater the challenge of keeping the comparisons fair and valid. The possibility of cheating comes to mind. When test results are used for school and educator accountability, a common response has been to spend large amounts of time on so-called “test prep.” Most research shows that student and teacher time would be better invested in providing good instruction on the material than in teaching students test-taking skills, and limiting testing time does not limit the amount of time that schools waste on test prep. I am strongly in favor of using test results for accountability of students, educators, and schools. At the same time, I recognize that anything worth doing can be done poorly: when results are to be used for accountability, extra measures must be taken to protect the integrity of the resulting comparisons.

Some argue in favor of interim assessments, given three or four times across the school year to see how students are progressing. But they take more time, and to be useful they require that teachers and students know what to do with the results. Most of the evidence suggests that principals and teachers need considerable training to use interim assessment results productively; simply providing the data has not proven sufficient. In practice, interim assessments have usually been shown to be a waste of time and money.


The recent initiatives to allow students to opt out of testing and to allow districts to select which tests they will give are not good ideas. Both compromise the utility of the results by compromising the ability to make comparisons. In addition, it’s hard to see what the benefits are for either of these two options. How does a student who opts out of the test spend the time “saved”? Almost certainly not by receiving high-quality instruction; the teachers are serving as proctors for the test. And what is the benefit to a district of deciding to use a test that not all other districts are using? If the selected test has good reliability and validity, which is virtually always the case for state tests, then the district will have good comparisons, but only within the district, not with other districts.

There is one good idea in testing that is increasingly coming within reach as we move rapidly toward computer-administered testing: adaptive testing. An adaptive test begins with a small set of locator items and, based on the student’s performance on those items, increasingly assesses the student on content at his or her level of achievement. The technology for this is well developed and affordable. The results can be reported in both norm-referenced and criterion-referenced ways. The time needed to assess the student is reduced dramatically because little time is spent on items that are clearly too hard or too easy for that student. As an aside, another benefit of adaptive testing is that because each student takes a test tailored to his or her achievement level, not only is time reduced, accuracy improved, and student frustration diminished, but the test measures all students accurately over a broad range of achievement. There are no floor or ceiling effects. Floor effects occur when a test is too difficult; students get all or most of the answers wrong, and so the test is unable to measure their level of achievement. Ceiling effects occur when a test is too easy; students get most or all of the answers right, limiting the test’s ability to measure how much a student knows.
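To make the idea concrete, here is a toy sketch of how an adaptive test homes in on a student’s level. This is an illustrative simplification, not any state’s actual algorithm; operational adaptive tests use item response theory to select items and estimate achievement. In the sketch, each correct answer raises the difficulty of the next item and each incorrect answer lowers it, so the test converges on the student’s level much like a binary search.

```python
def run_adaptive_test(respond, n_items=10, lo=0.0, hi=100.0):
    """Toy sketch of a computer-adaptive test (hypothetical, for illustration).

    `respond(difficulty)` stands in for the student: it returns True if the
    student answers an item of that difficulty correctly. Difficulty is on
    an arbitrary 0-100 scale bracketed by [lo, hi].
    """
    for _ in range(n_items):
        difficulty = (lo + hi) / 2   # present an item midway in the bracket
        if respond(difficulty):
            lo = difficulty          # item was within reach: try harder items
        else:
            hi = difficulty          # item was too hard: try easier items
    return (lo + hi) / 2             # final estimate of achievement level


# Simulate a student whose true level is 70: items at or below 70 are answered
# correctly. Ten items pin the estimate down to within about a tenth of a point.
estimate = run_adaptive_test(lambda d: d <= 70)
```

Because each item halves the range of plausible achievement levels, ten items locate the student far more precisely than ten items of fixed difficulty would, which is the source of the time savings and the absence of floor and ceiling effects described above.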

Let’s keep testing time under control, allotting only as much time as is required to support the uses made of the results to improve education. In so doing, let’s protect the integrity of the comparisons that can be made to support those uses. This means avoiding reforms such as letting students opt out and giving districts choice in which test to use, both of which compromise the integrity of the comparisons. Computer-adaptive testing holds great promise for addressing all of these issues.