Addressing Two Commonly Unrecognized Sources of Score Instability in Annual State Assessments
The work reported in this paper reflects a collaborative effort of many individuals representing multiple organizations. It began during a session at the October 2008 meeting of TILSA, when a representative of a member state asked the group whether any of their programs had experienced unexpected fluctuations in annual state assessment scores and, if so, whether they had a reasonable explanation for such instability. Gary Phillips, representing AIR, responded that he had been investigating what he called "score drift," had a hypothesis about the phenomenon, and briefly described what he characterized as two unrecognized sources of error underlying such fluctuations. Dr. Phillips was invited to make a much longer presentation on his hypothesis and findings at the June 2009 meeting of TILSA. In April 2010 he presented a full-day pre-session at the annual meeting of the National Council on Measurement in Education (NCME) on (1) the errors associated with the failure to use scientific sampling methodology in combination with cluster sampling designs and (2) errors associated with equating test forms. Following this session, because of the potential significance of the argument being made, the TILSA program advisors approached Dr. Phillips and asked whether he would be willing to have his hypothesis, data, and recommendations reviewed by a select group of senior measurement experts. He agreed to, and welcomed, the opportunity to have a group of his peers critique his work, and a subsequent full-day session was held at the CCSSO offices on September 30, 2010.