Framework for English Language Proficiency Development Standards Corresponding to the Common Core State Standards and the Next Generation Science Standards

The English Language Proficiency Development (ELPD) Framework outlines the English language practices that underlie the CCSS and the NGSS. It communicates to stakeholders who serve English language learners (ELLs) the language that all ELLs must acquire in order to engage successfully with the CCSS and NGSS, and it specifies a procedure for evaluating the degree of alignment between the Framework and the ELP standards that states have adopted or are considering.


Growth Model Comparison Study: A Summary of Results

This paper is a summary of the Goldschmidt multi-state study (2012) and is intended to provide timely and comprehensible information for practitioners and other stakeholders who may not have a technical background in using assessment data for educational accountability.


Growth Model Comparison Study: Practical Implications of Alternative Models for Evaluating School Performance

This paper reports on an empirical study that compared the results of nine different analytic growth models applied to a large multi-state longitudinal data set (four consecutive years, divided into two elementary and two middle school three-year cohorts) representing more than 2,500 buildings and 220,000 students. As an applied study, it was well suited to testing empirically whether certain models are more likely than others to produce accurate, fair, unbiased, precise, and consistent results. The study is important because it is the first designed to show whether results differ or agree when (a) the same growth models are estimated in different states and (b) different models are estimated in the same state.


A Guide to Computer Adaptive Testing (CAT) Systems

This guide gives those responsible for procuring CAT systems questions they can use to tease out differences among competing CAT delivery systems and so make better-informed decisions. Its scope is largely limited to differences across systems that can affect the quality, comparability, and usefulness of the test scores a system produces. The substance of what is measured (the format and nature of the test questions) and test delivery issues (testing sites and test presentation software and hardware) are considered only to the extent that they directly affect score properties. Davey also led staff development sessions for TILSA on computer adaptive testing.


Commonly Unrecognized Error Variance in Statewide Assessment Programs: Sources of Error Variance and What Can Be Done to Reduce Them

This report summarizes Phillips’s work on commonly unrecognized sources of error variance (random variation in assessment results) and describes actions states can take to identify and reduce this error in existing and future assessment systems. These “errors” are not mistakes in the ordinary sense of the word; they reflect random variation in student assessment results. Unlike the technical report by Phillips on which it is based, this report is written for policymakers and educators who are not assessment experts. It explains the sources of error variance, how they limit the ability to make data-based decisions with confidence, and the actions states and their contractors should take as soon as feasible to improve the accuracy and trustworthiness of their state testing program results.


Addressing Two Commonly Unrecognized Sources of Score Instability in Annual State Assessments

Phillips contends that two commonly unrecognized sources of error variance in statistics based on state testing data contribute to score instability: (a) increased error due to the sampling of students (design effects) and (b) error due to equating. Changing psychometric practice to manage these sources of error variance can increase the ability to detect real change and to draw meaningful inferences. The paper describes these issues, and its recommendations for increasing score stability, in detail. Phillips has presented several staff development sessions for TILSA members on these sources of instability, provided training on the statistical software needed to identify and control for them, and conducted an NCME pre-session (2011) on this topic.
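To make the two error sources concrete, the sketch below uses the textbook Kish approximation for a design effect under cluster sampling, deff = 1 + (m - 1) * rho, where m is the average number of students per school and rho is the intraclass correlation. It then adds equating error to sampling error in quadrature, assuming the two are independent. This is an illustration under those assumptions, not Phillips’s own formulas; the function names and all numbers are hypothetical.

```python
import math


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish approximation for roughly equal-size clusters: deff = 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def clustered_standard_error(sd: float, n: int, avg_cluster_size: float, icc: float) -> float:
    """Standard error of a mean score when students come in intact schools (clusters)."""
    srs_se = sd / math.sqrt(n)  # what a simple-random-sample formula would report
    return srs_se * math.sqrt(design_effect(avg_cluster_size, icc))


def total_standard_error(sampling_se: float, equating_se: float) -> float:
    """Add equating error to sampling error, assuming the two are independent."""
    return math.sqrt(sampling_se ** 2 + equating_se ** 2)


if __name__ == "__main__":
    # Hypothetical values: 10,000 students in schools of about 101, ICC = 0.2,
    # score standard deviation 50, equating standard error 1.0 scale points.
    deff = design_effect(101, 0.2)
    se = clustered_standard_error(sd=50.0, n=10_000, avg_cluster_size=101, icc=0.2)
    print(deff, round(se, 2), round(total_standard_error(se, equating_se=1.0), 2))
```

With these hypothetical numbers the script prints `21.0 2.29 2.5`. Clustering inflates the standard error of the mean by a factor of about 4.6 (the square root of 21) relative to the simple-random-sample value of 0.5, and the equating error raises it further; ignoring either source would make year-to-year changes look more trustworthy than they are.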


State End-of-Course Testing Programs: A Policy Brief

State policymakers increasingly rely on end-of-course (EOC) tests to support a variety of purposes and uses. Prominent among these are accountability initiatives at the student, teacher, and/or school level. Each use raises critical design and implementation issues, and those issues serve as the organizational framework for this brief.


State Assessment Program Item Banks: Model Language for Requests for Proposals (RFP) and Contracts

This document provides recommendations for request for proposal (RFP) and contract language that state education agencies can use to specify their requirements for access to test item banks. An item bank is a repository for test items and data about those items. Item banks are used by state agency staff to view items and associated data; to evaluate items; to search the database of items; to inventory the item bank, evaluate pool sufficiency, and project future item development needs; to view statistical results; to help manage the flow of test development work; and to collaborate with the test development contractor on forms construction.
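As an illustration of what “a repository for test items and data about those items” can support, the sketch below models item records and two of the uses listed above: searching the bank and evaluating pool sufficiency against a test blueprint. The record fields (`standard`, `status`, `p_value`), the status labels, and the no-reuse rule for forms are all hypothetical assumptions chosen for the example; real item banks and RFP requirements will define their own fields and rules.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    item_id: str
    standard: str   # hypothetical: content standard the item is aligned to
    status: str     # hypothetical labels, e.g. "field-tested", "operational", "retired"
    p_value: float  # classical difficulty: proportion of students answering correctly


def search(bank: List[Item], standard: Optional[str] = None,
           status: Optional[str] = None) -> List[Item]:
    """Return items matching the optional standard and status filters."""
    return [
        item for item in bank
        if (standard is None or item.standard == standard)
        and (status is None or item.status == status)
    ]


def pool_shortfall(bank: List[Item], blueprint: Dict[str, int],
                   forms_needed: int) -> Dict[str, int]:
    """Items still to be developed per standard to build forms_needed forms.

    blueprint maps each standard to the number of items it needs on one form.
    Assumes (hypothetically) that operational items are not reused across forms.
    """
    available = Counter(item.standard for item in bank if item.status == "operational")
    return {
        standard: max(0, per_form * forms_needed - available[standard])
        for standard, per_form in blueprint.items()
    }
```

Keeping the shortfall calculation separate from search reflects the distinction in the text between viewing or searching items and projecting future item development needs; either function could be extended with statistical filters such as a difficulty range.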


Building an Interim Assessment System: A Workbook for School

The 2008 publication and its companion workbook (2010) provide a definition and a corresponding framework for a thorough district-level self-examination of readiness to develop or acquire an interim assessment system. State department of education staff can also use the workbook when district-level educators contact the state agency for guidance in this area. The workbook is built around a series of questions intended to guide a structured consideration of building or revisiting a district’s interim assessment system. It also defines interim assessment and offers a thoughtful comparison with formative assessment.


A Review and Empirical Comparison of Approaches for Improving the Reliability of Objective Level Scores

This paper addresses the psychometric issues associated with estimating objective level scores, often called subscores. It introduces the concepts of reliability and validity for subscores in the context of statewide achievement tests, reviews methods suggested in the literature for increasing the reliability of subscore estimates, and describes an empirical study comparing some of the more promising procedures. The study used test score data from four statewide testing programs, representing three subject areas and various testing conditions. In general, all of the subscore augmentation approaches compared dramatically increased the reliability of subscore estimates. However, this increase was accompanied by near-perfect correlations among the subscore estimates, a finding that calls into question the validity of the resulting subscores and therefore the usefulness of subscore augmentation. Implications for practice are discussed.
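A short sketch may help show why subscores need help and what “augmentation” does. The Spearman-Brown formula projects how reliability drops when a test is shortened to a handful of items, and Kelley’s regressed true-score estimate, one of the simplest shrinkage approaches in this family, pulls an unreliable observed score toward the group mean. The paper’s own procedures may differ; the test length, reliabilities, and scores below are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability of a test whose length is multiplied by length_factor."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


def kelley_estimate(observed: float, group_mean: float, reliability: float) -> float:
    """Kelley's regressed true-score estimate: shrink the observed score toward the group mean."""
    return reliability * observed + (1.0 - reliability) * group_mean


if __name__ == "__main__":
    # Hypothetical test: 50 items with reliability 0.90; one objective has 10 items.
    sub_rel = spearman_brown(0.90, 10 / 50)
    print(round(sub_rel, 2))
    # A student scores 8 on that objective where the group mean is 6.
    print(round(kelley_estimate(8.0, 6.0, sub_rel), 2))
```

Here the 10-item subscore’s projected reliability falls to about 0.64, and the student’s estimate is pulled from 8 toward 6. Multivariate augmentation methods generalize this idea by also borrowing information from the other subscores; leaning heavily on shared information is one plausible route to the near-perfect intercorrelations the study reports.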