WORKSHOP TITLE: Put your test to the test



Bas Hemker and Cor Sluijter


Presenters’ Bios:

Bas Hemker is a senior research scientist at Cito’s department of Psychometrics and Research and the Team Lead for International Research and Consultancy. He has been a member of COTAN, the Dutch Committee on Tests and Testing, since 2007 and is an official assessor for the review system for the quality of tests and exams of the Dutch Research Centre for Examinations and Certification (RCEC). Bas has been a Fellow of AEA Europe since 2015 and is a member of its Professional Development Committee.


Cor Sluijter is a part-time senior consultant at Cito’s department for Training and Consultancy. He is the former director of Cito’s department of Psychometrics and Research (2014-2020) and a lecturer on educational measurement at the teachers college of Fontys University of Applied Sciences. He is also an official assessor for the review system for the quality of tests and exams of the Dutch Research Centre for Examinations and Certification (RCEC). Cor has been a Fellow of AEA Europe since 2014 and is its current treasurer.


Why AEA members should attend this workshop:

This workshop, an in-person rendering of a pre-conference workshop for the 2021 online conference, provides participants with all the tools necessary to formally assess an educational test (computer-based or paper-based) or testing system of their own choice. Participants actually evaluate their chosen instrument using a validated reviewing system. This will provide them with valuable information on the quality of the instrument and can help them to improve that quality. Moreover, the workshop will provide guidelines on how to proceed with the development of new instruments, improving participants' chances of efficiently producing high-quality instruments in the future.


Who this Workshop is for:

The target audience consists of people involved in test development and/or test improvement and evaluation. Participants should have experience with at least some of the elements of test production. They should also understand the basic psychometric principles of testing and test development and be familiar with basic concepts like reliability and validity.



Educational tests serve a specific goal, such as evaluation, monitoring, diagnostics, selection or guidance. Such a goal is only met if the test is of sufficient quality. This workshop aims to provide participants with practical tools to evaluate the quality of a test.

In the theoretical part of the workshop we give an overview of evaluation systems, such as the Standards for Educational and Psychological Testing, the EFPA review model and the ETS Standards for Quality and Fairness, and show their similarities and differences.

In the applied part of the workshop we put the theory into practice by having participants actually evaluate the quality of a test of their own choice, based on relevant information pertaining to the test. This will be done by applying the COTAN (Dutch Committee on Tests and Testing) review system for evaluating test quality to assess the information provided on the test. Relevant material includes the test manual and research reports on how the test's norms were determined and on its reliability and validity. The workshop leaders assist participants in applying seven different evaluation criteria to their own test:

  • Theoretical basis of the test construction – This criterion is rated by determining to what extent the content of the test reflects its intended purpose, its theoretical background and its operationalization.
  • Quality of the test materials – This criterion pertains to the level of standardization of test items, scoring and instructions, and whether sufficient directions are provided on how to take the test.
  • Quality of the test manual – This criterion focusses on the information supplied to support test users in the administration and interpretation of the test.
  • Norms – This criterion is rated differently for norm-referenced interpretation than for content-referenced or criterion-referenced interpretation.
  • Reliability – The size of the reported reliability coefficients is evaluated first, followed by the quality of the research carried out to collect information on the reliability of the test scores.
  • Construct validity – The outcomes are evaluated first, followed by the quality of the research carried out on the construct validity. There are explicit statements on what kind of research data serve to support construct validity.
  • Criterion validity – This evaluation criterion is based on the relation between the test outcome and a relevant external measure. When relevant, this criterion reflects the strength of this relation and the quality of the research carried out.

In the final discussion, each participant's findings are discussed and we round off with a list of practical lessons learned.


Preparation for the workshop:

The reviewing system has seven different criteria to evaluate the quality of a test. Attendees should bring all relevant information pertaining to the test with them. This includes the test manual, all research reports deemed relevant and the test itself if possible.


Tentative Schedule





Coffee and registration



Welcome & introductions

Outline of the Workshop

Bas Hemker/Cor Sluijter


Theoretical introduction to evaluating test quality/overview of reviewing systems

Bas Hemker/Cor Sluijter





Evaluating test quality part 1: test manual, norms and theoretical basis of the test

Cor Sluijter





Evaluating test quality part 2: quality of test materials and reliability

Bas Hemker





Evaluating test quality part 3: validity/Lessons learned

Cor Sluijter


Workshop close