Newsletter – Spring 2018

Editor’s Note

Dear Readers,

Greetings to you all! 2017 has just slipped by and it is once more my pleasure to share AEA-Europe’s news in the Spring 2018 edition of our newsletter.

In this issue, our AEA-Europe President, Dr. Thierry Rocher, announces the theme and details of the association's next annual conference in Arnhem in 2018. Read on to discover the content of our first e-assessment webinar, organised last week by the AEA-Europe Special Interest Group. If you did not manage to join us then, a link to the recording of the webinar will be posted soon on our Facebook page and on our website.

After the AEA-Europe conference in Prague in November, we encouraged participants to send in a few lines about their presentations for publication here, thereby sharing their work with a wider audience. In this edition, I have included four conference contributions offering various perspectives on the creation of examinations and the methodological aspects of national assessments.

The first, from Cambridge Assessment, UK, explores the stages involved when writers compose questions for secondary-school-level examination papers. The second, from the National Institute for Testing & Evaluation in Israel, addresses the usefulness of perception-based evidence within the context of the validity of large-scale tests. The third contribution, from WJEC, one of the UK's main examination boards, describes a research study on designing alternative, more reliable methods of capturing expert judgement and on clarifying the role that such evidence should play in maintaining educational standards in the UK's high-stakes examinations. The fourth is a joint contribution from CITO, Netherlands, and the Training Center National Bank, Kazakhstan, describing the status of high-stakes national assessments and how they are valued in different social and cultural contexts.

Several announcements follow. With the upcoming Kathleen Tattersall New Assessment Researcher Award for 2018, readers are encouraged to identify potential applicants who could be nominated for the award. If you are considering doing a PhD, be sure to check out the details of the 15 research openings offered by the European Training Network (ETN) Outcomes and Causal Inference in International Comparative Assessments. Opportunities also exist for Masters students to learn how to develop and improve educational assessments in their own settings through the MSc programme in educational assessment run by Oxford University, UK. The Educational Research Centre (ERC) in Ireland is also advertising a senior position requiring leadership in education research, quality assurance and management. For news on assessment research within the wider assessment community, access articles in the Research Matters journal from the Research Division of Cambridge Assessment, UK.

Once again, I wish to thank our contributing authors for using this space to share their latest news on educational assessment. As I remind readers each time, this is a great space to share news about your research areas in educational assessment, policy programs being piloted or implemented to improve system and/or classroom assessment, education technology tools that you are using to add a new dimension to student assessment, or collaborative initiatives between researchers and practitioners to impact teaching and learning.

Should you wish to share your story in the next issue of our newsletter in May, please feel free to reach out to me at

Amina Afif
AEA-Europe Newsletter Editor
Publications Committee

A word from the President

Once again I have the privilege to share a few thoughts in this space about the news of the association. After the successful organisation of our conference last November in Prague, I am delighted to announce the details of the 19th AEA-Europe Annual Conference, to be held in Arnhem/Nijmegen, the Netherlands, on 8–10 November 2018. The theme of the conference is “Building bridges to future educational assessment”, under which participants will be invited to reflect on how professionals are adapting their assessment instruments in response to the changing demands of education and the way we measure it. As researchers, practitioners, specialists and policy-makers, our attention is continuously directed towards anticipating the future assessment of educational skills. This conference provides the opportunity to consider how the role of assessment is evolving in terms of its content, the growing transition from paper to technology, and psychometric validity when measuring 21st-century skills. Please stay tuned to our website for more details on the conference: the keynote presentations, paper and poster sessions, symposia and discussion groups, as well as the respective submission deadlines.

I am also happy to note that, following the launch of the AEA-Europe E-Assessment Special Interest Group (SIG), we successfully organised the first live webinar last week, attended by close to 30 participants from across Europe. We hope to follow up with further webinars to share the e-assessment experiences of SIG participants.

As 2018 rolls on, I would like to warmly thank each and every one of you for your confidence and your commitment to AEA-Europe, which helps it to fulfil its mission over the years. I look forward to your continued support and may we continue sharing our rich experiences in the field of educational assessment.

Happy reading!

AEA-Europe SIG e-assessment – First webinar, 31st January 2018

In Prague, AEA-Europe launched its first Special Interest Group (SIG), focusing on e-assessment in all its applications and across all phases of education, including vocational education. The e-assessment SIG aims to provide a dedicated forum for the exchange of ideas, research and experiences regarding “using digital approaches within the educational assessment”. To get the exchange off to a good start, the SIG conducted its first international webinar on 31 January 2018, chaired by AEA-Europe’s President, Thierry Rocher. He introduced the FLIP initiative, in which four countries (France, Luxembourg, Italy and Portugal) have jointly agreed to set up a community in order to share knowledge and experiences, as well as costs and content, within the context of e-assessment. During the webinar, the FLIP framework was presented, including its underlying principles and the digital tools planned for development in the short term. The long-term goal is to collaborate with all entities whose goal is to build assessment solutions for non-profit organisations and for the development of education worldwide. Several other countries have already shown interest in joining FLIP. A link to the recording of the webinar will be posted soon in our Facebook group.

The webinar will be one of a series offered in the run-up to the AEA-Europe 2018 conference in Arnhem. In addition, the SIG board aims to organise specific SIG-related activities at the 2018 conference, which you can join or contribute to. To stay informed and receive invitations, you can visit our Facebook group, AEA Europe E-assessment SIG, or register for the email list by contacting Mary Richardson. We hope many of our SIG members will start using the Facebook group to share e-assessment-related information and experiences, so it can play a role in building knowledge and networks.

General themes of the e-assessment SIG mentioned at the launch panel discussion in Prague were the summative as well as formative use of e-assessment across all formal education, including vocational education; the validity and reliability of technology-enhanced assessment; and defining and conceptualising e-assessment. For now, the discussion participants expressed a preference to keep the scope of the SIG and its definition of e-assessment broad, and to include the end-to-end cycle of e-assessment rather than narrowing the focus to specific aspects, which could limit the appeal and value of the SIG. However, covering a broad range of interpretations and areas of interest from the start can be difficult for a fledgling SIG. It would be helpful to get a clearer picture of the range of interests of our current group so we can better plan activities and encourage exchanges that fit your interests. For this, we request that you please fill in our five-question online survey, which you can access here.

Currently the e-assessment SIG is led by Martyn Ware, acting Chair, together with Naomi Gafni, Rebecca Hamer, Jenifer Moody, Mary Richardson, Jurik Stiller and Lesley Wiseman.

How do question writers compose examination questions?

Over the past two years we have been exploring how the writers of examination questions go about this task. Our ambition in this research is to better understand some of the complexities of this form of professional work and to pass on this expertise through training provision for new question writers. It is perhaps surprising that, although examination question writing is a ubiquitous practice, only a limited amount of research has looked at the process of question writing.

At the 2016 AEA-Europe Conference in Cyprus we reported on the first phase of this project, which studied how seven question writers from four subject areas composed single questions for secondary-school-level examination papers. A major outcome of the project was the discovery that writers moved through a three-stage process based on a “plan-write-review” model of individual question writing, subsequently published in Johnson, Constantinou and Crisp (2017).

In the second phase of the project we extended the question writing model by considering whether it also described how question writers developed questions when writing a whole examination paper. To explore this issue we observed how six question writers from three different subjects wrote full secondary-school-level examination papers.

To capture evidence of question writing, we asked each participant to write an examination paper whilst thinking aloud. During this task a researcher sat alongside the writer using a structured observation schedule. The purpose of the schedule was to capture the nature and sequence of elements of the writing activity, and to consider whether the writers were accommodating previous or nascent questions in their concurrent question drafting.

For our data analysis we adopted a sociocultural approach that encouraged us to take into account both the cognitive and the social dimensions of professional question writing practice. A specific area of interest was how social perspectives may be evident within the writing model, and how these may play an important role in quality assurance as the writer seeks to impose their authority on the intended reader.

Our project findings extended the original question writing model from Phase 1 into a four-stage model of question paper writing. We found that, within the context of a continual phase of preparation that gives some insight into the lived experience of question writing, the original plan-write-review model was nested within a broader model. This extended model also included the requirement for writers to develop a planning framework for the examination paper and to consider the formal quality assurance processes that could refine it.

For more information about our project and its references, please contact Martin Johnson and Nicky Rushton at Cambridge Assessment, UK.

Perception-Based Evidence of Test Validity

In recent years, negative attitudes toward large-scale testing have increased, fueled by the media and web-based social platforms. This is an example of face invalidity: a situation in which stakeholders perceive a test to be inappropriate for its intended use. Face invalidity can have a negative impact on examinees' motivation to prepare for and perform well on a test, and can even reduce their willingness to take it (Nevo, 1985). Negative public opinion can put pressure on policymakers, who may decide that the test must be modified or eliminated altogether. If face invalidity can have such a critical effect on a test, why do we generally dismiss the notion of face validity?

Face validity is whether, on the face of it, a test appears to measure what it purports to measure. The consensus among measurement experts is that asking laypeople whether a test seems valid is not sufficient evidence to support the interpretation and use of test scores (Messick, 1989). Indeed, the term face-valid should not be used to describe a test because it is misleading, flawed or outdated. However, we believe that in warning against face validity, measurement experts have obscured the usefulness of the evidence emerging from stakeholders' perceptions about a test. We refer to this as perception-based evidence (PBE), which is psychometrically different from performance-based evidence (that is, test scores, reaction times, etc.).

Perception is an interpretive process influenced by a variety of factors, such as past experiences, knowledge, beliefs, and attitudes. PBE reflects perceptions about various aspects of the test (for example, its purpose, item quality, content relevance, score usage, etc.), as seen through the eyes of stakeholders other than those of test developers. PBE can support the interpretation of validity evidence by providing insights about why items perform a certain way and may lead to understanding why examinees are confused about their scores. In this way, it can be used to improve the overall quality of a test and minimize its adverse aspects.

PBE is specifically useful within the validity argument framework (Kane, 2013). Perceptions of various stakeholders represent alternative claims that are essential for the evaluation of validity arguments. Using PBE in this way can help support or refute validity claims. Moreover, PBE is especially relevant for evaluating the clarity and plausibility of the interpretive argument. If the validity argument needs to be clear and plausible, then it is important that stakeholders perceive it as such. To be clear, we do not argue that researchers should give preference to the perceptions of laypeople over those of experts. We simply believe that researchers should consider both views when evaluating validity arguments.

At different stages of a test’s life cycle, researchers can analyze PBE for different reasons: (a) to gain insights about items during test development and improvement, possibly referring to the test specifications, (b) to identify validity threats, (c) to gain insights about validity evidence collected from other sources (test content, response process, consequences, etc.), (d) to generate alternative validity claims, and (e) to evaluate the clarity and plausibility of an interpretive argument. In conclusion, ignoring PBE hinders the ability to make compelling and well-founded validity arguments.

For further information, please contact Tzur Karelitz and Charles Secolsky at the National Institute for Testing & Evaluation, Jerusalem, Israel.

Re-designing the role of examiner judgement in maintaining standards for UK general qualification examinations

In order to ensure a fair assessment process and to maintain consistent standards year on year, UK awarding organisations use a combination of senior examiners' expert judgement and statistical analysis to set grade boundaries and to 'award' high-stakes examinations. Influenced in part by changing regulatory requirements, UK awarding bodies have placed increasing emphasis on statistical information when arriving at decisions relating to grade boundaries (Baird and Morrissey, 2005). This has largely been a result of extensive research showing examiner judgement to be unreliable (cf. Baird and Dhillon, 2005). Nevertheless, the involvement of experts in standard setting remains crucial in preserving public trust in the UK examination system (Jones, 2009).

At WJEC, one of the UK's main examination boards, we are undertaking research to design alternative, more reliable methods of capturing expert judgement, and to clarify the role that such evidence should play in maintaining educational standards in the UK's high-stakes examinations. To do this, we draw on psychological perspectives on judgement and decision-making, which point to a wide range of heuristics and biases limiting experts' ability to make valid judgements on standards (cf. Hardman, 2009; Kirkebøen, 2009; Kerr and Tindale, 2004). We are employing a 'design thinking' approach (cf. Brown, 2009), utilising these insights to develop prototype methods for testing and review.

To date, our study has shown that only examiners who have experience of marking a particular paper are able to provide confident and reliable judgements; general subject knowledge might not be sufficient. In addition, examinee-centred methods were shown to be more accessible for experts, as these allowed them to draw on their experience more easily. It was difficult, however, to break experts' habit of applying their expertise in a way that deviated from their daily practice. This underlines the need for high-quality training of examiners in standard-setting methods with which they are not familiar. Furthermore, removing marks from scripts reviewed at award may reduce identified biases but increases the cognitive load associated with the task: in the test phase, experts often fell back on re-marking scripts as a starting point for assessing their worthiness for a given grade. Comparative judgement of scripts may overcome this issue; however, experts also found it difficult to compare scripts holistically when candidates were stronger on different aspects of the assessment.

The next stage of the research aims to draw on these insights and prototype other standard-setting approaches.

For further information, please contact Joanna Maziarz, Alayla Castle-Herbert, Siân Denner, Liz Phillips and Richard Harry.

How high-stakes national assessments are valued in different social and cultural contexts: the case of UNT

End-of-school exams are a widespread method of measuring the knowledge and skills of individual learners at the end of secondary education, but the culture of assessment varies widely. In some countries, such as the Netherlands and the United Kingdom, standardized testing has a long history and is deeply rooted: national exams are generally accepted both as the school-leaving qualification and as the basic university entrance exam. In most countries in the east of Europe, the tradition of centralized final exams is quite young, and 'purpose pluralism' is not always accepted or implemented.

In Kazakhstan, the Unified National Testing (UNT) was implemented thirteen years ago, and until 2017 it combined the final qualification for secondary education with the entrance examination to higher education. Combining two different purposes of testing caused much discussion and a loss of trust in the efficacy and reliability of the UNT. Since 2017, the UNT has been split into a separate school-leaving exam and a university entrance exam.

Based on a review of the literature, this paper, presented at the conference in Prague, analyses how social and cultural contexts can affect the validity of central examination systems, comparing Kazakhstan, Russia, China, the UK and the Netherlands.

For further information on the findings, please contact Nico Dieteren from CITO Netherlands and Dr. Aigul Yessengaliyeva from the Training Center National Bank, Kazakhstan.

Call for nominations for the Kathleen Tattersall New Assessment Researcher Award

The Kathleen Tattersall New Assessment Researcher Award (KTNRA) will be open for nominations in the first half of February 2018. With every new winner, this prize has grown in status as an important recognition of talent in our field! Before announcing the next award, we encourage all of you to think about anyone in your vicinity who may be eligible to be nominated. In particular, we would like to ask those of you who have a senior role in your organization to encourage and assist potential applicants. In general, most of us need a little nudge to take the final step to apply for a recognition like this! For more information, see our webpage, or contact Rolf Vegar Olsen of AEA-Europe's Professional Development Committee, at

15 openings for PhD research in international large-scale assessments

The European Training Network (ETN) Outcomes and Causal Inference in International Comparative Assessments (OCCAM) is looking for 15 Early Stage Researchers (PhD Candidates, 100%) to start in August 2018.

The successful candidates will receive full-time employment contracts (including mobility and family allowances) for a period of 3 years at one of the 12 partner organizations. They will participate in a structured training program including network-wide training sessions, research visits, and a secondment.

Applicants are expected to hold a Master’s degree in economics, education, psychology, sociology or related disciplines. We particularly welcome applications from candidates with training in quantitative methods.

OCCAM is an international, intersectoral, and interdisciplinary graduate school that will investigate international large-scale assessments in relation to the following topics:

  • The Integrity of Educational Outcome Measures
  • Educational Settings and Processes
  • Governance of Human and Financial Resources and Decision Making

OCCAM is a Marie Skłodowska-Curie action and has received funding from the European Union’s Horizon 2020 initiative. It consolidates the expertise from leading universities and research institutes from Australia, Belgium, Cyprus, Germany, the Netherlands, Norway, Sweden, UK, and the US. For further information, please visit or email

MSc programme in educational assessment

This two-year part-time MSc programme has been introduced at a time when high quality educational assessment is recognised as a core element of a strong education system. Its aim is to provide researchers and professionals with the skills to develop and improve educational assessments in their own settings. Students will gain technical and statistical knowledge in assessment and engage with the design and evaluation of educational assessments, as well as graduate with a sound understanding of the field, including high stakes assessment systems.

The course offers a sound understanding of the design of assessment systems, the options available and their implications; the ability to analyse the quality of assessments and to engage with research, policy and practice questions in an informed and critical manner; and skills to improve the quality of educational assessments in a wide range of settings, enhancing assessment skills and increasing opportunities for progression to senior positions in educational assessment organisations both nationally and internationally.

For further information, please contact the course director at  and our course page

Vacancy for a senior position within the Educational Research Centre in Ireland

The Educational Research Centre (ERC) in Ireland is currently advertising a senior position requiring leadership in educational research, quality assurance, management, budgetary control and general administration. It is a five-year contract (renewable once). All applications should be submitted by 5pm on 9th February 2018.

Since its establishment in 1966, the ERC has been an internationally recognised centre of excellence in research, assessment and evaluation in education. In September 2015, it was designated as a Statutory Body in accordance with the Education Act (1998). The Centre carries out research at all levels of the education system, from preschool to adult. Research is undertaken at the request of the Department of Education and Skills, at the request of other agencies and on the initiative of the ERC itself and its staff. The Centre is located on the DCU St Patrick’s Campus in Drumcondra.

For more information please visit  or contact Vicki Lavin on (01) 4744653 or email

Latest issue of the Research Matters journal

Research Matters is our free journal, which allows us to share our assessment research with the wider assessment community. It is produced twice a year by the Research Division of Cambridge Assessment and features articles, short summaries and comment on prominent research. Readers of the AEA-Europe Newsletter can also explore full details of the contents, articles and features of all previous issues of Research Matters via our Group's website.

If any AEA-Europe members are not already on our mailing list and would like to receive a regular, printed copy of the journal, they are very welcome to contact Karen Barden: