Low-Stakes Testing

By: Stephen Ryan

“OK. Good morning. Put away your books and papers, please. We’re going to start, as we always do, with the Word Quiz. As usual, you’ll find questions about all the new words we had in the last lesson plus a selection of words from previous lessons, going back to the start of the school year. Here we go.”

There’s a lot published about high-stakes testing—tests where, in one way or another, test-takers’ futures are in the balance. Lots about ensuring fairness, validity, reliability, and so on. Even more about test anxiety, dealing with it, overcoming it, minimising its consequences. But, when I started to look, I found there wasn’t much out there on low-stakes testing, something I realised I’d been doing throughout my teaching career.

Admittedly, low-stakes tests are less exciting than the high-stakes variety, because, well, there’s less at stake. Less to be anxious about. Less for the test-maker to worry about getting just right. In fact, they can be so low key that I could make a case that most practice activities in the language-learning classroom could be called “low-stakes tests” and most teachers and students barely notice that they are tests at all.

Unexciting or not, low-stakes tests and the principles on which they are based are essential elements in language learning, as I hope to show by telling you about my regular vocabulary quizzes.

So, what makes them low-stakes?

For a start, there are so many of them that nobody could think their future (or future grade) depended on any one of them. Every language lesson I do starts with a vocabulary quiz. On average, each quiz has around 30 items: items from the previous lesson plus “review” items from previous quizzes. That’s a lot of items per semester. Getting any one of them wrong is not going to have serious consequences for anybody.

For those who care, there is a sentence in my syllabus (which, traditionally, in my teaching context, nobody ever reads) that says that, cumulatively, the vocabulary quizzes add up to 10% of the semester grade. Just enough, I hope, to make them worth preparing for but not enough to stress anybody out.

Then, I try to make the atmosphere as un-testlike as possible. True, students can’t consult reference materials or their friends, but there is no time limit set, no particular measures taken to prevent the occasional stray glance at another student’s answers. Relax, and see what you can do.

Before the quiz starts, I’m happy to answer any questions and explain anything that may be unclear, being well aware that any word students ask about is likely to remain in their short-term memory when they actually start to answer test items. Afterwards, we do an immediate review of the answers, during which access to resources (friends, notes, devices) is once again allowed. When I return quiz papers in the next lesson, the focus is on answers rather than scores.

If the stakes are so low, why test?

Because, as an avid reader of MindBrain Ed Think Tank articles, I know that what these tests are doing is actually encouraging and reinforcing learning, rather than attempting to measure it. Teachers who read these pages as avidly as I do will know what I mean. Low-stakes tests facilitate these principles of learning:

Spaced Repetition

The more times we encounter an item, the more likely we are to remember it when we need it. The idea of the “spacing” is that, at first, the encounters need to be fairly close together (thank you Ebbinghaus for your Forgetting Curve, reminding us how quickly new information disappears from working memory after a first encounter) but then can be spaced further and further apart. In brain-science terms, the repetition increases the salience of the new knowledge, easing its transition from working memory to long-term memory.

So, let’s count the repetitions. A word comes up in class—usually because a student needs to say something and doesn’t have the words, or because they need it to understand a text. I say the word. I write it on the right side of the blackboard. Students know that the words on that area of the blackboard will be on the quiz in the next lesson. They copy it onto their vocabulary sheet. I say the word again, pointing out any peculiarities of spelling or pronunciation. At the end of the lesson, I read out all the words on the right of the blackboard, with a reminder of the meaning of each.

The ideal student then leaves the classroom and spends 10 minutes or so each day reviewing the words on their vocabulary sheet, until the day of the next lesson. The less-than-ideal student forgets all about the upcoming quiz, ignores the vocabulary sheet, and then panics in the few minutes before the start of the next lesson and does a very quick review then. Once I get to the classroom, both the ideals and the non-ideals can ask me about any of the words.

On the quiz, they see a written prompt that should elicit the new word. This is not a complete definition, just a picture, a couple of words in English, or even a word or two in their L1. Just enough, I hope, to clarify which word I am trying to elicit. Either they recall it and write the answer, or they don’t. The quiz ends, I collect in the papers, and immediately, we review orally what the answers should have been.

If you have been counting so far, you’ll see that the repetition count for each word is already around ten. Then, there’s always a chance that the item in question will be one of the “words from previous tests” that comes up on a subsequent test, so the ideal student will spend part of their daily vocabulary review looking at “old” (previously tested) words. If the item does come up on the next quiz, count one more repetition.

After we finish reviewing the answers to that day’s quiz, I hand back the previous quiz and, once again, go over the answers orally.

As the semester progresses, the quizzes get longer and longer, because, in addition to the new vocabulary, I am including one item, randomly chosen, from every previous quiz. So, by mid-semester, there may be 10 new words and 20 “old” ones. The incentive to review previously tested words grows, and so does the number of spaced repetitions.

Retrieval Practice

Your brain is not a computer. Your memory is not a hard drive. In some ways, memory is more like an old-fashioned photo album. The more you take out your memories and look at them, the better you will remember them (except that, unlike photos, the image does not fade with repeated viewings; it is strengthened). Retrieval practice is the ten-dollar term for taking out the memory and looking at it or, if you prefer, transferring it from long-term to working memory and then putting it back again. The more you do that, the more often those particular neurons get to fire together and therefore (thank you, Hebb) wire together.

The trick is to give students a reason to take out their vocabulary items (or other new learnings) and look at them. This is particularly important in an EFL environment where students may not have a reason to retrieve English words from one lesson to the next. Anyone up for a low-stakes vocabulary quiz?

Each quiz affords multiple occasions for retrieval. No need to count them this time. We have just done that. Yes, spaced repetition and retrieval practice are mutually reinforcing.

Learning is Social

I remember taking some pretty high-stakes tests in high school. You sit for an hour or more struggling with the same questions as everybody else in the room but are forbidden to communicate with them. Then the test ends. You can talk again. The one thing you want to do is discuss the test (maybe not your answers to it, but the test): What did Question 7 mean? Was there really something on the test the teacher hadn’t taught? Shared experience. Imposed silence. And then, quite naturally, we all talk about the experience.

And we learn from each other. Because learning is social. Knowledge is constructed through interaction with others.

That’s what happens when even my low-stakes quizzes finish. Students ask each other about the questions and answers. Sometimes they even ask me. When we do the immediate post-test oral review, they often help each other. Knowledge is being constructed and reinforced. Isn’t that why we go to school?

Downsides

Yes, I spend part of each day grading quiz papers. Yes, it takes time from other activities to run and review the quizzes. Yes, it can be a struggle to come up with succinct and unambiguous prompts. I can justify the energy and time taken by each of these things because I am convinced that what looks like a quiz is actually a learning activity for students.

The real weakness of this approach is that it does require students to spend at least some time out of class studying the words. In my experience, this is not usually a problem. Some students may not really know how to study vocabulary; they can be coached. Some may note the new words on a scrap of paper that inevitably gets lost; that’s why we have vocabulary sheets.

But, once. Once, I had a class where nearly every student did poorly on every quiz. The items were no harder than for other classes, their general ability no weaker than that of other classes. They simply got into the habit of not studying, getting poor scores on the quiz, and not caring. After all, the stakes were low, so why should they care? I was tempted to answer that question by increasing the stakes but stopped myself in time to think of more learning-centered solutions. I made quizzes more social: after they had done their best to answer questions individually, students were encouraged to consult each other about the answers, as long as they wrote any socially obtained answers in a different colour (thank you, Tim Murphey). Not able to rely on students to study the words at home, I moved that part of the study into the classroom. Not able to answer all the questions? Give me your quiz paper, take five minutes to study your vocabulary sheet, then put it away and I’ll give you another chance with the same quiz paper. These approaches didn’t really solve the inherent problem (low-stakes = low motivation), though they did improve the atmosphere in class.

Beyond Vocabulary

It’s not just about learning words. I’ve used the regular vocabulary quizzes as my main example here, but the principles of low-stakes testing can be extended to any body of knowledge. I recently had a chance to teach geography, and ran weekly map quizzes on similar principles. A course on Life in English-Speaking Countries prompted me to add an extra dimension: movement. Factoids about each country are written on note cards; one note card per student; read your card and go to the corner of the room¹ showing the flag of the country the factoid applies to; consult your classmates if you need to; here’s another card; repeat; each week the deck of note cards gets thicker.

Spaced repetition; retrieval practice; expectancy; social learning; these are the building blocks of learning. Setting the stakes low can allow us to make them the building blocks of testing, too.

¹ See our July, 2018 Think Tank on Exercise and our May, 2020 Think Tank on Body Matters on the importance of movement for learning.

Stephen M. Ryan teaches English and a lot of other stuff at Sanyo Gakuen University, in Okayama, Japan.