Cognitive Load Theory and the Differences Between Experts and Novices: What Chess Tells us about Teaching Foreign Languages

By: Caroline Handley

My interest in Cognitive Load Theory (CLT) started with two books I read last year which had a big impact on me. They made me start questioning some of the ideas I’d gained during my teacher training. Both books were about how students learn in the classroom, one by Chall (2000) and the other by Hattie and Yates (2013). Both attested to the benefits of explicit teacher-led instruction over inquiry-based or student-led learning. Hattie and Yates’ book also includes chapters on CLT and on the acquisition of expertise. All three areas were clearly closely linked and together challenged two underlying principles behind a lot of my lesson planning: maximise student talk-time (and so minimise teacher talk-time), and encourage students to be creative in making real meanings rather than practicing discrete aspects of language. Given the shock these two books gave to my system, I clearly had to learn more about these ideas. So, I started reading journal papers about CLT and the strengths of explicit instruction over inquiry-based learning, all the time trying to keep in mind how they might relate to foreign language learning.

If you’ve watched the videos on Cognitive Load Theory and read Julia Daley’s excellent article above, you don’t need another overview of CLT. Instead, I want to discuss CLT in relation to novices and experts. But, as a brief reminder, the main idea is that people have limited working memory capacity, which is easily maxed out when learning new and complex knowledge or skills (for a recent review, see Sweller, van Merriënboer, & Paas, 2019).

But is working memory capacity always severely limited?

Working memory is the temporary storage and processing of information under attentional control in order to perform an activity or task (Baddeley, 1986; Baddeley & Hitch, 1974). In Baddeley’s (1986) model, task information is attended to within working memory, which holds the information and actively processes or manipulates it. Learning occurs when the contents of working memory are encoded into long-term memory. Working memory capacity for new information is limited to a few items or chunks of information (for a review, see Cowan, 2014). According to CLT, this limitation on working memory should be the main factor educators consider when designing materials and constructing lessons (Kirschner, Sweller, & Clark, 2006).

However, as highlighted in Cognitive Load Theory, the information in working memory is rarely, if ever, 100% new. Just as we now know that past experience guides how we perceive the world around us (top-down processing, see Gilbert & Li, 2013, for a review), working memory is also influenced by our experiences, as stored in long-term memory. Baddeley (2000) acknowledged this, adding an “episodic buffer” to his model to integrate information from long-term memory with that held in working memory. Other authors, such as Cowan (1988) and Ericsson and Kintsch (1995), suggest that working memory should be conceptualized as the activated part of long-term memory, rather than seeing them as two separate stores. In this view long-term and working memory are not located in different parts of the brain; long-term memories are stored across neurons and synapses and information is represented in working memory in their patterns of activation. In Cowan’s (1988) model, long-term memory is activated by stimulus information (from our external environment or internal thought processes). When we voluntarily attend to such information, effortfully and consciously processing it (such as when we detect a change in our environment), this attended-to, activated long-term memory IS what we call working memory. In this model, it is the attentional control that is limited in capacity to processing a few items or chunks. With repeated experience of an activity, as we automatize how to perform it and we become an expert at it, subsequent performances activate long-term memory without requiring attentional control, meaning that our working memory capacity for that activity becomes virtually unlimited. This aspect of human cognition is the central tenet underlying CLT. When we start to learn a new and complex concept or skill, the constituent elements of that concept or skill are all new and must be attended to and processed separately, creating huge cognitive load for novices.

But can memory really be the main (only?) thing separating experts from novices?

Ericsson and Kintsch (1995) suggest that as we acquire expertise at a task, information is stored in higher order structures or representations in long-term memory, which are reactivated when performing the task. The acquisition of a concept or skill is the process of forming a template or schema for it in long-term memory, in which the multiple elements get gradually integrated into a smaller number of more complex elements that we can activate automatically[1] (Sweller & Chandler, 1994). Expertise is task-specific, but for any activity in which expertise is acquired, long-term working memory yields huge processing capacity, in contrast to the severely limited working memory capacity of novices. The seminal study that suggested that experience (or long-term memory) distinguished experts from novices was a 1946 doctoral thesis by De Groot (published in English translation in 1965) about the thought processes of chess players of various proficiency levels. He gave them various states-of-play in chess and asked them to decide what move they should make. Using a think-aloud protocol, he was unable to find significant differences between masters, experts, and novices in how they approached the task. The only differences between (grand)master players and novice players were that the masters could utilize automatic routines and strategies and could immediately perceive the abstract or deep structure of the game, capabilities beyond those of novices. In De Groot’s words:

His extremely extensive, widely branched and highly organized system of knowledge and experience enables him, first, to recognize immediately a chess position as one belonging to an unwritten category (type) with corresponding board means to be applied, and second, to “see” immediately and in a highly adequate way its specific, individual features against the background of the type (category). (p. 306)

Experience, as represented in long-term memory, enables experts to see the solution(s), typically in an automatic or intuitive manner, whereas novices struggle to see how the parts relate to each other.

Chase and Simon (1973) extended De Groot’s findings by testing chess players’ recall of chess boards in various configurations. Whereas the novice player could only recall the positions of a few pieces, the master could recall almost the entire board. However, this was true only when the configuration of pieces simulated a state of actual game play. When the pieces were randomly placed on the board, the master’s recall was similar to that of the expert and novice. These studies suggest that masters don’t chunk four or five pieces, perceiving them as if they were one piece, but rather that they have formed subconscious long-term memory representations of numerous states of game play, enabling them to see the underlying structure of the configurations and intuitively determine the best move. Memory allows experts to perceive the solution; novices have to test a multitude of possible solutions and can only hope they are lucky enough to land on one that isn’t too bad.

But how does this relate to language learning?

Well, interestingly, Ericsson and Kintsch (1995) base part of their argument that the performance of experts can only be explained by long-term working memory on our ability to understand written texts. They claim that when reading a text, we construct an integrated representation of the gist of the text in long-term working memory in order to comprehend it. In this way, we are able to store a lot of information about the text, much more than a few chunks of information. In terms of CLT, if the text is in our native language (or a language we are highly proficient in), processing the sentences and their meaning is largely automatic and does not impose much cognitive load, provided we are familiar with the topic (a point I will return to later). In the same way master chess players perceive chess pieces, we automatically perceive words and phrases and their meanings and have access to a range of routines for how they relate to each other within larger texts.

These internal models are like templates, enabling us to easily remember conversations and stories. The movie Memento, about a man with short-term memory loss, is acclaimed partly because it reversed the narrative template so that viewers were as confused as the lead character.

For language learners, reading and listening in the target language are not grounded in long-term working memory, which would allow the words to be processed automatically and with low attentional demands. Instead, comprehending a text relies much more on the ability to hold new information in working memory, with its severe limitations and susceptibility to cognitive overload. Although learners may struggle to decode the words and their meanings, understanding individual words is not the sole problem. According to CLT, it is maintaining the relations between individual items in working memory (element interactivity) that causes cognitive overload for novice learners (Sweller & Chandler, 1994). Understanding what each word means is not so difficult, but understanding how the words’ meanings relate to each other to produce the complex meaning of the whole text is beyond the capacity of their working memory. If a learner has to decode each sentence word by word, this places a heavy demand on working memory, making it very difficult to create a gist representation of the whole sentence to guide interpretation of the following sentence. With spoken texts, the difficulties are compounded by the transience of the input and the inability to control the word rate or to re-listen to a word or phrase. Like a novice chess player, a language learner “has to build up from the ground – if such a thing is possible at all” (De Groot, 1965, p. 305).

[1] As a result, instructional materials that are optimally designed for learners (novices) are detrimental to experts’ performance, known as the expertise-reversal effect in CLT (Kalyuga, 2007).

So how can we help our learners?

According to Cognitive Load Theory, the most important thing may be to consider the content of the text or lesson, not just the language. Given the central idea that complex information in activated long-term memory can be automatically processed, whereas working memory for novel information involves attentional processes with very limited capacity, within CLT there is an emphasis on connecting new information to prior knowledge to reduce cognitive load. The importance of prior knowledge in learning is hardly new, yet in many classrooms the emphasis is instead on students acquiring general skills (Wexler, 2019). Proponents of explicit teaching and applying CLT to instructional design emphasize that learners must first learn content or knowledge to which they can then apply such skills (Rosenshine, 2012; Wingate, 2006). For example, Recht and Leslie (1988) compared the effect of prior knowledge and reading ability on children’s memory of a text. They gave junior high school students a text to read describing the state-of-play during a baseball game, then asked them to create the scene with models of the players and to verbally summarize the text.

Knowledge of baseball significantly increased text recall and comprehension, but general reading ability did not. They concluded that teaching reading strategies without considering students’ prior knowledge is insufficient for successful learning outcomes. In CLT terminology, without prior knowledge of the topic, the students’ working memory capacity was overloaded, so they couldn’t decode or remember the information successfully. In regard to foreign language teaching, this suggests that relating content to students’ lives is not merely a matter of generating interest. At least when teaching lower-level learners or introducing new language structures or grammar, it may be essential to ensure there is no unfamiliar content (extraneous cognitive load) so that students can effectively process and learn the target language (intrinsic cognitive load). As Nation (2007) asserts in his “four strands” approach to language teaching, meaning-focused input and output strands only exist if the content is familiar.

But what do Cognitive Load Theory and the differences between experts and novices imply for my beliefs about always maximising student talk-time and encouraging students to be creative in making real meanings in English?

Unfortunately, the answer has to be that I need to revise my beliefs. I need to become comfortable with spending some lesson time in whole-class, teacher-led instruction. Even if I’ve given students meaningful input in terms of a listening or reading text on a familiar topic, if I want them to discuss the topic or complete a related task, I should also think about the language they need to do so. My students need me to explicitly teach and drill useful structures that native speakers automatically use, so that they have some spare working memory to be creative. If I haven’t previously used activities to automatize (drill) key phrases or grammatical patterns that students can use to sustain their discussion or task performance, I’m creating the perfect cognitive overload to cause them to give up and use their L1 to complete the activity. Even scarier for my CELTA-taught beliefs, Cognitive Load Theory implies that maybe PPP isn’t such a bad approach for teaching novice language learners after all. It starts with explicit instruction (declarative knowledge that reduces cognitive load) and is followed by structured practice (towards proceduralisation), then extended practice (towards automatization), as is argued by DeKeyser and Criado (2012). Maybe students need such learning experiences in order to build up the knowledge and skills required to use language creatively, as suggested by Li, Ellis, and Zhu (2016), who found that task-supported language teaching, with explicit instruction prior to the task and feedback during the task, led to far better learning outcomes than task-based language teaching. Maybe students need to learn language in the way that De Groot (1965) suggests players learn chess:

First, by means of playing experiences and/or textbooks the player gets to know certain important general strategic and tactical rules; next, he learns to recognize and to handle exceptions to these rules – which in their turn grow into new, more refined rules; with new exceptions, etc. Finally, the player develops a “feeling” for the cases in which these already highly specialized rules can be applied. (p. 351)

Most importantly, I need to always remember that what is instantaneous for me, as a grandmaster of English, might involve a multitude of thought processes, questions, false starts, and bad moves for my novice student players.


  • Baddeley, A. (1986). Working memory. New York, NY: Oxford University Press.

  • Baddeley, A. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417-423.

  • Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (pp. 47-89). New York, NY: Academic Press.

  • Chall, J. S. (2000). The academic achievement challenge: What really works in the classroom? New York, NY: Guilford Press.

  • Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55–81.

  • Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104(2), 163–191.

  • Cowan, N. (2014). Working memory underpins cognitive development, learning, and education. Educational Psychology Review, 26(2), 197–223.

  • De Groot, A. (1965). Thought and choice in chess. The Hague, Netherlands: Mouton.

  • DeKeyser, R., & Criado, R. (2012). Automatization, skill acquisition, and practice in second language acquisition. In C. A. Chapelle (Ed.), The Encyclopedia of Applied Linguistics.

  • Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review, 102, 211–245.

  • Gilbert, C. D., & Li, W. (2013). Top-down influences on visual processing. Nature Reviews Neuroscience, 14, 350-363.

  • Hattie, J., & Yates, G. C. R. (2013). Visible learning and the science of how we learn [Kindle version]. Downloaded from

  • Kalyuga, S. (2007). Expertise reversal effect and its implications for learner-tailored instruction. Educational Psychology Review, 19(4), 509-539.

  • Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75-86.

  • Li, S., Ellis, R., & Zhu, Y. (2016). Task-based versus task-supported language instruction: An experimental study. Annual Review of Applied Linguistics, 36, 205-229.

  • Nation, I. S. P. (2007). The four strands. Innovation in Language Learning and Teaching, 1(1), 1-12.

  • Recht, D. R., & Leslie, L. (1988). Effect of prior knowledge on good and poor readers’ memory of text. Journal of Educational Psychology, 80(1), 16-20.

  • Rosenshine, B. (2012). Principles of instruction: Research-based strategies that all teachers should know. [PDF file]. Retrieved from:

  • Sweller, J., & Chandler, P. (1994). Why some material is difficult to learn. Cognition and Instruction, 12(3), 185-233.

  • Sweller, J., van Merriënboer, J. J. G., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. Educational Psychology Review, 31(2), 261-292.

  • Wexler, N. (2019, August). Elementary education has gone terribly wrong. The Atlantic.

  • Wingate, U. (2006). Doing away with “study skills.” Teaching in Higher Education, 11(4), 457-469.

Caroline Handley, the BRAIN SIG Coordinator, is an English lecturer at Asia University, Tokyo. She is currently pursuing a PhD at Swansea University, UK, where she is researching the relation between conceptual and linguistic knowledge in lexical processing, using an embodied cognition perspective.
