Maximising the Self by Minimising Surprise

By: Caroline Handley

The idea that we remember past experiences in order to predict future ones can be traced back at least as far as Tolman (1948). His research with various other scholars, conducted at a time when behaviourism was the dominant theory in psychology, led him to propose that learning by association did not result in conditioned stimulus-response connections, but rather that animals (including humans) developed expectancies based on prior experience. He coined the term “cognitive map” to describe the higher-order representations formed during learning. For example, in one experiment summarised in Tolman (1948), rats trained to run down one arm of a maze for food were, when the original arm was blocked, more likely to run down an arm pointing in a similar direction. This showed that they had not only learned to associate the arm with the food reward but had also formed a cognitive map of the orientation of the maze. This cognitive map represented the task at a level of abstraction, allowing the rats to generalise from prior learning and transfer that knowledge to new stimuli. More recent research suggests that animals and humans form cognitive maps of a variety of tasks and events, not just spatial maps, enabling them to make predictions that optimise reward in new situations (Behrens et al., 2018).

The free energy principle

A more recent theory of predictive processing, the free energy principle (Friston, 2010; Friston & Stephan, 2007), proposes a unified account of action, perception, and learning. In this theory, organisms act on their environment to optimise models of themselves, models that are intrinsically derived from their bodily form, so that they can predict the causes of subsequent sensory input and thereby minimise surprise. In other words, they exhibit adaptive behaviour that maximises their chances of continued survival. Brains can predict the causes of sensory input because their generative models are hierarchical: each level of the hierarchy informs the levels below and above it, enabling self-organisation (see Jason Lowes’ article in this issue for a detailed explanation of this key aspect of predictive processing).

According to Friston (2010), learning is the creation of models or predictions based on prior experience; it is the reduction of uncertainty, or “noise.” As Curtis Kelly and Jason Lowes both state, prediction is necessary to comprehend the world because of the amount of information our senses take in every second and the noisiness (uncertainty) of that information. We create concepts and categories (models) that drive our predictions about the world around us; we act on the environment to sample new evidence and update our predictions or beliefs based on the sensory input (Friston, 2010). For humans, language may be a key element of our models (Connell, 2019). For example, if your friend brings her new pet and calls it a “dog” even though it doesn’t look like any breed of dog you’ve seen before, your initial reaction will probably still be to stroke it and, if you’re a dog-lover, you probably won’t even consciously think about the action before you do it. Your predictive model has automatically selected an optimal response to the situation.
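The updating of predictions described above can be pictured with a toy calculation. This is only an illustration under simplifying assumptions, not Friston's formulation: it assumes a single numerical belief that is nudged towards each new observation by a fixed fraction (the hypothetical `learning_rate`) of the prediction error, a simple delta rule. The function names and observation values are invented for the example.

```python
def update_belief(belief: float, observation: float, learning_rate: float = 0.5) -> tuple[float, float]:
    """Nudge a belief towards an observation; return (new belief, prediction error)."""
    prediction_error = observation - belief  # how wrong the current prediction was
    new_belief = belief + learning_rate * prediction_error
    return new_belief, prediction_error


def learn(observations: list[float], belief: float = 0.0, learning_rate: float = 0.5) -> tuple[float, list[float]]:
    """Update a belief over a sequence of observations, recording the size of each error."""
    error_sizes = []
    for observation in observations:
        belief, error = update_belief(belief, observation, learning_rate)
        error_sizes.append(abs(error))
    return belief, error_sizes


if __name__ == "__main__":
    # Hypothetical noisy sensory samples scattered around a true value of 10.
    samples = [10.4, 9.7, 10.1, 9.9, 10.3, 9.8, 10.0, 10.2]
    final_belief, error_sizes = learn(samples)
    print(f"final belief: {final_belief:.2f}")
    print("prediction error sizes:", [round(e, 2) for e in error_sizes])
```

As the belief improves, the prediction errors it generates tend to shrink, which is the sense in which learning reduces uncertainty in this toy model.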

The brain is a “prediction machine” because it must minimise the free energy (an upper bound on surprise) and, thereby, the entropy (average surprise, or uncertainty) of its own states in an ever-changing environment. It does this in two ways: by constantly acting on the environment to change its sensory input so that the input better matches its predictions about what is causing it, and by updating its internal states in response to prediction error so that future predictions are more accurate (Friston, 2010). Friston argues that all self-organising systems, such as living organisms, must minimise free energy in order to remain alive or, as he puts it, to deal with “entropy,” a mathematical measure of the unknown information about a system. The second law of thermodynamics states that isolated systems tend towards a state of maximum entropy. However, living organisms are open systems: they take in energy and use it to sustain their existence, decreasing their own entropy (thereby increasing the informativeness of their being) at the cost of increasing the entropy of their surroundings. In so doing, they maintain their structure and function, or a high probability of being in a homeostatic state (a state that supports living; Hoffmann, 2012, provides an accessible explanation of this complex idea). This exchange of entropy couples living organisms to their environmental niche.
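The information-theoretic terms used here can be made concrete. In information theory, the surprise (or surprisal) of an outcome with probability p is -log p, and entropy is the average surprise over all outcomes. The sketch below assumes two hypothetical organisms, described only by invented probabilities of finding each in one of three temperature bands; it computes surprise and entropy only, not Friston's free energy.

```python
import math


def surprise(p: float) -> float:
    """Surprisal of an outcome with probability p, in bits."""
    return -math.log2(p)


def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy in bits: the probability-weighted average surprise."""
    return sum(p * surprise(p) for p in dist.values() if p > 0)


# Hypothetical probabilities of finding each organism in a given temperature band.
homeostatic = {"cold": 0.05, "normal": 0.90, "hot": 0.05}
unregulated = {"cold": 1 / 3, "normal": 1 / 3, "hot": 1 / 3}

for name, dist in [("homeostatic", homeostatic), ("unregulated", unregulated)]:
    print(f"{name}: entropy = {entropy(dist):.2f} bits, "
          f"surprise at 'hot' = {surprise(dist['hot']):.2f} bits")
```

The organism that keeps itself near one state has the lower entropy, and for it a departure such as being "hot" is far more surprising; this is the sense in which staying in homeostatic states keeps average surprise low.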

Action-constructed perception

Although open systems, such as organisms and their brains, constantly interact with their environment, they are separate from it. The brain must infer or predict the causes of sensory input, as it cannot directly see, hear, touch, taste, or smell external stimuli. There is a barrier between your brain-body and your environment. This gives rise to the long-standing mind-body problem: How can activity patterns in the brain produce our subjective experiences of the world? Predictive processing enables a paradigm shift in the search for answers to this question. It claims that the brain does not respond to stimuli in a reactive manner; it does not map the world, but matches the world to its own self-generated activity patterns and uses these patterns to infer the world, a process referred to as adaptive inference (Buzsáki, 2019; Clark, 2013). It is claimed that this process is more efficient than responding directly to stimuli and maintains greater stability of neural activity, protecting the brain from random fluctuations due to noisy sensory input. Buzsáki (2019) further claims that action is primary to perception and that sensory information can become meaningful only through action. By creating neural maps of our actions and our bodies, we create prediction signals for our sensory systems against which sensory input can be compared, enabling meaning-making. Predictive processing, grounded in action, constructs experience.

Meanwhile, in the language classroom

Predictive processing, the free energy principle, and action-constructed perception are all interesting and powerful theories of how organisms maintain life and brains learn. They are also relevant for language teaching and learning. Two examples of this are discussed below.

First, comprehending language involves predictive processing. Expert language users actively predict what they will hear or read at multiple levels (see, for example, Brothers et al., 2015), including the lexical, syntactic, phonological, and even non-verbal levels (for a review, see Kuperberg & Jaeger, 2016). Foreign language learners, depending on their proficiency, tend to have much weaker and less accurate predictive models. Without robust models, they depend more on bottom-up processing of the phonemes, words, and syntax of the foreign language, so they struggle to deal with the noise in such input. For example, spoken word recognition is extremely complex, as the realisation of a phoneme varies with the preceding and following sounds, the speaker’s gender, intonation, and so on (see Weber & Scharenborg, 2012, for a review of models of spoken word recognition). As expert users of a language, we often don’t notice the noise in such input because we easily match the sounds to our robust, entrenched internal models of sounds, words, multi-word units, grammatical constructions, and syntax, which lets us perceive the language we hear automatically. Learners, however, often struggle with the extraneous variation in any particular instance of a phoneme.
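Psycholinguists often quantify how well a reader or listener predicts a word with surprisal, the negative log probability of the word given its context. The sketch below assumes two hypothetical predictive models of the same short sentence, a confident, accurate "expert" model and a flatter "learner" model; the probabilities are invented for illustration and are not taken from any of the studies cited.

```python
import math

# Hypothetical probabilities each model gave to the word that actually came next
# in "I'd like a cup of tea". Invented numbers, for illustration only.
EXPERT = [("like", 0.6), ("a", 0.7), ("cup", 0.5), ("of", 0.95), ("tea", 0.6)]
LEARNER = [("like", 0.2), ("a", 0.3), ("cup", 0.1), ("of", 0.4), ("tea", 0.1)]


def mean_surprisal(predictions: list[tuple[str, float]]) -> float:
    """Average surprisal, -log2(p), in bits per word across an utterance."""
    return sum(-math.log2(p) for _, p in predictions) / len(predictions)


for label, model in [("expert", EXPERT), ("learner", LEARNER)]:
    print(f"{label}: {mean_surprisal(model):.2f} bits per word")
```

Higher average surprisal means each incoming word carries more unpredicted information, which, on the account above, is part of why learners lean more heavily on bottom-up processing.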

L1 models are often of little help, and can even be a hindrance, in predicting L2 input. For example, languages differ greatly in how their grammatical constructions and syntax construe events and relations. Once you have created a linguistic-conceptual space to represent your native language(s), restructuring that space to learn a new language is hard (Goldberg, 2019). To the extent that learners lack accurate L2 models for predicting language, they benefit less from learning driven by prediction error. This means that it is difficult for L2 learners to notice which grammatical constructions native speakers use and which they avoid, or which words they choose to put together, which is one reason why adult L2 learners benefit from explicit instruction (Norris & Ortega, 2000). It is also why, when learners speak or write, they often use what they can predict: the grammatical constructions and words that go together in their L1.

Second, as Friston and Stephan (2007) state, minimising free energy, or surprise, does not entail always avoiding new situations and seeking out familiar ones. Fortunately, brains evolved to help keep our bodies alive, which depends on maintaining homeostasis (keeping bodily systems in their optimal state), so they also build representations of, and “care about,” the internal environment of the body and its future well-being. Minimising surprise for our internal selves means maximising value or information. Our brain seeks out new situations that will maximise reward or value for our body at the lowest possible cost. It does this based on all the knowledge it has accumulated over our lifetime about our organism, our environments, and other possible environments. To the extent that our brain correctly predicts the most valuable attainable situation to be in, surprise (entropy) is decreased and our brain reaffirms its own existence.

Therefore, understanding the brain as a “prediction machine” that minimises free energy adds depth to our understanding of the role of motivation and emotions in foreign language learning. For example, Dörnyei (2000) emphasised that motivation is a dynamic interplay of choice, persistence, and effort. In the free energy theory of the brain, choice and effort map onto predicted value and cost, both of which are continually updated; persistence may relate to how well the changing value and cost predict the internal states of the organism. On the other hand, this understanding of the brain may also explain why, in a recent meta-analysis, Al-Hoorie (2018) found that the L2 motivational self system (which involves learners’ conception of their ideal future self as a language user and their “ought-to” self) correlated only very weakly with L2 achievement. According to the free energy principle, the brain is constantly updating its predictions, based on prior experience, about which environments are high value or which states it would rather be in. Imagined selves therefore have limited potential to influence behaviour, whereas prior states determine our decisions, constrained by the (perceived) options afforded by our environment. Significantly, humans and animals need not be consciously aware of their prior states and, even when they are, such conscious representations need not always be accurate. Barrett (2017) claims that we construct (predict) our emotions based on interoceptive sensations (such as feeling butterflies in your stomach), prior experience, and our current situation (see the May 2018 Think Tank on emotion). She even claims that we predict our interoceptive sensations in order to predict how to maintain homeostasis in view of the predicted needs of the body (Barrett & Simmons, 2015).

These theories can also be related to foreign language classroom anxiety (Horwitz et al., 1986), which has been found to be a significant factor in L2 attainment, possibly even more important than motivation (Teimouri et al., 2019). If students frequently experience negative emotions when studying a foreign language, their brains will learn to predict that foreign language classrooms are low-value environments that disrupt their bodies’ homeostasis and that they should act to avoid. Anxiety then becomes the predicted state in such situations, so feeling anxious actually minimises surprise, or maximises their sense of self. If Shaules (2017) is correct, a further issue in the classroom could be resistance: he suggests that some students resist learning a foreign language because it challenges their lingua-culturally derived self-concept. If the brain is constantly making predictions to minimise surprise or maximise value in order to realise itself, this suggests that we, as teachers, need to consistently and unambiguously create value for our students so that they can form positive predictions about being themselves in our classrooms. Predictive processing theories help explain why it can be so difficult to teach and learn foreign languages, but they also reaffirm what to prioritise in this endeavour.


  • Al-Hoorie, A. H. (2018). The L2 motivational self system: A meta-analysis. Studies in Second Language Learning and Teaching, 8(4), 721-754.

  • Barrett, L. F. (2017). How emotions are made: The secret life of the brain. [Kindle version]. Downloaded from

  • Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature Reviews Neuroscience, 16, 419–429.

  • Behrens, T. E. J., Muller, T. H., Whittington, J. C. R., Mark, S., Baram, A. B., Stachenfeld, K. L., & Kurth-Nelson, Z. (2018). What is a cognitive map? Organizing knowledge for flexible behavior. Neuron, 100(2), 490-509.

  • Brothers, T., Swaab, T. Y., & Traxler, M. J. (2015). Effects of prediction and contextual support on lexical processing: Prediction takes precedence. Cognition, 136, 135-149.

  • Buzsáki, G. (2019). The brain from inside out. [Kindle version]. Downloaded from

  • Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.

  • Connell, L. (2019). What have labels ever done for us? The linguistic shortcut in conceptual processing. Language, Cognition and Neuroscience, 34(10), 1308-1318.

  • Dörnyei, Z. (2000). Motivation in action: Towards a process‐oriented conceptualisation of student motivation. British Journal of Educational Psychology, 70, 519-538.

  • Friston, K. J. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.

  • Friston, K. J., & Stephan, K. E. (2007). Free-energy and the brain. Synthese, 159, 417–458.

  • Goldberg, A. E. (2019). Explain me this: Creativity, competition, and the partial productivity of constructions. [Kindle version]. Downloaded from

  • Hoffmann, P. M. (2012). Life’s ratchet: How molecular machines extract order from chaos. [Kindle version]. Downloaded from

  • Horwitz, E. K., Horwitz, M. B., & Cope, J. (1986). Foreign language classroom anxiety. The Modern Language Journal, 70, 125-132.

  • Kuperberg, G. R., & Jaeger, T. F. (2016). What do we mean by prediction in language comprehension? Language, Cognition and Neuroscience, 31(1), 32-59.

  • Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta‐analysis. Language Learning, 50, 417-528.

  • Shaules, J. (2017). Linguaculture resistance: An intercultural adjustment perspective on negative learner attitudes in Japan. Juntendo Journal of Global Studies, 2, 66-78.

  • Teimouri, Y., Goetze, J., & Plonsky, L. (2019). Second language anxiety and achievement: A meta-analysis. Studies in Second Language Acquisition, 41(2), 363-387. doi:10.1017/S0272263118000311

  • Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55(4), 189–208.

  • Weber, A., & Scharenborg, O. (2012). Models of spoken‐word recognition. WIREs Cognitive Science, 3(3), 387-401.

Caroline Handley, the BRAIN SIG Coordinator, is an English lecturer at Seikei University. She is currently pursuing a PhD in Applied Linguistics at Swansea University, where she is researching the relation between conceptual and linguistic knowledge in lexical processing, using an embodied cognition perspective.
