An Evolutionary Perspective on Predictive Processing

By: Caroline Handley

Natural selection informs organisms by an inductive gamble, which amounts to a bet that the present and future environments of contemporary individual organisms will be similar to the past environments of their ancestors. (Odling-Smee, 2024, p. 21)

The opening observation of the lead-in video for the September 2025 Think Tank issue on predictive processing might initially seem alarming: “Your brain constructs your reality as you’re experiencing it”. Does this imply the denial of a real world ‘out there’? Does it mean ‘you’ are not in control? In short, no. And partly this is because despite all its impressive capabilities, your brain, just like the rest of you, is made of living cells. For me, perhaps the most important aspect of predictive processing theories is that they can be applied to all living creatures, not just humans, and in this way explain a universality of life across evolutionary history.

What do I mean by this? Well, all living cells clearly need to persist: It is only those cells that succeed at persisting that continue to be living cells (and replicate or reproduce). For many years, it has been argued that this entails that all cells, organs, and larger organisms must be a model of their environment (Conant & Ashby, 1970). And, in being a model of their environment, prediction is built in (Ball, 2023; Friston, 2010; Mitchell, 2023).

As authors such as Ball and Mitchell have argued, no cell or organism needs to model every aspect of its environment, only those aspects that are relevant for it to continue existing. Moreover, persistence depends on the implicit models embodied in its structure resonating with its environment in a way that promotes existing (yes, this is circular causality, but as Hohwy (2013) and Mitchell (2023) note, we should reword this as spiral causality, to emphasize that it happens over time). Just as cells must be tightly coupled with the larger tissues and organs they are part of, and these with the organism as a whole, organisms must be tightly coupled with their environmental niche (Odling-Smee, 2024; Thompson, 2007).

In this sense, life can be seen as “cognition all the way down”. On an evolutionary level, the brain’s ability to subconsciously predict reality could be viewed as an extension of the necessity for every living cell to embody predictions that enable it to better control its external environment in ways that promote the maintenance of its internal environment (Ball, 2023; Sterling, 2012).

When evolution chanced upon mutations that led some cells to develop into neurons, these turned out to be a really useful innovation, enabling organisms with such cells to integrate their actions within their niche with sensed information (in the form of molecules) about it. Organisms became much better models of their environment. Building in an extra layer of neurons between the acting and sensing neurons enabled learning at a developmental level (i.e., within the organism’s lifetime), not just an evolutionary one (Mitchell, 2023).

Simple cells may implicitly embody predictions, but complex brains can construct internal models of the world and update those models when they don’t fit with their organism’s experiences. Importantly, such learning can occur without supervision: Organisms can become better models of their environment through trial-and-error processes, which inherently minimize prediction error. In other words, whether or not they are conscious creatures, many non-human animals have the ability to learn about the external world and predict how to respond as optimally as possible to it (Gunawardena, 2022).
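For readers who like to see an idea in miniature, the trial-and-error updating described above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not a model of any real brain or organism: the function names, the learning rate of 0.2, and the constant "environment" value of 10.0 are all hypothetical. The learner holds a single prediction, compares it with each observation, and nudges the prediction by a fraction of the error, with no teacher involved.

```python
def update_prediction(prediction, observation, learning_rate=0.2):
    """Move the prediction a fraction of the way toward the observation."""
    error = observation - prediction  # the prediction error
    return prediction + learning_rate * error


def learn(observations, initial_prediction=0.0, learning_rate=0.2):
    """Run the update over a sequence of observations.

    Returns the final prediction and the absolute prediction error
    recorded before each update.
    """
    prediction = initial_prediction
    errors = []
    for observation in observations:
        errors.append(abs(observation - prediction))
        prediction = update_prediction(prediction, observation, learning_rate)
    return prediction, errors


if __name__ == "__main__":
    # Hypothetical stable environment: the same value is sensed 20 times.
    final_prediction, errors = learn([10.0] * 20)
    print("first error:", errors[0])
    print("last error:", errors[-1])
    print("final prediction:", final_prediction)
```

In a stable environment the recorded error shrinks with every encounter, which is all that "minimizing prediction error" means in this simplified sense.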

This, for me, is the key point to understand about predictive processing: Whether or not our hypotheses about how the brain implements predictive processing are correct (as recently discussed in Hodson et al., 2024), the ability to predict or implicitly model our environment, in a way that enables us to better control it, is vital to all life.

It is also worth noting that even if the details of predictive processing are challenged by future evidence, the overarching idea that inference is a critical brain function is much less likely to disappear. This is because the advantage of neurons in complex brains is that they are not limited to predicting their immediate environment, as happens at the cellular level; they can translate (or transduce) information carried in differences in light and sound waves emanating from distant entities. But this specialization comes with a catch: The ability to coordinate perception and action across distance (and therefore time) entails being physically separated from both sources and targets (muscles). The brain cannot directly detect its outside world (including the organism it is part of) but must instead infer it.

But, returning to the lead-in video, after explaining the basic idea of predictive processing, it ends on the message that “consciousness narrates your life story”, suggesting this is a different type of prediction (Hohwy, 2013, argues against this view). In a similar vein, many scientists think that the main purpose of episodic memory (memory of specific events) is to enable us to plan for the future (e.g., Schacter, 2012). In humans, who have the ability to reflect on their past experiences, this extends to metacognitive processes (why did I do badly on that exam and what can I do to do better next time?). So, at least in humans, subconscious predictive processing interacts with conscious predictions and decision-making (Mitchell, 2023). And I think that maybe this is what is most relevant for foreign language teachers.

First, in general, consciousness, along with an enlarged association cortex that enables abstraction (in time and space), a social brain (Lieberman, 2010), and a highly developed theory of mind, gave rise to sophisticated inter-individual communication: language (Heyes, 2018; Tomasello, 2008). And language allows us to learn directly from others without resorting to trial-and-error predictive processes. Humans don’t always need to infer or predict their perceptions and actions, as other humans are willing to tell them what things are and what to do with them (Csibra & Gergely, 2011). This seems to me to link directly to cognitive load theory (Sweller & Chandler, 1994).

Proponents of cognitive load theory (e.g., Sweller et al., 2019) argue that due to novice learners’ limited working memory capacity to handle incoming information (which to them is novel and cannot be retrieved from long-term memory or top-down predictive processing), they learn best when supported by explicit instruction. Moreover, the foundational observation of this theory is the distinction between primary and secondary knowledge—knowledge that people are somehow predisposed to learn through exposure (no or little explicit instruction needed) and knowledge that people can learn from others, if instructed. An obvious language-related example is the difference between infants learning to speak (or sign) a language and children learning to read and write that language. More relevantly, many people would argue that this distinction also holds for learning a first language(s) by being immersed in that language from a young age and learning a second (or third) language later in life, especially when exposure is more-or-less limited to a few hours a week in a classroom context.

Now, I would argue that predictive processing can help explain these different ways of learning languages. Long-term bodily and environmental regularities can be implicitly modelled or embodied in neural structures through evolutionary processes (reflex actions and circadian rhythms being obvious examples). As our brain learns about the world (both internal, the type of body it is in and its capabilities, and external, the type of world it is in and its affordances for action by that body), it is guided by the predictions implicitly embodied in its evolutionarily and developmentally produced networks (Ball, 2023; Mitchell, 2023). This may also apply to language (e.g., Deacon, 1997): If our brain follows a so-called “normal” developmental trajectory, networks will develop that support languaging, provided the brain is exposed to language(s) from an early age.

In addition, an important regularity for many mammals is childcare. This predictable regularity means that infants can be dependent on others (typically the mother) as their brain and body develop after birth. This, in turn, enables remarkable brain plasticity: the ability to be changed by experiences at an individual level, or to show flexible behaviour. Baby humans can exploit the luxury of being able to learn how to sense and act on the world effectively according to the exact world they are born into, which for humans includes the language(s) they are born into. The plasticity of neural networks means that humans can learn flexibly (for example, adapting to the sounds of the language(s) they are exposed to), but there still seems to be an innate prediction that the human brain will be exposed to a language that it will benefit the human to learn. Within predictive processing, a clear example of this is the early finding that even 8-month-old infants can very quickly learn statistical patterns in language sounds (Saffran et al., 1996).
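To make "statistical patterns in language sounds" a little more concrete, here is a small Python sketch of one statistic that is often discussed in connection with this kind of learning: the transitional probability between adjacent syllables, that is, how often syllable B follows syllable A. Everything in it is my own assumption for illustration. The three nonsense words, the stream length, and the function names are hypothetical, and I am not claiming that this is the procedure used in the study or the computation an infant brain performs.

```python
import random
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from a syllable stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # every syllable that has a successor
    return {
        (first, second): count / first_counts[first]
        for (first, second), count in pair_counts.items()
    }


def make_stream(words, n_words, seed=0):
    """Concatenate randomly chosen words, never repeating a word back to back."""
    rng = random.Random(seed)
    stream = []
    previous = None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != previous])
        stream.extend(word)
        previous = word
    return stream


# Hypothetical nonsense words, each made of three unique syllables.
WORDS = [("bi", "da", "ku"), ("pa", "do", "ti"), ("go", "la", "bu")]

if __name__ == "__main__":
    stream = make_stream(WORDS, n_words=300)
    probabilities = transitional_probabilities(stream)
    print("inside a word, bi -> da:", probabilities[("bi", "da")])
    print("across a boundary, ku -> pa:", probabilities.get(("ku", "pa"), 0.0))
```

Because each syllable belongs to only one word in this toy stream, transitions inside a word are perfectly predictable, whereas transitions across a word boundary are not. A learner sensitive to that difference has a cue to where words begin and end, without anyone telling it.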

In addition, due to the plasticity of brains, particularly the human brain, networks can be repurposed to learn behaviours that could not be predicted on an evolutionary time scale (Anderson, 2014), such as how to read (Dehaene, 2009). With enough repetition (practice) such behaviours (or at least, many of their subcomponents) can become automatic. We no longer need to sound out each letter as we read, but can read at the level of whole words and even statistically predictable multi-word expressions. But because the brain doesn’t have an implicit model of such behaviours hardwired into its networks, learning them requires explicit instruction, or is at least made vastly faster and more accurate by it (Rastle et al., 2021).

Teachers don’t need to do a deep dive into predictive processing theories (although it is fun!), but understanding the predictive nature of life can, I believe, help us make better decisions about how to use second language acquisition theories to inform our practices (Requena-Ramos, 2025). In particular, we can reframe the decision to adopt a plurality of approaches not as being undecided or chaotic but as maximising predictive success.

To show what I mean by this, let’s return to the so-called reading wars, a good example of how our beliefs (past-based predictions) shape our actions. The belief that learning to speak and read a language are both naturalistic processes that only require exposure and motivation can provide justification for not teaching phonics (Castles et al., 2018). As Castles and others argue, the way to end the reading wars is to understand that multiple factors are involved in learning to read. Novices need explicit instruction, followed by controlled and student-led practice through which they can apply the new knowledge and automatize the subskills involved and so develop towards expertise.

This is equally true of language learning: Explicit instruction works (Norris & Ortega, 2000), but that does not mean it replaces the need for practice. Given that within the field of SLA we still haven’t reached a consensus on what we should be doing (Requena-Ramos, 2025), and maybe this is impossible, the best solution might be to make sure we include what we predict will work in our context, but also what our students predict will work for them, in view of their sociocultural backgrounds and individual differences. We should also remember that some of these predictions are subconscious and may be so entrenched that they resist updating based on prediction error, but others can be learned from other people and acted upon consciously. Ultimately, it is the unique ability of humans to reflect on our predictions and how they shape us and our interactions with others that makes us such incredible learners.

References

Anderson, M. L. (2014). After phrenology: Neural reuse and the interactive brain. MIT Press. https://doi.org/10.7551/mitpress/10111.001.0001
Ball, P. (2023). How life works: A user’s guide to the new biology. University of Chicago Press.
Castles, A., Rastle, K., & Nation, K. (2018). Ending the reading wars: Reading acquisition from novice to expert. Psychological Science in the Public Interest, 19, 5–51. https://doi.org/10.1177/1529100618772271
Conant, R. C., & Ashby, W. R. (1970). Every good regulator of a system must be a model of that system. International Journal of Systems Science, 1(2), 89–97. https://doi.org/10.1080/00207727008920220
Csibra, G., & Gergely, G. (2011). Natural pedagogy as evolutionary adaptation. Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1567), 1149–1157.
Deacon, T. (1997). The symbolic species: The coevolution of language and the brain. W. W. Norton & Company.
Dehaene, S. (2009). Reading in the brain: The science and evolution of a human invention. Viking.
Friston, K. J. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138. https://doi.org/10.1038/nrn2787
Gunawardena, J. (2022). Learning outside the brain: Integrating cognitive science and systems biology. Proceedings of the IEEE, 110(5), 590–612.
Heyes, C. (2018). Cognitive gadgets: The cultural evolution of thinking. Harvard University Press.
Hodson, R., Mehta, M., & Smith, R. (2024). The empirical status of predictive coding and active inference. Neuroscience & Biobehavioral Reviews, 157, Article 105473. https://doi.org/10.1016/j.neubiorev.2023.105473
Hohwy, J. (2013). The predictive mind. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199682737.001.0001
Lieberman, M. D. (2010). Social: Why our brains are wired to connect. Oxford University Press.
Mitchell, K. J. (2023). Free agents: How evolution gave us free will. Princeton University Press.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417–528. https://doi.org/10.1111/0023-8333.00136
Odling-Smee, J. (2024). Niche construction: How life contributes to its own evolution. MIT Press. https://doi.org/10.7551/mitpress/13942.001.0001
Rastle, K., Lally, C., Davis, M. H., & Taylor, J. S. H. (2021). The dramatic impact of explicit instruction on learning to read in a new writing system. Psychological Science, 32(4), 471–484. https://doi.org/10.1177/0956797620968790
Requena-Ramos, C. (2025). Toward consensus building in second language acquisition: A sociophilosophical perspective. Applied Linguistics Press. https://www.appliedlinguisticspress.org/home/catalog/requena-ramos_2025
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928. https://doi.org/10.1126/science.274.5294.1926
Schacter, D. L. (2012). Adaptive constructive processes and the future of memory. American Psychologist, 67(8), 603–613. https://doi.org/10.1037/a0029869
Sterling, P. (2012). Allostasis: A model of predictive regulation. Physiology & Behavior, 106(1), 5–15. https://doi.org/10.1016/j.physbeh.2011.06.004
Sweller, J., & Chandler, P. (1994). Why some material is difficult to learn. Cognition and Instruction, 12(3), 185–233. https://doi.org/10.1207/s1532690xci1203_1
Sweller, J., van Merriënboer, J. J. G., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. Educational Psychology Review, 31(2), 261–292. https://doi.org/10.1007/s10648-019-09465-5
Thompson, E. (2007). Mind in life: Biology, phenomenology, and the sciences of mind. Harvard University Press.
Tomasello, M. (2008). Origins of human communication. MIT Press.

Caroline Handley (PhD) has taught English for over 15 years and currently teaches English for Academic Purposes at Wenzhou-Kean University, China. Her research interests include vocabulary and embodied theories of cognition and language.
