Grammar as a Predictive Tool: Chains that Make Communication Possible

By: Stephen M. Ryan

Grammar. Not really something to gladden the heart of any language teacher or student. Subject of some of my worst Friday-afternoon lessons ever. Deceptively simple in its details. Maddeningly frustrating in its application. Mind-numbingly dull to explain. What has it ever done for us, anyway, apart from giving us a way to discriminate, on tests, between students who are strong analytical thinkers and those who don’t quite “get it?” Well, quite a lot it turns out, including a way to stop our brains from overheating and the species-defining gift that is communication itself.

Back in the Dark Ages, almost before we realized that languages could be taught communicatively, a favourite simile for grammar was Chain and Choice. I have struggled to find anyone trained as a teacher after the mid-80s who even remembers the name, but now Chain and Choice is back, back with a new name and far wider implications, as part of the radical theory of cognitive processing known as Predictive Processing.

Chain and Choice was an attempt to deal with the reality that language emerges, whether from mouth or pen (oops, word processor), as a linear chain of words following each other, but is in fact a product of three-dimensional mental processes (think: the branching trees of Chomskyan grammar). C&C has the budding linguist imagine the flow of words as a chain. Each of the links in the chain is a slot into which a word can be, well, slotted. Not just any word, though. The choice (keyword) of word is limited by its position on the chain and, crucially, by choices made about words in previous slots.

So, when my previous sentence started with the word “The” it determined the menu from which I could choose the next word in the chain. It could be a noun or an adjective. An adverb is possible (“The rapidly rising tide of disapproval”) but a verb or preposition is pretty much out of the question. The choice of a word for each slot is constricted by the word or words that come before it. Of course, this applies not only to words, but to the sounds they are made of, if we are speaking, or to the spelling: some sounds / letters can follow each other and others cannot (“Gngh” is not a combination of letters known to English outside the world of comic books). But it is grammar, or more specifically syntax, that concerns us here.

Words that can appear in a sentence slot carry with them a certain probability that they will appear in that slot and the probability is governed by the words that appeared in previous slots. I would guess that the probability of a sentence-initial “The” being followed by a noun is somewhere around 50%, whereas the chance of finding an adverb in that slot is considerably smaller. The fact that I have to guess at these percentages, or consult a corpus, indicates that the rather precise knowledge I have about the relative probabilities of slot-fillers is largely unconscious.
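For readers who like to see an idea in code, here is a minimal Python sketch of how slot probabilities of this kind could be estimated by simple counting. Everything in it is an assumption made for illustration: the four-sentence tagged corpus is invented, the function name next_category_probabilities is hypothetical, and the code counts every “the”, not only sentence-initial ones. The numbers it produces say nothing about real English; a real estimate would need a large tagged corpus.

```python
from collections import Counter, defaultdict

# Invented toy corpus of (word, word-class) pairs. Illustration only:
# the proportions below are NOT real corpus statistics.
TAGGED_SENTENCES = [
    [("the", "DET"), ("tide", "NOUN"), ("rises", "VERB")],
    [("the", "DET"), ("rapidly", "ADV"), ("rising", "ADJ"),
     ("tide", "NOUN"), ("falls", "VERB")],
    [("the", "DET"), ("new", "ADJ"), ("word", "NOUN"), ("fits", "VERB")],
    [("the", "DET"), ("chain", "NOUN"), ("limits", "VERB"),
     ("choices", "NOUN")],
]


def next_category_probabilities(sentences):
    """For each word, estimate the probability of each word class
    appearing in the very next slot, by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pair every slot with the slot that follows it in the chain.
        for (word, _tag), (_next_word, next_tag) in zip(sentence, sentence[1:]):
            counts[word][next_tag] += 1
    return {
        word: {tag: n / sum(tag_counts.values()) for tag, n in tag_counts.items()}
        for word, tag_counts in counts.items()
    }


probabilities = next_category_probabilities(TAGGED_SENTENCES)
print(probabilities["the"])
```

In this toy corpus a noun follows “the” in two of the four cases, an adjective in one and an adverb in one, and a verb never does, so a verb gets no entry at all in that slot's menu. This looks only one slot back; the choices made in earlier slots matter too, and a fuller model would condition on more of the chain.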

This mental process, making predictions about what comes next, is at the heart of the theory of Predictive Processing (see our October 2020 Think Tank for much more on this). The theory gives us a model of how the brain deals with the world: it makes predictions about what comes next, uses sensory signals to check out how accurate its predictions are, and, if there is a gap between prediction and reality, uses the new information to modify its expectations for next time. This theory takes Chain and Choice and makes it the central process of cognition.
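The predict, check, and modify cycle can be sketched in a few lines of Python. This is a deliberately simple error-driven update, not the mathematics that Predictive Processing theorists actually use; the function names, the learning rate of 0.2, and the starting expectation of 0.5 are all assumptions chosen for illustration. Here the “expectation” is a single number: how strongly the learner expects a noun in the next slot.

```python
def update_expectation(expected, observed, learning_rate=0.2):
    """One cycle: compare the prediction with what actually arrived,
    then shift the expectation a little way toward the observation."""
    error = observed - expected          # gap between prediction and reality
    return expected + learning_rate * error, error


def learn_from_exposure(observations, initial=0.5, learning_rate=0.2):
    """Run the cycle once per observation; return the final expectation
    and the size of the prediction error at each step."""
    expected = initial
    error_sizes = []
    for observed in observations:
        expected, error = update_expectation(expected, observed, learning_rate)
        error_sizes.append(abs(error))
    return expected, error_sizes


# Hypothetical exposure: 1.0 means "a noun did fill the slot".
final_expectation, error_sizes = learn_from_exposure([1.0] * 20)
print(round(final_expectation, 3), round(error_sizes[0], 3), round(error_sizes[-1], 3))
```

With consistent input the prediction errors shrink at every step and the expectation settles close to what the learner keeps encountering. The small learning rate is a design choice: one surprising sentence nudges the expectation without overturning it.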

Our brains have some interesting skills. Sensitivity to novelty. Pattern detection. Learning from the gap between our internal model of the world and sensory information about the external world itself. These are all facets of the brain we have known about for some time. What Predictive Processing does, though, is to tie them all together in a coherent view of how we learn to survive and thrive in a world that provides an overwhelming amount of sensory information (Clark, 2016).

Why, though, is prediction so central to language processing? Two reasons are immediately obvious: first, it saves energy; and, second, it makes communication possible.

The brain is the most energy-hungry organ we have, hogging around 20% of the body’s energy supply. Even so, this amount of energy is far from sufficient to deal with all the sensory information we get from the world (estimated at around 11 million bits at any given moment), and it is only a small fraction of what we would need to interact with the environment if the answer to the question “What happens next?” were truly random. So, as we have seen, predictability makes the world manageable for the brain. Rather than processing every possibility, we just process the ones we predict. In engineering terms, this increases the signal-to-noise ratio: prediction primes us to be attentive to what is likely to be meaningful for us, in a tsunami of information about the world sweeping over us at each moment.

If our predictions about what happens next are right even part of the time, we save ourselves the enormous amount of energy needed to prepare ourselves for literally any possible event at any given moment. What if a cow were to fall out of the sky and land on my keyboard in the next few seconds? I would need to start pumping adrenaline now to prepare myself for that eventuality. But no, my predictive circuits tell me this is an extremely low probability event, so no need to expend much energy preparing for it. What if the next sentence you read had an adjective followed by a verb? All bets would be off: the author’s choice of words is no longer limited by the chain, so you, the reader, need to be ready, in the mental picture you are building of the meaning conveyed by the sentence, for literally any possibility. Thanks to grammar for limiting the otherwise infinite possibilities of word choice and meaning making. This is true as much for the sentence producer as it is for you, the sentence receiver.

Secondly, the point of a linguistic code is, quite simply, to communicate, and without a shared understanding of the code there can be no understanding. So, when we “acquire” a language, what we actually acquire is an ever more nuanced understanding of how words in that language can be combined with each other to convey meaning, an ever better set of probabilistic predictions of what the next word might be. This is what makes communication possible. I can invent my own set of grammatical rules but, until you figure them out and can predict what I am going to put in the next slot each time, our communication will be pretty limited. Sharing a grammatical code reduces the noise in the signal to the point where communication becomes possible.^[1]

^[1]Predictive Processing theorists (notably Friston) extend the signal/noise metaphor into theoretical physics and talk in terms of free energy and entropy.

An Aside about Poetry

What poetry does that makes it so satisfying to read (or so disconcerting when the poet subverts the conventions) is to add further constraints to the words we can use to fill a particular slot. This choice is constrained not only by the usual limits of grammar and semantics (after all, it’s supposed to make sense on some level), but also by the expectations of metre and/or rhyme. These extra constraints serve to further limit the possibilities for choosing a word: “There was a young man from Peru, who decided to build a ____.”

Seriously, no prizes for guessing that one, even if it is the first time you’ve heard it. Kalamazoo is too long. “Long and lasting relationship” does not rhyme. Igloo would need “an,” and kazoos are made, not built. Better predictions = greater sense of satisfaction when they are proved right (actually, dopamine—the reward both for making a prediction and for getting it right).

What does this mean for our students? It means that their proficiency (whether for production or reception) in the language they are studying is a direct consequence of the extent to which their probabilistic predictions about the language they are using (for short, we can call this their “internal grammar”) match those of the people they are communicating with. Better predictions = greater proficiency. And, hey presto, our job as language teachers is to help students make better and better predictions.

Meanwhile, back in the classroom, it should be clear from all this why simply knowing a grammar rule does not make a student’s English better. They need to experience the language, to build a set of expectations about how it is likely to be used (“priors” in the jargon of Predictive Processing). How do they build these priors? By exposure to as much grammatically acceptable, meaningful language use as possible. Extensive reading, anyone? Extensive listening? Oh yes, and sleep, too. Sleep that allows the brain to process the events of the day, the word patterns encountered, predictions fulfilled, and, especially, unfulfilled, to distil these multiple experiences into a set of improved priors.

Remember, most of this is unconscious. It is the “native speaker intuition” that has enabled many of us to make a decent living over the years. It is the intuitive understanding that a word-choice is wrong, and why we tell our students they had better ask a non-native speaker to explain a grammar point. Whether explicitly taught grammar has a role in acquiring “native-like” intuition is a discussion I will leave for the experts and the polemicists. What we do know is that our brain is constantly using sensory information from the environment to modify its expectations of the world. We teachers can mold and modify the environment, but without repeated exposure to the set of probabilities we call grammar, students have little hope of developing useful predictions about their new language. It is all about patterns of words in context. This means we should not teach words or grammar rules in isolation. It means maximizing students’ exposure to contextualized examples. Their pattern-recognition and processing systems can then take full advantage of the information available to improve the students’ predictions about the language.

This is not really new. I remember visiting my ophthalmologist and being handed a card of paragraphs to read, the font of each paragraph smaller than the paragraph before it (see the illustration). She intended to test my eyesight, but a paragraph consisting of highly predictable word-choices would have measured mainly my familiarity with the word choices, since I would have had a fair chance of guessing (predicting) anything my eyes were too weak to see. When I examined the card, though, I found that every word was followed by one that was grammatically possible but of very low probability (you may want to look at the illustration again now). Clearly, ophthalmologists are well acquainted with the predictive processes at the heart of language processing.

What is new, though, is how this ophthalmological wisdom, fading jargon once used to explain syntax (Chain and Choice), and a whole host of other things combine to verify a powerful new theory in neuroscience, predictive processing, a theory which gives us a consistent understanding not only of how the brain deals with language but of how it deals with (just about?) everything. It’s good to be able to see this coming.

Stephen M. Ryan teaches at Sanyo Gakuen University, in Okayama, when he is not travelling in search of new puzzles, poems, and priors.
