The 3 Ks Explore Listening and the Brain

By: Heather Kretschmer, Curtis Kelly, and Mohammad Khari

Editors’ note: If you are going to teach listening, then we figure you ought to know how the brain does it. But it is hard to find language teacher writers versed in that area, so three of our editors (whose last names all start with a K) decided to dig deep and write a short summary of it.

This article has three parts. 1) We will start with the basic mechanics of sensory processing. 2) Then we will go on to how the brain uses prediction to lighten the processing load. And finally, 3) we will look at embodied simulation.

1) From ears to brain

Let’s imagine something for a moment. Picture yourself grading some tests at home next to an open window when two neighbors suddenly strike up a quiet conversation right outside. At first you are so engrossed in grading that the conversation hardly intrudes on your thoughts. But then you hear your name, which motivates you to focus on their conversation to figure out what they’re saying. As you strain your ears to understand, you catch part of what they’re saying and fill in the blanks for the rest. How do your ears and brain work together to make sense of the conversation?

Complex processes take place for you to hear and understand the conversation outside your window. Let’s begin with how the sound waves from your neighbors’ voices move through your ears to different areas of your brain.

Sound waves travel through the outer and middle ear to the snail-shaped cochlea in the inner ear.

The cochlea is the place where mechanical sound waves are converted into neural signals that the brain processes. Filled with fluid, the cochlea has thousands of hair cells. Each hair cell is linked to the auditory nerve, which is a line of nerve cells connecting the cochlea to the brain. As the sound waves reach the cochlea, they set off ripples in the fluid. These ripples move the top part of the hair cells, called stereocilia. When the stereocilia move far enough, the hair cell generates a neural impulse or signal. This neural impulse stimulates a neighboring nerve cell and sets off a chain reaction through the auditory nerve to different parts of the brain involved in receiving and decoding sounds (The Inner Ear; Journey of Sound to the Brain; How The Ear Works; How Do We Hear?).[1]

As you’re zeroing in on the conversation outside your window, your neighbors’ voices are not the only sounds in your environment. You also hear cars passing by, birds singing, the distant rumble of thunder, and so forth. How does your brain focus your attention on the conversation? Let’s dive a little deeper into what’s going on in your brain.

[1] To see a good depiction of this process, we recommend watching Journey of Sound to the Brain.

Auditory pathways & speech comprehension

When hair cells generate neural signals, these signals are sent to either a primary auditory pathway or a non-primary auditory pathway in the brain. While primary auditory pathways only transmit neural signals from the cochlea, the non-primary auditory pathways send all kinds of sensory signals (Pujol, 2020). This diagram shows the primary auditory pathway:

Source: Auditory Pathway by Jonathan E. Peelle in Wikimedia Commons (used under license)

The primary auditory pathway involves a series of relays through which auditory neural signals are passed from the cochlea to the auditory cortex. During this journey, various brain structures interpret these neural signals into frequency, intensity, duration, and location. By the time the neural signals reach the auditory cortex, they have largely been interpreted (Pujol, 2020).[2] The auditory cortex is where we consciously perceive sound. Interestingly, neural signals don’t only move from the cochlea to the auditory cortex but also from the auditory cortex to the cochlea (Peterson et al., 2021). Why? The brain can ask the cochlea to prioritize certain sound information, allowing it, for example, to focus primarily on your neighbors’ conversation outside your window instead of other sounds like birds singing (Pujol, 2016).

As mentioned earlier, auditory neural signals are also transmitted with other sensory neural signals (e.g., visual neural signals) via non-primary auditory pathways in the reticular formation. The reticular formation is located in the brainstem and connects to many other areas of the brain. One important responsibility of the reticular formation is to choose which type of sensory signal the brain should pay attention to first (Pujol, 2020).

So, in our example of the neighbors chatting outside your window, your reticular formation helps you focus on the auditory neural signals occasioned by your neighbors’ conversation instead of on visual neural signals from the tests you’re looking at. What we each hear and choose to focus on is heavily influenced by what each of our brains considers important. When we’re teaching our students to listen to a foreign language, we need to keep in mind that they may not know what’s important to focus on in the foreign language. They may, for example, feel they need to understand every syllable to be able to understand what someone is saying. This isn’t true, and we need to give them appropriate guidance to help them focus on what is crucial to understanding.

Many areas of the brain are involved with understanding speech. Two processes occur. One is the phoneme-perception process, in which the auditory neural signals are analyzed for meaningful speech sounds. The second process involves retrieving the semantic meaning associated with the auditory neural signals (Binder, 2015). Both processes take place very quickly as the brain recognizes it is hearing speech almost immediately after the sound passes into the ears (University of Maryland, 2018). But remember how you didn’t catch everything your neighbors said but still managed to figure out their meaning? How is that possible?

[2] This video, Ascending Auditory Pathway, explains succinctly how neural signals travel from the cochlea to the auditory cortex.

2) Listening as predicting

Up until now, we’ve mostly looked at how sound travels to your brain and is perceived. But registering sounds is just a small part of the process and, as we shall explain, it is less of a story of outside to inside than inside to outside. Why is that? Because the brain predicts. And once it has done so, it pretty much disregards anything coming in that fits that prediction. This is a radical change from the way most people usually think of how the brain processes sensory input. It’s a hard concept. So, to help you see how this happens, let’s go back to our starting story:

You pack up the tests you were grading and you’re heading out the door for work. You are going through a mental checklist to make sure you didn’t forget anything: Wallet? Train pass? Got it. Homework papers?… You pull the door open and freeze. What the… Someone is standing right in front of you!

You jump. You stare. For a whole two seconds your brain goes into a hyperdrive whirl trying to figure out what is happening. Is this the right place? Am I in danger? Did I forget a promise? Then your friend says “Hi. I just came by to see if you wanted a ride.” In a low voice, you say back, “Oh… I did not expect that.”


We love to go on about how wonderful the brain is, able to do more than the greatest computer. But we forget that all this is being done by a little 3-pound guy, only a couple percent of our body mass, and we are asking too much of it.[3] Even though we keep it running 24 hours a day, it is overloaded. Way overloaded. And it’s a wonder it does not give up entirely.

Think about it. The human brain does not just sense the world, it fills it in. You know that clock on the wall is not part of the wall itself because your brain is constantly organizing everything you sense into things via mental models. A particular sound pattern becomes a word. A particular light pattern becomes a cat. What we call direction or frequency is a matter of which sensory neurons are being stimulated. Direction helps us identify the parts that go together, but what we do with frequency is utterly amazing. We make sounds and tones out of air vibrations, and we make colors out of light frequencies. Colors and sounds do not exist in the world any more than words do; our brains make them. We paint the world and make it a symphony. Doing so helps us identify important differences. Knowing whether a strawberry is red or brown, colors that look almost the same in monochrome, might be the thing that saves your life.

This idea, that the brain paints the world, is a bit jarring, but if you think about it, the only thing nature gives us is frequencies. We do the rest. We do this for air vibrations and light frequencies because they are particularly important; most of us don’t do it for other signals we encounter, such as heat, air pressure, or the strength of a touch.[4]

So that little guy is overloaded, and using mental models and reorganizing frequencies into sounds and colors are just two of the ways he has learned to cope. But he has one more trick, one that is even more amazing and not even that widely known among brain experts. He predicts.

Most people still think the brain processes the world, but in the last ten or so years, a completely different view has emerged in which we see the brain as predicting the world, and by doing so, saving itself from a huge amount of work.

It goes like this. If the brain operated by processing every sensory input, thousands per minute, by matching them up to internal models, we’d be frozen most of the time, just like when you opened the door and saw that good-looking person standing there.[5] The person at the door was totally unexpected, so your brain had to figure out from scratch what was happening. And so you jumped, froze, and could not respond right away.

[3] Kelly has made the brain male because he says his, at least, has a hard time listening, always thinks it is right, and forgets birthdays.

[4] But some people do. Synesthesia is a condition where stimulation in one sensory area leads to involuntary sensations in others. Some synesthetes see colors when they hear sounds. Others see numbers in different colors, genders, or spatial locations. Almost every sense has examples of synesthesia including touch, taste, and movement. (source)

[5] “Wait a minute!” you might say. “You told us in the first part that we only focus on what is important for us, so we don’t process every input, do we?” Yes and no. We have to collect the sensory data before we evaluate whether it is worth focusing on, so selective attention and processing of meaning are still way downstream.

Now, imagine if you were like that every second of every day. That is what life would be like if our brains had not developed this little trick of predictive processing, which relieves them of all that work. The brain predicts. Not once in a while. Not just about what will happen later in the day. Rather, it is constantly predicting what is happening now and what to expect in the next second. It does this by taking all your prior experiences of the world, which it has crafted into simple cartoon-like models, and laying them down on the current situation you are in. It predicts the now by using what’s happened before.

If you walk into a library, your brain predicts it’ll be cool, quiet, and filled with books. If you walk across a street, it predicts you’ll meet pedestrians, bicycle riders, and cars stopped or slowing. That makes it easy-peasy for that little guy. As long as everything fits what was predicted, the processing is done. It is only the errors in those predictions that demand our attention: a big dog by the bookshelf, or the car speeding up instead of slowing down. (By the way, if you noticed the misspelled “Listening” in the article header, your prediction error system is doing well.) One offshoot of predicting is that far more neural firings go from the brain to the sensory areas than the other way around. The brain is telling our sensory areas what it thinks will already be there.

So, what does this have to do with listening? Quite a bit, actually. Our brains have developed models of sound patterns that represent words, phrases, and other kinds of utterances along the same line as the visual mental models we have for clocks and walls. But more. Those are just static objects. More important are processes. Does that dirty look on that dog mean it will attack? Will that car be able to stop before it hits someone?

And so it goes with language. Our brains have developed these wonderful predictive models for processes to reduce the otherwise massive load of language processing. That is why we have grammar. It is basically a tool we use to predict what words we are hearing now (or reading, writing, etc.), and what words we will encounter next. If someone utters the pronoun “He,” we are already primed to hear a verb come next. If the verb is “went,” we expect “to” and a place, as being the most probable items coming next, but we also have “went crazy” or “went ballistic” activated as less likely possibilities. We do not at all expect the next word to be “receipt” or the tens of thousands of other words that do not fit that grammar or situation. So, grammar helps us understand what your neighbor is going to say even before the words come out of her…

It also helps us understand what we are hearing at the moment. We only hear a fraction of the words someone utters, but this wonderful predictive tool, grammar, along with our models of what is likely to happen in any situation, helps us fill in the blanks.
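For readers who like to tinker, the “He went to…” example can be mimicked with a toy sketch. To be clear, this is not how neurons do it and not a model taken from the research cited here. It simply counts which word tends to follow which in a tiny invented set of sentences (our stand-in for “prior experience”) and uses those counts to rank what is likely to come next. The sentences and the function names are all made up for illustration.

```python
from collections import Counter, defaultdict

# Toy "prior experience": a few invented sentences (an assumption for
# illustration only, not data from any study).
CORPUS = [
    "he went to school",
    "he went to work",
    "he went to town",
    "he went crazy",
    "she went to work",
]

def build_bigram_counts(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return (next_word, probability) pairs, most likely first."""
    followers = counts.get(word)
    if not followers:
        return []
    total = sum(followers.values())
    return [(w, n / total) for w, n in followers.most_common()]

def fill_blank(counts, prev_word, candidates):
    """Guess a missed word: pick the candidate most expected after prev_word."""
    followers = counts.get(prev_word, Counter())
    return max(candidates, key=lambda w: followers[w])

counts = build_bigram_counts(CORPUS)
print(predict_next(counts, "went"))
print(fill_blank(counts, "went", ["to", "receipt"]))
```

With these sentences, “to” comes out as the strongest expectation after “went,” “crazy” as a weaker one, and “receipt” is never expected at all. Real listeners draw on far richer models (grammar, situation, and world knowledge), so treat this only as a cartoon of the idea.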

Now, I wonder if you see where we are going with this? “Fill in the blanks” was a hint. Once you realize that we are engaged in predictive language processing all the time, including when listening, then it becomes obvious that some of the ways we teach listening are probably good and some probably not.

Think about students at the late beginner or intermediate level. If we may, we deem the method of having students listen to a text and fill in blanks with a few carefully chosen words as good. We deem the method of having them do a complete dictation, stressing that they must hear every word, as bad, or at least not representative of how people really listen. Even native speakers do not hear every word.

So, the next time you teach listening, use language that is fairly predictable with lots of situational cues, such as supporting pictures. Make sure the language is meaningful and in context.

Go easy on that little guy.

3) Embodied simulation

We talked about how the brain uses mental models to interpret what it is sensing, but where do those models come from? A database? Or, for language, a neural dictionary? Databases and dictionaries come from computers and books, ways we store information in the outside world, but what happens inside is far, far more interesting. For starters, think “web.”

A large part of our brain consists of cortices, huge areas in the neocortex that manage vision, touch, hearing, smell, and motor actions: the senses and movement. In each of these areas, firing routines develop to represent a thing or action. A group of neurons might fire together to represent a piano note. Another group might fire to move a finger (or interpret someone else’s moving of a finger). Or a group might fire to recognize a cat’s ear. These groups of neurons have fired together before, and as a result have wired together, thus making the model. Their association came from a succession of similar experiences in the world, which allowed them to define the model based on what was similar in each experience.

These neurons are not dedicated to making just one model; most of the neurons that fire to represent a cat’s ear will also fire to represent a dog’s ear, with some for fur, and some for triangles. And the cat’s ear circuit will be a part of the model for cats, Garfield, and litter boxes. And more: the word “cat.” Likewise, the ear model will be connected to other cat things, like the motor routines for running, curling up, and licking. That is why we asked you to think web. You build models that reuse the same neurons again and again in different combinations and that spread out into related sensory and motor memories.

As we have learned from Bergen and Feldman Barrett, we use those models, in simulations, to process language. If you hear “The cat walked across my stomach,” your brain fires up all the visual, auditory, somatosensory, and motor routines to simulate that action (to some degree as if you were doing it yourself). Your brain creates simulations like these for every bit of language you hear, and all based on your own experiences in the world. Indeed, we are truly constructivist.

What this means for teaching listening, then, is that every person has their own unique set of mental models based on their unique experiences over time. The “cat’s ear” a Thai simulates might be quite different from what a textbook author in England simulates. And so it goes for “family,” “excited,” “ferry,” and everything else. In addition, since the embodied simulation of words involves huge parts of our sensory-motor apparatus, pictures or other multisensory (and motor) input can be particularly effective scaffolding.

Finally, this is just a short summary of embodied simulation, but we promise to have more on it in a future issue. In the meantime, take a look at something we wrote on this topic a couple of years ago that also discusses the phonological loop.

And that, folks, is how the brain does listening. We hope these ideas will enhance your understanding of what is happening in class.



K1 Heather Kretschmer has been teaching English for over 20 years. She’s enjoying learning about the brain.

K2 Curtis Kelly (EdD) is a predictive processing nut, framing it as the grand unifying theory of the brain (click me). He is now hard at work predicting the big change he is about to make.

K3 Mohammad Khari is an English lecturer at Ozyegin University, Istanbul. He holds a BA in English Literature, an MA in Philosophy of Art, and a CELTA. Mohammad has been reading and researching on the integration of neuroscience into pedagogy.
