The Listening-Pronunciation Connection: Four Linked Practices for Improving “Pronunciation Flow”

By: Michael Rost

Listening and pronunciation are closely linked—both in how we acquire the two skills and how they depend on each other neurologically. When we articulate sounds, our motor cortex gets direct messaging from our auditory cortex about the target sounds—informing us how to contort the muscles of our face, mouth, tongue, lips, and throat to produce speech. The way that we perceive sounds creates a template for how we will pronounce them. Simply put, we cannot consciously articulate what we can’t also perceive.

This short article looks at some practical interconnections between listening and pronunciation to help students develop a “flow” of pronunciation. These exercises aid pronunciation flow by drawing awareness to “discourse phonology” rather than word phonology: in particular, contrastive stress, articulation of key words, timing of pauses, and use of weak forms (assimilations and reductions). By helping students understand the larger system of English phonology, rather than focusing on enunciation of individual sounds, we can escape from the myopic trap of emphasizing correct diction and teach pronunciation in a more holistic way.

Background: How Do Listening and Pronunciation Development Overlap?

I have been interested in listening and pronunciation for as long as I can remember. Growing up in a large extended family in Cincinnati, I felt like I seldom had a chance to speak, but I had abundant opportunities to listen. As a budding mimicry artist, I also found it fascinating to pay close attention to people’s speaking styles and idiosyncrasies. Our peculiar Ohio Valley parlance, which is a kind of blend between the lilting Louisville dialect and a low-keyed, down-home Appalachian twang, ladled over standard Midland American English, made this hobby particularly amusing. 

One thing I learned early on in my observations is that speakers habitually take “short cuts” to express what they want to say. I noticed people tended to use simplifications (like Chubinupto? for What have you been up to?) and abbreviated codes (like Nadaklu for I don’t have a clue) when talking to people in their inner circle. But somehow, despite the cryptic nature of the exchanges, people seemed to understand each other readily.

It wasn’t until I went to graduate school many years later and took courses in phonology and dialectology, from the likes of the venerable Francis Katamba, John Ohala, and Alleen Pace Nilsen, that I began to understand that these short versions were neither random nor a sign of laziness. I gradually became aware of the intricate linguistic systems (such as stress timing) and pragmatic systems (such as the principle of minimization) that underlie these “short cuts” and “dialect codes.” I learned to value the spoken language as a separate realization system from the written language, even though they share the same underlying semantic and syntactic structure.

As a teacher in my university’s intensive English program, I became fascinated with these simplification patterns in the spoken language and was surprised that there were no ready materials to teach simplified pronunciation to students. So (entrepreneur that I am) I teamed up with a colleague to produce a series of audio worksheets to help students “crack the code.”

Steeped in the Audio-Lingual Method as we were at the time, we first created a grammatical syllabus. For each grammatical point we recorded two versions of the target structure (like Present tense question, V + INF: “Do you want to go?”) and labeled them “long” (Do you want to go?) and “short” (D’ya wanna go?). The students would listen to the target sentences in contextualized “real world” conversations and identify the target forms as either “long” or “short.” We knew it was a pretty basic “hack,” obviously overgeneralized (producing and perceiving reductions is actually a gradation rather than a disjunctive either-or choice). Even so, this rudimentary approach did help students raise their awareness about the differences between written English (as an alphabetic or visual system) and spoken English (as a phonotactic or oral system).

We (Ken Stratton and I, with guidance from our mentor, George Landon) later developed these worksheets into an audio-book which we entitled Listening in the Real World: Clues to English Conversation. (Clever title, eh?)

We made the task of detecting phonological contrasts central to the methodology. We had students identify the spoken input as one of two versions: citation (long) forms (in which each phoneme is clearly articulated) or reduced (short) forms (in which some vowels are reduced and some consonant clusters are assimilated). This version of Phonology 101 seemed to resonate with lots of teachers (and became our first successful book in the Lingual House lineage[1]). The method allowed students to explore the bottom-up processing difficulties they had in listening: the blurs of reduced speech that were impossible to dissect, and therefore were also problematic to pronounce.

[1] For those who may enjoy connecting with the “good old days,” here’s a flashback: Lingual House is a company I formed with Ken Stratton because we felt our listening book filled an important gap in language teaching and we were determined to get it published. After major publishers all summarily rejected our proposal, we did our own version of a Kickstarter campaign and launched Lingual House. Buoyed by the success of the inaugural project, we went on to publish other successful books and series, some by Japan-based authors, including Marc Helgesen (English Firsthand) and Curtis Kelly and Ian Shortreed (Significant Scribbles). We eventually sold the company to Longman (now Pearson), which republished the books under its own imprint.

Approach: What Kind of Instruction Leads to More Intelligible Pronunciation?

For me, the central goal of pronunciation instruction is to improve intelligibility, rather than to strive for an elusive native-like accent. Pronunciation experts seem to agree upon three key elements that are essential in improving the intelligibility of non-native speech:

    • ARTICULATION – Producing individual sounds with clarity (pronounce the 24 consonants and 20 vowels of English as accurately as possible)

    • TEMPO – Generating a comprehensible flow of information (speak in distinct thought groups, employ rules of connected speech)

    • RHYTHM – Creating harmony, through contrastive stress and the musicality of intonation (utilize appropriate variations in loudness, duration, and pitch range)

All three are essential in pronunciation teaching. What I have found is that the most “underserved” of these three is the mastery of tempo, or the flow of spoken language. Tempo fluctuations, which bring about blending of consonants (assimilation) and weakening of vowels (reduction), are also, unsurprisingly, a consistent source of listening difficulties.

Essentially, to achieve a more natural tempo (what I’m calling “flow”), the speaker has to utilize three fundamental phonotactic rules in English:

Rule 1: When two or more consonants are juxtaposed in any utterance (the consonant at the end of one word and another consonant at the beginning of the next word), only one consonant will be fully articulated. The other(s) will be assimilated into the pronunciation of the dominant consonant; that is, they will become more like the dominant sound.

The most common of these are:

    1. /t/ changes to /p/ (or a voiceless bilabial stop) before /m/, /b/, or /p/

basket maker, mixed blessing, set point

    2. /d/ changes to /b/ before /m/, /b/, or /p/

bad pain, hold back, gold mine

    3. /n/ changes to /m/ before /m/, /b/, or /p/

brown paper, chicken breast, question mark

    4. /t/ changes to /k/ before /k/ or /g/

credit card, cut glass

    5. /d/ changes to /g/ before /k/ or /g/

red carpet, Grand Canyon

    6. /n/ changes to /ŋ/ before /k/ or /g/

tin can, common ground

    7. /s/ changes to /ʃ/ before /ʃ/ or /j/

bus shelter, nice yacht

    8. /z/ changes to /ʒ/ before /ʃ/ or /j/

cheese shop, where’s yours?

    9. /θ/ changes to /s/ before /s/

birth certificate
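
For teachers who like to see the patterns in one place, the nine assimilation patterns listed above can be sketched as a small lookup table. This is only an illustration, not part of the classroom method: the names `ASSIMILATION` and `assimilate` are hypothetical, and the phonemes for each phrase are supplied by hand rather than derived from spelling.

```python
# Assimilation patterns from the list above, keyed by the word-final consonant.
# Each entry pairs a set of triggering word-initial consonants with the sound
# the final consonant shifts toward. (Hypothetical names, for illustration only.)
ASSIMILATION = {
    "t": [({"m", "b", "p"}, "p"), ({"k", "g"}, "k")],
    "d": [({"m", "b", "p"}, "b"), ({"k", "g"}, "g")],
    "n": [({"m", "b", "p"}, "m"), ({"k", "g"}, "ŋ")],
    "s": [({"ʃ", "j"}, "ʃ")],
    "z": [({"ʃ", "j"}, "ʒ")],
    "θ": [({"s"}, "s")],
}


def assimilate(final: str, initial: str) -> str:
    """Return the sound that word-final `final` shifts toward before `initial`.

    If none of the listed patterns applies, `final` is returned unchanged.
    """
    for triggers, result in ASSIMILATION.get(final, []):
        if initial in triggers:
            return result
    return final


# Hand-supplied (final sound, next initial sound) pairs for phrases in the list.
examples = {
    "set point": ("t", "p"),
    "gold mine": ("d", "m"),
    "tin can": ("n", "k"),
    "bus shelter": ("s", "ʃ"),
    "birth certificate": ("θ", "s"),
}

for phrase, (final, initial) in examples.items():
    print(f"{phrase}: /{final}/ -> /{assimilate(final, initial)}/")
```

A table like this treats assimilation as all-or-nothing, which is a simplification; in real speech the change is a matter of degree.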

Rule 2: Whenever a word within a prosodic group (thought group) ends in a consonant and the next word begins with a vowel, that final consonant will jump over to join the following vowel, becoming the onset in a CV or CVC (consonant-vowel or consonant-vowel-consonant) structure.

types of => typ-sof    which one => whi(t)-chone
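
Rule 2 can also be sketched in a few lines of code. The sketch below assumes each word has already been transcribed by hand as a list of phoneme symbols; the `VOWELS` set and the `link` function are hypothetical helpers with a deliberately simplified vowel inventory.

```python
# Simplified set of vowel symbols (an assumption for this sketch, not a full inventory).
VOWELS = set("aeiouəɪʊɛæʌɔɑɜ")


def link(words):
    """Move each word-final consonant onto a following vowel-initial word.

    `words` is a list of words, each a list of phoneme strings.
    Returns a new list of resyllabified words; the input is not modified.
    """
    linked = [list(word) for word in words]
    for i in range(len(linked) - 1):
        current, following = linked[i], linked[i + 1]
        ends_in_consonant = len(current) > 1 and current[-1][0] not in VOWELS
        starts_with_vowel = bool(following) and following[0][0] in VOWELS
        if ends_in_consonant and starts_with_vowel:
            # The final consonant becomes the onset of the next syllable.
            following.insert(0, current.pop())
    return linked


# "types of", transcribed by hand
types_of = [["t", "aɪ", "p", "s"], ["ə", "v"]]
print(" ".join("".join(word) for word in link(types_of)))  # prints: taɪp səv
```

The output mirrors the "typ-sof" respelling above: the /s/ leaves "types" and becomes the onset of the next syllable.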


Rule 3: Whenever a vowel is in an unstressed word, it will be centralized: the speaker’s tongue will tend to assume a central or neutral position in the mouth (in preparation for forming the next vowel more quickly) and reduced (the speaker will give it less voicing, less length and decreased volume, often eliding it entirely). Typically, all unstressed vowels will be centralized toward: /ʌ/ as in “hut” (the back central vowel), /ə/ as in “comma” (standard central vowel), or /ɜ/ retroflex central vowel preceding /r/ as in “her.” (See Figure 1.)

Figure 1: Vowel chart of major English vowels. Horizontal axis refers to articulation position of the tongue mass in the mouth (front, central, back). Vertical axis refers to relative position of the tongue and the roof of the mouth. Reduced or weakened vowels tend to be centralized, that is, pronounced in the region of /ə/, /ʌ/, or /ɜ/.

These principles do not need to be learned explicitly as “rules” per se (and L1 speakers do not learn them as declarative rules), but the underlying principles are needed to inform both perception and production in the mastery of spoken English.

Examples: Practical Listening-Speaking Exercises to Improve “Pronunciation Flow”

The basic methodology I promote in teaching pronunciation flow is examining examples of spoken language. I will select multiple short examples from a listening passage that illustrate one pronunciation principle. I try to find 5-10 key sentences from an extract (lecture, film, conversation, song, etc.) that students have just listened to. (The “recency effect” of memory promotes better learning: The more recently something is heard, the clearer it will be in your memory.)

Here are the steps I follow:

    • Prepare the written text of these target sentences for display.
    • Prepare an audio file of just these isolated target sentences. Replay the audio recordings of these target sentences one by one. (Alternatively, you can say the target sentences in your own voice.)
    • Set up a “marking task,” where the students make diacritical marks to demonstrate understanding of the target: key words (circle), stress (underline), pauses (slash), intonation (arrow), linking (crescent). I find it most effective to focus on just one aspect of pronunciation in each short lesson.
    • At the end, have the students repeat and record the target sentences, using their own transcript (with their markings), focusing on the target point.

There are four specific practice types I use to promote better pronunciation “flow.”

TYPE 1: Noticing and reproducing stress

Stressed syllables are tenser (more contraction in the diaphragm), louder (more decibels), and longer (more extended time). Deliberate Practice Technique: Overemphasize the stress: make the stressed syllable extremely tense, long, and loud.

Exercise type 1: Notice and reproduce syllable stress

    1. Listen to each sentence. Notice the stressed syllables (in capital letters). These syllables are louder, longer, and higher in pitch. Underline the stressed syllables.
    2. Press “record” and repeat the sentence. Be sure to stress the correct syllables.


      1. First, I want to talk about how psychologists like myself measure happiness.

First, I want to talk about how psychologists like myself measure happiness.

      2. One common method psychologists use is interviewing people.

One common method psychologists use is interviewing people.

[2] These are target sentences taken from various lectures. Source: Contemporary Topics, Pearson, 2021.

TYPE 2: Noticing and reproducing pauses

A pause is a momentary “planning stop” for the speaker to prepare the next burst of speech (thought group). It is the complete stoppage of all audible vocalizing. Deliberate Practice Technique: Overemphasize the pauses: make them extremely long. At the same time, attempt to bunch the sounds of the thought group together; articulate them quickly.

Exercise type 2: Notice and reproduce thought group pauses.

    1. Listen to each sentence. Notice the pauses—the speaker groups an idea, then pauses. Make a pause mark (/ for very short pauses; // for longer pauses).
    2. Press “record” and repeat the sentence. Be sure to make very clear pauses for each / mark.


      1. If you get less than seven hours of sleep on most nights you’ll start suffering from sleep deprivation.

If you get less than / seven hours of sleep / on most nights // you’ll start suffering / from sleep deprivation. //

      2. So basically anything to do with memory, making decisions, thinking, all of these are affected by the lack of sleep.

So basically / anything to do with memory // making decisions / thinking // all of these / are affected by the lack of sleep.
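
If you keep your marked transcripts in digital form, the pause notation above (/ for very short pauses, // for longer pauses) is easy to process automatically, for example to display one thought group at a time. The following is a minimal sketch; the function name `thought_groups` and the labels "short" and "long" are assumptions made for this illustration.

```python
import re


def thought_groups(marked):
    """Split a pause-marked sentence into (thought group, pause) pairs.

    The pause is "short" for "/", "long" for "//", and None when the
    text ends without a pause mark.
    """
    # The capturing group keeps the pause marks in the result, so text and
    # marks alternate: [text, mark, text, mark, ..., text].
    parts = re.split(r"(/{1,2})", marked)
    labels = {"/": "short", "//": "long", None: None}
    groups = []
    for i in range(0, len(parts), 2):
        text = parts[i].strip()
        mark = parts[i + 1] if i + 1 < len(parts) else None
        if text:
            groups.append((text, labels[mark]))
    return groups


sentence = (
    "If you get less than / seven hours of sleep / on most nights// "
    "you’ll start suffering / from sleep deprivation.//"
)
for text, pause in thought_groups(sentence):
    print(f"{text}  [{pause} pause]")
```

Each thought group comes back with the pause that follows it, so a practice tool could hold a long pause for longer than a short one.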

TYPE 3: Identifying key words

One key word or phrase will dominate each thought group. You will hear this word or phrase very clearly (louder and with more precise pronunciation). The other words in the thought group will be weaker, and you won’t hear them clearly. Deliberate Practice Technique: Articulate the key words super clearly: make each sound of the word very precise. Or shadow (repeat) only the key words and hum (or silently time) the other weaker syllables.

Exercise type 3: Notice and reproduce key words.

    1. Listen to each pause group (marked by /). What are the key words that the speaker stresses most? Underline them.
    2. Press “record” and repeat the sentence. Be sure to give extra strong emphasis to the key words; make them louder and longer.


      1. In sociology / we study social groups / and how people interact / and respond to each other.

In sociology / we study social groups / and how people interact / and respond to each other.

      2. For example / depending on / if I say my name is “Alex” / or “Alexandra” / or “Dr. Shaw” / you might respond differently to me.

For example / depending on / if I say my name is “Alex” / or “Alexandra” / or “Dr. Shaw” / you might respond differently to me.

TYPE 4: Noticing sound changes (reduction, linking, and assimilation)

There are three types of sound changes you will notice: vowels are reduced (reduction), consonants are blended together across words (assimilation), and consonants jump to the next word (linking). Deliberate Practice Technique: For each sentence, identify just one sound change that you are sure you can hear: a reduction, a linking, or an assimilation. Practice repeating it multiple times.

Exercise type 4: Notice sound changes

Listen to each sentence. Notice how the speaker links the underlined sounds. Then press “record” and repeat the sentences.

    1. Find a linking of a consonant to the next word.

It’s the theory of multiple intelligences.

    2. Find a reduced vowel in this sentence.

Then I plan to present how the theory has affected what some teachers now do in the classroom.


CONCLUSION: Charting Progress Toward More Intelligible Pronunciation

From our own experiences as L2 learners and as teachers, we all know progress in pronunciation can be elusive. To work towards gradual and long-lasting changes in pronunciation, regular, focused exercises with discourse-level pronunciation rather than word-level pronunciation seem to be the most effective. Because students are unlikely to grasp this “top-down principle” of pronunciation on their own, it is up to the teacher to make this shift in pronunciation instruction, giving discourse fluency more emphasis than word accuracy, or at least equal emphasis.

This short article has provided some personal history of my own approach to teaching pronunciation, contextualizing pronunciation practice with listening practice. This approach can be very effective in that it helps students link listening strategies and speaking strategies, finding ways to become more intelligible and to communicate more effectively.


Michael Rost (Ph.D.) is an independent scholar, author, editor, and teacher trainer, and a recent member of the Odyssey Swim Club, doing distance swims in the San Francisco Bay. He has written a number of academic books and is author or editor of several language courses, including English Firsthand, Impact Issues, and Pearson English Interactive.

