When we look back and ask exactly when AI took over, we may pinpoint November 2022. That was when ChatGPT was launched and reached a million users within a week. Technically, the victory of DeepMind's AlphaGo over Lee Sedol may have been an earlier signal that the technology had come of age (watch: Krieg & Kohs, 2017), but that victory was in the game of Go, which most people do not play. ChatGPT, on the other hand, demonstrated a high level of proficiency in language, an area in which most humans are expert, and therefore able to judge and, perhaps more tellingly, criticise any shortcomings. As it becomes increasingly difficult for language teachers to ignore or deny this fourth industrial revolution, we are faced with questions for the short, medium, and long term. In the short term: how do we identify AI, and how do we stop students from using it? In the medium term: how do we encourage sensible use, and what can we learn from the technology? And in the long term: where's the beach? Ultimately, the technology will likely either take all our jobs, or we will be locked in mortal combat with it. In either case, the beach may be the best place to be!
The short-term questions are easily answered: if we think our students are using AI, then they probably are. We cannot reliably identify AI-generated text, so there is no way to be sure. We cannot stop students from using it, and we should probably be more concerned about any students who are not experimenting with this new technology. While we are revising our teaching repertoires and asking whether there is any point in assigning tasks that AI can do, let us consider some more interesting questions: how did AI get to be so good at English, and what can that tell us about how humans should learn language?
Not long ago, computer-generated text, such as the output of machine translation, was laughable nonsense, immediately obvious to the teacher when it appeared in an assignment (Hutchins & Somers, 1992). Now large language models can produce text that is not only grammatical but also contains vocabulary appropriate to the context, catching the stylistic essence of various genres. The short answer to how the technology reached this level of proficiency is that we don't know.
While large language models (LLMs) seem to have reached proficiency suddenly, their path has taken decades. Computers have been entwined with language since the birth of the modern computer for code breaking in the 1940s (Haugeland, 1985). With hubris, early computer scientists believed they could solve the translation of natural language within five years (Weaver, 1949). Two decades later, the ALPAC report (1966) concluded that high-quality automatic translation comparable to human output was unattainable.
Early machine translation and natural language programs were rule-based, relying on fixed grammars and word lists, and achieved only narrow success (Bender & Koller, 2020). At the time, linguists debated whether language could be fully described by syntactic rules (Chomsky, 1956) or by its functional use in society (Halliday, 1973). The much later approach of LLMs has bypassed these debates entirely: they do not encode rules or functional grammars but instead learn to produce plausible text by identifying patterns in massive datasets using neural networks. This may sound familiar to anyone experienced with Extensive Reading (ER), an approach to language learning based on reading a lot of easy books (Day & Bamford, 1998; Jeon & Day, 2016; Nishizawa et al., 2010; Sakai & Kanda, 2005).
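As a toy illustration of rule-free pattern learning, the following Python sketch builds a bigram model: it learns only which word tends to follow which, from an invented three-sentence corpus of our own. Real LLMs use neural networks over billions of words rather than bigram counts, but the spirit is the same: plausible text emerges from examples alone, with no grammar rules coded anywhere.

```python
import random
from collections import defaultdict

# Invented toy corpus; no grammar is coded anywhere below.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat saw the dog ."
).split()

# Learn the patterns: for each word, record which words followed it.
follows = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word].append(next_word)

# Generate text by repeatedly picking a plausible next word.
random.seed(1)
word = "the"
text = [word]
for _ in range(8):
    word = random.choice(follows[word])
    text.append(word)

print(" ".join(text))  # e.g. "the dog sat on the rug . the cat"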
So how does ER work? In its purest form, ER tells students to ignore the grammar and not to worry about unknown words, promising that if they aim for fluency, reading will lead them to proficiency. A central tenet of ER is that learners internalize language structures implicitly through extensive, enjoyable reading, which leads to natural fluency over time. ER aligns with Stephen Krashen's Input Hypothesis, which posits that language acquisition occurs through exposure to comprehensible input that includes some items slightly beyond one's current level, thereby allowing learners to intuitively grasp language rules and usage without explicit instruction (Krashen, 1982; Day, 2019). ER also allows learners to develop cultural and contextual fluency by engaging with authentic texts that reflect the subtleties of human language, including idioms, slang, and cultural references. These elements are critical for achieving advanced proficiency, as they involve more than understanding vocabulary and syntax; they require an awareness of social norms, humor, and context-specific usage.
This seems to be exactly the approach LLMs have taken, which raises a provocative question: has AI been doing ER? The similarity between LLMs and ER is uncanny, but let's take a closer look, starting with what we mean by "AI."
Artificial Intelligence is an imprecise and unhelpful term. "Artificial" is usually understood as "not real," even though its actual meaning is "man-made." Likewise, "intelligence" is difficult to define, often leading to the lazy, circular idea of "what we humans can do but machines cannot." As the technology progresses, the goalposts are moved to protect our vanity. We must remember that when we develop technology, our aim is not to match human ability but to supersede it. We do not use bicycles because they move at the same speed as we can run, but because they move faster, with less effort, carrying more. Our intelligence is limited by our biology; the technology we develop is not. At some point AI will become more intelligent than us, and we will struggle to know what that means. This may already have happened.
To understand what this might mean in practice, it helps to look at what underlies the technology. More precise terms than AI are machine learning and deep learning. Machine learning somewhat mimics human learning: a computer is given a task and some training data, and then test data to check whether the task can be accomplished. Deep learning is so called because the networks that turn inputs into outputs contain hidden layers of connections between them. Essentially, we have a black box and do not know exactly how the results come out.
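To make "hidden layers" concrete, here is a minimal sketch in Python (with numpy) of a network with one hidden layer learning the XOR function, a task a network with no hidden layer cannot learn. The task, the four hidden units, and the learning rate are our own toy choices, and real LLMs are vastly larger, but the principle is the same: the learned behaviour ends up stored in opaque weights rather than in inspectable rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: XOR. Inputs and correct outputs (the "training data").
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights for one hidden layer of 4 units: this is the "black box".
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10_000):
    # Forward pass: input -> hidden layer -> output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: nudge every weight to reduce the error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

# Should print values close to [[0], [1], [1], [0]]: the behaviour was
# learned and is stored in W1 and W2, but was never programmed as rules.
print(out.round(2))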
People frequently cite the mistakes AI makes as evidence of its lack of intelligence, but mistakes are not evidence of incapacity. Users of ChatGPT become frustrated when answers are wrong and there is no explanation why; yet humans also make mistakes without knowing why they made them. The inability of LLMs to process numbers correctly highlights that the AI is focusing on mastering patterns in language, not on replicating all human abilities. And anyone who has experimented with AI chatbots will be acutely aware that they usually treat rules as mere guidelines, and often ignore them completely.
Beyond these limitations, large language models sometimes show surprising emergent abilities. Although they have been described as "stochastic parrots" (Bender et al., 2021), LLMs can demonstrate abilities that extend beyond simple pattern recognition: they generate creative responses, solve novel problems, and produce outputs in complex, emergent ways that were never explicitly programmed. This reflects an unexpected sophistication in artificial neural networks, reminiscent in some ways of emergent phenomena observed in human cognition. While the inner workings differ from the brain's, the analogy emphasizes learning through extensive exposure rather than prescriptive instruction.
Both LLMs and ER learners depend on contextual input to develop proficiency. Just as ER emphasizes immersion in a wide variety of texts to internalize vocabulary and structures naturally, LLMs improve through exposure to enormous amounts of language data. The key similarity is learning by doing rather than learning by memorizing rules. In both cases, repeated engagement with meaningful content allows patterns to be internalized organically. This suggests that ER’s approach of focusing on abundant, comprehensible input remains highly relevant in modern language learning contexts. But is AI’s language training truly following ER principles, or are the similarities merely metaphorical?
While it is tempting to see the success of LLMs as simply a victory of data input over rule following, we should not overlook the exponential increase in computing power and memory size on the slow road to computer language proficiency. Computers are sometimes thought of as electronic brains, but they are very different from our own neural networks. While electronic computers ultimately derive all their functions from one-dimensional strings of ones and zeros, each of the brain's 86 billion neurons has around a thousand synapses making connections in three dimensions. And the brain does not neatly divide its architecture, processing, and memory into ROM (read-only memory), RAM (random-access memory), and permanent storage. We cannot therefore equate a brain to a computer with an 86-gigabyte hard drive, or say it is close to the 175 billion parameters that allowed GPT-3 to produce impressive language. However, we can see that computers have recently entered the ballpark of the human brain's computing power, and neural networks and parallel processing imitate the way our brains process information (see Frank et al., 2008).
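As a back-of-the-envelope illustration of that ballpark claim (and nothing more, since a synapse is not a parameter and the units are not truly comparable), the figures above can be put side by side:

```python
# Rough orders of magnitude only, using the figures quoted in the text.
neurons = 86e9             # neurons in a human brain
synapses_per_neuron = 1e3  # rough average used above
brain_connections = neurons * synapses_per_neuron  # ~8.6e13

gpt3_parameters = 175e9    # GPT-3 (Brown et al., 2020)

print(f"Brain connections: {brain_connections:.1e}")  # ~8.6e+13
print(f"GPT-3 parameters : {gpt3_parameters:.1e}")    # ~1.8e+11
# The brain still "wins" by a factor of roughly 500, but the gap is
# now a few orders of magnitude, not an astronomical one.
print(f"Ratio: {brain_connections / gpt3_parameters:.0f}x")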
Additionally, we do not know how many millennia of the brain's evolution have contributed to our language abilities (see Worden, 1998). The data used to train the GPT-3 LLM was equivalent to a human reading one novel a week for around 25,000 years (Brown et al., 2020). This pre-training is the P in GPT, which stands for Generative Pre-trained Transformer. The novel is a relatively recent invention, but we can be confident that language has been around longer than that. While the "extensive" in ER means reading a lot, we usually expect learners to read hundreds of thousands of words (see Jeon & Day, 2016; Nishizawa et al., 2010), with a million words an ambitious target (Sakai & Kanda, 2005), and we cannot hope for people to read billions of words. On the other hand, our brains are already context-aware and language-ready, so we do not need to train them from scratch. We may therefore argue that modern computers are approaching the computing power and architecture of our brains, while the size of their training data exceeds what any human could read.
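The reading-years figure can be roughly reconstructed with round-number assumptions of our own: a training run of about 300 billion tokens for GPT-3 (Brown et al., 2020), a long novel of about 200,000 words, and one token treated as one word:

```python
# Our own round-number assumptions, not exact figures.
training_tokens = 300e9       # GPT-3 training run (Brown et al., 2020)
words_per_novel = 200_000     # a long novel
novels_per_year = 52          # one novel a week

words_per_year = words_per_novel * novels_per_year  # 10.4 million
years_of_reading = training_tokens / words_per_year

# Prints roughly 29,000 years: the same ballpark as the 25,000 quoted.
print(f"{years_of_reading:,.0f} years")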
This scale of training produces behaviour that begins to parallel human learning in ways reminiscent of extensive reading. While algorithms guide the training of modern LLMs, once trained, the models are not step-by-step procedures any more than the human brain is. The resulting networks operate through weighted connections that detect patterns rather than follow explicit rules. This is crucial to understanding why we have a "black box" problem: there is no algorithm to inspect, only emergent behaviour arising from billions of interacting parameters. And yet, in processing the textual equivalent of millennia of human reading, these models develop a kind of muscle memory for language, discovering subtle patterns and regularities that even expert humans might never be conscious of.
Both ER and AI language models focus on language input, but they process this input in fundamentally different ways (see Tomasello, 2008). While ER enables human learners to develop a nuanced understanding of language through meaning-based engagement, AI lacks the capacity for true comprehension and awareness. Despite AI’s ability to predict words and context, these systems can only partially grasp emotions or irony (Magee, 2023). Models like ChatGPT are trained on vast datasets containing culturally rich content, but they do not truly ‘understand’ culture or context; they can only replicate patterns they have correlated in training data (Bender et al., 2021). For now. This gap means that, although AI may produce language that appears fluent, it lacks the depth of understanding that is essential for true fluency, as cultivated through human practices like ER.
Conclusion
Large language models provide a powerful analogy for language processing and, despite their flaws, have succeeded in generating human-like language while sidestepping millennia of linguistic theories that sought to model how language works. LLMs, like the students of ER purists, have not been given strict rules or specific explanations of how the language works; they have simply been given a large body of language and left to find the patterns. The parallel between AI and ER is fascinating precisely because both seem to work through emergent behaviours we don't yet fully understand.
But has AI really been doing ER? If we take one simple definition of ER, reading for pleasure, then of course it has not. While we may, at a stretch, equate the algorithmic rewards given when models match their training data with the endorphins released in a reader's brain, AI cannot read for pleasure. Here ER can help teachers worried about students cheating: while AI can help students produce grammatical text, pass comprehension tests, or translate in or out of the target language with minimal engagement and little opportunity to learn, nobody can get AI to enjoy a good book. As we ponder our path through the AI quagmire and look forward to the beach, we can continue to promote this activity and celebrate our humanity.
References
ALPAC. (1966). Language and machines: Computers in translation and linguistics. National Academy of Sciences.
Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5185–5198). Association for Computational Linguistics.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623). ACM.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2(3), 113–124.
Day, R. R. (2019). ER: Learning to read by reading. MindBrainEd Think Tanks, 5(6), 17–24.
Day, R. R., & Bamford, J. (1998). Extensive reading in the second language classroom. Cambridge University Press.
Halliday, M. A. K. (1973). Explorations in the functions of language. Edward Arnold.
Haugeland, J. (1985). Artificial intelligence: The very idea. MIT Press.
Hutchins, W. J., & Somers, H. L. (1992). An introduction to machine translation. Academic Press.
Jeon, E. Y., & Day, R. R. (2016). The effectiveness of ER on reading proficiency: A meta-analysis. Reading in a Foreign Language, 28(2), 246–265.
Krashen, S. D. (1982). Principles and practice in second language acquisition. Pergamon Press.
Krieg, G. (Producer), & Kohs, G. (Director). (2017). AlphaGo [Documentary]. Moxie Pictures.
Magee, G. (2023). Similarities and differences between predictive language processing in human brains and machine learning systems such as GPT-3. MindBrainEd Think Tanks, 9(3), 15–20.
Nishizawa, H., Yoshioka, T., & Fukada, M. (2010). The impact of a 4-year extensive reading program. In A. M. Stoke (Ed.), JALT2009 Conference Proceedings. JALT.
Sakai, K., & Kanda, M. (2005). 教室で読む英語100万語—多読のすすめ [1,000,000 English words read in the classroom: Recommendations for extensive reading classes]. Taishukan Shoten.
Tomasello, M. (2008). Origins of human communication. MIT Press.
Weaver, W. (1949). Translation. In W. N. Locke (Ed.), Machine translation of languages: Fourteen essays (pp. 15–23). MIT Press.
Worden, R. (1998). The evolution of language from social intelligence. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the evolution of language: Social and cognitive bases (pp. 148–166). Cambridge University Press.
Mark Brierley is an associate professor at Shinshu University and has taught a range of courses from low-energy building to dialects of English. He has been teaching English in Japan for over twenty years. His research focuses on using technology to support Extensive Reading, particularly the use of machine learning to assess text difficulty and generate engaging materials tailored to learners. He developed the Extensive Reading Foundation placement test and serves as a board member of the Foundation and an editor of the Journal of Extensive Reading.
Gary Ross is an Associate Professor in the Faculty of Pharmacy at Kanazawa University, Japan. His research interests include artificial intelligence, online learning, extensive reading, speech recognition, and speech synthesis in language education. He also serves as lead programmer of Erai.app. Originally from Ireland, he has worked in Japan for many years, combining teaching and research with the development of practical tools to support language learning and student engagement.
