What the MIT Study on AI and Cognitive Debt May Have Missed
The recent MIT study on large language models (LLMs) and “cognitive debt” has reignited debate about what’s at stake when we use tools like ChatGPT to support thinking. In the study, participants who had access to ChatGPT for writing tasks over several sessions performed noticeably worse when asked to write without assistance in a final session. The authors describe this as evidence of diminished independent performance due to over-reliance on AI, a warning about what might be lost in the age of augmentation.
While I acknowledge the perils of cognitive outsourcing, we may want to think carefully before drawing sweeping conclusions from this study. It included 54 participants, only 18 of whom returned for the fourth session, in which some of those who had been using LLMs to assist their writing were asked to write unassisted. Beyond the small sample size, it is worth asking a few questions about what the study may obscure as well as what it illuminates. For example, what exactly was being tested? What assumptions underlie the experimental design? And how might emerging theories of learning, cognition, and human-AI collaboration offer a more nuanced interpretation?
1. We Tailor Our Training to How We Expect to Be Assessed: Why the Study's Surprise Condition May Skew the Results
One of the most basic questions about the MIT study is whether participants were told in advance that their access to AI tools would be removed in the final session. From the methodological design, it appears they were informed there would be a fourth session, but not that it would involve writing unassisted: “The participants were not informed beforehand about the reassignment of the groups/essay prompts in session.” This raises a fundamental methodological concern.
Imagine someone training for a bodybuilding competition: strict form, high reps, with a focus on sculpting the body and increasing muscle mass. Then, on the day of the event, they’re told they’re actually competing in powerlifting, a discipline focused on low-rep maximum strength, with a different physiological and technical training emphasis. Even though both involve lifting heavy things, they are distinct domains. The athlete would have every right to feel misled, and likely underperform relative to someone who had trained specifically for that event.
The same logic applies here. If students were encouraged to optimize their work using AI tools and were then suddenly evaluated without them, the test may have shifted from measuring cognitive ability to measuring adaptability under a surprise constraint. Their performance drop may reflect not permanent cognitive erosion but a mismatch between the conditions they were trained under and the ones they were tested in.
This reframing matters. Rather than concluding that AI weakens the mind, we might instead say the study highlights the cost of contextual mismatch, of failing to align learning environments with assessment environments.
2. The Amplification Effect: How High-Performing Learners Used AI to Think Faster and Deeper
One of the most important but underemphasized findings in the MIT study is that high-performing learners, especially those focused on learning rather than mere task completion, were able to write faster while maintaining engagement and cognitive performance when using AI.
“There is also a clear distinction in how higher-competence and lower-competence learners utilized LLMs, which influenced their cognitive engagement and learning outcomes. Higher-competence learners strategically used LLMs as a tool for active learning. They used it to revisit and synthesize information to construct coherent knowledge structures; this reduced cognitive strain while remaining deeply engaged with the material. However, the lower-competence group often relied on the immediacy of LLM responses instead of going through the iterative processes involved in traditional learning methods (e.g. rephrasing or synthesizing material). This led to a decrease in the germane cognitive load essential for schema construction and deep understanding. As a result, the potential of LLMs to support meaningful learning depends significantly on the user's approach and mindset.”
The study observed that when previously unassisted learners gained access to AI tools, their memory performance and brain activity remained high. In fact, these learners improved relative to their AI-dependent peers, demonstrating that LLMs can enhance cognition when used strategically. These learners treated AI as a partner in thinking instead of as a replacement for it.
This observation surfaces a crucial question: were participants given any training in best practices for using AI? The study does not describe structured instruction in prompt engineering, iterative thinking, or reflective practice. Without such scaffolding, students are likely to default to the most obvious use of the tool: speed and convenience.
And here lies the deeper insight: LLMs amplify the strategies we bring to them.
If a learner is focused on task completion, the AI accelerates completion, with minimal depth.
If the learner is focused on learning, the AI accelerates learning, enabling faster iteration, broader exploration, and sustained engagement.
This is why intention matters. Students who enter the process with a mindset geared toward learning, not just getting the job done, engage more deeply, retain more knowledge, and remain cognitively active even as they accelerate their pace. How should we tailor our approaches to using LLMs based on our goals, whether learning or performance? And how should we build systems that enable critical thinking, where humans must exert effort to gain the best results over both the short and long term?
3. Writing Isn’t a Process. It’s Many: Cognitive Load, Thinking Loops, and the Chance to Rethink Thinking Itself
We likely also carry with us the assumption that writing unassisted is the purest form of critical thinking, a kind of mental gold standard. But that assumption deserves scrutiny. We’ve associated writing with thinking for so long that we’ve rarely paused to ask: How else might thinking happen? What new cognitive possibilities open up when we no longer treat writing as the only, or even primary, mode of thought?
This tension isn’t new. Socrates himself questioned the connection between writing and thinking, warning that writing could give “the appearance of wisdom, not true wisdom,” because it might replace memory and understanding with mere surface recall. Today, we find ourselves at a similar inflection point, not because of writing, but because of AI. And once again, we must ask: What does real thinking look like, and how should it be supported?
One of the most powerful demonstrations of this came in my recent work on SocraGPTes, a generative AI built in the spirit of Socratic inquiry. Unlike tools that simply generate answers, SocraGPTes was designed to ask questions that expose gaps in understanding, challenge surface assumptions, and prompt learners to clarify their reasoning. In those interactions, learning happened not through automation, but through amplified reflection.
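To make the design idea concrete, here is a minimal sketch of Socratic-style prompting wrapped around a generic chat interface. It is not the SocraGPTes implementation; the system-prompt wording, the model name, and the use of the OpenAI Python SDK are all illustrative assumptions.

```python
# Illustrative sketch only -- not the actual SocraGPTes implementation.
# Assumes the OpenAI Python SDK (>= 1.0) and an OPENAI_API_KEY in the environment;
# the model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()

SOCRATIC_SYSTEM_PROMPT = (
    "You are a Socratic tutor. Never hand the learner an answer or a finished draft. "
    "Respond with one or two probing questions that expose gaps in their reasoning, "
    "challenge unstated assumptions, and push them to clarify their claims."
)

def socratic_turn(history: list[dict], learner_message: str) -> str:
    """Send the learner's latest message and return the tutor's questions."""
    history.append({"role": "user", "content": learner_message})
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "system", "content": SOCRATIC_SYSTEM_PROMPT}, *history],
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

# The learner states a thesis; the tutor replies only with questions.
history: list[dict] = []
print(socratic_turn(history, "Social media is making people lonelier."))
```

The point of the design is in the system prompt, not the plumbing: the model is constrained to return questions rather than content, so the cognitive work stays with the learner.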
And, in Writing Isn’t a Process. It’s Many, I argue that writing is not a singular act but a composite of subprocesses: idea generation, organization, sentence construction, tone management, revision. Traditional writing demands that we manage all these dimensions at once, imposing heavy extraneous cognitive load: effort spent on grammar, structure, and surface-level presentation rather than on core meaning.
Cognitive load theory distinguishes several kinds of mental effort; two matter most here:
Extraneous load: effort spent on non-essential or distracting tasks (e.g., wording, formatting)
Germane load: effort that directly supports understanding, reasoning, and insight
AI tools, when used well, reduce extraneous load by generating initial drafts, offering structure, and correcting phrasing. This frees up space for germane load: deeper reflection, reordering logic, iterating on meaning. Writers begin to operate in thinking loops: draft → critique → revise → re-prompt → refine. These loops aren’t less cognitive; they’re differently cognitive. They reflect a model of thought that is exploratory, dialogic, and nonlinear.
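As a deliberately simplified illustration of such a loop, the sketch below alternates drafting, critique, and revision with an LLM. The prompts, the number of rounds, and the use of the OpenAI Python SDK are assumptions for illustration; in real use the writer reads each critique and steers the next prompt rather than letting the loop run unattended.

```python
# A minimal draft -> critique -> revise loop, sketched for illustration only.
# Assumes the OpenAI Python SDK (>= 1.0) and an OPENAI_API_KEY; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def complete(prompt: str) -> str:
    """One chat-completion call; a thin wrapper to keep the loop readable."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def thinking_loop(task: str, rounds: int = 3) -> str:
    """Iterate draft -> critique -> revise a fixed number of times."""
    draft = complete(f"Write a first draft: {task}")
    for _ in range(rounds):
        critique = complete(
            f"Critique this draft for weak reasoning, gaps, and unclear structure:\n\n{draft}"
        )
        # In practice the writer reads the critique, rejects or refines it,
        # and decides what to re-prompt; here the loop simply folds it back in.
        draft = complete(
            "Revise the draft to address this critique while preserving the author's intent."
            f"\n\nDraft:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft

print(thinking_loop("an argument that AI tools redistribute, rather than reduce, cognitive effort"))
```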
In this light, the MIT study may be measuring the wrong thing. If a student has come to think with AI through these loops, externalizing thought, testing it, revising in response, then asking them to return suddenly to the blank page is not a neutral test of their mind. It’s a test of fluency in a different system. Their cognitive performance hasn’t decayed; it has evolved to work in an ecosystem that requires something different.
Studies like the MIT study are important; they shed light on potential pitfalls and dangers. But the moment we are living through also offers a broader chance to rethink the cognitive architecture of learning itself. Generative tools let us distribute effort differently, and in that distribution there is an invitation to explore what forms of thought become possible when we stop treating writing as the only vehicle of thinking and start designing tools and practices that let thinking unfold differently, and perhaps more richly.