Professor Natalia I. Kucirkova, Professor of Early Childhood Development at the University of Stavanger, Norway, and The Open University, UK
Around the world, major philanthropic agencies are turning to educational technology (EdTech) to improve learning. In the rush to identify tools that can quickly, cost-effectively, and reliably support children’s learning, ‘impact’ has become the central measure of success. But measuring educational impact in EdTech is far from straightforward. The reasons are many and widely debated in education circles. Here, I focus on three pitfalls that I frequently come across in my role as Director of the International Centre for EdTech Impact.
Pitfall 1: Usage taken for learning scores
Many providers are confident that their products are educational by default, and thus assume that because a child spends hours on their app, the child is getting hours of learning. Yet several studies show that children using EdTech not only fail to learn effectively but may even experience setbacks in their learning.
The number of hours spent on a tool is an output, not an outcome. It is a mechanism that might lead to learning, but it’s no guarantee. Effective use depends on finding the right balance of frequency and intensity. Indeed, too much time can be counterproductive. For example, Kahoot! has demonstrated educational benefits in meta-analyses, but studies with specific student populations have also found that frequent quizzes can increase stress levels.
So, even with EdTech tools that have proven learning benefits, balanced use and an understanding of how much exposure supports optimal learning are essential.
Pitfall 2: Assessments without psychometric validation
Many EdTech tools include quizzes or tests built right into the app. These give providers a ready-made source of data, and a tempting shortcut to claim “impact.” I see this all the time, especially in settings where researchers aren’t brought in to verify learning and procurement teams rely solely on whatever the providers present as proof.
Imagine an early reading app in which a child reads a story containing new words. At the end of the story, the child completes a short quiz to test whether they learnt the new words. Some apps may give children the same quiz before and after they read the story. Aggregating data across many children, the app provider sees that most children score high on the quizzes after reading their story. Proof, they say, that the app is teaching kids new words. But any researcher will tell you that this is not actual proof. The core problem is the test’s lack of psychometric rigour.
A psychometric evaluation assesses whether an instrument, like a quiz, measures what it claims to measure, whether it does so reliably over time, and whether the results can be generalized. Most quizzes, surveys or tests developed by EdTech providers were not developed by researchers but rather by in-house staff or simply generated by AI to match the learning stimulus. They are not standardized tests and may measure something entirely different from the intended learning objective.
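For readers who want a concrete sense of what even the most basic checks involve, here is a minimal sketch in Python. It uses made-up quiz responses, not data from any real app, to compute two standard indicators: item difficulty and internal-consistency reliability (Cronbach’s alpha). A full psychometric evaluation would go much further, covering validity and stability over time.

import numpy as np

# Rows = children, columns = quiz items; 1 = correct, 0 = incorrect.
# These responses are invented purely for illustration.
responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
])

# Item difficulty: the proportion of children answering each item correctly.
# Values close to 1.0 mean an item is too easy to distinguish between learners.
difficulty = responses.mean(axis=0)
print("Item difficulty:", np.round(difficulty, 2))

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores).
k = responses.shape[1]
item_variances = responses.var(axis=0, ddof=1)
total_variance = responses.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print("Cronbach's alpha:", round(alpha, 2))  # values below roughly 0.7 signal weak reliability

If a quiz fails even these simple checks, any ‘impact’ claim built on its scores rests on shaky ground.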
When we evaluated the psychometric properties of a quiz in an early numeracy app, we found a striking disconnect: the children were scoring high on the app test, but the same children performed poorly on their teacher-administered tests. Examining the app test more closely, we saw that some questions were too easy, others ambiguous, and several assessed general pattern recognition rather than specific numerical concepts.
The feedback embedded in the test taught children to answer correctly within the app, but the test itself was not measuring their true numeracy skills, only their ability to respond correctly to the test. As a result, the test gave the false impression that children were mastering early numeracy, when in fact their understanding of the intended concepts was not developing.
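One straightforward way to surface this kind of problem, sketched below in Python with invented scores, is to check whether children’s in-app quiz results line up with an independent, teacher-administered test of the same skill. The numbers are hypothetical and are only meant to show the shape of the analysis.

import numpy as np

# Hypothetical scores for the same eight children on two measures.
app_quiz_scores = np.array([9, 10, 8, 9, 10, 9, 8, 10])    # near the ceiling inside the app
teacher_test_scores = np.array([4, 7, 3, 6, 5, 3, 6, 4])   # far more spread outside it

# Pearson correlation between the in-app quiz and the external test.
r = np.corrcoef(app_quiz_scores, teacher_test_scores)[0, 1]
print("Correlation between app quiz and teacher test:", round(r, 2))

A weak or near-zero correlation, combined with ceiling effects in the app quiz, points to the kind of disconnect described above: the in-app test is not capturing the skills the teacher assesses.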
Developers and funders should not rely on in-app tests without psychometric checks. Beyond making sure internal data are psychometrically sound, an impact evaluation should also include external data.
Pitfall 3: Internal measures taken as sufficient
External measures, that is, information gathered beyond what the EdTech tool itself collects, are crucial, yet many providers ask why they should bother if they already have their own data. Funders, too, are tempted to save money, since hiring data collectors (enumerators) and paying researchers to design valid tests is both costly and time-consuming.
This is a problem because internal data (even if psychometrically sound) are limited: they only measure what a child can learn within the app environment. Whether that learning transfers to contexts outside the app is unknown.
External data can show positive effects as well as unintended consequences of using an EdTech tool. For example, in a study with the coding software Kodable, researchers conducted direct classroom observations and were able to demonstrate that children learnt game-specific coding skills after one week of using the tool. In a different study, with the tools Speechify and Elevenlabs, researchers conducted interviews and surveys with students and found that these speech generation technologies perpetuated linguistic bias and discrimination based on accent.
Some external measures, such as research surveys, can be embedded in the EdTech tool itself, but that opens the door to bias: because providers have full visibility into their users’ performance data, algorithms within the app can be tweaked to produce results more favourable to them. When presented to funders and buyers, a child’s apparent ‘progress’ may reflect design choices rather than genuine learning.
To truly understand an EdTech tool’s impact, internal usage data must be combined with external data that are independently collected by qualified researchers. By combining rigorous internal and external data and following a clear theory of change, we can move beyond proving that a tool is being used to showing why and how it actually improves learning. In a crowded EdTech market, this is exactly where policymakers should focus their attention and their funding: rigorous impact evaluations that separate tools that truly improve learning from those that only claim to.



















