Minding the Gaps: Statistical Misrepresentation in Attainment Gap Research

It is no secret that political interests configure the stories we tell with data. Closing the gap in attainment between disadvantaged students and their advantaged contemporaries is pivotal to any agenda to use education as a positive social force. But both the measurement and the representation of this gap are politicised, skewed and open to manipulation. This paper shows how two organisations with inverse aims and ends represent—and misrepresent—their measure of the attainment gap to portray diametrically opposed trajectories in the pursuit of equal attainment. Ultimately, what is on show is a lack of engagement with statistical analysis: these measures are flawed both in their politicised representation and in the absence of any analysis to assess whether the observed trends indicate anything more than normal year-on-year variation.

Evidence in Education: The Mire of Educational Statistics

There is an appreciable lag in the quality of research in education. There is a huge, ballooning literature in evidence-based medicine, which—for all its flaws—accelerates the accumulation of large, well-designed studies whilst investigating the many statistical and design challenges which bedevil applied research. Not so educational research. Organisations like the Education Endowment Foundation (EEF) and the Campbell Collaboration are starting to spin up programmes to evaluate evidence quality and set standards for conducting trials in education. But across the field, the evidence base is threadbare and rigorous statistical analysis is more anomaly than norm.

This problem is exacerbated by the multifarious complexities which confront educational researchers. Even in the abstract, it is often easier to speak of idealised randomised controlled trials in medicine than in education. Occasionally, a research problem in medicine can feel quite agreeable—we have an easily-measurable binary outcome (Did the patient die? Did they suffer a stroke?), definitive diagnostic eligibility criteria, and a pair of simple interventions which can easily be administered according to standard procedure (e.g. one of two different pills to take daily). Of course, most medical research is a long way from these simple cases. All sorts of intricacies quickly embed this pristine perfection in the quagmire.

But for educational researchers, so straightforward a study is unimaginably picturesque. Educational research must wrestle with extremely muddy outcomes as standard (e.g. change from baseline results, with poorly-estimated and highly variable baselines and outcomes) across a range of subject areas, between schools with heterogeneous populations, in a context in which any effects may not be realised for years, while being washed out by the myriad other ongoing interventions alongside exogenous changes over long timescales. Meanwhile, any interventions prescribed will be more complex than a pill, lack obvious analogues to a placebo (the Pygmalion effect is a rough counterpart to the placebo effect, and its potency is quite well documented – see Rosenthal & Jacobson, 1992), and be applied with scant consistency by overstretched teachers with varying teaching philosophies and approaches to their craft. The quagmire is the starting point, and it only gets marshier from there.

But importing idealised clinical trial design into the educational mire is not the only way to place education on sound scientific ground. Perhaps educational researchers need not begin by aping their clinical counterparts. This is particularly true where less mud-caked data is available to provide important insights. Clinical trials are a methodological masterstroke for testing which of two treatments works better on average. But educational research is not all—or even mainly—about comparing interventions (nor is clinical research, but that is another matter). We might more reasonably start with serious work in charting and understanding variation in performance. What causes students to under- or over-perform? Are there identifiable risk factors for suppressed results? Is there bias in assessment and grading? Which assessment modalities favour which groups, exacerbate or mitigate biases, and so forth?

Here we can avoid some of the worst excesses of research models which focus predominantly on large randomised trials. These are not primarily interventional questions. We do not need to perform experimental studies. Large data sets are often already available, assembled through meticulous labour and spanning generations. Yet the research here still lacks engagement with statistical challenges. For educational research to learn from its existing data, we need to insist on rigour in the evaluation, analysis and presentation of that data.

Attainment Gaps

This gap in statistical rigour and standards of data presentation becomes painfully and ironically apparent in the literature on attainment gaps in education. In the context of university education, analysing my own cohort of students’ results to test for attainment gaps in terms of gender and ethnicity, I faced two major concerns. First, university programmes see uneven samples as a matter of course. Gender and ethnicity are imbalanced at the university level, but far more so at the departmental or programme level. Not only are ethnicity and gender distributed unevenly across departments and courses, but so is attainment. It is hardly surprising that the canonical example of Simpson’s Paradox—the connoisseur’s confounder par excellence—was found in university admissions statistics stratified by department (Bickel, Hammel & O’Connell, 1975). So, it would be necessary to control for these influences, explore the data at departmental level, and interpret the findings carefully.

My second concern was methodological. Turning to the literature to see what progress had been made in developing protocols for this kind of analysis, I found the search disappointing (see Universities UK, 2019; Miller, 2016). There is little out there to guide analysis, little evidence of sensitive or statistically-informed work being performed, and, worse still, some egregiously bad practice modelled by the most influential data centres in education research.

Research on university students’ attainment is underdeveloped in comparison to research at primary and secondary school level, where there is more of a tradition of cross-institutional data gathering and government funding for evaluation of outcomes (though the embattled new Office for Students purports to be taking measures to carry out research at the university level, which presently veers from promising towards perilous – see OfS, 2020). Appreciating this, I turned to the school-level literature for support. But I found much the same problem. The most prominent reporting of UK attainment gaps starkly exposed the frailty of existing approaches to research on attainment amongst school pupils.

Two contrasting reports present an informative juxtaposition of statistical abuse, particularly in terms of misleading data visualisation and gung-ho hyper-extrapolation. At one end, we have the 2019 report from the optimistically-named Department for Education on attainment gaps at Key Stage 4 (essentially equivalent to GCSE level, conventionally sat by pupils aged 14-16). Here, the government line shines through the murky statistics to illustrate a tale of progress—slow and steady though it may be—towards an egalitarian educational system (DfE, 2020). Its inverse comes in the form of the Education Policy Institute’s 2019 report (albeit on 2018 data), which is willing to deploy some egregious extrapolation and its own brand of misleading representation to drive home a message of progress screeching to a halt, and maybe shifting into reverse gear (Hutchinson et al., 2019). Two reports on the same phenomenon, months apart, with inconsistent conclusions, each drawn from some pedestrian statistical abuse. With no clear path to the truth, but armed with a prophylactic, statistically-founded scepticism, we might find that there is nothing of interest underneath either report’s posturing.

DfE 2019

The Department for Education’s latest report on attainment gaps at Key Stage 4 makes for ugly reading, and not just because it shows the extent to which disadvantage permeates attainment. The DfE’s approach to their own statistics at times sways between cumbersome and skewed.

As a first note, the DfE and EPI reports have different approaches to classifying students as ‘disadvantaged’ for the purposes of analyses of attainment gaps. Elsewhere, attainment gaps are measured in terms of other characteristics such as ethnicity and gender (see Miller, 2016). Constructing these categories presents thorny challenges and crosscuts with political issues, but for the purposes of this paper, we will take the definitions of advantage and disadvantage as read and focus on the construction and use of the measures of the gap.

The DfE report a headline measure: the disadvantaged attainment gap index (henceforth ‘gap index’), finding that this measure of the attainment gap has been pretty much stable for a decade. Immediately, the headline figures are bedevilled with unhappy reporting practices: a change in the index from 4.07 in 2011 down to 3.70 in 2019 is reported as “9.1% lower”. The gap index has moved to a level of abstraction away from reporting the actual percentage difference in attainment, which makes reporting this change as a percentage difficult for the authors. But “9.1% lower” seems a misleading way to depict the change, akin to saying that a fall in temperature from 10 degrees to 5 degrees represents a 50% drop in temperature (it really, really doesn’t). This is a relative rather than an absolute measure of change—something that the medical literature has known for decades tends to misrepresent change and unduly dramatise small differences (see e.g. Malenka et al., 1993; Barratt et al., 2004).

Imagine, for instance, that we are testing a new treatment to reduce the risk of stroke. Presently, let’s say 4% of people in the population we are testing will have a stroke in the next year. After administering the drug, this falls to 3%. We could report this in two ways. We could say that the risk of a stroke in this population decreased by 1 percentage point – this is the absolute change in the risk. Or we could say it has fallen by 25% – the relative change in the risk. If we wanted to sell the drug to consumers, we would probably opt for the latter because it sounds more impressive. But if we want to be honest about how much benefit the patients will get, the former looks a lot fairer. There are other reporting options too, each with its own advantages and drawbacks, such as the ‘number needed to treat’ and odds ratios.
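
To make the contrast concrete, here is a minimal sketch of that arithmetic, using the hypothetical stroke figures above (4% falling to 3%); the variable names are mine, not drawn from any of the cited reports.

```python
# Absolute vs relative risk reduction for the hypothetical stroke example above.
baseline_risk = 0.04   # risk of stroke over the next year without the drug
treated_risk = 0.03    # risk of stroke over the next year with the drug

absolute_reduction = baseline_risk - treated_risk        # 0.01 -> "1 percentage point"
relative_reduction = absolute_reduction / baseline_risk  # 0.25 -> "risk fell by 25%"
number_needed_to_treat = 1 / absolute_reduction          # ~100 patients treated per stroke avoided

print(f"Absolute risk reduction: {absolute_reduction:.1%}")
print(f"Relative risk reduction: {relative_reduction:.0%}")
print(f"Number needed to treat:  {number_needed_to_treat:.0f}")
```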

Returning to the DfE’s data, we need to know more about the scale of this gap index. If we have fallen from 4.07 to 3.70 on a 100-point scale, that’s a minuscule difference. If it’s on a 5-point scale, that may be more important. Given that in the last eight years the gap index fell by 0.37 points from 4.07 to 3.70, where do we get a 9% change from? Well, 0.37 is 9% of 4.07. This is a reasonable way to work out the relative change. But how could we calculate the absolute percentage change? We would need to know the scale. If it’s out of 100, it would be a 0.37% change. If it’s out of, say, 5, then it’s a 7.4% decrease. To look at the DfE’s chart, one would assume that the score is out of 5, simply because that’s the scale they present:

[Figure: DfE attainment gap index over time (DfE, 2020, p.9)]

But if we read the methodology—in a separate technical document which is mentioned but not cited (I believe it to be: DfE, 2014)—we find that the gap index ranges from +10 (indicating the maximum possible gap against disadvantaged students) to -10 (indicating the maximum possible gap against advantaged students), with zero denoting that disadvantaged and advantaged students score the same, i.e. no attainment gap. This range is never mentioned in the 17-page report. So the absolute change towards zero (which is the goal), expressed as a percentage of the 0-10 scale, would be 3.7%.
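
A minimal sketch of the two ways of expressing that change, assuming (as the technical paper indicates) that the scale relevant to the observed gap runs from 0 to 10:

```python
# The 2011-19 change in the DfE gap index, expressed two ways.
index_2011 = 4.07
index_2019 = 3.70
scale_maximum = 10.0   # the index runs from -10 to +10; the observed gap sits on the 0-10 side

fall = index_2011 - index_2019            # 0.37 index points
relative_change = fall / index_2011       # ~0.091 -> the headline "9.1% lower"
absolute_change = fall / scale_maximum    # 0.037 -> 3.7% of the 0-10 scale

print(f"Fall in the index:             {fall:.2f} points")
print(f"Relative change (headline):    {relative_change:.1%}")
print(f"Absolute change on the scale:  {absolute_change:.1%}")
```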

We can hardly feign surprise that the DfE chose the statistic that depicts their progress over the last decade in the most favourable light, particularly when the country has been under a Conservative government of one styling or another for almost the entire period. They are incentivised to depict their efforts as successful, much like a pharmaceutical company which wants to sell its wonder drug for preventing strokes. But this approach has dangers. It undersells how much work there is still to be done to close the attainment gap, encourages complacency and disincentivises important further work. To illustrate this, we can look at another limitation of relative measures of change.

Suppose we dared to assume that progress in closing the attainment gap will continue at the same rate (this is a weak assumption, but let’s go with it for now). How long will it take to close the gap? Let’s take the relative measure as the stable one, and assume that every eight years the gap index decreases by 9% relative to the gap index of eight years ago. In 2027, the index will have fallen to 3.37. That’s a decrease of 0.33 points. It appears that progress has slowed: this is less than the fall of 0.37 from 2011-19. Why? Because 9% of 3.70 is less than 9% of 4.07! Working it through a little further, we find that in 2035, the gap is 3.06. By 2043, it’s 2.79. Fast forward to 2099, and we find our gap index at 1.44. By now, it’s decreasing only 0.14 points every eight years. That’s still 9%. We are still going in 2203, with a gap of 0.42, closing by about 0.04 points every cycle. That’s still 9%, too.

Cutting ahead, we are never going to reach zero this way. We’re stuck in a butchered incarnation of Zeno’s Paradox: despite making constant progress, we never reach our destination. In the infinite long run, we will approach zero but never reach it. 9% of a very small amount is a very, very small amount.

But if we consider things through the absolute measure, things are rosier. The gap closes, pretty much exactly, in 2099. Of course, this is a very poor inference to make, and I do not want to see anyone quoting that figure. The rate of progress will not actually be stable. Indeed, as we will see in a moment, the data does not really support speaking of ‘progress’ in closing the attainment gap at all. If further proof of the absurdity of this inference is desired, consider that, projecting the current rate of change, in 2179 the present-day attainment gap will be exactly reversed and disadvantaged students will outperform their advantaged counterparts with a gap index of -3.7. Though at least when we use the absolute figures, it is possible to reach zero!
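
A short sketch of both extrapolations, under the deliberately absurd assumption that the 2011-19 change simply repeats every eight years; the relative track uses the rounded 9% figure from the text.

```python
# Naive extrapolation of the gap index: a constant relative fall (9% per eight-year
# cycle) never reaches zero; a constant absolute fall (0.37 per cycle) hits zero in 2099.
start_year, start_index = 2019, 3.70
relative_index, absolute_index = start_index, start_index

for cycle in range(1, 24):                       # project out to 2203
    year = start_year + 8 * cycle
    relative_index *= (1 - 0.09)                 # fall by 9% of the previous value
    absolute_index -= 0.37                       # fall by a constant 0.37 points
    if year in (2027, 2099, 2203):
        print(f"{year}: relative trend {relative_index:.2f}, "
              f"absolute trend {max(absolute_index, 0):.2f}")
```

(Clipping the absolute track at zero hides the equally absurd continuation in which it keeps falling and reaches -3.7 in 2179.)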

The gap index has another trick embedded within it. The index is actually measured as a value between -0.5 and +0.5. The DfE notes, justifiably, that people find small values like this hard to handle. They choose to scale up the measure to be out of 10 instead. What would be a 0.2 on the underlying index scores 4/10 on the scaled-up version. This is fine as far as it goes, but they really should make this clear in their public-facing reports. They say: “this scale of ‘how many out of ten’ is familiar from many everyday situations” (DfE, 2020, p.7). But of course, how many out of five is equally familiar—the five-star rating system is probably as widespread as any method of rating. So, if it is meant to be understood as out of 10, it really must not be presented in a chart on a scale of 0-5!

But here is the trick: if I say that we are scoring 4/10, where 0 is the best and 10 is the worst (weirdly—this is not so familiar!), it might feel like we are doing admirably. The problem is over halfway solved, after all. But in reality we never had, and never will have, the 10/10 problem. 10/10 means that every disadvantaged student scores less than every advantaged student. This is not the extreme case which we are moving away from, or the comparator against which our current situation is to be judged. In fact, the authors note that we can “understand the Index’s value as a proportion of the maximum possible gap” (DfE, 2014, p.7). In other words, a score of 3.7 means that we currently have about 37% of the maximum possible gap between disadvantaged and advantaged students. Suddenly, 3.7 out of 10 does not look so good. To put it another way, we are at 37% of a situation in which every single disadvantaged student scores less than every advantaged student.

Note, too, how different 37% of the maximal attainment gap feels to a score of 3.7 on an index out of 10. For most readers, a 37% score is much more impactful than a 3.7/10. There is no reason whatsoever why the creators of this index could not have scaled this measure onto a 100-point scale rather than a 10-point scale. Familiarity, too, in my book, favours the 100-point scale.

The reality of the attainment gap is minimised by this style of reporting. If instead of reporting a fall of 9.1% from 4.07 to 3.70, we reported that the attainment gap has shrunk by approximately 3.7 percentage points, down from roughly a 41% gap to a 37% gap, this seems a far more dramatic problem, and a less mollifying improvement.
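
A sketch of that reframing: the same published values read as a proportion of the maximum possible gap rather than as points out of ten. The ×20 scaling from the underlying ±0.5 index is as described above; everything else is just unit conversion.

```python
# Reframing the published gap index values as '% of the maximum possible gap'.
def as_share_of_max_gap(scaled_index, scale_maximum=10.0):
    """Convert the DfE's 0-10 scaled index (the underlying +/-0.5 index times 20)
    into a percentage of the maximum possible gap."""
    return 100 * scaled_index / scale_maximum

for year, scaled_index in [(2011, 4.07), (2019, 3.70)]:
    print(f"{year}: {scaled_index}/10  =  "
          f"{as_share_of_max_gap(scaled_index):.1f}% of the maximum possible gap")
# Read this way, the change is a fall of 3.7 percentage points (40.7% -> 37.0%),
# rather than the headline "9.1% lower".
```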

EPI 2019

To see exactly how these deformations undermine our understanding of attainment gaps, we can look at what happens when an organisation with the opposite incentives takes a run at the problem. In the very same year, 2019, the Education Policy Institute (Hutchinson et al., 2019) published its 2019 Annual Report on the attainment gap in England.

We should note, first, that the EPI’s 2019 Report relates to 2018 data, while the DfE are working with 2019 data. It is already slightly frustrating, then, to see this framed as the “2019 Annual Report”, but to their credit the EPI authors make this abundantly clear from the outset. But the EPI headline figures present a gloomy outlook entirely at odds with the DfE’s rosiness. According to EPI, the attainment gap has stopped closing and is now beginning to widen. Indeed, per their analysis, based on five-year trends, they state that it will now take 562 years for the attainment gap to finally close, in the year 2581.

The DfE make little use of this way of representing the trends in the attainment gap, but the EPI put it front and centre in their report. This is hardly surprising: the change in the time to closure measure is staggering, making it perfect for pillorying the DfE’s record and calling for change. In 2015, EPI finds that the attainment gap will take 43 years to close. In 2018, that’s up to 562—a dizzying increase of 519 years in just a three-year window. It could be claimed that in only three years, Conservative education policies have set back the cause of educational equality by over half a millennium.

To their credit, the EPI hold back from using relative figures here, which could have exacerbated the critique to an almost parodic level. But it merits the exercise to emphasise just how absurd a relative measure can be. From 2015 to 2018, the time to closure increased roughly thirteen-fold. Extrapolating that the timeframe will continue to expand at the same rate, in just three years’ time it will be over 7,300 years until the attainment gap closes. For perspective, that’s longer than recorded human history to date. Best not to ask about the situation three years from then.
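
The arithmetic behind that parody is just one multiplication; a trivial sketch:

```python
# Relative extrapolation of the 'years to close' figure, as parodied above.
years_to_close_2015 = 43
years_to_close_2018 = 562

growth_ratio = years_to_close_2018 / years_to_close_2015   # roughly a 13-fold rise in 3 years
years_to_close_2021 = years_to_close_2018 * growth_ratio    # well over 7,300 years

print(f"2015-18 growth in time to close: {growth_ratio:.1f}-fold")
print(f"Naive projection for 2021: ~{years_to_close_2021:,.0f} years to close")
```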

There is something murky here. How can the time to closure spiral so disastrously in three years? The measure of ‘time to close the gap’ acts as a way to amplify small changes so that they look massive. EPI reported several forms of attainment gap. At Key Stage 4, they report both the gap in GCSE average grade, and the gap only in GCSE English and Maths average grade. Let’s focus on GCSE English and Maths average grade, because this is the figure that yields that 500+ year time to close. In 2018, that gap was 18.1. The previous year, it was smaller: 17.9. The gap widened by 0.2. The report makes much of this widening. But EPI are unable to represent this as a negative trend overall in their 5-year figures, because the 2013 figure was 18.6. So, the gap is down over 5 years, but up in the last year. How could we make this seem as momentous as possible?

We can transform this from a small effect to an enormous one by extrapolating it centuries into the future. The five-year trend from 2013-2018 is a decrease of 0.5 in the attainment gap. But the five-year trend 2012-2017 was down a whole 1.0, because the 2012 figure is much higher (18.9) and the 2017 figure is the lowest gap value on record (17.9). So, in effect, by contrasting the five-year change 2013-2018 with the five-year change 2012-2017, we can make it look like the rate of change has halved in just one year.
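
A sketch of that window-shift effect, using only the gap values quoted above (2012: 18.9, 2013: 18.6, 2017: 17.9, 2018: 18.1). The ‘years to close’ figures below come from a naive linear projection of each five-year trend; they are not the EPI’s published projections, which rest on their own methodology, but they show how a halved rate of progress roughly doubles the projected closure time.

```python
# How shifting the five-year window by one year halves the apparent rate of
# progress and inflates a naive linear 'time to close' projection.
gap = {2012: 18.9, 2013: 18.6, 2017: 17.9, 2018: 18.1}   # values quoted in the text

for start, end in [(2012, 2017), (2013, 2018)]:
    five_year_fall = gap[start] - gap[end]       # 1.0, then only 0.5
    rate_per_year = five_year_fall / (end - start)
    years_to_close = gap[end] / rate_per_year    # naive linear projection, not EPI's method
    print(f"{start}-{end}: fell {five_year_fall:.1f} ({rate_per_year:.2f}/year); "
          f"naive time to close ~{years_to_close:.0f} years")
```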

Of course, much of this change is actually due to the pretty big drop between the 2012 and 2013 figures. It seems hard to justify suggesting that progress stalled in the last year due to something that happened in 2012. Moreover, if, as the DfE suggest, the attainment gap is narrowing, we would expect to see the pace of that narrowing decrease gradually over time. There are diminishing returns to our efforts. We would expect to see some big decreases early, followed by a slowing of the pace. That is reflected in the figures. What is worrying is the recent growth in the attainment gap, which is reflected in both the EPI and DfE figures.

The EPI’s extrapolation technique takes a slowdown in a trend and makes it appear far more dramatic than the figures can bear out. Part of the reason this extrapolation is so misguided has already been explored above. But the EPI should surely expect slowing progress as the attainment gap narrows. What’s more, it seems unlikely that a full closure of the attainment gap would ever be reached. Closing the gap entirely means that exactly half of the advantaged students outperform exactly half of the disadvantaged students (and of course vice versa). Even if all of the barriers to attainment for disadvantaged students were removed, hitting that point exactly would be a matter of statistical fluke. By pure chance variation alone, we must expect that, even in the absence of any bias or barriers, some years advantaged students would outperform disadvantaged students on average, and other years the opposite would be true. Perfect equality of outcomes (at least as the DfE have conceptualised it) is vanishingly unlikely.

So how long will it take to “close” the attainment gap? We will never see the gap index ‘close’ in the sense of falling to and remaining at zero. We can only hope to reduce the influence of advantage on outcomes to the point that whether a student is advantaged or disadvantaged has no predictable impact on attainment. Ideally, knowing whether a student is advantaged or not would give no useful information for predicting their grade. We are seeking a situation in which, in the long run over many years, the average attainment gap is zero, not a mythic point at which the gap becomes and remains zero. For this reason, along with the others, a ‘time to close the gap’ measure based on the variation between individual gap scores is misleading, misguided and mistaken.

Predictable Variation in Attainment Gaps

This brings us to the final point. How concerned should we be that the EPI and DfE’s analyses have revealed that the rate of decrease in the attainment gap has apparently slowed, and in particular that in the last couple of years the gap has increased slightly compared to that impressive 2017 low-point?

This is where we need to engage with statistical analysis, and where simple figures are not enough. Do the figures that the DfE and EPI report require explanation? Do they suggest that something is happening—a trend or trends—which goes beyond mere year-to-year variation? These are the questions which the analysts should be asking of their data.

The changes we are looking at are quite small. In recent years, neither attainment gap index has moved all that much. We should not be asking whether the gap index has increased (it has) but whether the influence of disadvantage on attainment has increased. That is much harder to determine from these figures. We are not looking at a nice and well-behaved set of outcome data in which nothing else changes, in which the play of chance variation is limited.

An increase of 0.1 in the gap index seems entirely consistent with the hypothesis that the effect of disadvantage on attainment is unchanged. Equally, so would a decrease of 0.1 be. How big a change do we need to see before we judge it to be large enough that it cannot be explained by chance alone, requiring us to suppose that the influence of disadvantage on attainment has risen or fallen in England? That question remains open as far as the researchers here are concerned—and that is the problem, because answering it is the task of statistical modelling in education research, which is almost entirely wanting in these reports.

Just as these small changes are consistent with the hypothesis that the influence of disadvantage on attainment is unchanged, so is a small increase in the attainment gap from one year to the next consistent with an overall trend in which the influence of disadvantage on attainment is falling. Even if what we are doing to close the gap is working, the play of chance and the slow speed of change in complex systems will mean that some years give the appearance of stalling or taking small retrograde steps. That does not mean that the trend in the influence of disadvantage has been inverted, though, just that we should acknowledge that attainment gaps do not measure the influence of disadvantage itself, but are filtering through the sludge to try to extract an elusive quantity from a vexatious quagmire.
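
To make the missing analysis concrete, here is a minimal simulation sketch of the question. It uses a simplified stand-in for the DfE index (the difference in mean percentile rank between the two groups, scaled by 20), draws both groups from an identical score distribution (the null hypothesis that disadvantage has no effect), and asks how much the index wobbles from year to year through sampling chance alone. The group sizes are illustrative assumptions, far smaller than a national cohort, and real year-to-year variation also includes sources this ignores (exam reform, cohort composition, grading changes).

```python
import numpy as np

rng = np.random.default_rng(0)

def gap_index(disadvantaged_scores, advantaged_scores):
    """Simplified stand-in for the DfE index: difference in mean percentile
    rank between the groups, scaled to the published -10..+10 range."""
    scores = np.concatenate([disadvantaged_scores, advantaged_scores])
    percentile = scores.argsort().argsort() / (len(scores) - 1)   # ranks mapped to 0..1
    n_dis = len(disadvantaged_scores)
    return 20 * (percentile[n_dis:].mean() - percentile[:n_dis].mean())

# Illustrative (assumed) group sizes and number of simulated 'years'.
n_disadvantaged, n_advantaged, n_years = 5_000, 15_000, 1_000

yearly_index = np.array([
    gap_index(rng.normal(size=n_disadvantaged), rng.normal(size=n_advantaged))
    for _ in range(n_years)
])

print(f"Index under the null: mean {yearly_index.mean():+.3f}, "
      f"typical absolute value {np.abs(yearly_index).mean():.3f}")
print(f"Typical year-on-year change: {np.abs(np.diff(yearly_index)).mean():.3f}")
```

With group sizes on the scale of the national Key Stage 4 cohort, pure sampling noise would be far smaller than this; the harder, unaddressed task is quantifying the other sources of year-to-year variation, which is exactly the modelling work these reports leave undone.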

From the DfE and EPI reports alone, in the absence of any statistical analysis to determine the likelihood of the observed figures under these competing hypotheses, it is only fair to say that we do not know whether the influence of disadvantage on attainment is falling or not. Only a serious programme of statistical analysis backed by well-informed modelling could attempt to elucidate and measure our confidence in these two contrary hypotheses. In the meantime, rather than engage with the statistical work necessary to explore and attempt to answer these questions scientifically, it is disheartening to see both organisations instead stoop to playing politics with the data, representing their figures to flatter or flatten the recent record.

We are not strangers to the play of politics in the representation of data. Manipulation and misrepresentation are endemic in scientific literatures, but still deserve exposure. The absence of basic statistical testing, though, is a more severe omission, especially when it is sector-wide. Until statistical rigour sufficient to separate signal from noise and test the consistency of the data with each organisation’s theory of choice becomes the norm in these reports and not the anomaly, there is little reason to view this work on measuring attainment gaps in England as scientific. A scientific approach would explicate and test theories. The current state of the art too often resorts to attempts to swing opinion through a blend of innuendo and misrepresentation.


Bibliography:

  • Barratt, A. et al. (2004) “Tips for learners of evidence-based medicine: 1. Relative risk reduction, absolute risk reduction and number needed to treat”, CMAJ, 171(4): 353-8
  • Bickel, P.J., Hammel, E.A. & O’Connell, J.W. (1975) “Sex bias in graduate admissions: data from Berkeley”, Science, 187(4175): 398-404
  • Department for Education (DfE) (2014) Measuring disadvantaged pupils’ attainment gaps over time (updated), Statistical Working Paper SFR 40/2014 (London: Department for Education)
  • Department for Education (DfE) (2020) Key stage 4 performance, 2019 (revised) (London: Department for Education), available at: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/863815/2019_KS4_revised_text.pdf (accessed: 11/01/21)
  • Hutchinson, J. et al. (2019) Education in England: Annual Report 2019, July 2019 (Education Policy Institute)
  • Malenka, D.J. et al. (1993) “The framing effect of relative and absolute risk”, Journal of General Internal Medicine, 8: 543-8
  • Miller, M. (2016) The Ethnicity Attainment Gap: Literature Review (University of Sheffield Widening Participation Research & Evaluation Unit)
  • Office for Students (OfS) (2020) Degree attainment: Black, Asian and minority ethnic students, 27 July 2020, available at: https://www.officeforstudents.org.uk/advice-and-guidance/promoting-equal-opportunities/effective-practice/black-asian-and-minority-ethnic-students/ (accessed 11/01/21)
  • Rosenthal, R. & Jacobson, L. (1992) Pygmalion in the classroom: teacher expectation and pupils’ intellectual development (Carmarthen, White Horse Publishing)
  • Universities UK (2019) Black, Asian and Minority Ethnic Student Attainment at UK Universities: #Closingthegap (London: Universities UK)