I have written extensively on hierarchies of evidence in evidence-based medicine. The origin story of hierarchies of evidence is a little contentious. Several sources in EBM cite Campbell and Stanley’s 1963 classic “Experimental and Quasi-experimental Designs for Research” as containing the first hierarchy, or at least the germ of the idea. Campbell and Stanley offer a lengthy argument for the claim that experimental designs have fewer potential sources of invalidity than quasi-experimental and non-experimental designs. But their conclusion is measured and seems plainly inconsistent with a hierarchical approach to research design: “caution is needed about the tendency to use speciously convenient tables (…) the placing of specific pluses and minuses and question marks has been continually equivocal and usually an inadequate summary of the corresponding discussion” (p.71).
The idea of a hierarchy of evidence as we know it originated, then, with the 1979 Canadian Task Force on the Periodic Health Examination, not coincidentally associated with EBM pioneer David Sackett. This hierarchy will probably look very familiar to any modern connoisseur of the medical literature.
I have written plenty about how an idea like this one becomes entrenched over time: it starts as a heuristic or guideline, an overall take which is not intended as a hard-and-fast rule, and becomes an unassailable fixture. Hierarchies of evidence are an intriguing case of this process, particularly given how they have continued to change and how different authors have consistently offered their own variants, yet appealed to the historic precedent of past hierarchies as their primary justification – at least for everything other than their own tweaks. It’s almost unheard-of for an EBM text to actually attempt to justify the ranking chosen beyond a simple citation of a hierarchy that came before.
But EBM’s relationship with historic precedent and authority is a strange one, as I’ve pointed out. Especially in the early years of the movement, EBM’s success depended on a characterisation as a radical break with the past, repudiating authority and experience as a basis for practice. The hierarchy came later to EBM itself, entering the doctrine slowly, subtly. It was never justified or defended, either by theory or by evidence, coming along as a hangover of the authority of the originators of EBM.
However, it was recently pointed out to me by Pekka Louhiala that although the Canadian Task Force may well have offered us the first hierarchy of evidence that an EBM proponent could recognise or accept, that doesn’t mean there weren’t earlier attempts to rank and order evidence according to the underlying design. Indeed, there’s another ghost of progress waiting out there to be found. In his 2015 thesis on EBM in the Netherlands, Timo Bolt identifies a wholly different hierarchical evidence ranking that dates all the way back to 1912, due to Sir Almroth Wright (1861-1947). In true clickbait style, the way Wright ranks evidence will shock you:
Now, Wright can’t have been thinking about randomised trials when he denigrated statistical methods. But in that case, Wright’s hierarchy showcases the folly of down-ranking a whole category of evidence sources: there’s no guarantee an innovative statistician won’t come along in the next couple of decades to solve a problem in agricultural science and create a new method which literally inverts the popular understanding of evidence. Wright’s reasoning, though, seems to have more to do with philosophical justification than, dare I say, the EBM ranking can muster. After all, a crucial experiment – if there is such a thing, and that’s contentious to the point of being almost outright rejected by most philosophers of science – is at least decisive. The decisiveness of the evidence would seem to fall as we travel down Wright’s ranking. Looking at his hierarchy now, it’s hard to defend any of his distinctions – statistical vs. experiential, cumulative experiment as distinct from experience and observation, etc. But we see that the precedents here are ambiguous, culturally-bound and directly related to the specialisms of their proponents. After all, Sackett and his colleagues, pioneering EBM, were clinical epidemiologists – familiar with, impressed by and adept at statistical analysis. Wright was a bacteriologist, comfortable with laboratory science and biomechanical investigation, and impressed by the telling power of a single experimentum crucis. For him, the Petri dish was more compelling and definitive than the statistical trend.
Wright’s approach to evidence, which he described as an “evidentiary hierarchy”, was motivated much like that of the EBM pioneers: by the elimination of bias and error. However, for Wright, different errors carried different weight. He distinguished between ‘functional errors’ – errors in technique in experimentation – and ‘mathematical errors’ in statistical inference. The minimisation of the former, for Wright, is the priority. Mathematico-statistical reasoning is downplayed.
For Wright, it is only with hands-on practical experience that the practitioner can learn and understand the principle, and through practical experimentation that progress is made. His textbook The Principles of Microscopy defends this, as historian J. Rosser Matthews noted: “If … the reader sees ground of complaint in the fact that he is required at every moment to put down the book and undertake an experiment, I would submit that no proposition is adequately apprehended until it has been invested in the apposite mental image.” It’s intriguing to see the parallels between Wright and the EBM movement here. After all, for EBM it is only in direct contact with the medical literature, engagement with the evidence and the data first-hand, that the practitioner can truly acquire medical knowledge sufficient to practice evidence-based medicine. Second-hand evidence and wisdom of authorities is, like book-learning to Wright, inadequate.
Despite his use of mathematics, Wright had disdain for statistical reasoning in medicine and was concerned that statistics in medicine did not provide understanding or insight at sufficient depth: “It cannot be too clearly understood that the mathematical statistician has no such secret wells of wisdom to draw from, and that his science does not justify his going one step beyond the purely numerical statement that – as computed by him from the data he has selected as suitable for his purposes – the probabilities in favour of a particular difference being or not being due to the operation of chance are such and such.” A caution, then, from the 1910s about the risks of confirmation bias and cherry-picking of data, of over-interpretation of statistical measures, of the limitations of purely quantitative research, and the risks of limiting medical evidence to the narrow range of statistical inference.
How reliable is Timo Bolt’s identification of this hierarchy? It’s not evident where the table he offers originates. It is either Bolt’s own summary of Wright’s thought or derived from a presentation by J. Rosser Matthews. Pekka Louhiala investigated the provenance and found Bolt’s source, in the form of a 2002 chapter by Matthews entitled ‘Almroth Wright, Vaccine Therapy and British Biometrics’ in the book ‘The Road to Medical Statistics’. Matthews draws on a range of sources from Wright’s writings and letters here, and doesn’t offer that table himself, though the defence of such a ranking is visible. Ultimately, the hierarchy itself, if seen as such, is presented in a list form in plain text, much like the Canadian Task Force’s proto-hierarchy above. Wright’s evidentiary hierarchy (which Matthews terms a ‘procedural hierarchy’) originates in his co-authored 1912 Lancet paper, ‘Observations on the Pharmaco-Therapy of Pneumococcus Infections’ (December 21, 1912: p.1636).
Like the Campbell and Stanley origin story for hierarchies of evidence, then, the picture is muddier than it first appears. Wright might well object to his thought being represented in as crude an instrument as a hierarchy. A lesson remains for scholars of evidence, though: the way evidence is graded, ranked, sifted and appraised is changeable, culturally-bound and closely linked to disciplinary preferences, experiences and presumptions. Clinical epidemiology is not the only discipline which could try to impose a (false) ordering on the sprawl of medical evidence, and if a different specialism broke into the consciousness of the practitioner, the ranking could invert again.
Featured image: The History and Progress of the World (1913)