The Pyramid Schema: The Origins and Impact of Evidence Pyramids

All things are defined by names. Change the name, and you change the thing.

Sir Terry Pratchett, Pyramids

There is a lesser-known riddle of the sphinx, lost to arcana, which goes: “When is a pyramid not a pyramid?” The answer: when it’s just a table in the guise of a pyramid.

In the late 1990s, evidence pyramids began to arise point-first from the literature of Evidence-Based Medicine (EBM), a movement within clinical medicine which has the goal of placing medical practice on sounder evidential footings. By the mid-2000s, evidence pyramids had bloomed across the EBM literature and were widespread across medical publications, textbooks and websites. Pyramids came to occupy as influential a position as the simple tables and lists which had preceded them, and to rival the prominence of the complex calculations of the more sophisticated evidence appraisal systems which arose alongside them. Evidence pyramids have come to define the visual imagination of EBM, to the point that a Google Images search in 2022 for “Evidence-Based Medicine” primarily returns evidence pyramids. The results of a search for “Hierarchy of Evidence” are dominated by pyramids to an even greater degree.

Why are the pyramids here? Where did they come from? Who made them and what esoteric ancient wisdom do they represent? The origins of the evidence pyramids are masked by uncertainty. Few sources have attempted to identify the original evidence pyramid, and as a result, one of the most important artefacts of EBM comes from an unknown source. It is this lacuna that this paper will seek to fill. First, the place of evidence pyramids within the broader trajectory of evidence appraisal schemas within EBM is mapped out. Second, I trace the origins of evidence pyramids and identify the most likely candidate for the title of original EBM evidence pyramid. Third, I consider what if anything a pyramidal presentation of an evidence hierarchy offers compared to tabular hierarchies, beyond an aesthetic choice. Finally, I present an argument that the pyramidal turn in evidence appraisal systems has been a retrograde step for the EBM movement.

Since the mid-1990s, the Evidence-Based Medicine movement has constructed and promulgated evidence hierarchies, which usually attempt to rank the quality or strength of the evidence a study provides, according to the methodology used in the study (see Blunt, 2015). Traditionally, Randomised Controlled Trials (RCTs) rank at or close to the top, sometimes alongside or beneath systematic reviews and meta-analyses of RCTs. Forms of observational study such as cohort and case-control studies occupy the rungs beneath. Languishing at the foot of the table are case reports, and sometimes other forms of evidence such as expert opinion, mechanistic reasoning and biological rationale.

These hierarchies have their own complex and contested history. They considerably pre-date the unveiling of the EBM movement, which came to prominence with the publication of Evidence-based medicine: a new approach to teaching the practice of medicine in the Journal of the American Medical Association in 1992 (EBM Working Group, 1992). Some sources (e.g. Rychetnik et al. 2002; Glasziou, Vandenbroucke & Chalmers, 2004) cite Campbell and Stanley’s highly influential Experimental and Quasi-Experimental Designs for Research (1963) as the original hierarchy. However, as I have discussed elsewhere (see Blunt 2015; Blunt 2019), Campbell & Stanley do not offer a methodology-driven ranking and are explicit that their work should not be interpreted in this way. While their work offers inspiration for the idea, it would not be accurate to describe it as an evidence hierarchy.

Elsewhere, Joseph Vere and Barry Gibson (Vere 2018; Vere & Gibson 2020) make a strong case for considering Archie Cochrane’s classic Effectiveness and efficiency (Cochrane 1972) as the progenitor of evidence hierarchies. While Cochrane again offers some semblance of a ranking, placing RCTs ahead of observational studies, which in turn outrank other forms of evidence, I contend that Cochrane’s piece primarily defends the claim that only RCTs offer high-quality evidence, and that other forms of evidence lack merit – in this form, it could be considered a proto-hierarchy with two levels (see Blunt 2020). But Cochrane is an inspirational figure for many EBM proponents (see Daly 2005), and his Effectiveness and efficiency is, with little doubt, the philosophical and ideological underpinning of the formal, structured hierarchies that followed. Whether we consider it the inspiration for the first hierarchies, or the first hierarchy in itself, is moot.

The first hierarchy recognisable as such to a modern EBM proponent was proposed by the Canadian Task Force on the Periodic Health Examination (CTF) in 1979. This hierarchy has most of the features replicated across many subsequent rankings. It proposes three tiers, with tier II itself subdivided into tier II-1 and II-2, perhaps drawing inspiration from the British university classification schema. Tier I is reserved for evidence from RCTs. Tiers II-1 and II-2 are various forms of observational study: cohort, case-control and “comparisons between times or places” (CTF 1979, p.1195). Finally, Tier III contains opinion, clinical experience, descriptive studies and expert committee verdicts. Most later hierarchies are riffs or expansions of this core structure.

Hierarchies preceded EBM, yet EBM did not immediately adopt hierarchies. Key early texts in the development of the distinctive EBM philosophical foundations did not include hierarchies or make any explicit reference back to the pre-EBM hierarchies (see e.g. Sackett & Rosenberg 1995; Davidoff et al. 1995; Sackett et al. 1997). This is despite the most prominent foundational texts sharing a co-author (David Sackett) with almost all of the pre-90s early hierarchies (Canadian Task Force 1979; Anonymous 1981; Sackett 1986; Sackett 1989). Hierarchies began to cross over into the EBM mainstream half a decade later, in the mid-to-late 1990s, primarily through the Users’ Guides to the Medical Literature series (e.g. Guyatt et al. 1995; McAlister et al. 1999; Guyatt et al. 2000). The series was codified into a book in 2001 by Gordon Guyatt and colleagues, in which the opening chapter, entitled “The Philosophy of Evidence-Based Medicine”, delineated an evidence hierarchy as one of the two fundamental philosophical pillars of EBM (Guyatt et al. 2001). Other prominent early EBM or EBM-adjacent sources which had considerable influence at this time included Trisha Greenhalgh’s critical (but nonetheless widely adopted and copied) presentation of an evidence hierarchy in How to read a paper (1997), the Australian NHMRC’s adoption of a hierarchy in their clinical practice guideline development process (ANHMRC 1999), and the 1998 Levels of Evidence published by Oxford’s Centre for Evidence-Based Medicine (CEBM) (Phillips et al. 1998). The heyday of the signature evidence hierarchies was the late 1990s. Their codification as a foundational element of EBM thought came swiftly after, in the early 2000s.


Unearthing the Pyramids

While the Evidence-Based Medicine movement’s hierarchalists primarily spent the late 1990s tabulating evidence according to methodology, the 2000s represented a more experimental period, in which proponents of evidence rating, ranking and grading systems explored some creative methods for portraying their proposals. At this time, early evidence pyramids began to rise to prominence. In contrast to the stark tables and lists which preceded them, early evidence pyramids were often colourful, vivid illustrations. Take, for example, this multicoloured pyramid published online on SUNY Downstate Medical Center’s EBM Tutorial page:

Figure 2: Pyramid of evidence, often attributed to the SUNY Downstate Medical Center.
Usually cited as: Wagoner, B. et al. (2004) ‘Guide to Research Methods: The Evidence Pyramid’, in SUNY Downstate Medical Center: EBM Tutorial, formerly available at: http://library.downstate.edu/EBM2/2100.htm
See below for further detail on the authorship and origin of this image.

The rainbow colour-scheme and chunky MS Paint arrows make an immediate impression, and evoke an imagery antithetical to the staid and academic style of previous evidence hierarchies, while conveying much the same information. Consider, by way of contrast, this presentation of an evidence hierarchy from the same period, enshrined as one of the two philosophical precepts of EBM in the 2001 edition of the Users’ Guides to the Medical Literature, authored by EBM originator Gordon Guyatt and colleagues:

Figure 3: Hierarchy of evidence from:
Guyatt, G.H., et al. (2001) ‘The Philosophy of Evidence-Based Medicine’, in Guyatt, G.H. et al. (eds.) Users’ Guides to the Medical Literature, pp. 9-16

There are some notable differences here. The inclusion of N-of-1 randomized trials in Guyatt et al. (2001) is markedly distinct from the vast majority of hierarchies and never caught on within the broader literature. Meanwhile, the inclusion of “Animal research” and “In vitro (‘test tube’) research”, and the phrasing of “Ideas, Editorials, Opinions” in the SUNY pyramid are similarly atypical amongst EBM hierarchies. But the underlying idea is the same: a ranking of the evidence provided by studies, according to the underlying methodology of the study, which places RCTs and systematic reviews thereof at the top, and other forms of evidence below, going from observational studies down to expert opinion and mechanistic evidence.

Other evidence pyramids were more measured in appearance, but continued to offer visual flavour. Consider these widely-reprinted efforts:

Figure 4: Glover, J. et al. (2011) ‘EBM Page Generator’, from www.ebmpyramid.org
Also referenced as: Glover, J. et al. (2006) ‘EBM Pyramid and EBM Page Generator’ (Trustees of Dartmouth College and Yale University) – date unverified.
Figure 5: Melnyk, B.M. & Fineout-Overholt, E. (2005) Evidence-Based Practice in Nursing and Healthcare: A Guide to Best Practice (Philadelphia: Lippincott, Williams & Wilkins)

These pyramids were not simple re-representations of tabular hierarchies. They often introduced novel components into the evidence appraisal processes. In each case, like Wagoner et al. (2004), these authors have introduced some quirks into their pyramidal hierarchy designs. The use of “Single descriptive or qualitative study” is unique to Melnyk & Fineout-Overholt (2005), although several other sources had included qualitative evidence in some form within their rankings prior to this point (e.g. Gray 1997, Weightman, Barker & Lancaster 2000, Craven 2001, Weaver et al. 2002), and “Expert committee opinions” is a rare inclusion which harks back to the 1979 CTF hierarchy. Glover et al.’s (2006/2011) pyramid integrates Critically Appraised Topics (CATs) and Critically Appraised Individual Articles. These forms of synopsised evidence summaries are essentially included in Brian Haynes’ influential ‘4S’ and ‘5S’ hierarchies for pre-appraised evidence (Haynes 2001, 2005), albeit under the broader heading of ‘Synopses’, but are not widely acknowledged elsewhere. Pyramids, like evidence hierarchies more broadly, tend to incorporate a core structure (RCT > Observational studies > Anything else) but equally embed the more idiosyncratic evidence preferences of their authors.

Since the mid-2000s, when evidence pyramids first emerged in earnest, this mode of portrayal has proliferated. However, the sources for the specific rankings of evidence in the pyramids are frequently either uncited, or are earlier evidence hierarchies which deployed a similar ranking but which do not use a pyramidal construction. Few of the mid-2000s pyramids offer either a source for a pyramidal representation of their hierarchy, or their own justification for or explanation of the significance of a pyramid as a structure. An exception is Akobeng’s early (2005) evidence pyramid. Akobeng writes: “The pyramid shape is used to illustrate the increasing risk of bias inherent in study designs as one goes down the pyramid.” (2005, p.840)


The Origins of Evidence Pyramids

To date, no study has conclusively determined which evidence pyramid was the first to depict an EBM-style hierarchy in this novel format. I contend that the first evidence pyramid recognisable as such by EBM proponents is the SUNY Downstate pyramid (see Fig.2, above), which I attribute to Betty Wagoner and colleagues at SUNY and particularly at the Medical Research Library of Brooklyn.

Based on a systematic review completed in 2020, I identified 195 distinct evidence hierarchies, including evidence pyramids, ranging from 1979 to the present. Amongst these, the earliest evidence pyramid was found in complementary and alternative medicine researcher Wayne Jonas’s (2001) The evidence house. However, Jonas is not putting forward the evidence pyramid as a novel contribution, instead presenting it as a foil against which to develop his alternative model (the titular ‘Evidence House’) which rejects hierarchical evidence appraisal. The source which Jonas cites for his pyramid-shaped hierarchy is Sackett et al.’s Evidence Based Medicine: what it is and what it isn’t (1996). This source does not contain an evidence hierarchy in any form. No antecedent versions which match Jonas’s hierarchy have been identified, and none could be found through an exhaustive search based on the distinctive terminology employed in his presentation (e.g., “more causal research methods”, “less causal research methods” as descriptions of the high and low ranked evidence, respectively).

A rival for the claim of the earliest use of the pyramid form as an evidence ranking is the SUNY Downstate pyramid, which is generally cited as “SUNY 2004” in the EBM literature. The reasons for this date being offered in citations may vary, and may include processes of replication from previous works. The SUNY pyramid appears on a tutorial page for a SUNY course on the principles of EBM. Citing grey literature such as a tutorial page in an online course has rarely been well-standardised across the literature. In 2011, when I first accessed and locally archived the SUNY EBM Tutorial page which contained the image of the pyramid, the page included a footnote reading “Last updated January 6, 2004”. In the absence of further information regarding the nature of the updates made in 2004, or any records of previous iterations, 2004 seemed a reasonable best estimate for the date of publication.

However, further research suggested that the SUNY Downstate evidence pyramid must be older than the best available sources at the time could corroborate. This pyramid was cited by three papers which predate the 2004 reference to which the pyramid is most commonly ascribed within the EBM literature.

A paper by Evidence-Based Dentistry proponents Jane Forrest and Syrene Miller of the University of Southern California was detected during the systematic review, but was excluded from the results because it did not contain a novel hierarchy. This paper contained a black-and-white but otherwise identical version of the SUNY evidence pyramid, clearly attributed to SUNY. This paper was published in 2001, implying that the SUNY pyramid must already have been available in some form at that point (Forrest & Miller, 2001a). In a separate but similarly titled paper by Forrest & Miller (2001b) from the same year, the authors list some resources which would be of interest to dental practitioners wishing to learn more about evidence-based practice, in which they provide a URL for the SUNY evidence pyramid, listed as being accessed on 4th April 2001 (Forrest & Miller 2001b, p.55). This suggests that the pyramid was available online at some point prior to April 2001.

A subsequent literature search using the keywords “evidence pyramid” and “SUNY”, filtered for sources published prior to 2004, identified another 2001 paper in the British Journal of Social Work by the prominent British sociologist and social policy scholar Stephen A. Webb (2001). Webb criticises the application of the ideas of evidence-based policy to social work, writing:

Indeed, the SUNY-based health sciences ‘Evidence-based medical course’ provides an ‘evidence-based pyramid’ which places methodologies such as randomized controlled trials, cohort and case control studies at the top, with ideas and opinions at the bottom of the pyramid.

Webb 2001, pp.65-66

Although not a complete description of the pyramid, and without any citation to help us determine its origin, Webb does independently corroborate the presence of such a tool in 2001.

One could, therefore, cite the pyramid to Forrest & Miller’s 2001 paper as the first publication of the pyramid within a healthcare journal (2001a). This would establish that the SUNY pyramid is at least contemporaneous with Jonas’s (2001) pyramid. However, it would be valuable to ascertain for how long the SUNY pyramid had been in circulation prior to the publication of Forrest & Miller (2001a; 2001b) and Webb (2001). Forrest & Miller themselves cite the source to 1997, albeit without further context and with the accessed date in 2001. It seems likely that the pyramid had been accessible for at least some time prior to the publication of these papers, in particular given that its influence had extended at least as far as Evidence-Based Dentistry proponents at the University of Southern California and a British social work scholar who at the time was working at the University of Bradford in the UK.

Using web archives to attempt to trace a more specific date uncovered three distinct and separately hosted copies of the SUNY tutorial in question. Two of these were hosted on the SUNY Downstate library site: version 1.3 and version 2. Almost all citations seen in the EBM literature are to version 2, using the URL http://library.downstate.edu/ebm2/2100.htm, at which nothing currently remains. Using the Internet Archive’s Wayback Machine for this URL returns 31 captures of the page, dating from 2004 to 2021, the earliest of which is dated 23rd March 2004. That snapshot of the page contains the “Last updated: January 6, 2004” footnote. However, by removing the ‘2’ from ‘ebm2’ in the URL, we are able to find the version 1.3 iteration of the page. For this adjusted URL (http://library.downstate.edu/ebm/2100.htm), the Internet Archive’s first imprint of the page is from 29th August 2002. Unfortunately, this page does not contain a “last updated” statement to indicate when prior to August 2002 the page was most recently altered. However, navigating to the Table of Contents for the SUNY EBM Tutorial version 1.3 in the Wayback Machine shows an earliest imprint on 5th January 2001, on which the contents page lists “Last Updated: August 15, 2000.” That contents page lists “A Guide to Research Methods” including “Evidence Pyramid” as a linked component. This strongly suggests that the pyramid was available online in some form in 2000, predating Jonas (2001).
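
For readers who wish to repeat this kind of archive tracing, the lookup can also be sketched programmatically. The following Python fragment is a minimal illustration only, and is not a record of how the searches reported here were performed. It assumes the Internet Archive's public CDX query endpoint and its url, output, fl and limit parameters; the function names are hypothetical and chosen for this example, and error handling is omitted.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Internet Archive's CDX capture index.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def swap_path_segment(url: str, old: str, new: str) -> str:
    """Rewrite one path segment, e.g. '/ebm2/' to '/ebm/' (hypothetical helper)."""
    return url.replace(old, new, 1)


def cdx_query_url(page_url: str, limit: int = 1) -> str:
    """Build a query asking for the first `limit` captures of page_url."""
    params = {
        "url": page_url,
        "output": "json",            # rows come back as JSON lists
        "fl": "timestamp,original",  # only the fields needed here
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_timestamp(timestamp: str) -> datetime:
    """Convert a 14-digit capture timestamp (YYYYMMDDhhmmss) to a datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def earliest_capture(rows):
    """Return the earliest capture time from decoded JSON rows, or None.

    The first row is expected to be a header naming the fields.
    """
    if len(rows) < 2:
        return None
    header, *captures = rows
    ts_index = header.index("timestamp")
    return min(parse_timestamp(capture[ts_index]) for capture in captures)


def fetch_captures(page_url: str, limit: int = 1):
    """Query the archive over the network (requires internet access)."""
    with urlopen(cdx_query_url(page_url, limit), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    version_2 = "http://library.downstate.edu/ebm2/2100.htm"
    version_1_3 = swap_path_segment(version_2, "/ebm2/", "/ebm/")
    for page in (version_2, version_1_3):
        print(page, earliest_capture(fetch_captures(page)))
```

A query of this kind only reports when the archive first captured a page, not when the page was first published, so it gives a latest possible date rather than a date of origin. Any "last updated" statements still have to be read from the archived snapshots themselves.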

The second paper by Forrest & Miller (2001b) offers a third unique and distinct URL in its citation: http://www.servers.medlib.hscbklyn.edu/ebm/2100.htm. This page is hosted on the Medical Research Library of Brooklyn’s own specific servers, rather than the SUNY Downstate website. The first archived version of this page on the Internet Archive dates to 11th May 2000, and contains the full-colour image of the evidence pyramid, identical to the one displayed in the “Version 2” site. This would seem to date the pyramid definitively to May 2000 at the latest.

The archived contents pages for the Medical Research Library of Brooklyn version of the course provide authorial information and an original date of publication for the tutorial as a whole. They state: “Copyright © SUNY (State University of New York), 1997.” and:

Original course material prepared by Betty Wagoner, Reference Librarian and Lead Content Specialist, and Martin Mellish, Co-author and Lead HTML Programmer. Other consultants for EBM examples used in the tutorial: Charles Hyman, MD, and Mary Doherty, Reference Librarian. Revisions to original course material by Dr. Andrea Markinson, Assistant Director for Educational Services.

First Design revisions by Anita Ondrusek, Mary Doherty, and Ki-tae Mok. Second Design revisions by Dr. Andrea Markinson.

Screen shots included in this program are used by permission from Ovid Technologies Inc. copyright 1997.

‘SUNY EBM Course – Author and Copyright Info’ (http://servers.medlib.hscbklyn.edu/ebm/author.htm), as archived in the Internet Archive, 27 October 2000.

Hence, I have cited this pyramid to Betty Wagoner et al. The earliest possible date of online publication of the evidence pyramid is 1997, but it is possible that the pyramid could have been a later addition in the revisions by Dr. Markinson or in the first or second design revisions, at some point between 1997 and May 2000. I have attempted to contact the various authors involved in the creation of the course material to determine a more precise timeline and identify the original creator of the evidence pyramid.

Amongst the team, the only individual who could be identified and who responded to queries was Dr. Andrea Markinson, who is currently Director of the Evidence in Practice Information Center at SUNY Downstate. Dr. Markinson joined SUNY Downstate in 1997. In personal communication, Dr. Markinson confirmed that she did not create the evidence pyramid, that the pyramid was not added during any of the revisions of the tutorial in which she was involved, and indeed that the pyramid was already a part of the tutorial at the time at which she became involved with the project. This strongly suggests that the pyramid was created at the latest in 1997, and that it may have been a part of physical resources used to teach courses in evidence-based practice prior to the digitisation of the course in 1997. In personal communication, Christopher Stewart, who also joined SUNY Downstate in 1997 and maintains the course content, confirmed this recollection.

In summary, it seems that the SUNY Downstate/Medical Research Library of Brooklyn evidence pyramid considerably predates the other pyramidal presentations of evidence hierarchies. It was likely accessible online from 1997 onwards, and was prominent enough to attract the attention of a diverse range of researchers, extending beyond the core EBM community. The first appearance of this evidence pyramid in published journals was Forrest & Miller (2001a), but I endorse a citation to Wagoner et al. (1997) with the proviso that the design was likely created prior to the digitisation of the course, and that it is possible that none of the cited individuals were the direct creators of the image. Given that the pyramid was part of the course materials prior to the revisions by Dr. Markinson, the most likely creator is Betty Wagoner or Martin Mellish.


The Meaning of Evidence Pyramids

Given that the precise authorship of SUNY’s original evidence pyramid cannot be established, it is unlikely to be possible to determine what its authors intended by presenting the information in this form. It is possible that there was no further intent other than to offer an aesthetically pleasing presentation. It is also possible that the connotation of rising to higher and therefore superior levels was the only intention: perhaps a pyramid structure makes the direction of the ranking obvious at face-value in a way that may not always be as transparent in tabular form.

In some contexts, pyramidal representations are used to convey a decreasing number of instances available of the specific types in each successive tier. The breadth of the pyramid step is a heuristic representation of prevalence – in this case, the narrower the step, the rarer evidence of that type would be. This will often be the case where ‘superior’ instances are more difficult to produce than lower-tier ones, and thus can be expected to be rarer. This explanation for the pyramidal shape has been promulgated in the secondary EBM literature. For instance, Yetley et al. (2016) reproduce an evidence pyramid and write:

“The pyramidal shape qualitatively integrates the amount of evidence generally available from each type of study design and the strength of evidence expected from indicated designs. In each ascending level, the amount of available evidence generally declines.”

(Yetley et al. 2016, p.11S).

Such an approach may reflect the design assumptions behind, for instance, Haynes’ ‘4S’, ‘5S’ and ‘6S’ hierarchies of pre-appraised evidence sources (see Fig. 6, below), which begin with individual studies, of which there is a vast proliferation, and move up towards synopses and systematic reviews of studies – fewer studies have been synopsised than the full range of published materials – and systematic reviews draw together a wide range of studies, so will tend to be fewer still in number.

Figure 6: The ‘4S’ evidence pyramid (Haynes, 2001) of pre-appraised evidence, the basis of Haynes and colleagues’ systems (Haynes 2005; Dicenso, Bayley & Haynes 2009) for evaluating the quality of pre-appraised evidence sources, which is suggestive of a narrowing of the number of sources from tier to tier.

This sometimes seems a viable interpretation. For instance, many hierarchies rank expert opinion at the lowest tier. There are certainly more clinicians and healthcare professionals expressing opinions on many matters than there are studies of any kind addressing them. However, particularly since the rise of Evidence-Based Medicine and the promulgation of randomised controlled trials as the highest quality evidence through precisely such hierarchies, RCTs have proliferated while observational studies have declined in popularity. Particularly for evidence pyramids which draw fine-grained distinctions amongst observational studies (separating cohort studies from case-control studies, for instance, into separate tiers), this interpretation seems unlikely to track with real-world prevalence. Certainly, this interpretation would seem inconsistent with the original pyramid of Wagoner et al. (1997), in which ‘Animal research’ and ‘In vitro (‘test tube’) research’ are included at a tier below ‘Ideas, opinions, editorials’, despite surely being less frequent.

In other pyramids, the relationship is one of dependency. Each level of the pyramid rests upon the one below. While reaching a higher point in the pyramid is preferable, one must either rise sequentially through the tiers, or must satisfy the lower levels in order to be able to satisfy the higher. In Maslow’s (1943) famous hierarchy of needs, one of the few other explicitly hierarchical structures which is commonly presented as a pyramid, each set of needs can only be satisfied once the tier of needs below has been met. Base tier needs are dependencies for second tier needs, and so on. So, while Maslow’s pyramid is hierarchical in the sense that the fulfilment of higher tier needs conveys a higher level of satisfaction than would lower-tier needs, this is inherent to the design. The question of whether fulfilling needs of self-actualisation is better or higher quality than fulfilling needs of safety is irrelevant, as self-actualisation needs cannot be satisfied unless safety needs are first met.

This interpretation has significant overlap with the narrowing range interpretation. As higher tiers are dependent upon lower tiers, there will tend to be fewer instances of the higher tiers than the lower. In the Maslow example, there must be fewer (or equally many) people whose second tier needs are met than people whose first tier needs are met. However, the situation is more complicated in medical research. Even if, say, an observational study of a particular treatment’s effects must be performed before an RCT could take place, this would not necessitate that there are fewer RCTs than observational studies of the treatment – a single observational study might be a precursor for a range of independent RCTs.

There are some dependencies in evidence hierarchy structures. Many evidence hierarchies include not only primary studies but also systematic reviews and meta-analyses of studies. The ability to perform such reviews and meta-analyses is clearly dependent on the presence of sufficient primary studies. In principle, it is possible for there to be more systematic reviews in an area than there are primary studies (a systematic review could find that there are no studies, or multiple reviews could be independently performed on the same small set of primary studies). But dependency could be the intent of the pyramid structure even in separation from the narrowing prevalence interpretation.

However, more broadly, hierarchical structures have not tracked with this intuitive interpretation. This is not generally the sense in which the evidentiary hierarchy functions. Evidence hierarchies are not intended as a research trajectory to be followed, starting with low-level evidence and working one’s way up as a research programme. RCT evidence is not dependent (at least not in the sense intended by Maslow or by the EBM hierarchy authors) on lower-level evidence. One can conduct an RCT without any cohort studies having been conducted. It is unlikely that a sequential or dependency relation is intended for the majority of evidence pyramids, with the possible exception of Haynes’ (2001; 2005).

A final potential interpretation is that the breadth of the hierarchy might represent the extent of bias, imprecision or risk involved in using evidence at that tier, pseudo-analogous to the width of a confidence interval. While this seems an unintuitive reading, this does match with Akobeng’s (2005) aforementioned explanation of their own pyramid, to “illustrate the increasing risk of bias inherent in study designs as one goes down the pyramid” (2005, p.840). While consistent with the intended interpretation of most pyramids by design, this offers very little meaning beyond the simple rank order. As such, Akobeng’s minimal interpretation offers no practical merit of presenting a hierarchy as a pyramid beyond the aesthetics.

The intended interpretation notwithstanding, a pyramidal presentation may help to convey the sense of authority upon which EBM has often depended in cementing such evidence appraisal systems within medical practice. The uncertainty of the provenance exempts the underlying assumptions of the pyramid from critique, as the hierarchy can be ascribed to historical precedent (see Blunt 2015, ch.2).

The resemblance to other fixtures within epidemiological theory may also suggest inspiration for the design, and allow proponents to draw on the heuristic of acceptance of familiar forms. For instance, John Last’s ‘iceberg’ model of disease (Last, 1963) is a prominent way to represent hidden and subclinical suffering in a visual manner which has received thousands of citations in the epidemiological literature (Last, 2013). Last describes the popular appeal of such a structure as “durable and useful”, and a metaphor which is “a valuable communications aid, immediately grasped by everyone” (2013, p.1613). Last’s iceberg is frequently depicted as a pyramid (cf. Houben et al. 2022; Pfeiffer, 2002 – see Fig. 7, below), and has been reformulated as the “disease prevalence pyramid” (e.g. Zuccon et al., 2015).

Figure 7: An example of Last’s iceberg model of disease in a pyramidoid depiction from Pfeiffer’s Veterinary Epidemiology (2002).

Elsewhere, pyramids have been used in epidemiology, health sciences and healthcare policy in ranking and guiding actions, and to support political cases for prioritisation and funding. For instance, the US Department of Health and Human Services’ 1994 For a Healthy Nation report promulgated the four-tier “Health Care Pyramid”, from ‘Tertiary health care’ at the top down through secondary and primary care, to a wide base of ‘Population-based public health services’ at the bottom (Gold et al. 1994). Confusingly, this pyramidal model is used to convey the underfunding of population-based health services at the base of the pyramid. The pyramid showcases that 99% of healthcare funding goes to the upper levels and only 1% to the base. The breadth of the bottom level here is used in criticism of the mismatch between funding priorities and fundamentality to health provision. In that sense, their pyramid is an inversion of prestige when compared to EBM’s usage: the foundations are fundamental but underappreciated and the upper tiers are overvalued.

In summary, there may be some rhetorical value in associating hierarchies with recognisable visual structures and some kudos suggested by a pyramidal structure over and above that of a simple table or list. But the common visual interpretations of a pyramidal format which would convey additional information beyond that offered by a table are inapplicable to or inconsistent with the ways in which evidence hierarchies are used in EBM. In effect, there is little practical or philosophical difference between a pyramid and a table as approaches to express a ranking of evidence according to underlying methodology that would justify the choice of a pyramid structure rather than a table or list.


The Curse of the Pyramids

However, even if the pyramid format offers little in the way of additional information conveyed beyond that of tables, there remain several drawbacks of the form. I close by offering two deficiencies of pyramidal hierarchies which diminish the capacity and usefulness of a hierarchy so presented, to the conclusion that evidence pyramids are a regressive step in the evolution of evidence hierarchies.

First, evidence pyramids are less suited to presenting conditional rankings. As such, evidence pyramids are far more likely to adopt a primarily or entirely unconditional ranking than would be available in a non-pyramidal presentation. I previously (Blunt 2015) distinguished between unconditional and conditional rankings in evidence hierarchies. Hierarchies primarily rate or rank evidence according to the method used to produce it. But a great many also make reference to other factors, by way of which they can offer a more fine-grained set of distinctions and acknowledge that not all studies of a given methodology produce evidence that is equally strong or of equal quality. Conditions are requirements other than the type of underlying methodology which produced a study, which are factored into the rating or ranking of the evidence within a hierarchical structure.

A wide range of conditions have been appended to qualify for a higher ranking in evidence hierarchies. Some are themselves evaluative. For instance, the original Canadian Task Force (1979) hierarchy requires that cohort or case-control studies be “well designed” to qualify for level II-1. Others require optional methodological features such as blinding (e.g. Davies & Nutley 1999). The size of studies or the precision of the results are frequently included as conditions, following Sackett’s influential Chest hierarchies (1986; 1989), which differentiate “large” and “small” studies, and evidence “with clear-cut results” and “with uncertain results”. Provenance is another common condition; for instance, the much-replicated LaForce (1987) hierarchy refers to studies “preferably from more than one center or research group”. Conditions can often be complex and lengthy, as exemplified in Cook et al.’s (1992) hierarchy which includes conditions such as “the lower limit of the confidence interval for the effect of treatment exceeds the clinically significant benefit”, “individual study results are homogeneous”, and “with low false-positive (alpha) and low false negative (beta) errors”.

While some simple conditions can be applied within a pyramid structure, the aesthetics of the design tend to limit the prevalence and level of detail of conditions applied. Particularly at the higher levels, as the pyramid narrows, there is seldom physical space to offer a nuanced set of conditions. Thus, the design choice may constrain the sophistication of a hierarchy. From the start, hierarchies have included conditions. But the pyramidal turn coincides with, and may have contributed to, the paring back of these conditions within EBM hierarchies. Reduced conditionality is a significant contributor to the oversimplification of evidence appraisal processes using evidence hierarchies.

It is particularly problematic that the pyramidal structure discourages conditionality at the highest echelons of the ranking. Most conditions are positively framed – evidence must meet the condition to attain a higher rank, rather than face downgrading for meeting an undesirable condition of a lower rank. This is sensible from the perspective of ensuring clarity in the ranking. For instance, if a ranking of tier 2 is reserved for “well-designed cohort studies”, while tier 3 is “cohort studies”, then it is clear where any cohort study ranks: a well-designed cohort study satisfies the conditions for both 2 and 3, so is ranked at the highest level which it satisfies, tier 2, while any other cohort study satisfies only the conditions for tier 3. But if a ranking of tier 2 lists “cohort studies” while tier 3 lists “poorly designed cohort studies”, then ambiguity is introduced. A poorly designed cohort study still fulfils the criteria for tier 2 – it remains a cohort study nonetheless. Moreover, a higher ranking corresponds to a higher rating of quality or strength, and we would expect more criteria to be applicable to attain such a ranking. As such, higher echelons should tend to attract more conditions, for both practical and theoretical reasons. Pyramidal schemas push against this natural property, and thus discourage conditionality as a consequence of design.
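
The clarity argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, a hypothetical `Study` record and two hypothetical two-tier fragments of a hierarchy, one positively framed and one negatively framed; none of these names is drawn from any published hierarchy. Each study is assigned the highest tier whose criterion it satisfies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Study:
    """Hypothetical study record, for illustration only."""
    design: str
    well_designed: bool


# A tier is a (rank, criterion) pair; a lower number is a higher rank.
Tier = Tuple[int, Callable[[Study], bool]]

# Positively framed: the condition must be met to reach the higher tier.
POSITIVE: List[Tier] = [
    (2, lambda s: s.design == "cohort" and s.well_designed),
    (3, lambda s: s.design == "cohort"),
]

# Negatively framed: the condition is attached to the lower tier.
NEGATIVE: List[Tier] = [
    (2, lambda s: s.design == "cohort"),
    (3, lambda s: s.design == "cohort" and not s.well_designed),
]


def matching_tiers(study: Study, hierarchy: List[Tier]) -> List[int]:
    """Return every tier whose criterion the study satisfies, best first."""
    return sorted(rank for rank, criterion in hierarchy if criterion(study))


def assign_tier(study: Study, hierarchy: List[Tier]) -> Optional[int]:
    """Return the highest tier the study satisfies, or None if it fits none."""
    tiers = matching_tiers(study, hierarchy)
    return tiers[0] if tiers else None


good = Study(design="cohort", well_designed=True)
poor = Study(design="cohort", well_designed=False)
```

Under the positive framing, `poor` matches only tier 3 and `good` is placed at tier 2. Under the negative framing, `poor` matches both tiers, so the "highest tier satisfied" rule places it at tier 2, contrary to the evident intention of the ranking.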

This trend is borne out across the history of evidence hierarchies. Prior to 1997, when it seems likely the first evidence pyramid was introduced, only one major evidence hierarchy was unconditional (Anonymous, 1981), amid a preponderance of conditional hierarchies. Around the turn of the millennium, unconditional hierarchies proliferated (e.g. McAlister et al. 1999; Briss et al. 2000; Craven 2001; McLeod 2001; McAlister & Sackett 2001). It would be premature to infer a causal relationship here – the aforementioned unconditional hierarchies are primarily in list or tabular form, and the turn towards pyramids could equally be a consequence of the reduction in complexity of hierarchies rather than a driver of that trend towards simplification.

But a dearth of conditions is a theme amongst evidence pyramids. The only hint of conditionality in Wagoner et al.’s (1997) SUNY pyramid is the inclusion of “double blind” in the description of RCT evidence – an optional methodological feature. However, there is no tier for unblinded or single-blind RCT evidence, suggesting that the author of the SUNY Downstate pyramid might simply equate RCT evidence with double-blind RCT evidence. Similarly, the early evidence pyramids are all entirely unconditional: Jonas (2001), Haynes (2001; 2005; DiCenso, Bayley & Haynes 2009), Akobeng (2005), Dagenais et al. (2006), Daly et al. (2007), Sprague, McKay & Thoma (2008), Bigby (2009), Croswell & Kramer (2009) and Glover et al. (2006/2011) all provide unconditional evidence pyramids.

The first evidence pyramids to include minor conditions did not emerge until 2008, with Mantzoukas’s second tier reading “At least one well-conducted RCT” (2008). It is not until 2011 that an evidence pyramid included conditions at multiple levels; Tomlin & Borgetto’s inventive 2011 ranking includes a top-down view of a three-sided pyramid, with conditions including “related”, “blinded” and “prolonged engagement with patients” included at different tiers. But to achieve this, and despite their aesthetic innovation with the form, Tomlin & Borgetto feel compelled to include a separate table which replicates the pyramid in a traditional list-style hierarchy to give the full detail of each ranking. The pyramid itself omits the conditions. Almost every other evidence pyramid since has remained unconditional, or includes at most one or two brief conditions, primarily distinguishing amongst categories at the lower reaches of the pyramid.

A second drawback of the pyramid format is the inability to convey a non-categorical ranking in this design. In Blunt (2015), I distinguish categorical from non-categorical rankings, insofar as a categorical ranking implies that all evidence which meets the criteria for a particular rank or rating in a hierarchy definitively receives that rank or rating. The defining feature of a non-categorical hierarchy is that it is possible for some lower-ranked evidence to nonetheless be higher quality (or stronger, etc.) than some higher-ranked evidence. In non-categorical hierarchies, either evidence might move up or down the rating scale according to further criteria (as in the popular GRADE systems, see e.g. GRADE Working Group 2004; Balshem et al. 2011), or the hierarchy is explicitly intended to be interpreted with some modifiers attached – for instance, that tier 1 ranked evidence ‘tends to be high quality’ or ‘has a high likelihood to produce strong evidence’, etc. Non-categorical hierarchies avoid many of the philosophical criticisms directed towards hierarchical approaches to evidence appraisal, and are generally more flexible and defensible in the context of the highly variable character of the medical research literature.

While one could find inventive graphical ways to depict a non-categorical evidence pyramid, the format is not suited to this more nuanced understanding of medical evidence. Of the vast range of distinct evidence pyramids found across the medical literature, only one has ever offered a suggestion of a non-categorical interpretation: Murad et al.’s New Evidence Pyramid (2016) offers a wavy-tiered pyramid to convey that lower-ranked evidence sources are not necessarily always inferior to higher-ranked ones (see Fig. 8, below).

Figure 8: Murad et al. (2016) present the only evidence pyramid which depicts the possibility of lower-ranked evidence being superior to higher-ranked evidence in some instances.

Nonetheless, Murad et al.’s pyramid is unusual, subverting the expected aesthetic of a pyramid, and may be more open to interpretation than other non-categorical approaches. The wavy-tiered approach also can only accommodate a level of flexibility between two neighbouring tiers – in comparison, the GRADE framework allows for evidence which starts at a particular tier based on the underlying methodology to ultimately reach any level within the ranking based on a compounding of multiple independent conditions (see Balshem et al. 2011). This flexibility is difficult to envision within a pyramid framework, and has never been achieved.
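
For comparison, the kind of movement that a GRADE-style system permits can be sketched as follows. This is a deliberately simplified illustration and not the GRADE Working Group’s procedure: the four level names and the starting points (randomised trials start high, observational studies start low) follow GRADE, but the function name `rate`, the `START` mapping and the treatment of each condition as a whole-level step are assumptions made for the example.

```python
# Quality levels from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# Assumed starting level by underlying methodology.
START = {"rct": "high", "observational": "low"}


def rate(design: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Move from the starting level by the net number of steps.

    Each downgrade or upgrade stands for one independent condition
    (hypothetical whole-level steps); the result is clamped to the scale.
    """
    index = LEVELS.index(START[design]) + upgrades - downgrades
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]
```

In this sketch, `rate("observational", upgrades=2)` reaches the top level while `rate("rct", downgrades=3)` falls to the bottom: a crossing of more than one tier, which a wavy boundary between neighbouring tiers cannot depict.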


Conclusion

In conclusion, the origins of evidence pyramids can be traced back at least to 1997, and such pyramids have proliferated considerably across the Evidence-Based Medicine literature in the years since, becoming one of the most recognised artefacts of the movement. Yet a pyramidal representation of an evidence hierarchy represents a retrograde step in the development of evidence appraisal systems.

The pyramid structure offers few benefits in conveying additional information beyond the list or tabular formats that were previously preferred. The use of pyramids in scientific and social scientific visualisations of structures and systems has usually conveyed either a dependency of higher levels upon the lower, or a narrowing of the range or prevalence of instances of each type for each progressing level – or frequently both. Neither of these interpretations is useful or accurate to the intended interpretation of evidence hierarchies.

Meanwhile, a pyramid structure narrows the space for more nuanced and sophisticated versions of hierarchical appraisal schemas. By cramping the space available towards the zenith of the structure, authors who adopt a pyramid format are discouraged by design from introducing conditions to achieve the highest levels. Pyramid structures also make it difficult to visually depict any flexibility between the levels to convey a more nuanced non-categorical interpretation of the ranking. Evidence pyramids have typified a simplification of the appraisal system, both in comparison to the hierarchies which preceded them (e.g. Canadian Task Force 1979; Sackett 1986; LaForce 1987; Cook et al. 1992; Guyatt et al. 1995; Greenhalgh 1997) and in comparison to other contemporaneous trends within hierarchy development, such as the move towards more flexible designs exemplified by the GRADE framework (GRADE Working Group 2004; Balshem et al. 2011). Overall, with very few exceptions, the pyramidal turn in evidence hierarchy presentation is a negative force which detracts from the level of sophistication of evidence appraisal techniques promulgated in EBM, and counteracts the movement towards less rigid appraisal systems which move beyond the hierarchical model.


Bibliography:

  • Akobeng, A. K. (2005). “Understanding randomised controlled trials.” Archives of disease in childhood, 90(8), 840-844.
  • Anonymous (acknowledged to be authored by David Sackett) (1981) “How to read clinical journals: IV. To determine etiology or causation”, CMAJ, 124, 985-990.
  • Australian National Health and Medical Research Council (ANHMRC). (1999) A Guide to the Development, Implementation and Evaluation of Clinical Practice Guidelines. Commonwealth of Australia: available at: http://www.health.gov.au/nhmrc/publicat/synopses/cp30syn.html
  • Balshem, H., et al. (2011) “GRADE guidelines: 3. Rating the quality of evidence”, J Clin Epidemiol, 64(4), 401-406
  • Bigby, M. (2009) “The Hierarchy of Evidence”, in Williams, H., Bigby, M., Diepgen, T., Herxheimer, A., Naldi, L., & Rzany, B. (Eds.). (2009). Evidence-based dermatology. (John Wiley & Sons.)
  • Blunt, C.J. (2015) Hierarchies of Evidence in Evidence-Based Medicine (PhD Thesis) London School of Economics and Political Science, available at: http://etheses.lse.ac.uk/3284/
  • Blunt, C.J. (2019) A Ghost of Progress: how hierarchies become fixtures, available at: http://cjblunt.com/a-ghost-of-progress-how-hierarchies-become-fixtures/
  • Blunt, C.J. (2020) Random Reflections: Cochrane and the Origins of Hierarchies, available at: http://cjblunt.com/random-reflections/
  • Briss, P. A., et al. (2000) “Developing an evidence-based Guide to Community Preventive Services–methods. The Task Force on Community Preventive Services”, Am J Prev Med, 18(1 Suppl), 35-43
  • Campbell, D.T. & Stanley, J.C. (1963) Experimental and quasi-experimental designs for research (Chicago, IL: Rand McNally & Co.)
  • Canadian Task Force on the Periodic Health Examination (1979) “The Periodic Health Examination”, Canadian Medical Association Journal, 121(9): 1193-1254
  • Cochrane, A. (1972) Effectiveness and efficiency: Random reflections on health services (Nuffield Trust)
  • Cook, D. J., et al. (1992) “Rules of evidence and clinical recommendations on the use of antithrombotic agents”, Chest, 102(4 Suppl), 305S-311S
  • Craven, O. (2001) “Screening for colorectal cancer using the faecal occult blood test: a critical literature review.” European Journal of Oncology Nursing, 5(4), 234-243.
  • Croswell, J.M., & Kramer, B.S. (2009) “Clinical trial design and evidence-based outcomes in the study of liver diseases.” Journal of hepatology, 50(4), 817-826.
  • Dagenais, S., Tricco, A. C., Bian, Z. X., Huang, W. H., & Moher, D. (2006) “Critical appraisal of clinical studies in Chinese herbal medicine.” Zhong xi yi jie he xue bao, Journal of Chinese integrative medicine, 4(5), 455-466.
  • Daly, J. (2005) Evidence-based medicine and the search for a science of clinical care. (Berkeley, CA: University of California Press)
  • Daly, J., et al. (2007) “A hierarchy of evidence for assessing qualitative health research”, J Clin Epidemiol, 60(1), 43-49
  • Davidoff, F., et al. (1995) “Evidence based medicine”, BMJ, 310(6987), 1085-1086.
  • Davies, H.T.O. & Nutley, S.M. (1999) “The Rise and Rise of Evidence in Health Care”, Public Money & Management, 19(1): 9-16
  • DiCenso, A., Bayley, L. & Haynes, R.B. (2009) “ACP Journal Club. Editorial: Accessing preappraised evidence: fine-tuning the 5S model into a 6S model”, Ann Intern Med, 151(6), JC3-2, JC3-3
  • Evidence-Based Medicine Working Group (1992) “Evidence-based medicine. A new approach to teaching the practice of medicine.” JAMA, 268(17), 2420–2425.
  • Forrest, J.L. & Miller, S.A. (2001a) “Enhancing your practice through evidence-based decision making: Finding the best clinical evidence”, Journal of Evidence-Based Dental Practice, 1(3): 227-36.
  • Forrest, J.L. & Miller, S.A. (2001b) “Enhancing your practice through evidence-based decision making”, Journal of Evidence-Based Dental Practice, 1(1): 51-7.
  • Glasziou, P., Vandenbroucke, J. & Chalmers, I. (2004) “Assessing the quality of research”, BMJ, 328, 39-41.
  • Glover, J. et al. (2011) “EBM Page Generator”, from www.ebmpyramid.org
  • Gold, M. et al. (1994) For a Healthy Nation: Return on Investments in Public Health (US Public Health Service & US Department of Health and Human Services: Hyattsville, MD.)
  • GRADE Working Group. (2004) “Grading quality of evidence and strength of recommendations”, British Medical Journal, 328, 1490
  • Gray, J.A.M. (1997) Evidence-Based Healthcare. (NYC: Churchill-Livingstone)
  • Greenhalgh, T. (1997) How to read a paper: the basics of evidence based medicine. (London: BMJ Pub. Group)
  • Guyatt, G.H., et al. (1995) “Users’ guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group”, JAMA, 274(22), 1800-1804
  • Guyatt, G.H., et al. (2000) “Users’ Guides to the Medical Literature: XXV. Evidence-based medicine: principles for applying the Users’ Guides to patient care. Evidence-Based Medicine Working Group”, JAMA, 284(10), 1290-1296
  • Guyatt, G.H., et al. (2001) “The Philosophy of Evidence-Based Medicine”, in Guyatt, G.H. et al. (eds.) Users’ Guides to the Medical Literature (Chicago, IL: American Medical Association)
  • Haynes, R.B. (2001) “Of studies, syntheses, synopses, and systems: the 4S evolution of services for finding current best evidence”, ACP J Club, 134(2), A11-13
  • Haynes, R.B. (2005) “Of studies, syntheses, synopses, summaries, and systems: the 5S evolution of information services for evidence-based health care decisions”, ACP J Club, 145(3), A8-A8
  • Houben, R.M.G.J. et al. (2022) “Tuberculosis prevalence: beyond the tip of the iceberg”, Lancet: Respiratory Medicine, 10(6): P-537-9.
  • Jonas, W.B. (2001) “The evidence house: How to build an inclusive base for complementary medicine”, Western Journal of Medicine, 175(2): 79-80
  • LaForce, F. M. (1987) “Immunizations, immunoprophylaxis, and chemoprophylaxis to prevent selected infections. US Preventive Services Task Force”, JAMA, 257(18), 2464-2470
  • Last, J.M. (1963) “The Iceberg: ‘Completing the clinical picture’ in general practice”, Lancet, 2: 28-31.
  • Last, J.M. (2013) “Commentary: The iceberg revisited”, International Journal of Epidemiology, 42(6): 1613-5.
  • Mantzoukas, S. (2008). “A review of evidence-based practice, nursing research and reflection: levelling the hierarchy.” Journal of Clinical Nursing, 17(2), 214-223.
  • Maslow, A.H. (1943) “A Theory of Human Motivation”, Psychological Review, 50: 370-396.
  • McAlister, F. A., et al. (1999) “Users’ Guides to the Medical Literature: XIX. Applying clinical trial results B. Guidelines for determining whether a drug is exerting (more than) a class effect”, JAMA, 282(14), 1371-1377
  • McAlister, F. A., & Sackett, D. L. (2001) “Active-control equivalence trials and antihypertensive agents.” American journal of medicine, 111(7), 553-558.
  • McLeod R.S. (2001) “Evidence-Based Surgery.” In: Norton J.A. et al. (eds) Surgery. (Springer, Berlin, Heidelberg)
  • Melnyk, B.M. & Fineout-Overholt, E. (2005) Evidence-Based Practice in Nursing and Healthcare: A Guide to Best Practice (Philadelphia: Lippincott, Williams & Wilkins)
  • Murad MH, Asi N, Alsawas M, & Alahdab F. (2016) “New evidence pyramid.” Evid Based Med. 21(4): pp.125-7.
  • Pfeiffer, D.U. (2002) Veterinary Epidemiology: An Introduction (Royal Veterinary College: Hertfordshire, UK)
  • Phillips, R. et al. (1998) “Levels of Evidence”. Centre for Evidence-Based Medicine, www.cebm.net. [no longer accessible online – pdf available on request to the author]
  • Rychetnik, L., et al. (2002) “Criteria for evaluating evidence on public health interventions”, J Epidemiol Community Health, 56(2), 119-127.
  • Sackett, D.L. (1986) “Rules of evidence and clinical recommendations on the use of antithrombotic agents”, Chest, 89(2 Suppl), 2S-3S.
  • Sackett, D.L. (1989) “Rules of evidence and clinical recommendations on the use of antithrombotic agents”, Chest, 95(2 Suppl), 2S-4S.
  • Sackett, D.L. & Rosenberg, W.M. (1995) “On the need for evidence-based medicine”, J Public Health Med, 17(3), 330-334.
  • Sackett, D.L., et al. (1996) “Evidence based medicine: what it is and what it isn’t”, British Medical Journal, 312: 71-72.
  • Sackett, D.L., et al. (1997) Evidence-based medicine: how to practice and teach EBM. (New York; Edinburgh: Churchill Livingstone)
  • Sprague, S., McKay, P., & Thoma, A. (2008) “Study design and hierarchy of evidence for surgical decision making.” Clinics in plastic surgery, 35(2), 195-205.
  • Tomlin, G. & Borgetto, B. (2011) “Research Pyramid: A New Evidence-Based Practice Model for Occupational Therapy.” American Journal of Occupational Therapy, 65, pp.189-196.
  • Vere, J. (2018) Evidence based medicine: a critical analysis [PhD Thesis] University of Sheffield
  • Vere, J. & Gibson, B. (2020) “Variation amongst hierarchies of evidence”, Journal of Evaluation in Clinical Practice, 1-7
  • Wagoner, B. et al. (1997) “Guide to Research Methods: The Evidence Pyramid”, SUNY Downstate Medical Center: Medical Research Library of Brooklyn, EBM Tutorial, version 1.3, first known publication at latest 11 May 2000, archived at https://web.archive.org/web/20000511202739/http://www.servers.medlib.hscbklyn.edu/ebm/2100.htm
  • Wagoner, B. et al. (2004) “Guide to Research Methods: The Evidence Pyramid”, in SUNY Downstate Medical Center: EBM Tutorial, formerly available at: http://library.downstate.edu/EBM2/2100.htm
  • Weaver, N., Williams, J. L., Weightman, A. L., Kitcher, H. N., Temple, J. M. F., Jones, P., & Palmer, S. (2002) “Taking STOX: developing a cross disciplinary methodology for systematic reviews of research on the built environment and the health of the public.” Journal of Epidemiology & Community Health, 56(1), 48-55.
  • Webb, S.A. (2001) “Some considerations on the validity of evidence-based practice in social work”, British Journal of Social Work, 31: 57-79.
  • Weightman A.L., Barker J., & Lancaster J. (2000) “A systematic approach to identifying the evidence. Project Methodology 3.” Health Evidence Bulletins Wales. Cardiff: Department of Information Services, University of Wales College of Medicine
  • Yetley, E.A. et al. (2016) “Options for basing Dietary Reference Intakes (DRIs) on chronic disease endpoints: report from a joint US-/Canadian-sponsored working group”, American Journal of Clinical Nutrition, 105(1): 249S-285S.
  • Zuccon, G. et al. (2015) “Automatic detection of tweets reporting cases of influenza like illnesses in Australia”, Health Information Science and Systems, 3: S4-13.

Acknowledgements:

My thanks go to Prof. Burt Gerstman who first raised the question with me of the origins of evidence pyramids specifically. Most sincere thanks are due to Pekka Louhiala who also posted about the origins of evidence pyramids on the Philosmed mailing list, and for their correspondence and effort in researching this topic and other hierarchies of interest. Thanks also to Dr. Andrea Markinson and Christopher Stewart of SUNY Downstate for their recollections of the SUNY pyramid. Thanks to Jeanne Dutton and the Bloomington Watercolor Society for their help in my ultimately fruitless mission to track down Betty Wagoner, the arcane details of which could be the subject of their own monograph.

Most recent update: 23/01/23 (addressed typo)