The Jurassic Critique of Micozzi on Evidence Hierarchies

In giant AI Language Model news, AI startup AI21 Labs have just released a public demo of their new language model, Jurassic-1. The model clocks in at around 178 billion parameters, putting it basically on a par with the much-discussed GPT-3 by raw parameter numbers. Although it doesn’t seem to have as much behind it as GPT-3 itself, the big difference is that there’s currently no need to apply and wait in line for access to the API, unlike GPT-3, which is in closed-access mode. Using AI21 Studio (albeit having to make an account and do a seemingly unnecessary phone verification), you can play around with Jurassic-1 directly in the browser, without needing any coding know-how. This is something that’s been in short supply in this new generation of Language Models, with the likes of Hugging Face and Talk to Transformer being based on GPT-2 equivalents.

Far be it from me to miss an opportunity to generate something philosophical with a new Language Model. I fed it a bit of my most recent work (bolded in the text below), and it generated a very interesting and detailed analysis of Marc Micozzi’s views on evidence hierarchies. Note how well it draws on associated jargon that isn’t included in the prompt text: first, it switches freely between ‘evidence hierarchy’ and ‘hierarchy of evidence’; second, it quickly introduces concepts such as confounding, selection bias, prior probabilities and effect sizes.

So is the Jurassic Critique of Marc Micozzi’s views any good? It’s hard to tell. Marc Micozzi is (I am genuinely surprised to tell you) a real person, who has written about complementary and alternative medicine, medical anthropology and epidemiology. Having briefly read some of Micozzi’s work, I will say that the views attributed to him by the neural network here seem considerably more nuanced and philosophically sophisticated than one would expect from Micozzi. They do, however, reflect an interesting option for a philosophical position on evidence hierarchies, which we might call a piecemeal or stratified approach. Pseudo-Micozzi – or Jurassic Micozzi, if you prefer – takes a line in which we use evidence hierarchies where there is a high risk of confounding, but switch increasingly towards pluralistic and holistic approaches to medical evidence where the risk of confounding is lower. Although it’s expressed rather strangely, this view actually has some things going for it. It’s also an option that I haven’t explored in my work on evidence hierarchies. So maybe Jurassic Micozzi has just spawned a novel philosophical theory of evidence appraisal in medicine!

At conferences, I frequently encounter the objection that my pluralist stance and rejection of evidence hierarchies opens the door to charlatanism. Opponents of complementary and alternative medicine often feel that evidence hierarchies are a bulwark against attempts to use a melange of weak studies to substantiate effectiveness claims for therapies which have no scientific basis. Pluralism, they warn, will make it easy for anyone to cobble together a hodgepodge of studies and present them as evidence for the efficacy of a treatment.

While I think evidence hierarchies are only one part of the answer to the problem alleged, I also agree that pluralism doesn’t guarantee trustworthiness. I don’t know if Marc Micozzi has read anything I’ve written, but he’s written a very good paper about an important topic that dovetails nicely with some of my own work. In it, he tackles the question of whether there is anything that can provide the reliability that evidence hierarchies are supposed to provide in the absence of a hierarchy of evidence.

Micozzi begins by noting, correctly, that one of the advantages of evidence hierarchies is that they prevent any study from counting for more than its “true” value. As evidence for this, he cites my argument that “we simply can’t simply add up the results of the studies that are out there” and conclude that homeopathy works.

Micozzi then provides a useful taxonomy of different factors which can affect a study’s outcome. He identifies four broad categories:

(1) Study quality,
(2) Confounding factors,
(3) Selection bias, and
(4) Publication bias.

The first three of these categories have to do with factors that are specific to a given study. Micozzi defines study quality as “the probability that a study’s results accurately reflect the intervention’s effect,” and argues that it (and its close cousin, confounding factors) are “the most fundamental and pernicious problems in evaluating evidence.”

Micozzi’s discussion of confounding factors is insightful, but incomplete. He notes that, when confounding factors are present, the study’s results may indicate that an intervention is effective, even when the intervention isn’t. He notes that, when confounding factors are present, even large, well-designed studies “can still provide misleading results.” What he doesn’t note is that, when confounding factors are present, the magnitude of the effect may also be exaggerated. This, in turn, can lead researchers to overestimate the efficacy of an intervention.

This is relevant, because Micozzi says that, if he thinks there’s a risk of confounding factors, “I will abandon my prior,” and “take the results from the study with the smallest possible effect sizes, and otherwise use the largest available effect size.” In other words, if there’s a high risk of confounding factors, Micozzi will stick to evidence hierarchies, and rely on a single study—or a small collection of studies—to substantiate an effectiveness claim. But, if there is only a moderate risk of confounding factors, Micozzi will use a variety of studies, and, if the available studies are small, he’ll rely on effect sizes that are larger than those provided by a single high-quality study.

What’s missing from Micozzi’s discussion is an acknowledgment that, if confounding factors are present, the quality of the studies, and the magnitude of the effect, may not matter. This, in turn, suggests that relying on evidence hierarchies—as Micozzi advocates—may be misguided, and that, in the absence of a hierarchy of evidence, using a variety of studies, and relying on effect sizes (rather than study quality), may be the best way to proceed, even if there is a risk of confounding factors.

But is this the best approach? Micozzi notes that, in the absence of a hierarchy of evidence, his pluralist approach “is not sufficient to guarantee trustworthiness.” He’s right, of course. But what exactly does it mean to “guarantee trustworthiness”? This is, I think, where Micozzi’s analysis gets a bit fuzzy.

He says that, in the absence of a hierarchy of evidence, “evaluating evidence must be based on some combination of (1) the trustworthiness of the study, (2) the study’s relevance to the question at hand, and (3) the study’s scope.” Micozzi then says that “relying on study quality alone does not guarantee trustworthiness. Relying exclusively on study quality can mislead us just as much.” That seems right to me.

But, if that’s the case, then Micozzi’s pluralist approach is no better. After all, if relying on study quality alone is insufficient to guarantee trustworthiness, then relying on multiple studies, none of which are high quality, is also insufficient to guarantee trustworthiness.

Jurassic Micozzi, generated by CJ Blunt (22/09/21) using Jurassic-1 language model via AI21 Labs.