Stop Fighting, You’re Making Me SAD

Does medical science have to be adversarial to make progress? Does progress in patient care consist of weeding out treatments which are ineffective and replacing existing therapies with new and better alternatives?

Subacromial Decompression (SAD) surgery is one of the most commonplace shoulder surgeries, performed regularly as a treatment for shoulder impingement syndrome. To briefly oversimplify: shoulder impingement occurs (we think) when a rotator cuff tendon becomes irritated or damaged as it passes through the space under the acromion (hence subacromial), a bony projection forming the high point of the shoulder. Through persistent rubbing and catching on bone or tissue as the arm is raised and lowered, the tendon becomes damaged, torn, or inflamed, leading to pain and loss of mobility. In subacromial decompression surgery, the surgeon goes in, often arthroscopically, frees the tendon, removes damaged tissue (including the subacromial bursa – of which much more below and in another post) and shaves back the acromion to try to prevent reinjury. So far so good.

“Examiner” by shando is licensed under CC BY-SA 2.0

But last year, two major RCTs (CSAW and FIMPACT) were published, both suggesting that subacromial decompression does not outperform physiotherapy or sham arthroscopy by any meaningful margin. A BMJ ‘Rapid Recommendation’ in February 2019 argued that SAD surgeries should no longer be performed. The NHS has considered pulling the plug on the procedure. This presents a challenge: given the harms, risks and cost involved, is arthroscopic subacromial decompression surgery justified?

The debate has been quite brutal and very adversarial. Twitter is lit with claims that surgeons who perform the operation are acting in defiance of ‘the evidence’, and some surgeons I’ve spoken to feel personally uncomfortable. The tone at some of the events and conferences I’ve attended has been hostile. Part of the adversarial atmosphere is probably due to the clash of specialties and healthcare providers. One side of the debate is principally populated by surgeons, the other mainly by physiotherapists and strength and conditioning coaches, with a range of clinical epidemiologists exulting in yet another hitherto-unquestioned treatment finally feeling the RCT axe. The reputation of surgeons as cavalier scalpel-jocks and the often mistrustful attitude towards newer professions such as strength and conditioning lend themselves to stereotyping and personal insults (which Twitter does so well). Quite a few shoulder surgeons perform a lot of subacromial decompressions, so it’s hard to read their responses as wholly objective – but physios too have financial and reputational stakes in play, so competing interests abound. The debate ends up as a proxy war for professional politicking.

Sidestepping the issue of which side is right (spoilers: as usual, it’s neither and a little bit both) for the time being, I want to ask whether medical science should have this adversarial quality, and why we’ve ended up here. The model which Evidence-Based Medicine lays out is pretty clear on this: it draws heavily on Karl Popper and Ronald Fisher for inspiration, favouring refutation and falsification above all else. The iconic cases for EBM are primarily examples in which interventions, both established and experimental, failed in randomised placebo-controlled trials and were subsequently jettisoned from the canon. The remonstrations that followed can be ascribed to the mismatch between clinical experience easily waylaid by confirmation bias, and the truth finally revealed by the RCT. The appropriate response to such objections and holdouts is first indifference and then pity.

This is where medical science could use a little more philosophy of science. The falsificationist model has a well-appreciated flaw, often thought fatal. As Pierre Duhem pointed out, no test of a theory offers definitive refutation. To deduce a prediction from a theory (e.g. ‘SAD will outperform diagnostic arthroscopy in a randomised trial’), we need to make a whole host of other assumptions. To test the hypothesis that subacromial decompression actually works, we assume that the surgeons involved in the trial have adequate surgical expertise, that appropriate physiotherapy is provided afterwards to facilitate recovery, that the sham surgery in the control group doesn’t include crucial components of the real surgery (in some cases, the sham operations did include removing subacromial bursa tissue, which at least one surgeon I know, and at least one rather low-key systematic review, reckon is the real core of the procedure), and so on. This set includes assumptions about the absence of bias in the test and about the fairness of the controls. In sum, these are the conditions that would need to hold for a negative result to be expected only if the treatment itself were not beneficial. As Duhem pointed out, all a negative result shows is that at least one of the assumptions and/or the hypothesis is false. The debate then becomes about which is at fault. This doesn’t let the surgeons off, but it does legitimise a real debate about where the axe falls.
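Duhem’s point can be put schematically. A trial never tests the hypothesis in isolation, only the hypothesis conjoined with its auxiliary assumptions, so a negative result refutes the conjunction without saying which conjunct fails (a sketch in standard logical notation – the labels for the hypothesis and auxiliaries are mine, not from the trials):

```latex
% H: the hypothesis under test (e.g. 'SAD is beneficial')
% A_1, ..., A_n: auxiliary assumptions (surgical expertise,
%   adequate post-operative physiotherapy, a genuinely inert sham, ...)
% P: the prediction ('SAD outperforms sham arthroscopy in the trial')
(H \land A_1 \land \dots \land A_n) \rightarrow P

% A negative result (\neg P) refutes only the conjunction:
\neg P \;\vdash\; \neg (H \land A_1 \land \dots \land A_n)
% i.e. H is false, or some A_i is -- logic alone does not say which.
```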

Now the kicker: no trial is perfect. There are always flaws that a suitably tooled evidence appraiser can find. The two trials that stoked this fire – FIMPACT and CSAW – are no exception. I and others have picked them apart and found plenty to object to. To coin an eponymous law of my own, Blunt’s Law states: ‘If you look hard enough, you’ll always find reasons to doubt the quality of a study’. (Aside: this has all the qualities of a good eponymous law – it sounds pretty plausible but is basically impossible to check.) The issues with the two trials are completely unsurprising and entirely consistent with subacromial decompression turning out to be the culprit all along. But Duhem’s Thesis, coupled with the ‘fact’ that we can pretty much always find decent reasons to reject some of our auxiliary assumptions in a clinical trial, leaves us in a bind. We need a way to figure out whether negative results tell against the treatment under study or against the quality of the trial – and preferably a more scientific way than seeing who wins a Twitter shouting match.

To shortcut around half a century of philosophy of science, we’re not going to get a convenient method to deal with this. The best we can hope for, keeping the same model of progress and evidence, is to produce further studies which don’t have the same problems as the others (but have some new ones of their own) to try to reduce the credibility of those who doubt the result. Replication might help resolve a stalemate by attrition, but won’t eliminate it.

There are other options. The falsificationist, refutation-oriented model is only one model of how progress might be sought in scientific medicine. An adversarial approach is not a necessity for doing good science. The iconic case of good use of evidence in medicine doesn’t need to be the one where we take a treatment we thought had good effects and show it to be no better than placebo or sham treatment on average. The goal should be the case in which we know a great deal about the variability in the effects of treatments, and we know which markers and features should give us pause about using a treatment or make us more confident in doing so. We have upsettingly few iconic examples with those features, and getting them is more complex.

Under this approach, when we want to know how to treat subacromial impingement syndrome, we don’t simply conduct some trials, decide on the ‘correct’ interpretation of those results, and stop there. Rather, we have to treat the results of clinical trials – and a range of other evidence besides – as our inputs, our starting point from which to do some science. The outcomes of studies are data-points which we need a theory to explain. That theory is an explanatory and predictive model of the effects of interventions on subacromial impingement. It needs to explain all of the data – RCT results, subgroup analyses, mechanisms of action, laboratory studies, clinical experience, harms, variability, and so on.

The theory must make predictions, and we should test them. Where predictions come true, that’s evidence for the model (merely accommodating existing evidence ain’t – anyone can do that). Where they don’t, we have to explain the predictive failure in the context of our model (and then generate new predictions from that explanation, which we test!), refine the model, or create a new one. This approach foregrounds the integration of a range of evidence, and allows for competition between different theories rather than different treatments. It doesn’t presuppose a single best-fit solution to a complex and multivariate problem. It doesn’t require practitioners of different therapies to fall onto opposing sides. It will leave plenty of room for arguments, including angry tweeting, as we work out how our theories must adapt to conflicting data and respond to predictive failures. But the approach values construction – building a complex model which learns from predictive failures and new inputs – rather than jettisoning interventions when they underperform. It does not insist that there be a winner in the shoulder impingement stakes, but accepts that the question in clinical therapeutics is almost always ‘When is this the right treatment for a patient?’, not ‘What is the best treatment for all patients?’