I’ve argued that information about variation in treatment effects is vital for doctors, patients and regulators alike. This information does not come from RCTs. Nonetheless, we can acquire high-quality, compelling evidence of variation. The case studies I’ve presented in the last few years show this in action but don’t prove it beyond doubt. It remains possible that the work I cite has misattributed causation or been misled by biases. To establish the case without room for doubt, let’s take a hypothetical example, adapted and expanded from Collins and Pinch’s book on the sociology of medical knowledge, Dr. Golem.
There is a rare medical condition known as UBL. There are four main treatments on the market for UBL, known by the acronyms CLA, CLL, CRA and CRL. A patient presents with UBL, and we want to decide which of the four treatments to give to her—if any. We consult the medical literature, and find that there are many reports of RCTs. What do they say?
There are several trials comparing each treatment to a placebo known as CN. Each of the four treatments significantly outperforms the placebo across all trials. All four have done enough to be licensed by the FDA as a treatment for UBL. There are also a few multi-arm trials which compare the treatments with one another. In one, CLA is compared to CLL, and the average treatment effect is higher by 10% (absolute) for CLL than for CLA. The authors conclude that CLL is significantly more effective than CLA. Meanwhile, another trial compares CRA and CRL, and finds CRA 10% more effective than CRL. Those authors too conclude that CRA is better than CRL.
Spurred on by these results, letters pages and editorials brim with calls for a trial to compare CLL to CRA. Surely this is the information doctors and patients need. A massive multi-centre trial compares CRA to CLL, and finds that CRA outperforms CLL by 10%, just as it outperformed CRL. Researchers have all but abandoned CLA and CRL as pharmaceutical companies pull their funding. CLL will now go much the same way. CRA becomes the standard of care.
Finally, we learn that it’s possible to give both CRL and CLL, or both CRA and CLA, together (but no other combinations have been tested, for whatever reason). A trial of the two combinations, CRL+CLL vs. CRA+CLA, shows no significant difference in effectiveness between the two, but both combinations outperform the recent “gold standard” treatment of CRA by 15%. However, the side-effects of the combinations are significantly worse than those of any individual treatment, and the side-effects of CRA+CLA are worse than those of CRL+CLL. This exhausts the evidence rated as high-quality by hierarchies of evidence.
How should we proceed?
Before reading any further, you should think about this question. The recommendation we would make based on a hierarchical approach is not clear. Most likely we would recommend either CRA, the best of the individual treatments according to the RCTs, or CRL+CLL, because it is more effective than CRA alone and, although its side-effects are significantly worse than those of CRA alone, they are at least less harmful than those of CRA+CLA. It probably depends on how burdensome UBL is, and how bad the side-effects are. A doctor who trusts her patients might think the best solution is to explain to them the increased effectiveness and additional side-effects of the combined approach, compared to CRA alone, and let them make an informed decision. Most obviously, though, there are some treatments that we simply won’t be recommending. CRL, CLL and CLA given individually are not defensible treatment options, because CRA is demonstrably more effective. Nor would we be likely ever to recommend CRA+CLA, because it is no more effective than CRL+CLL but has reliably worse side-effects.
So do you recommend CRA? CRL+CLL? Offer the choice? What would be the appropriate reaction to a doctor who believes CLL is the right course of action? Would you take her off the case, demand to see her evidence, campaign for her to be struck off? What should we say to a patient who will only accept CRL and refuses to consent to CRA instead?
Let’s now fill in the details of the case and explain a lot more about UBL and the four candidate treatments. Read on when you’re ready to see the picture the RCTs didn’t—and most importantly couldn’t—paint.
UBL is an abbreviation for a rare condition, Unidentified Broken Limb. This condition occurs when we know that a patient has broken one of her legs or arms, but we don’t know which. (Perhaps we didn’t ask.) The names of the four treatments abbreviate the practice of putting a cast on one particular limb. CRA = Cast on Right Arm. CLA = Cast on Left Arm. CRL = Cast on Right Leg. CLL = Cast on Left Leg. And finally, the placebo treatment which was so ineffective compared to these four: CN = Cast on Neck, that is, a neck brace.
But outcomes research has discovered that in cases of Unidentified Broken Limb, it’s not the case that every limb is equally likely to be broken. It turns out that UBL patients show the following distribution of broken limbs:
Broken Right Arm: 35%
Broken Right Leg: 25%
Broken Left Arm: 15%
Broken Left Leg: 25%
So, as it happens, 50% of UBL sufferers have a broken leg and 50% have a broken arm. But while each leg is as likely as the other to get broken (25% each), the right arm is much more prone to unidentified breakage than the left.
Now think about our four treatment plans. As long as we don’t ever get any more information about which limb the patient has broken, putting a cast on the right arm is the most likely to work of the four. 35% of the time, we will have fluked the correct limb. This is why, in comparative clinical trials, CRA outperformed both CRL and CLL by 10% (35% vs. 25% in both cases), and CLL outperformed CLA (25% vs 15%). Of course, assuming no UBL patients have also broken their neck, each of the treatments will do better than CN precisely because a minority of the patients will be lucky enough to get the correct limb put into a cast.
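The arithmetic behind those trial results can be checked in a few lines. What follows is only a minimal sketch in Python, assuming the distribution of broken limbs given above and assuming that a cast succeeds exactly when it lands on the broken limb. The names `BROKEN_LIMB_SHARE`, `TREATS`, `success_rate` and `advantage` are hypothetical labels introduced for illustration.

```python
from fractions import Fraction

# Share of UBL patients with each broken limb (the distribution given above).
BROKEN_LIMB_SHARE = {
    "right arm": Fraction(35, 100),
    "right leg": Fraction(25, 100),
    "left arm": Fraction(15, 100),
    "left leg": Fraction(25, 100),
}

# The limb each single treatment puts in a cast.
# CN, the neck brace, covers no limb at all.
TREATS = {
    "CRA": "right arm",
    "CRL": "right leg",
    "CLA": "left arm",
    "CLL": "left leg",
    "CN": None,
}


def success_rate(treatment):
    """Probability that the cast lands on the limb that is actually broken."""
    limb = TREATS[treatment]
    return BROKEN_LIMB_SHARE.get(limb, Fraction(0))


def advantage(a, b):
    """Absolute advantage of treatment a over treatment b, in percentage points."""
    return int((success_rate(a) - success_rate(b)) * 100)
```

Under these assumptions, `advantage("CRA", "CRL")`, `advantage("CRA", "CLL")` and `advantage("CLL", "CLA")` each come out at 10, and `advantage("CRA", "CN")` at 35, which is the pattern the hypothetical trials reported.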
The researchers also happened to try out putting both arms, or both legs, in casts. Those were CRA+CLA and CRL+CLL. Now, this raised the probability of getting the correct limb to 50% in each case. But the side effects were much worse. Having both arms in casts is much more debilitating than just one. It puts much bigger limits on your enjoyment of your life, and leaves you dependent on a carer for basic needs like food and dressing. Here the judgment was that the side effects of having both arms in casts are worse than those of having both legs in casts. Having both legs in casts makes it very hard to get about without a wheelchair. It could be very difficult if a wheelchair wasn’t provided, or for people who don’t live in wheelchair accessible locations. But at least it allows for feeding oneself and would leave many people still able to work. You might disagree with that judgment, but nothing here depends on it. We could, of course, have invented a treatment that was 100% effective: CRA+CRL+CLA+CLL. Putting every limb in a cast covers all the bases, but at a very high cost to quality of life. The 35% increase compared to the placebo that CRA managed looked so good that putting casts on all limbs might never have been considered.
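The same bookkeeping extends to the combinations. The sketch below is again only illustrative: it assumes the limb distribution given earlier and treats a combination as successful whenever the broken limb is among those put in casts. `SHARE` and `coverage` are hypothetical names.

```python
from fractions import Fraction

# Share of UBL patients with each broken limb (the distribution given earlier).
SHARE = {
    "right arm": Fraction(35, 100),
    "right leg": Fraction(25, 100),
    "left arm": Fraction(15, 100),
    "left leg": Fraction(25, 100),
}


def coverage(casted_limbs):
    """Probability that the broken limb is among the limbs put in casts."""
    return sum((SHARE[limb] for limb in set(casted_limbs)), Fraction(0))


both_arms = coverage(["right arm", "left arm"])   # CRA+CLA
both_legs = coverage(["right leg", "left leg"])   # CRL+CLL
every_limb = coverage(SHARE)                      # CRA+CRL+CLA+CLL
right_arm_only = coverage(["right arm"])          # CRA
```

Both pairs cover half of the patients, which is 15 percentage points more than `right_arm_only`, and `every_limb` covers all of them. Nothing in these numbers records what each extra cast costs the patient.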
But we all know there’s a better way. Find out which limb is broken. Put a cast on that limb. Assuming that we can successfully identify the broken limb in every case, we should be able to gain 100% effectiveness with a toolkit of all four treatments—targeted appropriately to the patients who need them. We also minimize the side-effects. The only limbs that get put out of commission are the broken ones.
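To put blanket and targeted treatment side by side, here is a rough sketch under two loudly stated assumptions: that diagnosis is perfect, as supposed above, and that the number of healthy limbs put in casts can stand in as a crude proxy for side-effects. That proxy, and the names `blanket`, `targeted` and `evaluate`, are introduced for illustration only.

```python
from fractions import Fraction

# Share of UBL patients with each broken limb (the distribution given earlier).
SHARE = {
    "right arm": Fraction(35, 100),
    "right leg": Fraction(25, 100),
    "left arm": Fraction(15, 100),
    "left leg": Fraction(25, 100),
}


def blanket(casts):
    """Policy: every patient gets the same casts, whichever limb is broken."""
    fixed = set(casts)
    return lambda broken: fixed


def targeted(broken):
    """Policy: identify the broken limb first (assumed perfect), cast only that."""
    return {broken}


def evaluate(policy):
    """Return (success rate, expected number of healthy limbs put in casts)."""
    success = Fraction(0)
    healthy_casted = Fraction(0)
    for broken, share in SHARE.items():
        casts = policy(broken)
        if broken in casts:
            success += share
        healthy_casted += share * len(casts - {broken})
    return success, healthy_casted
```

On these assumptions, `evaluate(blanket(["right arm"]))` gives a success rate of 35% with 0.65 healthy limbs in casts per patient on average, `evaluate(blanket(SHARE))` gives 100% with three healthy limbs in casts, and `evaluate(targeted)` gives 100% with none.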
We understand what limbs are and how to identify when they’re broken. This is a much easier game of treatment selection than the ones that we meet in the wild. The crucial point, though, is that relying on RCT evidence leaves us with the kind of information you had before the definition of UBL was filled in. You had no way of knowing that the 35% of patients who benefited from CRA and the 25% who benefited from CRL were entirely different patients. When judging the quality of your whole evidence base, if you find yourself in a situation like the pre-reveal UBL situation, it seems the only reasonable judgment is that your evidence base for making a decision is weak and patchy. Those gaps can’t be filled by RCT evidence. It follows that an evidence base can’t realistically be high quality and detailed without observational and mechanistic evidence, and can’t be high quality on the strength of RCTs alone, or even of systematic reviews and meta-analyses of RCTs.
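The point about different patients can be made concrete with a toy comparison. In the sketch below, `DISJOINT_WORLD` is the UBL case restricted to CRA and CRL, while `NESTED_WORLD` is a purely hypothetical contrast, invented here, in which everyone helped by CRL would also have been helped by CRA. The function names are likewise illustrative assumptions.

```python
from fractions import Fraction

# Each world lists patient types as (share of patients, treatments that help).
# UBL restricted to CRA and CRL: the two groups of responders do not overlap.
DISJOINT_WORLD = [
    (Fraction(35, 100), {"CRA"}),
    (Fraction(25, 100), {"CRL"}),
    (Fraction(40, 100), set()),
]

# Hypothetical contrast: everyone helped by CRL is also helped by CRA.
NESTED_WORLD = [
    (Fraction(25, 100), {"CRA", "CRL"}),
    (Fraction(10, 100), {"CRA"}),
    (Fraction(65, 100), set()),
]


def trial_rate(world, treatment):
    """Average success rate a trial arm would report for one treatment."""
    return sum((share for share, helped in world if treatment in helped),
               Fraction(0))


def best_targeted_rate(world, options):
    """Success rate if each patient type gets an option that helps it, if any."""
    return sum((share for share, helped in world if helped & set(options)),
               Fraction(0))
```

Both worlds report 35% for CRA and 25% for CRL, so trial averages alone cannot tell them apart. Yet `best_targeted_rate(DISJOINT_WORLD, ["CRA", "CRL"])` is 60%, while the same call on `NESTED_WORLD` stays at 35%.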
Collins and Pinch, Dr. Golem: How to Think about Medicine.