The Parachute Problem: Extracorporeal Life Support and the Demand for Trials

“One of the best things about [extracorporeal life support] is that it acts as a parachute. It’s there when everything else fails and has known results”.

Robert Bartlett⁽¹⁾

1: Parachutes and the Demand for Trials

The United States Parachute Association recorded 120 deaths while skydiving in America in 2008-2015.⁽²⁾ Most were due to human error, while others resulted from collisions with other parachutes or aircraft. Only 5 were due to equipment failure. With around 3 million parachute jumps per year in the USA, this implies around 5-6 deaths per million jumps. With 4% of those fatalities due to equipment failure, the chance of fatal malfunction is of the order of 1 in 5,000,000. Parachute use, then, is an extremely effective and reliable way to prevent death when jumping from planes. A trial of parachute use could recruit tens of thousands of subjects without expecting a single failure.
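The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above, and assumes that 2008-2015 is counted inclusively as eight years and that the round figure of 3 million jumps per year holds throughout; the 50,000-jump trial size is a hypothetical chosen for illustration. On those inputs the overall rate comes out at about 5 deaths per million jumps, and fatal equipment failure at roughly two per ten million jumps.

```python
# Back-of-envelope check of the skydiving figures quoted above.
deaths = 120                # fatalities recorded for 2008-2015
equipment_deaths = 5        # of which attributed to equipment failure
years = 8                   # assumption: 2008-2015 counted inclusively
jumps_per_year = 3_000_000  # the text's round estimate

total_jumps = jumps_per_year * years
deaths_per_million = deaths / total_jumps * 1_000_000
equipment_share = equipment_deaths / deaths
jumps_per_equipment_death = total_jumps / equipment_deaths

# Expected fatal equipment failures in a hypothetical trial of 50,000 jumps.
trial_size = 50_000
expected_failures = trial_size / jumps_per_equipment_death

print(f"{deaths_per_million:.1f} deaths per million jumps")
print(f"{equipment_share:.1%} of deaths due to equipment failure")
print(f"one fatal equipment failure per {jumps_per_equipment_death:,.0f} jumps")
print(f"{expected_failures:.3f} expected failures in {trial_size:,} jumps")
```

Even a trial of tens of thousands of jumps would therefore expect far fewer than one fatal equipment failure.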

We all accept that we have the highest level of evidence that parachutes are incredibly effective. The evidence must be extraordinarily strong, as no one seems inclined to disagree, or conduct research. Indeed, this is what Gordon Smith and Jill Pell found when—with tongue firmly in cheek—they performed a systematic review of studies of parachute use, satirically entitled: “Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials” ⁽³⁾. They searched for controlled trials of parachutes, and found none.

For all the morbid black humour of Smith and Pell’s article, their central point is clear: “Individuals who insist that all interventions need to be validated by a randomised controlled trial need to come down to earth with a bump” ⁽⁴⁾. Parachute use is as close as we get to an “all or nothing” case—almost everyone who jumps without one dies, and almost everyone who jumps with one, and uses it correctly, survives. If this is not a compelling evidence base for an intervention, nothing is.

Smith and Pell are targeting proponents of Evidence-Based Medicine (EBM), an ideology which prioritises the use of Randomized Controlled Trial (RCT) evidence, and has put forward ‘hierarchies of evidence’ which rank RCT and systematic review evidence as stronger and higher-quality than the evidence provided by non-randomized studies. Smith and Pell goad EBM proponents: “those who advocate evidence based medicine and criticise use of interventions that lack an evidence base will not hesitate to demonstrate their commitment by volunteering for a double blind, randomised, placebo controlled, crossover trial.” Their satire is not entirely targeting a straw man. Many overzealous EBM promoters have insisted that only RCTs count as high-quality, strong evidence ⁽⁵⁾, and many evidence hierarchies are consistent with that demand, and would rate the evidence for parachute use as weak and low-quality ⁽⁶⁾. Those hierarchies fall at the first hurdle: they endorse the indefensible claim that the only strong evidence is RCT evidence.

More sophisticated EBM proponents try to accommodate cases like parachute use into recent hierarchies. GRADE, one of the most influential contemporary hierarchies, allows upgrading for large effect sizes ⁽⁷⁾. The Centre for Evidence-Based Medicine in Oxford puts “all or none” evidence at Level 1c, the third-highest tier in their hierarchy. Still, this is problematic, not least because the parachute example fails their own definition of “all or none”, namely: “Met when all patients died before the [intervention] became available, but some now survive on it; or when some patients died before the [intervention] became available, but none now die on it.” ⁽⁸⁾ Neither condition is quite met, even for parachute use: some people still die when parachuting, and a fortunate few have survived long falls without parachutes. Perhaps, then, this is a matter of tweaking the hierarchy of evidence, not overhauling it entirely.

But the parachute use case just shows us the extreme end of the spectrum. It is not adequate to accommodate “all or none” cases without further analysis. GRADE allows evidence from observational studies to be upgraded. But the evidence for parachute use is not based on any traditional observational study, but on historically-controlled evidence, without systematic data gathering or analysis. Yet it is as compelling as evidence gets. If evidence where the effect sizes are close to “all or none” is high-quality and strong, could unconventional evidence a little further from that extremity still be very strong and compelling? Are there medical cases less extreme than the parachute use example, in which relatively unsystematic historically-controlled data without any controlled studies, let alone RCTs, provides strong, high-quality evidence that a treatment works? If we can move further down that spectrum, then this challenge to hierarchies and Evidence-Based Medicine as an ideology remains severe. To explore these questions, this paper will analyze one case which has been cited by critics of EBM ⁽⁹⁾ as an example in which clinical trials were unnecessary to demonstrate the effect of a treatment: extracorporeal life support.

Robert Truog’s work made Extracorporeal Life Support (ECLS) known amongst philosophers of medicine. ⁽¹⁰⁾ His concern is primarily ethical: that denying a potentially life-saving treatment to newborn babies in a control group is indefensible. John Worrall’s work on the ethics of Evidence-Based Medicine develops this line: if sufficient evidence had been amassed that the treatment worked without performing an RCT, performing further trials of the therapy would be unethical. Worrall does not pass judgment on whether sufficient evidence had been established: he argues that this depends on whether we accept the “statistical orthodoxy” ⁽¹¹⁾ that only a standard large-scale RCT can provide compelling evidence that an intervention is effective. He argues that the ethical question of whether the trials were justified depends on whether we accept that RCTs are necessary for all interventions. If not, then “the medical community […] was in the grip of an overly simple view of what counts as real scientific evidence” ⁽¹²⁾. This overly simple view—that RCT evidence is necessary for a strong evidence-base—would lead researchers to compromise their ethical duties: “They were not in “objective equipoise” ahead of the trial, but instead had good reason to think that the control treatment was inferior.” ⁽¹³⁾ If there are some treatments which we can establish are effective without the need for clinical trials, then adopting the EBM philosophy of evidence will mandate unethical research behaviours.

Truog also finds the RCT methodology unfit for purpose in the ECLS case. Because both the conventional therapy and ECLS were evolving rapidly whilst trials were taking place, the conclusions of RCTs of ECLS would be inapplicable by the time they were published: ECLS “was sufficiently different by the end of the studies that the results of the trials could truthfully only be said to apply to a form of [ECLS] that was already obsolete” ⁽¹⁴⁾. Truog is correct in this assessment (as Section 3.2 will show), but far from being a technical problem with RCT methodology, or unique to RCT designs, this continual evolution of the conventional and experimental technologies was central to the debate. Truog suggests that ongoing outcomes research would have provided the information the medical community needed. However, as we will see, his proposal would not have addressed the key question which divided medical opinion on ECLS—much as the RCTs did, this research project would leave the question of whether and when ECLS was needed unresolved.

Robyn Bluhm develops Truog’s argument, generalizing his findings out from “therapies that are potentially life-saving or that are rapidly evolving” ⁽¹⁵⁾ to chronic disease research. Bluhm criticizes the EBM movement’s assumption that an RCT design should be used wherever possible, arguing that RCTs are not always the best design to evaluate treatments: “the best design for situations that resemble the ECMO case is a long-term observational study that gathers information about treatment outcomes in the course of day-to-day medical practice” ⁽¹⁶⁾. If this is correct, then the dilemma between creating the highest-possible-quality evidence via an RCT or respecting ethical requirements collapses—the best possible evidence can be produced by research of a different design, which is also ethically defensible.

This paper argues that the assessments made by philosophical commentators to date have been based on a perception of the fundamental disagreement in the ECLS debate which does not match the reality. The key question as portrayed by Truog, Worrall and Bluhm is whether historical data (in Worrall’s case) or a prospective observational study (in the case of Truog and Bluhm) provides strong, high-quality evidence that ECLS was effective. Could the effectiveness of ECLS be established without requiring RCTs? Truog goes further, arguing that RCTs could not in fact establish the effectiveness of ECLS—if researchers were genuinely to show that it worked, they would need to perform observational studies.

This portrayal of the debate serves the purpose of EBM’s critics, but does not attend to the question that truly fueled the controversy in the pages of pediatric journals. It was not whether ECLS worked that bothered most clinicians in these debates; for the most part, everyone involved accepted that it produced dramatic results. Rather, the question at issue was whether it was needed: were the newborn babies receiving ECLS really in sufficient danger of death to warrant an invasive and risky treatment? The debate focused not on whether ECLS could produce a high survival rate (it could). It focused instead on establishing what the death rate would be without ECLS. It was for that purpose that researchers decided to perform a range of randomized controlled trials. But, as this paper will show, by funneling the debate through RCT methodology, the medical community warped and skewed the central issue of the death rate without ECLS. In trying to make the debate fit a model of a question that could be answered by an RCT, the medical community jettisoned its ability to find a suitable answer to a genuine and important disagreement.

Truog, Worrall and Bluhm raise significant concerns about the ethical issues involved in the ECLS trials. These are issues which run throughout the background of the debate. Ultimately, the question of whether the trials were ethically defensible turned on whether they were the right way to answer the important clinical questions. In this respect, this paper will argue that the trials were not ethically defensible—not because they were an inappropriate vehicle to assess whether ECLS worked, but because the RCT methodology is ill-equipped to answer the question on the table.

2: Extracorporeal Life Support

Extracorporeal Life Support (ECLS), previously known as Extracorporeal Membrane Oxygenation (ECMO), is a form of life support in which a machine drains low-oxygen blood from the patient, filters out carbon dioxide, oxygenates the blood, then re-warms it and returns it to the body. The process is essentially a long-term version of the heart-lung bypass machine. There are many reasons why ECLS might be used, from cardiac arrest to respiratory failure, to failure to be weaned off bypass during surgery. ECLS offers a bridge, keeping the patient alive while the heart or lungs grow or heal, to transition onto other devices, or to receive a transplant.

The origins of ECLS are in the heart-lung machine, the bypass system which facilitated sophisticated heart surgeries. In the 1940s and 50s, following the first successful extracorporeal oxygenation of a dog in Russia, there was a race to produce a machine which worked in humans. In 1953, John Heysham Gibbon performed the first successful heart-lung bypass during a surgery to correct a heart defect in an 18-year-old woman. The open-heart surgeries this technique facilitated underpinned a revolution in cardiac surgery in the mid-20th century.

Gibbon believed the heart-lung machine was adaptable to sustain long-term bypass. The technical limit seemed to be around 6 hours; any longer, and the damage done in exposing the blood to oxygen gas made the technique lethal. This problem was eventually solved by putting a silicone gas exchange membrane between the blood and the oxygen, allowing oxygenation without direct exposure. With this innovation, ECLS might take over the functions of the heart and lungs for much longer periods. The first successful long-term ECLS use was in 1971. J. Donald Hill, in Santa Barbara, kept a 24-year-old man with acute respiratory distress alive on ECLS for 3 days. He was successfully weaned off the bypass and survived ⁽¹⁷⁾.

At the time, Robert Bartlett, an aspiring surgeon, was earning his stripes at the Children’s Hospital in Boston. Cardiac surgery was still in its infancy, especially for newborn babies. Repairing birth defects in the heart and respiratory system was dangerous and challenging. Bartlett recalls what happened when a corrective surgery went wrong: “Myself and my resident buddies sat at many bedsides, many nights ventilating babies by hand with a bag.” ⁽¹⁸⁾ Bartlett had an idea: deliberately put the babies onto a heart-lung machine. “We knew that if they could make it through the first couple of days, they all survived and they all did very well” ⁽¹⁹⁾.

Bartlett explored the possibility of a long-term heart-lung machine for newborns. In the late 1960s, there was nothing like it. He enlisted a senior resident at the Children’s Hospital, Lou Plazik, who was interested in using silicone rubber technologies in clinical care, to create an oxygenation membrane like that subsequently used by Hill. He brought in MIT engineer Phil Drinker to put a working system together. The system showed real promise: “We could put dogs on bypass for four days at a time and that was pretty hot stuff in 1967 and 1968.” ⁽²⁰⁾

In 1970, Bartlett moved to the University of California at Irvine, but kept researching extracorporeal life support systems in his spare time. As Pearl O’Rourke, an early ECLS researcher, recalls, the prototypic ECLS programs were breaking ground in bioengineering, and even more so in Bartlett’s project to apply the process to newborns. “It is important to remember that setting up an ECMO program at that time was a rather daunting exercise”, she wrote; “ligating a carotid artery and manipulating the internal jugular could not be good; heparinizing a neonate is asking for trouble; and ECMO may not improve survival but rather prolong death. In addition, little of the equipment used for the ECMO circuit was approved for long-term use. You could not buy an ECMO system ‘off the shelf.’ Infant cannulae were not available. We had to fashion our own using chest tubes or endotracheal tubes with side holes cut into them.” ⁽²¹⁾

The early applications of the technology were all in adults. Cases like the one performed by J. Donald Hill created a buzz around ECLS. But the treatment was invasive and designed for use only where everything else had failed. What’s more, it might be used for a wide range of problems. As the technology pushed into the spotlight, a major trial was funded by the National Institutes of Health (NIH). The trial reported in 1979, bringing bad news. 90 patients were given ECLS or conventional mechanical ventilation. The trial found no difference in survival between the new technique and the old ⁽²²⁾. 90% of patients in both groups died. ECLS went through a “period of disgrace”, with diminished attention and little enthusiasm. ⁽²³⁾

Bartlett felt that the NIH trial was a misstep. Not only were adults the wrong population to be the frontline of ECLS research, less likely to benefit than his target of newborn babies, but he believed that the trial had been mishandled. The study selected nine centres to provide ECLS, but only three had ever administered ECLS before. Bartlett had spent a decade refining his machinery and his technique. ECLS was a complex, highly technical and difficult intervention. If performed poorly, it would create all manner of complications and be less effective. “You need to learn it in a lab first, and then it takes about 30 patients to become proficient at it”, Bartlett believed.⁽²⁴⁾ But the trial, originally intended to recruit 300 patients, was stopped after just 92 when interim analysis showed no difference between the groups, before, in Bartlett’s view, the researchers had even begun using the technique properly.

Bartlett’s disagreement with the NIH is quite typical of what Collins and Pinch⁽²⁵⁾ call the experimenter’s regress. Bartlett believed that the treatment worked, but that it had to be done properly to get results. He was doing it properly in his laboratory and getting great results, which he had reported. The NIH trial didn’t replicate these results because its investigators had picked the wrong patients to enroll and lacked his expertise in applying the intervention. The only way to tell whether the NIH or Bartlett was correct was if the ‘true’ effect of ECLS could be brought out. But which data one took to reflect the ‘true’ capability of ECLS depended on whether one believed Bartlett’s dataset or the NIH’s trial.

As interest in ECLS waned, Bartlett continued to hone his machinery for use on newborns with respiratory failure. While adults were being treated for various problems, there was a more targeted condition when treating babies. When a newborn’s respiratory system had not developed sufficiently to support itself, the baby needed time to develop (“lung rest”) or for medical interventions to correct underlying issues. Persistent Pulmonary Hypertension of the Newborn (PPHN) was the target. PPHN affects around 2 in 1,000 babies. It is responsible for over a third of deaths in newborns.⁽²⁶⁾ During pregnancy, the fetus draws oxygen from the mother and placenta. The fetal lungs are essentially bypassed, and the blood vessels in the baby’s lungs are mostly closed, the lungs filled with amniotic fluid. The constriction of these blood vessels in the lungs means the pressure in the fetus’s lungs is high (pulmonary hypertension).

Once born, when the baby takes its first breaths, those blood vessels open. The lungs fill with air. When these vessels don’t adequately open, or the lungs are underdeveloped, the newborn is unable to take over oxygenating the blood. The heart may be forced to continue using the fetal circulatory system. The lack of oxygen in the bloodstream means the brain and other organs get an inadequate supply. The skin turns blue. The baby’s extremities are cold to the touch. The high pressure in the lungs has continued past birth—hence, persistent pulmonary hypertension of the newborn. ECLS might buy the newborn time for their lungs to grow and mature, for their doctors to help them fight off infections, and ultimately for full recovery.

In 1975, Bartlett had an opportunity to put neonatal ECLS into practice. The neonatal intensive care unit at the university hospital were treating an orphaned newborn they had named Esperanza (translating as “Hope”). Esperanza had extremely low blood oxygen levels due to meconium aspiration syndrome, one of the major causes of PPHN. Meconium aspiration is extremely serious and difficult to prevent, and severe cases have a very high death rate. Bartlett and his colleagues got their experimental ECLS device approved for emergency use in the hospital, and Esperanza became the first newborn to be put onto bypass. She survived.⁽²⁷⁾

Bartlett’s success fed the excitement around ECLS. In the years before the NIH trial reported and killed off the momentum, cases poured in of successful ECLS use for newborns. In 1980, Bartlett returned to the University of Michigan. He began tirelessly promoting ECLS for severe PPHN in newborns: “Many of us trekked to Michigan, learned the basics, and then returned to our own animal laboratories to hone our skills in anticipation”, Pearl O’Rourke remembers. ⁽²⁸⁾ At Bartlett’s former department in the Children’s Hospital in Boston, O’Rourke set up a neonatal ECLS unit. She was inspired by Bartlett’s work, and reports of ever more successes keeping young babies alive against the odds: “I remember reading [Bartlett’s paper] as well as, at the time being young and impressionable, thinking, ‘this is Nobel Prize material’ and that, why of course, we can do this!” ⁽²⁹⁾ Other centres were established in Japan and Italy. The early days of newborn ECLS were not in large-scale trials, but in the precursors to neonatal intensive care units. Doctors, influenced and supported by Bartlett, tried to rescue their most hopeless cases with ECLS.

Bartlett and his collaborators were aware that to influence the wider medical community, they’d need data. They quickly established a worldwide registry of ECLS treatment for newborns. Every procedure was recorded in a database, which became the Extracorporeal Life Support Organization (ELSO) registry. But once the NIH trial reported no demonstrable effect for ECLS in adults, and enthusiasm evaporated, this data became even more important and was placed under scrutiny and suspicion.

A major issue was that ECLS was a rescue therapy: a parachute. It was meant to be deployed when a patient was in freefall towards almost certain death, and nothing else was working. Patients like Esperanza had no hope other than a dramatic, invasive and risky treatment. ECLS was also expensive and cumbersome, at the time requiring engineering expertise and considerable technical resources. But Bartlett was not suggesting using ECLS for all or even most PPHN babies. It was intended only for cases where everything else had failed. It was supposed to replace the resident sitting over the bed helping the baby breathe with a bag. As he saw it, Bartlett was doing the opposite of cherry-picking the best-placed patients for his new treatment. He was taking only the most critical and severe cases, who would die without ECLS.

Recent estimates of PPHN death rates are highly variable, ranging between 4-33%. ⁽³⁰⁾ In the 1980s, the mortality rate estimates were a little higher: 11-34%. ⁽³¹⁾ In Bartlett’s early case reports, which inspired the likes of Pearl O’Rourke, 25 of 45 newborns treated with ECMO in Michigan survived: a mortality rate of 44%. ⁽³²⁾ In 20 of the 25 survivors, growth, development and brain and lung function were unaffected by PPHN and the treatment. Developmental and brain damage was a commonplace side-effect of oxygen deprivation as a newborn. If judged against the figures for PPHN as a whole, Bartlett’s results look not just unimpressive, but downright harmful. However, Bartlett was taking only the worst cases. From his experience treating newborns, he believed that at best 5-10% of newborns made recoveries once all available treatments had failed. He described the babies his unit took as “moribund”: bound for death. But without data about how that specific subset of babies fared prior to ECLS becoming available, there was no data to demonstrate the difference ECLS had made.

Reaction to Bartlett’s work was polarized. Some were astonished by the difference ECLS could make and joined the research community around Bartlett. Others found the case series unconvincing. Bartlett found resistance right away when submitting his manuscript to journals: “The pediatrics journals refused to publish it because they didn’t want people to know about ECMO for neonates. They thought it was dangerous”, he remembers. ⁽³³⁾ In the end, Bartlett published his paper in a thoracic surgery journal, rather than a pediatric one. ⁽³⁴⁾ Letters pages poured scorn on the findings. “Colleagues at conferences told us that we were committing ‘academic suicide’”, said Bartlett, “Some neonatologists wrote editorials claiming that they did not have any deaths of term neonates in their units and, therefore, that the reason we needed to use ECMO on our patients was that we did not know how to take care of babies.” ⁽³⁵⁾

There were two sticking points. First, ECLS was dangerous. Bartlett’s case reports showed lots of equipment failures and concurrent harms. Putting a cannula in a baby’s carotid artery seemed a shockingly dramatic and risky intervention. If clinicians hadn’t seen many otherwise hopeless cases, they wouldn’t see it as worth the risk. Second, without any data about how those patients fared beyond Bartlett’s own assertion that without ECMO 90-95% would have died, there was no comparison to be made. As one commentator close to the debate, James Ware, put it: “The assumption of low survival rates in untreated infants was based on historical experience, and the data supporting these rates were not presented”. ⁽³⁶⁾ It was Bartlett’s credibility as a witness for the hopelessness of the cases which was pivotal, and that credibility could only be established if ECLS was accepted by the community at large. He was too intimately connected to the project to be taken as a reliable source for the predicted fates of the babies he had treated. To prove his point, he felt the need to at least go through the motions of conducting a clinical trial.

3.1: The Bartlett Trial

There’s no doubt that Bartlett was convinced that ECLS saved babies that conventional therapy could not. Given that, it may be surprising that he, and others, felt it necessary and ethically appropriate to conduct trials. Nevertheless, he felt he could not convince others of the treatment’s worth otherwise, and this tension is evident in his trial report: “we were compelled to conduct a prospective randomized study, but reluctant to withhold a lifesaving treatment from alternate patients simply to meet conventional random assignment technique.” ⁽³⁷⁾ Bartlett was aware of the ethical challenge of performing this research when he believed that ECLS was a demonstrated lifesaver, writing: “We anticipated that most ECMO patients would survive and most control patients would die”. ⁽³⁸⁾

There were three historically significant trials of ECLS for newborns. The first was Bartlett’s. His trial was published in 1985, including only 12 babies. The big problem was in patient selection. ECLS was a rescue therapy by nature. It would not be a case of waiting for PPHN cases to come along and then randomizing to ECLS or conventional therapy. ECLS should only be used when it was needed: in the cases where all conventional treatments failed.

But if the selection criterion was that conventional treatment had failed, it seemed bizarre if not unethical to put half of the newborns in the trial on the conventional therapy. Bartlett was in a strange position as a researcher. He had plenty of data to demonstrate the survival rate on his new therapy. Initially, this rate was 56%, but as more cases came in, and expertise and technological sophistication grew, the rate gradually increased. Bartlett’s critics largely didn’t dispute this. They disputed whether the babies Bartlett was treating were as likely to die without ECLS as he believed. Whether he knew it or not, what Bartlett’s critics wanted was an estimate of the survival rate without ECLS in the “moribund” newborns he treated. The dispute was not about showing high survival rates on ECLS, but high death rates without it.

But Bartlett was constrained by research ethics and his conscience. He did not need convincing that the babies he treated with ECLS died without it. He felt he’d seen it in the pediatric wards of Children’s Hospital in Boston. His colleagues knew that there were babies who didn’t respond to any conventional therapy: they’d seen them. Bartlett had three responses to this predicament. First, he engineered the selection criteria so that they made only a tangential reference to the fact that conventional treatments had failed. Second, he tried to minimize the number of babies who received the conventional treatment, using an adaptive trial design, rather than conventional 50-50 randomization. Third, he tailored the way consent was obtained from parents so that no one had to be confronted with a choice of randomly getting a treatment known to have failed.

The eligibility criteria were very specific, but could not entirely disguise the fact that conventional treatment had failed. The criteria included: “Newborn respiratory failure; >2 kg birth weight. Optimal/maximal treatment in University of Michigan Newborn Intensive Care Unit (ventilator, pharmacologic, surgical).” ⁽³⁹⁾ In other words, the other options Michigan had available were unsuccessful. In addition to these basic criteria, the newborn must meet at least one of five further criteria to show that their case was extremely severe and necessitated ECLS. These included acute deterioration in condition, unresponsiveness, and a “Newborn Pulmonary Insufficiency Index: 80%+ mortality rate at 24 hours of age”. The final criterion was emphasized strongly. Every baby selected for the study was predicted to have at least an 80% probability of death if left untreated, and at this point, conventional treatments had all failed, so were unlikely to change that. As Bartlett put it: “Objective criteria were established to select patients with an 80% or greater chance of mortality despite optimal therapy”. ⁽⁴⁰⁾

Bartlett used an innovative trial design to minimize the number of patients who would receive only the failing conventional therapy: a form of adaptive randomization, which changes the odds of each patient receiving ECLS or the control treatment depending on how previous patients fared. This “play-the-winner” design was first described by statistician Marvin Zelen ⁽⁴¹⁾, who became synonymous with innovations in trial methodology. Bartlett used an adaptation of Zelen’s design by Wei & Durham. ⁽⁴²⁾ Every time a newborn survived, the odds for the next random allocation shifted in the direction of the treatment which had allowed them to survive. For each death, the odds shifted away from the treatment on which they had died. Wei and Durham illustrate the principle with balls in a box. Initially, one red and one white ball are placed in the box. A ball is randomly selected. If red comes out, the patient gets ECLS; if white, they get conventional therapy. The ball goes back in. If an ECLS patient survives, an additional red ball is added; if they die, a white ball is added. If a conventional therapy patient survives, a white ball is added; if they die, a red ball, and so on.
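The ball-and-box rule described above can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated in this section, not the trial's actual randomization procedure; the function names and the survival probabilities passed to `simulate` are hypothetical and chosen only for illustration.

```python
import random

ECLS, CONVENTIONAL = "ECLS", "conventional"

def draw_arm(red, white, rng):
    """Draw a ball: red assigns ECLS, white assigns conventional therapy."""
    return ECLS if rng.random() < red / (red + white) else CONVENTIONAL

def update_urn(red, white, arm, survived):
    """Add one ball for the arm that the latest outcome favours."""
    # A survivor favours their own arm; a death favours the other arm.
    favours_ecls = (arm == ECLS) == survived
    return (red + 1, white) if favours_ecls else (red, white + 1)

def simulate(n_patients, p_survive, seed=None):
    """Run one simulated trial; p_survive maps arm -> survival probability."""
    rng = random.Random(seed)
    red, white = 1, 1  # start with one ball per arm
    counts = {ECLS: 0, CONVENTIONAL: 0}
    for _ in range(n_patients):
        arm = draw_arm(red, white, rng)
        survived = rng.random() < p_survive[arm]
        red, white = update_urn(red, white, arm, survived)
        counts[arm] += 1
    return counts, (red, white)

if __name__ == "__main__":
    # Hypothetical survival probabilities, for illustration only.
    counts, urn = simulate(12, {ECLS: 0.9, CONVENTIONAL: 0.2}, seed=1)
    print(counts, urn)
```

Because every outcome adds exactly one ball, the box always holds two more balls than the number of patients treated so far, and the allocation odds can be read directly from its contents.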

Because of this design, if one treatment turns out to be more likely to lead to survival than the other, the odds rapidly stack in favour of that treatment. This is exactly what happened in Bartlett’s study. The first patient had 50-50 odds, and was randomized to receive ECLS. The newborn survived, adding another red ball and shifting the odds to 2:1 in favour of ECLS for the next patient. The second newborn was randomized to receive conventional therapy and died. This shifted the odds further to 3:1. Every subsequent patient was randomized to receive ECLS. Every one survived. By the end, the odds were 13:1 in favour of ECLS. Of the 12 participants, 1 baby had received conventional therapy and died, while 11 received ECLS and lived.
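The sequence just described can be replayed step by step. The sketch below is not a simulation: it takes the twelve allocations and outcomes reported in this paragraph as given and applies the ball-adding rule to each, starting from one red and one white ball. The variable names are illustrative.

```python
# Replay of the allocation sequence described above (not a simulation).
# Each entry is (arm, survived); red balls favour ECLS, white conventional.
outcomes = [("ECLS", True), ("conventional", False)] + [("ECLS", True)] * 10

red, white = 1, 1  # starting box: one ball per arm
odds_history = []
for arm, survived in outcomes:
    # A survivor adds a ball for their own arm; a death adds one for the other.
    if (arm == "ECLS") == survived:
        red += 1
    else:
        white += 1
    odds_history.append((red, white))

for patient, (r, w) in enumerate(odds_history, start=1):
    print(f"after patient {patient}: {r}:{w} in favour of ECLS")
```

The replay gives 2:1 after the first patient, 3:1 after the second, and 13:1 after the twelfth, matching the figures in the text.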

Building on this unusual design, Bartlett implemented a controversial consent technique. ⁽⁴³⁾ Under Zelen’s randomized consent design, when babies were randomized to the control treatment, their parents were not notified that their child was part of a trial or informed that ECLS existed. They continued with their existing therapy without any special consent procedures. The ethical judgment was that parents of newborns who couldn’t receive ECLS would not benefit from knowing it had been available to others. Meanwhile, for the babies randomized to receive ECLS, parents were consulted and gave consent. The idea was to spare parents an agonizing decision: whether to gamble on allowing their baby’s treatment to be decided by a randomizer in the hope of receiving the rescue therapy. Instead, they were simply told that an experimental therapy was available and asked to try it.

These two Zelen designs were controversial. ⁽⁴⁴⁾ The randomized consent design diminishes parental control and understanding of their babies’ treatment, and looks like an attempt to disguise a trial from participants. Neither technique had been employed before. The Michigan team were breaking new methodological ground.

Bartlett clearly found the study completely convincing. He wrote: “this study proves that ECMO improves survival when compared to conventional therapy.” ⁽⁴⁵⁾ But Bartlett had been convinced all along. The real question was whether this trial, with just one baby receiving conventional treatment, could convince sceptics. It is hard to see how, if the 45 cases Bartlett had already reported didn’t convince them, merely randomizing one baby to conventional treatment, and that baby dying, could make much difference. It was the death rate on conventional therapy that was at issue. A single patient provided little help in estimating that rate. Again, the real work was being done by Bartlett’s repeated claim that the death rate for untreated patients would be at least 80%. This claim hadn’t convinced critics before. This time, Bartlett had assembled “objective criteria” to guarantee a survival rate less than 20%. But without data to back that up, sceptics remained unconvinced.

Many of Bartlett’s colleagues found nothing new in the results: “Despite the advantages and success of adaptive designs, clinicians and editors feel that they are not ‘real’ randomized trials, insisting on conventional 50-50 randomization” ⁽⁴⁶⁾. It didn’t help that Bartlett’s trial was published in Pediatrics alongside a group from Columbia University reporting 15 out of 15 babies had survived in their study of modified conventional therapy for PPHN. ⁽⁴⁷⁾ James Ware and Michael Epstein followed up Bartlett’s trial report with a skeptical commentary concluding: “the data were not sufficient to justify routine use of ECMO in the treatment of PPHN”. ⁽⁴⁸⁾ They were “uneasy about rapid acceptance of a new and potentially dangerous technology based on inadequate experience”. For Bartlett’s part, the Columbia data was a red herring. The babies were unlike the moribund newborns he was treating. What’s more, he felt he had been misunderstood: he didn’t want ECLS to become a routine treatment. It didn’t need to outperform conventional therapy. Quite the opposite: it was supposed to be for exceptional cases where conventional therapy failed.

This example should hold at least one important lesson for exponents of the RCT: the mere fact that randomization was performed doesn’t guarantee much. Bartlett’s study was a long way from the archetypal RCT that EBM proponents value. Any hierarchy that can’t differentiate between a 12-person study with a single baby randomized to the control arm and a huge-scale multi-arm trial is surely a massive oversimplification of the assessment of evidence. Demands were made for a new randomized trial. Those demands were soon met.

3.2: The Boston Trial

Pearl O’Rourke at Children’s Hospital, Boston, followed Bartlett with a second controversial trial, reporting four years later. ⁽⁴⁹⁾ Her team included both Ware and Epstein. Although already using ECLS in their practice, the team were more skeptical of the evidence Bartlett had amassed. But they shared Bartlett’s worries, sympathizing with “the concerns of the Michigan group about the ethical difficulties if early experience again suggested that ECMO therapy was dramatically effective in this group of very high-risk infants.” ⁽⁵⁰⁾ Like Bartlett, they adopted Zelen’s consent design.

They did not use the “play the winner” design, but remained worried that deaths might rapidly stack up in the control arm. After all, if Bartlett was right, that was exactly the expectation. Sadly, the evidence that Ware and other sceptics needed was precisely this high death-rate for conventional therapy. The Boston group made the decision that they could not tolerate more than 4 deaths for this purpose. They began their trial with a “stopping rule” in place: once four deaths had occurred in either arm of the trial, all subsequent patients would go to the other arm. ⁽⁵¹⁾
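The stopping rule can be written as a short allocation function. This is only a sketch under stated assumptions: the text does not say how the Boston group implemented randomization before the rule was triggered, so a simple coin flip stands in for it, and the names `ARMS` and `next_arm` are hypothetical.

```python
import random

ARMS = ("ECLS", "conventional")

def next_arm(deaths, limit=4, rng=random):
    """Return the arm for the next patient.

    deaths maps each arm to its running death count. Once either arm
    reaches `limit` deaths, every later patient goes to the other arm.
    Before that, allocation is a coin flip (an assumption, see above).
    """
    for arm in ARMS:
        if deaths[arm] >= limit:
            return ARMS[1 - ARMS.index(arm)]
    return rng.choice(ARMS)

# With four deaths recorded on conventional therapy, allocation is no longer random.
print(next_arm({"ECLS": 0, "conventional": 4}))  # ECLS
```

Note that the rule is symmetric: had four deaths accumulated in the ECLS arm first, all later patients would have received conventional therapy.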

What made this decision particularly interesting was the way the ‘4 deaths’ figure and the stopping rule were formulated. The group needed initial hypothesized survival rates. To get this hypothesized rate for conventional therapy, the group conducted a “chart review” of all newborns treated for PPHN in 1982-3 at two Harvard hospitals which did not use ECLS. They found 39 PPHN patients, of whom 13 were classed as “severe” and would be ECLS candidates. 11 of the 13 severe cases (85%) died despite receiving maximal conventional therapy. They used this 85% death-rate as their hypothetical rate to calculate the number of patients to enroll. They used the same criteria they had used for severity in the chart review to determine which patients would be eligible for their trial. The criteria were “virtually equivalent in determining eligibility” to the criteria Bartlett had used for his trial, in which he’d predicted at least 80% mortality. ⁽⁵²⁾

The Boston trial began in February 1986. 19 patients participated. 10 were randomized to conventional therapy, 9 to ECLS. All nine ECLS patients survived. Four of the 10 conventional therapy patients died. After patient 19 received conventional therapy and died, bringing its tally to four deaths, the conventional therapy arm was closed. The second phase of the trial began, in which every patient received ECLS. 20 more newborns received ECLS. All but one survived. 39 patients had been enrolled in all. 29 had received ECLS, and one had died. 10 had received conventional therapy, and four had died.

Critics were unconvinced. James Ware recalls one line of objection: “Some colleagues have argued that randomization should have continued even after four deaths had occurred in the conventional therapy arm. This position is easy to defend on scientific grounds.” ⁽⁵³⁾ There was a potential difference between the control and ECLS patients when both phase 1 and phase 2 of the trial were combined: the control patients were on average treated earlier in the trial than the ECLS patients. The researchers and their staff had more experience in treating severely ill babies by the time they were treating most ECLS newborns. That was a confounding factor which might explain why the later ECLS babies outperformed the earlier conventional therapy babies. Because the second part of the trial introduced “confounding variables, including time of enrollment”, some critics considered the later 20 babies ineligible. The comparison only between the 10 conventional therapy patients and the 9 ECLS patients was (very narrowly) not statistically significant at the 0.05 level demanded by the critics.
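How narrow the miss was can be seen by working the phase-one comparison through. The text does not say which test the critics had in mind; the sketch below assumes a one-sided Fisher's exact test on the randomized patients only (4 deaths among 10 on conventional therapy, 0 among 9 on ECLS), and the function name is illustrative.

```python
from math import comb

def fisher_one_sided(deaths_a, n_a, deaths_b, n_b):
    """One-sided Fisher's exact p-value.

    Probability, with all table margins fixed, that arm A has at least
    `deaths_a` of the total deaths (hypergeometric upper tail).
    """
    total_deaths = deaths_a + deaths_b
    n = n_a + n_b
    upper = min(total_deaths, n_a)
    favourable = sum(
        comb(n_a, k) * comb(n_b, total_deaths - k)
        for k in range(deaths_a, upper + 1)
    )
    return favourable / comb(n, total_deaths)

# Arm A: conventional therapy (4 deaths of 10); arm B: ECLS (0 deaths of 9).
p = fisher_one_sided(4, 10, 0, 9)
print(round(p, 4))  # 0.0542
```

Under that assumption the p-value is 210/3876, just above the 0.05 threshold, which is consistent with the "very narrowly" not significant result described above. A two-sided version of the test would give a larger value.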

This criticism seems overly demanding given the background of evidence of effectiveness and the low survival rates for conventional therapy seen in the Harvard chart review. A bigger problem from the perspective of the debate around ECLS was the inconsistency between the high survival rate in the conventional therapy group and the predictions based on Bartlett’s criteria and the chart review. 60% of conventional therapy babies survived. The chart review and Bartlett’s estimates expected at best 20%. Conventional therapy outperformed all expectations in the Boston trial. The severity criteria were exactly matched with the ones used to find an 85% death rate in the Harvard hospitals. Were the Boston trial staff extraordinarily good, to outperform the Harvard hospitals’ outcomes by such a margin, a difference of 45 percentage points in death rate?

In reality, intensive care for newborns—the “conventional therapy”—had made major strides too. Bartlett’s innovation was not the only one being developed. The picture of intensive care Bartlett had experienced in the late 1960s was being overhauled. Dedicated neonatal intensive care units were developing. All manner of small improvements might transform the “conventional therapy” from one that failed severe cases to one that saved more than it lost. This is part of the problem when comparing a treatment to ‘maximal’ and ‘optimal’ current practice; that practice describes an ever-shifting reality.

That same year, a study by April Dworetz and her colleagues at Yale investigated this possibility. ⁽⁵⁴⁾ They performed a retrospective analysis, looking at newborns who met Bartlett’s criteria but who had not received ECLS. They analyzed patients in two time-periods. Group 1 included 6 babies treated between January 1980 and December 1981. Group 2 consisted of 10 babies, up to 9 years later, between January 1986 and December 1988. In the first group, only 1 of 6 severe cases survived: an 83% death-rate. But in the second, 9 of the 10 survived on conventional therapy, a 10% death-rate. The difference that a decade of pediatric advances made, they suggested, was huge. They conclude: “Because the prognosis of persistent pulmonary hypertension is changing with time, indications for alternative modes of therapy, such as ECMO, and assessment of the effectiveness of such therapy should not be based on historical data. A randomized clinical trial […] should be undertaken before further expenditures on ECMO centers are made.” ⁽⁵⁵⁾ Another trial was needed, and historical data were no use.

The Yale group’s argument is a strange one. It seems to undermine itself. They argue that the comparison between non-contemporaneous datasets is invalid because of the advances in conventional treatment. But these same advances would undermine the application of any past contemporaneous comparisons to the new setting, too! Suppose that in the early 1980s, Bartlett had performed a large 50-50 trial, finding the high death-rate he expected (and Dworetz et al. found) of around 85%. This study would have had a massive impact, and probably propelled ECLS to the top of must-have lists for neonatal centres. It would be exactly the kind of data that Dworetz et al. recommend. But then the same advances in neonatal care that apparently slashed the death-rate in Yale’s centre would have occurred. Now, ECLS would be a staple, given to all severe PPHN patients. A few under-resourced or hold-out centres might have still provided comparative data, and suggested that conventional therapy was quickly catching up to ECLS’s survival figures. But if so, that would have posed the same problem for the hypothetical big contemporary-controlled trial as it did for the historically-controlled studies Bartlett and others performed. Dworetz et al. called for a big trial, but that trial is subject to the same shifting comparison. As the state of the art continued to develop into the future, it would become irrelevant. ⁽⁵⁶⁾

Change over time in the standard of care affects RCT and non-randomized evidence alike. The only major difference comes when data is presented against a historical control from before a major change in survival rates. Bartlett presented his data against contemporary controls from other centres that did not use ECLS. The problem would have occurred instead if O’Rourke had used the Harvard chart review cases as the control for her team’s study, which took place four to six years later.

How should we react to the Boston trial data? First, ECLS still dramatically outperformed conventional therapy, although that therapy was catching up. There was still a case for seeing this as confirmation of Bartlett’s original results. It did not demonstrate that Bartlett’s original prediction of a very high death-rate in the control arm was wrong, just outdated. Seemingly, the study shows a role for ECLS for now, but close monitoring of advancements in neonatal intensive care would be needed to ensure that ECLS, with its dangers and risks, did not outstay its welcome.

3.3: The UK Trial

Ware and Epstein had been convinced. But other critics were not, Dworetz et al. amongst them. The calls began again for a big 50-50 trial, the trial that many biostatisticians had wanted from the beginning. The calls were answered by a British group in 1996. A large group of British clinicians at multiple neonatal centres around the UK recruited patients for a large ECLS trial. ⁽⁵⁷⁾ ECLS had not caught on in Britain to anything like the same extent as in the USA. By 1992, 75 US centres provided ECLS to newborns. In the UK, severe PPHN seemed much rarer than in America, and demand had not been so high. British clinicians were very cautious. ⁽⁵⁸⁾ The first UK centre for ECLS was only established in 1989.

Five centres recruited patients referred from 55 hospitals. Unlike previous trials, Zelen’s consent design was not used. Parents had to consent to their baby being randomly allocated to ECLS or conventional therapy. Controversially, while all the babies randomized to ECLS were immediately transported to the closest of the five ECLS centres for treatment, babies randomized to conventional therapy remained at their referring hospitals, receiving conventional care in the field. Given the variability in conventional therapy suggested by Dworetz et al.’s previous analysis, this decision brought considerable criticism and was perceived to favour ECLS. ⁽⁵⁹⁾

185 babies took part. 93 were allocated to ECLS, and 92 received conventional therapy at their original hospitals. In the ECLS arm, only 78 (84%) received the ECLS treatment. Of the 15 who didn’t, 8 died before ECLS could begin, and 4 were found to have congenital heart disease (2 died).

In the ECLS group, 68% survived. For conventional treatment, the survival rate was only 41%. The investigators’ interpretation of the findings was unequivocal: “The trial results leave little doubt that allocation to the ECMO policy reduced the risk of death or severe disability.” ⁽⁶⁰⁾ One powerful result from the UK collaboration was the ability to see how newborns had fared when broken down by the underlying cause of the PPHN. Bartlett and his colleagues had long believed that one cause of PPHN, congenital diaphragmatic hernia, was associated with far worse outcomes than the others. This finding was dramatically illustrated by the UK trial. Of the 185 babies enrolled, 35 had diaphragmatic hernias, 18 in the ECLS group and 17 in the conventional therapy group. In the ECLS group, 14 of 18 died, while all 17 babies in the conventional therapy group died. This result contrasted starkly with the death rates for the other diagnoses (16/75 for ECLS and 37/75 for conventional therapy). ECLS was still able to save a few diaphragmatic hernia patients, but there was a clear and demonstrable need to consider those patients’ figures separately.
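The subgroup counts quoted above can be pooled to check them against the headline survival rates. The sketch below uses only the figures given in this section; the variable and function names are illustrative.

```python
# (deaths, patients) by arm and diagnosis, as quoted in the text.
uk_trial = {
    "ECLS": {"diaphragmatic hernia": (14, 18), "other": (16, 75)},
    "conventional": {"diaphragmatic hernia": (17, 17), "other": (37, 75)},
}

def pooled_survival(arm):
    """Overall percentage surviving in one arm, pooling all diagnoses."""
    deaths = sum(d for d, _ in uk_trial[arm].values())
    patients = sum(n for _, n in uk_trial[arm].values())
    return 100 * (patients - deaths) / patients

def subgroup_death_rate(arm, diagnosis):
    """Percentage dying within one diagnosis subgroup of one arm."""
    deaths, patients = uk_trial[arm][diagnosis]
    return 100 * deaths / patients

for arm in uk_trial:
    print(arm, round(pooled_survival(arm)))  # ECLS 68, conventional 41
```

The pooled figures recover the 68% and 41% survival rates, while the hernia subgroup alone shows death rates of roughly 78% on ECLS and 100% on conventional therapy, which is why those patients need to be considered separately.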

The UK trial all but settled debate internationally and silenced most critics. Though imperfect, the trial had adhered to the most standard procedures of the RCT design. ECLS had earned the Evidence-Based Medicine seal of approval.

But this RCT again did not remove the concern about the changing nature of the conventional therapy. Although the rapid advances in neonatal intensive care in the 1980s might have slowed by the mid-90s, the results cannot be considered permanent or unchanging. As conventional care progresses, we must be careful not to assume that old comparisons still hold true. The lifesaving potential of ECLS as a parachute for the most severe cases has been established—if it was ever in doubt—but the goal should be to prevent ECLS being needed. It remains invasive and expensive. Replacing ECLS or preventing newborns needing it is a worthwhile aim. One commentary published alongside the UK trial report in the Lancet expressed this view: “Although effective, ECMO is clearly a bridging technique, to be supplanted by any number of treatments that would be considered superior.” ⁽⁶¹⁾

4: ECLS as a Parachute

How does neonatal ECLS resemble the parachute use case? There are two ways in which it might be similar. Straightforwardly, if ECLS was obviously effective and trials were redundant or egregiously unnecessary, then it could be a medical representative of the tongue-in-cheek parachute example. The case came to the attention of philosophers of medicine, critical of Evidence-Based Medicine, in this way. An article by Richard Royall lamented that even Bartlett had felt that a randomized trial of ECLS was necessary, given the data available: “This is particularly disturbing to me as a statistician because it is we statisticians who are largely responsible for creating the attitudes and assumptions that ‘compelled’ this study.” ⁽⁶²⁾

Worrall questions whether the outcome data that Bartlett and others had amassed, in contrast with historical data from before the adoption of ECLS, was sufficient to convince any reasonable observer that ECLS worked. ⁽⁶³⁾ If this historically-controlled comparison with Bartlett’s case series was compelling, then ECLS would resemble the parachute use case in just this respect: it would be a case in which RCTs were unnecessary, but were performed anyway due to a pathological culture of dependence on RCT data. In that case, the RCTs would be rendered unethical because they were redundant, and unnecessarily exposed newborns to a known inferior treatment.

But this analysis has shown that this argument fails. The historical data was not able to provide a substantive backdrop for comparison with Bartlett’s catalogue of successes. The overall outcomes for PPHN babies were far better on conventional therapy than Bartlett could achieve, precisely because Bartlett was picking out the very worst cases. But there had been no coherent attempt to measure the death rate, contemporary or historical, in that subset of babies. Bartlett was relying not on a compelling comparison with historical controls, but first on his own perceptions of which babies needed treating, and later on a set of criteria which he had developed, but had not experimentally validated, and which predicted mortality rates of 80% or more without ECLS treatment. The argument that no further study was needed in order to know whether ECLS was warranted fails, along with the argument that further research would be redundant and thus unethical.

For ECLS to resemble a parachute use case, there must be strong, compelling evidence that the treatment works which is derived from experience rather than from systematic research, coupled with an easily assumed high death-rate in the absence of the intervention. The first part of this picture was accepted by almost everyone in the debate. Bartlett was achieving great results with his technique. The ELSO registry and his mini-trial backed that up. The ECLS results resembled the parachute use statistics: though things went wrong occasionally, most patients survived a life-threatening situation. The ELSO registry remains a compelling demonstration. As of July 2017, 35,598 newborns have received ECLS since the inception of the registry. 28,210 survived (79%). This is despite an appreciable minority having the highly problematic underlying cause of congenital diaphragmatic hernia. The criteria Bartlett used, and ECLS centres took on, mean that plenty of extremely severe, high-risk cases have been seen and treated successfully with ECLS. Trials do little to embellish this case.

But the second part of the picture is lacking. Unlike the parachute use case, in which everyone accepts that falling long distances without a parachute is almost always fatal, the critics of ECLS were unconvinced that the fates of the newborns Bartlett was treating were as dire as he claimed. Moreover, as Dworetz et al.’s study suggests, these critics were not behaving unreasonably in demanding such evidence. For that reason, contrary to the argument that could be derived from the work of Truog, Worrall and Bluhm, further research was not redundant.

The question, then, becomes whether performing RCTs was a reasonable course of action to fill that evidential gap. Could a trial provide anything that the historical record, or a prospective observational study such as that suggested by Truog and Bluhm, could not? One line of research would investigate whether Bartlett and others’ estimations of the likelihood of babies dying without ECLS were reliable. The 80% figure was doubted from the start. Evidence from a range of sources compounds that concern. Trusting doctors’ perceptions of their patients’ chances had proved naïve in the past, as EBM proponents often emphasize. ⁽⁶⁴⁾ If Bartlett and others systematically underrated the chances of these babies surviving without ECLS, that would have two important consequences. First, the benefits of ECLS would be overstated. Second, far more babies than necessary would be exposed to an invasive, risky and expensive treatment. This suggested that the pivotal question was: How likely are the severe cases which were being treated with ECLS to survive without it? The ELSO data has no answer here. Constructing a registry like ELSO is precisely the kind of longitudinal outcomes research project envisioned by Truog and Bluhm. ⁽⁶⁵⁾ Yet, that project cannot sway a critic. To meet critical scepticism, the observational study must systematically compare outcomes in centres which did and did not practice ECLS. At this point, a pragmatic cluster-randomized RCT, which allowed for variation and evolution in treatment protocols, could potentially provide an equally potent, if not superior, approach to answer this question. The question still has not been systematically answered. The debate has come to rest largely through a combination of critics conceding the point in the face of changing general opinion, and through the assumptions introduced by RCT testing skewing the debate away from the question which had long been the heart of the disagreement.

The debate to this point does not validate Bluhm and Truog’s argument in favour of non-randomized methodologies. Nor does it justify pointing to ECLS as a case in which further study was unnecessary or unethical, as in Worrall’s discussion. However, to this point, the debate has accepted one of the core assumptions of Evidence-Based Medicine, an assumption that can and should be challenged: that the question of whether ECLS could outperform conventional therapy for these patients is important and relevant.

This brings us to the second sense in which ECLS resembles a parachute. ECLS as presented by Bartlett was a rescue therapy. It was not supposed to be a first-line therapy for PPHN babies. Babies who were not already in need of rescue would not receive ECLS. If conventional therapy was working for them at all, they wouldn’t need the parachute. They would turn to ECLS as a last resort. That was the sense in which Bartlett thought of the treatment as a parachute: “It’s there when everything else fails”. ⁽⁶⁶⁾

When ECLS is presented as a rescue therapy, the evidence situation looks very different. Providing evidence for a normal intervention means showing it outperforming its rivals. It is a question of comparative effectiveness: which of two or more treatments results in the best outcomes within some population of patients. Those who saw ECLS as an intervention like any other wanted this proof. It wasn’t enough to save people; ECLS must be better than conventional therapy at the task. That was why the death-rate under conventional therapy was needed: so that clinicians could see whether ECLS was better. To someone in Bartlett’s position, this demand looked irrational or even absurd. The conventional therapies weren’t working for the patients he treated: that’s precisely why they needed ECLS.

Seen in this way, the debate truly turned on how reliably pediatricians can determine when conventional therapy has failed. Reliably predicting the fate of a severely ill baby, and determining whether conventional therapies are working, is tough. There are few touchstones for clinicians in neonatal intensive care units to rely upon. If the realization that treatment is failing arrives too late, the baby may die before ECLS has a chance to open a parachute for them, as for the 8 babies in the UK trial who died before making it to an ECLS centre. If that judgment is reached prematurely, the baby risks an invasive, expensive treatment when conventional therapy could have succeeded. Emergency care doctors routinely make these difficult calls. Experience and expertise are not eliminable in favour of “objective criteria” to determine, say, an 80%+ chance of death, precisely because the variability we’ve seen here, whether in terms of underlying cause, neonatal weight, or the evolution of neonatal intensive care units, renders those criteria unreliable.

Bartlett formulated his “objective criteria” for severity because he was conducting a trial. He had to systematically define a study population in which to test whether ECLS outperformed conventional therapy. But in practice, his policy was to employ ECLS only when all other hope appeared lost. So, the question became: what was the true survival rate without ECLS for these so-called 80%+ death-rate babies? Researchers could have answered that question in several ways: by conducting trials to establish experimental death-rates in a control group, or by validating the criteria for severity against past and/or future cases in centres which were not employing ECLS. But this question only arose because ECLS was being considered through the lens of RCT methodology. The debate had to be restructured to fit that mould, recast as a question about comparative effectiveness. For Bartlett, this made no sense. ECLS would only be deployed when all rival treatments had failed, so there was no point in asking whether it worked better than the conventional therapy.

Yet Bartlett felt forced to recast the fundamental disagreement into terms of comparative effectiveness anyway, because the prevailing philosophy, endorsed by EBM, is and was that the RCT methodology is the best way to resolve clinical questions. The assumption that RCT methodology should be used wherever possible spawns several secondary assumptions: that any significant clinical question can be recast in terms of comparative effectiveness, answerable via an RCT; that questions about treatment which don’t relate to comparative effectiveness can’t be answered well (or perhaps at all), and that such questions are only secondary concerns to the central question of comparative effectiveness.

What, then, of the changing level of effectiveness of conventional treatment, emphasized by Truog? To be sure, the survival rate for “severe” cases changes. The standard of care rises, and severity is less likely to render conventional therapy useless. There will be fewer cases in which ECLS is needed. How, then, should we judge ECLS? One perspective is that the advances in conventional therapy change very little about ECLS. It remains a parachute therapy to be used just when conventional treatment is demonstrably failing. Perhaps conventional therapy will fail less often as technologies advance. But when it does, ECLS remains as a fallback. From this vantage, the death rates on conventional therapy—the heart of the debate and the reason for conducting trials—were unimportant. What matters is clinicians’ ability to reliably judge when conventional therapy isn’t working. If we step back from the RCT-led comparative effectiveness model, the key questions are: Could doctors independently and reliably determine which patients needed ECLS? What were the signs and features of newborns that could help them decide and maximize survival while minimizing risk? Could a standard set of criteria be developed to support doctors in determining whether conventional treatment had failed, and the time had come to open a parachute in the form of ECLS?

The most useful research program to answer these questions would study the properties of PPHN babies to design a predictive model to support doctors in their clinical judgements. Such a model would need to take account of which features of a case affect the probability that conventional therapy has failed or will fail. Clinical trials are not generally well equipped to answer these kinds of questions—certainly not in isolation. At this point, well-designed outcomes research would be a more appropriate vehicle. But this program would be cast not in terms of evaluating whether outcomes in real-world settings differed between centres which did and did not use ECLS. Rather, research would focus on questions of which subgroups are systematically disposed towards failure of the conventional treatment. Case-control studies in which the features of patients who did and did not respond to conventional treatment were analysed would be needed. This program would focus on the predictors and indicators of conventional therapy failure. Those predictors and indicators would, of course, change over time as the conventional therapy evolved.

The goal of such a program would be to develop a predictive model to support clinical decision-making in determining when conventional therapy is failing. A suitably sophisticated model would draw on a range of features, such as birth weight, pulmonary sufficiency index, responsiveness to various therapies, and duration of treatment. As guidelines developed, the predictions of such models could be validated in clinical trials comparing the outcomes in centres which implement model-based guidelines with centres in which clinicians use their best judgment to make the call on when and whether ECLS was needed. This is the point at which there is a role for comparative effectiveness studies, albeit first premised on the assumption that approaches other than RCTs can provide strong evidence in response to questions about differential prognosis in developing the models. The RCT is far from the only way in which a model’s predictions could be validated, though. Models could be validated against both historic and new cases using work such as the ELSO registry, and with enough work might be made flexible enough to adapt to changing conditions, such as an upturn in the effectiveness of conventional lines of treatment for particular patients.

5: Conclusion

In summary, ECLS was not a straightforward story like parachute use, in which RCTs were never needed and the data was clear from the outset. It was a challenging case. At its heart was simple distrust: the critics could not trust practitioners that newborns were highly likely to die unless dramatic action was taken. Their doubts were not unreasonable. Bartlett and his supporters saw the case differently. For them, this was a proven rescue therapy. Scepticism about their abilities to perceive which patients needed it was moot; their claim was only that when needed, it should be employed. None of the research in the literature addressed that core disagreement: Could practitioners judge when ECLS was needed? How could they be supported in making those judgements?

As philosophical critics suggest, the RCT data did very little, if anything, to clarify that situation. But contrary to the thrust of much philosophical criticism, this was not primarily due to methodological or ethical failings. Rather, RCTs are comparative effectiveness studies, and this was not a debate about comparative effectiveness. The approach was ill-fitting to the questions that really mattered to the ECLS debate. But because the debate was configured from the start through the lens of the RCT as the dominant methodology, the research questions had to be forced to fit the methods used. There were many rival treatments to ECLS. The alternatives include ventilation through a breathing tube, inhaled nitric oxide which helps to open the closed blood vessels, and surfactant therapy. Trials comparing these therapies against one another would tell us which was superior. But this was not the question at issue. This mismatch between ideology of evidence and purpose underpinned the interminability of the debate.

Because the EBM movement sets such store by RCT results, the UK trial had to be regarded as settling the debate on its own terms: it had supplied what the EBM movement demands. But the UK trial is a poor evidence base for ECLS. The evidence ranks highly on EBM criteria, but is peppered with inconsistencies, concerns and unresolved questions. The hole in the heart of the debate has not been filled, because it is a question that RCT evidence is not designed to answer, while the kinds of evidence that could provide answers have been undervalued by EBM and consequently neglected by researchers. Where the EBM model of evidence led the neonatal community down a blind alley was not in forcing RCTs to be performed despite the evidence already being strong, but in forcing the researchers to adopt an RCT methodology which did not address the real questions at stake.

Bibliography:

Anonymous. (1995). Evidence-Based Everything. Bandolier, 12, 91–98.
Balshem, H., Helfand, M., Schünemann, H. J., Oxman, A. D., Kunz, R., Brozek, J., … Guyatt, G. H. (2011). GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol, 64(4), 401–406.
Bartlett, R. H., Andrews, A. F., Toomasian, J. M., Haiduc, N. J., & Gazzaniga, A. B. (1982). Extracorporeal membrane oxygenation for newborn respiratory failure: forty-five cases. Surgery, 92(2), 425–433.
Bartlett, R. H., Gazzaniga, A. B., Fong, S. W., Jefferies, M. R., Roohk, H. V., & Haiduc, N. (1977). Extracorporeal membrane oxygenator support for cardiopulmonary failure. Experience in 28 cases. The Journal of Thoracic and Cardiovascular Surgery, 73(3), 375–386.
Bartlett, R. H., Roloff, D. W., Cornell, R. G., Andrews, A. F., Dillon, P. W., & Zwischenberger, J. B. (1985). Extracorporeal circulation in neonatal respiratory failure: a prospective randomized study. Pediatrics, 76(4), 479–487.
Bluhm, R. (2010). The epistemology and ethics of chronic disease research: further lessons from ECMO. Theor Med Bioeth, 31(2), 107–122. https://doi.org/10.1007/s11017-010-9139-8
Canadian Task Force on the Periodic Health Examination. (1994). The Canadian guide to clinical preventive health care. Canadian Government Pub Centre.
Collins, H., & Pinch, T. (2008). Dr. Golem: how to think about medicine. Chicago: University of Chicago Press.
Dalton, H. J., & Garcia-Filion, P. C. (2013). Chapter 20. Extracorporeal Life Support for Cardiopulmonary Failure. In M. J. Tobin (Ed.), Principles and Practice of Mechanical Ventilation (3rd ed.). New York, NY: The McGraw-Hill Companies. Retrieved from accessanesthesiology.mhmedical.com/content.aspx?aid=57067601
Dworetz, A. R., Moya, F. R., Sabo, B., Gladstone, I., & Gross, I. (1989). Survival of Infants With Persistent Pulmonary Hypertension Without Extracorporeal Membrane Oxygenation. Pediatrics, 84(1), 1–6.
Evans, D. (2003). Hierarchy of evidence: a framework for ranking evidence evaluating healthcare interventions. J Clin Nurs, 12(1), 77–84.
Grosfeld, J. L. (2003). Robert H. Bartlett: Pediatric History Center, American Academy of Pediatrics: Oral History Project. Elk Grove Village, IL: American Academy of Pediatrics.
Howick, J., Phillips, B., Ball, C., & Sackett, D. (2009). CEBM Levels of Evidence. Centre for Evidence-Based Medicine, www.cebm.net, accessed 01/04/15.
Konduri, G. G., & Kim, U. O. (2009). Advances in the Diagnosis and Management of Persistent Pulmonary Hypertension of the Newborn (PPHN). Pediatric Clinics of North America, 56(3), 579–600. https://doi.org/10.1016/j.pcl.2009.04.004
O’Rourke, P. P., Crone, R. K., Vacanti, J. P., Ware, J. H., Lillehei, C. W., Parad, R. B., & Epstein, M. F. (1989). Extracorporeal membrane oxygenation and conventional medical therapy in neonates with persistent pulmonary hypertension of the newborn: a prospective randomized study. Pediatrics, 84(6), 957–963.
Roofthooft, M. T. R., Elema, A., Bergman, K. A., & Berger, R. M. F. (2011). Patient Characteristics in Persistent Pulmonary Hypertension of the Newborn. Pulmonary Medicine, 2011, 858154. https://doi.org/10.1155/2011/858154
Royall, R. M. (1991). Ethics and statistics in randomized clinical trials. Statistical Science, 6(1), 52–62.
Sackett, D. L. (2000). Evidence-based medicine: how to practice and teach EBM (2nd ed.). Edinburgh: Churchill Livingstone.
Sangalli, F., Marzorati, C., & Rana, N. K. (2014). History of Extracorporeal Life Support. In F. Sangalli, N. Patroniti, & A. Pesenti (Eds.), ECMO – Extracorporeal Life Support in Adults (pp. 3–10). Milan: Springer-Verlag.
Scottish Intercollegiate Guidelines Network (SIGN). (2008). SIGN 50: A Guideline Developer’s Handbook. Available at www.sign.ac.uk, accessed 18/01/13.
Smith, G. C., & Pell, J. P. (2006). Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. International Journal of Prosthodontics, 19(2), 126.
Soll, R. F. (1996). Neonatal extracorporeal membrane oxygenation—a bridging technique. The Lancet, 348(9020), 70–71. https://doi.org/10.1016/S0140-6736(05)64602-8
Steinhorn, R. H. (2010). Neonatal Pulmonary Hypertension. Pediatric Critical Care Medicine, 11(2 Suppl), S79–S84. https://doi.org/10.1097/PCC.0b013e3181c76cdc
Truog, R. D. (1992). Randomized controlled trials: lessons from ECMO. Clin Res, 40(3), 519–527.
Truog, R. D. (1999). Informed consent and research design in critical care medicine. Critical Care, 3(3), R29–R33.
UK Collaborative ECMO Trial Group. (1996). UK collaborative randomised trial of neonatal extracorporeal membrane oxygenation. Lancet, 348(9020), 75–82.
United States Parachute Association (USPA). (2017). Safety & Training: Accident Reports. Retrieved July 27, 2017, from http://www.uspa.org/Safety-Training/Accident-Reports
van Tulder, M., Malmivaara, A., Esmail, R., & Koes, B. (2000). Exercise therapy for low back pain: a systematic review within the framework of the Cochrane Collaboration Back Review Group. Spine, 25(21), 2784–2796.
Ware, J. H. (1989). Investigating Therapies of Potentially Great Benefit: ECMO. Statistical Science, 4(4), 298–306.
Watson, R. S., & O’Rourke, P. P. (2006). The Bartlett et al extracorporeal membrane oxygenation case series from 1977, with expert commentary provided by Dr P. Pearl O’Rourke. Journal of Critical Care, 21(2), 151–155. https://doi.org/10.1016/j.jcrc.2006.03.001
Wei, L. J., & Durham, S. (1978). The Randomized Play-the-Winner Rule in Medical Trials. Journal of the American Statistical Association, 73(364), 840–843. https://doi.org/10.2307/2286290
Worrall, J. (2008). Evidence and ethics in medicine. Perspect Biol Med, 51(3), 418–431. https://doi.org/10.1353/pbm.0.0040
Wung, J. T., James, L. S., Kilchevsky, E., & James, E. (1985). Management of infants with severe respiratory failure and persistence of the fetal circulation, without hyperventilation. Pediatrics, 76(4), 488–494.
Zapol, W. M., Snider, M. T., Hill, J. D., Fallat, R. J., Bartlett, R. H., Edmunds, L. H., … Miller, R. G. (1979). Extracorporeal Membrane Oxygenation in Severe Acute Respiratory Failure: A Randomized Prospective Study. JAMA, 242(20), 2193–2196. https://doi.org/10.1001/jama.1979.03300200023016
Zelen, M. (1969). Play the winner rule and the controlled clinical trial. Journal of the American Statistical Association, 64(325), 131–146.
Zelen, M. (1979). A new design for randomized clinical trials. N Engl J Med, 300(22), 1242–1245.