The Avoidable Scandal: Benoxaprofen and Theories of Medical Evidence

“This debate is about Britain’s greatest drug disaster. It is about the scandal of a huge United States pharmaceutical company coming to Britain and boasting of Opren, a new wonder drug to treat arthritis — with tragic results.”

—Lord Jack Ashley (1)

Benoxaprofen, marketed as Opren in the UK, created a scandal in 1982 when it was withdrawn from sale amidst reports of over 60 deaths and thousands of adverse reactions. The question of who was at fault for this disaster has been addressed many times in the academic literature, Parliament, and the courts. But the case still has much to show us about the way we treat, apply and value evidence in medical practice. The mistakes made have renewed relevance in a context in which evidence is evaluated and ranked according to rigid systems. This paper shows that the scandal was an avoidable one: there was evidence available at the time that allowed astute clinical observers to predict the adverse reactions, and that would have allowed modification of the drug regimen to prevent them. However, this evidence was overlooked and underappreciated at the time by the Committee on the Safety of Medicines (CSM), the advisory body to the UK’s regulator. Although the CSM is long gone, the assumptions about what evidence counts, and how much attention each source of evidence deserves, have only been reinforced since.

In the last few decades, the Evidence-Based Medicine (EBM) movement has espoused a theory of medical evidence which rates and prioritizes evidence according to the underlying methodology that produced it. (2) Its proponents emphasize evidence from Randomized Controlled Trials (RCTs), distrust evidence from observational studies, and denigrate mechanistic reasoning and clinical experience as sources of evidence. They have developed ‘hierarchies of evidence’, influential tools which rank the strength and quality of evidence in terms of the methodology. (3) RCTs dominate these hierarchies, with sources such as mechanistic reasoning and clinical experience relegated to the lowest levels. The implication is that RCT evidence should drive policy and medical decision-making, whilst pharmacologic studies, clinical experience and observational data have a marginal role.

This epistemic program privileges certain kinds of inquiry in medical research: those questions easily addressed through randomized trials. Meanwhile, it disincentivises research which asks questions RCTs are ill-equipped to answer: questions about variation in effects, side-effect profiles and application in the field. The benoxaprofen disaster occurred before the heyday of EBM, but its results warn against focusing on questions about generalized average treatment effects at the cost of de-emphasizing predictable variation in effects and understanding of the underlying pharmacology in making treatment and licensing decisions.

Only by incorporating evidence that EBM marginalizes can another scandal like benoxaprofen be avoided. Had the pharmacologic and observational evidence received due consideration, lives could have been saved and the drug most likely would not have been withdrawn. The evidence that eventually brought the harms of benoxaprofen to light, and led to the drug’s withdrawal, was also a form of evidence that EBM minimizes: reports of clinical experiences. If the EBM model held sway, the information that would allow us to foresee a potential disaster before it hit, and the information that allowed us to see the damage of that disaster once it arrived and avoid further harms, would be regarded as weak and low-quality. But EBM’s preferred sources, randomized trials, supported benoxaprofen, and masked the crucial issues because such research designs are unable to answer (or even ask) the questions that could have prevented the disaster.

The tragedies here are twofold. Thousands of patients were affected, and many died, in a way that was largely foreseeable and avoidable. Conversely, the treatment regimen probably could have been modified to retain benoxaprofen as a treatment for even those same patients. A viable treatment was lost unnecessarily.




Benoxaprofen was called “Britain’s greatest drug disaster” by Labour MP Jack Ashley. (4) It was not a uniquely British disaster, but it did begin and end in the UK. In the 1960s, a British laboratory run by American pharmaceutical giant Eli Lilly began working on arthritis medications. By the early 1970s, benoxaprofen was created and patented. It is a non-steroidal anti-inflammatory drug (‘NSAID’). NSAIDs include the best-known over-the-counter drugs, such as aspirin and ibuprofen. They are painkillers and anti-fever medications. In large doses, they produce an anti-inflammatory effect, reducing swelling and joint stiffness.

A key target for NSAIDs is chronic rheumatoid arthritis. Arthritis is a disease of the joints, though it has a range of other injurious effects. The 2015 Global Burden of Disease study estimated that 24.5 million people live with rheumatoid arthritis. (5) Joints become swollen and sore. Stiffness limits range of movement. Severe rheumatoid arthritis makes day-to-day life painful and challenging. There is no cure. The treatment options slow disease progression and treat the symptoms to allow patients to live relatively normal lives. NSAIDs can reduce pain and swelling. The prevalence of rheumatoid arthritis, afflicting nearly 1% of people in the developed world, provides a massive market for anti-inflammatories.

Eli Lilly pharmaceuticals attempted to tap this market with benoxaprofen. In a crowded marketplace of similar NSAIDs, benoxaprofen had one distinctive and demonstrable advantage: a long elimination half-life. A drug’s elimination half-life is the time taken for half the dose to leave a patient’s system. It is one way of measuring how long the drug is active within the body, and therefore how frequently patients must take the drug to maintain a stable concentration and sustain the effects. Benoxaprofen remained in patients’ systems longer than many rival NSAIDs. Patients would only need to take benoxaprofen once daily. Many elderly arthritis patients need assistance in managing their drug regimen due to age and arthritic damage to their hands. The long elimination half-life became benoxaprofen’s selling point.

Over 500,000 Britons received benoxaprofen, under the brand name Opren, between 1980 and 1982. The UK was the first to roll out benoxaprofen on a large scale. By 1982, data had amassed to suggest a range of side-effects and over 60 deaths had occurred in Britain. According to Lilly pharmaceuticals and several commentators, the evidence base had not indicated the risk. There were several scandals fomenting here. Had Lilly or the British regulators failed to do their due diligence and released an improperly vetted drug? Had Lilly attempted to conceal information that they knew, or should have known, indicated the risks? Or, maybe, had reports of harm been overblown in the press and precipitated a false scandal, leading to a drug that was no riskier than similar medications being withdrawn unnecessarily? Wrapped up in this were other accusations and potential improprieties, including allegations that marketing materials for Opren included exaggerated, unsubstantiated and implausible claims about the potential effects of the drug.


Laboratory to Market


Drugs come to market through a three-stage process. Before this begins, there are usually extensive studies in vitro and in animals. David Chatfield and John Green of the Lilly research centre in Surrey tested benoxaprofen on dogs, mice, rats, rabbits and rhesus monkeys, as well as in humans. (6) They were testing the pharmacological properties of the drug—how it is absorbed, metabolized and excreted. Following from these early tests, Phase I testing is primarily experimentation with dosage. Very small groups of subjects are recruited. Each group receives a different dose of the drug, to work out how much can be safely tolerated by the body and what dose seems necessary for the drug to be active. Side-effects are monitored, although the numbers involved in these studies are far too small to compile any decent side-effect profile.

One such study by the Surrey lab was designed to determine what dose was needed to produce the desired concentration of the drug in blood plasma. (7) Doses of 100, 200 and 400mg were tested on 17 subjects. Crucially, all were male, healthy, and aged 21-55. They received benoxaprofen for up to 11 days. This study found an average elimination half-life of 30-35 hours, which the researchers considered long. This was the figure that captured the marketing advantage for benoxaprofen. Single doses of 100, 200 and 400mg were all well tolerated by the test subjects.
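The link between a 30-35 hour half-life and once-daily dosing can be illustrated with a short calculation. The sketch below is not the method used by the Lilly researchers; it assumes simple first-order (exponential) elimination and equal doses at a fixed interval, and the function names are hypothetical, introduced only for illustration.

```python
def fraction_remaining(hours_elapsed: float, half_life_hours: float) -> float:
    """Fraction of a dose still in the body after hours_elapsed.

    Assumes first-order elimination: the amount halves every half_life_hours.
    """
    return 0.5 ** (hours_elapsed / half_life_hours)


def accumulation_ratio(dosing_interval_hours: float, half_life_hours: float) -> float:
    """Steady-state level relative to a single dose under repeated equal dosing.

    Assumption: each new dose adds to whatever remains of all earlier doses,
    which sums to a geometric series 1 / (1 - fraction remaining per interval).
    """
    return 1.0 / (1.0 - fraction_remaining(dosing_interval_hours, half_life_hours))


if __name__ == "__main__":
    # 30 and 35 hours are the bounds of the half-life reported in the study above.
    for half_life in (30.0, 35.0):
        left = fraction_remaining(24.0, half_life)
        ratio = accumulation_ratio(24.0, half_life)
        print(f"half-life {half_life:.0f} h: {left:.0%} of a dose remains after 24 h; "
              f"accumulation ratio {ratio:.2f}")
```

Under these assumptions, a little over half of each dose would still be present when the next once-daily dose is taken, so the level in the body builds up over the first days of treatment before settling. The same arithmetic implies that any increase in half-life raises the steady-state level reached on a fixed daily dose.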

In tests on rats, a blood plasma concentration of 5 micrograms per milliliter reduced arthritic symptoms by 30%, but 35 micrograms per milliliter produced up to 70% reductions. In the study on these 17 men, doses of 100mg of benoxaprofen and greater could maintain the 5 micrograms per milliliter blood plasma concentration needed, but only the 400mg dosage achieved the more desirable 35 micrograms per milliliter level. The researchers hypothesized that once a steady concentration of benoxaprofen in the blood was reached, it could be maintained with smaller daily doses. They recommended initial clinical trials proceed with a dose of 100mg twice daily.

Phase II studies are slightly larger, routinely recruiting around 100 patients. The aim is not to demonstrate that the treatment works, but to determine whether the treatment is promising enough to warrant a full-scale Phase III trial. Researchers will also be trying to work out the best dosage regimen, and checking for further side-effects. It is common to perform several Phase II studies. In the benoxaprofen case, two small double-blind randomized trials were performed comparing benoxaprofen with aspirin and ibuprofen respectively. (8) Each trial lasted 28 weeks. In each, the dosage for benoxaprofen was 400-600mg per day, up to three times that recommended by Phase I research. Again, Phase II trials are often performed in populations who are otherwise healthy, predominantly male, and relatively young. Patients with other conditions (‘comorbidities’) or taking other medications are excluded. These stringent exclusion criteria are intended to ensure straightforward comparisons despite low numbers of participants, and to prevent the comparison being muddied by effects of other treatments or harms of other diseases. Phase II trials are more of a proof of concept than a demonstration that the treatment works.

These trials were not expected to show the superiority of the new treatment to existing treatments. They were too small to demonstrate a statistically significant improvement. Rather, the idea was to show a comparable effect to existing therapies. In both trials reported in 1979, this was observed. Benoxaprofen had a similar effect to aspirin and ibuprofen. Benoxaprofen had fewer side-effects in both studies, and fewer patients stopped taking benoxaprofen than aspirin or ibuprofen.

Phase III is a large randomized controlled trial. This phase was necessary for FDA approval, which allows the drug to access the American marketplace. Phase III trials can involve hundreds or thousands of patients, often across many centres. The Phase III trial is what hierarchies of evidence mean when they identify RCTs as the highest level of evidence: the “gold standard” of clinical evidence.

Benoxaprofen gained access to the British market in 1980, before Phase III trials had reported. Trial results were reported at two symposiums during this time (British Medical Journal Editorial 1982, 459). At that time, clinical trial data was available to regulators on over 2,000 arthritis patients. (10) Lilly submitted this data, along with other trial results, to the FDA to request approval. In the UK, the Committee on the Safety of Medicines (CSM), the advisory body to the regulatory authority at the time, acted more swiftly. The CSM refused requests for papers relating to their decision. (11) It’s not possible to know exactly how much trial data the CSM reviewed in making their decision to allow Lilly’s British subsidiary, Dista Ltd., to market the drug in the UK in 1980. It was not until April of 1982 that the FDA granted Lilly access to the American market for benoxaprofen (there marketed as Oraflex). By that time, over 3,000 patients had participated in benoxaprofen clinical trials, and 5,000 had been monitored for adverse effects. (12)

As John Abraham’s detailed study of the regulatory process has shown, (13) the data that justified licensing benoxaprofen was neither clear nor public. The CSM and FDA both seemed to regard benoxaprofen as being as effective as aspirin and ibuprofen. They did not see it as a breakthrough; it was no safer or better than its rivals. But the novel mode of delivery and less frequent dose regimen provided a significant niche in both markets. It is not, however, the trial data or the question of whether benoxaprofen was superior, inferior or equivalent to its rivals which concerns us here. Even had benoxaprofen passed large-scale trials with flying colours, the outcomes could have been the same.


A “New Era” for Arthritis


In 1980, Dista rolled out Opren in the UK with a marketing crusade, targeting both arthritis patients and rheumatologists with material hyping benoxaprofen and emphasizing its less intrusive schedule. Radio and newspaper adverts encouraged arthritis sufferers to ask their doctors about Opren. The campaigns were successful enough to bring 500,000 British arthritis patients into contact with Opren inside two years. (14) Once Lilly received FDA approval, benoxaprofen would form a significant part of their business. In 1982, Lilly stood to make over $250 million annually from the drug. (15)

Opren was controversial from the outset. Dista’s aggressive marketing provoked indignant reactions from British clinicians. One editorial stated that the decision to circumvent regulations to market Opren directly to consumers was “in clear breach of the agreed Code of Practice for the Pharmaceutical Industry”. (16) Portraying Opren as a wonder-drug and telling patients to request it from their doctors angered many clinicians.

At the same time as Opren was released, the foundations of the movement that later became known as Evidence-Based Medicine were being laid. In November 1980, the British Medical Journal published “A plea for clinical epidemiology”, in which the editors lamented that the only access to epidemiological education for British practitioners was a one-week course at the University of Southampton, and presciently proposed that “The image of epidemiology will need to be changed if it is to grow in stature and in appeal. Too often at present it is regarded by clinicians as a fringe activity … Yet good medical practice is surely based on a knowledge of the population”. (17) It was the proponents of clinical epidemiology in the 1980s who formulated the first hierarchies of evidence and transitioned the RCT from a component of medical evidence to its epitome. (18)

In the same volume of the BMJ, a concerned clinician, Bernard Caplan, wrote to the editors to complain about the heavy-handed marketing tactics for Opren: “Opren … was described as “a new era” in the treatment of arthritis in one newspaper. The reports were followed by a steady stream of patients with newspaper cuttings coming to the surgery and demanding it. For the last few weeks we have had to tell our patients gently that it is too early to know”. (19) Given Opren’s rapid rollout, it seems that Caplan’s resistance in the face of demanding patients was atypical.

Another clinician painted an even more dramatic picture of the widespread media excitement: “In the last 3-4 weeks, however, large numbers have come carrying cuttings proclaiming “breakthrough,” “the most exciting development since the discovery of aspirin,” and “provides relief from pain and may even cure the disease.” Heady stuff. … Such heavy promotion of a drug does harm rather than stimulate studies of its correct use. Some reputations are now on the line in a very public way”. (20) In the letter, a “sober appraisal” sees the data as falling far short of justifying the bold claims replicating themselves across the press but notes that the drug “so far seems safe”. That perception was about to shift drastically.

The marketing furor provoked hostility in the profession. But anger was directed not just at the manipulation of patients’ expectations and the unsavory experience of having to disappoint cutting-carrying customers. There was also a perception that the claims made on Opren’s behalf, be it due to media exaggeration and misunderstanding or deliberate misinformation, were unfounded or at best poorly supported. Claims were circulating, “implied more often than expressed” (21), that Opren would not just reduce swelling and pain, but would combat or even cure the disease.

There are arthritis treatments that attempt to slow down or even roll back the progress of the disease. This broad and often unrelated class of medications is known as DMARDs (Disease Modifying Anti-Rheumatic Drugs). This category is used in contrast to NSAIDs like aspirin and benoxaprofen. It was not believed that anti-inflammatories would slow down the disease. They were only supposed to provide relief from symptoms. The modern first-line DMARD for rheumatoid arthritis is methotrexate (22), a chemotherapy drug. It has many of the toxicities and adverse effects associated with chemotherapy—liver and kidney damage, hair loss, skin discolouration, nausea and fatigue, amongst others. In smaller doses, its side-effect profile is less severe, yet still serious. Methotrexate has been used to treat a range of autoimmune diseases including arthritis.

Methotrexate is used to delay disease progression and joint damage. Many patients cannot tolerate it, and are then given other therapies instead. It can be given in combination with NSAIDs such as aspirin to supplement the pain and inflammation relief, though there have been many reports of harmful interactions between the drugs. Aspirin and ibuprofen can inhibit kidney function, which allows a toxic concentration of methotrexate to build up in the patient’s system.

In the 1980s, as today, slowing the progress of arthritis was difficult and involved treatments that were risky, intrusive and often intolerable for patients. If benoxaprofen could combine the pain relief of aspirin with disease-modifying effects, whilst being relatively side-effect free, it would corner a huge market and represent the “breakthrough” and “new era” that the newspapers were forecasting. But the evidence for these claims was thin.

Arthritis measures struggle to differentiate between effects on symptoms and effects on disease progression. The extent of the symptoms is a common way of measuring the progress of the disease. Popular modern measures include various forms of swollen and tender joint counts, in which clinicians manipulate sets of joints and patients report whether they are experiencing pain, swelling or tenderness. Measures like the DAS-28 score (Disease Activity Score at 28 joints) and the American College of Rheumatology (ACR) scale depend heavily on these measures. Holistic tests like health assessment questionnaires and burden of disease assessments again are strongly informed by the level of pain, inflammation and immobility the patient is experiencing. Whenever a patient is asked to rate the severity of their disease, their replies are coloured by the severity of their symptoms. Therefore, treatments that only suppress symptoms can make patients report a slow-down in disease progress.

An alternative approach is to devise measures which are insensitive to changes in symptoms and respond only to suppression of disease. Such measures would need to be independent of patients’ perceptions. Evidence-Based Medicine is skeptical of laboratory measures in trials and in evidence more broadly. There is an emphasis on patient-oriented measures—those that really matter to sufferers; symptoms, in other words. This is laudable. But this kind of reasoning only really applies where the question, as ever in EBM, is “Does it work?” When we get into the more intricate realities of mapping the various effects of a treatment—“Does it slow down the disease or merely mask the symptoms?”—rather than conflating them into one undifferentiated ‘effect’, we need to draw on our full arsenal of information.

In the arthritis case, this remains difficult. There are respected laboratory measures like C-reactive protein levels and erythrocyte sedimentation rates. But these more ‘objective’ measures primarily gauge inflammation, which can be suppressed by anti-inflammatories without a definitive disease-modifying effect. It is hard to see where symptom suppression ends and retardation of disease begins.

One study in the benoxaprofen debate used measures primarily oriented at teasing out disease-modifying effects independently of symptom suppression. Bluhm, Smith and Mikulaschek’s study (23) was completed one year after Opren hit the UK market, but some of the data was accessible to Lilly, Dista and the CSM earlier. The CSM’s refusal to share details of its decision-making again leaves us unable to know what conclusions they reached about benoxaprofen’s potential. The study used two hard measures of disease-modifying activity: progression of osseous defects (OD) and joint space narrowing (JSN).

Both are radiographic measures. Osseous defects are areas of bone damage. Joint space narrowing is a more commonplace measure used in diagnosing arthritis, and measures cartilage loss. Cartilage is the cushioning between bones at the joints, and functions as a shock absorber. As cartilage is eroded in arthritis, patients experience increasing pain, stiffness and immobility in that joint. Severe damage, to the point that the cartilage is lost entirely, is called a bone-on-bone joint. By this stage, the joint is extremely painful, stiff and inflamed. X-raying joints under pressure, the researchers could see areas in which the space between the joints had narrowed and thus infer that the cartilage had been lost, signifying that the disease was progressing. But JSN is not usually used on its own as an indicator of arthritic progression, as factors other than arthritis (aging, overuse and sports injuries) contribute to joint space narrowing.

Bluhm, Smith and Mikulaschek’s study was small, uncontrolled and preliminary. It recruited 39 patients, treated them with benoxaprofen, and measured OD and JSN. Their findings were promising. Of the 39 patients, 28 showed some evidence of a decreased rate of progression compared to expectations on at least one measure, and 14 (36%) showed decreases in both measures. Their conclusions were optimistic: “Because there is also a trend for this drug to retard or arrest radiologic progression, it becomes a promising agent for the long-term treatment of rheumatoid arthritis.” (24)

There are several assumptions being made here. First, this is an uncontrolled study. It’s hard to be confident that the predicted rate of progress in these measures would indeed have been experienced by these 39 patients had they not been treated with benoxaprofen. If predicted rates of progress were overly pessimistic, then this alone would generate the findings from the study. This is one of the reasons why Evidence-Based Medicine derides uncontrolled observational studies. Its proponents are right to be concerned.

A small sample of patients, a somewhat unorthodox measure, and the uncontrolled nature of the study combine to mean that the evidence for benoxaprofen as a disease-modifying drug was perceived as weak outside of the conclusions of the authors. Editorials in the British Medical Journal point to “some preliminary evidence that benoxaprofen may have disease-modifying properties” (25) and conclusions which were “based on evidence that was tenuous to say the least”. (26)

What would be convincing? Larger-scale studies for a start. But really, the writers wanted evidence that benoxaprofen (a) does something truly different to other NSAIDs which everyone believes don’t have disease-modifying activity, and (b) does something similar enough to a known DMARD to make them believe it really could work. One editorial wrote: “What are lacking, however, are convincing clinical trials indicating a range of activities more generally associated with drugs like gold”. (27) In other words, it was not evidence that it works which would convince the detractors, but evidence about how it works. The claims made on behalf of benoxaprofen contravened the established principle that NSAIDs are not disease-modifying. The exceptional claim for benoxaprofen would need evidence that the drug was not a typical NSAID and evidence that it belonged amongst the DMARDs. The critics were asking a different question to the one that a large trial could answer. They asked not “How big is the effect of benoxaprofen?” but “What is the composition of these effects, in terms of symptom suppression and disease suppression?” That latter question could be answered only with an appeal to a range of information from a range of sources, pulling together different outcome measures with pharmacokinetic detail.

We won’t find out whether benoxaprofen had disease-modifying powers. It was withdrawn a year later, before more analysis could be performed. There is some mechanistic evidence in favour of the claim. Benoxaprofen was indeed unusual amongst NSAIDs. It had a different mode of action to drugs like aspirin and ibuprofen. Its effects on pain and inflammation were similar, but the way it worked was very different, taking a different path to the same result. This gave several advantages. Benoxaprofen’s route had much less chance of causing stomach ulcers, a major risk of aspirin and ibuprofen. When Walter Mikulaschek monitored 1,681 patients treated with benoxaprofen, only 0.4% suffered ulcers, a boon in comparison to its rivals. (28) This generated a lot of the excitement: if it worked differently, it might transcend the limitations of the NSAIDs and break into disease-modifying territory.

But a new mode of action means new side-effects. Just as the potential benefits of benoxaprofen might break into uncharted ground, so might the harms: “A drug with unusual properties may be expected to have unusual side effects”. (29) Some unusual side-effects were already well-known when the drug arrived in Britain. The most intriguing and surprising was high levels of photosensitivity. Photosensitivity occurs when the skin reacts abnormally when exposed to sunlight, particularly to ultraviolet rays. Photosensitivity was not easy to manage. In one of Lord Ashley’s parliamentary debates on benoxaprofen, John Farr, MP for Harborough, who himself had taken Opren, recounted his experiences with photosensitivity: “my skin suffered the most tremendous prickling sensation and burning. I could not [go] into any form of sunlight without having my hands covered … Eventually the pain was so intense that the only way I could recover was to go indoors out of the sun.” (30)

Another odd effect was onycholysis, the painless detachment of the fingernails, and occasionally toenails, from the nail bed. Both were common—photosensitivity was found in 9.4% of patients, and onycholysis in 12.5%. (31) These side-effects could be dismissed by some as strange but manageable. For other commentators, unexplained and unusual adverse reactions were signs that all was not well. The novel mode of action meant benoxaprofen was unpredictable, and unpredictability put patients at risk. As the BMJ put it: “This unusual range of pharmacological properties and bizarre side effects should, we believe, have alerted doctors to a cautious approach to the drug before the deaths were reported.” (32) Again it is mechanistic evidence in combination with pharmacologic studies and observational data that forms a solid evidence-base for a tentative approach. The CSM did not yield to this caution.


A “Drug Disaster”


On August 4th 1982, the new British Minister for Health, Kenneth Clarke, announced that sales of Opren in the UK had been suspended “with immediate effect on grounds of safety”. (33) The CSM telegrammed the FDA reporting 61 deaths and over 3,500 adverse effects linked to Opren. (34) Lilly pharmaceuticals withdrew the drug that afternoon. It never returned.

In the aftermath, Lilly’s share prices slumped from $60 highs in July to $48. It was a double whammy for Lilly. Just the day before, officials from the FDA had testified before Congress that Lilly’s reports on adverse reactions to benoxaprofen had been incomplete. By that time, benoxaprofen was on the market in America too. The FDA attested that, in a range of reports submitted by Lilly, Oraflex had wrongly not been identified as the drug responsible for adverse reactions. Edgar Davis, vice president at Lilly, denied any wrongdoing and insisted that “a proper investigation will confirm our view” that the submissions had been filed properly. The president of Lilly’s pharmaceutical division said: “We were convinced – and we remain convinced – that the drug is safe and effective when it is used properly”, and claimed that the drug was withdrawn in an “environment of hysteria”. (35)

Lilly had suddenly gone from having a promising multi-million-dollar drug hailed as transforming arthritis treatment, to having a prominent and controversial failed drug with accusations of misconduct and malfeasance in tow. The allegations of impropriety exacerbated the outcry. Only a few months earlier, in April 1982, the FDA had approved Oraflex for sale. What had happened in those months to transform the picture from a breakthrough to a catastrophe?

Reports of adverse reactions to Opren had been feeding through to the CSM for some time. The “yellow card” system, a fledgling process for reporting adverse effects of new drugs, was coming into play. Doctors could report unusual reactions directly to the CSM. By February 1982, the CSM had received reports of 24 Opren-related cases of liver disorders including two deaths. They sent the list to Brian Gennery, Dista’s medical director. (36) Gennery passed the information to Lilly’s US director, William Shedden, later that month.

Public reports of deaths began in the Lancet in April 1982, when six Scottish doctors co-authored a report of three cases of jaundice and one death in benoxaprofen patients. (37) The patients were elderly women. The jaundice and the death of one patient were due to cholestasis, a stoppage in the flow of bile caused by liver damage or blockage of bile ducts. Jaundice indicated that the patients were in various stages of liver and kidney failure.

This pattern of jaundice followed by hepato-renal failure recurred at a larger scale in a report published in May in the BMJ. (38) Hugh Taggart and Joan Alderdice of Belfast City Hospital reported five cases in which elderly female patients died after taking benoxaprofen. The five women were aged between 80 and 86, and had been taking benoxaprofen for 3-7 months. The progression from onset of jaundice to death was quick: a few weeks to 3 months in the longest case. Benoxaprofen was the only treatment common to all five patients. Autopsies showed clear evidence of cholestasis, alongside liver and kidney damage and pancreatic inflammation, and ruled out other causes of cholestasis such as hepatitis or bile duct obstructions. They also reported a sixth patient who died of kidney failure without developing jaundice first. Taggart expressed his concern that benoxaprofen had already become the third most prescribed NSAID in Northern Ireland, despite being so new and untested. They wrote that “caution should be exercised in the use of benoxaprofen in elderly patients”. (39)

Taggart’s report was not news to Dista or Lilly. He had sent the findings to the CSM and Dista two months previously in March 1982. Brian Gennery’s testimony shows that he initially gave benoxaprofen the benefit of the doubt, suspecting that other factors caused the deaths. But he changed his mind in April 1982, after seeing the reports from Scotland in the Lancet matching the profile in the Taggart cases. He sent the information over to Lilly and William Shedden.

It was not until May 1982, when Taggart published his cases, that the CSM decided to include jaundice on labels as a side-effect of Opren. Evidently, reports in the UK’s most prominent medical journals and acceptance of the side-effect by the CSM prompted other clinicians to join the dots. Increasing reports of jaundice and hepato-renal failure appeared: “the floodgates of notifications opened”. (40) By August, 61 deaths and over 3,500 adverse reactions had been reported, primarily in older female patients.

In June, the CSM finally changed their guidance, indicating lower dosages for elderly patients and recommending stopping treatment if symptoms were observed. Having allowed the drug onto the market in the US just two months previously, the FDA reacted similarly. They met with Lilly representatives on the 23rd June, discussing 16 cases of jaundice, most of which were fatal. The FDA realized that the liver failure was time-dependent, emerging after many months of exposure. Benoxaprofen had entered the U.S. market in May after approval in April. With imminent rapid expansion of the market, they feared “the flurry of cases in the U.K. may be a harbinger of a great number of cases”. (41) When Lilly withdrew the drug at the start of August, the confirmed death toll had risen to 86 globally, including 61 in the UK.

John Abraham distributes responsibility for the Opren scandal widely. He argues that William Shedden at Lilly knew of 29 deaths unreported to the FDA when they approved the drug in April. Abraham argues that Lilly were slow to disclose adverse events, and their marketing department continued producing materials claiming that there was no evidence of jaundice as a side-effect months after Lilly knew of the British cases. But the FDA and CSM were also slow to act. The FDA depended upon pharmaceutical companies to disclose adverse events, and were unequipped to investigate individual cases. Testifying to Congress, John Harter, the FDA’s leading officer for the regulation of NSAIDs, explained that his department was trapped under a four-month backlog of new data, unable to review and verify the reports even by the time the drug was recalled. The Reagan administration pushed deregulation and industry-friendly policies, and the FDA lacked the resources and regulatory powers to do more.

In the UK, Thatcherism had similar effects. Between the election of the Thatcher government in 1979 and the Opren scandal, the CSM lost 20% of its budget in real terms. Proposals to improve its adverse reaction monitoring systems were rejected or indefinitely delayed. (42) It had an annual budget of just over £1 million and only 15 pharmacists on staff. (43) Thatcher’s government had a deregulatory agenda and defunded the CSM heavily, to the point that, by 1990, British regulators were chiefly funded by the pharmaceutical industry. This business-friendly attitude left regulators forced to appeal to pharmaceutical companies’ interests, even promoting themselves as “the fastest licensing authority in the world”. (44) The CSM had certainly been very quick to open the door for Opren, licensing it two years before their American counterparts.

Were Lilly right to give Opren the benefit of the doubt, and the regulators justified in keeping benoxaprofen available until August 1982? Did they withdraw the drug too early or too late? Officials from Lilly, Dista, the CSM and the FDA put forward several arguments for keeping benoxaprofen on the market. They challenged the strength of the evidence from the case reports. In each case, patients had things wrong with them other than rheumatoid arthritis, and were taking medications other than benoxaprofen. The reports were unexpected because no cholestatic jaundice had occurred in the trials. They suspected that causation might have been misattributed. Even if benoxaprofen was responsible, the number of cases reported was quite small as a proportion of the 500,000 patients taking Opren in Britain. They might be comparable to adverse event rates of other well-accepted NSAIDs like aspirin and DMARDs like gold injections. If side-effects were comparable, some liver damage might become accepted as the price of treatment. Lilly’s PR director argued that, despite adverse effects, withdrawing the drug was worse: “You’ve got to consider the case of that elderly person who has been literally crippled by the disease and finally, with Oraflex, found something that worked”. (45) All of which led several Lilly executives to defend their drug even as they removed it from the market, blaming the atmosphere of hysteria.

We cannot tease out the impact the various perceived slights and marketing faux-pas of Lilly and Dista had on the CSM’s decision to suspend the drug’s license. The aggressive marketing strategies had angered doctors and put Opren in the spotlight, sufficiently that when adverse events were reported, it was bound to be big news. As a BMJ editorial put it: “The manufacturers by their over-zealous promotion of the drug may have sown the seeds of its downfall”. (46) Failures to disclose information made the regulators nervous. The drug’s novel mode of action and its unusual side-effects naturally created suspicion of unexpected events. All of these concerns probably contributed to benoxaprofen’s downfall. But did the evidence warrant withdrawal?

Case reports are the lowest ranking form of evidence in most Evidence-Based Medicine hierarchies. They are uncontrolled reports of experiences. Doctors, like everyone, are prone to cognitive biases. They tend to remember unusual cases and underestimate the number of unmemorable cases. So, when assessing treatments, a doctor is likely to remember cases in which the drug appeared to work and patients recovered, and overlook the litany of cases in which it didn’t. Examples of a treatment appearing to work are poor evidence that the treatment works generally. For that reason, EBM is right to put case reports low in their hierarchies with respect to comparative effectiveness.

But as evidence for the claim that an adverse effect is experienced, case reports are much more important. Cases of severe side-effects and deaths, like cases of miraculous recoveries, are memorable and therefore liable to be overemphasized. But when we are not trying to show that a treatment works well for a broad population, but rather working out what the potential side-effects are and who experiences them, anecdotal evidence can be powerful. Of course, it would be misguided to blame benoxaprofen for the deaths if all patients were taking another drug that could have been responsible or suffering another condition that causes cholestasis. But the range of reports rendered those alternative explanations implausible, thus strengthening the power of the case reports considerably. The uniting factor was benoxaprofen alone. One lesson from this case, then, is that sets of case reports can become compelling evidence for a side-effect.

Lilly officials repeated the claim that “No jaundice and no deaths due to hepatic failure were reported in approximately 2200 carefully followed patients who participated in clinical trials in the U.S.”. (47) This claim has since been discredited. Lilly appointed Walter Mikulaschek to monitor clinical outcomes. He quashed the claim that no jaundice or hepatic failure occurred: “it would be wholly incorrect to say … that there were no patients who had jaundice in the U.S. Phase III clinical trials”. (48) Lilly admitted in June 1982 that five non-fatal cases of jaundice and two cases of kidney damage were observed.

But even if the trials had been clean, would they constitute strong evidence that adverse events were unlikely? RCT evidence is given the highest quality rating in EBM hierarchies. But RCTs are not designed to capture adverse effects. The quality rating should only apply, if at all, to RCTs as evidence for a beneficial average effect. The absence of harms in a trial is not strong evidence that harms won’t occur in practice. Clinical trials measure specific outcomes, only collecting data about adverse effects as a side-project. They do not seek out adverse events or process such data systematically. They don’t ask why patients experience certain adverse reactions, which patients experience them, or how to predict them. Because of the emphasis on RCTs as the pinnacle of modern medical evidence, collection of side-effect data is often left to this unsystematic byproduct of another function. Philosopher Adam La Caze notes that most current regulatory systems rely heavily on side-effects found in RCTs to profile the harms of treatments. As he puts it, they “rely on the serendipitous findings of randomized trials set up to test benefit hypotheses”. (49)

Worse still, because RCTs are designed to test benefit hypotheses, they tend to recruit participants who are likely to benefit and less vulnerable to side-effects. As we saw, early phase trials especially are typically conducted in young, otherwise healthy, predominantly male subjects. Older patients, women, minorities and those suffering from other diseases or taking other medications are explicitly excluded from trials. Rheumatoid arthritis research is one of the worst offenders for excluding its target population from its trials. A study by Sokka and Pincus showed that the majority of arthritis patients do not qualify for the majority of clinical trials on arthritis. (50) The arthritic population is older, suffers more comorbidities, takes more drugs, and is more predominantly female, than any trial population. Simply put, relying on trials to find side-effects depended on the side-effects being the same in young, healthy men as in elderly women suffering a range of other conditions. As the next section shows, this was predictably implausible.

Many alternative approaches offer more systematic recording and prediction of harms (and benefits). These include longitudinal studies, in which patients are followed over the long-term to detect latent harms, and outcomes research, in which clinicians record treatments and outcomes in large databases for analysis. Both are inherently observational. In 2000, Gordon Guyatt, one of the foremost proponents of EBM, accepted that, “much of the evidence regarding the harmful effects of our therapies comes from observational studies”. (51) But the EBM movement see this as a “challenge” to Evidence Based Medicine, calling for side-effect data to be put on a higher quality footing through trial data. But side-effects cannot be evaluated as the main target of an RCT. We must use the information available, and build upon it to develop more sophisticated models, databases and methods to meet that challenge. Drawing upon observational data could be a cornerstone of EBM, rather than a sticking point. But to achieve this, the idea that RCTs can answer any question better than other methods must first fall.

The benoxaprofen scandal had positive effects on side-effect reporting. The “yellow card” system in the UK, by which doctors should routinely report adverse effects, was new in Scotland at the time. That system was boosted, and professional, public and political awareness of the importance of monitoring side-effects of new drugs increased. Grahame called the yellow card system: “a welcome innovation and is the kind of measure that could forestall another tragedy.” (52)

Finally, was all of this blown out of proportion amidst hysteria? Was this level of serious side-effects comparable to other treatments? If so, benoxaprofen got a rough deal. If we’d accept a different treatment with the same balance of harms and benefits, then we should accept benoxaprofen too. Abraham thinks that the preponderance of evidence discredits this notion. Lilly executives disagreed. Medical opinion was divided. A Rheumatology editorial described the Opren scandal as a “calamity”, but considered that for a less prominent and successful drug, the adverse effects could be acceptable: “What would otherwise have been a few sporadic serious idiosyncratic reactions assumed alarming proportions, with the result that the CSM felt that the time had come”. (53) The CSM was heavily criticized, and not just by Lilly and Dista. They were condemned for hasty overreaction and for creating panic by going public with their suspension of the drug before communicating the facts to doctors. They sent over 98,000 letters to individual doctors explaining what had occurred, but announced the suspension to the press before the letters could arrive. Getting the word out may have averted further crisis, but it escalated panic and uproar, effectively killing any chance of reintroducing benoxaprofen once the problems were better understood.


The Avoidable Scandal


Whether the scale of harm was proportionate and acceptable given the benefits is hard to say, not least because the level of harm we can tolerate depends on whether benoxaprofen was an NSAID or a DMARD. But the introduction and then the withdrawal of the drug were handled in a way that ensured it would be seen as a scandal and a disaster. Kenneth Clarke tried to represent the events as unforeseeable and unpreventable: “the history of Opren was a tragedy. However, I believe that it was not a scandal.” He dismissed lamentations in the press, saying: “The politest thing one can say is that that is the wisdom of hindsight by people who are not trained to know better”. (54)

But the disaster is made so much worse because it was predictable and avoidable. The role of clinical experience and case reports as evidence of adverse effects has been seen. Let’s turn to the power of mechanistic reasoning and laboratory evidence from pharmacokinetic studies of the properties of drugs, the evidence that allowed the Opren scandal to be predicted twice before it happened.

Two studies were published back-to-back in the European Journal of Rheumatology and Inflammation in 1982. They were presented in June 1981 at the second Benoxaprofen Symposium in Paris, alongside the early benoxaprofen trials. The papers were amongst others unpacking the mechanism of benoxaprofen’s photosensitivity effect, trials, and the paper from Bluhm, Smith and Mikulaschek reporting the potential disease-modifying effect of the drug. These papers were published together as conference proceedings in 1982. (55) They should, then, have been familiar to anyone interested in Opren. Both were pharmacokinetic studies, the branch of pharmacology which studies how a drug moves through the body, being absorbed, distributed, metabolized and excreted.

The long elimination half-life figure, reported back in 1977 in the Phase I studies as 30-35 hours, was the standard estimate of elimination half-life throughout discussions of benoxaprofen. Phase I studies recommended 100mg doses twice daily. Phase II trials had used, but didn’t test or analyze, a 400-600mg daily dosage. When Opren was marketed, the standard dosage was 600mg. This dosage didn’t vary. The drug was marketed with the highest dose used in the clinical trials, despite that dosage exceeding that tested for safety and studied for its pharmacokinetic properties in the Phase I studies.

While a relatively large dose may have concerned the pharmacologists, their real worry was variation. As is typical for arthritis, the Phase I and II trials were conducted on young, otherwise healthy male subjects taking no potentially conflicting medications. In that group, the elimination half-life was 30-35 hours. The pharmacokinetics researchers understood that all sorts of factors would affect the half-life, and with it the tolerable dose.

The first study was led by Ronnie Hamdy of St. John’s Hospital, London. (56) They measured the half-life in two groups of elderly female patients. The first group received the 600mg dose, and the second 300mg. They measured blood levels over 120 hours. Contrary to the estimates of 30-35 hours, they found a mean elimination half-life of 111 hours for the 600mg group and 86.4 hours for the 300mg dose. Half-life ranged from 36 to 120+ hours. Even the shortest half-life amongst elderly female subjects was longer than anything found in the Phase I studies. The longest had half-lives off the scale: they still had more than half of the drug in their bloodstream after 5 days when the experiment ended. Surprised, Hamdy and colleagues studied four more patients with a 600mg dose over 504 hours (21 days). In those patients, the mean half-life was 147.9 hours: over six days. In that time blood plasma levels were “unnecessarily elevated and potentially harmful”. (57)
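
The arithmetic behind these figures can be sketched under one simplifying assumption that is not taken from Hamdy’s paper: that elimination is first-order, so a fixed proportion of the drug is cleared in each unit of time. The function name and the 33-hour value (a midpoint of the 30-35 hour Phase I estimate) are illustrative choices only.

```python
def fraction_remaining(hours: float, half_life_hours: float) -> float:
    """Fraction of a single dose still in the body after `hours`.

    Assumes simple first-order (exponential) elimination, which is an
    illustrative simplification and not a claim about the original studies.
    """
    return 0.5 ** (hours / half_life_hours)


if __name__ == "__main__":
    window = 120  # hours of blood sampling in the first experiment
    examples = [
        ("Phase I estimate (assumed midpoint)", 33.0),
        ("600mg elderly group, mean", 111.0),
        ("21-day follow-up group, mean", 147.9),
    ]
    for label, half_life in examples:
        share = fraction_remaining(window, half_life)
        print(f"{label}: {share:.0%} of a dose left after {window} h")
```

On this assumption, a subject with a 33-hour half-life retains under a tenth of a dose after five days, whereas any patient whose half-life exceeds the 120-hour window still retains more than half of it, which is what “off the scale” amounts to here.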

The second study was by Kamal and Koch. (58) Koch worked with Hamdy on his study, and collaborated with Kamal to study eight elderly patients (mean age 77 years), giving a 600mg daily dose of benoxaprofen for 10 days, following them over a 17-day period. Blood plasma levels of benoxaprofen showed a continuous rise over the 10 days, not the maintenance of a steady-state level observed in Phase I. The mean half-life was 101 hours. They concluded that: “The higher benoxaprofen concentrations and the long elimination half-life show evidence of accumulation in the elderly, probably due to several causes, including poor bowel motility and decreased renal clearance common with increasing age. The recommended dose may require modification in geriatric patients.” (59)
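
Why a longer half-life produces a continuous rise rather than a steady state can be illustrated with a deliberately simple model. The sketch below assumes once-daily dosing, instantaneous absorption and first-order elimination in a single compartment; none of these assumptions, nor the function names, come from Kamal and Koch, and the 33-hour figure is again an assumed midpoint of the Phase I range.

```python
def retained_fraction(half_life_hours: float, interval_hours: float = 24.0) -> float:
    """Share of the drug left over when the next dose is taken."""
    return 0.5 ** (interval_hours / half_life_hours)


def accumulation_ratio(half_life_hours: float, interval_hours: float = 24.0) -> float:
    """Eventual plateau peak relative to the peak after the first dose."""
    return 1.0 / (1.0 - retained_fraction(half_life_hours, interval_hours))


def fraction_of_plateau(half_life_hours: float, doses: int,
                        interval_hours: float = 24.0) -> float:
    """Peak after `doses` doses as a share of the eventual plateau peak.

    The peaks form a geometric series, so the share is 1 - r**doses,
    where r is the fraction retained between doses.
    """
    return 1.0 - retained_fraction(half_life_hours, interval_hours) ** doses


if __name__ == "__main__":
    for label, half_life in [("Phase I (assumed 33 h)", 33.0),
                             ("Kamal and Koch mean (101 h)", 101.0)]:
        ratio = accumulation_ratio(half_life)
        reached = fraction_of_plateau(half_life, doses=10)
        print(f"{label}: plateau is {ratio:.1f}x the first-dose peak; "
              f"{reached:.0%} of plateau reached after 10 daily doses")
```

Under these assumptions, the 33-hour case settles at roughly two and a half times the first-dose peak and is effectively at its plateau within ten days. The 101-hour case heads for a plateau more than six times the first-dose peak and is still only about four-fifths of the way there after ten doses, which is consistent with levels that were still climbing when dosing stopped.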

These studies and their results were known to Lilly and Dista. Indeed, they were funded and commissioned by Lilly and the lab-work for Hamdy’s study was performed in Lilly’s laboratory. Lilly organized the Paris symposium at which each was presented. The results of these two studies, and two others, reached Dista in May 1981. The results were formally communicated to the CSM in October. To Dista’s credit, they asked the CSM to recommend halving the dosage for the elderly. The CSM discussed this proposal in November 1981, but rejected it because the findings conflicted with the previous results. As the Health Minister Kenneth Clarke put it: “the drug appeared to be retained much longer in the bodies of the patients in Basingstoke than in the patients in Indianapolis” (60) (Lilly’s headquarters was in Indianapolis, their UK corporate office in Basingstoke). But clearly it was unlikely that geography, not demography, was the cause.

Ken Clarke’s reasoning for dismissing the Opren scandal as an unpredictable tragedy is that these studies did not demonstrate the link between the vastly elongated elimination half-life and liver and kidney failure. Addressing Parliament, he said: “there was no significant evidence of the real problem—the one that led to its withdrawal—of reactions in the liver and the kidneys”, and, “Dr. Hamdy’s report did not refer to deaths from jaundiced livers and kidneys.” (61) Lord Ashley’s interpretation was very different: “It implied a grave danger to the elderly, yet for a whole year the drug watchdog failed to bark.” Indeed, it took over a year from the Paris symposium for the CSM to accede. In June 1982, they recommended halving the dose for the elderly, eight months after Dista’s request. By that time, it was far too late.

Hamdy’s study, and the others, should not be expected to state that liver and kidney failure would be a consequence of rapid buildup of benoxaprofen. Their studies were not investigating that question. They did not follow patients for long enough to observe downstream effects. Rather, their studies invalidate the evidence-base for an unvarying 600mg dosage recommendation. They were explicit in recommending “adjustment of doses of benoxaprofen to prevent abnormally high plasma concentrations.” (62) Their recommendations were clear and specific: “we recommend that the dose of benoxaprofen be decreased (approximately 50 per cent) or the interval between doses extended (2.5 times)” for elderly patients. (63) For their audience, they hardly needed to explain why. The liver is primarily responsible for the metabolism of benoxaprofen, the kidney for its excretion. They had already noted what any clinician worth their salt knows: elderly patients generally have significantly reduced kidney and/or liver function compared to younger, healthier patients. This principle was now experimentally verified: the drug was not reaching a steady level but accumulating over time. Metabolism and excretion were failing. The results were clear: an “Opren pile up”. (64)
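
How the two recommended adjustments act on drug levels can be sketched with a standard textbook relationship: under first-order elimination, the average steady-state concentration is proportional to dose multiplied by half-life and divided by the dosing interval. The sketch assumes absorption and volume of distribution are unchanged between groups, which the source does not establish; the function name and the 33-hour reference half-life are illustrative.

```python
def relative_average_level(half_life_factor: float,
                           dose_factor: float = 1.0,
                           interval_factor: float = 1.0) -> float:
    """Average steady-state level relative to a reference regimen.

    Each argument is a multiple of the reference value (the Phase I
    half-life, the standard dose, the standard dosing interval).
    Assumes first-order elimination with unchanged absorption and
    volume of distribution: an illustrative simplification.
    """
    return half_life_factor * dose_factor / interval_factor


if __name__ == "__main__":
    # Kamal and Koch's mean half-life against an assumed 33 h reference.
    elderly = 101.0 / 33.0
    regimens = {
        "unchanged regimen": relative_average_level(elderly),
        "dose halved": relative_average_level(elderly, dose_factor=0.5),
        "interval x2.5": relative_average_level(elderly, interval_factor=2.5),
    }
    for label, level in regimens.items():
        print(f"{label}: {level:.2f}x the reference average level")
```

On these assumptions, an unchanged regimen leaves an elderly patient at about three times the reference average level, while either recommended adjustment brings the level much closer to the reference.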

The pile-up was a vicious cycle ending in rapid multiple system failure. As the drug accumulated in the bloodstream, it damaged kidney cells. Kidney damage made excreting the drug slower still, and the concentration spiked further. The blood concentration increased until liver enzymes were saturated, decreasing the liver’s ability to metabolize the drug. Without the liver first metabolizing the drug, the time to excretion lengthened yet further. Ultimately, the liver and kidneys would incur such damage that they were fatally compromised. This cycle explains why the time from jaundice symptoms to death was so short, the decline so rapid, and why taking the patient off the drug once symptoms emerged didn’t avert disaster.

The danger of allowing large drug concentrations to build up in the bloodstream is obvious to any pharmacologist or clinician. Specific predictions of which disastrous consequence would manifest first once benoxaprofen levels went critical were superfluous. Their implied prediction was that toxic, dangerous and unnecessary levels of benoxaprofen accumulating in the systems of elderly patients would have harmful consequences. It was that prediction that meant the Opren tragedy could have been avoided.

In their report, Hamdy’s team are not pessimistic. They do not forecast deaths and scandal. Their interpretation is positive, and not unjustifiably so. The researchers are pleased that the dosage can be slashed in older patients whilst still achieving the same benefits. By using this new information, elderly patients could receive lower, less frequent doses, perhaps decreasing their exposure to side-effects whilst preserving the therapeutic effect. They concluded that “work is needed to determine whether this prolonged half-life is matched by an equally prolonged therapeutic effect.” (65) A prolonged effect from a lower dose was good news. But ignoring this news would “result in unnecessarily elevated and potentially harmful plasma levels”. Their tone doesn’t suggest they thought it at all likely that the new information would be ignored.

Not every predictable disaster is avoidable. Some risks identified beforehand can or even should be taken. This is the reasoning behind the argument that the harms of benoxaprofen were small enough, given the benefits, to accept. But this was not such a case. The Paris symposium reports did not offer a dilemma: take the risk or withdraw the drug. Rather, a review of the treatment protocols to ensure that age and kidney function were incorporated when prescribing would mitigate the risk. A conservative and responsive dosage regimen, alongside further study of blood plasma concentrations, could have avoided the suffering and deaths that followed. It might well, too, have prevented benoxaprofen being withdrawn and the disputed potential of this intriguing treatment being squandered.

Dista asked to do this. The CSM refused. In September 1981, Dista took the unorthodox step of circulating a booklet to doctors which included the reports from the Paris symposium, without the CSM’s authorization. Unfortunately, without a change to official datasheets, there was little chance of Dista influencing prescribing behavior: “experience suggests that most doctors would have thought the booklet to be just more advertising for Opren and thrown it straight into their wastepaper basket”. (66) When change came, the adverse reactions and deaths had accumulated to a point that a calamitous drug withdrawal was no longer avoidable.




With documents unreleased, it may never be clear why the CSM refused to change Opren’s dosage. It is even harder to understand why the CSM postponed their decision to follow the recommendations of the Paris symposium researchers for so long, waiting over a year to update the guidance. As reports of liver and kidney failure accumulated, they served as startling first-hand confirmation of Hamdy, Kamal and Koch’s warnings. Alone, the deaths could be explained away for a while. But coupled with a clear explanatory biological mechanism, twice experimentally verified, leading to two predictions of toxic accumulation and anticipatable effects on the liver and kidneys, those deaths provided empirical verification of an already highly plausible pharmacological mechanism. It would have been easy and largely risk-free to act on that prediction at the start. The motivation was even clearer once reports of deaths began. This is another example of the interaction of evidence: the case reports solidify the mechanistic reasoning and pharmacological evidence. They support and enhance one another to the point that the evidence is compelling. They unite to answer a set of questions about the ideal dosage, side-effect profile, and how the dose and side-effects vary with age and health, that were unaddressed by clinical trials.

Pharmacological evidence like the studies produced by Hamdy et al. and Kamal & Koch is rated at the lowest quality levels by hierarchies of evidence. Showing how the drug moves through a system, how it is metabolized and the reactions it has, is not judged by EBM to have any evidential power. Again, this is due to fixation with the question: “What is the average treatment effect?” But when considering how effects vary across populations, this data and these studies are crucial. The information is indispensable in deciding the dosage level for different groups, tailoring dosage to patients, and reducing the risk of severe adverse reactions.

The benoxaprofen scandal could have been avoided at three points. First, mechanistic reasoning and clinical understanding would tell a thoughtful practitioner that most elderly patients have decreased renal function relative to the patients from the clinical trials. This knowledge alone would probably suffice to justify stratifying the dose by age and kidney health until further information was available. However, this reasoning is given no evidential weight in current models. Indeed, varying away from the dosage as tested in clinical trials conflicts with the approach favoured by EBM. Second, pharmacokinetic studies confirmed that mechanistic reasoning, showing the reality of benoxaprofen retention and accumulation well beyond clinically acceptable thresholds. These studies are also relegated to the lowest levels of evidence. Third, case reports demonstrating the anticipatable harms accruing from this over-accumulation bore out theory and pharmacology in practice. These reports, too, occupy the bottom rung of hierarchies. The combined result of lowest-ranked evidence can be entirely compelling. Individually, they can answer questions which clinical trial evidence was never able to touch. Finally, without this data, and so long as this information and evidence remains marginalized and undervalued, we lose the ability to predict and avoid scandals and disasters like benoxaprofen: double disasters in which both lives and viable drugs are lost.

The EBM model of evidence does not prevent drug disasters like benoxaprofen recurring. Rather, in order to predict and avoid such disasters, clinicians and policymakers have to go against the recommendations of hierarchies of evidence, turning to marginalized evidence. Hierarchical approaches to evidence do not allow them to recognize the ways in which evidence sources interact, strengthening or undermining each other. They fail to allow for a complex picture of treatment effects and predictable variation between patients, instead focusing only on a single clinical question about the average treatment effect. Prioritizing this question flattens the landscape of variation and reduces complex pharmacology to a single headline figure, which can mislead, with serious consequences.


  • Abraham, John. 1994. Distributing the Benefit of the Doubt: Scientists, Regulators, and Drug Safety. Science, Technology, & Human Values 19: 493–522. doi:10.1177/016224399401900404.
  • Australian National Health and Medical Research Council (ANHMRC). 1999. A Guide to the Development, Implementation and Evaluation of Clinical Practice Guidelines. Commonwealth of Australia: available at:, accessed 01/04/15.
  • Balshem, H., M. Helfand, H. J. Schunemann, A. D. Oxman, R. Kunz, J. Brozek, G. E. Vist, et al. 2011. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol 64: 401–6.
  • Bluhm, G B, D W Smith, and W M Mikulaschek. 1982. Radiologic assessment of benoxaprofen therapy in rheumatoid arthritis. European journal of rheumatology and inflammation 5: 186–197.
  • British Medical Journal Editorial. 1980. A plea for clinical epidemiology. Br Med J 281: 1163–1163.
  • British Medical Journal Editorial. 1982. Benoxaprofen. Br Med J (Clin Res Ed) 285: 459–460.
  • Canadian Task Force on the Periodic Health Examination. 1979. The Periodic Health Examination. CMAJ 121: 1193–252.
  • Caplan, Bernard. 1980. Newspaper reports of new drugs. Br Med J 281: 1493–1493.
  • Chatfield, David H., and John N. Green. 1978. Disposition and Metabolism of Benoxaprofen in Laboratory Animals and Man. Xenobiotica 8: 133–144.
  • Daly, J. 2005. Evidence-based medicine and the search for a science of clinical care. California/Milbank Books on Health and the Public 12. Berkeley, CA ; London: University of California Press.
  • Goudie, B M, G F Birnie, G Watkinson, R N M MacSween, L Kissen, and N Cunningham. 1982. Jaundice associated with the use of benoxaprofen. Lancet 319: 959.
  • Grahame, Rodney. 1982. The Rise and Fall of Benoxaprofen. Rheumatology 21: 191–193.
  • Gum, O B. 1980. Long-term efficacy and safety of benoxaprofen: comparison with aspirin and ibuprofen in patients with active rheumatoid arthritis. The Journal of rheumatology. Supplement 6: 76–88.
  • Guyatt, G. H., R. B. Haynes, R. Z. Jaeschke, D. J. Cook, L. Green, C. D. Naylor, M. C. Wilson, and W. S. Richardson. 2000. Users’ Guides to the Medical Literature: XXV. Evidence-based medicine: principles for applying the Users’ Guides to patient care. Evidence-Based Medicine Working Group. JAMA 284: 1290–6.
  • Guyatt, G.H., and D. Rennie. 2008. The philosophy of evidence-based medicine. In Users’ Guides to the Medical Literature, ed. G. H. Guyatt and D. Rennie, 9–16. New York: McGraw Hill Medical.
  • Hamdy, R. C., B. Murnane, N. Perera, K. Woodcock, and I. M. Koch. 1982. The pharmacokinetics of benoxaprofen in elderly subjects. European Journal of Rheumatology and Inflammation 5: 69–75.
  • House of Commons Debate. 1983. Hansard (27th January 1983) vol.35, cc1098-122.
  • House of Commons Debate. 1988. Hansard (17th March 1988) vol.129 cc1319-26.
  • Howick, J., I. Chalmers, P. Glasziou, and T. Greenhalgh. 2011. The Oxford 2011 Levels of Evidence. Oxford Centre for Evidence-Based Medicine, available at accessed 01/04/15.
  • Kamal, A., and I.M. Koch. 1982. Pharmacokinetic Studies of Benoxaprofen in Geriatric Patients. European Journal of Rheumatology and Inflammation 5: 76–81.
  • La Caze, A. 2009. Evidence-Based Medicine Must Be… Journal of Medicine and Philosophy 34: 509–527.
  • Lesser, Frank. 1983. Drugs monitor needs sharper teeth. New Scientist, March 17.
  • Lopez-Olivo, Maria Angeles, Harish R Siddhanamatha, Beverley Shea, Peter Tugwell, George A Wells, and Maria E Suarez-Almazor. 2014. Methotrexate for treating rheumatoid arthritis. In Cochrane Database of Systematic Reviews. John Wiley & Sons, Ltd.
  • Lueck, Thomas J. 1982. At Lilly, the Side-Effects of Oraflex. The New York Times, August 15.
  • McConkey, B. 1980. Newspaper reports of new drugs. British Medical Journal 281: 1564–1564.
  • Mikulaschek, W M. 1980. Long-term safety of benoxaprofen. The Journal of rheumatology. Supplement 6: 100–107.
  • Mikulaschek, W M. 1982. An Update on Long-Term Efficacy and Safety With Benoxaprofen. European Journal of Rheumatology and Inflammation 5: 206–215.
  • Smith, G L, R A Goulbourn, R A Burt, and D H Chatfield. 1977. Preliminary studies of absorption and excretion of benoxaprofen in man. British Journal of Clinical Pharmacology 4: 585–590.
  • Sokka, Tuulikki, and Theodore Pincus. 2003. Most patients receiving routine care for rheumatoid arthritis in 2001 did not meet inclusion criteria for most recent clinical trials or American College of Rheumatology criteria for remission. J Rheumatol 30: 1138–1146.
  • Straus, S. E. 2011. Evidence-based medicine: how to practice and teach it. 4th ed. Edinburgh: Elsevier Churchill Livingstone.
  • Taggart, H M, and J M Alderdice. 1982. Fatal cholestatic jaundice in elderly patients taking benoxaprofen. British Medical Journal (Clinical research ed.) 284: 1372.
  • Vos, T, C Allen, M Arora, R M Barber, Z A Bhutta, A Brown, and A Carter. 2016. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet (London, England) 388: 1545–1602.