Strength of Recommendation hierarchies such as SORT and GRADE go a step further than a standard hierarchy of evidence. Standard hierarchies tend to rank or rate the evidence provided by a study on a scale of quality, strength or validity. Strength of Recommendation hierarchies are usually two-step processes. First, they perform a traditional ranking or rating. Then they translate that rating of evidence quality into a strength of recommendation rating: the higher the quality of evidence, the more strongly a clinician can or should recommend for (or against) the treatment on that basis. However, a strong warning against this inference has long been available in the philosophy of science.
As Richard Rudner demonstrated in 1953, evidence alone is not sufficient to determine when or how strongly a recommendation should be made. Two factors determine a recommendation: the confidence one has in the treatment's effects, and the threshold of confidence at or above which one is willing to make the recommendation. The evidence does not determine this recommendation threshold. Indeed, the threshold does and should vary. Consider: how confident should a doctor be before recommending an aspirin for a headache? How confident should she be before recommending exercise and a balanced diet? How confident should the same doctor be before recommending a leg amputation? Or total body irradiation? Clearly, the doctor must be much more confident in her evidence for the effects (and the necessity) of the leg amputation or the irradiation than she must be for the aspirin or the exercise and diet. There are situations in which, given the exact same strength and quality of evidence for the effectiveness of two different treatments, the clinician would recommend one but not the other. Beyond that, there are cases in which a weaker evidence base could license a strong recommendation even where a stronger one couldn’t (weak evidence that aspirin is good for a headache could license a strong recommendation, but a stronger evidence base for leg amputation might not).
What determines the recommendation threshold? According to Rudner, the consequences of error. The graver the consequences of recommending wrongly, the more confident the clinician must be before making her recommendation. That squares with the intuitions in the four cases above. Where Strength of Recommendation tables translate evidence quality directly into strength of recommendation, they leave the consequences of error out of the picture, and we’re risking some grave consequences.
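To make the point concrete, here is a minimal sketch in Python. It is not Rudner's own formalism: it assumes a simple expected-loss rule, under which a treatment is recommended only when the confidence that it works reaches harm / (harm + benefit). The function names and every number below are hypothetical, chosen only to illustrate the argument.

```python
def threshold(harm_if_wrong: float, benefit_if_right: float) -> float:
    """Confidence needed before recommending (assumed expected-loss rule).

    Recommending pays off when p * benefit >= (1 - p) * harm,
    which rearranges to p >= harm / (harm + benefit).
    """
    return harm_if_wrong / (harm_if_wrong + benefit_if_right)


def recommend(confidence: float, harm_if_wrong: float, benefit_if_right: float) -> bool:
    """Recommend only if confidence meets the consequence-dependent threshold."""
    return confidence >= threshold(harm_if_wrong, benefit_if_right)


# Hypothetical figures on an arbitrary scale; none come from real studies.
# Aspirin for a headache: modest confidence, little harm if we are wrong.
aspirin = recommend(confidence=0.6, harm_if_wrong=1, benefit_if_right=4)

# Leg amputation: higher confidence, but grave harm if we are wrong.
amputation = recommend(confidence=0.9, harm_if_wrong=95, benefit_if_right=5)

print("aspirin threshold:", threshold(1, 4), "recommend:", aspirin)
print("amputation threshold:", threshold(95, 5), "recommend:", amputation)
```

With these made-up figures, the weaker evidence for aspirin clears its low threshold while the stronger evidence for amputation falls short of its high one. A table that reads recommendation strength straight off evidence quality cannot represent that difference.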