Measurement of Outcomes
Posted by Surgery on Oct 26, 2008
There are many ways to measure the outcome of an intervention. A complete discussion of available measurement techniques and their differential uses and value is available elsewhere (71). This section briefly highlights the concepts most useful to the practicing pediatric surgeon in interpreting the reported outcomes of published studies.
When evaluating a study, it is important to identify the primary outcome variable, which addresses the main hypothesis of the study. Results from the analysis of variables outside the primary outcome measure must be viewed with caution. This is particularly true when outcome variables are identified for analysis after completion of a study (post-hoc analysis). For example, suppose a new technique is devised to reduce the incidence of stricture in esophageal atresia repair. The study shows that the stricture rate in the experimental group is the same as in controls. The investigators then compare 20 other outcome variables between the two groups, such as length of stay, incidence of gastroesophageal reflux, and rate of vocal cord paralysis, among others. They find that the incidence of intracranial hemorrhage is significantly lower in the experimental group and report a p value of 0.05.
If multiple comparisons are not adequately addressed in the statistical analysis of results, there is a substantial probability that this difference is due to chance rather than a real effect of the repair technique on intracranial bleeding (a p value of 0.05 represents a 1 in 20 chance of finding such a difference by chance alone, so among 20 comparisons one spurious “significant” result is to be expected). Multiple hypothesis testing, or “sifting the data,” without appropriate statistical adjustment is seen frequently in the pediatric surgery literature and should be identified as such. Subgroup analysis is a powerful tool in clinical research, but the results of such analyses must be carefully interpreted (72).
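The arithmetic behind the multiple-comparisons problem can be sketched in a few lines. The example below is illustrative only: it assumes the 20 comparisons are independent, and it uses the Bonferroni correction as one common adjustment (the text does not say which adjustment the authors have in mind). The function names are hypothetical.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false-positive result among
    num_tests independent tests when no true difference exists."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the overall
    false-positive rate at or below alpha (Bonferroni correction)."""
    return alpha / num_tests


# 20 secondary outcomes, each tested at the conventional 0.05 level
fwer = familywise_error_rate(20)
threshold = bonferroni_threshold(20)

print(f"Chance of at least one spurious finding: {fwer:.2f}")
print(f"Bonferroni-adjusted threshold per test: {threshold:.4f}")
```

Under these assumptions, a p value of 0.05 for intracranial hemorrhage would not meet the adjusted threshold and would be treated as hypothesis-generating rather than confirmatory.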
Once the outcome variable is identified, one must determine whether it is objective or subjective. Objective variables such as mortality or anastomotic leak rate are less subject to evaluator bias than subjective variables such as pain or time to recovery. When objective variables are used, blinding of the evaluators of the trial is typically not necessary. We can all agree when someone is dead or alive. When subjective variables are used, the situation is far different. If, for example, we want to compare postoperative pain between two procedures, the evaluator who determines the degree of pain must be blinded to which procedure the patient received. This also applies to studies examining the length of hospital stay. Unless the physician determining the length of stay is blinded to which procedure the patient received, the results will inevitably be biased. For example, in the United Kingdom, two well-designed randomized controlled clinical trials showed that laparoscopic cholecystectomy reduced length of hospital stay (73,74). In a third trial, identical dressings were placed on patients in both groups, and surgeons providing postoperative care did not know which procedure the patient had undergone. This trial showed no difference in length of hospital stay between the two procedures (75).
Commonly used surgical outcomes such as mortality, anastomotic leak, or recurrence are typically reported as event rates. Defining the following terms is critical to accurate interpretation of study data:
- Event rate (ER): the probability that an outcome will occur in a defined population (e.g., the rate of abscess formation after treatment for perforated appendicitis)
- Control event rate (CER): the probability that the outcome will occur in the population treated with standard therapy (e.g., the rate of abscess formation after treatment of perforated appendicitis by open appendectomy)
- Experimental event rate (EER): the probability that the outcome will occur in the population treated with the experimental therapy (e.g., the rate of abscess formation after treatment of perforated appendicitis by laparoscopic appendectomy)
Treatment effects are typically described in terms of the relative risk reduction (RRR) or relative risk increase (RRI). The RRR is the proportion by which the risk of the adverse event is reduced by the experimental treatment: RRR = (CER - EER) / CER. The absolute risk reduction (ARR) is simply the absolute amount by which the risk of the adverse event is reduced by the experimental treatment: ARR = CER - EER. A more clinically useful parameter is the number needed to treat (NNT). This is the number of patients who must receive the experimental treatment to prevent one occurrence of the adverse event: NNT = 1 / ARR. Table 5-4 illustrates these terms using hypothetical results from a fictitious study examining the rate of abscess formation following open versus laparoscopic appendectomy.
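The relationships among these terms can be made concrete with a short calculation. This is a minimal sketch, not a reproduction of Table 5-4: the event rates used (20.2% for open and 15.9% for laparoscopic appendectomy) are assumptions chosen to be consistent with the 20.2% abscess rate and 21% relative risk reduction quoted in the text, and the names `TreatmentEffect` and `treatment_effect` are hypothetical.

```python
from typing import NamedTuple


class TreatmentEffect(NamedTuple):
    arr: float  # absolute risk reduction, CER - EER
    rrr: float  # relative risk reduction, (CER - EER) / CER
    nnt: float  # number needed to treat, 1 / ARR


def treatment_effect(cer: float, eer: float) -> TreatmentEffect:
    """Summarize a treatment effect from the control event rate (CER)
    and experimental event rate (EER), both given as proportions."""
    if not (0.0 < cer <= 1.0) or not (0.0 <= eer <= 1.0):
        raise ValueError("event rates must be proportions, with CER > 0")
    arr = cer - eer
    rrr = arr / cer
    # With no absolute difference, no number of patients prevents an event.
    nnt = float("inf") if arr == 0 else 1.0 / arr
    return TreatmentEffect(arr=arr, rrr=rrr, nnt=nnt)


# Assumed rates: 20.2% abscess rate (open) vs 15.9% (laparoscopic)
effect = treatment_effect(cer=0.202, eer=0.159)
print(f"ARR = {effect.arr:.1%}, RRR = {effect.rrr:.0%}, NNT = {effect.nnt:.1f}")
```

A negative ARR from this function indicates that the experimental treatment increased risk, in which case the corresponding quantities are a relative risk increase and a number needed to harm.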
In interpreting these results, it is critical to understand two limitations of reports of relative risk reduction. First, the actual impact of the therapy depends entirely on the rate of the event in the population. In other words, a given relative risk reduction is clinically more important for common events than for rare ones. The example in Table 5-5 illustrates the point.
If the abscess rate following appendicitis were about 0.02% (a proportion of 0.0002) rather than 20.2%, but the relative risk reduction of the treatment were the same (21%), the practicing surgeon would have to perform laparoscopic appendectomy in 23,256 patients to prevent one abscess.
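The dependence of the number needed to treat on the baseline event rate can be checked directly. The sketch below is illustrative: the common-event rates (20.2% and 15.9%) are assumptions consistent with the figures quoted in the text, and the rare-event rates are assumed to be those same rates divided by 1,000, which leaves the relative risk reduction unchanged and reproduces the figure of roughly 23,256 patients. The function name is hypothetical.

```python
def number_needed_to_treat(cer: float, eer: float) -> float:
    """Number needed to treat, 1 / (CER - EER), for proportions CER > EER."""
    arr = cer - eer
    if arr <= 0:
        raise ValueError("the experimental treatment must reduce the event rate")
    return 1.0 / arr


# Common event: assumed abscess rates of 20.2% (control) and 15.9% (experimental)
common_cer, common_eer = 0.202, 0.159

# Rare event: the same rates divided by 1,000, so the RRR is identical
rare_cer, rare_eer = common_cer / 1000, common_eer / 1000

for label, cer, eer in [("common", common_cer, common_eer),
                        ("rare", rare_cer, rare_eer)]:
    rrr = (cer - eer) / cer
    nnt = number_needed_to_treat(cer, eer)
    print(f"{label:>6}: RRR = {rrr:.0%}, NNT = {nnt:,.0f}")
```

The relative risk reduction is the same in both rows, while the number needed to treat grows a thousandfold for the rare event.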
The second major limitation of most reports of relative risk reduction is that they do not include any indication of the precision of the measurement. In our hypothetical example, a 21% reduction in the risk of developing an abscess was found when the laparoscopic technique was used. However, how certain can we be that this is an accurate estimate of its efficacy over the traditional approach? The precision of this measurement is a function of the study’s sample size and the variance of its component data. Reporting results with confidence intervals is therefore more informative. A 95% confidence interval is the range of values within which the true value will lie 95% of the time. (The actual statistical definition is more obscure, but this definition is useful to the clinician.) A p value, by contrast, is simply a measure of the strength of evidence against the null hypothesis of no difference between study groups. A p value tells us nothing about the magnitude of a statistically significant difference.
Using another hypothetical example to illustrate this point, a large study of laparoscopic versus open appendectomy might report a 95% confidence interval for the relative risk reduction of abscess of 14% to 29%. A smaller study might report a corresponding confidence interval of 2% to 170%. The findings of the smaller study are statistically significant and might suggest a much greater effect on the reduction of abscess formation than the larger study. However, the lack of precision in the smaller study brings into question the clinical usefulness of its estimate. The ability to judge the strength of results is therefore compromised in studies that report only p values and point estimates of risk reduction.
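The effect of sample size on precision can be sketched with a standard large-sample method: a 95% confidence interval for the relative risk computed on the log scale, then converted to a relative risk reduction. The counts below are hypothetical and are not the studies described above; they assume similar event rates in a trial of 2,000 patients per arm and a trial of 100 patients per arm. The function name is also hypothetical.

```python
import math


def rrr_confidence_interval(events_exp: int, n_exp: int,
                            events_ctl: int, n_ctl: int,
                            z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for the relative risk reduction,
    using the normal approximation to the log of the relative risk."""
    rr = (events_exp / n_exp) / (events_ctl / n_ctl)
    # Standard error of ln(RR) for two independent binomial samples
    se = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctl - 1 / n_ctl)
    rr_low = math.exp(math.log(rr) - z * se)
    rr_high = math.exp(math.log(rr) + z * se)
    # RRR = 1 - RR, so the bounds swap places
    return 1 - rr_high, 1 - rr_low


# Hypothetical counts: (experimental events, n, control events, n)
studies = {
    "large": (318, 2000, 404, 2000),
    "small": (16, 100, 20, 100),
}

for name, counts in studies.items():
    low, high = rrr_confidence_interval(*counts)
    print(f"{name}: 95% CI for RRR = {low:.0%} to {high:.0%}")
```

With these assumed counts, the two trials have similar point estimates, but the interval from the smaller trial is several times wider and includes zero, so that trial alone could not exclude the possibility of no benefit.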


