How Should the Distinction Between Internal and External Validity (with respect to causal claims) Inform Our Interpretation of the Evidence Hierarchy that Characterizes the Evidence Based Medicine/Policy Movement?

In what follows, I distinguish internal from external validity.  I present the evidence hierarchy and argue that it holds for internal validity because RCTs minimize confounding variables whilst observational studies cannot.  Following Roush and La Caze, I contend that RCTs do not meet the challenge of external validity: they do not reliably produce results that transfer from the test population to the target population.  I claim that observational studies have greater external validity than RCTs, and so the evidence hierarchy does not hold when the focus is external validity.

Internal validity concerns the study itself, whilst external validity concerns whether the results of the study can be generalized.  Generalizing is important because the population on which a drug is tested often does not consist of the people the drug company actually wants to treat (the target population).  Good results should be extrapolable: they should apply to ‘real’ populations that are not controlled by a study.  To highlight the distinction, imagine a study that measures the effectiveness of a particular treatment for some disease.  Internal validity asks whether the probabilistic correlation between treatment and effect in the test population reflects a genuine causal relation.  It refers to the degree to which the results of a study accurately reflect the effects of the intervention on the participants in the study.  Studies that are subject to more sources of potential bias, or to biases of greater magnitude, have lower internal validity.  External validity concerns whether the causal relation between treatment and effect also holds in the target population, that is, whether we can expect the average result observed in the trial to accurately predict the average response in the target population.  It refers to the degree to which the results of a study can be extended to individuals not involved in the study.

The evidence hierarchy ranks the methods of conducting a study by the quality of the evidence they are taken to produce.  It suggests that evidence gained from randomized controlled trials (RCTs) is of the highest quality.  Second in the hierarchy is evidence gained from observational studies, such as comparison (case-control) studies, which take patients who already have a disease or other condition and look back to see whether any of their characteristics differ from those of people who do not have it, and cohort studies, in which the cohort is identified before the onset of the disease.  Finally, evidence gained from core science or expert opinion is considered to be of the lowest quality with respect to generating scientific knowledge.  Thus, the evidence hierarchy is as follows, in decreasing order of evidential quality:

1. Randomized Controlled Trials

2. Observational Studies

3. Core Science/Expert Opinion

On first look the hierarchy appears to stand up.  RCTs require participants to be randomly assigned to experimental and control groups.  Ideally, neither the scientists nor the participants know which group each participant is in (double blind).  It is essential that causal factors be distributed equally between the experimental and control groups, or else the results may be confounded; the two groups should have equal proportions of young and old, exercisers and non-exercisers, and so on.  The control group is given a placebo; its purpose is to let the scientists eliminate external factors as causes of the participants’ recovery.  If the control group gets better at a rate similar to that of the experimental group, then it is likely that some outside factor is responsible.  Randomization for the most part ensures that the observed changes in the experimental group can be attributed to the treatment being tested and not to some other possible cause.  It minimizes the chance that there is an alternative explanation for the outcome.  In this respect RCTs have high internal validity and hence produce evidence of higher quality, which justifies their dominance in scientific enquiry and their position at the top of the evidence hierarchy.  Observational studies do not involve randomization, and so they cannot rule out that some other cause, correlated with the treatment in question, explains the results; that is, they cannot eliminate alternative explanations for certain.  Nevertheless, observational studies can indicate causal relations, which can subsequently be tested with an RCT.  Observational studies can be internally valid, and they are internally valid more often than expert opinion has proven to be.  With respect to internal validity, then, the evidence hierarchy is acceptable.
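
The way randomization guards against confounding can be illustrated with a small simulation.  This is only a minimal sketch under invented assumptions, not data from any study: the recovery rates, the assumed treatment effect (TRUE_EFFECT) and the tendency of exercisers to choose the treatment are all hypothetical numbers picked for illustration.

```python
import random

# Hypothetical assumption: the treatment raises the chance of recovery by 10 points.
TRUE_EFFECT = 0.10


def recovery_probability(exerciser, treated):
    """Chance of recovery; exercisers recover more often regardless of treatment."""
    base = 0.60 if exerciser else 0.30  # invented baseline rates
    return base + (TRUE_EFFECT if treated else 0.0)


def estimated_effect(n, randomized, rng):
    """Difference in recovery rates between treated and untreated participants."""
    treated_outcomes = []
    control_outcomes = []
    for _ in range(n):
        exerciser = rng.random() < 0.5
        if randomized:
            # RCT: a coin flip decides who is treated, independently of exercise.
            treated = rng.random() < 0.5
        else:
            # Unrandomized: exercisers are assumed more likely to take the treatment.
            treated = rng.random() < (0.8 if exerciser else 0.2)
        recovered = rng.random() < recovery_probability(exerciser, treated)
        if treated:
            treated_outcomes.append(recovered)
        else:
            control_outcomes.append(recovered)
    treated_rate = sum(treated_outcomes) / len(treated_outcomes)
    control_rate = sum(control_outcomes) / len(control_outcomes)
    return treated_rate - control_rate


rng = random.Random(0)  # fixed seed so the run is repeatable
rct_estimate = estimated_effect(50_000, True, rng)
observational_estimate = estimated_effect(50_000, False, rng)
print(f"randomized estimate:    {rct_estimate:.3f}")
print(f"observational estimate: {observational_estimate:.3f}")
```

Under these assumptions the randomized comparison should land close to the assumed 0.10, whilst the unrandomized comparison should come out well above it (about 0.28 in expectation), because exercisers are over-represented among the treated.  The inflated figure is the kind of alternative explanation that randomization is meant to remove.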

In her article ‘Randomized controlled trials and the flow of information: comment on Cartwright’, Roush identifies the problem of transferability: how far the results of a study transfer to a population that is different from the one studied.  In the different population, procedures and conditions may not be the same as those in the study.  For instance, participants in a study may take their medication in a doctor’s office under the supervision of a nurse, whereas people in the target population take the medication at home and so may take it in a different way, perhaps with alcohol, which they would not do in the study.  As a result the medication may work in a different way, not work at all, or even have adverse effects.  Whilst this is a problem for all studies, it is particularly acute for RCTs, because the manipulations necessary for randomization ensure that settings and populations differ from the target populations and situations.  The populations of observational studies can largely be expected to be closer to target populations than those of RCTs.  La Caze calls this problem of transferability ‘the challenge of external validity’.  The challenge is to bridge the gap between “having good reason to expect the accuracy of the observed results of an experiment and having good reason to expect these results to appropriately generalize to an individual” (La Caze p.108).  It would seem that RCTs do not answer this challenge as convincingly as observational studies, which calls into question the soundness of the evidence hierarchy.

RCTs do not show whether outside factors influence the way the test treatment affects the population, or to what extent.  To be more precise, the control group is designed to account for these outside factors so that they do not distort the results, but there is no way of knowing whether the treatment only works when one of these causal factors is present.  The scientists conducting the study have no way of determining what the outside cause is or whether it is necessary for the treatment to work.  Consider, for example, the Californian class-size reduction program.  The evidence gained from conducting an RCT on Tennessee schools showed that reduced class sizes improved the students’ reading scores.  This evidence was used to support a statewide program of reducing class sizes in California.  However, the Californian students’ reading scores did not improve.  The RCT was internally valid; the results were consistent in Tennessee.  The problem was that the conditions in California schools were not the same as those in Tennessee schools, and so the cause (reduced class size) could not have its effect (higher reading scores).  It may have been that Tennessee schools had better qualified teachers or access to better books, which worked in conjunction with smaller classes to bring about higher reading scores.  The RCT does not identify these additional causes because they were not directly under investigation.  It does not tell us whether the cause produces its effect only under particular conditions, or is conditional on certain factors.  At most, the RCT shows us that the probability of higher reading scores (effect) given small classes (cause) and good teachers (a factor necessary for the cause to do its causal work, F) is higher than the probability of higher reading scores given good teachers (F) alone.  It says nothing about what small classes do where F is absent.

The RCT fails to inform us whether our test situation (reducing class sizes) is “sufficient for the degree of effect seen” (Roush p.141).  Good teachers and access to good books are each an insufficient but non-redundant part of an unnecessary but sufficient condition (INUS conditions); we do not get the effect if either is missing.  An observational study has a greater chance of identifying INUS conditions.  Observing the conditions in California schools and Tennessee schools ought to bring further differences to light.  Observational studies can uncover interacting factors, and the knowledge obtained from them directs what RCTs investigate.  Thus, observational studies tend to have higher external validity than RCTs.  To reiterate, observational studies have higher external validity because they are able to identify trends and potential causal factors, and to indicate which variables, out of an indefinitely large number, are most likely to bring about the desired effect.
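
The probabilistic point in the class-size example can be written out explicitly.  What follows is only a sketch in notation introduced here for illustration, not Roush's own formalism: E stands for higher reading scores, C for small classes, and F for the background factor (for instance, good teachers).  Treating each as a simple yes/no variable, and supposing that F was present in Tennessee and absent in California, are simplifying assumptions.

```latex
% What the Tennessee RCT supports, supposing F held throughout the test population:
P(E \mid C \wedge F) > P(E \mid F)

% What the California policy needed, supposing F was absent there,
% and which the RCT does not establish:
P(E \mid C \wedge \neg F) > P(E \mid \neg F)
```

The first inequality can be true whilst the second is false, which is one way of reading why an internally valid result failed to transfer.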

The finding that RCTs have lower external validity than observational studies contradicts the evidence hierarchy, which places RCTs above observational studies as producing evidence of superior quality.  It does not seem acceptable to have evidence that supports the existence of a causal relation that only works in the particular test situation.  Generalization is important, and it is the purpose of generating knowledge from RCTs, as with any other trial.  With respect to external validity, it might be more appropriate to turn the evidence hierarchy on its head.  On the other hand, it was shown above that the evidence hierarchy is indicative of internal validity.  RCTs have stronger internal validity than observational studies, and there seems little point in having evidence we could generalize if it is full of inconsistencies and confounding variables.  When we distinguish internal and external validity regarding causal claims, like the ones investigated by RCTs and observational studies, we are forced to conclude that the evidence hierarchy, which lists RCTs as producing better quality evidence than observational studies, holds only with respect to internal validity.  If our focus were on external validity, it would be more fitting to rearrange the evidence hierarchy so that observational studies come first.

To conclude, internal validity concerns the study itself and whether the relation we are testing is a genuine causal relation.  External validity concerns whether we can extend the results of the study to individuals not involved in the trial.  The evidence hierarchy, in descending order of evidence quality, is RCTs, observational studies, and core science or expert opinion.  An RCT has its participants randomly assigned to either an experimental group or a control group (which takes a placebo instead of the test treatment); it is double blind to minimize bias.  Randomizing minimizes confounding variables; observational studies are not randomized and so may contain confounding variables.  For this reason RCTs have greater internal validity than observational studies.  The challenge of external validity is to resolve the problem of transferability.  RCTs do not identify interacting factors and INUS conditions, whilst observational studies can.  RCT populations may not accurately reflect target populations; the populations in observational studies are closer to target populations, and their settings are often closer to the situations people in the target population are likely to find themselves in, for instance when taking the medication.  For these reasons observational studies have greater external validity than RCTs.  The evidence hierarchy reflects evidence quality when only internal validity is considered.  When we consider external validity the evidence hierarchy looks more like:

1. Observational Studies

2. Randomized Controlled Trials

There does not appear to be any way of reconciling these two interpretations.  It seems to me that both RCTs and observational studies play vital roles in establishing the truth of causal claims.  Observational studies identify potential causal relations and interacting factors and RCTs confirm these claims.  One without the other leaves us with only half the picture.  Scientists should recognize the value in each method and use them in conjunction in their investigations.



• La Caze (2008). Chapter 5, ‘The Challenge of External Validity’, in Evidence-based Medicine: Evolution, Revolution or Illusion?

• Roush, S. (2009). ‘Randomized controlled trials and the flow of information: comment on Cartwright’. Philosophical Studies.

• Steele, K. (2009). Lectures ‘Internal Validity’ and ‘External Validity & Evidence Based Policy’. LSE.