Reliability and Validity in Qualitative Studies, by Marilyn Simon and Jim Goes

Discussions about reliability and validity are ubiquitous in quantitative research, but these essential elements of confidence in the research often receive less attention and scrutiny in qualitative studies.

The pretest-posttest control group design (R = random assignment, O = observation or test, X = treatment):

    R  O1  X  O2
    R  O3     O4

This design controls for all seven threats to validity described in detail so far. How it controls for each threat is explained below.

History: this is controlled because general historical events that might produce a difference between O1 and O2 would also produce a difference between O3 and O4. This holds only if the experiment is run in a specific manner: the treatment and control groups may not be tested at different times or in very different settings, because such differences may affect the results.

Rather, the control and experimental groups must be tested simultaneously. Intrasession history must also be considered. For example, if the groups truly are run simultaneously, different experimenters must be involved, and differences between the experimenters may contribute to the effects.

A solution to history in this case is to randomize experimental occasions, balanced in terms of experimenter, time of day, week, and so on.

Maturation and testing: these are controlled because they are manifested equally in the treatment and control groups.

Instrumentation: this is controlled where the conditions that control for intrasession history are met, especially where fixed tests are used. However, when observers or interviewers are used, problems can arise.

If there are too few observers to be randomly assigned to experimental conditions, care must be taken to keep the observers ignorant of the purpose of the experiment.
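Both remedies above, randomizing experimental occasions so they are balanced across experimenter and time (for history) and randomly assigning observers to conditions (for instrumentation), amount to crossing staff and time slots with the two conditions and then running the sessions in random order. The following is a minimal Python sketch under that assumption; the function `balanced_schedule`, its parameters, and the session fields are hypothetical illustrations, not a procedure taken from the source.

```python
import itertools
import random
from collections import Counter


def balanced_schedule(experimenters, time_slots,
                      conditions=("treatment", "control"), seed=None):
    """Hypothetical helper: build one session for every combination of
    experimenter, time slot, and condition, then shuffle the running order.

    Because the design is fully crossed, each condition meets every
    experimenter and every time slot equally often, so neither can be
    confounded with the treatment.
    """
    rng = random.Random(seed)
    sessions = [
        {"experimenter": e, "time_slot": t, "condition": c}
        for e, t, c in itertools.product(experimenters, time_slots, conditions)
    ]
    rng.shuffle(sessions)  # randomize the order of experimental occasions
    return sessions


if __name__ == "__main__":
    schedule = balanced_schedule(["A", "B"], ["morning", "afternoon"], seed=1)
    for session in schedule:
        print(session)
    # Each experimenter runs each condition the same number of times.
    print(Counter((s["experimenter"], s["condition"]) for s in schedule))
```

With two experimenters and two time slots this yields eight sessions. A real study would also have to cope with unequal group sizes and scheduling constraints, which this sketch ignores.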

Regression: this is controlled, as far as mean differences are concerned, regardless of how extreme the scores or characteristics are, provided the treatment and control groups are randomly assigned from the same extreme pool. In that case both groups regress toward the mean to the same degree, regardless of treatment.
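The claim that random assignment from one extreme pool makes both groups regress equally can be checked with a small simulation. This Python sketch assumes a simple classical test model (each observed score is a stable true score plus independent normal error) and no treatment effect at all; the function name and its parameters are hypothetical.

```python
import random
import statistics


def simulate_regression(n_pool=20000, cutoff=1.5, seed=0):
    """Hypothetical simulation: select an extreme pool on the pretest,
    randomly split it into two groups, give neither group any treatment,
    and return each group's mean pre-post gain."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_pool):
        true_score = rng.gauss(0, 1)
        pre = true_score + rng.gauss(0, 1)   # pretest = true score + error
        post = true_score + rng.gauss(0, 1)  # posttest = true score + new error
        pool.append((pre, post))

    # Keep only people with extreme (high) pretest scores.
    extreme = [person for person in pool if person[0] > cutoff]

    # Random assignment from the same extreme pool.
    rng.shuffle(extreme)
    half = len(extreme) // 2
    groups = {"treatment": extreme[:half], "control": extreme[half:]}

    return {name: statistics.mean(post - pre for pre, post in members)
            for name, members in groups.items()}


if __name__ == "__main__":
    gains = simulate_regression()
    # Both groups fall back toward the population mean by about the same
    # amount, so the difference between their gains stays near zero.
    print(gains)
    print("difference:", gains["treatment"] - gains["control"])
```

The negative gains come purely from measurement error, which is why a single-group pre-post comparison on an extreme pool would mislead, whereas the treatment-control difference does not.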

Selection: this is controlled by randomization.

Mortality: this is said to be controlled in this design; however, on closer reading of the text, it may or may not be controlled.

Unless the mortality rate is equal in the treatment and control groups, it is not possible to say with certainty that mortality did not contribute to the results. Even when mortality is equal in both groups, complex interactions may still make the effects of the drop-out rate differ between the two groups.
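A practical first check is to compare drop-out rates between the groups. The sketch below uses a standard pooled two-proportion z-test, a technique chosen here for illustration rather than one named in the source; the function name and the counts are hypothetical. As the text stresses, a non-significant result does not rule out mortality effects, since equal rates can still hide different kinds of drop-outs.

```python
import math
from statistics import NormalDist


def attrition_z_test(enrolled_t, completed_t, enrolled_c, completed_c):
    """Compare drop-out rates in the treatment (t) and control (c) groups
    with a pooled two-proportion z-test. Returns (rate_t, rate_c, z, p)."""
    dropped_t = enrolled_t - completed_t
    dropped_c = enrolled_c - completed_c
    rate_t = dropped_t / enrolled_t
    rate_c = dropped_c / enrolled_c
    pooled = (dropped_t + dropped_c) / (enrolled_t + enrolled_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / enrolled_t + 1 / enrolled_c))
    if se == 0:
        # No drop-outs (or everyone dropped out): rates are identical.
        return rate_t, rate_c, 0.0, 1.0
    z = (rate_t - rate_c) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return rate_t, rate_c, z, p


if __name__ == "__main__":
    # Hypothetical counts: 100 enrolled per group, 90 vs 80 completers.
    print(attrition_z_test(100, 90, 100, 80))
```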

Conditions must remain similar between the two groups. For example, if the treatment group must attend treatment sessions, then the control group must also attend sessions in which either no treatment or a "placebo" treatment is given.

Even so, threats to validity remain. A placebo treatment must be somewhat believable, so its mere presence may produce an effect similar to that of the treatment itself.

The factors described so far affect internal validity: each could produce changes that might be misinterpreted as the result of the treatment. These are called main effects, and because this design controls them, it has internal validity. The design does, however, face threats to external validity. These are also called interaction effects, because each involves the treatment together with some other variable, and it is their interaction that threatens validity.

It is important to note here that external validity or generalizability always turns out to involve extrapolation into a realm not represented in one's sample.

In contrast, problems of internal validity are solvable within the limits of the logic of probability statistics. Internal validity can therefore be controlled, using probability statistics, within the experiment actually conducted; external validity, or generalizability, can never be fully guaranteed, because we cannot logically extrapolate to conditions that were not sampled. This is Hume's truism that induction or generalization is never fully justified logically.

Interaction of testing and X: because the interaction between taking a pretest and the treatment itself may affect the results in the experimental group, it is desirable to use a design that does not use a pretest.

Interaction of selection and X: although selection is controlled by randomly assigning subjects to the experimental and control groups, the effects demonstrated may still hold true only for the population from which those groups were drawn.

For example, a researcher looking for schools to observe may be turned down by nine schools and accepted by the tenth. The tenth school may differ greatly from the other nine and therefore may not represent an average school. In any report, the researcher should therefore describe the population studied as well as any populations that declined the invitation.

Reactive arrangements: this refers to the artificiality of the experimental setting and to the subjects' awareness that they are participating in an experiment.

Such a situation is unrepresentative of the school setting, or of any natural setting, and can seriously affect the results of the experiment.

To remedy this problem, experiments should be incorporated as variants of the regular curricula, tests should be integrated into the normal testing routine, and the treatment should be delivered by regular staff with individual students.

Research in schools should be conducted in this manner: ideas for research should originate with teachers or other school personnel, the designs should be worked out with an expert in research methodology, and the research itself should be carried out by those who came up with the idea.

The results should then be analyzed by the expert, and the final interpretation delivered by an intermediary.

Tests of significance for this design: although the design itself may be developed and conducted appropriately, statistical tests of significance are not always applied appropriately.

Wrong statistic in common use: many researchers compute two t-tests, one for the pre-post difference in the experimental group and one for the pre-post difference in the control group.

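If the experimental group's t is statistically significant and the control group's is not, the treatment is said to have had an effect. This reasoning is flawed: a significant change in one group and a non-significant change in the other does not show that the two changes differ significantly from each other. The groups must be compared directly, for example with a t-test on gain scores (O2 - O1 versus O4 - O3) or with an analysis of covariance using the pretest as the covariate.

Validity vs. Reliability: The Tension

Validity and reliability anchor opposite ends of a spectrum that defines how systems are conceived and solutions are framed.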
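Returning to tests of significance for this two-group pretest-posttest design: the following Python sketch contrasts the two within-group t statistics described under "Wrong statistic in common use" with a pooled two-sample t-test on gain scores, which compares the groups directly. The data are made-up numbers chosen so that both groups have the same mean gain (3 points) but the control group's gains vary much more. The function names are hypothetical, and analysis of covariance is not shown.

```python
import math
import statistics


def paired_t(pre, post):
    """Within-group t for the pre-post change (the misused approach
    computes one of these per group and compares significance)."""
    diffs = [b - a for a, b in zip(pre, post)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def gain_score_t(pre_e, post_e, pre_c, post_c):
    """Pooled-variance two-sample t on gain scores: tests whether the
    experimental group gained more than the control group.
    Returns (t, degrees_of_freedom)."""
    g_e = [b - a for a, b in zip(pre_e, post_e)]
    g_c = [b - a for a, b in zip(pre_c, post_c)]
    n_e, n_c = len(g_e), len(g_c)
    pooled_var = ((n_e - 1) * statistics.variance(g_e)
                  + (n_c - 1) * statistics.variance(g_c)) / (n_e + n_c - 2)
    se = math.sqrt(pooled_var * (1 / n_e + 1 / n_c))
    t = (statistics.mean(g_e) - statistics.mean(g_c)) / se
    return t, n_e + n_c - 2


if __name__ == "__main__":
    # Hypothetical scores, five subjects per group.
    pre_e, post_e = [10, 12, 11, 9, 13], [12, 15, 15, 12, 16]  # gains 2,3,4,3,3
    pre_c, post_c = [11, 10, 12, 9, 13], [8, 19, 12, 15, 16]   # gains -3,9,0,6,3
    print("experimental paired t:", paired_t(pre_e, post_e))   # about 9.49
    print("control paired t:", paired_t(pre_c, post_c))        # about 1.41
    print("gain-score t, df:", gain_score_t(pre_e, post_e, pre_c, post_c))
```

With 4 degrees of freedom the two-sided 5% critical t is about 2.78, so the two-t approach would declare an effect (9.49 versus 1.41), while the direct comparison of gains gives t = 0 because both groups gained 3 points on average.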

Challenges to validity and reliability

Because the research question was answered by the survey, the survey showed evidence of validity and reliability, and achieving validity throughout the research posed few challenges. Mixed-methods research is more expensive than a single-method approach in terms of time, money, and energy, but it improves the validity and reliability of the resulting data and strengthens causal inferences by providing the opportunity to observe data convergence or divergence in hypothesis testing.

… 2 and quizzes typically has higher reliability than any of its individual components, because the positive and negative measurement errors for individual students tend to even out over a semester.
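How much more reliable a composite is than its parts can be estimated with the Spearman-Brown prophecy formula, a standard psychometric result that the source does not itself cite. It predicts the reliability of a score built from n parallel components, each with reliability r, as nr / (1 + (n - 1)r). The sketch assumes the components are strictly parallel (equal true-score and error variances, uncorrelated errors); real exams and quizzes rarely are, so treat the output as an idealized estimate. The function name is hypothetical.

```python
def spearman_brown(component_reliability: float, n_components: float) -> float:
    """Predicted reliability of a composite of n parallel components,
    each with the given reliability (Spearman-Brown prophecy formula)."""
    r, n = component_reliability, n_components
    if not 0 <= r <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    if n <= 0:
        raise ValueError("number of components must be positive")
    return n * r / (1 + (n - 1) * r)


if __name__ == "__main__":
    # Hypothetical: four parallel quizzes, each with reliability 0.5.
    print(spearman_brown(0.5, 4))  # 0.8
```

Four components of reliability 0.5 combine to a predicted 0.8, a numerical version of the point that individual errors even out over a semester.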

Common Challenges Faced by EPPs
• Designing observation instruments in ways that maximize construct validity and inter-…
• As with reliability, validity has several features, and there are several ways to establish it. It isn't necessary to establish every form.

By Zafar Iqbal, Institute of Education: A systematic review of the evidence of reliability and validity of assessment by teachers used for summative purposes.

