By: Lindsay E. Ayearst and R. Michael Bagby
Although diagnostic criteria are the framework for any clinical or epidemiological assessment, no assessment of clinical status is independent of the reliability and validity of the methods used to determine the presence of a diagnosis. —Regier et al. (1998, p. 114)
As noted by Regier et al. (1998), our ability to make sound judgments about individuals rests on the soundness of the measurement tools we use to make such judgments. The unfortunate truth that not all tests are created equal makes it essential for test users to recognize fully the strengths and shortcomings of the available measurement tools and to be skilled evaluators of the psychometric properties of psychological measures.
The field of
psychometrics is commonly misconceived as being relatively static, with well-defined rules and methods (Bolt &
Rounds, 2000). Although it is true that many of the questions remain the same (e.g., What is
the test measuring and how accurately is it being measured?), the field of
psychometrics is an innovative one, in which new statistical tools are crafted and new
perspectives are brought to bear on the measurement process aimed at answering these
questions (Bolt & Rounds, 2000).
Unfortunately, many of these innovations are reported in journals whose target audience is not the researchers and clinicians who create and use psychological tests, but the psychometricians who derive the methods. These authors are, in a sense, "preaching to the choir," and in so doing have created an accessibility problem: those creating and using the measures and those creating the methods operate in separate labs and report their results in separate journals.
The substantial developments in the field of psychometrics have not
made their way into the standard toolkit of psychologists and, as such, contemporary
test analysis has been criticized as bearing “an uncanny
resemblance to the psychometric state of the art as it existed in the 1950s” (Borsboom, 2006,
p. 425).
In an attempt to remedy this problem, this article provides an introduction to guidelines and recommendations for best practices in the evaluation of the psychometric properties of psychological measures compiled from a variety of sources, both current and classic.
Test elements are discussed from both a classical test theory (CTT) perspective and from the more modern item response theory (IRT) approach, providing guidelines for distinguishing a test that meets minimum standards from one that does not, and for judging whether a test does a good job of measuring the characteristics it purports to measure.
For a complete
example of how to conduct an evaluation of the psychometric properties of a
psychological measure, readers are directed to Quilty and Bagby’s (2007) analysis of the psychometric properties of the
Minnesota Multiphasic Personality Inventory–2 Psychopathology Five (MMPI-2
PSY-5) facet subscales.
In evaluating the
psychometric properties of psychological measures, we are interested in addressing the degree to which an
assessment instrument provides an accurate and precise measure of the targeted construct
(Haynes, 2001). More specifically, we are interested in evaluating two related
concepts that underlie psychological tests: reliability and validity (Hambleton
& Pitoniak, 2002). It is universally accepted that psychological assessment measures must be reliable and valid if they are to be of any use. Although reliability and validity are related, they are distinct concepts that differ in important ways.
“Reliability” refers to the consistency of the test scores obtained
from a measure across time, observers, and samples
(Garb, Lilienfeld, & Fowler, 2008; Goodwin & Goodwin, 1999). In psychometric terms,
reliability refers to the extent to which measurement results are precise and
uninfluenced by random error (Wasserman & Bracken, 2003).
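One common way to quantify the internal-consistency facet of reliability is Cronbach's alpha. As an illustration not drawn from the chapter itself, the sketch below computes alpha from a respondents-by-items score matrix; the Likert-style data are hypothetical:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five respondents answering three Likert-style items (hypothetical data)
scores = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
])
print(round(cronbach_alpha(scores), 3))  # → 0.961
```

The high alpha here reflects that respondents who score high on one item tend to score high on the others, i.e., the scores are internally consistent; it says nothing about what the items measure.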
“Validity,” on
the other hand, according to the Standards for Educational and Psychological
Testing (American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education, 1999), is defined as “the degree to which evidence and theory
support the interpretations of test scores” (p. 9). In an earlier edition of the
Standards, “validity” was defined as “the appropriateness, meaningfulness, and usefulness
of the specific inferences made from test scores” (American Psychological Association,
1985, p. 9). Combining the two definitions, “validity” can be defined as the accuracy of interpretations and
judgments based on test scores (Garb et al., 2008).
If test scores are consistent (e.g., across raters or time), then reliability is judged to be good, regardless of whether validity has also been demonstrated; that two raters agree on a diagnosis for a patient says nothing about whether the diagnosis is accurate. A judgment may thus be consistent (reliable) but not valid: both raters may agree, yet both may be wrong.
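The two-raters example can be made concrete with Cohen's kappa, a standard chance-corrected index of inter-rater agreement. The sketch below uses hypothetical diagnostic labels; note that a high kappa certifies only consistency between raters, not the accuracy of the diagnoses:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b) -> float:
    """Chance-corrected agreement between two raters' categorical judgments."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal label frequencies
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in counts_a)
    return (observed - expected) / (1 - expected)

# Two clinicians diagnosing the same 10 patients (hypothetical labels)
rater_1 = ["MDD", "MDD", "GAD", "MDD", "GAD", "MDD", "GAD", "GAD", "MDD", "MDD"]
rater_2 = ["MDD", "MDD", "GAD", "MDD", "GAD", "MDD", "GAD", "MDD", "MDD", "MDD"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # → 0.78
```

A kappa of .78 would conventionally be read as substantial agreement, yet both clinicians could still be systematically wrong, which is exactly the reliable-but-not-valid case described above.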
Although a test can be reliable but not valid, a test cannot be valid but unreliable. Test score reliability sets an upper limit on validity: an unreliable test score cannot support valid inferences (Wasserman & Bracken, 2003). It is a “necessary prerequisite for validity that the test must have achieved an adequate level of reliability” (Groth-Marnat, 2009, p. 16).
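The upper-limit relationship has a simple algebraic form in classical test theory: an observed validity coefficient between a test and a criterion cannot exceed the square root of the product of their reliabilities. The function name and the .64 figure below are our own illustration, not from the text:

```python
import math

def max_validity(reliability_x: float, reliability_y: float = 1.0) -> float:
    """CTT ceiling on an observed validity coefficient:
    r_xy can be no larger than sqrt(r_xx * r_yy)."""
    return math.sqrt(reliability_x * reliability_y)

# A test with reliability .64 cannot correlate with any criterion above .80,
# even if the criterion itself is measured with perfect reliability.
print(round(max_validity(0.64), 2))  # → 0.8
```

This is why improving a measure's reliability is a precondition for, though no guarantee of, improving its validity.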
References
Antony, M. M., & Barlow, D. H. (Eds.). (2010). Handbook of Assessment and Treatment Planning for Psychological Disorders (2nd ed.), Structured and Semi-Structured Diagnostic Interviews. The Guilford Press.