
Monday, April 22, 2019

Evaluating the Psychometric Properties of Psychological Measures (Introductory)


By: Lindsay E. Ayearst and R. Michael Bagby

Although diagnostic criteria are the framework for any clinical or epidemiological assessment, no assessment of clinical status is independent of the reliability and validity of the methods used to determine the presence of a diagnosis. —Regier et al. (1998, p. 114)

As noted by Regier et al. (1998), our ability to make sound judgments about individuals rests on the soundness of the measurement tools we use to make those judgments. The unfortunate truth is that not all tests are created equal, which increases the need for test users to recognize fully the strengths and shortcomings of the various measurement tools available and to be skilled evaluators of the psychometric properties of psychological measures.

The field of psychometrics is commonly misconceived as relatively static, with well-defined rules and methods (Bolt & Rounds, 2000). Although many of the questions remain the same (e.g., what is the test measuring, and how accurately is it being measured?), the field is an innovative one in which new statistical tools are crafted and new perspectives are brought to bear on the measurement process to answer these questions (Bolt & Rounds, 2000).

Unfortunately, many of these innovations are reported in journals whose target audience is not the researchers and clinicians who create and use psychological tests but the psychometricians who derive the methods. In this sense, psychometricians may be said to be “preaching to the choir,” and in so doing they have created an accessibility problem: those who create and use the measures and those who develop the methods operate in separate labs and report their results in separate journals.

The substantial developments in the field of psychometrics have not made their way into the standard toolkit of psychologists and, as such, contemporary test analysis has been criticized as bearing “an uncanny resemblance to the psychometric state of the art as it existed in the 1950s” (Borsboom, 2006, p. 425).

In an attempt to remedy this problem, this article provides an introduction to guidelines and recommendations for best practices in the evaluation of the psychometric properties of psychological measures compiled from a variety of sources, both current and classic.

Test elements are discussed from both a classical test theory (CTT) perspective and the more modern item response theory (IRT) approach, with guidelines for distinguishing a test that meets minimum standards from one that does not and for judging whether a test does a good job of measuring the characteristics it purports to measure.
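To make the IRT approach concrete, the sketch below computes the item characteristic curve of the two-parameter logistic (2PL) model, in which the probability of endorsing an item depends on the respondent's standing on the latent trait (theta), the item's discrimination (a), and its location or difficulty (b). The parameter values are hypothetical, chosen purely for illustration.

    import numpy as np

    def icc_2pl(theta, a, b):
        """Item characteristic curve under the 2PL model:
        P(theta) = 1 / (1 + exp(-a * (theta - b)))."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    # A fairly discriminating item (a = 1.8) located at b = 0.5.
    theta = np.linspace(-3, 3, 7)
    print(np.round(icc_2pl(theta, a=1.8, b=0.5), 3))

Endorsement probability rises steeply near theta = b; the steeper the curve, the better the item discriminates between respondents just below and just above that trait level.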

For a complete example of how to conduct an evaluation of the psychometric properties of a psychological measure, readers are directed to Quilty and Bagby’s (2007) analysis of the psychometric properties of the Minnesota Multiphasic Personality Inventory–2 Psychopathology Five (MMPI-2 PSY-5) facet subscales.

In evaluating the psychometric properties of psychological measures, we are interested in the degree to which an assessment instrument provides an accurate and precise measure of the targeted construct (Haynes, 2001). More specifically, we are interested in evaluating two related concepts that underlie psychological tests: reliability and validity (Hambleton & Pitoniak, 2002). It is universally accepted that psychological assessment measures must be reliable and valid if they are to be of any use. Although reliability and validity are related, they are distinct concepts that differ in important ways.

“Reliability” refers to the consistency of the test scores obtained from a measure across time, observers, and samples (Garb, Lilienfeld, & Fowler, 2008; Goodwin & Goodwin, 1999). In psychometric terms, reliability refers to the extent to which measurement results are precise and uninfluenced by random error (Wasserman & Bracken, 2003).
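As a minimal illustration, assuming invented response data, the sketch below estimates two common reliability coefficients: Cronbach’s alpha (internal consistency across items) and a test-retest correlation (stability across time).

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_respondents x n_items) matrix:
        alpha = k/(k-1) * (1 - sum(item variances) / variance(total))."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical 5-item Likert responses from six respondents.
    scores = np.array([[4, 5, 4, 4, 5],
                       [2, 2, 3, 2, 2],
                       [5, 4, 5, 5, 4],
                       [3, 3, 2, 3, 3],
                       [1, 2, 1, 1, 2],
                       [4, 4, 4, 5, 4]])
    print(f"alpha = {cronbach_alpha(scores):.2f}")

    # Test-retest reliability: correlate total scores across two occasions
    # (the second administration is simulated here).
    time1 = scores.sum(axis=1)
    time2 = time1 + np.array([1, -1, 0, 1, 0, -1])
    print(f"test-retest r = {np.corrcoef(time1, time2)[0, 1]:.2f}")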

“Validity,” on the other hand, according to the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999), is defined as “the degree to which evidence and theory support the interpretations of test scores” (p. 9). In an earlier edition of the Standards, “validity” was defined as “the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores” (American Psychological Association, 1985, p. 9). Combining the two definitions, “validity” can be defined as the accuracy of interpretations and judgments based on test scores (Garb et al., 2008).

If test scores are consistent (e.g., across raters or time), then reliability is assessed as “good,” regardless of whether validity has also been demonstrated (that two raters agree on a diagnosis for a patient says nothing about whether the diagnosis is accurate). As such, a judgment may be consistent (reliable), but not valid (both raters may agree, but both may be wrong).
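This distinction can be made concrete with Cohen’s kappa, a standard chance-corrected index of inter-rater agreement. In the hypothetical example below, the two raters agree perfectly (kappa = 1.0), yet both often disagree with the (also hypothetical) true diagnostic status: the ratings are reliable but not valid.

    import numpy as np

    def cohens_kappa(r1, r2):
        """Cohen's kappa: (p_observed - p_chance) / (1 - p_chance)."""
        r1, r2 = np.asarray(r1), np.asarray(r2)
        p_obs = np.mean(r1 == r2)
        p_chance = sum(np.mean(r1 == c) * np.mean(r2 == c)
                       for c in np.union1d(r1, r2))
        return (p_obs - p_chance) / (1 - p_chance)

    # Hypothetical diagnoses for ten patients (1 = present, 0 = absent).
    rater_a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
    rater_b = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])  # agrees with A
    truth   = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 1])  # both mostly wrong

    print(f"inter-rater kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # 1.00
    print(f"accuracy of rater A = {np.mean(rater_a == truth):.2f}")     # 0.50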

Although a test can be reliable but not valid, a test cannot be valid but unreliable. Test score reliability sets an upper limit on validity: because validity is constrained by reliability, an unreliable test score cannot support valid interpretations (Wasserman & Bracken, 2003). It is a “necessary prerequisite for validity that the test must have achieved an adequate level of reliability” (Groth-Marnat, 2009, p. 16).
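Classical test theory makes this ceiling explicit: the observed validity coefficient r_xy between a test and a criterion cannot exceed the square root of the product of their reliabilities, r_xy <= sqrt(r_xx * r_yy). A brief numeric sketch with hypothetical reliability values:

    import math

    # Hypothetical reliabilities of a test (r_xx) and a criterion (r_yy).
    r_xx, r_yy = 0.64, 0.81

    # CTT ceiling on the observed validity coefficient: with r_xx = 0,
    # validity is necessarily zero, however well chosen the criterion.
    max_validity = math.sqrt(r_xx * r_yy)
    print(f"maximum possible validity r_xy = {max_validity:.2f}")  # 0.72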

