By: Ian McDowell
Psychophysical experiments have shown that people can make accurate and internally consistent judgments of phenomena, at least in laboratory experiments concerned with lights or noises in which the person has no particular stake. Judgments about health may not be so dispassionate: in real life, people often have a personal stake in the estimation of their health. Bias refers to ratings that depart systematically from true values.
We should, however, be careful to discriminate between two influences in the judgment process. The first is the underlying and consistent perceptual tendency to exaggerate or underestimate stimuli, described by the exponent b in the psychophysical experiments. This may also apply to health; we know little about this as yet, although several studies have compared subjective responses with physical or laboratory measurements of health status. The second is a tendency to alter responses to a stimulus across time or under different situations, and it is this that is termed bias. One person may exaggerate symptoms to qualify for sick leave or a pension, whereas another may show the opposite bias and minimize ailments in the hope of returning to work. Subjective ratings of health thus blend an estimate of the severity of the health problem with a personal tendency to exaggerate or conceal the problem, a bias that varies among people and over time.
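For readers who want the formula behind this distinction: the exponent b presumably refers to the exponent of Stevens' power law from the psychophysical work, which relates the perceived magnitude S of a stimulus to its physical intensity I approximately as

S = k * I^b

where k is a scaling constant. An exponent above 1 implies that sensation grows faster than the stimulus (exaggeration), whereas an exponent below 1 implies that it grows more slowly (underestimation).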
Biases in subjective measurement can arise from the respondents' personalities, from the way they perceive questionnaires, or from particular circumstances of their illnesses. Illustrative examples will be given here rather than an exhaustive list, for the main question concerns how to reduce the extent of response bias. Personality traits that may bias responses include stoicism, defensiveness, hypochondriasis, or a need for attention. The drive to portray oneself in a good light by giving socially desirable responses illustrates a bias that reflects social influences. Goldberg cites the example of a person who was regarded by outside observers as fanatically tidy but who judged himself untidy. These biases are unconscious rather than deliberate deceptions, and are typically more extreme where questions concern socially undesirable acts, such as sexual behavior or the illicit use of drugs. Several scales have been proposed to measure a person's tendency to give socially desirable responses, but these scales appear to show rather low intercorrelations; one study, for example, reported a correlation of 0.42 between the Crowne-Marlowe and Edwards scales. Attitudes may also bias responses, and this has long been studied.
Biases can also arise from the way people interpret questionnaire response scales: some prefer to use the end-positions on response scales, whereas others more cautiously prefer the middle. Other biases may be particular to the health field and reflect the anxiety that surrounds illness. One example is the "hello-goodbye" effect, in which the patient initially exaggerates symptoms to justify the request for treatment and subsequently minimizes any problems that remain, either to please the clinician or out of cognitive dissonance. Similarly, in the rebound effect, a patient recovering from a serious illness tends to exaggerate reported well-being. A related bias is known as "response shift," whereby patients with a chronic condition may shift their perceptions as the disease progresses: typically they lower their expectations and thereby score better on health measures despite physical deterioration.
Two general approaches are used to deal with bias in health measurement. The first bypasses the problem and argues that health care should consider symptoms as presented by the patient, bias and all, given that this forms a part of the overall complaint: consideration of "the whole patient" is a hallmark of good care. From this viewpoint, it can be argued that the biases inherent in subjective judgments do not threaten the validity of the measurement process: health, or quality of life, is inherently subjective and is as the patient perceives it.
The second viewpoint argues that this is merely a convenient simplification and that the interests of diagnosis and patient management demand that health measurements disentangle the objective estimate from any personal response bias. As an example, different forms of treatment are appropriate for a person who reports pain of an organic origin and for another whose pain is exacerbated by psychological distress; several of the pain scales we review make this distinction.
Most health indexes do not disentangle subjective and objective components in the measurement, and thereby tacitly (or overtly) assume that the mixture of subjective and objective data is inevitable. Among the relatively few indexes that do try to separate these components, we discern several different tactics. The simplest is to mask the intent of the questions, either by giving the questionnaire a misleading title or by phrasing the questions so as to hide their intent. This is commonly done with psychological measurement scales: for example, the "Health Opinion Survey" has nothing to do with opinions; it is designed to identify psychoneurotic disorders. Several of its questions appear to refer to physical symptoms (e.g., upset stomach, dizziness) but are intended as markers of psychological problems.
A second approach is to have the questionnaire completed by someone who is familiar with the patient; examples may be found in ratings of social adjustment and in ratings of mental abilities. A third way of handling response bias is to make an explicit assessment of the patient's emotional response to their condition; examples may be found in measurements of pain.
A fourth approach is a statistical method of analyzing patterns of responses that provides two scores. The first is concerned with perception and indicates the patient's ability to discriminate low levels of the stimulus, a notion akin to estimating the size of "just noticeable" differences. The second score reflects the person's decision whether to report a stimulus; under conditions of uncertainty, this reflects a personal response bias.
This field of analysis derived from the problem of distinguishing signals from background noise in radio and radar, where it is called signal detection analysis. The same analysis may also be applied to other types of decision (e.g., the behavior of baseball players in deciding whether to swing at the ball, or of drivers in deciding when it is safe to merge into traffic) and is here called decision analysis or sensory decision theory. Where it is difficult to judge whether or not a stimulus is present (e.g., whether a radiograph shows a small fracture), two types of error are possible: falsely reporting a fracture, or missing one. Where the radiograph is unclear, the decision is influenced by factors such as the frequency of seeing fractures of this type, clinical conservatism, and the relative importance of avoiding each type of error. The analytic technique uses the notions of "hits" and "false alarms." A hit occurs where a stimulus is present and I rate it as present; a false alarm occurs where I report a signal that is in fact absent. When it is important to detect a signal, I may set my decision criterion to raise the number of hits, even at the expense of also increasing false alarms. Thus, my performance is characterized by my trade-off of hits against false alarms, which can be shown graphically by the receiver operating characteristic (ROC) curve, a plot of the probability of detection (hits) against the probability of false alarms.
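As a rough illustration of this trade-off, the Python sketch below assumes the standard equal-variance Gaussian model with an arbitrary discriminal ability (d' = 1.5); the values are invented for illustration, not taken from any study. Relaxing the criterion raises the probability of a hit only at the cost of more false alarms, tracing out points along a single ROC curve.

from statistics import NormalDist

def roc_point(d_prime, criterion):
    # "Noise alone" is a standard normal; "signal plus noise" is shifted by d'.
    noise = NormalDist(0.0, 1.0)
    signal = NormalDist(d_prime, 1.0)
    p_false_alarm = 1.0 - noise.cdf(criterion)   # saying "present" to noise alone
    p_hit = 1.0 - signal.cdf(criterion)          # saying "present" to the signal
    return p_false_alarm, p_hit

# Sweeping the criterion from strict to lax traces points along one ROC curve.
for criterion in (2.0, 1.5, 1.0, 0.5, 0.0, -0.5):
    fa, hit = roc_point(1.5, criterion)
    print(f"criterion={criterion:+.1f}  P(false alarm)={fa:.2f}  P(hit)={hit:.2f}")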
Guyatt et al. have applied this type of thinking to health measurements; the signal represents true differences in health that one wishes to detect, while the noise represents measurement error over which the signal must be detected. They then link these ideas to the purpose of the measurement, noting that for evaluative instruments the relevant signal concerns change over time, so the signal-to-noise ratio is represented by a measure of responsiveness; for a discriminative measure, the signal represents the ability to distinguish between people, so the signal-to-noise ratio is represented by a reliability coefficient.
Signal detection theory (SDT) has been applied to analyzing responses to health measures. For example, detecting pain involves the patient's ability to perceive the painful stimulus and the tendency to describe the feeling as "painful." These can both be evaluated experimentally: two types of stimulus are presented in random order (noise alone, or noise plus low levels of signal), and the ability of an individual to identify the presence of a signal against the noise is recorded. Applied to pain research, the stimulus is usually an electric shock and the "noise" is a low level of fluctuating current. For each trial, the respondent judges whether the shock was present, and from the resulting pattern of true and false-positive responses two indexes are calculated: discriminal ability and response bias.
Using some basic assumptions, it is possible to estimate these two parameters from a person's rate of hits and false alarms; this is well described by Hertzog.
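As a minimal sketch of how such estimates are made, the Python fragment below applies the standard equal-variance Gaussian model, in which discriminal ability (d') is the difference between the normal quantiles of the hit and false-alarm rates and the criterion summarizes response bias; the trial counts and the small correction for extreme rates are illustrative assumptions, not Hertzog's procedure or data.

from statistics import NormalDist

def sdt_indexes(hits, misses, false_alarms, correct_rejections):
    # z is the standard normal quantile (inverse cumulative) function.
    z = NormalDist().inv_cdf
    # A small correction keeps observed rates away from 0 and 1, where z is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = z(hit_rate) - z(fa_rate)                # discriminal ability
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))     # response bias (>0 = reluctant to report)
    return d_prime, criterion

# Hypothetical respondent: 40 shock trials and 40 noise-only trials.
print(sdt_indexes(hits=32, misses=8, false_alarms=10, correct_rejections=30))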
In pain research, this analysis has been used to study whether analgesics influence pain by altering discriminability (i.e., by making the stimulus feel less noxious) or by shifting the response bias (i.e., by making the respondent less willing to call the feeling "painful"). Presented in the form of ROC curves, the results may show the influence of varying rewards or penalties for making correct or incorrect decisions. SDT analysis has also been applied in studying the effect of age on test scores: might declines in memory scores among older people reflect changes in their approach to taking a test (e.g., cautiousness), rather than real reductions in memory?
Although this was the original application of ROCs, similar curves are often drawn to summarize the validity of screening tests, because hits and false alarms are equivalent to sensitivity and 1 - specificity. In this application, the area under the ROC curve indicates the discriminal ability of the instrument, ranging from 0.5 (indicating no discrimination) to 1.0 (indicating perfect discrimination).
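To make the link to screening tests concrete, the short Python sketch below builds an empirical ROC from invented questionnaire scores (the scores and cut-points are purely illustrative) and computes the area under the curve by the trapezoidal rule.

def roc_curve(case_scores, noncase_scores, cutpoints):
    # One (1 - specificity, sensitivity) point per cut-point, scoring "positive" at or above the cut.
    points = {(0.0, 0.0), (1.0, 1.0)}
    for c in cutpoints:
        sensitivity = sum(s >= c for s in case_scores) / len(case_scores)
        specificity = sum(s < c for s in noncase_scores) / len(noncase_scores)
        points.add((1.0 - specificity, sensitivity))
    pts = sorted(points)
    # Trapezoidal rule gives the area under the empirical ROC curve.
    auc = sum((x2 - x1) * (y1 + y2) / 2.0 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return pts, auc

cases = [7, 9, 6, 8, 10, 5]        # questionnaire scores for people with the disorder
noncases = [2, 4, 3, 5, 1, 6, 2]   # scores for people without it
points, auc = roc_curve(cases, noncases, cutpoints=range(1, 12))
print(f"AUC = {auc:.2f}")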
References
McDowell I. Measuring Health: A Guide to Rating Scales and Questionnaires. Third edition. Oxford University Press, 2006.