By: Ian McDowell
There are several ways to classify health measurements. Functional classifications group them by the purpose or application of the method; descriptive classifications focus on their scope; and methodological classifications consider technical aspects, such as the techniques used to record information.
An example of a functional classification is
Bombardier and Tugwell’s
distinction between three purposes for measuring health: diagnostic,
prognostic, and evaluative.
Diagnostic indices
include measurements of blood pressure or erythrocyte sedimentation rates and are
judged for their
correspondence with a clinical diagnosis. Prognostic measures include screening
tests, scales such as the Apgar score, and measures such as those that
predict the likelihood that a patient will be able to live independently following
rehabilitation. Finally, evaluative indexes measure change in a person over time.
Kirshner and Guyatt also gave a functional classification.
In this, discriminative indexes distinguish between people,
especially when no external criterion exists, as with IQ tests.
Predictive indexes classify people according to some criterion, which may exist in
the present (hence equivalent to Bombardier’s diagnostic measures)
or in the future
(equivalent to prognostic measures). A simpler functional classification was
proposed by Kind and
Carr-Hill. Measurements monitor either health status or change in
health status, and they
may do this for individuals or for groups. Measuring the health status of
individuals is the
domain of the clinical interview; measuring change in the individual is the
purpose of a clinical
evaluation. Measuring health status in a group is the aim of a survey
instrument, while measuring group change is the domain of a health index.
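
The two-by-two scheme just described can be written out as a small lookup table. This is only a sketch: the key labels ("status", "change", "individual", "group") and the function name are hypothetical, chosen for illustration, while the four cell entries come from the text above.

```python
# Hypothetical lookup for the two-by-two classification described above.
# Keys are (what is monitored, who is monitored); the labels are illustrative.
CLASSIFICATION = {
    ("status", "individual"): "clinical interview",
    ("change", "individual"): "clinical evaluation",
    ("status", "group"): "survey instrument",
    ("change", "group"): "health index",
}


def measurement_type(what: str, who: str) -> str:
    """Return the kind of measurement for a (what, who) pair."""
    return CLASSIFICATION[(what, who)]


print(measurement_type("change", "group"))
```
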
Health measurements may also be classified
descriptively, according
to their scope or the range of topics they cover. Measures can be ordered
by the breadth of the concept being measured. The narrowest cover
a particular organ system (e.g., vision, hearing); next are
scales concerned with a
diagnosis (e.g., anxiety or depression scales); then those that
measure broader
syndromes (e.g., emotional well-being); then
measurements of overall health; and, broadest of all, measurements of
overall quality of life.
A common distinction is that between broad-spectrum generic health
measures and specific
instruments. The latter may be specific to a disease (e.g., a quality of life
scale in cancer
patients), or to a particular type of person (e.g., women’s health measures, patient
satisfaction scales) or to an age group (e.g., child health indicators).
Specific instruments are generally designed for
clinical applications and intended to be sensitive to change after treatment.
Generic instruments are commonly applied in descriptive epidemiological studies
or health
surveys. They permit comparisons across disease categories. In addition to specific and
generic measures,
preference-based health measures may be distinguished. Whereas health
status measures, whether
generic or specific, record the presence and severity of symptoms or
disabilities,
preference-based measures record the preferences of individual patients for alternative
outcomes; this is relevant in policy analysis and in predicting demands for
care.
Preference-based measures generally combine several aspects
of health into a common numerical index that allows comparisons
between different types of health programs. Drawing these categories together,
Garratt et al. divided measurements into dimension-specific measures (e.g., a depression
scale); disease- or
population-specific measures (e.g., an asthma quality
of life scale); generic measures that can be applied to different populations;
individualised measures that allow respondents to make their own judgments of the
importance of the
domains being assessed (as in the Patient-Generated Index); and utility measures
developed for economic evaluation that incorporate health state preferences and
provide a
single index score (e.g., the EuroQol EQ-5D).
Many methodological classifications of health
measurements exist.
One distinction, for example, contrasts rating scales with questionnaires;
another contrasts
health indexes with health profiles. Cutting across these categories
is the more
complex distinction between subjective and objective measures. In essence, the
contrast between rating
scales and questionnaires lies in the flexibility of the measurement
process. In a rating
scale an expert, normally a clinician, assesses defined aspects of health, but
sometimes the precise
questions vary from rater to rater and from
subject to subject. An example is the Hamilton Rating Scale
for Depression: Hamilton
gave only a general outline of the types of question to ask and the clinician uses
personal judgment in
making the rating.
By contrast, in self-completed questionnaires and in interview
schedules the questions are preset, and we carefully train
interviewers not to alter the wording in any way. The debates over which approach is
better generate more heat than light; they also reveal deeper contrasts in how we
approach the measurement
of subjective phenomena. Briefly, the argument in support of structured questionnaires
holds that standardization is essential if assessments are to be compared among
individuals; this consistency is seen as a cornerstone of nomothetic science, which is
concerned primarily with
abstract constructs and the theoretical relations among them, such
as the links between dementia
and depression.
The goal of nomothetic science is to generalize,
and it is inherently
taxonomic; based on deterministic philosophy, it searches for underlying
commonalities and
downplays individual variability. The use of factor analysis to create measures
of the theoretical concept underlying a set of indicators would typify
this approach. In the fields of linguistics and translation, this
corresponds to the etic approach, in which translation is approached from outside and
seeks to derive a non-culture-specific presentation of the underlying ideas, which
are assumed to be universally applicable. A good example of this approach to
translating a
questionnaire is given by Cella et al. who sought to ensure not only semantic, but
also content, conceptual,
criterion, and technical equivalences of an instrument in English and
Spanish versions. By
contrast, the idiographic approach to measurement focuses on assessing individuals;
it particularizes and emphasizes the complexity and uniqueness of each
person being assessed.
It is inherently clinical and corresponds to qualitative research methods.
The idiographic philosophy argues that because
each person is the
unique product of a particular environment, we cannot fully understand people
through the application
of universal principles. Idiographic approaches also mirror the emic approach
to language and translation. The starting point for emics is that
language forms a unique aspect of culture, and that the goal of translation
is to review the
pertinence of an idea (here, a questionnaire item) to the target culture, seeking
a metaphor that is
equivalent. Whereas the nomothetic approach tackles “What?” questions, the
idiographic considers the “Why?” As Millon and Davis point out, the two approaches
need not be in conflict;
the success of theoretical propositions is ultimately judged based on how
well they explain
individual clinical observations, whereas idiographic assessments are
merely descriptive
unless they proceed from some theoretical base.
Applied to designing a measurement, the
nomothetic approach
assumes that a standard set of measurement dimensions or scales is relevant to
each person being measured and that scoring procedures should remain constant for
each. Thus, for example,
in measuring social support, it would not accept the idea that social
isolation might be
perfectly acceptable, even healthy, for certain people, although undesirable
for many. In reaction to
this, the idiographic approach is more flexible and allows differences in
measurement approach
from person to person.
For example, we should not assume that wording a question in
the same way for every respondent provides standardized information: we
cannot assume that the
same phrase will be interpreted identically by people of different cultural
backgrounds. What is important is to ensure that equivalent
stimuli are given to each
person, and this is the
forte of the skilled clinician who can control for differences in the use of
language in rating different
patients. Not only may symptoms of depression vary from person
to person, but the
significance of a given symptom may vary from one patient to another, so
they should not
necessarily receive the same score. This type of approach has, of course, long
been used clinically in
psychiatry; and more formal approaches to developing equivalent measurement
approaches for different subjects include the repertory grid technique.
Briefly, this classifies people’s thoughts
on two dimensions: the elements or topics they think about, and the
constructs, which are
the qualities they use to describe and think about the elements. An interview
(e.g., one rating a person’s subjective quality of
life) would first elicit the
constructs the respondent uses in thinking about quality of life,
and then have the respondent rate each of
these in their current situation. This permits a more fully subjective
assessment of quality of life than is possible using a structured questionnaire.
Methods of this type have been used in quality of life measurement,
for example in the
SmithKline Beecham Quality of Life scale (33), or by Thunedborg et al.
The second methodological classification
refers to two
contrasting approaches to summarizing data collected by generic instruments.
Scores may be presented
separately, to represent the various aspects of health (e.g., physical,
emotional), giving a health profile.
Alternatively, the indicators may be combined into an overall score,
termed a health
index. Supporters of the profile
approach argue that
health or quality of life is inherently multidimensional, and scores on the
different facets should be presented separately.
When replies to several contrasting themes are
added together, there
are many ways a respondent can attain an intermediate score, so these do
not provide
interpretable information. This reflects the philosophy of the Rasch
measurement model,
which holds that items to be combined should cover one dimension only,
with separate elements
presented as a profile.
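
The objection to adding contrasting themes together can be seen in a small sketch. The dimension names, the 0 to 10 range, and the three respondents are all invented for illustration; the point is only that quite different profiles collapse to the same intermediate total.

```python
# Illustrative only: dimension names and the 0-10 scoring range are assumptions.
profile_a = {"physical": 8, "emotional": 2, "social": 5}
profile_b = {"physical": 2, "emotional": 8, "social": 5}
profile_c = {"physical": 5, "emotional": 5, "social": 5}


def summed_score(profile):
    """Collapse a profile into one number by unweighted addition."""
    return sum(profile.values())


# Print each profile next to its single summed score.
for name, profile in [("A", profile_a), ("B", profile_b), ("C", profile_c)]:
    print(name, profile, "->", summed_score(profile))
```

All three respondents receive the same total even though their profiles differ, which is why a summed score alone cannot say where a person's difficulties lie.
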
Single scores may be of two kinds: a single
indicator (e.g., serum
cholesterol) or an index, which is an aggregation of separate scores into a
single number like the
Dow Jones Industrial Average or the consumer price index. Single indicators
require no particular discussion; they necessarily cover a limited part of the broader
concept of health. A
health index, however, confronts head on the issue of combining different
facets of health.
Critics argue that this mixes apples and oranges, but proponents argue that
finding connections
between dimensions is necessary in making real life decisions. A single
score is often needed to
address dilemmas such as choosing between two treatments, one of
which prolongs life but
at the cost of significant adverse effects, while the other produces shorter,
but disability-free,
survival. Index scores are commonly used in economic analyses and in
policy decision-making.
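
A minimal sketch of how a single score can address the treatment dilemma above: survival time is multiplied by a weight between 0 and 1 that represents the value placed on the resulting health state. This weighting scheme is an assumption made for illustration (it resembles the quality-adjusted approach used in economic evaluation), and the years and weights below are invented, not taken from any study.

```python
def quality_weighted_survival(years: float, weight: float) -> float:
    """Years of survival multiplied by a 0-1 health state weight."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return years * weight


# Hypothetical figures, chosen only to illustrate the trade-off.
# Treatment A: longer survival with significant adverse effects.
treatment_a = quality_weighted_survival(years=10.0, weight=0.5)
# Treatment B: shorter but disability-free survival.
treatment_b = quality_weighted_survival(years=6.0, weight=1.0)

print("A:", treatment_a, "B:", treatment_b)
```

With these assumed values the shorter, disability-free option scores higher; a different weight for the adverse effects could reverse the ranking, which is why the source of the weights matters.
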
The distinction between objective and subjective measures reflects that between mechanical methods based on laboratory tests and those in which a person (e.g., clinician, patient, family member) makes a judgment that forms the indicator of health. Ratings that involve judgments are generally termed “subjective” measurements, and we use the term in this sense here. By contrast, objective measurements involve no human judgment in the collection and processing of information (although judgment may be required in its interpretation). This distinction is often not clear, however. Mortality statistics are commonly considered “objective,” although judgment may be involved in assigning a code to the cause of death. Similarly, observing behaviors constitutes an objective measure only if the observations are recorded without subjective interpretation. Thus, climbing stairs may be considered an objective indicator of disability if it is observed and subjective if it is reported by the person.
Note that the distinction between
“subjective” and
“objective” measurements does not refer to who makes the rating: objectivity is
not bestowed on a
measurement merely because it is made by an expert. Nor should we assume that
subjective measures are merely “soft”: in longitudinal studies, subjective self-ratings of
health are consistently
found to predict subsequent mortality as well as, or better than, physical
measures. The
questions that comprise many health measures can be worded either in terms of performance
(“I do not walk at all”: Sickness Impact Profile) or in terms of capacity (“I’m
unable to
walk at all”: Nottingham Health Profile). This distinction reflects the
contrast between objective and subjective measurement, in that performance can
be recorded objectively whereas assessments of capacity tend to be subjective.
Active debate continues between those who favor performance
wording and those who favor capacity wording. In general,
capacity wording gives
an optimistic view of health, whereas performance is conservative. Proponents
of performance wording argue that it gives a truer picture of what the person actually
does, and not what they think they might be able to do on a good
day if they try.
Proponents of capacity wording argue that
performance may be restricted by extraneous factors such as opportunities or
personal choice, so that these questions confound
health status with
environmental and other constraints and tend to give a falsely conservative
impression of health
problems.
Thus, old people with equal capacity who live in
institutional care
typically have less freedom than those in the community, so they will tend to
be rated less healthy by
performance wording than by capacity wording. To compensate for this, the
introduction to performance questions typically stresses that responses should
focus solely on limitations
that are due to health problems. This is complex, however, because health problems
commonly interact with
other factors such as the weather, making it hard for the respondent to
figure out which factor
influenced their performance. The general consensus is that both wordings have
merit in particular applications; capacity wording more closely reflects underlying
impairments, whereas performance wording is close to a measure of handicap.
The user must be aware of the potential
distortions of each. A major contribution to enhancing the acceptance of
subjective measures came from the application of numerical scaling techniques
to health indices.
Because subjective reports of health are not inherently quantitative, some form of rating
method was required to
translate statements such as “I feel severe pain” into a form suitable for
statistical analysis.
The scaling techniques originally developed by social psychologists to assess
attitudes soon found application
in health indexes. The use of these, and later of more sophisticated rating
methods, permitted subjective health measurements to rival the quantitative
strengths of the
traditional indicators.
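
As a rough sketch of what translating verbal reports into numbers involves, the simplest approach assigns a numeric code to each response category. The four categories and their equally spaced codes below are assumptions made for illustration; formal scaling methods estimate the spacing between categories empirically rather than assuming it.

```python
from statistics import mean

# Assumed equal-interval codes for a four-category pain item.
PAIN_CODES = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}


def code_responses(responses):
    """Convert verbal category labels into numeric codes."""
    return [PAIN_CODES[r] for r in responses]


# Hypothetical replies from four respondents.
responses = ["severe", "mild", "moderate", "severe"]
scores = code_responses(responses)
print(scores, mean(scores))
```
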
References
Ian McDowell. Measuring Health: A Guide to Rating Scales and Questionnaires, third edition. Oxford University Press, 2006.