By: Ian McDowell
In the ten years since the second edition of our book was written, marked improvements have been made in the quality, sophistication, and standardization of health measurement scales. Health measurement has increasingly taken advantage of the technical advances in test construction established in the social sciences, although the application of this knowledge to health measurements remains somewhat uneven. In the 1987 edition of Measuring Health, we argued that pain scales were the most successful in exploiting more sophisticated scaling techniques, whereas physical disability measures were for the most part unsophisticated.
Over the intervening years, measurements of physical disability have begun to catch up, and scales such as the Functional Autonomy Measurement System (SMAF) contain some very innovative approaches. It is also good to be able to report a marked increase in comparisons between measurement instruments: increasingly, we now know how to map one scale onto another, illustrating the strengths and limitations of each. Major steps have been made in scoring health measures; item response theory and econometric scaling methods are becoming common, and norm-referenced scoring is beginning to be applied.
Although econometric and psychometric approaches come from different academic traditions, their meeting in the health measurement field appears to be encouraging some melding of the two; Kaplan et al., for example, illustrated how common ground may be found between them. All in all, things are moving, and in the right direction. We continue to hope that the comments contained in reviews such as this book will encourage the fuller development of the less adequate methods. Two developments merit special mention: the increase in international standardization of measures, and the development of banks of items in place of fixed measurement instruments.
One of the clearest developments in recent years has been the collaborative effort to develop internationally comparable instruments. There is a growing trend away from merely translating English-language instruments into other languages (a practice carried to extremes with instruments such as the General Health Questionnaire or the Functional Assessment of Cancer Therapy). Instead, the World Health Organization (WHO) developed the WHOQOL measure simultaneously in a wide variety of countries and cultures, through a genuinely democratic process of discussion and negotiation over the content of the instrument. The EuroQol offers a similar, although narrower, example.
Stimulated in part by multinational pharmaceutical trials that require equivalent outcome measures in several languages, organizations such as the Mapi Research Institute in Lyon, France, have been established. The Mapi Institute includes an information resource center that collects and distributes information on health measures; it holds a collection of questionnaires and can help users find an instrument that suits their needs. Mapi also coordinates the translation of existing instruments into other languages, and its Web site is a valuable source of information on the rapidly evolving field of translation. In addition, Mapi produces a Quality of Life Newsletter that encourages rapid dissemination and exchange of information on health outcome measurement (see www.mapi-research-inst.com).
Also in Europe, the Harmonization Project for Instruments in Dementia (EURO-HARPID) has undertaken empirical comparisons across language versions of dementia scales and has established standard administration procedures. In the United States, the Medical Outcomes Trust (www.outcomes-trust.org/) was incorporated in 1992 to promote the science and application of outcome measures; as part of this mission, it has developed an instrument library and has proposed quality guidelines for instrument developers.
The other innovation has been to use computers not merely to administer existing instruments but to administer customized sets of items drawn from a range of measures. The goal is to administer items that maximize the information gained from each: each successive item is selected as the one that provides the most information, given the responses to previous items. This customization by computerized adaptive testing (CAT) can save time, yield more test information for a given test length, and provide automatic scoring, greater privacy for the respondent, and the possibility of immediate feedback. The future vision for health measurement is that item banks will contain information on hundreds of items, indicating their psychometric and scaling properties and their relationships to other items. The application of item response theory has been central to the analysis of item characteristics across existing measurement instruments. A CAT algorithm then selects items from the bank so that redundant questions are not asked (if a person cannot walk a block, there is little reason to ask whether that person can run 100 meters).
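To make the item-selection logic concrete, here is a minimal Python sketch of CAT under a two-parameter logistic (2PL) item response model. The item bank, its item names, and its discrimination (a) and difficulty (b) values are hypothetical, invented for illustration; a real bank would hold calibrated estimates for many more items. At each step the sketch picks the unused item with the greatest Fisher information at the current ability estimate, then re-estimates ability by expected a posteriori (EAP) scoring on a grid with a standard normal prior. Maximum-information selection with EAP scoring is one common choice, not the only one.

```python
import math

# Hypothetical 2PL item bank: name -> (discrimination a, difficulty b).
# Higher theta means better physical function; values are illustrative only.
ITEM_BANK = {
    "walk_one_block": (1.8, -1.5),
    "climb_one_flight": (1.6, -0.5),
    "carry_groceries": (1.4, 0.0),
    "walk_one_mile": (1.7, 0.8),
    "run_100_meters": (1.5, 1.8),
}


def p_yes(theta, a, b):
    """2PL probability that a person at ability theta can do the task."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_yes(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, asked):
    """Return the not-yet-asked item that is most informative at theta."""
    candidates = [name for name in ITEM_BANK if name not in asked]
    return max(candidates, key=lambda n: item_information(theta, *ITEM_BANK[n]))


def estimate_theta(responses):
    """EAP ability estimate on a grid from -4 to 4 with a N(0, 1) prior."""
    grid = [i / 10.0 for i in range(-40, 41)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for name, can_do in responses.items():
            p = p_yes(t, *ITEM_BANK[name])
            w *= p if can_do else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


def run_cat(answer_fn, max_items=3):
    """Administer up to max_items adaptively; answer_fn(item) returns True/False."""
    responses = {}
    theta = 0.0  # start at the prior mean
    for _ in range(max_items):
        item = select_next_item(theta, responses)
        responses[item] = answer_fn(item)
        theta = estimate_theta(responses)
    return responses, theta


if __name__ == "__main__":
    # A respondent who reports being unable to do any of the tasks.
    responses, theta = run_cat(lambda item: False)
    print(list(responses), round(theta, 2))
```

For a respondent who answers "no" to everything, the first item chosen at the starting estimate of zero is the stair-climbing item; after successive "no" answers the estimate falls well below zero, and the running item carries too little information at that level to be selected, which is the walk-a-block reasoning in the text expressed numerically.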
Using the logic of a screening test, sensitive (but perhaps not specific) filter items can identify areas in which the respondent reports problems, and these areas can then be probed in more detail, whereas further questions on areas in which the respondent reports no problems can be skipped automatically. The great gain is in efficiency: by asking only questions that are pertinent to the respondent’s current level of health, much greater precision and discriminating ability are achieved for an interview of any given length. To achieve this goal, we need to systematically assemble information on the characteristics of every item in our current leading tests; these items then need to be cross-calibrated so that we understand the conditional probabilities of responses to items from different existing tests.
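The filter-item logic can be sketched in a few lines of Python. The domains, filter questions, and follow-up questions below are hypothetical, chosen only to illustrate the branching; a real adaptive system would typically combine such screening with IRT-based item selection rather than rely on fixed skip rules alone.

```python
# Hypothetical questionnaire: each domain has one broad, sensitive "filter"
# item followed by more detailed items. Wording is illustrative only.
DOMAINS = {
    "mobility": (
        "Any difficulty getting around?",
        ["Difficulty walking one block?",
         "Difficulty climbing stairs?",
         "Difficulty running 100 meters?"],
    ),
    "pain": (
        "Any pain in the past week?",
        ["Does pain interfere with sleep?",
         "Does pain interfere with work?"],
    ),
    "mood": (
        "Felt down or anxious lately?",
        ["Lost interest in usual activities?",
         "Trouble concentrating?"],
    ),
}


def administer(answer_fn):
    """Ask each filter item; probe a domain only if a problem is reported.

    answer_fn(question) returns True when the respondent reports a problem.
    Returns the questions asked, in order, and a dict of their answers.
    """
    asked, answers = [], {}
    for filter_q, detail_qs in DOMAINS.values():
        answers[filter_q] = answer_fn(filter_q)
        asked.append(filter_q)
        if not answers[filter_q]:
            continue  # no problem reported: skip this domain's detail items
        for q in detail_qs:
            answers[q] = answer_fn(q)
            asked.append(q)
    return asked, answers


def full_length():
    """Number of items a fixed-form version of the same questionnaire asks."""
    return sum(1 + len(details) for _, details in DOMAINS.values())


if __name__ == "__main__":
    # A respondent who reports problems with mobility only.
    mobility_filter, mobility_details = DOMAINS["mobility"]
    mobility_items = {mobility_filter, *mobility_details}
    asked, _ = administer(lambda q: q in mobility_items)
    print(f"{len(asked)} of {full_length()} items asked")
```

Here a respondent with mobility problems only answers 6 of the 10 items, and one who reports no problems at all answers just the three filter items; the time saved is what can be reinvested in more detailed probing where problems are reported.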
Ware et al. gave an example in separate studies of headache severity measures, comparing items drawn from five different instruments, while Cella has described a vision of the future of item banking in health measurement (http://outcomes.cancer.gov/conference/irt/cella_et_al.pdf). Dynamic testing will no doubt evolve rapidly, but an example of its potential is given on the website www.amIhealthy.com (accessed in late 2004).
Although we may complain about weaknesses and a lack of coordinated development work in certain areas of health measurement, it is also true that a universal, perfect index can never exist. It is quite wrong to imagine one set of questions suited to all diseases, all individuals, and all applications; such an instrument would have to make so many compromises that it would probably not be suitable for any particular application. Fundamentally different scales will be required for policy analysis and for individual patient evaluation; we will continue to have generic and disease-specific scales or item pools, health indexes and health profiles, and subjective and objective measures. Each has its place, although certain quality control procedures can be followed in developing health indexes of any type. Given some successes and some areas of weakness, what should be done to strengthen this field?
References
Ian McDowell, Measuring Health: A Guide to Rating Scales and Questionnaires, third edition, Oxford University Press, 2006.