
Friday, April 26, 2019

The Current Status of Health Measurement


By: Ian McDowell

In the ten years since the second edition of our book was written, marked improvements have been made in the quality, sophistication, and standardization of health measurement scales. Health measurement has increasingly taken advantage of the technical advances in test construction established in the social sciences, although the application of this knowledge to health measurements remains somewhat uneven. In the 1987 edition of Measuring Health, we argued that pain scales were the most successful in exploiting more sophisticated scaling techniques, whereas physical disability measures were for the most part unsophisticated.

Over the intervening years, measurements of physical disability have begun to catch up, and scales such as the Functional Autonomy Measurement System (SMAF) contain some very innovative approaches. It is also good to be able to report a marked increase in comparisons between measurement instruments: increasingly we now know how to map one scale onto another, illustrating the strengths and limitations of each. Major steps have been made in scoring health measures; item response theory and econometric scaling methods are becoming common, and norm-referenced scoring is beginning to be applied.

Although econometric and psychometric approaches come from different academic traditions, their meeting in the health measurement field appears to be encouraging some melding of the two. Kaplan et al., for example, illustrated how common ground may be found between them.

All in all, things are moving, and in the right direction. We continue to hope that the comments contained in reviews such as this book will encourage the fuller development of the less adequate methods. Two developments merit especial mention: the increase in international standardization of measures, and the development of banks of items in place of set measurement instruments.

One of the clearest developments in recent years has been the collaborative efforts to develop internationally comparable instruments. There is a growing trend away from merely translating English-language instruments into other languages (which was carried to extremes by instruments such as the General Health Questionnaire or the Functional Assessment of Cancer Therapy). Instead, the World Health Organization (WHO) developed the WHOQOL measure simultaneously in a wide variety of countries and cultures, in a genuinely democratic process of discussion and negotiation over the content of the instrument. The EuroQol offers a similar, although narrower, example.

Stimulated in part by multinational pharmaceutical trials that require equivalent outcome measures in several languages, organizations such as the Mapi Research Institute in Lyon, France, have been established. The Mapi Institute includes an information resource center that collects and distributes information on health measures. It possesses a collection of questionnaires and can assist users in finding an instrument for their needs. Mapi also coordinates the translation of existing instruments into other languages; its Web site offers a valuable source of information on the rapidly evolving field of translation. Mapi produces a Quality of Life Newsletter that encourages rapid dissemination and exchange of information on health outcome measurement (see www.mapi-research-inst.com).

Also in Europe, the Harmonization Project for Instruments in Dementia (EURO-HARPID) has undertaken empirical comparisons across language versions of dementia scales and has established standard administration procedures. In the United States, the Medical Outcomes Trust (www.outcomes-trust.org/) was incorporated in 1992 to promote the science and application of outcome measures; as part of this mission, it has developed an instrument library and has proposed quality guidelines for instrument developers.

The other innovation has been to use computers not merely to administer existing instruments, but to customize sets of items drawn from a range of measures. The goal is to administer items that maximize the information gained from each: each successive item is selected as the one expected to provide the most information, given the responses to previous items.

This customization by computerized adaptive testing (CAT) can save time and achieve greater efficiency of test information for a given test length and can provide automatic scoring, greater privacy for the respondent, and the possibility of providing immediate feedback. The future vision for health measurement is that item banks will contain information on hundreds of items, indicating their psychometric and scaling properties and relationship to other items. The application of item response theory has been central to the analysis of item characteristics across existing measurement instruments. A CAT algorithm then selects items from the bank so that redundant questions are not asked (if a person cannot walk a block, there is little reason to ask him if he can run 100 meters).
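The item-selection step described above can be illustrated with a minimal sketch. It assumes a two-parameter logistic item response model and a maximum-information selection rule, which is one common choice rather than the only one. The item bank, its parameter values, and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical item bank: name -> (discrimination a, difficulty b).
# The parameter values are invented for illustration, not calibrated.
ITEM_BANK = {
    "walk_one_block": (1.8, -1.5),
    "climb_stairs": (1.5, -0.5),
    "walk_one_mile": (1.6, 0.5),
    "run_100_meters": (1.7, 1.5),
}


def prob_endorse(theta, a, b):
    """Two-parameter logistic model: probability of answering 'yes, I can'."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability level theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def estimate_theta(responses):
    """Posterior mean of theta on a grid, with a standard normal prior."""
    grid = [x / 10.0 for x in range(-40, 41)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for name, answer in responses.items():
            a, b = ITEM_BANK[name]
            p = prob_endorse(theta, a, b)
            w *= p if answer else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


def next_item(responses):
    """Pick the unasked item with the most information at the current estimate."""
    remaining = [name for name in ITEM_BANK if name not in responses]
    if not remaining:
        return None
    theta = estimate_theta(responses)
    return max(remaining, key=lambda n: item_information(theta, *ITEM_BANK[n]))


if __name__ == "__main__":
    answers = {}
    first = next_item(answers)
    answers[first] = False  # the respondent reports being unable to do this
    print(first, "->", next_item(answers))
```

In this sketch, a respondent who reports being unable to perform a moderately difficult activity receives a lower ability estimate, so the hardest item carries little information and is not selected next. A production system would also need a stopping rule, for example a target standard error, which is omitted here.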

Using the logic of a screening test, sensitive (but perhaps not specific) filter items can identify areas in which the respondent reports problems, and these can be probed in more detail, whereas further questions on areas in which they report no problems can automatically be skipped. The great gain is in efficiency: by asking only questions that are pertinent to the respondent’s current level of health, much greater precision and discriminal ability are achieved for any set length of interview. To achieve this goal, we need to systematically assemble information on item characteristics for every item in our current leading tests; these items then need to be cross-calibrated so that we understand the conditional probabilities of responses to items from different existing tests.
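The filter-item logic can be sketched in a few lines. This is an illustrative outline only: the domains, question wording, and the `plan_interview` function are hypothetical and are not taken from any existing instrument.

```python
# Hypothetical domains, each with one sensitive filter question and
# a set of more detailed probe questions.
DOMAINS = {
    "mobility": {
        "filter": "Do you have any difficulty getting around?",
        "probes": ["Can you walk one block?", "Can you climb a flight of stairs?"],
    },
    "pain": {
        "filter": "Have you had any pain in the past week?",
        "probes": ["How severe was the pain?", "Did pain limit your activities?"],
    },
    "mood": {
        "filter": "Have you felt low or anxious in the past week?",
        "probes": ["How often did you feel low?", "Did you lose interest in things?"],
    },
}


def plan_interview(filter_answers):
    """Return the probe questions to ask, given yes/no answers to each filter.

    filter_answers maps a domain name to True (problem reported) or False.
    Domains with no reported problem are skipped entirely.
    """
    questions = []
    for domain, spec in DOMAINS.items():
        if filter_answers[domain]:
            questions.extend(spec["probes"])
    return questions


if __name__ == "__main__":
    answers = {"mobility": True, "pain": False, "mood": False}
    for question in plan_interview(answers):
        print(question)
```

Because the filter items are meant to be sensitive rather than specific, a false positive costs only a few extra probe questions, while a skipped domain saves the whole block.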

Ware et al. gave an example in separate studies of headache severity measures, comparing items drawn from five different measures, while Cella has described a vision of the future of item banking in health measurement (http://outcomes.cancer.gov/conference/irt/cella_et_al.pdf). Dynamic testing will no doubt evolve rapidly, but an example of its potential is given on the website www.amIhealthy.com (accessed in late 2004).

Although we may complain about the weakness and lack of coordinated development work in certain areas of health measurement, it is also true that a universal perfect index can never exist. It is quite wrong to imagine one set of questions suited to all diseases, all individuals, and all applications. Such an instrument would have to make so many compromises it would probably not be suitable for any particular application.

Fundamentally different scales will be required for policy analysis and for individual patient evaluation; we will continue to have generic and disease-specific scales or item pools, health indexes and health profiles, subjective and objective measures. Each has its place, although certain quality control procedures can be followed in developing health indexes of any type. Given some successes and some areas of weakness, what should be done to strengthen this field?

References

Ian McDowell. Measuring Health: A Guide to Rating Scales and Questionnaires, third edition. Oxford University Press, 2006.

