
Saturday, April 27, 2019

Types of Health Measurements


By: Ian McDowell

There are several ways to classify health measurements. Functional classifications consider the purpose or application of the method; descriptive classifications focus on its scope; and methodological classifications consider technical aspects, such as the techniques used to record information.

An example of a functional classification is Bombardier and Tugwell’s distinction between three purposes for measuring health: diagnostic, prognostic, and evaluative. Diagnostic indices include measurements of blood pressure or erythrocyte sedimentation rates and are judged for their correspondence with a clinical diagnosis. Prognostic measures include screening tests, scales such as the Apgar score, and measures such as those that predict the likelihood that a patient will be able to live independently following rehabilitation. Finally, evaluative indexes measure change in a person over time.

Kirshner and Guyatt also gave a functional classification. In their scheme, discriminative indexes distinguish between people, especially when no external criterion exists, as with IQ tests. Predictive indexes classify people according to some criterion, which may exist in the present (hence equivalent to Bombardier’s diagnostic measures) or in the future (equivalent to prognostic measures). A simpler functional classification was proposed by Kind and Carr-Hill. In it, measurements monitor either health status or change in health status, and they may do this for individuals or for groups. Measuring the health status of individuals is the domain of the clinical interview; measuring change in the individual is the purpose of a clinical evaluation. Measuring health status in a group is the aim of a survey instrument, while measuring group change is the domain of a health index.
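The Kind and Carr-Hill scheme is in effect a two-by-two table, and it can be sketched as a simple lookup. The following is only an illustration: the names KIND_CARR_HILL and classify_measurement, and the string labels used as keys, are assumptions made for this sketch and do not come from the original authors.

```python
# Hypothetical lookup table for the Kind and Carr-Hill classification.
# Keys are (what is monitored, who is measured); values are the labels
# the text assigns to each combination.
KIND_CARR_HILL = {
    ("status", "individual"): "clinical interview",
    ("change", "individual"): "clinical evaluation",
    ("status", "group"): "survey instrument",
    ("change", "group"): "health index",
}


def classify_measurement(monitors: str, level: str) -> str:
    """Return the measurement type for one cell of the two-by-two table."""
    key = (monitors.strip().lower(), level.strip().lower())
    if key not in KIND_CARR_HILL:
        raise ValueError(f"unknown combination: {key}")
    return KIND_CARR_HILL[key]


print(classify_measurement("change", "group"))  # health index
```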

Health measurements may also be classified descriptively, according to their scope or the range of topics they cover. A common distinction is drawn according to the breadth of the concept being measured. Measures range from narrow-focus instruments that cover a particular organ system (e.g., vision, hearing), through scales concerned with a diagnosis (e.g., anxiety or depression scales) and those that measure broader syndromes (e.g., emotional well-being), to measurements of overall health and, broadest of all, measurements of overall quality of life. A further distinction is that between broad-spectrum generic health measures and specific instruments. The latter may be specific to a disease (e.g., a quality of life scale in cancer patients), or to a particular type of person (e.g., women’s health measures, patient satisfaction scales) or to an age group (e.g., child health indicators).

Specific instruments are generally designed for clinical applications and intended to be sensitive to change after treatment. Generic instruments are commonly applied in descriptive epidemiological studies or health surveys. They permit comparisons across disease categories. In addition to specific and generic measures, preference-based health measures may be distinguished. Whereas health status measures, whether generic or specific, record the presence and severity of symptoms or disabilities, preference-based measures record the preferences of individual patients for alternative outcomes; this is relevant in policy analysis and in predicting demands for care.

Preference-based measures generally combine several aspects of health into a common numerical index that allows comparisons between different types of health programs. Drawing these categories together, Garratt et al. divided measurements into dimension-specific measures (e.g., a depression scale); disease- or population-specific measures (e.g., an asthma quality of life scale); generic measures that can be applied to different populations; individualised measures that allow respondents to make their own judgments of the importance of the domains being assessed (as in the Patient-Generated Index); and utility measures developed for economic evaluation that incorporate health state preferences and provide a single index score (e.g., the EuroQol EQ-5D).

Many methodological classifications of health measurements exist. Examples include the distinction between rating scales and questionnaires and that between health indexes and health profiles. Cutting across these categories, there is the more complex distinction between subjective and objective measures. In essence, the contrast between rating scales and questionnaires lies in the flexibility of the measurement process. In a rating scale an expert, normally a clinician, assesses defined aspects of health, but sometimes the precise questions vary from rater to rater and from subject to subject. An example is the Hamilton Rating Scale for Depression: Hamilton gave only a general outline of the types of question to ask and the clinician uses personal judgment in making the rating.

By contrast, in self-completed questionnaires and in interview schedules the questions are preset, and we carefully train interviewers not to alter the wording in any way. The debates over which approach is better generate more heat than light; they also reveal deeper contrasts in how we approach the measurement of subjective phenomena. Briefly, the argument in support of structured questionnaires holds that standardization is essential if assessments are to be compared among individuals; this consistency is seen as a cornerstone of nomothetic science. Nomothetic science is concerned primarily with abstract constructs and the theoretical relations among them, such as the links between dementia and depression.

The goal of nomothetic science is to generalize, and it is inherently taxonomic; based on deterministic philosophy, it searches for underlying commonalities and downplays individual variability. The use of factor analysis to create measures of the theoretical concept underlying a set of indicators would typify this approach. In the fields of linguistics and translation, this corresponds to the etic approach, in which translation is approached from outside and seeks to derive a non-culture-specific presentation of the underlying ideas, which are assumed to be universally applicable. A good example of this approach to translating a questionnaire is given by Cella et al. who sought to ensure not only semantic, but also content, conceptual, criterion, and technical equivalences of an instrument in English and Spanish versions. By contrast, the idiographic approach to measurement focuses on assessing individuals; it particularizes and emphasizes the complexity and uniqueness of each person being assessed. It is inherently clinical and corresponds to qualitative research methods.

The idiographic philosophy argues that because each person is the unique product of a particular environment, we cannot fully understand people through the application of universal principles. Idiographic approaches also mirror the emic approach to language and translation. The starting point for emics is that language forms a unique aspect of culture, and that the goal of translation is to review the pertinence of an idea (here, a questionnaire item) to the target culture, seeking a metaphor that is equivalent. Whereas the nomothetic approach tackles “What?” questions, the idiographic considers the “Why?” As Millon and Davis point out, the two approaches need not be in conflict; the success of theoretical propositions is ultimately judged based on how well they explain individual clinical observations, whereas idiographic assessments are merely descriptive unless they proceed from some theoretical base.

Applied to designing a measurement, the nomothetic approach assumes that a standard set of measurement dimensions or scales is relevant to each person being measured and that scoring procedures should remain constant for each. Thus, for example, in measuring social support, it would not accept the idea that social isolation might be perfectly acceptable, even healthy, for certain people, although undesirable for many. In reaction to this, the idiographic approach is more flexible and allows differences in measurement approach from person to person.

For example, we should not assume that wording a question in the same way for every respondent provides standardized information: we cannot assume that the same phrase will be interpreted identically by people of different cultural backgrounds. What is important is to ensure that equivalent stimuli are given to each person, and this is the forte of the skilled clinician who can control for differences in the use of language in rating different patients. Not only may symptoms of depression vary from person to person, but the significance of a given symptom may vary from one patient to another, so they should not necessarily receive the same score. This type of approach has, of course, long been used clinically in psychiatry; and more formal approaches to developing equivalent measurement approaches for different subjects include the repertory grid technique.

Briefly, this classifies people’s thoughts on two dimensions: the elements or topics they think about, and the constructs, which are the qualities they use to describe and think about the elements. An interview (e.g., rating a person’s subjective quality of life) would elicit the constructs that respondents use in thinking about quality of life, and then rate each of these in their current situation. This permits a more fully subjective assessment of quality of life than is possible using a structured questionnaire. Methods of this type have been used in quality of life measurement, for example in the SmithKline Beecham Quality of Life scale (33), or by Thunedborg et al.

The second methodological classification refers to two contrasting approaches to summarizing data collected by generic instruments. Scores may be presented separately, to represent the various aspects of health (e.g., physical, emotional), giving a health profile. Alternatively, the indicators may be combined into an overall score, termed a health index. Supporters of the profile approach argue that health or quality of life is inherently multidimensional, and scores on the different facets should be presented separately.

They point out that when replies to several contrasting themes are added together, there are many ways a respondent can attain an intermediate score, so such scores do not provide interpretable information. This reflects the philosophy of the Rasch measurement model, which holds that items to be combined should cover one dimension only, with separate elements presented as a profile.
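The difficulty with intermediate scores can be shown with a small sketch. Everything here is an assumption made for illustration: the three dimension names, the 0 to 4 scoring range, and the unweighted sum used as the index; no published instrument is being reproduced.

```python
from itertools import product

# Assumed dimensions and scoring range, for illustration only.
DIMENSIONS = ("physical", "emotional", "social")
LEVELS = range(0, 5)  # each dimension scored 0 (worst) to 4 (best)


def health_index(profile: dict) -> int:
    """Collapse a profile into one number by unweighted summation."""
    return sum(profile[d] for d in DIMENSIONS)


def profiles_with_index(target: int) -> list:
    """List every possible profile whose summed index equals target."""
    return [
        dict(zip(DIMENSIONS, scores))
        for scores in product(LEVELS, repeat=len(DIMENSIONS))
        if sum(scores) == target
    ]


# Two respondents with very different profiles...
a = {"physical": 4, "emotional": 0, "social": 2}
b = {"physical": 2, "emotional": 2, "social": 2}

# ...receive the same intermediate index score.
print(health_index(a), health_index(b))  # 6 6

# Number of distinct profiles hidden behind that single score.
print(len(profiles_with_index(6)))  # 19
```

Under these assumptions only the extreme scores (0 and 12) correspond to a single profile; every intermediate value conceals several.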

Single scores may be of two kinds: a single indicator (e.g., serum cholesterol) or an index, which is an aggregation of separate scores into a single number like the Dow Jones Industrial Average or the consumer price index. Single indicators require no particular discussion; they necessarily cover a limited part of the broader concept of health. A health index, however, confronts head on the issue of combining different facets of health. Critics argue that this mixes apples and oranges, but proponents argue that finding connections between dimensions is necessary in making real life decisions. A single score is often needed to address dilemmas such as choosing between two treatments, one of which prolongs life but at the cost of significant adverse effects, while the other produces shorter, but disability-free, survival. Index scores are commonly used in economic analyses and in policy decision-making.

The distinction between objective and subjective measures reflects that between mechanical methods based on laboratory tests and those in which a person (e.g., clinician, patient, family member) makes a judgment that forms the indicator of health. Ratings that involve judgments are generally termed “subjective” measurements, and we use the term in this sense here. By contrast, objective measurements involve no human judgment in the collection and processing of information (although judgment may be required in its interpretation). This distinction is often not clear, however. Mortality statistics are commonly considered “objective,” although judgment may be involved in assigning a code to the cause of death. Similarly, observing behaviors only constitutes an objective measure if the observations are recorded without subjective interpretation. Thus, climbing stairs may be considered an objective indicator of disability if it is observed and subjective if it is reported by the person.

Note that the distinction between “subjective” and “objective” measurements does not refer to who makes the rating: objectivity is not bestowed on a measurement merely because it is made by an expert. Nor should we assume that subjective measures are merely “soft”: in longitudinal studies, subjective self-ratings of health are consistently found to predict subsequent mortality as well as, or better than, physical measures. The questions that comprise many health measures can be worded either in terms of performance (“I do not walk at all”: Sickness Impact Profile) or in terms of capacity (“I’m unable to walk at all”: Nottingham Health Profile). This distinction reflects the contrast between objective and subjective measurement, in that performance can be recorded objectively whereas assessments of capacity tend to be subjective.

Active debate continues between those who favor performance wording and those who favor capacity wording. In general, capacity wording gives an optimistic view of health, whereas performance is conservative. Proponents of performance wording argue that it gives a truer picture of what the person actually does, and not what they think they might be able to do on a good day if they try. Proponents of capacity wording argue that performance may be restricted by extraneous factors such as opportunities or personal choice, so that these questions confound health status with environmental and other constraints and tend to give a falsely conservative impression of health problems.

Thus, older people with equal capacity who live in institutional care typically have less freedom than those in the community, so they will tend to be rated less healthy by performance wording than by capacity wording. To compensate for this, the introduction to performance questions typically stresses that responses should focus solely on limitations that are due to health problems. This is complex, however, because health problems commonly interact with other factors such as the weather, making it hard for the respondent to figure out which factor influenced their performance. The consensus is that both wordings have merit in particular applications; capacity wording more closely reflects underlying impairments, whereas performance wording is close to a measure of handicap. The user must be aware of the potential distortions of each.

A major contribution to enhancing the acceptance of subjective measures came from the application of numerical scaling techniques to health indices. Because subjective reports of health are not inherently quantitative, some form of rating method was required to translate statements such as “I feel severe pain” into a form suitable for statistical analysis. The scaling techniques originally developed by social psychologists to assess attitudes soon found application in health indexes. The use of these, and later of more sophisticated rating methods, permitted subjective health measurements to rival the quantitative strengths of the traditional indicators.
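As a rough sketch of the simplest form such scaling can take, verbal responses may be given ordinal numeric codes. The four response categories and their equally spaced codes below are assumptions for illustration; more sophisticated scaling methods estimate the weights empirically rather than assuming equal intervals.

```python
from statistics import mean, median

# Assumed response categories with equally spaced ordinal codes.
PAIN_CODES = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}


def code_responses(responses: list) -> list:
    """Translate verbal pain ratings into numeric codes."""
    return [PAIN_CODES[r.strip().lower()] for r in responses]


sample = ["Severe", "mild", "moderate", "severe", "none"]
codes = code_responses(sample)

print(codes)          # [3, 1, 2, 3, 0]
print(median(codes))  # 2
print(mean(codes))    # 1.8
```

The median relies only on the ordering of the categories, whereas the mean is meaningful only if the equal-interval assumption holds.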

References

McDowell, Ian. Measuring Health: A Guide to Rating Scales and Questionnaires. Third edition. Oxford University Press, 2006.
