Sunday, April 28, 2019

Identifying and Controlling Biases in Subjective Judgments in Health Measurement


By: Ian McDowell

Psychophysical experiments have shown that people can make accurate and internally consistent judgments of phenomena. This is the case, at least, with laboratory experiments concerned with lights or noises in which the person has no particular stake. Judgments about health may not be so dispassionate: in real life, people often have a personal stake in the estimation of their health. Bias refers to ratings that depart systematically from true values.

We should, however, be careful to distinguish between two influences in the judgment process. The first is the underlying and consistent perceptual tendency to exaggerate or underestimate stimuli, described by the exponent b in psychophysical experiments. This may also apply to health; we know little about it as yet, although several studies have compared subjective responses with physical or laboratory measurements of health status.
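The role of the exponent b can be illustrated with a small sketch. It assumes the usual psychophysical power function, in which judged magnitude equals k times the stimulus raised to the power b; the function name, the constant k, and the sample exponents are illustrative assumptions, not values from the text.

```python
def perceived_magnitude(stimulus, b, k=1.0):
    """Judged magnitude under an assumed power function: k * stimulus ** b."""
    return k * stimulus ** b


# Doubling the stimulus changes the judged magnitude by a factor of 2 ** b.
# An exponent below 1 compresses differences (underestimation);
# an exponent above 1 expands them (exaggeration).
for b in (0.5, 1.0, 1.5):  # hypothetical exponents
    ratio = perceived_magnitude(2.0, b) / perceived_magnitude(1.0, b)
    print(f"b = {b}: doubling the stimulus multiplies the judgment by {ratio:.2f}")
```

Under this assumed form, b describes a stable perceptual tendency; it does not capture the situational bias discussed next.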

The second is a tendency to alter the response to a stimulus across time or under different situations; this is termed bias. One person may exaggerate symptoms to qualify for sick leave or a pension, whereas another may show the opposite bias and minimize ailments in the hope of returning to work. Subjective ratings of health blend an estimate of the severity of the health problem with a personal tendency to exaggerate or conceal the problem, a bias that varies among people and over time.

Biases in subjective measurement can arise from the respondents’ personalities, from the way they perceive questionnaires, or from particular circumstances of their illnesses. Illustrative examples will be given here, rather than an exhaustive list, for the main question concerns how to reduce the extent of response bias. Personality traits that may bias responses include stoicism, defensiveness, hypochondriasis, or a need for attention. The drive to portray oneself in a good light by giving socially desirable responses illustrates a bias that reflects social influences. Goldberg cites the example of a person, regarded by outside observers as fanatically tidy, but who judged himself untidy.

These biases are unconscious, rather than a deliberate deception, and are typically more extreme where questions concern socially undesirable acts, such as sexual behavior or the illicit use of drugs. Several scales have been proposed to measure a person’s tendency to give socially desirable responses, but these scales appear to show rather low intercorrelations. A correlation of 0.42 between the Crowne-Marlowe and Edwards scales was reported in one study, for example. Attitudes may also bias responses and this has long been studied.

Biases can also arise from the way people interpret questionnaire response scales: some prefer to use the end positions on response scales, whereas others more cautiously prefer the middle. Other biases may be particular to the health field and reflect the anxiety that surrounds illness. One example is named the “hello-goodbye” effect, in which the patient initially exaggerates symptoms to justify their request for treatment. Subsequently, the person minimizes any problems that remain, either to please the clinician or out of cognitive dissonance. Similarly, in the rebound effect, a patient recovering from a serious illness tends to exaggerate reported well-being. A related bias is known as “response shift,” whereby patients with a chronic condition may shift perception as the disease progresses; typically they lower expectations and thereby score better on health measures despite physical deterioration.

Two general approaches are used to deal with bias in health measurement.

The first bypasses the problem and argues that health care should consider symptoms as presented by the patient, bias and all, given that this forms a part of the overall complaint: consideration of “the whole patient” is a hallmark of good care. From this viewpoint, it can be argued that the biases inherent in subjective judgments do not threaten the validity of the measurement process: health, or quality of life, is inherently subjective and is as the patient perceives it.

The second viewpoint argues that this is merely a convenient simplification and that the interests of diagnosis and patient management demand that health measurements should disentangle the objective estimate from any personal response bias. As an example, different forms of treatment are appropriate for a person who objectively reports pain of an organic origin and for another whose pain is exacerbated by psychological distress; several pain scales we review make this distinction.

Most health indexes do not disentangle subjective and objective components in the measurement and thereby tacitly (or overtly) assume that the mixture of subjective and objective data is inevitable. Among the relatively few indexes that do try to separate these components, we discern several different tactics.

The simplest is to try to mask the intent of the questions, either by giving them a misleading title, or by phrasing questions so as to hide their intent. This is commonly done with psychological measurement scales. For example, the “Health Opinion Survey” has nothing to do with opinions; it is designed to identify psychoneurotic disorders. Several of its questions appear to refer to physical symptoms (e.g., upset stomach, dizziness) but are intended as markers of psychological problems.

A second approach is to have the questionnaire completed by someone who is familiar with the patient. Examples may be found in ratings of social adjustment, and in ratings of mental abilities.

A third way of handling response bias is to make an explicit assessment of the patient’s emotional response to their condition. Examples may be found in measurements of pain.

A fourth approach is a statistical method of analyzing patterns of responses that provides two scores. The first is concerned with perception and indicates the patient’s ability to discriminate low levels of the stimulus, a notion akin to estimating the size of “just noticeable” differences. The second score reflects the person’s decision whether to report a stimulus; under conditions of uncertainty, this reflects a personal response bias.

This field of analysis derived from the problem of distinguishing signals from background noise in radio and radar, where it is called signal detection analysis. The same analysis may also be applied to other types of decision (e.g., the behavior of baseball players in deciding whether to swing at the ball, or of drivers in deciding when it is safe to merge into traffic) and is here called decision analysis or sensory decision theory.

Where it is difficult to judge whether or not a stimulus is present (e.g., whether a radiograph shows a small fracture), two types of error are possible: falsely reporting a fracture, or missing one. Where the radiograph is unclear, the decision is influenced by factors such as the frequency of seeing fractures of this type, clinical conservatism, and the relative importance of avoiding each type of error. The analytic technique uses the notions of “hits” and “false alarms.” A hit occurs where a stimulus is present and I rate it as present; a false alarm occurs where I report a signal that is in fact absent. When it is important to detect a signal, I may set my decision criterion to raise the number of hits, even at the expense of also increasing false alarms. Thus, my performance is characterized by my trade-off of hits against false alarms, which can be shown graphically by the receiver operating characteristic (ROC) curve, a plot of the probability of detection (hits) against the probability of false alarms.

Guyatt et al. have applied this type of thinking to health measurements; the signal represents true differences in health that one wishes to detect; the noise represents measurement error over which the signal must be detected. They then link these ideas to the purpose of the measurement, noting that for evaluative instruments, the relevant signal concerns change over time, so the signal-to-noise ratio is represented by a measure of responsiveness. For a discriminative measure, signal represents the ability to distinguish between people, so that a signal-to-noise ratio is represented by a reliability coefficient.

Signal detection theory (SDT) has been applied to analyzing responses to health measures. For example, detecting pain involves the patient’s ability to perceive the painful stimulus and the tendency to describe the feeling as “painful.” These can both be evaluated experimentally: two types of stimulus are presented in random order (noise alone or noise plus low levels of signal) and the ability of an individual to identify the presence of a signal against the noise is recorded. Applied to pain research, the stimulus is usually an electric shock and the “noise” is a low level of fluctuating current. For each trial, the respondent judges whether the shock was present and from the resulting pattern of true and false-positive responses, two indexes are calculated: discriminal ability and response bias.
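The experimental design just described can be mimicked with a small simulation. This is only a sketch under stated assumptions: the internal sensation is modeled as Gaussian noise that is shifted upward when the signal is present, and the respondent says "yes" whenever the sensation exceeds a personal criterion. The function name, the parameter values, and the Gaussian model are illustrative assumptions, not details taken from the text.

```python
import random


def run_detection_trials(n_trials, signal_strength, criterion, seed=0):
    """Simulate a yes/no detection task; return (hit_rate, false_alarm_rate)."""
    rng = random.Random(seed)
    hits = false_alarms = signal_trials = noise_trials = 0
    for _ in range(n_trials):
        # Signal and noise-alone trials are presented in random order.
        signal_present = rng.random() < 0.5
        # Assumed model: unit-variance Gaussian sensation, shifted by the signal.
        sensation = rng.gauss(signal_strength if signal_present else 0.0, 1.0)
        says_yes = sensation > criterion
        if signal_present:
            signal_trials += 1
            hits += says_yes
        else:
            noise_trials += 1
            false_alarms += says_yes
    return hits / signal_trials, false_alarms / noise_trials


# Same perceptual ability, two different willingness-to-report settings.
lenient = run_detection_trials(20000, signal_strength=1.0, criterion=0.0)
strict = run_detection_trials(20000, signal_strength=1.0, criterion=1.0)
print("lenient respondent (hit rate, false-alarm rate):", lenient)
print("strict respondent  (hit rate, false-alarm rate):", strict)
```

Lowering the criterion raises hits and false alarms together, which is the trade-off that separates response bias from the ability to discriminate.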

Using some basic assumptions, it is possible to estimate these two parameters from a person’s rate of hits and false alarms; this is well described by Hertzog. In pain research, this analysis has been used to study whether analgesics influence pain by altering discriminability (i.e., by making the stimulus feel less noxious), or by shifting the response bias (i.e., making the respondent less willing to call the feeling “painful”). Presented in the form of ROC curves, the results may show the influence of varying rewards or penalties for making correct or incorrect decisions. SDT analysis has also been applied in studying the effect of age on test scores: may declines in memory scores among old people reflect changes in approach to taking a test (e.g., cautiousness), rather than real reductions in memory?
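As a hedged illustration of how the two parameters can be estimated, the sketch below assumes the common equal-variance Gaussian model, which the text does not itself specify. Under that assumption, discriminability (often written d') is the difference between the z-transformed hit and false-alarm rates, and the response criterion c is minus half their sum. The function name and the sample rates are hypothetical.

```python
from statistics import NormalDist


def sdt_indexes(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under an assumed equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa             # discriminability
    criterion = -0.5 * (z_hit + z_fa)  # negative: inclined to say "yes"
    return d_prime, criterion


# Two hypothetical respondents with similar discriminability
# but different willingness to report the stimulus.
print(sdt_indexes(0.84, 0.16))
print(sdt_indexes(0.95, 0.40))
```

On this reading, an analgesic that made the stimulus feel less noxious would lower d', whereas one that made the respondent less willing to call the feeling painful would shift c without changing d'.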

Although this is the original application of ROCs, similar curves are often drawn to summarize the validity of screening tests; this is because hits and false alarms are equivalent to sensitivity and 1-specificity. In this application, the area under the ROC curve indicates the discriminal ability of the instrument, ranging from 0.5 (indicating no discrimination) to 1.0 (indicating perfect discrimination).
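The area under an ROC curve can be approximated from a handful of operating points. The sketch below uses the trapezoidal rule, which is one common choice rather than a method prescribed by the text; the function name and the sample cutoffs are hypothetical. Each point pairs a false-alarm rate (1 minus specificity) with a hit rate (sensitivity).

```python
def roc_auc(points):
    """Trapezoidal area under an ROC curve.

    points: iterable of (false_alarm_rate, hit_rate) pairs, i.e.
    (1 - specificity, sensitivity). The corners (0, 0) and (1, 1)
    are added so that the curve spans the whole unit interval.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical screening test evaluated at two cutoffs,
# given as (sensitivity, specificity) pairs.
cutoffs = [(0.60, 0.90), (0.85, 0.70)]
curve = [(1.0 - spec, sens) for sens, spec in cutoffs]
print(round(roc_auc(curve), 4))
```

With no operating points beyond the two corners, the function returns 0.5, matching the no-discrimination value mentioned above.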

References

Ian McDowell. Measuring Health: A Guide to Rating Scales and Questionnaires, third edition. Oxford University Press, 2006.
