On the Neglect of Verisimilitude and Critical Multiplism in Laboratory Experiments on Advertising: A Scientific Realist Critique

Edward F. McQuarrie*

January 1996

NOTE: Under review at the Journal of Consumer Research 1/17/96. Tables not included (due to my still limited knowledge of HTML). Tables and anything else that's missing can be obtained upon request.

*Department of Marketing, Santa Clara University, Santa Clara, CA 95053; phone (408) 554-6960; fax (408) 554-5056; email emcquarrie@scuacc.scu.edu. The author would like to acknowledge the helpful comments of Thomas Cook, Shelby Hunt, John Lynch, David Mick, Marsha Richins, and William D. Wells on earlier drafts of this manuscript, and Chis Afarian and Eric Manners for their coding efforts.

Abstract

This paper introduces verisimilitude as a new perspective on the construct validity of laboratory experiments. Verisimilitude obtains when an experiment successfully reproduces those characteristics that distinctively constitute the real world phenomenon the experiment purports to address. Critical multiplism provides a complementary focus that considers the extent to which multiple convergent procedures have been applied across the experiments conducted by a discipline over the history of investigation of some phenomenon. In turn, scientific realism provides the philosophy-of-science underpinning for arguments concerning the importance of verisimilitude and critical multiplism. A content analysis of 220 laboratory experiments on advertising appearing in 154 articles published in JCR, JMR, and JM between 1980 and 1994 supports the critique. The paper discusses the threats to inference posed by the absence of verisimilitude and critical multiplism, and then considers the implications for the scientific status of consumer research.


Critical scrutiny of laboratory experiments in consumer research goes back many years (Calder, Phillips and Tybout 1981, 1982, 1983; Lynch 1982, 1983; McGrath and Brinberg 1983). The terms of the debate between Calder et al. and Lynch can in turn be traced back at least to the crisis in social psychology of the early 1970s (Gergen 1978; Harré and Secord 1972) and the enduring methodological controversies it spawned (see, e.g., Berkowitz and Donnerstein 1982; Henshel 1980; Mook 1983; Neisser 1991). With respect to those controversies this paper will stake out a middle ground, arguing that laboratory experimentation can make important contributions to theoretical understanding of advertising, but can also fail to realize its potential if verisimilitude is lacking. The specific goals are: 1) to explicate the concept of verisimilitude and argue for its importance in consumer advertising research; 2) to anchor this argument in recent debates within the philosophy of science concerning scientific realism (e.g., Greenwood 1989; Hunt 1991); and 3) to further support the argument for verisimilitude with reference to the methodological canon known as critical multiplism (Cook 1985; cf. Campbell and Fiske 1959).

VERISIMILITUDE

A laboratory experiment has verisimilitude when it successfully reproduces the distinguishing characteristics of the real phenomenon under study. Verisimilitude is a property that can be possessed to a greater or lesser degree. A substantial failure of verisimilitude would imply that the experimenter actually studied something other than the phenomenon s/he supposed was under investigation. Of course, any single laboratory experiment will typically fall short of total verisimilitude because considerations of feasibility will restrict the scope of the study so that only some of the theoretically important aspects of the real advertising phenomenon are modeled and reproduced. However, a more stringent standard can be applied to the extant population of experiments; it is at this level that the achieved verisimilitude of a discipline's efforts can be most fairly assessed.

Introduction of the new term verisimilitude is intended to highlight a specific aspect of the more fundamental idea of construct validity (Cook and Campbell 1979; Cronbach and Meehl 1955). Most discussions of construct validity in consumer research in recent years have focused on measurement models; i.e., whether some particular item, set of items, or type of response option does or does not index a theoretical construct of interest (e.g., Bagozzi 1980; Peter 1981). By contrast, verisimilitude refers to whether one's experimental procedures in fact capture, isolate, and reproduce the consumption phenomena one claims to have studied. Thus, verisimilitude is assessed by examining those portions of the methods section of an experimental paper labeled by headings such as "Stimulus" and "Procedure" rather than those labeled as "Measures." Investigation of verisimilitude implies a focus on the construct validity of cause and context as opposed to the more conventional focus on the construct validity of effects (Cook and Campbell 1979; Cook 1993). The idea of a failure of verisimilitude can thus be thought of as broadening and extending the notion of an experimental confound.

The key issue in any attempt to assess the verisimilitude of advertising experiments is whether the experimental procedures successfully reproduced, in the laboratory, the distinguishing characteristics of the advertising phenomenon as found in the world outside. To ask the question in this way is to presuppose that it is possible to fail in the effort to reproduce advertising. Note for purposes of contrast that it would be very difficult to fail to create a "message," if the experimenter's goal were only to study persuasive messages, or to fail to create "information," if the goal were only to study information processing, or to fail to create a "verbal stimulus," if the goal were only to study the response of organisms to verbal stimuli.

The more specific and concrete the real phenomenon under study, the more pertinent concerns about verisimilitude become. Because advertisements are a particular kind of persuasive message embedded in a specific constellation of social relations (buying and selling), a failure to reproduce essential aspects is quite possible. If the stimuli are not ads, or if other aspects of the task, stimuli, and context do not instantiate advertising, then to just that extent the experiment will lack verisimilitude as an advertising experiment. It may of course possess quite a bit of verisimilitude when considered from the standpoint of experimental attempts to induce compliance or puzzle-solving behavior (Sears 1986).

How can we know whether an experimental stimulus is actually an ad or really something else? This may be termed the identity problem (Greenwood 1989). Alternatively, why should this matter? Everything that is an ad is also a message, and whatever else an ad is, it is certainly information in the form of a verbal stimulus. Why not just study messages and information, since whatever is found to be true of messages and information processing will perforce be true of advertising? This uncovers a uniformity problem which is the obverse of the identity problem: specifically, which differences between phenomena are superficial and can be ignored, and which are fundamental and must be taken into account by a scientific explanation? In other words, the more uniform the domain of persuasion, the less important it is to reproduce any specific persuasion phenomenon in the course of an experimental test of persuasion theory; so long as the domain is held to be uniform, any convenient instance of a persuasive message will suffice.

These connected problems of identity and uniformity raise the question of how much importance should be placed on the particulars of phenomena when a scientific explanation is sought. If instead the common properties that unite a wide range of particular phenomena are really the key to scientific explanation, then it matters little whether experimenters reproduce "advertising" in the laboratory, so long as they reproduce the fact of a message, or information, or whatever that common property may be. But if in actuality we live in a world of "powerful particulars," in Harré and Madden's (1975) terms, and if the goal of science is to explain phenomena with respect to the causal powers of these particulars, then it matters a great deal whether one successfully reproduces in the laboratory that distinctive particular known as advertising. For if what is reproduced in the lab is not advertising, one is no better off than the drunk looking for his keys under the lamppost -- not because he lost them there, but because the light is better there. Success in creating information or messages or verbal stimuli in the laboratory is nearly guaranteed; but a good light helps us little if what we seek lies elsewhere.

Verisimilitude and Philosophy of Science

Verisimilitude makes sense only from within the perspective known as scientific realism (Hunt 1991), and within scientific realism it builds most directly on social constitution theory (Greenwood 1989). Many experimentally inclined consumer researchers hold to a different philosophy of science, empiricism of some kind, which is not realist at all but is heavily shaped by the idealism and skepticism of Hume (Greenwood 1989; Harré and Madden 1975; Hunt 1991). While space does not permit a full exposition of this controversy (see in addition Suppe 1977), it is important to ground discussion of the issues of identity and uniformity in the underlying debate between philosophical schools.

Consumer researchers trained in traditional approaches to experimentation will often hold to a number of basic views: 1) There is a deep divide between theory and observation; 2) theoretical terms are abstract universals employed in the delineation of general laws, of which E = mc² is an exemplar; 3) causality is essentially unknowable -- per Hume, all we have is a flux of events (Harré and Madden 1975); 4) from observation of regularities within this flux of events, we can glean the material for general laws; and 5) as a rule it will be desirable to decompose these events into their smallest and most basic elements, for this is where laws of the greatest generality are likely to be found. Given this constellation of views, especially the unobservable and highly abstract nature of theoretical terms and the atomistic nature of phenomena, it is easy to see how the particulars of what happens in the laboratory might be considered of secondary importance. From this perspective the core focus of science is on abstract ideas, which when combined with other such ideas into a nomological net generate predictions about observations which experimenters attempt to falsify (Popper 1959). Because a scientific theory applies universally, any message, any assemblage of information, any observation within the theory's domain suffices to provide an opportunity for falsification.

Scientific realists tend to hold rather different views: 1) Theoretical entities refer to real things; 2) those real things -- powerful particulars -- act by natural necessity to cause events to occur; 3) causal explanations, and not lawful regularities, are the proper focus of science (Greenwood 1989); 4) science advances by intervening to make things happen and then developing a parsimonious explanation for what happened (Hacking 1983); and 5) it is the long run success of science as evidenced by changes in the world, and not any formal or logical property of scientific statements (i.e., their statement in the form of general laws), that warrants acceptance of scientific theories (Hunt 1991). The implications of these views for experimentation are also clear: one has to get the particulars right. The purpose of experimentation is to construct a closed system in which a cause and effect relationship can be unambiguously identified, and as part of this effort one must reproduce within that closed system that powerful particular that is believed to function as a cause. Moreover, the proper level of aggregation--whether the power of the particular resides at the level of basic element or complex whole--is wholly contingent and has to be decided on a case-by-case basis.

Social constitution theory is the result of applying scientific realism within the domain of social psychological phenomena broadly conceived. Greenwood (1989) delineates the general requirements for the successful reproduction of a powerful social psychological particular within the laboratory in terms of representations and social relations, each of which carries a specific technical meaning. Thus, "to represent" does not mean "to consciously recognize or correctly classify." Better, albeit inexact, synonyms for representation would be "the schema the subject applies," or "how the subject construes the experimental situation." Sternthal, Tybout and Calder (1987, p. 118) give an example of the power of representations in laboratory experiments when they discuss how a no-treatment control group can fail its purpose because "in the absence of an intervention, subjects may administer their own treatment"--in other words, construct a representation of the situation to which they then respond. No conscious recognition of the representations being made need be assumed in either Greenwood's account or Sternthal et al.'s example. However, Greenwood would go further and argue that experimental treatments are always self-administered in an important sense. It is the treatment as represented by the subject, and not the treatment as defined by the experimenter, that constitutes the treatment actually implemented. To continue, "social relations" refers not so much to actual interactions with other people, as to the social and cultural meanings attached to situations. These may be thought of as collectively held representations concerning repetitively encountered situations--a set of situational schemata.

From the standpoint of social constitution theory, we do not have an experiment on advertising unless participants represent the stimuli as ads, and this in turn generally requires a context of relations that supports that representation. In simpler terms, to have a valid social psychological experiment we must control as much as possible the meanings of the experimental events as represented by the participants in the experiment; and, we are unlikely effectively to control those meanings without the aid of the kind of social relations that undergird those meanings outside the laboratory. To recapitulate the identity problem: the phenomenon isolated in the laboratory may not be the phenomenon we set out to study, if, in our attempt to isolate this phenomenon, a neglect of representations leads to its alteration, contamination, or even replacement by other phenomena.

What then identifies an element of the experimental apparatus as an advertisement? Everything hinges on how this question is answered, so let us consider an obviously incorrect reply that might be termed the Humpty Dumpty fallacy: "An ad is anything an experimenter says is an ad." This cannot be right, because some things in this world are ads and some are not; advertising is a very concrete and specific sort of persuasion phenomenon. Now compare this next statement, obtained by substituting 'stimulus' for 'ad': "a stimulus is anything an experimenter says is a stimulus." This can be made false, but only by a stretch; it is true most of the time. Here again verisimilitude may not be a problem for scientists who conceive their task as the study of the responses of organisms to stimuli; for wherever they are and whatever they are doing, organisms are encountering stimuli to which they respond.

Next, consider yet another substitution, applied now to the preceding statement about organisms, stimuli, and responses: "people are always acting as consumers processing information from their environment so as to learn about brands in preparation for consumption behavior"--hence, whenever we present advocacy information from a brand sponsor we automatically have a valid instance of advertising. Is this a true statement? A social constitution theorist must answer no. A stimulus is only an ad if the person who encounters it represents it to be an ad. If the subject in an experiment instead represents the situation (whether consciously or not) as "one more stupid task some professor wants me to comply with," or "what an interesting puzzle," or "how can I play this game without embarrassing myself," then there is no ad, for that subject. Advertising is a socially constituted phenomenon.

Here we must navigate between a shoal and a deep abyss. Consider the truth value of this next statement: "anything a participant in an experiment represents as an ad, is an ad." This may be true but is certainly disquieting. Taken in isolation, it suggests a thoroughgoing subjectivity incompatible with the objective search for truth that motivates the conduct of experiments in the first place (Hunt 1994). A great increase in comfort results if we add a qualifier, to wit: "some stimuli are more likely to be represented by subjects as ads because representations are acquired within a context of social relations, and these social relations include heuristics for determining what kind of stimuli ought to be represented as an ad." Hence, the probability that subjects will in fact represent some element of the experiment as an ad can be estimated by examining the stimulus array, response task, context, and other factors not tied up with the subjectivity of each individual participant in the experiment. In other words, representations are constrained by factors outside the person; specifically, representations are socially situated, partly learned, and not arbitrary products of whim and caprice.

We can ask a person in an experiment to represent something as an ad and themselves as consumers, but whether s/he will succeed in making these representations is not entirely up to him or her (or us). This is where the need to assess verisimilitude arises. By examining stimulus materials and experimental procedures, we can come to a judgment of the likelihood that the participants in that experiment did or did not make the kinds of representations they make when in reality they are acting as consumers within an advertising encounter; or whether indeed they made some other representations, perhaps more akin to those they make when acting as students completing required university assignments. Of course, if what consumers in the world do is fundamentally the same as what students in the classroom do, or if the ideas that explain classroom behavior apply universally so that they explain consumer behavior as well, then there is no verisimilitude problem for consumer advertising research.

Clearly, every behavioral context differs to some degree from every other (Calder et al. 1983), but superficial differences should not concern us as scientists (Berkowitz and Donnerstein 1982). Mundane realism is in any case impossible to achieve (Aronson, Brewer and Carlsmith 1985). Hence, based on an underlying commitment to abstract universals as the core of science, the empiricist consumer researcher is inclined to assume that differences in subjects, stimuli, tasks, and contexts are superficial and that the phenomena of interest will be uniform throughout a broad domain. The tacit model here, as for many psychologists throughout this century, is the physical sciences, in particular classical mechanics. For example, although billiard balls will move very differently when placed on an asphalt driveway rather than a pool table, the fundamental laws governing what happens when a cue ball strikes a rack, stated in terms of abstract universals such as friction, mass, and velocity, are not different in the two cases. So also, it is held, with the behavior of adult consumers choosing to read a magazine at home after work, as compared to student subjects following instructions to read a booklet in a classroom. Both cases can be subsumed under "information processing" or "message reception," and the same fundamental laws will apply uniformly.

By contrast, the scientific realist is entirely agnostic about whether differences in subjects, stimuli, tasks, or contexts will be consequential. It all depends on what particular happens to possess the causal power in the domain under study. There is no rule that says the smallest element or widespread common property will most often be the causal agent, nor any rule that states the holistic context or surrounding field will be the causal agent. The whole point of experimentation is to find out. The social constitution theorist goes a bit further, and is deeply suspicious of any dismissal of an obvious disparity in social relations. In direct opposition to the empiricist, the social constitutionist has a strong distaste for contentions that people behave uniformly whoever they are or wherever they are, and wants always to know what specific representations were made in the circumstances at hand by the individuals under study.

With these contrasting philosophical positions laid out, we can inquire more deeply into the justification for pursuing experimental verisimilitude. More pointedly, because critical discussion of laboratory experimentation has a long and tangled history, the risk is very real that verisimilitude will be misunderstood as simply a recapitulation of stale arguments concerning external validity, representative sampling, the desirability of field experiments, the value of naturalistic observations, and sundry other distractions from the central point of this paper, which concerns the construct validity of experimental procedures. The distinctive features of the present critique may come more clearly into view if we revisit an earlier debate between Calder et al. (1981, 1982, 1983) and Lynch (1982, 1983), wherein Calder et al. propounded a vigorous defense of both the use of student subjects and the use of artificial stimuli, tasks, and contexts that bear little resemblance to the real phenomena of advertising and consumption.

THE ARTIFICIALITY THESIS

Defense of Artificiality in Laboratory Experiments

The basic premise underlying the Artificiality Thesis rests on a distinction between theory research and effects or applied research (Calder et al. 1981; Mook 1983). A falsificationist position provides the underlying justification for the special prerogatives accorded to theory research (Calder et al. 1981; Hunt 1991; Popper 1959). That is, a scientific theory consists of statements that are in principle falsifiable. Because theories are universal in their application (Calder et al. 1981), it follows that any population, stimulus or task within their domain can be employed in falsification procedures. A failure to find the relationship postulated by the theory in the case of students processing unrealistic stimuli in artificial contexts still amounts to a falsification, and every such falsification allows for advances in the scientific study of consumer behavior.

From this perspective artificial contexts are not only adequate for but actually preferred in tests of theory (Calder et al. 1982). On this view, the more homogeneous the sample, the lower will be the error variance and the better the statistical conclusion validity. Hence, student samples are attractive precisely because of their supposed high degree of homogeneity (Sternthal, Tybout and Calder 1994). Similarly, the more artificial the environment, the fewer the extraneous sources of variance, and the more artificial the stimulus or task, the greater the probability that these elements will instance exactly the theoretical constructs at issue. Because rigor is a function of control, and because control is often greater in artificial environments, it follows that laboratory experiments are the preferred vehicle for theory tests (Mook 1983).
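The error-variance step in this argument can be made explicit with the familiar two-sample t statistic, offered here only as an illustration (the cited papers do not state it in this form). Assume a treatment-control comparison with n participants per cell, cell means \bar{X}_T and \bar{X}_C, and pooled within-cell standard deviation s_p:

```latex
t = \frac{\bar{X}_T - \bar{X}_C}{s_p \sqrt{2/n}}
```

Holding the mean difference and n constant, halving s_p doubles t, which is why a homogeneous sample looks attractive to the Artificialist. The formula is silent, however, on whether the manipulated difference instantiates the construct the experimenter intends.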

The strong Artificialist rejects most criticisms of laboratory experiments as misunderstandings that rest on an inappropriate concern with external validity. To the Artificialist, external validity is irrelevant to experimental design for theory research; the point of such experiments is not to predict what will happen in some part of the world--that is the task of applied research--but to provide the most rigorous opportunity for unambiguous falsification that one can arrange (for an opposing view, see Wells 1993). In the broadest sense, Artificialists hold that it is not the measurement values or data relations found in laboratory experiments that generalize, but the theory itself (Mook 1983). If the theory has survived repeated rigorous attempts at falsification, then in view of its universal nature the relationships it postulates can be expected to have general applicability.

Calder et al. (1982, 1983) specifically reject a position they attribute to Lynch (1982), whereby a laboratory experimenter should attempt to identify and include background factors that might interact with theoretical variables in order to improve the external validity of the experiment by identifying the boundaries or the robustness of the theoretical prediction. Calder et al. find such a position to be either incoherent or a "counsel of despair." The number of background factors that could potentially interact with theoretical variables is unlimited. If the rejoinder is that one picks only those background variables that one has theoretical reason to believe might be important, Calder et al. retort that in that case the "background" variable is really a theoretical variable, and their position stands. Absent a theory concerning the effect of some "background" variable, it is best to ignore it. In fact, as soon as one introduces the idea of generalization from the laboratory to a real world setting, or considers how to assess the robustness of a theory, one has abandoned the enterprise of theory testing--and designing good theory tests is the only concern of the Artificialist.

Critique of the Artificiality Thesis

Advocates of verisimilitude share a belief in the superiority of laboratory experiments for testing causal explanations (see, e.g., Greenwood 1989). Moreover, advocates of verisimilitude have no difficulty with artificiality per se, inasmuch as every laboratory experiment must be artificial in part (Aronson et al. 1985). The key issue is simply whether, in creating an artificial environment in pursuit of experimental closure, one has abstracted essentials or instead substituted a different phenomenon altogether. By analogy, experiments using a purified sample of iron isolated within a test tube will generally be much more revealing than experiments restricted to using iron ore exactly as found on the ground; but experiments where copper has been unwittingly substituted generally won't reveal much about iron at all.

Rather, the fundamental weakness that undermines the Artificiality Thesis lies in the claim that theories are universal in their domain. Lynch (1983) made a telling point when he pointed to the history of research on memory, where for many years the only stimuli used were lists of nonsense syllables. Such stimuli would appear to be ideal from the standpoint of theory research, inasmuch as they correspond to the desiderata set forth by Calder et al., being homogeneous, free of extraneous sources of variation, etc. And by the logic outlined earlier, a universal theory of memory has perforce to apply to nonsense syllables and can be considered falsified if its predictions fail to be supported when tested using nonsense syllables. Lynch's point was that a whole series of breakthroughs in memory theory only became possible when meaningful stimuli were admitted to the laboratory. In our terms, the memory domain proved not to be uniform.

The artificiality of laboratory experiments is thus beside the point; it is the assumption of uniformity that exerts a corrupting influence. For the scientific realist, all that matters is whether the powerful particular was successfully reproduced under conditions of experimental closure (i.e., conditions that maximize the ability to eliminate rival explanations of results). If the powerful particular happens to be a basic element, then so long as that element is reproduced it matters little if it has been stripped out of its natural context; the artificial context of the laboratory doesn't matter because context doesn't matter, in this case. If, on the other hand, the powerful particular resides at the contextual level--if it is only when rhymes are embedded in meaningful text that they cause superior recall, as recently found by Rubin and Wallace (1989)--then that context must be reproduced in the laboratory. The mistake that bedeviled research on rhyme recall for decades was not the use of an artificial setting, but the gratuitous assumption that mechanisms underlying recall for rhymes would function uniformly across both meaningful and nonsense rhymes. Similarly, what disturbs the social constitution theorist is not the artificiality of the laboratory experiment, but the assumption that representations and social relations need not be considered in the design of such experiments. With regard to the earlier discussion of background vs. theoretical variables, the social constitutionist holds to the general theory that participant representations and experienced social relations will typically be powerful particulars in any social psychological experiment.

CRITICAL MULTIPLISM

The falsificationist argument in the particular form pressed by the Artificialist thus rests crucially on assumptions about the uniformity of behavioral domains. As Ulric Neisser put it, "They obviously believe, as almost everyone once did, that psychology should seek universal context-free laws like those of the classical sciences" (Neisser 1991, p. 35). With the problematic nature of uniformity assumptions made salient, the perspective of critical multiplism can be introduced to supplement the verisimilitude critique (McGrath and Brinberg 1983; Cook 1985).

No one today could reasonably defend the use of a single paper-and-pencil item as a superior approach to measuring a theoretical construct. Ever since Campbell and Fiske (1959) the desirability of multiple measures has been a given. We accept that a single measurement is likely to represent to some degree the construct under study, to some degree random error, and to some degree systematic error consequent to response format, word choice, assimilation to other constructs, etc. Unfortunately, as one moves from the "Measures" section of an experimental paper to the "Subjects" and "Procedures" sections the commitment to multiplism disappears (O'Grady 1982). One task, one population, and one context are now regarded as perfectly adequate for a rigorous theory test. When in turn most laboratory experiments in an area rely on the same singular procedural choices, then from the perspective of critical multiplism it becomes increasingly probable that research in the field will achieve only a local optimum, resulting in scientific stagnation. Because the evidentiary base stays narrow and restricted, we fall short of the "ingenious and severe attempts at refutation" called for by Popper. The theories at issue actually become sheltered from falsification rather than exposed, as investigators unknowingly optimize constructs and mechanisms for the systematic variance specific to the stimuli (e.g., nonsense syllables) that become enshrined in "narrow paradigmatic conventions," to use Lynch's (1983) phrase.
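The difference between repeating one procedure and varying procedures can be sketched with a toy simulation. The model is an assumption made here for illustration, not drawn from any cited study: each observed score equals the true value plus a fixed systematic bias specific to that measure (or task, or context) plus random noise, and a composite is the average across measures. The bias values and the helper name `composite_error` are hypothetical.

```python
import random
import statistics


def composite_error(true_value, biases, noise_sd, n_reps=10_000, seed=0):
    """Mean absolute error of a composite built by averaging several measures.

    Hypothetical toy model: measure j yields
        observed = true_value + biases[j] + Gaussian noise(sd = noise_sd),
    where biases[j] stands for that measure's systematic (method) error.
    """
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    abs_errors = []
    for _ in range(n_reps):
        observed = [true_value + b + rng.gauss(0.0, noise_sd) for b in biases]
        composite = statistics.fmean(observed)
        abs_errors.append(abs(composite - true_value))
    return statistics.fmean(abs_errors)


TRUE_VALUE = 5.0
NOISE_SD = 0.5

# Monomethod: five measures that all share the same systematic bias.
mono_biases = [0.8] * 5
# Multimethod: five measures whose biases differ and (by assumption) average to zero.
multi_biases = [0.8, -0.6, 0.3, -0.7, 0.2]

mono_err = composite_error(TRUE_VALUE, mono_biases, NOISE_SD)
multi_err = composite_error(TRUE_VALUE, multi_biases, NOISE_SD)

print(f"monomethod composite error:  {mono_err:.2f}")
print(f"multimethod composite error: {multi_err:.2f}")
```

Averaging five measures of the same kind shrinks the random noise but leaves the shared bias of 0.8 untouched; only when the measures differ can their biases offset one another. The offsetting depends on the assumption that the biases do not all point the same way, which is exactly what a discipline relying on one task, one population, and one context cannot check.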

Apropos of Neisser's remark, an historical explanation can be given for the tendency toward narrow paradigmatic conventions in laboratory research. A long critical tradition has diagnosed the recurrence, in psychological research, of inappropriate extrapolation from, and even misunderstanding of, practices in the physical sciences (Campbell 1984; Cattell 1988; Danziger 1990; Toulmin and Leary 1986). The very tempting analogy runs as follows: just as iron ore first smelted and then isolated in a test tube becomes far more amenable to tests of chemical theory, so also 20-year-old humans--screened for high levels of intelligence, separated from their families, having little in the way of household expenditures and few financial responsibilities, isolated in a white-walled room, shown typed pages stripped of context containing strange names, and exhorted to think carefully--should similarly represent purified instances of "consumers" and "advertising." Now without a doubt iron isolated in a test tube is more purely iron than the ore found on the ground. The fateful extrapolation was to assume that the student holding a booklet in the classroom was likewise a purer, less contaminated instance of a consumer encountering advertising. Because the analogy to the physical sciences was so compelling, the pursuit of purification unfortunately overlooked the opposing risk of impoverishment: narrow paradigmatic conventions that neglect the role of representations and social relations.

From the perspective of critical multiplism the student, the black and white printed page, and the high involvement setting are not purified exemplars but simply fallible single-item measures of the reality we seek to comprehend. The risk with carelessly designed social psychological experiments (and it is a risk, not an inevitability) is that what the experimenter achieves is not test tube isolation but simply another kind of raw ore of a different composition embedded in another sort of matrix (Danziger 1990). The dilemma for consumer researchers and other social psychologists is sharp: isolation of some kind is crucial to effective causal explanation because it facilitates the elimination of plausible rival explanations. In response, the critical multiplist suggests that if isolation cannot be achieved at the micro level, in terms of a single "purified" laboratory setting, then one should seek (logical) isolation at the macro level (Kruglanski and Kroy 1975). That is, one should proceed to create multiple distinct and deliberately chosen tasks, stimuli, and contexts, all contaminated--but in different respects--which in combination allow one to derive a unique explanation for the causal particular at work. Support for such an endeavor comes from recent work by Brinberg, Lynch and Sawyer (1992), which provides a formal statement of how the findings of multiple confounded experiments can be synthesized.
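The logic of isolation at the macro level can be shown with a deliberately simplified toy. It is not the formal synthesis developed by Brinberg, Lynch and Sawyer (1992); it merely assumes that each experiment's results are consistent with the focal theory plus whatever rival explanation its own contaminant leaves open. All setting and artifact labels are hypothetical.

```python
from functools import reduce


def surviving_explanations(results: dict[str, set[str]]) -> set[str]:
    """Explanations consistent with the results of every experiment in the set."""
    return reduce(set.intersection, results.values())


# Hypothetical: each experiment is contaminated, but in a different respect.
diverse = {
    "students, classroom, print booklet": {"focal theory", "forced-attention artifact"},
    "adults, office building, embedded tv ad": {"focal theory", "employee-compliance artifact"},
    "household panel, delayed measure": {"focal theory", "panel-conditioning artifact"},
}

# Hypothetical: three replications that all share one paradigmatic convention.
uniform = {
    f"replication {i}": {"focal theory", "forced-attention artifact"} for i in range(1, 4)
}

print(surviving_explanations(diverse))  # only the focal theory survives
print(surviving_explanations(uniform))  # the shared artifact survives as well
```

No single experiment in the diverse set is purified; a unique explanation emerges only because the contaminants differ. The uniform replications, however many there are, cannot eliminate the artifact they share.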

In one sense the critical multiplist takes an argument deployed in the Sternthal, Tybout and Calder (1987) paper and turns it against the Artificialist position laid out in the earlier Calder et al. (1981, 1982, 1983) papers, diagnosing the earlier papers as subject in part to the fallacies associated with a confirmatory approach to theory-testing. Under the comparative approach advanced in Sternthal et al. (1987), manipulation checks, confound checks and similar experimental accoutrements are strictly speaking unnecessary to a good theory test. All that matters is that data be gathered that permit a unique explanation. Good theory tests are those that allow one theory to emerge as the best available explanation, while poor tests are those that admit of several viable alternative explanations. By extension, thinks the critical multiplist, laboratory experiments in general, and even more so the highly constrained sorts that now populate the literature, cannot be a necessary feature of theory testing. Artificial settings will sometimes be useful because they can be constructed so as to minimize the threat from certain kinds of rival explanations. But when only one sort of artificial setting is used in a discipline, and when it is confounded by a lack of verisimilitude, then one has failed to achieve the overarching goal in theory testing, which is to collect data so as to eliminate or minimize rival explanations. Hence, highly artificial laboratory experiments in the context of theory testing can be equated to manipulation checks within experimental design: they have no privileged place but can be useful. The problem begins, as critical multiplism reminds us, when such experiments predominate to the exclusion of others.

The comparative approach to theory testing of Sternthal et al. shows the limits of any confirmatory approach that would valorize specific procedures such as manipulation checks (or artificial laboratory experiments, in the present analysis), arguing instead that the only general desideratum is rigor--the capacity to eliminate rival explanations. Now rigor in theory tests simply means that all rival explanations except one are rendered untenable, given the data available. Critical multiplism in turn shows the limits of the comparative approach: it is only as rigorous as the data are multiplist. If the datastream within a discipline is systematically censored then "rigorous" theory tests cannot prevent stagnation in the form of a local optimum. Rigor is a means to an end--to provide an "explanation for phenomena," as Sternthal et al. (1987, p. 115) put it--and critical multiplism puts the focus back on the phenomena by highlighting the diversity therein and examining how much of that diversity has come into the laboratory.

The Artificialist Thesis depends on the proposed unique conjunction of highly artificial laboratory settings with crucially revealing tests of theory. The destruction of this conjunction clears the way for an assessment of verisimilitude. Once it is grasped that laboratory experiments are simply more grist for the theory comparison mill, and not a transparent window onto theoretical entities (as follows from Sternthal et al. 1987, p. 118), one can get on with the task of designing the best possible experiments. One has to explode the myth of laboratory purification before one can intelligently design experiments whose treatments and conditions, albeit all irredeemably contaminated, nonetheless in combination allow for a unique explanation, because the contaminants are multiply diverse.

Had experimental methodology in consumer advertising research evolved to be diverse and multi-faceted, then worries about verisimilitude might only be hypothetical. If consumer researchers routinely followed a strategy of triangulation or critical multiplism, so that sometimes experiments used students in a classroom, and at other times adults invited to an office building; and sometimes used immediate attitude ratings and at other times delayed measures of behavior; and sometimes used fictitious brands and at other times familiar real brands; then, we could empirically determine over time which differences in subjects, stimuli, tasks, and contexts were superficial and which fundamental, and which possible violations of verisimilitude were trivial and which serious. Note, however, that verisimilitude is not the same concept as critical multiplism. Even if we were to begin to conduct advertising experiments in which adults were invited to an office building, we would still have to ask about representations and social relations--else, instead of reproducing "consumers" and "advertising" in working with these adults, we may only produce citizens dutifully completing bureaucratic paperwork, or temporary employees complying with the demands of their employers.

In sum, critical multiplism addresses construct validity by treating subjects, stimuli, tasks, and contexts as measures subject to the Campbell and Fiske (1959) dicta. Hence, the problem with a discipline such as consumer research, in which laboratory experiments fixate on artificial stimuli, fictitious brands, immediate attitude measures and high levels of attention, is the same problem as when a single agree-disagree item is used to measure a complex construct. In turn, verisimilitude points to one of the most important confounds to which these "single-item" artificial settings, stimuli, and tasks have generally been prone in consumer experimentation: a neglect of participant representations and social relations.

HOW VERISIMILITUDE DIFFERS FROM EXTERNAL VALIDITY

Artificialists will want to rebut the argument of this paper by claiming that verisimilitude is just external validity in another guise. Hence, a more explicit discussion of the contrast between external validity and verisimilitude--understood as an aspect of construct validity--is required. First, however, it has to be understood that in the original Campbell and Stanley (1966) treatment, and once again in Campbell (1986), construct validity and external validity are lumped together, so that their clean separation only appears in Cook and Campbell (1979; see also Cook 1991, 1993). As a further complication, note that while Cronbach (1982) uses the terms 'internal' and 'external' validity in quite different and sometimes opposite ways from the Campbellian tradition, he also does not want to separate external validity from construct validity. Last but not least, Campbell's (1986) most recent discussion of these matters "suggests growing dissatisfaction about the clarity or content of his validity types. However worthy the underlying ideas, the terms themselves cannot be considered as unproblematic and as sacrosanct as they once were" (Cook and Shadish 1994, p. 552; cf. Cook 1991).

The following account perforce assumes an Artificialist who subscribes to the restricted sense of external validity given in Cook and Campbell (1979); this appears to accurately characterize the position of Calder et al. (1981, 1982, 1983). The question heading this section cannot even be asked, much less answered, if one adheres to the position of Cronbach or that of the early and late Campbell, for whom verisimilitude, qua construct validity, is part of external validity. The gist of the argument presented below is that the concept of construct validity, while closely entangled with that of external validity, can be effectively and usefully distinguished. It emerges, first, that the presence of construct validity guarantees a minimum degree of external validity; second, that greater construct validity promotes on average greater external validity; but third, that the presence of adequate construct validity says nothing about the degree of external validity possessed by an experiment in absolute terms, or whether external validity will obtain for some particular case in the world outside the experiment. More crucially, it also emerges that construct validity cannot be separated from internal validity when theory-testing is the goal of the experiment. Overall, the argument proceeds on the assumption that verisimilitude is just a special case of construct validity, so that a successful distinction of construct validity from external validity will automatically apply to verisimilitude as well. Proceeding at this general level is more powerful, and it also allows critical multiplism to be integrated into the argument at its conclusion.

Differentiation of Construct Validity from External Validity

Within the Cook and Campbell account a convenient way of distinguishing construct validity from external validity is to conceive of each as involving a kind of generalization, in contrast to internal validity and statistical conclusion validity, which concern more the reliability, certainty or trustworthiness of some specific finding of causal influence. Specifically, to have construct validity we must be able to generalize from some particular operation to the underlying theoretical construct of interest. To have in addition external validity, we have to be able to generalize from the particular operation studied across other operations, with the understanding that these other operations may: 1) incorporate a different set of background (i.e., non-theoretical) variables in addition to the theoretical construct shared with the initial operation; or 2) index only a portion of the theoretical construct studied, or reflect in addition other theoretical constructs linked to the focal construct; or even 3) reflect other, separate theoretical constructs altogether.

Now if internal validity has been secured, this means that we accept as true that "treatment (t) causes observation (o)." If construct validity also obtains, we may further accept that t, the treatment operation, effectively indexes T, the theoretical construct of interest. When external validity is also claimed, then we are asked to accept one or more of the following:

1) t1 will cause o, t2 will cause o, ... tk will cause o, where these treatment variants are also held to index the construct T, varying only in the constellation of background factors present;

2) tα will cause o, tβ will cause o, ... tκ will cause o, where these treatment variants either index only a portion of the construct T, or simultaneously index an associated theoretical construct S;

3) x1 will cause o, x2 will cause o, ... xk will cause o, where x indexes a different theoretical construct altogether.

The first of these senses of external validity can be thought of as generalizability in the simple sense that the experimental finding is not some kind of curiosity or strictly local phenomenon, while the second and third senses can be thought of as generalizability in the more complex sense of robustness (McGrath and Brinberg 1983).

In point of fact construct validity is nearly impossible to separate from external validity in the first sense (which may be why Campbell and Cronbach do not wish to separate them). Theories are supposed to be general (a more modest word than universal). If a theory only explains one entirely local and exact operation--if the test of it has no external validity at all in the first sense--how good a theory can it be? Similarly, if there is no external validity in this first sense, it is unclear how the treatment operation can even be said to index a theoretical construct, as opposed to simply indexing itself. Conversely, if a treatment operation does successfully index a construct, then a finding of causal efficacy for that particular treatment warrants the inference that any other operation that also indexes the theoretical construct will also be causally efficacious (i.e., some minimal amount of external validity must obtain). In short, it appears that if we are to successfully distinguish construct validity from external validity, we must focus on the second and third meanings given above.

Even when robustness is the focus, it is still easy to commingle notions of construct validity and external validity, inasmuch as a failure of construct validity will almost always entail lower levels of, or spottier performance in terms of, external validity, as compared to a more construct-valid experiment. Thus, when there has been a failure of construct validity, the finding produced in the laboratory is not really "t (T) causes o," but only "tα causes o," or even "xn causes o." Generality is lost when poor indicators of a construct--operations that are overly contaminated or too distantly related--are used. There is no reason to expect the same degree of generality or robustness from tests of tα operations or x operations as from tests of treatments that more effectively index the construct T, unless, of course, there is a high degree of uniformity across the domains of T, T-associated constructs S, and other-than-T constructs X. If the phenomenon we are studying is "stimulus-response linkages" then, perhaps, anything goes; otherwise we must judge the fitness of the treatment operations chosen to serve as indicators of the construct under study.

Now when construct validity fails to obtain, it is generally because of either "construct underrepresentation" or "surplus construct irrelevancies" (Cook and Campbell 1979, p. 64). Either our treatment operation did not capture enough of the theoretical construct, or it captured too many other things as well. The first would include laboratory operations that are not quite ads or advertising; the second, operations that are not just advertising but incorporate other constructs as well. We may also conceive of a third source of construct invalidity, which might be termed construct misrepresentation, as when classroom behavior is referred to as 'advertising,' thus conflating the phenomena of education and persuasion.

Construct underrepresentation probably has its most baneful effect when the experiment produces some null results; this may lead to Type II errors as the theoretical construct, which was only fragmentarily instantiated, becomes unjustly associated with a negative test result. For instance, if any of the theoretically interesting stimulus characteristics investigated in advertising research require nonfocal attention, repetition over time, and delay in measurement in order to produce their effects, then theories about such stimulus characteristics would probably be "falsified" by experiments that fail to instantiate such features. Surplus construct irrelevancies will also mislead, because results that were intended to be general over some range t1 to tk*, and tα to tκ*, are in fact only general over that particular subset of operations that also happen to index the quite distinct theoretical construct S (say, tγ and tε), or that also happen to incorporate a certain background variable (say, t5 and t7 only). For example, causal explanations that are claimed to apply to advertising generally may in fact only apply to those cases where a message recipient both focuses closely on the message and needs to make only the minimal commitment that checking a box on a paper form entails. Finally, construct misrepresentation makes it impossible to accumulate knowledge at all, because it is no longer clear whether any particular study is addressing the same phenomenon as any other study.

Implications for Theory-Testing

The Artificialist is dedicated to the enterprise of theory testing. For that specific kind of research, the analysis just given suggests that construct validity and internal validity are both inseparable and exactly equal in importance. More precisely, it does no good to establish true knowledge of causal operations (internal validity) if these causal operations are misspecified with respect to the theory under test (construct validity). The insufficiency of internal validity in fashioning a good theory test is a consequence of the completely atheoretical character of internal validity as conceptualized by Campbell (a point emphasized in both Campbell 1986 and Cronbach 1982). We can of course imagine kinds of basic research other than theory testing where internal validity would take precedence over construct validity (cf. Cook and Campbell 1979, p. 83), but this doesn't help the Artificialist. Theory tests must label treatment operations, and in the act of labeling such tests compel questions about construct validity. To close the loop: verisimilitude becomes a concern from the moment an experimental stimulus is labeled as an advertisement.

A concomitant of this analysis is that verisimilitude cannot secure external validity for an experiment; for although a lack of construct validity can impede external validity, the presence of adequate construct validity says nothing specific about the extent of external validity secured. That is because no design methodology can do for external validity what random assignment can (almost) do for internal validity (cf. Cronbach 1982). As clearly articulated in McGrath and Brinberg (1983), no single experiment, laboratory or field, high or low in verisimilitude, carries within itself any information about its external validity in the restricted sense of Cook and Campbell (1979). Range and robustness can only emerge from multiple experiments--and do so most efficiently when the canon of critical multiplism is applied.

Finally, what about nomological validity, positioned by Cronbach and Meehl (1955) as the ultimate ground or warrant for judgements of construct validity? On this analysis a large scale failure of verisimilitude makes it impossible to assess the nomological validity of a research tradition such as experimental work on advertising. Nomological validity is built as diverse, partially valid operations converge and cross-validate in such a manner that rival explanations are minimized. But if the stimuli aren't really ads, the procedures don't really create an advertising encounter, the observations don't measure what advertising really does, and all of this takes place in an uncritically singular context, then nomological validity, with respect to advertising, has no chance to emerge.

In summary, verisimilitude concerns construct validity as opposed to external validity in the specific sense of Cook and Campbell (1979). For the scientific study of everyday social psychological phenomena such as advertising, verisimilitude controls many threats to construct validity in the same way that random assignment controls many threats to internal validity. In turn, critical multiplism acts still more broadly to control threats to both construct validity and external validity, by actually instantiating multiple treatment operations. Both are imperative if consumer research is to accumulate true knowledge of advertising.

DATA COLLECTION

Research Question

A content analysis of published experiments on advertising was undertaken to assess the achieved verisimilitude of consumer advertising research and the extent to which critical multiplism has characterized the efforts of the field. The content analysis incorporates laboratory experiments on advertising published in the Journal of Consumer Research, Journal of Marketing Research, and Journal of Marketing from 1980 to 1994 inclusive (in view of the critical intent of this paper, it seemed appropriate to restrict the analysis to these three journals, inasmuch as they represent the leading publications dealing with behavioral aspects of marketing and consumer research). The implicit claim of the preceding sections has been that high verisimilitude is rare and critical multiplism nearly absent in published studies. If examination of a large sample of studies were to show otherwise, then the critique would lose much of its force. Conversely, a demonstration of exactly how rare verisimilitude and critical multiplism are in consumer advertising research would specify the magnitude of the task facing future research, and would also clarify where the greatest threats to the validity of current research may lie.

A laboratory experiment was defined broadly to include any study where a treatment was administered or manipulated within circumstances controlled by the investigator (thus including some quasi-experimental designs but excluding field experiments). A total of 220 unduplicated experiments in 154 articles were identified. To be included an article had to assert that the stimuli in the experiments were "advertisements." Studies where the advertising was in no way the focus of the study but simply a delivery vehicle for information (as evidenced by no description of the ads) were excluded. In addition, any study that scrupulously claimed only to study messages or product descriptions, and not advertising, was also excluded (perhaps two dozen cases in all).

Criteria for Assessment of Verisimilitude and Multiplism

To specify the reality against which experiments will be compared, we would ideally begin by consulting a body of studies recording naturalistic observations of real advertising encounters; unfortunately, no such literature exists. In its absence, how do we avoid the criticism leveled by Calder et al. (1982, 1983) against Lynch: that the number of potential departures from reality is essentially unlimited? This dilemma can be resolved in part by comparing methodological choices across experiments and noting differences. Because procedure sections only discuss what experimenters believe to be important, a grounded account of those aspects of the real phenomenon that some experimenters have attempted to reproduce, and that others have ignored, is at least possible. This comparative assessment has the added advantage of simultaneously measuring the extent of critical multiplism. More generally, the content analysis of verisimilitude requires simply that the everyday knowledge of advertising which anchors the comparison to experiments be made explicit (see Cattell 1988, p. 15).

In keeping with the emphases in the literature, this account will focus on mass-media advertising via print, radio, and tv. Everyday knowledge of mass-media advertising in the world outside the laboratory supports the following statements:

1. Print, radio and tv ads are encountered embedded in an editorial or programming context.

2. The surrounding editorial or programming context, and not the ads, is the primary focus of the audience's attention and motivation.

3. Most ads are repeated with the intent of delivering multiple exposures to the audience.

4. Advertising occurs in a competitive context where other brands are also advertising.

5. The desired effect of advertising occurs after a delay.

6. Advertising, alone or in concert with other promotions, is intended to alter choice and behavior.

7. Most advertising is targeted at certain people situated a particular way (i.e., with respect to their usage and familiarity, or lack thereof, with the product and brand). The responses of people outside the target are of no scientific concern unless the theories in question address societal impacts of advertising.

Specific Scoring Criteria

The organization of the scoring criteria is guided by an adaptation of Cronbach's (1982) analysis, wherein any experiment can be understood as the conjunction of choices made on four factors: the units, treatments, observations and settings.

Verisimilitude. The units were deemed targeted if screening criteria were applied to obtain subjects that were homogeneous in terms of brand and product usage or knowledge (credit was also given if post hoc analyses showed that variations in knowledge and usage did not affect the results). Treatment procedures were distinguished as: 1) involving an embedded or isolated presentation of the ad; 2) in terms of whether the instructions made the ad the focus of attention; 3) as involving single or multiple exposure; and 4) as including or not including advertising from competitive brands. As to observations, studies that measured choice or behavior were considered high in verisimilitude. Behavioral measures include actual purchase, physical selection within the experimental context, simulated shopping, eye movements, time on task, and response latency. Whether observations were immediate (taken within the experimental session) or measured after a delay of a day or more was also scored.

As to settings, a violation of verisimilitude was noted: 1) if the product was generally not bought by the subject population (e.g., lawn tractors, students); or 2) if the incidence of non-usage of the product was likely to be substantial (e.g., outboard motors, adults); or 3) if the product was gendered in the sense that different ad appeals would ordinarily be directed at male versus female users (e.g., razor).

Critical Multiplism. Units were identified as college students or other. Treatment stimuli were distinguished as actual ads (this includes ads altered by the experimenter) or as something constructed by the experimenter, and as containing fictitious or real brands (whether familiar or unfamiliar).

Reliability of scoring. All studies were first coded by the author. Two student judges were then trained to apply the seven verisimilitude and three critical multiplism criteria using 12 advertising experiments drawn from a different journal (the author alone judged whether products were to be considered violations of verisimilitude; these judgements are available for inspection in Table 4). The 154 articles were systematically split into two groups (odd sequence numbers versus even), so that each student judge, working independently of the author, rated one half of the articles. To control for learning effects, one judge worked forward in time and the other backwards. Results indicated a high degree of interrater reliability (Table 1). When there was a disagreement between judge and author the other judge, blind to the author's position, rated that article to provide a tie breaker. Only summary results of the scoring are reported here.
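The coding workflow just described can be sketched as follows. The paper does not state which agreement statistic Table 1 reports, so percent agreement and Cohen's kappa are shown only as two common options, not as the study's computation. The assumption that article sequence numbers run in chronological order is also ours, and the codes at the bottom are invented.

```python
from collections import Counter
from typing import Optional


def split_articles(sequence_numbers: list[int]) -> tuple[list[int], list[int]]:
    """Odd numbers to one judge (coded forward in time), even to the other (backward).

    Assumes sequence numbers run in chronological order of publication.
    """
    ordered = sorted(sequence_numbers)
    forward = [n for n in ordered if n % 2 == 1]
    backward = [n for n in reversed(ordered) if n % 2 == 0]
    return forward, backward


def final_code(author: int, judge: int, tie_breaker: Optional[int] = None) -> int:
    """Keep agreed codes; otherwise the other judge's blind rating breaks the tie."""
    if author == judge:
        return author
    if tie_breaker is None:
        raise ValueError("a disagreement requires a tie-breaking rating")
    return tie_breaker


def percent_agreement(a: list[int], b: list[int]) -> float:
    """Proportion of items on which the two raters assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Agreement corrected for the chance agreement implied by each rater's base rates."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


# Invented codes (1 = criterion present) for one criterion across ten experiments.
author = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
judge = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
print(percent_agreement(author, judge))       # 0.9
print(round(cohens_kappa(author, judge), 3))  # 0.783
```

Kappa is lower than raw agreement here because most criteria are absent in most experiments, so two raters who mostly code "absent" will agree often by chance alone.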

-------------------

Table 1 about here

-------------------

RESULTS

Trends in the Literature

During the fifteen years under study the incidence of articles reporting laboratory experiments on advertising increased markedly, both in absolute terms and as a proportion of all articles (Figure 1). By 1990-94 reports of such experiments represented over 17% of JCR output. By contrast, during this period field experiments on advertising were almost absent from JCR and from JM (2 each), and infrequent in JMR (10). Laboratory experiments on advertising have clearly constituted a major thrust within consumer and marketing research, appear to be growing in favor, and are overwhelmingly preferred to field experiments.

-------------------

Figure 1 about here

-------------------

A rough but useful index to experimental sophistication is the number of articles that report multiple experiments, inasmuch as this is one of the most effective means of eliminating rival explanations for results (Sternthal et al. 1987). Figure 2 shows that this index of sophistication does improve for experimental advertising research over time. Nonetheless, by this standard, research on advertising began the period far behind social and cognitive psychology, and it still falls considerably short of those fields today (the importance of this comparison will become clear in the Discussion).

-------------------

Figure 2 about here

-------------------

Incidence of Verisimilitude

As shown in Table 2, each of the seven verisimilitude criteria was far more likely to be absent from a given experiment than present. For instance, more than 80% of all advertising experiments forced attention to the ad, in direct contravention of most natural viewing conditions; and more than 90% measured the effect of advertising immediately after exposure. In fact, about one third of these experiments proved to be innocent of all seven aspects of verisimilitude. However, there were some marked differences by media, with experiments using tv advertising showing on average a considerably higher level of verisimilitude than experiments with print ads, roughly half of which ignored all seven criteria.

-------------------

Table 2 about here

-------------------

A preliminary and very rough attempt to assess the overall verisimilitude of the accumulation of laboratory experiments on advertising was derived by creating a summed score based on the seven criteria and examining the distribution of this sum (Table 3). The average verisimilitude score across the 220 experiments proved to be 1.26. About one third of these experiments met two or more verisimilitude criteria; several dozen met three or more; only three experiments met five or more criteria. Finally, verisimilitude does not increase over time (Tables 2 & 3). Specifically, even as the number of advertising experiments was rising dramatically, most individual verisimilitude criteria were being met somewhat less often, and the number of experiments innocent of any verisimilitude criteria went up.
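The summed score is simply a count of criteria met per experiment. A minimal sketch follows; the field names for the seven criteria are hypothetical labels for the criteria defined under Specific Scoring Criteria, and the three codings are invented, not data from the study (which reports a mean of 1.26 across 220 experiments).

```python
from collections import Counter
from statistics import mean

# Hypothetical field names for the seven verisimilitude criteria.
CRITERIA = (
    "targeted_units",         # subjects screened for product/brand usage or knowledge
    "embedded_exposure",      # ad shown within editorial or programming context
    "nonfocal_instructions",  # instructions did not make the ad the focus of attention
    "multiple_exposures",     # ad repeated
    "competitive_ads",        # ads for competing brands also shown
    "behavioral_measure",     # choice or behavior observed
    "delayed_measure",        # effect measured a day or more later
)


def verisimilitude_score(coding: dict[str, bool]) -> int:
    """Number of criteria met (0 to 7); missing keys count as absent."""
    return sum(bool(coding.get(name, False)) for name in CRITERIA)


def summarize(codings: list[dict[str, bool]]) -> tuple[float, Counter, float]:
    """Mean score, frequency of each score, and share meeting two or more criteria."""
    scores = [verisimilitude_score(c) for c in codings]
    share_two_plus = sum(s >= 2 for s in scores) / len(scores)
    return mean(scores), Counter(scores), share_two_plus


# Invented codings for three experiments.
codings = [
    {},
    {"embedded_exposure": True, "multiple_exposures": True},
    {"targeted_units": True},
]
print(summarize(codings))
```

An unweighted sum treats every criterion as equally important, which is one reason the text calls this index very rough.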

-------------------

Table 3 about here

-------------------

Violations of Verisimilitude

There were 79 experiments in which the product advertised was not bought by students (e.g., lawn tractor), was far less than universally used by the population studied (e.g., outboard motor), or is normally advertised in rather different ways to men and women (e.g., razor) but was rated in the experiment by a mixed audience (Table 4). These cases are deemed violations of verisimilitude because if buying is not a possibility, then advertising is not being studied, except perhaps for its social and cultural effects. Experiments in which the product advertised is consumed near universally were not considered violations, but were not counted as targeted either (since usage levels can vary widely).

-------------------

Table 4 about here

-------------------

In summary, if the 77 experiments that met none of the verisimilitude criteria are added to the 37 other experiments meeting at least one verisimilitude criterion but compromised because they used an inappropriate product, then the construct validity of 114 experiments, or 52% of this literature, may be said to be in question, in so far as a theory of advertising is concerned.
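The figure of 114 follows by inclusion-exclusion from the counts reported above, assuming all counts refer to the same 220 experiments: of the 79 product violations, 37 met at least one criterion, so the remaining 42 met none and are already among the 77.

```python
# Counts reported in the text.
TOTAL_EXPERIMENTS = 220
MET_NO_CRITERIA = 77
PRODUCT_VIOLATIONS = 79
VIOLATIONS_MEETING_SOME_CRITERION = 37

# Product violations that met no criterion are already counted among the 77.
overlap = PRODUCT_VIOLATIONS - VIOLATIONS_MEETING_SOME_CRITERION

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|.
in_question = MET_NO_CRITERIA + PRODUCT_VIOLATIONS - overlap
share = in_question / TOTAL_EXPERIMENTS

print(overlap, in_question, f"{share:.0%}")  # 42 114 52%
```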

Failures of Critical Multiplism

These differ from failures of verisimilitude in being methodological choices that are acceptable at the level of individual experiments, only becoming problematic when they unduly dominate the literature as a whole. The paradigmatic example is the use of student subjects. It bears repeating that a laboratory experiment in which college students encounter advertisements for an appropriate product (e.g., a pen) does not per se violate verisimilitude. We may well worry, however, when 77% of all lab experiments on advertising use student subjects, as found in the present study. It is striking to note how closely this percentage corresponds to those reported by Sears (1986) and others who have examined general social psychological journals. It would appear that for purposes of experimental work, consumer behavior with respect to advertising has been approached as a human universal, analogous to interpersonal influence, attribution, and kindred social psychological topics, and not as a phenomenon fundamentally situated with respect to life cycle stage and income level.

With respect to the brands advertised, 67% of these experiments used either a fictitious brand (approximately 56%) or a real but unfamiliar brand (11%). Clearly attitude formation via advertising, and not attitude change, has been the dominant focus. This is worrisome in light of the meta-analysis of field experiments recently published by Lodish et al. (1995), which shows that advertising for new brands works very differently from advertising for existing brands.

With respect to the ad stimuli used, 62% were constructed by the experimenters. The threat to inference posed by constructed ads can be understood by analogy to the weakness that vitiates matching as a strategy for obtaining equivalent treatment groups. Whereas random assignment controls for both known and unknown threats, matching only controls for known threats to group equivalence. Similarly, constructed ads are likely only to contain those elements known to the investigator to be characteristic of ads. If, for instance, investigators do not recognize that print ad visuals tend to be highly figurative and stylized (as claimed by Scott 1994), then constructed stimuli will lack those properties or possess them haphazardly, thus undermining inference.

Most discouraging was the steady increase over time in the use of student subjects, fictitious brands, and constructed stimuli. By the most recent period, 96% of all ad experiments made use of at least one of these, and 47% made use of all three (Figure 3). Like verisimilitude, critical multiplism has not become any more common as the volume of advertising studies has increased.

-------------------

Figure 3 about here

-------------------

An Exemplary Misspecification: The 'Ad Processing' Manipulation

Very common in this literature was a processing manipulation termed "ad processing," as opposed to "brand processing." The purpose of the ad processing manipulation is to create non-focal attention with respect to brand attribute information. However, consider for a moment the completely artificial and unnatural nature of the processing created by instructions such as these: "read the advertisement and evaluate its style based on alliteration, onomatopoeia, rhyme, hyperbole, and the use of 'you' and 'your'." Such instructions surely create an extreme degree of focal attention to the ad, however effective they may be in forestalling attention to attribute information. Hence, the treatment contrast is between two conditions both of which are characterized by highly focused attention, and this raises the issue of construct validity: specifically, does a treatment operation that directs highly focused attention on some other aspect of the ad (i.e., non-brand information) effectively index the construct of non-focal attention to the ad taken in its entirety?

It might be argued that ad processing manipulations are at least successful in instantiating "non-brand processing" and thus do not undermine inferences of the form, "Not-X when brand-processing is absent" (i.e., rational arguments are not persuasive when attribute information is not attended). True enough; but consider what such manipulations most definitely cannot test: any prediction of the form, "X, when (real alternative to brand processing) is present." Hence, in evaluating the contribution of experiments that use ad processing and similar manipulations the essential question becomes, How common is a brand-processing orientation in the world outside? If it is rare (it is certainly not absent), then research using only a brand processing treatment and its unreal negation has studied a special case to the neglect of the main phenomenon.

The true foil to brand processing--and brand processing, however rare it might be in real life, is something we might well want to reproduce under laboratory conditions--would appear to be processing focused on the editorial or programming context; and in fact, numerous television studies, in sharp contrast to the typical print study, use a cover story in which subjects are asked to focus on a news broadcast, without ever mentioning ads. Higher verisimilitude, hence higher construct validity, has to be accorded such studies. Even better, of course, would be a waiting room situation with a television playing, where subjects have to spend time before the "experiment" begins. Such a waiting room manipulation would reproduce an exemplary and typical case for television advertising--viewers who are killing time. If we do not know what works here (as opposed to only knowing what does not work), how much causal knowledge of advertising have we really gained?

DISCUSSION

The results of the content analysis might be compared to one of those ambiguous Gestalt figures that can be perceived as two quite different objects, depending on whether one hews to the Artificialist thesis or has been persuaded by the argument for verisimilitude. Either the content analysis shows laboratory research on advertising to be largely free of the shackles of mistaken notions concerning the need to be superficially similar to the phenomena to be explained, or it shows this stream of research increasingly committed to an unnatural isolation from those phenomena. The trend is clear but the meaning is subject to interpretation. Similarly, either laboratory research on advertising is quite sensibly concentrating more and more on student subjects (because good theories of advertising must perforce apply universally across life cycle stages and income groups), fictitious brands (because varying degrees of brand familiarity might impair statistical conclusion validity) and constructed ads (because ad qualities are both important to control and quite easy to instantiate), or this research is steadily falling away from the premise on which consumer research was founded, viz., that standard topics and procedures in social and cognitive psychology were never going to produce an effective explanation of consumer behavior. Whether for better or for worse, the content analysis reveals laboratory research on advertising to be highly artificial, narrow rather than multiplist, and growing more so with time.

Tacit Presuppositions Governing Experimental Work on Advertising

However the results of the content analysis be interpreted, we can at least identify and hold up for critical examination shared assumptions that have shaped experimental practice. Only when tacit presuppositions are made explicit can they become objects of investigation in their own right. For while an individual investigator must make many simplifying assumptions in order to conduct an experiment at all, the discipline that encompasses these investigators does not have this easy out. It is the field of consumer research which can and must be held responsible for systematic omissions and distortions of the phenomena it claims for its own (McGrath and Brinberg 1983, p. 123). Below are some of the more important governing assumptions to emerge, along with a brief mention of contradictory evidence where appropriate.

1. Advertising message effects are uniform regardless of how audience attention is achieved. Whether the experimenter forced attention, or the ad itself compelled it, is deemed irrelevant once attention is secured. By extension, on the evidence of this work the achievement of attention would appear to be of little theoretical interest, at least in comparison with post-exposure processes. Although advertising may not be unique among communication types in having no guarantee of attention (and even the presumption of audience inattention), that is surely one of its most distinguishing characteristics within the communication domain. Why would scientific work in consumer behavior want to ignore this salient fact?

2. Advertising message effects are uniform across contexts in which the ad may be embedded, to the extent that no context whatsoever is necessary for testing theories associated with these effects. More precisely, television ads may be sensitive to context, but print ads most definitely are not. Krugman's distinction between high and low involvement media appears to have become hypostatized. Investigators have assumed that consumers really do focus on and study print ads in isolation from their surrounding material. This assumption cries out to be tested through naturalistic observation of people reading magazines containing ads.

3a. Repetition of advertising only matters when memory theories are at issue, or when tedium (wearout) is the focus. Moreover, cognitive and attitudinal effects can be accurately reproduced via a single exposure. Here the contradictory evidence comes from studies within the resource matching tradition (e.g., Anand and Sternthal 1990).

3b. Repetition of advertisements within an experimental session effectively indexes repetition over a time period. This directly contradicts a long tradition in memory research that indicates that spaced practice provides a much more powerful boost to recall than massed rehearsal, by a factor of two or three (Landauer 1989). Type II errors are likely to have been incurred because of experimenters' predilection for relatively weak, single-session, massed repetition procedures.

4. Delayed measurement is deemed necessary only when theories of forgetting are at issue. Cognitive and attitudinal effects are held to be invariant across delay intervals, so that they can just as well be assessed immediately--even simultaneously--with exposure. Chattopadhyay and Alba (1988) and Chattopadhyay and Nedungadi (1992) call this assumption into question by showing that attitudinal effects of different kinds of stimuli in fact decay at different rates.

5. Advertising effects of theoretical interest are uniform across contexts where competitive advertising is or is not present. Studies by Keller (1987) and Burke and Srull (1988) cast doubt on this assumption, and a long tradition in social psychological research going back to Hovland and the sleeper effect has been concerned precisely with the problem of message impact under competitive interference. While the presence of such interference may not be unique to it, advertising has to be one of the most fruitful arenas in which to study interference effects, and it is again curious to see these effects so regularly ignored.

6. Behavior, and even choice, hold no special interest as outcomes of advertising. The former assumption, of course, is shared by most commercial copy testing services, and appears to have been adopted wholesale by academic researchers. Certainly it is expensive and time consuming, in real life, to measure purchase behavior; but for theory testing purposes, this excuse wears thin. If the plea be that advertising does not really cause behavior, but only affects intermediary factors which are ultimately the cause of behavior, as in some hierarchy of effects model, fine. What is then striking in this context is that only one of the 220 experiments measured the effect of advertising on willingness to use a coupon. If advertising's role is to alter consumer response to specific other factors that actually cause behavior, one would expect to see these other factors modeled in theory testing research. As it is, one would never know that this is a science of consumer behavior from reading this literature. Even less forgivable is the neglect of choice, as evidence piles up that choice effects do not correspond in any simple way to attitudinal effects (Baker 1991; Feldman and Lynch 1988; Shiv, Edell and Payne 1995).

7. The theoretically pertinent effects of advertising are invariant across differences in the consumption history of the audience. A striking contradiction is evident here. Student samples are often justified by arguing that sample homogeneity is important in order to maximize statistical conclusion validity (e.g., Sternthal et al. 1994). But this homogeneity argument is almost never applied to product usage, differing degrees of familiarity, and the like. The point is not that experiments must only include product users (many products are sold predominantly to non-users); it is that theoretical predictions must specify who will be responsive to the treatment, and then the experimenter must screen for exactly that population.

In the ensemble, the above represent the "theory in use" held by the bulk of contemporary consumer researchers engaged in experimentation with advertising. The recommendation of this paper is two-fold. First, experimenters should strive to model more of these factors as part of their experimental procedure, so that the heroic assumptions just listed need no longer be made. Were the average verisimilitude score to increase to 2, 3, or 4, the discipline could much more quickly discover which assumptions are untenable. Second, experimenters at least need to routinely test one or two of the most relevant verisimilitude factors by incorporating them as treatment conditions. As multi-experiment papers gain favor, a particularly felicitous strategy would be to vary a different verisimilitude factor in each succeeding experiment. This is not a call for the disdained strategy of assessing the robustness of a theory; rather, this would be a resolute attempt to rule out alternative explanations by determining whether causal factors have been correctly labeled (see below). Can it really be maintained that the verisimilitude criteria contravened by the tacit presuppositions are background variables to be ignored until proven pertinent? They seem to me rather to be constituent features of that particular persuasion phenomenon known as advertising.

Consequences of Neglect of Verisimilitude

In criticizing an individual experiment it is generally not enough to point out a potential flaw; the critic must show how this flaw supports an alternative explanation of the results. However, in a content analysis of 220 experiments this is scarcely feasible. Instead, I will outline in general terms the impact to be expected when experiments neglect verisimilitude. The emphasis will be on interactions rendered invisible by uncritically singular methods and systematically censored datastreams (Cronbach 1982).

Embedded, non-focal, against competition, with delayed measurement. Non-focal attention is crucial because the whole character of key constructs such as motivation and depth of processing changes when ads are not the focus. Embedded matters largely because true non-focal attention is difficult to conceive without an alternative that has autonomous interest to the subject. When copresent, these two factors force the ad to both compete for attention and convey a message (Kover 1995). This joint goal has an inherent potential for conflict, inasmuch as attention-getting stimuli may also be message distractors. For any stimulus factor (e.g., celebrity endorsement, comparative appeal, two-sided message), the predicted interaction, under a multiplist methodology incorporating high and low verisimilitude conditions, is as shown in Figure 4a.

-------------------

Figure 4 about here

-------------------

Anecdotal evidence suggests that it is in fact much more difficult to obtain a significant treatment contrast under conditions of non-focal attention to embedded stimuli. Note that from a philosophical perspective it makes a great deal of difference whether the treatment contrast is merely attenuated in size or actually eliminated in the high verisimilitude cell (Humphries 1988). A surviving small but real contrast would leave the Artificialist position intact; no one has ever claimed that laboratory experiments can estimate the data magnitudes that would obtain in the real world. Were the treatment contrast to disappear in a subsequent high verisimilitude experiment, however, then it is simply false to conclude, from a low verisimilitude experiment, that "T causes o." The low verisimilitude experiment only warrants the inference that "T (the focal construct), conjoined with construct S1 (isolated stimulus), conjoined with S2 (focal attention), caused o." Interactions of the type shown in Figure 4a provide a nice example of the inseparability of construct validity and internal validity in the context of theory testing.
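The distinction between an attenuated and an eliminated contrast can be stated compactly. The notation below is introduced here for illustration only and is not drawn from the original analysis: the mean for treatment T or control C is written with subscript L for the low verisimilitude cell (isolated stimulus, focal attention) and H for the high verisimilitude cell (embedded stimulus, non-focal attention). The sketch assumes a positive contrast in the low verisimilitude cell.

```latex
\begin{aligned}
\Delta_L &= \bar{y}_{T,L} - \bar{y}_{C,L}, \qquad
\Delta_H = \bar{y}_{T,H} - \bar{y}_{C,H}, \qquad
I = \Delta_L - \Delta_H \\
\text{attenuation:}\quad & 0 < \Delta_H < \Delta_L \\
\text{elimination:}\quad & \Delta_H = 0, \ \text{so } I = \Delta_L
\end{aligned}
```

Under attenuation, "T causes o" survives with a smaller magnitude, which the Artificialist can accept. Under elimination, the whole of the low verisimilitude contrast is interaction, so the contrast holds only conjointly with S1 and S2.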

The same sort of interaction is predicted when conditions of delayed measurement and competitive interference are added to the design. Now to achieve a successful treatment contrast, an advertisement manipulation must be able to win attention, and convey a message, and withstand competitive attack, and endure. The probability of Figure 4a interactions seems very high under such multiple and diverse demands. Two conclusions might then be drawn. The despairing one would be that it does no good to conduct high verisimilitude experiments--i.e., to search for keys in the dark; hence, we had better stay under the lamppost, where we just might find, if not the car keys, then at least a bus token. The more confident conclusion would be that our present stock of advertising manipulations has simply overlooked the most important--i.e., causally efficacious--factors. We have to look elsewhere to discover what works. New theories are required.

As an aside, the prevalence of low verisimilitude experiments suggests one reason why consumer researchers continue to be criticized as borrowers of pre-existing psychological theories. From the standpoint of an evolutionary epistemology (Campbell 1974), it should not surprise us if pre-existing psychological theories continue to be successful when the data environments (isolated stimuli, focal attention, immediate measurement, and so on) also do not change. If consumer researchers have held every aspect of the experimental apparatus constant, except that brands have been substituted for the social objects studied by conventional psychologists, why should any new theory be required? If this is an accurate characterization, then such consumer research, which merely substitutes brand for person or for issue, cannot be labeled theory testing at all--it is simply an inquiry into the robustness and applicability of existing theory, of the sort disdained by Artificialists. To anticipate a later point, we do not need a separate discipline of consumer research to study message effects under conditions of focal attention and so on--conventional psychologists do that quite well.

Repetition and targeting. Here a different sort of interaction can be postulated. Repetition and target audience selection can be expected to work in the opposite direction from the verisimilitude factors just discussed. As shown in Figure 4b, a treatment that had no effect upon single exposure to a heterogeneous audience might well produce a substantial effect under more favorable conditions, in which a homogeneous group of people who actually buy the product come to rather different conclusions about an ad as their exposure increases. For example, research in the resource matching tradition has obtained complex cross-over interactions when levels of repetition have been varied over but a small fraction of the range that prevails in the world outside (Anand and Sternthal 1990).

Choice and Behavior. The neglect of choice and behavior data has perhaps the most complex effect. The problem with any paper and pencil attitude or rating scale is that it measures an intrinsically low commitment response that does not foreclose any options, in contrast to actual purchase behavior, where one must forgo other options and spend scarce money and other resources. An additional problem with cognitive response and recall measures is the very real prospect that they alter the phenomena under study by changing salience relations (Feldman and Lynch 1988). All of these factors together suggest that cross-over interactions are a real possibility when attitude measures are compared to choice measures as outcomes of ad exposure, as recently found by Shiv et al. (1995).

In short, our thought experiment suggests that the current corpus of results from advertising experiments may be riddled with both Type I errors, in which highly local and specific treatment contrasts have been illegitimately generalized, and Type II errors, in which true effects of substantial importance have been suppressed. Absent a more multiplist approach, the nomological validity of this work cannot be known.

Now it may be objected that some advertisements are the object of focal attention, and are not embedded, and so on, so that something can be salvaged from the literature. While this is surely true, two questions present themselves: 1) Is advertising that receives focal attention and the like the larger or the smaller part of the domain which consumer research is charged to explain? 2) Is there any reason to suppose that those ads that do receive focal attention and the like require new theories beyond those already developed by conventional psychologists for social objects other than brands? Consideration of these questions raises the stakes at issue, as we move from concern over progress in a sub-field--advertising theory--to concern about the accomplishments of the discipline of consumer research taken as a whole.

Verisimilitude and Consumer Research

Entertaining one final objection will ease the transition to these larger concerns. That objection runs as follows: inasmuch as advertisements are a kind of message and do constitute an attempt at persuasion, must not truly scientific research, in its drive toward universality, inevitably gravitate toward constructs such as message as the central focus of its lawlike generalizations? Seen in that light, experiments that really try to reproduce the defining and characteristic features of the advertising encounter will tend to devolve toward applied research, and thus fail to be truly theoretical. More generally, it might be claimed that there is not and can never be a theory of advertising, only a problem of understanding advertising. For there surely is no theory of billiards, but only a general theory of classical mechanics, which can be shown to apply to specific problems in billiards, vehicle crashes, artillery trajectories, etc. Likewise, perhaps only such topics as message reception and information processing can be properly theoretical.

Whatever the appeal of this position in the abstract, and however acceptable it may be to scholars who primarily identify themselves as psychologists, a strict adherence to this view would make a mockery of the existence of the Journal of Consumer Research. If we who publish in and read this journal are fundamentally scientists of persuasion and communication, why do we not publish our best work in the same outlets as other persuasion scientists (e.g., Journal of Personality and Social Psychology), particularly inasmuch as such outlets are older, more established, with a wider audience and arguably more rigorous standards (as seen in Figure 2, which compares the average number of experiments per article)? How can the science of persuasion advance if investigators (ourselves) segregate into subdisciplines (consumer research) on the basis of trivial and superficial distinctions in content? Of course, the rebuttal, and the reason JCR was founded in the first place, is precisely the belief that consumer behavior constitutes a distinct subject matter in which emergent properties and laws, peculiar to consumption and not common across the entire domain of persuasion, communication, and so forth, may have to be considered in order to forge an adequate scientific account. But this rebuttal (to which I subscribe) imposes a responsibility: consumer researchers, in developing and testing their theories, have an obligation to collect evidence that is clearly and conspicuously within the domain of consumer behavior.

Once consumer behavior is acknowledged to be distinct, investigators must ask, Have I captured consumer behavior in my lab, or something else? After all, every laboratory experiment whatsoever reports nothing less than the real behavior of real people. The question for consumer research must always be, Which real? If I do no more than verbally describe to you something that I deem to be a "product," may I then deem all your ensuing responses to be "consumer behavior"--or is this the Humpty Dumpty fallacy again?

Consumer behavior is definitely more specific, limited and partial than human behavior. If science must strive to be as general, broad and complete as possible--in a word, universal--then consumer science is a contradiction in terms and should be replaced by psychological science, or sociological science, or economics, etc. If, on the other hand, endeavors at varying levels of abstraction and universality can all be considered to be scientific, by virtue of their use of the scientific method, then whether or not there should be a separate consumer science is simply a pragmatic question of how heuristically valuable it is to focus on one specific category of human action. The sharper that focus -- the more consumer researchers concentrate on behavior that is distinctively consumption related -- the sooner the question will be answered. Ultimately the neglect of verisimilitude is important because it threatens the claim of consumer research to be a distinct science.

References

Aaronson, Elliot, Marilynn Brewer, and J. Merrill Carlsmith (1985), "Experimentation in Social Psychology," in Handbook of Social Psychology, eds. Gardner Lindzey and Elliot Aaronson, New York: Random House, 441-489.

Anand, Punam and Brian Sternthal (1990), "Ease of Message Processing as a Moderator of Repetition Effects in Advertising," Journal of Marketing Research, 27 (August), 345-353.

Bagozzi, Richard P. (1980), Causal Models in Marketing, New York: Wiley.

Baker, William E. (1991), "The Relevance-Accessibility Model of Advertising Effectiveness," unpublished Doctoral Dissertation, Department of Marketing, University of Florida.

Berkowitz, Leonard and Edward Donnerstein (1982), "External Validity Is More Than Skin Deep: Some Answers to Criticisms of Laboratory Experiments," American Psychologist, 37 (March), 245-257.

Bloor, D. (1981), "The Strengths of the Strong Programme," Philosophy of the Social Sciences, 11, 199-213.

Brinberg, David, John G. Lynch and Alan G. Sawyer (1992), "Hypothesized and Confounded Explanations in Theory Tests: A Bayesian Analysis," Journal of Consumer Research, 19 (September), 139-154.

Burke, Raymond R. and Thomas K. Srull (1988), "Competitive Interference and Consumer Memory for Advertising," Journal of Consumer Research, 15 (June), 55-68.

Calder, Bobby J. (1977), "Focus Groups and the Nature of Qualitative Marketing Research," Journal of Marketing Research, 14, 353-364.

Calder, Bobby J., Lynn W. Phillips, and Alice M. Tybout (1981), "Designing Research for Application," Journal of Consumer Research, 8 (September), 197-207.

Calder, Bobby J., Lynn W. Phillips, and Alice M. Tybout (1982), "The Concept of External Validity," Journal of Consumer Research, 9 (December), 240-244.

Calder, Bobby J., Lynn W. Phillips, and Alice M. Tybout (1983), "Beyond External Validity," Journal of Consumer Research, 10 (June), 112-114.

Campbell, Donald T. (1974), "Evolutionary Epistemology," in The Philosophy of Karl Popper, ed. P. A. Schilpp, LaSalle, IL: Open Court Publishing.

Campbell, Donald T. (1984), "Can We Be Scientific in Applied Social Science?" in Evaluation Studies Review Annual, Vol. 9, eds. Ross Connor, David G. Altman and Christine Jackson, Beverly Hills: Sage, pp. 26-48.

Campbell, Donald T. (1986), "Relabeling Internal and External Validity for Applied Social Scientists," in Advances in Quasi-Experimental Design and Analysis, ed. William M. K. Trochim, San Francisco: Jossey-Bass, pp. 67-78.

Campbell, Donald T. and J. C. Stanley (1966), Experimental and Quasi-Experimental Designs for Research, Chicago: Rand McNally.

Campbell, Donald T. and Donald W. Fiske (1959), "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix," Psychological Bulletin, 56, 81-105.

Cattell, Raymond B. (1988), "Psychological Theory and Scientific Method," in Handbook of Multivariate Experimental Psychology, eds. John R. Nesselroade and Raymond B. Cattell, New York: Plenum Press, 3-20.

Chattopadhyay, Amitava and Joseph W. Alba (1988), "The Situational Importance of Recall and Inference in Consumer Decision Making," Journal of Consumer Research, 15 (June), 1-12.

Chattopadhyay, Amitava and Prakash Nedungadi (1992), "Does Attitude toward the Ad Endure? The Moderating Effects of Attention and Delay," Journal of Consumer Research, 19 (June), 26-33.

Cohen, Joel B. (1992), "Meta Is Not Always Betta: Further Thoughts About Dual Mediation," unpublished manuscript, Department of Marketing, University of Florida.

Cook, Thomas D. (1985), "Post-Positivist Critical Multiplism," in Social Science and Social Policy, eds. R. L. Shotland and M. M. Mark, Beverly Hills: Sage, pp. 21-62.

Cook, Thomas D. (1991), "Clarifying the Warrant for Generalized Causal Inferences in Quasi-Experimentation," in Evaluation and Education at Quarter Century, eds. M. W. McLaughlin and D. Phillips, Chicago: NSSE.

Cook, Thomas D. (1993), "A Quasi-Sampling Theory of the Generalization of Causal Relationships," in Understanding Causes and Generalizing About Them, eds. L. Sechrest and A. G. Scott, San Francisco: Jossey-Bass.

Cook, Thomas D. and Donald T. Campbell (1979), Quasi-Experimentation: Design and Analysis Issues for Field Settings, Chicago: Rand McNally.

Cook, Thomas D. and William R. Shadish (1994), "Social Experiments: Some Developments over the Past Fifteen Years," Annual Review of Psychology, 45, 545-580.

Cronbach, Lee J. (1982), Designing Evaluations of Educational and Social Programs, San Francisco: Jossey-Bass.

Cronbach, Lee J. and Paul E. Meehl (1955), "Construct Validity in Psychological Tests," Psychological Bulletin, 52, 281-302.

Danziger, Kurt (1990), Constructing the Subject: Historical Origins of Psychological Research, New York: Cambridge University Press.

Feldman, Jack M. and John G. Lynch (1988), "Self-Generated Validity and Other Effects of Measurement on Belief, Attitude, Intention and Behavior," Journal of Applied Psychology, 73 (3), 421-435.

Gergen, Kenneth J. (1978), "Experimentation in Social Psychology: A Reappraisal," European Journal of Social Psychology, 8, 507-527.

Greenwood, John D. (1989), Explanation and Experiment in Social Psychological Science, New York: Springer-Verlag.

Hacking, Ian (1983), Representing and Intervening: Topics in the Philosophy of Natural Science, New York: Cambridge University Press.

Harré, Rom and E. H. Madden (1975), Causal Powers: A Theory of Natural Necessity, Totowa, NJ: Rowman and Littlefield.

Harré, Rom and Paul Secord (1972), The Explanation of Social Behavior, Oxford: Basil Blackwell.

Henshel, Richard L. (1980), "The Purposes of Laboratory Experimentation and the Virtues of Deliberate Artificiality," Journal of Experimental Social Psychology, 16, 466-478.

Humphries, Paul (1988), The Chances of Explanation: Causal Explanation in the Social, Medical, and Physical Sciences, Princeton, NJ: Princeton University Press.

Hunt, Shelby D. (1991), Modern Marketing Theory: Critical Issues in the Philosophy of Marketing Science, Cincinnati, OH: Southwestern.

Hunt, Shelby D. (1994), "A Realist Theory of Empirical Testing: Resolving the Theory-Ladenness/Objectivity Debate," Philosophy of the Social Sciences, 24 (June), 133-157.

Keller, Kevin Lane (1991), "Memory and Evaluation Effects in Competitive Advertising Environments," Journal of Consumer Research, 17 (March), 463-476.

Kover, Arthur J. (1995), "Copywriters' Implicit Theories of Communication: An Exploration," Journal of Consumer Research, 21 (March), 596-611.

Kruglanski, A. W. and M. Kroy (1975), "Outcome Validity in Experimental Research: A Reconceptualization," Journal of Representative Research in Social Psychology, 7, 168-178.

Landauer, Thomas K. (1989), "Some Bad and Some Good Reasons for Studying Memory and Cognition in the Wild," in Everyday Cognition in Adulthood and Late Life, eds. Leonard W. Poon, David C. Rubin and Barbara A. Wilson, New York: Cambridge University Press, pp. 116-125.

Lodish, Leonard M. et al. (1995), "How T.V. Advertising Works: A Meta-Analysis of 389 Real World Split Cable T.V. Advertising Experiments," Journal of Marketing Research, 32 (May), 125-139.

Lynch, John G. (1982), "On the External Validity of Experiments in Consumer Research," Journal of Consumer Research, 9 (December), 225-239.

Lynch, John G. (1983), "The Role of External Validity in Theoretical Research," Journal of Consumer Research, 10 (June), 109-111.

McGrath, Joseph E. and David Brinberg (1983), "External Validity and the Research Process: A Comment on the Calder/Lynch Dialogue," Journal of Consumer Research, 10 (June), 115-124.

Meehl, Paul E. (1978), "Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology," Journal of Consulting and Clinical Psychology, 46, 806-834.

Meehl, Paul E. (1986), "What Social Scientists Don't Understand," in Metatheory in Social Science, eds. Donald W. Fiske and Richard Shweder, Chicago: University of Chicago Press.

Mook, Douglas G. (1983), "In Defense of External Invalidity," American Psychologist, 38 (April), 379-387.

Neisser, Ulric (1991), "A Case of Misplaced Nostalgia," American Psychologist, 46 (January), 34-36.

O'Grady, Kevin E. (1982), "Measures of Explained Variance: Cautions and Limitations," Psychological Bulletin, 92 (3), 766-777.

Perreault, William D. and Laurence E. Leigh (1989), "Reliability of Nominal Data Based on Qualitative Judgments," Journal of Marketing Research, 26 (May), 135-148.

Peter, J. Paul (1981), "Construct Validity: A Review of Basic Issues and Marketing Practices," Journal of Marketing Research, 18 (May), 133-145.

Popper, Karl (1959), The Logic of Scientific Discovery, New York: Basic Books.

Popper, Karl (1976), "A Note on Verisimilitude," British Journal for the Philosophy of Science, 27, 147-159.

Rubin, David C. and Wanda T. Wallace (1989), "Rhyme and Reason: Analyses of Dual Retrieval Cues," Journal of Experimental Psychology: Learning, Memory and Cognition, 15 (4), 698-709.

Scott, Linda M. (1994b), "Images in Advertising: The Need for a Theory of Visual Rhetoric," Journal of Consumer Research, 21 (September), 252-273.

Sears, David O. (1986), "College Sophomores in the Laboratory: Influences of a Narrow Database on Social Psychology's View of Human Nature," Journal of Personality and Social Psychology, 51 (3), 515-530.

Shiv, Baba, Julie A. Edell, and John W. Payne (1995), "The Effects of Message Framing on Aad, Ab, and Brand Choice: When Is What You Dislike What You Choose?" paper presented at the Association for Consumer Research Conference in Minneapolis.

Sternthal, Brian, Alice M. Tybout, and Bobby J. Calder (1987), "Confirmatory vs. Comparative Approaches to Judging Theory Tests," Journal of Consumer Research, 14 (June), 114-125.

____________, ____________, and ____________ (1994), "Experimental Design: Generalization and Theoretical Explanation," in Principles of Marketing Research, ed. Richard P. Bagozzi, New York: Basil Blackwell, pp. 195-220.

Suppe, Frederick (1977), The Structure of Scientific Theories, 2nd ed., Urbana: University of Illinois Press.

Toulmin, Stephen and David E. Leary (1986), "The Cult of Empiricism in Psychology, and Beyond," in A Century of Psychology as a Science, eds. Sigmund Koch and David E. Leary, New York: McGraw-Hill, pp. 594-617.

Umesh, U. N., Robert A. Peterson, and Matthew H. Sauber (1989), "Interjudge Agreement and the Maximum Value of Kappa," Educational and Psychological Measurement, 49, 835-850.

Wells, William D. (1993), "Discovery-Oriented Consumer Research," Journal of Consumer Research, 19 (March), 489-504.

Endnotes