Best–worst scaling

Not to be confused with MaxDiff.

Best–worst scaling (BWS)[1] techniques involve choice modelling (or discrete choice experiment – "DCE") and were invented by Jordan Louviere in 1987 while on the faculty at the University of Alberta. In general with BWS, survey respondents are shown a subset of items from a master list and are asked to indicate the best and worst items (or most and least important, or most and least appealing, etc.). The task is repeated a number of times, varying the particular subset of items in a systematic way, typically according to a statistical design. Analysis is typically conducted, as with DCEs more generally, assuming that respondents make choices according to a random utility model (RUM). RUMs assume that an estimate of how much a respondent prefers item A over item B is provided by how often item A is chosen over item B in repeated choices. Thus, choice frequencies estimate the utilities on the relevant latent scale. BWS essentially aims to provide more choice information at the lower end of this scale without having to ask additional questions that are specific to lower ranked items.

History

Louviere attributes the idea to early work in the PhD thesis of Anthony A. J. Marley, who together with Duncan Luce in the 1960s produced much of the ground-breaking research in mathematical psychology and psychophysics to axiomatise utility theory. Marley had encountered problems axiomatising certain types of ranking data and speculated in the discussion of his thesis that examination of the "inferior" and "superior" items in a list might be a fruitful topic for future research. The idea then languished for three decades until the first working papers and publications appeared in the early 1990s. The definitive textbook describing the theory, methods and applications was published in September 2015 (Cambridge University Press) by Jordan Louviere (University of South Australia), Terry N. Flynn (TF Choices Ltd.) and Anthony A. J. Marley (University of Victoria and University of South Australia).[1] The book brings together the disparate research from various academic and practical disciplines, in the hope of encouraging replication and avoiding mistakes in implementation. The three authors have (individually and together) already published many of the key academic peer-reviewed articles describing BWS theory,[2][3][4] practice,[5][6] and a number of applications in health,[5] social care,[7] marketing,[6] transport, voting,[8] and environmental economics.[9] However, the method has now become popular in the wider research and practitioner communities, with other researchers exploring its use in areas as diverse as student evaluation of teaching,[10] marketing of wine,[11] quantification of concerns over ADHD medication,[12] the importance of environmental sustainability,[13] and priority-setting in genetic testing.[14]

Purposes

There are two different purposes of BWS – as a method of data collection, and/or as a theory of how people make choices when confronted with three or more items. This distinction is crucial, given the continuing misuse of the term maxdiff to describe the method. As Marley and Louviere note, maxdiff is a long-established academic mathematical theory with very specific assumptions about how people make choices:[2] it assumes that respondents evaluate all possible pairs of items within the displayed set and choose the pair that reflects the maximum difference in preference or importance.
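One common way to formalise this assumption (with notation introduced here for illustration: u_i is the latent utility of item i and S the displayed set) is the maxdiff choice probability; this is a sketch of the standard form rather than a definitive statement of any particular software's model:

```latex
% MaxDiff model: probability that item i is chosen as best and item j as worst
% from the displayed set S, given latent utilities u.
P(\mathrm{best}=i,\ \mathrm{worst}=j \mid S)
  = \frac{\exp(u_i - u_j)}
         {\displaystyle\sum_{k \in S}\ \sum_{\substack{l \in S \\ l \neq k}} \exp(u_k - u_l)},
  \qquad i, j \in S,\ i \neq j.
```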

As a theory of process (theory of decision-making)

Consider a set in which a respondent evaluates four items: A, B, C and D. If the respondent says that A is best and D is worst, these two responses inform us about five of six possible implied paired comparisons:

A > B, A > C, A > D, B > D, C > D

The only paired comparison that cannot be inferred is B vs. C. In a choice among five items, MaxDiff questioning informs on seven of ten implied paired comparisons. Thus BWS may be thought of as a variation of the method of Paired Comparisons.
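A minimal sketch (illustrative Python, not taken from the cited literature) that enumerates which pairwise comparisons a single best/worst response resolves; it reproduces the counts above (5 of 6 pairs for four items, 7 of 10 for five):

```python
from itertools import combinations

def implied_pairs(items, best, worst):
    """Unordered pairs whose preference order is implied by one best/worst response:
    the best item beats everything shown, and everything shown beats the worst item."""
    return {pair for pair in combinations(items, 2) if best in pair or worst in pair}

for items in (["A", "B", "C", "D"], ["A", "B", "C", "D", "E"]):
    resolved = implied_pairs(items, best=items[0], worst=items[-1])
    total = len(list(combinations(items, 2)))
    print(f"{len(items)} items: {len(resolved)} of {total} implied pairs resolved")
# 4 items: 5 of 6 implied pairs resolved
# 5 items: 7 of 10 implied pairs resolved
```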

Yet respondents can produce best-worst data in any of a number of ways. Instead of evaluating all possible pairs (the maxdiff model), they might choose the best from n items and then the worst from the remaining n-1, or vice versa; or indeed they may use another method entirely. Thus it should be clear that maxdiff is a subset of BWS. The maxdiff model has proved useful in establishing the properties of a number of BWS estimators.[2][3][4] However, its realism as a description of how humans actually provide best and worst data can be questioned for the following reason: as the number of items increases, the number of possible pairs increases multiplicatively, with n items producing n(n-1) pairs (where best-worst order matters). Assuming that respondents evaluate all possible pairs is therefore a strong assumption, and in 14 years of presentations the three co-authors have virtually never found a course or conference participant who admitted to using this method to decide their best and worst choices.[1] Virtually all admitted to using sequential models (best then worst, or worst then best).[15]

Early work (including that of Louviere himself) did use the term maxdiff to refer to BWS, but with the recruitment of Marley to the team developing the method, the correct academic terminology has been disseminated throughout Europe and Asia-Pacific (if not North America, which continues to use the maxdiff term). Indeed, it is an open question whether the major software manufacturers advertising discrete choice maxdiff routines actually implement maxdiff models when estimating parameters.

As a method of data collection

The second use of BWS is as a method of data collection (rather than as a theory of how humans produce a best and a worst item). BWS can, particularly in the age of web-based surveys, be used to collect data in a systematic way that (1) forces all respondents to provide best and worst data in the same way (by, for instance, asking best first, greying out the chosen option, then asking worst); and (2) enables collection of a full ranking, if repeated BWS questioning is implemented to collect the "inner rankings" (illustrated in the sketch below). In many contexts, BWS for data collection has been regarded merely as a way to obtain such data in order to facilitate data expansion (to estimate conditional logit models with far more choice sets) or to estimate conventional rank-ordered logit models.[16]
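As an illustration of point (2), the following sketch shows how repeated best-worst questioning over the shrinking set of unranked items yields a full ranking. It is a toy example: the deterministic "respondent" and the utility values are assumptions made purely for illustration.

```python
def full_ranking(items, choose_best, choose_worst):
    """Repeated best-worst questioning: each round elicits the best and worst of the
    items not yet ranked, building a complete ranking from the outside in."""
    remaining = list(items)
    top, bottom = [], []
    while len(remaining) > 1:
        best = choose_best(remaining)
        remaining.remove(best)
        top.append(best)
        if remaining:
            worst = choose_worst(remaining)
            remaining.remove(worst)
            bottom.append(worst)
    return top + remaining + bottom[::-1]  # ordered from best to worst

# Hypothetical deterministic respondent represented by a fixed utility lookup.
utilities = {"A": 2.0, "B": 1.1, "C": 0.4, "D": -0.3, "E": -1.5}
ranking = full_ranking(
    utilities,
    choose_best=lambda shown: max(shown, key=utilities.get),
    choose_worst=lambda shown: min(shown, key=utilities.get),
)
print(ranking)  # ['A', 'B', 'C', 'D', 'E']
```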

Types ("cases")

The renaming of the method, to make clear that maxdiff scaling is BWS but BWS is not necessarily maxdiff, was decided by Louviere in consultation with his two key contributors (Flynn and Marley) in preparation for the book, and was presented in an article by Flynn.[17] That paper also took the opportunity to make clear that there are, in fact, three types ("cases") of BWS: Case 1 (the "object case"), Case 2 (the "profile case") and Case 3 (the "multi-profile case"). These three cases differ largely in the complexity of the choice items on offer.

Case 1 (the "object case")

Case 1 presents items that may be attitudinal statements, policy goals, marketing slogans or any other type of item that has no attribute-and-level structure. It is primarily used to avoid the scale biases known to affect rating (Likert) scale data.[18][19] It is particularly useful for eliciting the degree of importance of, or agreement with, a set of statements, and when the researcher wishes to ensure that the items compete with each other (so that respondents cannot easily rate multiple items as being of the same importance).

Case 2 (the "profile case")

Case 2 has predominated in health; the items are the attribute levels describing a single profile of the type familiar to choice modellers. Instead of making choices between profiles, the respondent must make best and worst (most and least) choices within a profile. Thus, for the example of a mobile (cell) phone, the choices would be the most acceptable and least acceptable features of a given phone. Case 2 has proved to be powerful in eliciting preferences among vulnerable groups, such as the elderly,[20][21] older carers,[22] and children,[23] who find conventional multi-profile discrete choice experiments difficult. Indeed, the first comparison of Case 2 with a DCE in a single model found that whilst the vast majority of (older) respondents provided usable data from the BWS task, only around one half did so for the DCE.[20]

Case 3 (the "multi-profile case")

Case 3 is perhaps the most familiar to choice modellers, being merely an extension of a discrete choice model: the number of profiles must be three or more, and instead of simply choosing the one the respondent would purchase, (s)he chooses the best and worst profile.

Designs for studies

Case 1 BWS studies typically use balanced incomplete block designs (BIBDs). These cause every item to appear the same number of times and also force every item to compete with every other the same number of times. These features are attractive since the respondent is prevented from inferring erroneous information about the items (what items the designer is "really" interested in).[1] They also ensure that there can be no "ties" in importance/salience at the very top or bottom of the scale.
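A small sketch of the balance properties involved, using a textbook design (7 items shown in 7 choice sets of 4, the complement of the Fano plane); the design is illustrative rather than taken from a published BWS study. Every item appears four times and every pair of items appears together exactly twice, so no item or pairing is over-represented:

```python
from itertools import combinations
from collections import Counter

# Illustrative balanced incomplete block design (BIBD): 7 items, 7 choice sets of 4.
choice_sets = [
    {4, 5, 6, 7}, {2, 3, 6, 7}, {2, 3, 4, 5}, {1, 3, 5, 7},
    {1, 3, 4, 6}, {1, 2, 5, 6}, {1, 2, 4, 7},
]

item_counts = Counter(i for s in choice_sets for i in s)
pair_counts = Counter(p for s in choice_sets for p in combinations(sorted(s), 2))

assert set(item_counts.values()) == {4}   # each item is shown 4 times
assert set(pair_counts.values()) == {2}   # each pair of items competes exactly twice
print("item appearances:", dict(item_counts))
print("every pair co-appears", set(pair_counts.values()), "times")
```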

Case 2 BWS studies can use Orthogonal Main Effects Plans (OMEPs) or efficient designs, although the former have predominated to date.

Case 3 BWS studies may use any of the types of design typically used for a DCE, with the proviso that the number of profiles (alternatives) in a choice set must be three or more for the BWS task to make sense.

Recent history

Steve Cohen introduced BWS to the marketing research world in a paper presented at an ESOMAR conference in Barcelona in 2002, entitled "Renewing market segmentation: Some new tools to correct old problems". This paper was nominated for Best Paper at that conference. In 2003, at the ESOMAR Latin America Conference in Punta del Este, Uruguay, Cohen and his co-author, Leopoldo Neira, compared BWS results to those obtained by rating scale methods. This paper won Best Methodological Paper at that conference. Later the same year, it was selected as the winner of the John and Mary Goodyear Award for Best Paper across all ESOMAR conferences in 2003, and it was subsequently published as the lead article in "Excellence in International Research 2004", published by ESOMAR. At the 2003 Sawtooth Software Conference, Cohen's paper, "Maximum Difference Scaling: Improved Measures of Importance and Preference for Segmentation", was selected as Best Presentation. Cohen and Sawtooth Software president Bryan Orme agreed that MaxDiff should be part of the Sawtooth package, and it was introduced later that year. Later, in 2004, Cohen and Orme won the David K. Hardin Award from the AMA for their paper published in Marketing Research Magazine, "What's your preference? Asking survey respondents about their preferences creates new scaling decisions".

In parallel to this, Emma McIntosh and Jordan Louviere introduced BWS (Case 2) to the health community at the 2002 Health Economists' Study Group conference. This prompted the collaboration with Flynn and ultimately the link-up with Marley, who had begun working with Louviere independently to prove the properties of BWS estimators. The popularity of the three cases has largely varied by academic discipline, with Case 1 proving popular in marketing and food research, Case 2 largely being adopted in health, and Case 3 being used across a variety of disciplines that already use DCEs. The lack of understanding in many disciplines that there are actually three cases of BWS was partly what prompted the three main developers to write the textbook.

The book contains an introductory chapter summarising the history of BWS and the three cases, together with guidance on whether the researcher wishes to use it to understand the theory (processes) of decision-making and/or merely to collect data in a systematic way. Three chapters, one for each case, follow, detailing the intuition and application of each. A chapter bringing together Marley's work proving the properties of the key estimators, and laying out some open issues for further analysis, then follows. Nine chapters (three per case, describing applications from a variety of disciplines) complete the book.[citation needed]

Conducting a study

The basic steps in conducting all types of BWS study are:

  • Conduct qualitative or other research to properly identify and describe all items of interest.[24]
  • Construct a statistical design that indicates which items are to be presented in each set of items ("choice set") – designs may come from publicly available catalogues, be constructed by hand, or be produced with commercially available software.
  • Use the design to construct the choice sets, which contain the actual relevant items (presented textually or visually).
  • Obtain response data in which respondents choose the best and worst item from each task; repeated best-worst questioning (to obtain second best, second worst, etc.) may be conducted if the analyst wishes for more data.
  • Input the data into a statistical software program and analyse them. The software will produce utility functions for each of the features. In addition to utility scores, raw counts can also be requested; these simply sum the total number of times a product was selected as best and as worst (a minimal counting sketch follows this list). The utility functions indicate the perceived value of the product at an individual level and how sensitive consumer perceptions and preferences are to changes in product features.[citation needed]
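A minimal sketch of the raw counting step mentioned above, using assumed toy responses (each tuple is one task: the set shown, the item chosen best, the item chosen worst); the best-minus-worst "score" for each item is the simplest BWS summary:

```python
from collections import Counter

# Assumed toy responses: (choice set shown, item chosen best, item chosen worst).
responses = [
    ({"A", "B", "C", "D"}, "A", "D"),
    ({"A", "B", "C", "E"}, "A", "E"),
    ({"B", "C", "D", "E"}, "B", "E"),
    ({"A", "C", "D", "E"}, "C", "D"),
]

best = Counter(b for _, b, _ in responses)
worst = Counter(w for _, _, w in responses)
shown = Counter(i for s, _, _ in responses for i in s)

for item in sorted(shown):
    print(f"{item}: shown {shown[item]}x, best {best[item]}x, "
          f"worst {worst[item]}x, best-worst score {best[item] - worst[item]}")
```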

Analysis

Estimation of the utility function is performed using any of a variety of methods.

  1. Multinomial discrete choice analysis, in particular the multinomial logit model (strictly speaking the conditional logit, although the two terms are now used interchangeably). The multinomial logit (MNL) model is often the first stage in analysis and provides a measure of average utility for the attribute levels or objects (depending on the Case); a minimal estimation sketch follows this list.[citation needed]
  2. In many cases, particularly Cases 1 and 2, simple observation and plotting of choice frequencies should actually be the first step, as it is very useful in identifying preference heterogeneity and respondents who use decision rules based on a single attribute.
  3. Several algorithms could be used in this estimation process, including maximum likelihood, neural networks, and hierarchical Bayes. Hierarchical Bayes models are beneficial because they allow for borrowing across the data, although since BWS often allows estimation of individual-level models, the benefits of Bayesian models are heavily attenuated. Response time models have recently been shown to replicate the utility estimates of BWS, which represents a major step forward in the validation of stated preferences generally, and of BWS preferences specifically.[25][26]
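A minimal, self-contained sketch of maximum-likelihood estimation for Case 1 data (illustrative only, not the cited software): it assumes a sequential "best then worst" logit model, in which the best item is chosen from the full set with probability proportional to exp(u) and the worst item is then chosen from the remainder with probability proportional to exp(-u); one item's utility is fixed at zero for identification. The item names and toy responses are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

items = ["A", "B", "C", "D", "E"]
responses = [  # assumed toy data: (choice set, chosen best, chosen worst)
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "E"), "B", "E"),
    (("B", "C", "D", "E"), "C", "E"),
    (("A", "C", "D", "E"), "A", "D"),
    (("A", "B", "D", "E"), "B", "A"),
]
idx = {item: k for k, item in enumerate(items)}

def neg_log_lik(theta):
    u = np.append(theta, 0.0)  # utility of the last item fixed at 0 (identification)
    ll = 0.0
    for shown, b, w in responses:
        us = np.array([u[idx[i]] for i in shown])
        ll += u[idx[b]] - np.log(np.exp(us).sum())       # best chosen from the full set
        rest = np.array([u[idx[i]] for i in shown if i != b])
        ll += -u[idx[w]] - np.log(np.exp(-rest).sum())   # worst chosen from the remainder
    return -ll

fit = minimize(neg_log_lik, np.zeros(len(items) - 1), method="BFGS")
print(dict(zip(items, np.round(np.append(fit.x, 0.0), 2))))  # estimated utilities, E = 0
```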

Advantages

BWS questionnaires are relatively easy for most respondents to understand. Furthermore, humans are much better at judging items at extremes than in discriminating among items of middling importance or preference[citation needed]. And since the responses involve choices of items rather than expressing strength of preference, there is no opportunity for scale use bias.

Respondents find traditional rating scales very easy to complete, but such scales tend to deliver results indicating that everything is "quite important", making the data not especially actionable.[citation needed] BWS, on the other hand, forces respondents to make choices between options, while still delivering rankings showing the relative importance of the items being rated. It also produces:

  • Distributions of "the scores" (calculated as the best frequency minus the worst frequency) for all items which allow the researcher to observe the empirical distribution of estimated utilities. This produces information on how realistic the results from traditional analysis methods assuming standard continuous distributions are likely to be. Consumers tend to form distinct groups with often very different preferences, giving rise to multi-modal distributions.
  • Data that allow investigation of the decision rule (functional form of the utility function) at various ranking depths (most simply, the "best decision rule" vs the "worst decision rule"). Emerging research suggests that in some contexts respondents do not use the same rule, which calls into question the use of estimation methods such as the rank-ordered logit model.
  • Estimation of attribute impact, a measure of the overall impact of an attribute upon choices that is not available from conventional discrete choice models.
  • More data, that allow greater insights into choices, for a given number of choice sets. The same information could be obtained by simply presenting more choice sets but this runs the risk that respondents become bored and disengage with the task.
  • Quantifying the phenomena of response shift and adaptation to poor health states.[20]

Disadvantages

Best–worst scaling involves the collection of at least two sets of data: at a minimum, first-best and first-worst, and in some cases additional ranks (second best, second worst, etc.). The issue of how to combine these data is pertinent. Early work assumed best was simply the inverse of worst: that respondents had an internal ranking of all items and just chose the highest/lowest ranked item in a given question. More recent work has suggested that in some contexts this is not the case: a person might (for instance) choose according to traditional economic theory for best (trading across attributes) but choose worst using an elimination-by-attributes strategy (choosing as worst the item that is simply unacceptable on one attribute). In the presence of such different decision rules it becomes impossible to know how to combine the data: at what point, when moving down the rankings, does the person switch from "economic trading" to "elimination by aspects"?[citation needed]

This presents a clear problem for the data augmentation motivation for BWS, but not necessarily for BWS when used as a way to understand process (decision-making). Psychologists in particular would be interested in the different types of decision-making. Marketers, too, might wish to know if a given product had an unacceptable feature. Work is ongoing to investigate when different decision rules arise, and whether and how data from such different sources may be combined.[citation needed]

BWS also suffers from the same disadvantages as all stated preference techniques: it is unknown whether the preferences are consistent with choices made in the real world (revealed preferences). In some instances revealed preference data (typically real market decisions) are available, providing a test of the BWS choices. In others, health being a common example, there are no revealed preference data and validation appears impossible. More recently, attempts have been made to validate stated preference data using physiological data, such as eye-tracking and response times.[25] Early work suggests that response time models are consistent with results from BWS models in health care, but more research is required in other contexts.

References

  1. ^ a b c d "Best-Worst Scaling". Cambridge University Press. Retrieved 2015-10-01.
  2. ^ a b c Marley, A. A. J.; Louviere, J. J. (2005-12-01). "Some probabilistic models of best, worst, and best–worst choices". Journal of Mathematical Psychology. Special Issue Honoring Jean-Claude Falmagne: Part 1. 49 (6): 464–480. doi:10.1016/j.jmp.2005.05.003.
  3. ^ a b Marley, A. A. J.; Flynn, Terry N.; Louviere, J. J. (2008-10-01). "Probabilistic models of set-dependent and attribute-level best–worst choice". Journal of Mathematical Psychology. 52 (5): 281–296. doi:10.1016/j.jmp.2008.02.002. hdl:10453/8292.
  4. ^ a b Marley, A. A. J.; Pihlens, D. (2012-02-01). "Models of best–worst choice and ranking among multiattribute options (profiles)". Journal of Mathematical Psychology. 56 (1): 24–34. doi:10.1016/j.jmp.2011.09.001.
  5. ^ a b Flynn, Terry N.; Louviere, Jordan J.; Peters, Tim J.; Coast, Joanna (2007-01-01). "Best–worst scaling: What it can do for health care research and how to do it". Journal of Health Economics. 26 (1): 171–189. doi:10.1016/j.jhealeco.2006.04.002. PMID 16707175.
  6. ^ a b Louviere, Jordan; Lings, Ian; Islam, Towhidul; Gudergan, Siegfried; Flynn, Terry (2013-09-01). "An introduction to the application of (case 1) best–worst scaling in marketing research" (PDF). International Journal of Research in Marketing. 30 (3): 292–303. doi:10.1016/j.ijresmar.2012.10.002.
  7. ^ Potoglou, Dimitris; Burge, Peter; Flynn, Terry; Netten, Ann; Malley, Juliette; Forder, Julien; Brazier, John E. (2011-05-01). "Best–worst scaling vs. discrete choice experiments: An empirical comparison using social care data" (PDF). Social Science & Medicine. 72 (10): 1717–1727. doi:10.1016/j.socscimed.2011.03.027. PMID 21530040. S2CID 10594387.
  8. ^ García-Lapresta, José Luis; Marley, A. A. J.; Martínez-Panero, Miguel (2009-09-12). "Characterizing best–worst voting systems in the scoring context". Social Choice and Welfare. 34 (3): 487–496. doi:10.1007/s00355-009-0417-1. ISSN 0176-1714. S2CID 18334695.
  9. ^ Scarpa, Riccardo; Notaro, Sandra; Louviere, Jordan; Raffaelli, Roberta (2011-06-19). "Exploring Scale Effects of Best/Worst Rank Ordered Choice Data to Estimate Benefits of Tourism in Alpine Grazing Commons". American Journal of Agricultural Economics. 93 (3): 813–828. doi:10.1093/ajae/aaq174. ISSN 0002-9092.
  10. ^ Huybers, Twan (2014-05-19). "Student evaluation of teaching: the use of best–worst scaling". Assessment & Evaluation in Higher Education. 39 (4): 496–513. doi:10.1080/02602938.2013.851782. ISSN 0260-2938. S2CID 144637200.
  11. ^ Cohen, Eli (2009-03-20). "Applying best-worst scaling to wine marketing". International Journal of Wine Business Research. 21 (1): 8–23. doi:10.1108/17511060910948008. ISSN 1751-1062.
  12. ^ Ross, Melissa; Bridges, John F. P.; Ng, Xinyi; Wagner, Lauren D.; Frosch, Emily; Reeves, Gloria; dosReis, Susan (2014-11-17). "A Best-Worst Scaling Experiment to Prioritize Caregiver Concerns About ADHD Medication for Children". Psychiatric Services. 66 (2): 208–211. doi:10.1176/appi.ps.201300525. ISSN 1075-2730. PMC 5294953. PMID 25642618.
  13. ^ Mueller Loose, Simone; Lockshin, Larry (2013-03-01). "Testing the robustness of best worst scaling for cross-national segmentation with different numbers of choice sets". Food Quality and Preference. Ninth Pangborn Sensory Science Symposium. 27 (2): 230–242. doi:10.1016/j.foodqual.2012.02.002.
  14. ^ Severin, Franziska; Schmidtke, Jörg; Mühlbacher, Axel; Rogowski, Wolf H. (2013-11-01). "Eliciting preferences for priority setting in genetic testing: a pilot study comparing best-worst scaling and discrete-choice experiments". European Journal of Human Genetics. 21 (11): 1202–1208. doi:10.1038/ejhg.2013.36. ISSN 1018-4813. PMC 3798841. PMID 23486538.
  15. ^ Flynn, Terry N.; Louviere, Jordan J.; Peters, Tim J.; Coast, Joanna (2008-11-18). "Estimating preferences for a dermatology consultation using Best-Worst Scaling: Comparison of various methods of analysis". BMC Medical Research Methodology. 8 (1): 76. doi:10.1186/1471-2288-8-76. ISSN 1471-2288. PMC 2600822. PMID 19017376.  
  16. ^ Louviere, Jordan J.; Street, Deborah; Burgess, Leonie; Wasi, Nada; Islam, Towhidul; Marley, Anthony A. J. (2008-01-01). "Modeling the choices of individual decision-makers by combining efficient choice experiment designs with extra preference information". Journal of Choice Modelling. 1 (1): 128–164. doi:10.1016/S1755-5345(13)70025-3. hdl:10453/9977.
  17. ^ Flynn, Terry N. (2010-06-01). "Valuing citizen and patient preferences in health: recent developments in three types of best–worst scaling". Expert Review of Pharmacoeconomics & Outcomes Research. 10 (3): 259–267. doi:10.1586/erp.10.29. ISSN 1473-7167. PMID 20545591. S2CID 39949090.
  18. ^ Baumgartner, Hans; Steenkamp, Jan-Benedict E.M. (2001-05-01). "Response Styles in Marketing Research: A Cross-National Investigation". Journal of Marketing Research. 38 (2): 143–156. doi:10.1509/jmkr.38.2.143.18840. ISSN 0022-2437. S2CID 11304067.
  19. ^ Steenkamp, Jan‐Benedict E. M.; Baumgartner, Hans (1998-06-01). "Assessing Measurement Invariance in Cross‐National Consumer Research". Journal of Consumer Research. 25 (1): 78–107. doi:10.1086/209528. JSTOR 10.1086/209528.
  20. ^ a b c Flynn, Terry N.; Peters, Tim J.; Coast, Joanna (2013-03-01). "Quantifying response shift or adaptation effects in quality of life by synthesising best-worst scaling and discrete choice data". Journal of Choice Modelling. 6: 34–43. doi:10.1016/j.jocm.2013.04.004.
  21. ^ Coast, Joanna; Flynn, Terry N.; Natarajan, Lucy; Sproston, Kerry; Lewis, Jane; Louviere, Jordan J.; Peters, Tim J. (2008-09-01). "Valuing the ICECAP capability index for older people". Social Science & Medicine. Part Special Issue: Ethics and the ethnography of medical research in Africa. 67 (5): 874–882. doi:10.1016/j.socscimed.2008.05.015. hdl:10453/9747. PMID 18572295.
  22. ^ Al-Janabi, Hareth; Flynn, Terry N.; Coast, Joanna (2011-05-01). "Estimation of a Preference-Based Carer Experience Scale". Medical Decision Making. 31 (3): 458–468. doi:10.1177/0272989X10381280. ISSN 0272-989X. PMID 20924044. S2CID 30922199.
  23. ^ Ratcliffe, Julie; Flynn, Terry; Terlich, Frances; Stevens, Katherine; Brazier, John; Sawyer, Michael (2012-12-23). "Developing Adolescent-Specific Health State Values for Economic Evaluation". PharmacoEconomics. 30 (8): 713–727. doi:10.2165/11597900-000000000-00000. ISSN 1170-7690. PMID 22788261. S2CID 21778695.
  24. ^ Coast, Joanna; Al-Janabi, Hareth; Sutton, Eileen J.; Horrocks, Susan A.; Vosper, A. Jane; Swancutt, Dawn R.; Flynn, Terry N. (2012-06-01). "Using qualitative methods for attribute development for discrete choice experiments: issues and recommendations". Health Economics. 21 (6): 730–741. doi:10.1002/hec.1739. ISSN 1099-1050. PMID 21557381.
  25. ^ a b Hawkins, Guy E.; Marley, A. A. J.; Heathcote, Andrew; Flynn, Terry N.; Louviere, Jordan J.; Brown, Scott D. (2014-05-01). "Integrating Cognitive Process and Descriptive Models of Attitudes and Preferences". Cognitive Science. 38 (4): 701–735. doi:10.1111/cogs.12094. hdl:1959.13/1053320. ISSN 1551-6709. PMID 24124986. S2CID 15328149.
  26. ^ "The best of times and the worst of times are interchangeable". APA PsycNET. Retrieved 2015-10-01.
