Attribute | Levels |
---|---|
Duration | 1 day, 3 days, 5 days |
Social Activity | Bowling, Mini-Golf, Walking Tour |
Location | London, Barcelona, Paris |
Topic | Preference elicitation, Economic evaluation, Economic modelling |

Comparison of preference elicitation methods suitable for small sample sizes
Putnam White Paper
Patient preference studies play an increasingly important role in healthcare research, particularly in the development, evaluation, and delivery of interventions for rare diseases and other small-sample contexts. This white paper presents a structured comparison of four stated preference elicitation methods: Discrete Choice Experiment (DCE), Best-Worst Scaling (BWS), Multidimensional Thresholding (MDT) and Online Personal Utility Functions (OPUF). Each method is assessed in terms of its suitability for small-sample applications, considering statistical power, cognitive burden, methodological maturity, design complexity, ease of analysis and ability to explore preference heterogeneity. Drawing on both theoretical frameworks and empirical studies, the paper explores the relative advantages and limitations of each method. It also includes two interactive examples: a DCE and a profile-case BWS survey, which demonstrate the participant experience and provide practical insight into how these methods are implemented. These embedded tools offer a more intuitive and practical understanding of survey design and preference elicitation methods in practice. While DCE and BWS are well-established in health economics research, MDT and OPUF represent promising newer approaches that offer distinct benefits in small-sample contexts. MDT allows for precise, individual-level preference elicitation through structured trade-off tasks, while OPUF offers a simple and efficient compositional method that can be used even with single respondents. The paper concludes that method selection should always be guided by the specific research context, including the characteristics of the target population, resource availability and study goals. Putnam’s longstanding expertise in rare disease strategy and patient preference research positions the team to support clients in selecting and implementing these methods effectively. 
By tailoring study design to the unique needs of each project, Putnam helps ensure that preference evidence is both robust and relevant in supporting clinical development and access strategy.
1 Introduction
Patient preference studies are increasingly adopted in the development, evaluation, and delivery of healthcare interventions, services and policies. The popularity of these studies has grown in response to the development of methodological guidance from key regulatory and health technology assessment (HTA) bodies. Initiatives such as the IMI PREFER project (1), endorsed by the European Medicines Agency, as well as guidance from the Professional Society for Health Economics and Outcomes Research (ISPOR) and the Food and Drug Administration (FDA), have highlighted the value of using high-quality preference studies in benefit-risk assessment and treatment development (2–4).
Preference research in healthcare consists of the systematic elicitation and quantification of individuals’ priorities and trade-offs when choosing among different healthcare options (5). Patient preference research is a subset of this field which is focused on understanding patients’ values and priorities regarding their healthcare. Patient preferences are commonly elicited through revealed or stated-preference methods. Stated preference (SP) methods explore patient preferences using responses to surveys. This white paper focuses on SP methods, which are particularly valuable during the early stages of development of healthcare interventions because they can be used to assess preferences for interventions that are not yet available.
Proxy respondents (e.g., caregivers, family members, or healthcare professionals) are commonly used to obtain valuable insights into patients' preferences in studies targeting populations that are scarce (e.g., rare diseases) or difficult to reach (e.g., children who cannot communicate their preferences directly or adults with cognitive impairments or severe illnesses (6–8)). Beyond their use as proxies, the assessment of other stakeholders’ preferences can improve health decision-making, because physicians and caregivers often provide treatment recommendations that may influence the adoption of new medical technologies. Preference research is also increasingly adopted to support benefit-risk assessments, inform shared decision-making, and guide reimbursement and coverage decisions in both clinical and policy contexts (9). These studies also support the movement toward patient-centred care (10) and personalised medicine (11) by integrating individual needs and preferences in healthcare systems and by increasing the understanding of what patients are willing to accept or forego in exchange for improvements in outcomes (12–14).
Discrete choice experiments (DCEs) remain one of the most commonly used stated-preference methods in health economics (15–17). Despite methodological advances, recruitment challenges often limit the feasibility of standard approaches like DCEs (18,19). To enable broader application of preference studies in small-sample settings (e.g., rare diseases and end-of-life care research), researchers must identify or develop methods that yield valid, reliable, and interpretable results without relying on large respondent pools. Several alternative approaches are gaining traction due to their efficiency, cognitive simplicity, and suitability for specific populations (17,20). A recent review (21) identified 23 quantitative stated preference elicitation methods, 10 of which were considered “promising for preference elicitation” (9). In this white paper, we selected four methods that have been used with varying effectiveness on small samples: discrete choice experiment (DCE), best-worst scaling (BWS), multidimensional thresholding (MDT), and online personal utility functions (OPUF). This research contributes to the literature by offering a structured comparison of four preference elicitation methods with a specific focus on their suitability for small-sample health research.
Previous applied research has evaluated one or two methods; however, empirical evaluations of these methods in low-sample contexts remain sparse and underreported. Within this body of literature, van Dijk et al. (2016) (22) compared DCE and BWS in a surgical decision context and observed that both methods yielded similar attribute rankings but differed in inferred risk tolerance. Whichello et al. (2023) (23) compared DCE with swing weighting (SW) and found discrepancies in the magnitude of attribute importance. Heidenreich et al. (2024) (24), on the other hand, conducted a head-to-head comparison of DCE and MDT and reported that the two methods produced statistically indistinguishable preference weights and similar treatment recommendations in 82.3% of cases. Meanwhile, Schneider et al. (2022) demonstrated that the OPUF approach can generate reliable individual-level utility functions in small samples and predicted DCE-based choices with moderate-to-high accuracy (25). Unlike these studies, we provide a conceptual and methodological synthesis of all four approaches, with the aim of informing future study design in rare disease, end-of-life, and other low-sample contexts by highlighting the strengths and limitations of each method with respect to statistical power, cognitive burden, design complexity, and feasibility of analysis.
The remainder of the white paper is organised as follows. Section 2 introduces the theoretical underpinnings and elicitation structure of each method. Example surveys for both DCE and BWS are presented in Section 2.1.1 and Section 2.2.1 respectively. Afterwards, Section 3 critically presents their relative advantages and limitations based on methodological principles and empirical applications, with an emphasis on small sample contexts. We conclude with Section 4 and provide recommendations for method selection in small-sample settings and outline future directions for research and guideline development. Finally, Section 6 outlines Putnam’s expertise in rare disease contexts and highlights the value in partnering with Putnam.
2 Methods
2.1 Discrete Choice Experiment
DCEs are widely used in health economics to estimate a range of outcomes, including willingness to pay (WTP), predicted uptake rates of interventions, trade-offs, and overall utility scores for health states (26). A systematic literature review of patient preference studies found that DCEs were among the most frequently used methods and reported that their use increased from 38% in 2002–2006 to 58% in 2012–2016 (21).
In a DCE, respondents select their preferred option from alternative goods characterised by varying attributes and levels. In the health research context, "goods" typically refer to healthcare interventions or treatment options, attributes are the characteristics used to describe them, and levels are the values each attribute can take. The theoretical foundation of DCEs is based on random utility theory (RUT) and Lancaster’s theory of value (27). RUT assumes that individuals choose the option that maximises their utility, whereas Lancaster’s theory considers healthcare interventions or treatment options as bundles of attributes that yield utility. See further information about DCEs in Hensher et al. (2015) (28) and Train (2009) (29).
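To make the random utility idea concrete, the following minimal sketch computes multinomial logit choice probabilities for two hypothetical conferences described by the attributes in Table 1. The part-worth values in `PART_WORTHS` are invented purely for illustration and are not estimates from any study; in practice they would be estimated from respondents' choices.

```python
import math

# Hypothetical part-worth utilities for the Table 1 levels.
# These numbers are illustrative assumptions, not study results.
PART_WORTHS = {
    "Duration": {"1 day": 0.0, "3 days": 0.4, "5 days": -0.2},
    "Social Activity": {"Bowling": 0.0, "Mini-Golf": 0.1, "Walking Tour": 0.3},
    "Location": {"London": 0.0, "Barcelona": 0.5, "Paris": 0.2},
    "Topic": {
        "Preference elicitation": 0.0,
        "Economic evaluation": -0.1,
        "Economic modelling": -0.3,
    },
}


def systematic_utility(profile):
    """Sum the part-worths of the levels that describe one alternative."""
    return sum(PART_WORTHS[attr][level] for attr, level in profile.items())


def choice_probabilities(alternatives):
    """Multinomial logit: P(j) = exp(V_j) / sum_k exp(V_k)."""
    utilities = [systematic_utility(p) for p in alternatives]
    max_u = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - max_u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


conference_a = {
    "Duration": "3 days",
    "Social Activity": "Walking Tour",
    "Location": "Barcelona",
    "Topic": "Preference elicitation",
}
conference_b = {
    "Duration": "5 days",
    "Social Activity": "Bowling",
    "Location": "London",
    "Topic": "Economic modelling",
}

probs = choice_probabilities([conference_a, conference_b])
print(f"P(choose A) = {probs[0]:.3f}, P(choose B) = {probs[1]:.3f}")
```

Under these assumed part-worths, conference A has the higher systematic utility and is therefore the more likely choice, but conference B retains a non-zero probability because the random component of utility is unobserved.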
Section 2.1.1 presents an interactive example DCE which asks individuals to select which conference they would prefer to attend from two hypothetical options. Each of the options is described by the attributes and levels presented in Table 1.
2.1.1 Example: Conference DCE
2.2 Best Worst Scaling
BWS offers an extension to the traditional DCE, as two choices (i.e., a best and a worst) are obtained for each choice task rather than one. Thus, BWS obtains more preference information than a DCE with minimal additional cognitive burden (30). As with DCEs, the theoretical underpinnings of the method are based on random utility theory as formalised by McFadden (1974) (27).
BWS can be broadly divided into three variants: the object case (case 1), the profile case (case 2) and the multi-profile case (case 3) (31–33). The object case takes a list of items (objects) and constructs a number of different subsets containing objects from the list according to an experimental design. The subsets generated are then presented to respondents as a set of choice tasks, and in each task respondents are asked to select the best and worst objects from the subset. The profile case presents each choice task as a set of three or more attributes, each of which has at least two levels. Therefore, each choice task can be referred to as a profile which consists of a particular combination of attribute-levels. Profiles are presented according to an experimental design, which is typically an orthogonal main effects plan, a random sample from the full factorial, or the full factorial design (31). Respondents are asked to select the best and worst attribute-level in the profile for each choice task. Finally, in the multi-profile case, respondents are asked to select the best and worst profile (combination of attribute-levels) in a choice task that is composed of three or more profiles. The multi-profile case BWS is akin to the DCE in that the alternatives correspond to full profiles; however, the key difference is that BWS requires the selection of a best and a worst profile and thus requires a minimum of three profiles.
When analysing BWS data, one can either adopt a counting approach or a modelling approach. In a counting based approach, summative descriptions of best, worst and best-worst counts are presented. In comparison, in a modelling-based approach, a particular model of decision making must be specified in order to estimate a choice model. In specifying a modelling approach, some underlying assumptions must be made about how choices are made by the population of interest. Each model type assumes that individuals derive utility from attribute-levels, though the assumed decision-making process differs between the model types (31). The three main modelling approaches are the paired, marginal and sequential models (30,31). Further details surrounding modelling approaches are available elsewhere (28,30,33,34). After adoption of a modelling approach, choice models are applied to the BWS data to determine the marginal contributions of attributes and attribute levels in determining choices. Frequently adopted choice models are similar to those used in DCE more broadly and include multinomial logit models, mixed logit models and latent class models.
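The counting approach described above can be sketched in a few lines. The example below tallies best, worst and best-minus-worst counts for a handful of invented profile-case responses that use the Table 1 levels. The response data are hypothetical, and in applied work the counts are usually also standardised by how often each level was shown, which this sketch omits.

```python
from collections import Counter

# Hypothetical profile-case responses: (level chosen as best, level chosen as worst).
# Each pair comes from a single profile, so the two levels belong to different attributes.
responses = [
    ("Barcelona", "5 days"),
    ("Walking Tour", "5 days"),
    ("Barcelona", "Bowling"),
    ("3 days", "Economic modelling"),
    ("Barcelona", "Economic modelling"),
]

best_counts = Counter(best for best, _ in responses)
worst_counts = Counter(worst for _, worst in responses)

# Best-minus-worst score per level; a Counter returns 0 for levels never chosen.
levels = sorted(set(best_counts) | set(worst_counts))
bw_scores = {lvl: best_counts[lvl] - worst_counts[lvl] for lvl in levels}

# Report levels from most to least preferred according to the raw score.
for lvl, score in sorted(bw_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lvl:20s} best={best_counts[lvl]} worst={worst_counts[lvl]} B-W={score}")
```

Counts of this kind give a quick descriptive ranking, whereas the modelling approaches discussed above are needed to estimate utilities under an explicit model of decision making.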
Section 2.2.1 presents an interactive example profile case BWS which asks individuals to select the most important and least important attribute level from a hypothetical conference option. The attributes and levels presented are equivalent to those presented in the prior example DCE in Table 1.
2.2.1 Example: Conference BWS
2.3 Multidimensional Thresholding
MDT, also referred to as adaptive swing weighting or choice-based matching, is a recently developed methodological approach for eliciting health-related preferences across multiple attributes at the individual level (35). MDT is an extension of one-dimensional “threshold” techniques (i.e., classical thresholding) which differs in the structure and scope of the trade-off information elicited.
Classical thresholding elicits individual trade-offs by varying a single attribute level across repeated binary choices to identify a participant's point of indifference (36). This method has been criticised for providing an incomplete picture of multi-attribute preferences. MDT, on the other hand, integrates multi-attribute trade-off assessments and uses a structured two-step elicitation format (35). In the first step, respondents rank a set of attributes (commonly considering the largest plausible improvement in each attribute) in order of importance. In the second step, the respondent completes a series of systematically designed threshold questions that involve trade-offs among multiple attributes simultaneously. Together, these answers define a feasible region of preference weights for each individual. A specialised algorithm (i.e., hit-and-run Monte Carlo sampling) is then applied to estimate the individual’s utility weight vector, typically by identifying the centroid of the feasible region. Using this approach, MDT yields an individualised multi-attribute utility function for each participant and characterises the trade-offs that the individual is willing to accept among the attributes of interest (37,38).
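The idea of a feasible weight region and its centroid can be illustrated with a small sketch. Note that it deliberately swaps in plain rejection sampling for the hit-and-run sampler mentioned above, because rejection sampling is simpler to show, although it scales poorly as constraints accumulate. The three attributes, the constraint set in `CONSTRAINTS` and the function names are all hypothetical.

```python
import random

# Each tuple (c0, c1, c2) encodes the linear constraint c0*w0 + c1*w1 + c2*w2 >= 0.
# All three constraints are invented answers from one hypothetical respondent.
CONSTRAINTS = [
    (1.0, -1.0, 0.0),  # ranking step: attribute 0 matters at least as much as attribute 1
    (0.0, 1.0, -1.0),  # ranking step: attribute 1 matters at least as much as attribute 2
    (-1.0, 1.0, 1.0),  # threshold answer: improving 1 and 2 together outweighs improving 0
]


def sample_simplex(rng, k):
    """Draw weights uniformly from the simplex (non-negative, summing to one)."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def is_feasible(weights):
    """Check whether a weight vector satisfies every elicited constraint."""
    return all(
        sum(c * w for c, w in zip(constraint, weights)) >= 0
        for constraint in CONSTRAINTS
    )


def feasible_centroid(n_draws=20000, seed=1):
    """Approximate the centroid of the feasible region by averaging accepted draws."""
    rng = random.Random(seed)
    kept = [w for w in (sample_simplex(rng, 3) for _ in range(n_draws)) if is_feasible(w)]
    if not kept:
        raise ValueError("No feasible weights found; the constraints may be inconsistent.")
    centroid = [sum(w[i] for w in kept) / len(kept) for i in range(3)]
    return centroid, len(kept)


centroid, n_kept = feasible_centroid()
print("Estimated weights:", [round(w, 3) for w in centroid], "from", n_kept, "accepted draws")
```

Each additional threshold answer adds a constraint and shrinks the feasible region, which is why individual-level weights can be pinned down from a modest number of questions.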
2.4 Online Personal Utility Functions
Preference elicitation methods typically fall into two categories: compositional and decompositional (39–41). Decompositional methods such as DCE and BWS elicit preference orderings from individuals for an entire choice set (composed of a combination of attributes and levels), and responses are then decomposed to identify the marginal contributions of each attribute and level in each health state. Conversely, compositional methods elicit preferences for each component directly, asking respondents to weight each attribute and rate each level individually. Statistical models are therefore not required to estimate coefficients for each attribute and level; instead, the attribute weights and level ratings are combined to obtain the marginal contributions.
Personal utility functions (PUF) were first used in the context of preference elicitation by Devlin et al. (2019) (42) to assess the feasibility of using this approach to estimate a value set for the EQ-5D-5L. Since the feasibility of the underlying PUF methods was established, the approach has been expanded by Schneider et al. (2022) (25) and converted into an OPUF survey built initially using RShiny and subsequently using JavaScript (available at https://eq5d5l.me). Since the development of the OPUF, a number of researchers have begun using this method to elicit value sets (43,44). While the OPUF methodology is primarily designed for health state valuation, this approach could be adapted for use in patient preference contexts.
Structurally, the OPUF can be broken down into three components which enable the estimation of personal utility functions: attribute weighting, level rating and anchoring. The anchoring component is only required if the researcher wishes to anchor the latent coefficients onto the 0-1 QALY scale and as such will not be discussed in this white paper. Further details on the anchoring task are available elsewhere (25,45). The attribute weighting component involves a ranking and weighting task in which participants first identify their most important attribute and then determine the relative importance of the remaining attributes using their most important attribute as a reference point. The level rating component asks participants to rate the levels within each attribute on a visual analogue scale where the worst level and the best level are fixed at 0 and 100 respectively. Further methodological details on the OPUF are available elsewhere (25).
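As a rough illustration of how the attribute weighting and level rating components might be combined, the sketch below builds an unanchored personal utility function for one hypothetical participant using the Table 1 attributes. The responses are invented, and the specific composition rule (weights normalised to sum to one, multiplied by level ratings rescaled to 0-1) is an assumption made for this example rather than a statement of the published OPUF algorithm.

```python
# Hypothetical answers from a single participant.
# Attribute weights: the most important attribute is fixed at 100,
# the others are rated relative to it.
attribute_weights = {"Location": 100, "Topic": 60, "Duration": 30, "Social Activity": 10}

# Level ratings on a 0-100 visual analogue scale (worst level = 0, best level = 100).
level_ratings = {
    "Location": {"London": 0, "Paris": 60, "Barcelona": 100},
    "Topic": {
        "Economic modelling": 0,
        "Economic evaluation": 50,
        "Preference elicitation": 100,
    },
    "Duration": {"5 days": 0, "1 day": 40, "3 days": 100},
    "Social Activity": {"Bowling": 0, "Mini-Golf": 30, "Walking Tour": 100},
}


def personal_utility_function(weights, ratings):
    """Combine normalised attribute weights with rescaled level ratings (assumed rule)."""
    total_weight = sum(weights.values())
    return {
        attr: {
            level: (weights[attr] / total_weight) * (rating / 100)
            for level, rating in ratings[attr].items()
        }
        for attr in weights
    }


def profile_utility(puf, profile):
    """Utility of a full profile is the sum of its level coefficients."""
    return sum(puf[attr][level] for attr, level in profile.items())


puf = personal_utility_function(attribute_weights, level_ratings)
best_profile = {
    "Location": "Barcelona",
    "Topic": "Preference elicitation",
    "Duration": "3 days",
    "Social Activity": "Walking Tour",
}
print("Coefficient for Paris:", round(puf["Location"]["Paris"], 3))
print("Utility of best profile:", round(profile_utility(puf, best_profile), 3))
```

Because no statistical model is fitted, the same calculation works for a single respondent, which is the property that makes compositional methods attractive in very small samples.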
3 Discussion
3.1 Discrete Choice Experiment
DCEs typically require on the order of n > 100 respondents to estimate preferences with acceptable precision and are therefore difficult to implement when sample sizes are limited (18). In rare disease contexts, recruiting such large samples is often challenging, which can result in underpowered DCE studies (18,46).
Statistical power in DCEs increases with sample size, as increasing respondent numbers reduce the width of confidence intervals and make estimates more precise. Several strategies have been suggested to improve the robustness of results when conducting DCEs with small samples. For instance, researchers have suggested optimising the DCE design by reducing the number of attributes and/or attribute levels included in the choice tasks, increasing the number of choice tasks per respondent, increasing the number of alternatives in each task, or conducting adaptive DCEs (47). Bayesian D-efficient designs have also been suggested as particularly valuable in small-sample contexts, as their algorithms can maximise the information yield per choice task (48). These experimental designs leverage prior knowledge of likely parameter estimates and enhance model stability by incorporating uncertainty and improving the precision of estimates (48). For instance, Soekhai et al. (2023) (49,50) used adaptive designs, in which pilot data are used to update priors, in a DCE study targeting a hard-to-reach patient population. More recently, adaptive DCEs have been used effectively to elicit robust preferences in smaller samples (51,52). Kaizen tasks are one such approach: by eliciting an entire ‘preference path’, they maximise the amount of preference information collected per choice task while reducing respondent burden (51). Adaptive methods offer a promising avenue for methodological innovation within the DCE field.
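The effect of the design levers listed above (fewer levels, more choice tasks, more alternatives) can be illustrated with the widely cited Johnson and Orme rule of thumb, N ≥ 500c / (t × a), where c is the largest number of levels of any attribute, t the number of choice tasks and a the number of alternatives per task. This heuristic is not taken from this white paper and is not a formal power calculation; the function name and the design values below are assumptions chosen to match the Table 1 example.

```python
import math


def rule_of_thumb_sample_size(max_levels, n_tasks, n_alternatives):
    """Johnson and Orme heuristic for main effects: N >= 500 * c / (t * a)."""
    return math.ceil(500 * max_levels / (n_tasks * n_alternatives))


# Table 1 has at most 3 levels per attribute; the task counts are hypothetical.
baseline = rule_of_thumb_sample_size(max_levels=3, n_tasks=8, n_alternatives=2)
more_tasks = rule_of_thumb_sample_size(max_levels=3, n_tasks=12, n_alternatives=2)
more_alternatives = rule_of_thumb_sample_size(max_levels=3, n_tasks=12, n_alternatives=3)

print("8 tasks, 2 alternatives: ", baseline)            # 94
print("12 tasks, 2 alternatives:", more_tasks)          # 63
print("12 tasks, 3 alternatives:", more_alternatives)   # 42
```

The heuristic suggests that longer or richer choice tasks lower the nominal sample requirement, but it ignores the added respondent burden, which is a central concern in the populations discussed here.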
DCEs provide valuable preference information to guide early-stage healthcare innovation in rare disease research when real-world data is scarce (19,46). This method can incorporate hypothetical attributes and therefore allow researchers to explore preferences for novel treatments. Nonetheless, when developing DCE applications for rare diseases it is crucial for researchers to conduct thorough cognitive testing and ensure clear framing of hypothetical attributes used to describe new treatments. This will enhance respondents’ understanding and consequently the validity of findings, as respondents may find it challenging to understand and value unfamiliar goods (53). Although the use of hypothetical goods in DCEs has been criticised for introducing hypothetical bias, recent research suggests that the predictive accuracy of DCEs has improved over time. A meta-analysis by Zhang et al. (2025) (54) evaluated the predictive validity of DCEs in health-related research and found a moderate to high predictive accuracy, suggesting that DCEs are effective tools for capturing preferences and forecasting decision-making behaviours in healthcare settings.
Finally, DCEs are advantageous when compared with other methods, because they can effectively capture social influences on health decisions by incorporating social factors like peer or physician recommendations into the choice tasks. For instance, a study by de Bekker-Grob et al. (2018) (55) on maternal vaccination choices found that social influencers had a stronger impact on decisions than vaccine attributes themselves. Since including multiple social-context variables can lead to statistical challenges and potential biases in studies with small samples, researchers could focus on the most salient social attribute and report the results of a sensitivity analysis comparing scenarios with and without social input. Studies that include qualitative research to improve the DCE design could also use interviews to explore the effect of social networks on preferences and aim to identify the most relevant social attribute to test in the modelling stage.
3.2 Best Worst Scaling
When compared with DCE, BWS is considered to be more statistically efficient because double the amount of information is collected per choice task (56). This efficiency enables the use of BWS in smaller samples than would be required for a traditional DCE. Additionally, the profile case in particular has demonstrated promising qualitative results in paediatric populations (57,58). BWS methods are being increasingly used in rare disease contexts, with researchers effectively eliciting preferences in smaller samples. Soekhai et al. (2023) (49) conducted a profile case BWS study in a sample of 140 patients with neuromuscular disorders (NMD). By applying latent class models, the authors were able to identify distinct patterns of preference heterogeneity within the sample, demonstrating the utility of BWS for exploring preference heterogeneity by segmenting patient populations based on their treatment priorities.
As noted previously, DCEs typically require relatively large sample sizes (often greater than 100 participants) to ensure sufficient statistical power and model stability. In contrast, BWS designs, particularly profile and multi-profile case types, are often better suited to studies with smaller sample sizes. This is due in part to the efficiency of the BWS approach, which gathers richer information per task by asking respondents to identify both the most and least preferred attribute levels within a given profile or scenario. For example, Muhlbacher et al. (2021) (59) successfully implemented a multi-profile case BWS study in the context of haemophilia A, a rare disease with a naturally constrained patient population. Despite a modest sample of 57 participants, the study was able to effectively capture nuanced patient preferences for different treatment attributes, underscoring the suitability of BWS methods in rare disease research, where recruiting large samples is often infeasible. Together, these examples highlight the flexibility and efficiency of BWS as a preference elicitation method, particularly in contexts where the sample size requirements of traditional DCEs pose practical challenges.
While BWS offers several methodological advantages that make it particularly well suited to patient preference research, there are important considerations that must be addressed when using it. A key assumption is that respondents apply equivalent cognitive processes when selecting their most and least preferred options. This assumption may not always hold true, as identifying the least preferred alternative can involve distinct psychological mechanisms, such as loss aversion or negativity bias, rather than simply being the inverse of choosing the most preferred (56). Each type of BWS also has its own merits and drawbacks. One of the primary merits of profile-case BWS is its lower cognitive burden in comparison to DCEs and multi-profile case BWS (49). Because respondents are only required to identify the most and least preferred attribute levels within a single profile, the tasks are generally simpler and less mentally taxing than the pairwise comparisons demanded in DCEs. By contrast, multi-profile case BWS is suggested to be considerably more complex, requiring participants to evaluate and compare several profiles simultaneously. A study by Xie et al. (2014) (60) compared DCE and the multi-profile case BWS in eliciting preferences for the EQ-5D-5L. The study found that DCE tasks were considered relatively easier and took a shorter time to complete. However, the study also reported that for the same prediction accuracy and precision, multi-profile case BWS had the advantage of reducing respondent burden. All these factors should be carefully considered during study design, particularly in populations that may be more susceptible to cognitive fatigue or misinterpretation. It is important to note that while multi-profile case BWS imposes a higher cognitive burden than profile-case BWS, this is balanced by its greater statistical power.
Despite the structural and format differences between profile-case BWS and multi-profile case BWS, empirical research has found that preferences obtained from both methods are similar in relative terms, with scaled-up utility coefficients for non-monetary attributes in the profile-case (61,62). The authors attribute this difference to the greater complexity of multi-profile case BWS tasks, which leads respondents to answer with less certainty; this is reflected in a larger variance in the stochastic component of the model. Moreover, research suggests that attribute non-attendance may play a significant role, as attributes are more likely to be ignored in the multi-profile case, resulting in coefficients with larger magnitude relative to the profile-case. However, a study by Yoo and Doiron (2013) (63) reported that including a monetary attribute significantly alters the comparability of the estimated preferences obtained from both methods. Their findings indicated that respondents tend to value monetary gains more and non-monetary gains less when completing the multi-profile case BWS. This information should also be taken into account when choosing the appropriate BWS type, as method selection should align with the study objectives (e.g., whether or not the goal is to estimate WTP) and take account of other aspects of the study design and target population.
3.3 Multidimensional Thresholding
One of the key advantages of MDT is its suitability for studies with small sample sizes. Unlike DCEs, which require larger samples to infer population-level preferences, MDT estimates individual utility weights independently and can yield reliable individual and aggregate estimates even with smaller samples (35). For instance, a study employing Dirichlet regression to aggregate MDT data demonstrated that population-level preferences can be obtained from samples as small as 100 respondents (64). Thus, MDT is a valuable tool for research in rare diseases (or other small sample contexts) since it allows accurate individual-level preference elicitation with limited sample data (35).
MDT differs from DCE in how trade-offs are elicited. DCEs present full treatment profiles and capture trade-offs among all attributes simultaneously, whereas MDT collects trade-off information sequentially through a series of pairwise comparisons (36). Conducting MDT may require more extensive setup than DCEs, as the survey instrument requires careful design of each step to ensure clarity and effectiveness. The analytical implementation of MDT also demands greater technical expertise, as it relies on specialised algorithms. Nonetheless, a well-designed MDT is able to reduce cognitive burden compared to DCEs, particularly in settings where a full-profile task is likely to overwhelm participants, such as those involving unfamiliar or complex attributes (e.g., novel treatment attributes, risk attributes). A recent study by Heidenreich et al. (2024) (24) compared the results and policy advice that resulted from MDT and DCEs. The study reported that both methods yield similar insights into the relative importance of attributes, as they produced similar preference weight distributions and yielded identical treatment recommendations in over 80% of cases. The study findings showed minor differences in individual-level responses, which suggests slight variation in preference heterogeneity; however, these differences were minimal and not statistically significant.
The current methodological limitations of MDT offer opportunities for further refinement. For instance, the initial attribute ranking step adds complexity to the method and may be simplified or omitted in future designs. Moreover, the method assumes a linear additive utility model with continuous attributes, which may be unsuitable for categorical or non-linear preferences (35).
As MDT gains popularity in health preference research, recent applications have demonstrated its utility in both patient and clinician preference studies. For instance, one study (64) used MDT to assess patient preferences for first-line treatments of locally advanced or metastatic urothelial carcinoma. Similarly, Heidenreich et al. (2024) (24) used MDT in a clinician preference study on aneurysmal subarachnoid haemorrhage. Beyond these examples of applied research, interest in MDT is growing in areas such as rare disease research where its ability to generate robust preference data from limited samples is particularly valuable.
3.4 Online Personal Utility Functions
The OPUF’s key merit is its ability to elicit personal utility functions on an individual basis. This advantage is shared with MDT and closely aligns with personalised medicine approaches. Individual preference elicitation dictates that the theoretical minimum sample size for a patient preference study is n=1 (25). This minimum sample size requirement sets this method apart from decompositional approaches like DCE and BWS which require larger samples to obtain precise point estimates (18). For rare disease and hard-to-reach populations, the OPUF provides an innovative and efficient approach to eliciting preferences where conventional methods may not be appropriate.
As a result of individual-level preference elicitation, the OPUF approach enables an in-depth, more granular exploration of preference heterogeneity. This allows for multiple subgroup analyses even in scenarios where there are small numbers of participants in each subgroup. The OPUF is undoubtedly better suited to conducting subgroup analyses and examining preference heterogeneity than decompositional methods like DCE and BWS, which often require the experimental design to be powered to examine interaction effects (48).
Another advantage of the OPUF approach is that survey design and data analysis are computationally straightforward. The experimental designs and complex choice models required in DCE and BWS are not needed for OPUF data, so analysis can be conducted quickly and efficiently. This relatively low mathematical and computational complexity is a key strength of the OPUF, as it also allows for more straightforward dissemination to stakeholders.
Despite these strengths, the OPUF approach also presents several limitations. As a relatively new method, it lacks a substantial theoretical foundation and established precedent in the literature in comparison to other more established preference elicitation methods. There is also some debate over whether OPUF can truly be considered preference-based, as it does not require participants to make explicit trade-offs between attributes. The absence of trade-offs has led to criticism from leading researchers (65) who raised concerns about this aspect of the method.
Additionally, recent evidence has questioned the test-retest reliability of the OPUF approach (66). Qualitative research has also highlighted participant difficulties in understanding several of the survey tasks, with findings indicating substantial comprehension challenges across different groups (67). These issues raise concerns about the cognitive burden of the method, particularly in populations such as children or individuals with limited cognitive capacity (66,67). Further research is needed to assess the validity of the OPUF method in paediatric populations and in populations with limited cognitive capacity.
4 Recommendations for Researchers
This section provides a narrative overview to support future researchers in selecting an appropriate preference elicitation method for studies involving small sample sizes. The choice of method should be informed by the study design, characteristics of the target population, cognitive demands, analytical requirements, and the extent to which preference heterogeneity needs to be explored. Table 2 summarises the key findings of this white paper and serves as a practical decision aid, illustrating how each of the four methods addresses the specific challenges commonly encountered in small-sample preference elicitation.
Characteristic | DCE | BWS | MDT | OPUF |
---|---|---|---|---|
Small samples | ★ | ★★ | ★★★ | ★★★ |
Low cognitive burden | ★★ | ★★★ | ★★★ | ★★ |
Method acceptability/validity | ★★★ | ★★ | ★ | ★ |
Low computational/analytical burden | ★ | ★ | ★★ | ★★★ |
Preference heterogeneity analysis | ★★ | ★★ | ★★★ | ★★★ |
Large number of attributes | ★ | ★★ | ★★★ | ★★ |

Note: The number of stars (maximum ★★★) indicates the method's capacity to be used in studies with the referred characteristics (e.g., studies with small samples), or the characteristics associated with the method (e.g., low cognitive burden).
Abbreviations: DCE, Discrete Choice Experiment; BWS, Best-Worst Scaling; MDT, Multidimensional Thresholding; OPUF, Online Personal Utility Functions.
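
As a rough illustration of how Table 2 might be used as a decision aid, the sketch below transcribes the star ratings and ranks the four methods by a weighted sum. The weighting scheme and the `rank_methods` helper are assumptions made purely for illustration; this paper does not propose aggregating the stars into a single score, and any ranking should be read alongside the narrative guidance in this section.

```python
# Star ratings transcribed from Table 2 (1 = one star, 3 = three stars).
TABLE_2 = {
    "Small samples": {"DCE": 1, "BWS": 2, "MDT": 3, "OPUF": 3},
    "Low cognitive burden": {"DCE": 2, "BWS": 3, "MDT": 3, "OPUF": 2},
    "Method acceptability/validity": {"DCE": 3, "BWS": 2, "MDT": 1, "OPUF": 1},
    "Low computational/analytical burden": {"DCE": 1, "BWS": 1, "MDT": 2, "OPUF": 3},
    "Preference heterogeneity analysis": {"DCE": 2, "BWS": 2, "MDT": 3, "OPUF": 3},
    "Large number of attributes": {"DCE": 1, "BWS": 2, "MDT": 3, "OPUF": 2},
}
METHODS = ["DCE", "BWS", "MDT", "OPUF"]


def rank_methods(priorities):
    """Rank methods by a weighted sum of their Table 2 stars.

    `priorities` maps a Table 2 characteristic to a non-negative importance
    weight chosen by the study team (a hypothetical input, not part of the
    paper). Ties are broken alphabetically by method name.
    """
    scores = {
        method: sum(weight * TABLE_2[characteristic][method]
                    for characteristic, weight in priorities.items())
        for method in METHODS
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Example: a team that cares twice as much about validity as about sample size.
ranking = rank_methods({"Method acceptability/validity": 2, "Small samples": 1})
```

A weighted sum treats the stars as interval-scaled, which is a strong simplification; the sketch is intended only to make the trade-offs in the table easier to explore.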
4.1 Sample size considerations
For studies involving fewer than 50 participants, MDT and OPUF offer promising options because they estimate preferences at the individual level with a relatively high degree of statistical precision. These methods are particularly well suited to settings where recruiting large samples is not feasible, such as rare disease contexts. For studies with more than 50 participants, more established methods such as DCE and BWS become increasingly appropriate, with BWS the better choice for studies with fewer than 100 participants. Multi-profile case BWS, in particular, has shown promising statistical precision within this range (59). DCEs can also be tailored for use in smaller samples through modifications such as reducing the number of attributes or levels, increasing the number of choice tasks per respondent, or including a third alternative to enhance design efficiency. Additionally, adaptive DCEs, such as Kaizen tasks (51), modify the conventional DCE to optimise statistical performance in small samples.
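
To illustrate why the DCE modifications listed above can help in small samples, the sketch below uses a widely cited rule of thumb attributed to Johnson and Orme, under which the minimum sample size for main effects is roughly 500c / (t × a), where c is the largest number of levels of any attribute, t the number of choice tasks per respondent and a the number of alternatives per task. This heuristic is not taken from this paper and is no substitute for a formal power calculation; the function name `rule_of_thumb_min_sample` is hypothetical.

```python
import math


def rule_of_thumb_min_sample(max_levels: int, tasks: int, alternatives: int) -> int:
    """Approximate minimum DCE sample size, N >= 500 * c / (t * a).

    max_levels   (c): largest number of levels of any single attribute
    tasks        (t): choice tasks completed by each respondent
    alternatives (a): alternatives shown per choice task
    This is a heuristic for main-effects designs only (an assumption of
    this sketch, not a recommendation made in the paper).
    """
    if min(max_levels, tasks, alternatives) < 1:
        raise ValueError("all inputs must be positive integers")
    return math.ceil(500 * max_levels / (tasks * alternatives))


# Baseline design: 3 levels, 8 tasks, 2 alternatives per task.
baseline = rule_of_thumb_min_sample(max_levels=3, tasks=8, alternatives=2)
# More tasks per respondent and a third alternative lower the requirement.
modified = rule_of_thumb_min_sample(max_levels=3, tasks=12, alternatives=3)
```

Under this heuristic the baseline design calls for about 94 respondents and the modified design for about 42, although more tasks and alternatives also add to the cognitive burden on each respondent.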
4.2 Validity and methodological maturity
In terms of methodological maturity, DCE is the most extensively validated and widely used approach among the four discussed in this study, with a strong presence in the health economics literature (15–17). BWS, as an extension of DCE, also enjoys growing empirical support, although there is some concern about whether the decision heuristics used for best and worst choices are equivalent (56). By contrast, MDT and OPUF remain relatively novel and currently lack the same level of validation and precedent in the academic literature.
4.3 Preference heterogeneity and subgroup analyses
Where the exploration of preference heterogeneity or subgroup differences is a key study objective, MDT and OPUF provide distinct advantages over RUM-based methods, which require advanced modelling to assess preference heterogeneity and rely on strict assumptions about utility distributions. Both MDT and OPUF are designed to capture individual-level variation in preferences, which enables detailed subgroup analyses even within small samples. This is especially valuable in studies seeking to identify nuanced differences across patient subgroups or demographic cohorts.
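
The sketch below shows, under simplifying assumptions, why individual-level estimates make subgroup analysis straightforward: once each respondent has their own attribute weights, subgroup comparisons reduce to descriptive statistics rather than interaction terms in a choice model. The respondent records, subgroup labels and the `subgroup_summary` helper are hypothetical and for illustration only; the attribute name is borrowed from the example survey attributes.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical individual-level attribute weights (each respondent's sum to 1).
respondents = [
    {"id": 1, "subgroup": "early-career", "weights": {"Location": 0.5, "Topic": 0.5}},
    {"id": 2, "subgroup": "early-career", "weights": {"Location": 0.4, "Topic": 0.6}},
    {"id": 3, "subgroup": "early-career", "weights": {"Location": 0.6, "Topic": 0.4}},
    {"id": 4, "subgroup": "senior", "weights": {"Location": 0.2, "Topic": 0.8}},
    {"id": 5, "subgroup": "senior", "weights": {"Location": 0.3, "Topic": 0.7}},
]


def subgroup_summary(records, attribute):
    """Summarise one attribute's individual-level weight within each subgroup."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["subgroup"]].append(record["weights"][attribute])
    return {
        group: {
            "n": len(values),
            "mean": mean(values),
            # A standard deviation needs at least two respondents.
            "sd": stdev(values) if len(values) > 1 else None,
        }
        for group, values in grouped.items()
    }


summary = subgroup_summary(respondents, "Location")
```

With only a handful of respondents per subgroup, such summaries are descriptive and should not be over-interpreted as evidence of true subgroup differences.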
4.4 Computational complexity and software requirements
The availability of specialised software plays a critical role in the adoption and implementation of preference methods in health research. Significant effort has been dedicated to building tools for DCEs, including both commercial software (e.g., Sawtooth, Ngene) and open-source R packages. The available software supports each phase of a study, from experimental design and survey development through to data analysis and modelling. While DCEs are well supported by software, tools for the alternative preference methods remain limited.
From a computational standpoint, OPUF offers a clear advantage compared with other methods due to its simplicity. It relies on a straightforward multiplicative scoring approach that does not require the estimation of statistical models, thereby enabling rapid, real-time analysis. This makes it particularly advantageous in time-sensitive or resource-constrained settings. In comparison, DCE and BWS generally require more advanced econometric modelling and specialised software, which may pose challenges for research teams with limited technical capacity.
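
A minimal sketch of what such a scoring approach might look like for a single respondent is given below. It assumes that each attribute receives an importance weight and each level a 0 to 100 rating, and that a profile's utility is the sum of normalised weight multiplied by rescaled rating. This is one plausible reading of a compositional weight-times-rating scheme rather than the exact OPUF algorithm, and any anchoring or rescaling step a full implementation may need is omitted. The attribute names come from the example survey attributes; all numeric values and the `personal_utility` helper are hypothetical.

```python
# Hypothetical inputs for ONE respondent (illustrative values, not study data).
# Importance weight per attribute, e.g. from a 0-100 weighting task.
weights = {"Duration": 50, "Location": 100, "Topic": 50}
# Rating of each level within its attribute, 0 (worst) to 100 (best).
ratings = {
    "Duration": {"1 day": 100, "3 days": 60, "5 days": 0},
    "Location": {"London": 0, "Barcelona": 100, "Paris": 40},
    "Topic": {
        "Preference elicitation": 100,
        "Economic evaluation": 50,
        "Economic modelling": 0,
    },
}


def personal_utility(profile, weights, ratings):
    """Score a profile on a 0-1 scale: sum of (normalised weight x level rating)."""
    if set(profile) != set(weights):
        raise ValueError("profile must specify exactly one level per attribute")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one attribute needs a positive weight")
    return sum(
        (weights[attribute] / total_weight) * (ratings[attribute][level] / 100)
        for attribute, level in profile.items()
    )


profile = {"Duration": "3 days", "Location": "Paris", "Topic": "Preference elicitation"}
utility = personal_utility(profile, weights, ratings)  # 0.25*0.6 + 0.5*0.4 + 0.25*1.0
```

No statistical model is estimated at any point: the score is available as soon as the respondent completes the weighting and rating tasks, which is what makes real-time, single-respondent analysis possible.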
4.5 Cognitive burden
Cognitive burden is a critical factor, particularly when engaging with vulnerable or low-literacy populations. While the cognitive demands of MDT and OPUF are still being assessed, early qualitative research indicates that OPUF may present comprehension challenges among both adolescent and adult populations. Likewise, DCE tasks, particularly those involving large numbers of attributes, may be cognitively demanding for younger participants. In contrast, BWS has been shown to be more intuitive and less burdensome, potentially making it more suitable for populations with limited decision-making capacity.
5 Conclusion
There is no one-size-fits-all preference elicitation method in small sample size contexts. Each approach discussed in this study (i.e., DCE, BWS, MDT, OPUF) offers unique advantages and limitations depending on the research context. Method selection should be guided by study design, sample size, participant characteristics, cognitive burden, and the need for exploration of preference heterogeneity. While DCE and BWS are well-established with extensive precedent in the literature, newer methods such as MDT, OPUF and Kaizen tasks show promise, particularly in eliciting precise preferences in small samples. However, their use is still emerging, and further research is needed to strengthen the validity and robustness of these innovative approaches. Ultimately, researchers should adopt a context-specific approach, selecting the method that best fits their specific objectives and constraints, rather than defaulting to a single preferred technique.
The broader application of emerging preference elicitation methods in health research relies on both methodological advancement and the development of software to support their use across all stages of study design and analysis. Without this, the practical implementation, testing, and refinement of such methods will be hindered. Therefore, continued investment in both areas is essential to support their rigorous application, particularly in studies with constrained resources or small sample sizes.
6 Putnam’s Expertise in Rare Disease
Putnam has been supporting rare disease strategy since the company’s founding, with over 35 years of exclusive focus on the life sciences. This deep history reflects our consistent commitment to addressing highly complex and often underserved medical needs.
We bring significant experience not only in traditional rare conditions, but also in exploring rare manifestations, subtypes, and genetic profiles of more common diseases. Our rare disease engagements are designed to support the development and commercialisation of therapies that target high unmet need with urgency and precision.
6.1 Focused on Accelerating Therapy Development
Our rare disease practice is built around a shared goal with our clients: helping manufacturers develop therapies as efficiently and strategically as possible. We do this through:
- Early-stage planning and landscape assessment
- Patient journey and unmet need analysis
- Clinical development strategy
- Pricing, market access, and evidence planning
These efforts are grounded in both rigorous analysis and a deep understanding of evolving regulatory and market access landscapes.
Putnam serves as a strategic thought partner at all stages of the development cycle:
- Before engagements – shaping the opportunity and clarifying strategic options
- During engagements – executing detailed, evidence-based strategy and analytics
- After engagements – refining implementation and supporting long-term value realisation
6.2 Preference Elicitation Methods in Rare Disease
Understanding patient and stakeholder preferences is critical in rare disease contexts, where evidence may be limited and treatment trade-offs are complex. We apply a range of quantitative preference elicitation techniques suited to small, often heterogeneous populations:
Discrete Choice Experiments (DCE): A widely used method to quantify preferences by asking respondents to choose between hypothetical treatment scenarios. DCEs are adaptable to small samples with careful design and are particularly useful for informing value frameworks and regulatory submissions.
Best-Worst Scaling (BWS): This method offers an efficient way to elicit preferences by asking participants to select the most and least preferred options. BWS can reduce cognitive burden and is more efficient than DCE in small samples.
Online Personal Utility Functions (OPUF): An innovative approach that elicits preferences at the individual level, making preference studies feasible in very small samples. It constructs a personal utility function for each respondent and enables efficient subgroup analyses, making it well suited to rare conditions with geographically dispersed patients.
Multidimensional Thresholding (MDT): This technique systematically explores respondents’ thresholds for trade-offs between multiple attributes. MDT is especially useful when treatment decisions involve balancing competing risks and benefits, and can reveal nuanced patterns in individual preferences.
These methods are selected and adapted based on clinical context, population characteristics, and decision-making needs. We advise on study design, method selection, survey programming, data analysis, and interpretation to ensure findings are robust and actionable.
6.3 Broad and Deep Rare Disease Expertise
Our expertise spans a wide range of therapeutic areas and mechanisms, including but not limited to:
- Neuromuscular disorders
- Haematologic and oncologic rare diseases
- Rare metabolic and genetic conditions
- Autoimmune and inflammatory disorders
We tailor each engagement to the specific clinical, scientific, and commercial needs of the product and patient population.
6.4 Collaborate With Putnam
If your organisation is seeking a partner to support its rare disease strategy, we welcome the opportunity to connect.
To learn more or initiate a discussion, please reach out to our team at Putnam.