1. Introduction
Over the last decade, the behavioural design (BD) domain has actively promoted health-conscious and preventive behaviours among individuals and communities. The domain synthesises theories from cognitive psychology, behavioural science and design to develop artefacts that ethically influence desired behaviours in areas such as sexual health and contraception, smoking cessation and healthy lifestyle practices (Bay Brix Nielsen et al., 2021; Cowdell & Dyson, 2019; Khadilkar & Cash, 2020). However, beyond preventive health, a critical yet underexplored facet of health and well-being for behavioural designers is clinical diagnosis and its underlying complexity. A crucial component of healthcare, diagnosis helps estimate the systems and pathologies associated with a patient's presentation, guiding subsequent treatment and recovery strategies (Bornstein & Emler, 2001). Diagnostic errors are often consequential, leading to significant patient morbidity and mortality (Croskerry, 2009a, 2009b). They also contribute to increased medico-legal cases and burnout in the medical fraternity. In the US healthcare system alone, Newman-Toker et al. (2019) estimated that 40,000 to 4 million individuals may be exposed to misdiagnosis-related harm. Diagnostic errors are therefore a major threat to the safety and well-being of a population and a critical area for BD to intervene, because errors essentially result from, or themselves constitute, “wrong or undesirable behaviours” (Dey et al., 2023), arising either from flawed plan formulation (problem-solving behaviour) or from incorrect execution of well-formed plans (monitoring behaviour) (Reason, 1990).
Multiple reports have identified diagnostic complexity and its imposed cognitive load as critical antecedents to diagnostic errors (Croskerry & Clancy, 2025; Newman-Toker et al., 2023). Clinicians often diagnose under intense emotional and physiological states in stressful and ambiguous contexts. These situations are more prominent in high-stakes settings like emergency medicine (EM), where effective patient disposition under extreme time pressure is a critical demand (Calder et al., 2015; Wears, 2009). Such contextual factors and a patient's inherent conditions make the diagnostic process complex and error-prone. A disease spectrum that is highly vulnerable to such errors is infectious diseases (ID) (Newman-Toker et al., 2019, 2021). The vulnerability arises from the spectrum's underlying unpredictability, common syndromic presentations across multiple infections and the potential for global repercussions (Roosan et al., 2017). Identifying and mapping the causes of diagnostic complexity within a clinical setting is necessary during the problem identification and framing stages of a design process. Furthermore, this understanding can inform the development of evaluation criteria to test existing and proposed solutions. These insights are crucial for designing behaviour-centred and contextually relevant interventions such as information management software, workflows, patient report and form designs, medical devices and diagnostic aids.
Despite the opportunities, current methods utilised for estimating diagnostic complexity in healthcare are limited in scope. As a result, they cannot effectively elucidate the factors contributing to the diagnostic complexity of disease cases (Islam et al., 2016; Roosan et al., 2017). Among the few studies addressing diagnostic complexity, Roosan et al. (2017) operationalised a task complexity estimation framework by Liu & Li (2012) to identify the complexity contributory factors (CCFs) associated with the handling of infectious disease cases. However, they only considered interactions within the clinician's team as a data source. To holistically identify the factors contributing to diagnostic complexity, it is essential to observe the diagnostic interactions between clinicians, teams, patients and caregivers across the diagnostic continuum. Another major limitation of the past research was the discrepancy between the perceived difficulty rated by the clinicians and the observed objective complexity determined by the authors.
Addressing these gaps, this paper identifies factors contributing to the diagnostic complexity of 10 ID cases in an EM setting. In doing so, we demonstrate the operationalisation of a complexity estimation tool for diagnostic tasks in this setting. Although this study specifically focuses on ID and EM, it aims to provide a generalised foundation for utilising the tool across various clinical settings and speciality cases. This can help define the behavioural and human factors issues associated with the diagnostic workflow of a given clinical setting and generate actionable insights towards improving diagnostic accuracy. The work is a subset of a collaborative study between the Department of Design, Indian Institute of Technology Delhi and the Department of Emergency Medicine, All India Institute of Medical Sciences (AIIMS), New Delhi. Appropriate ethical approval was obtained from the Institute Ethics Committees of both institutes to conduct the study.
2. Study site
2.1. Site
The study was conducted in the Emergency Medicine 1 (EM 1) section of the Department of Emergency Medicine, AIIMS New Delhi. Although patients with infectious diseases entering EM 1 can be moved to EM 2 (Gynaecological and Surgical Section) and EM 3 (Observation Section), the scope of our study was limited to the diagnostic interactions within the EM 1 facility to maintain data consistency.
2.2. Patient flow
Following the AIIMS Triage Protocol (Sahu et al., 2020), incoming patients are brought to a Triage area (A and B). Here, the patient's presentation, history and earlier reports are evaluated by attending clinicians while their current vitals are recorded. Patients are then triaged into Green (Emergency Severity Index [ESI] 5), Yellow (ESI 3 and 4) and Red (ESI 1 and 2) (Sahu et al., 2020). If marked green, the patient is referred to the outpatient department after primary treatment. Patients marked ESI 3 or 4 (yellow) are held at D1 (Yellow Counter) until a bed is available in D2 (Yellow Area). Patients marked ESI 2 are moved to C1 (Red Counter) before transferring to C2 (Red Area), while ESI 1 patients are moved directly to C2. Area E acts as a coordinating area between the healthcare providers and other specialised departments, while areas F and G are nursing stations. Figure 1 is a schematic representation of this patient flow and the spatial layout of EM 1.

Figure 1. Schematic representation of a patient flow within the EM 1 facility
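For illustration, the routing just described can be expressed as a simple decision rule. The sketch below is a minimal, simplified rendering of that flow, assuming bed availability is the only condition governing movement from a counter to its corresponding area; it is not part of the AIIMS Triage Protocol itself.

```python
def route_patient(esi_level: int, bed_available: bool = True) -> str:
    """Return the EM 1 destination for a triaged patient (simplified sketch of Figure 1)."""
    if esi_level == 5:                     # Green: primary treatment, then OPD referral
        return "Outpatient department (after primary treatment)"
    if esi_level in (3, 4):                # Yellow: counter until a Yellow Area bed opens
        return "D2 (Yellow Area)" if bed_available else "D1 (Yellow Counter)"
    if esi_level == 2:                     # Red: counter before transfer to the Red Area
        return "C2 (Red Area)" if bed_available else "C1 (Red Counter)"
    if esi_level == 1:                     # Most acute: directly to the Red Area
        return "C2 (Red Area)"
    raise ValueError(f"Unknown ESI level: {esi_level}")


print(route_patient(3, bed_available=False))   # -> D1 (Yellow Counter)
```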
2.3. Spatial mapping of diagnostic subtasks according to the patient flow
The process of diagnosis consists of four primary subtasks.
- a) Presentation and Perception Formation (P) ‐ Clinicians are presented with the patient's symptoms and signs (Croskerry, 2009a). Based on this interaction, they begin preliminary examinations. In an emergency context, this is when potentially affected systems are identified.
- b) Hypothesis Generation (H) ‐ Based on the patient's age, sex, race and presentations, clinicians develop several diagnostic hypotheses (Kassirer & Kopelman, 1989). At this phase, the potential pathology of the patient is hypothesised.
- c) Gathering and Interpretation of Evidence (E) ‐ Clinicians determine the tests and evidence required to modify the relative probability of each of the generated hypotheses. This aids in comparing multiple hypotheses (Bornstein & Emler, 2001).
- d) Verification (V) ‐ The final diagnosis is formulated by assessing the hypotheses to determine the one that best accommodates the patient's manifestations and test results (Bornstein & Emler, 2001).
Although sequential, these subtasks are performed dynamically across multiple locations by different clinicians of varying expertise within a clinical setting (Zwaan & Singh, 2015). Each location within the setting imposes distinct contextual demands that can influence the performance of the subtasks. Hence, to simultaneously observe and identify relevant complexity factors for multiple ID cases, we spatially mapped the diagnostic subtasks according to the patient flow within EM 1. This allowed optimal positioning of observers to study the diagnostic subtasks under their specific environmental demands. The mapping of diagnostic subtasks to designated regions was informed by direct observation of clinical workflows within EM 1, undertaken during a week-long pilot study. The mapping was then refined and validated through extensive discussions with the chief medical officer and senior resident clinicians engaged in patient triage and diagnosis. This ensured that our observation framework aligned with the real-world diagnostic workflow within the setting.
For cases marked ESI 2, 3 and 4, areas A and B were defined as the areas where presentation and perception formation takes place. C1 and D1 were identified as the areas where hypothesis generation and evidence interpretation take place, and verification can happen in C1, C2, D1 and D2. For cases marked ESI 1, all the diagnostic subtasks take place in area C2.
While this mapping reflects the specific workflow of EM 1, it can be modified for other clinical settings based on variables like patient acuity, departmental configuration, and resource allocation. Prospective users of the tool may adjust the mapping according to these factors to align with their respective ward environments and workflow.
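For concreteness, the mapping above can be written down as a small configuration that prospective users would replace with their own ward's layout. The sketch below is illustrative only; the area labels follow Figure 1 and the subtask codes are those defined in Section 2.3.

```python
# Subtask codes: P = Presentation and Perception Formation, H = Hypothesis Generation,
# E = Gathering and Interpretation of Evidence, V = Verification.
SUBTASK_AREA_MAP = {
    "ESI 2-4": {
        "P": ["A", "B"],
        "H": ["C1", "D1"],
        "E": ["C1", "D1"],
        "V": ["C1", "C2", "D1", "D2"],
    },
    "ESI 1": {
        "P": ["C2"], "H": ["C2"], "E": ["C2"], "V": ["C2"],
    },
}

# Example: where can observers expect verification of an ESI 3 case to occur?
print(SUBTASK_AREA_MAP["ESI 2-4"]["V"])   # -> ['C1', 'C2', 'D1', 'D2']
```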
3. Objective diagnostic complexity estimation tool
The diagnostic complexity estimation tool, developed by the first and last author, quantifies the objective complexity of diagnosing a patient presentation and traces it to its contributing causes. The tool consists of 44 identifying criteria and their associated clinical probes, derived from the 25 CCFs described by Liu & Li (2012) and arranged in a tabular format. To determine the criteria and probes, we reviewed the key literature through which Liu & Li identified the CCFs; a total of 27 papers were reviewed. The probes are classified into 33 disease-specific (X) and 34 context/environment-specific (Y) probes, catering to patient presentation-imposed and context/environment-imposed complexity respectively. Each probe is given a binary scoring mechanism (1 for presence and 0 for absence of the corresponding CCF) and scoring criteria marking its relevance across the four diagnostic subtasks (refer to Table 2). Users are required to score each probe retrospectively based on the observed diagnostic interaction between patients and clinicians across the four subtasks. While the Y probes are generic and can be scored through direct observation by individuals without clinical expertise, scoring the X probes requires extensive consultation with medical experts and clinicians.
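A minimal sketch of how a single probe record could be represented during scoring is given below. The probe wording and field names are placeholders for illustration, not the actual tool content.

```python
from dataclasses import dataclass, field

SUBTASKS = ("P", "H", "E", "V")  # the four diagnostic subtasks


@dataclass
class Probe:
    text: str            # clinical probe (hypothetical wording here)
    ccfs: list           # CCF(s) from Liu & Li (2012) that the probe caters to
    kind: str            # "X" = disease-specific, "Y" = context/environment-specific
    applicable: tuple = SUBTASKS          # subtasks at which the probe is relevant
    scores: dict = field(default_factory=dict)   # subtask -> 0/1 (presence of the CCF)

    def score(self, subtask: str, present: bool) -> None:
        """Record a binary score, but only at subtasks where the probe applies."""
        if subtask not in self.applicable:
            raise ValueError(f"Probe is not applicable at subtask {subtask}")
        self.scores[subtask] = int(present)


# Example: a context-specific (Y) probe scored at the hypothesis-generation stage.
p = Probe(text="Was the diagnostic workflow at this stage clearly defined?",
          ccfs=["Process Clarity"], kind="Y")
p.score("H", present=True)
print(p.scores)   # -> {'H': 1}
```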
Table 1 presents an example of the identifying criteria, probes and scoring criteria for the complexity contributory factors of Goal/Output Conflict and Redundancy, Process Clarity and Presentation Heterogeneity, and Process Repetitiveness (refer to Liu & Li (2012) for details).
Table 1. Examples of clinical probes, sources and scoring criteria for CCFs

An example of a probe's potential relevancy across the four diagnostic subtask stages, along with their highest achievable scores, is provided in Table 2. Probes A, B, C and D can be relevant across each stage, and hence the highest achievable score (HAS) for these probes is 4, while probe E is only relevant for the verification stage, and hence its HAS is 1. Probe F caters to two CCFs, i.e., Presentation Heterogeneity and Process Repetitiveness, and is relevant across Presentation and Perception Formation and Verification. In this case, even though the HAS for this probe is 4, it is divided among the two CCFs, i.e., 2 per CCF.
Table 2. Applicability and scoring system of each probe across the four diagnostic subtasks

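As a worked example of the arithmetic above, the sketch below reproduces the HAS figures for the illustrative probes A to F, assuming that a probe's total HAS equals the number of applicable subtasks multiplied by the number of CCFs it caters to and is split equally among those CCFs (the reading consistent with the figures quoted for probe F).

```python
# (applicable subtasks, CCFs served) for the illustrative probes of Table 2;
# CCF names other than those for probe F are placeholders.
probes = {
    "A": (("P", "H", "E", "V"), ["CCF-1"]),
    "B": (("P", "H", "E", "V"), ["CCF-2"]),
    "C": (("P", "H", "E", "V"), ["CCF-3"]),
    "D": (("P", "H", "E", "V"), ["CCF-4"]),
    "E": (("V",), ["CCF-5"]),
    "F": (("P", "V"), ["Presentation Heterogeneity", "Process Repetitiveness"]),
}

probe_has = {}   # total HAS of each probe
ccf_has = {}     # HAS accumulated by each CCF
for name, (applicable, ccfs) in probes.items():
    probe_has[name] = len(applicable) * len(ccfs)     # e.g. probe F: 2 subtasks x 2 CCFs = 4
    for ccf in ccfs:                                  # split equally among the CCFs served
        ccf_has[ccf] = ccf_has.get(ccf, 0) + len(applicable)

print(probe_has["A"], probe_has["E"], probe_has["F"])   # -> 4 1 4
print(ccf_has["Presentation Heterogeneity"])            # -> 2
```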
4. Protocol
Ten infectious disease patient cases were observed across their diagnostic continuum by three observers, and field notes on each diagnostic subtask were taken. Observer 1 was stationed at areas A and B (Triage) and would be informed by the attending triage clinicians once they were presented with cases carrying infectious disease symptoms. If a case was marked for the yellow area, observer 1 would alert observer 2, who would then observe the case across areas D1 and D2. If a case was marked for the red area at triage, observer 3 would take over the case for areas C1 and C2. The observers captured the contextual and environmental conditions in which the diagnostic tasks were being undertaken. Observation of a patient case closed when the attending clinician informed the observers that a probable diagnosis had been identified.
Prior to the main study, the observers underwent a week-long immersion at the site to counter observer bias. Cases observed during this immersion period were used to assess the inter-rater reliability of the three observers using Fleiss' kappa. This exercise was repeated twice with the immersion-period data until moderate agreement (kappa = 0.449) was reached. After the observation period, focus group discussions on the observed cases were conducted with the Chief Medical Officer, two attending Senior Residents and one observer. Based on these discussions, the disease-specific (X) probes inherent to each case were scored. Next, the attending clinicians were asked to rate the difficulty of the case on a scale of 1 to 5 to record their perceived difficulty.
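For reference, such an agreement check can be run with a standard routine. The sketch below assumes the three observers' binary probe ratings from the immersion period are arranged one row per rated item; it uses statsmodels and illustrative data, not the study's actual ratings or software.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = rated items (e.g. probe-case combinations from the immersion period),
# columns = the three observers, values = binary probe scores (0/1). Illustrative data.
ratings = np.array([
    [1, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
])

table, _ = aggregate_raters(ratings)   # item x category count table
kappa = fleiss_kappa(table)            # agreement statistic in [-1, 1]
print(round(kappa, 3))
```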
5. Data analysis and results
The HAS for each CCF was calculated by adding the HAS of all the probes catering to it (see Table 3). Two CCFs, Presentation Compatibility and Presentation Format, were found to be irrelevant in a clinical context. Variety was found to be relevant across all the cases. The total objective complexity score calculated was 205.
Table 3. HAS for CCFs across each diagnostic subtask and total HAS for each probe

Next, we calculated the contribution percentage of each CCF for the 10 observed cases and the complexity percentage of each diagnostic stage. Due to space constraints, we elaborate on three cases, while the summary of the other seven cases is provided below in Table 7.
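The figures reported in Tables 4 to 7 follow from straightforward aggregation of the scored probes. A minimal sketch of that calculation is given below, using invented scores rather than the study data.

```python
from collections import defaultdict

# Scored probes of one observed case: (CCF, diagnostic subtask) -> summed binary scores.
# The values are invented for illustration.
case_scores = {
    ("Variety", "P"): 1, ("Variety", "H"): 2,
    ("Process Clarity", "H"): 1, ("Process Clarity", "V"): 1,
    ("Goal/Output Conflict", "E"): 1,
}

total = sum(case_scores.values())                      # objective complexity of the case

ccf_totals, stage_totals = defaultdict(int), defaultdict(int)
for (ccf, stage), score in case_scores.items():
    ccf_totals[ccf] += score
    stage_totals[stage] += score

ccf_contribution = {c: 100 * s / total for c, s in ccf_totals.items()}      # % per CCF
stage_complexity = {st: 100 * s / total for st, s in stage_totals.items()}  # % per subtask

print(ccf_contribution["Variety"])    # -> 50.0 (% of this case's complexity)
print(stage_complexity["H"])          # -> 50.0 (% attributed to hypothesis generation)
```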
Table 4. Diagnostic complexity estimation of case 1

Table 5. Diagnostic complexity estimation of case 4

Table 6. Diagnostic complexity estimation of case 7

Table 7. Summarised diagnostic complexity estimation of remaining cases

6. Discussions
A major cause of diagnostic errors is the underlying complexity of the diagnostic process. To design and deploy behaviour-centred artefacts that effectively counter these errors, it is essential to identify the causes of complexity. Through our work, we demonstrate the operationalisation of a complexity estimation tool to identify the objective complexity of 10 infectious disease cases. A Pearson correlation test between the perceived difficulty scores reported by the attending clinicians and our objective scores revealed a strong correlation of 0.739. Our analysis also revealed Hypothesis Generation (H) as the most complex subtask, followed by Verification (V), Gathering and Interpretation of Evidence (E), and Presentation and Perception Formation (P) for ID cases. Further, we elucidate the specific complexity contributory factors most relevant to each diagnostic subtask (Table 8).
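The reported correlation can be reproduced with a standard routine once both score sets are available for the observed cases; the sketch below uses placeholder values, not the study data.

```python
from scipy.stats import pearsonr

# Placeholder scores for 10 cases: clinician-rated difficulty (1-5) and
# the tool's objective complexity scores. Not the actual study data.
perceived = [3, 4, 2, 5, 3, 4, 2, 3, 5, 4]
objective = [21, 28, 15, 33, 20, 26, 14, 22, 31, 27]

r, p_value = pearsonr(perceived, objective)
print(round(r, 3), round(p_value, 3))
```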
6.1. DICE model as a causality determination tool for DEER analysis
A major implication of the tool is its ability to define the causality behind diagnostic errors. Schiff et al. (2009) developed a taxonomy of diagnostic errors (DEER) in clinical settings that elucidates the modality of errors and the stage of their occurrence in a diagnostic interaction. Combining our model with the DEER taxonomy offers a critical opportunity to correlate the causality and modality of diagnostic errors in a clinical workflow. This can form the foundation for predictive informatics aimed at anticipating potential diagnostic errors based on the current infrastructure and workflow of a healthcare setting. Leveraging these insights can help design tailored behavioural interventions that aid diagnostic decision-making. Future research should focus on developing a broader framework that operationalises this combination.
Table 8. Most relevant complexity contributory factors of diagnostic subtasks

7. Conclusions and limitations
We operationalised a complexity estimation tool and elicited factors contributing to diagnostic complexity for infectious diseases in an EM setting, while aiming to establish a foundation for employing the tool in diverse clinical settings and other disease spectra. A strong correlation between the measured objective complexity and the difficulty perceived by attending clinicians supports the validity of our findings. However, the study is bound by certain limitations. First, the small dataset of 10 cases provides necessary insights but limits the robustness of our findings for ID cases. Future research should focus on expanding the dataset to refine and validate these findings. Second, as this was a single-centre study, our findings may not generalise to other EM settings. While the disease-specific complexity might remain constant, the contextually imposed complexity might vary according to the organisational structure and cultural influences within a clinical setting. Therefore, future studies need to be conducted across multiple emergency settings to account for these variations and improve the generalisability of our findings.
Nevertheless, our work provides a basis for designing behavioural interventions aiming to reduce the complexity of diagnostic processes. Such interventions can help enhance the efficiency of clinical diagnosis, improving patient outcomes and well-being.
Acknowledgements
The authors are grateful to Dr. Radhika Sabnis, Dr. Yatharth Choudhary, Mr. Deepayan Mukherjee, Ms. Saumya Jain and the Department of Emergency Medicine staff at AIIMS, New Delhi.