1. Introduction
Computer-aided design (CAD) tasks require engineering designers to solve cognitively complex design problems while interacting with CAD systems, which comprise CAD software and Human-Computer Interaction (HCI) tools (Ullman, 2002). Consequently, engineering designers continuously engage their perceptual, cognitive, and motor capabilities to complete these tasks. Understanding the workload (WL), that is, the cost of accomplishing CAD tasks using available perceptual, cognitive, and motor resources, and its relationship with CAD performance is considered a prerequisite for improving design outputs and processes.
Previous studies within engineering design have revealed varying (e.g. curvilinear and negative monotonic) relationships between WL and design performance. Beyond these inconsistent findings, the relationship between the WL experienced by engineering designers and their performance remains underexplored, especially in CAD modelling tasks. This study addresses this research gap by investigating the WL-performance relationship in two CAD modelling tasks with intentionally different complexity levels, designed to impose distinct WL levels. By employing these two experimental tasks, the study examines how task complexity, and the WL it induces, affects the WL-performance relationship. Accordingly, this paper answers the following research questions (RQs):
1. What is the relationship between performance and WL in CAD tasks?
2. Does the relationship between WL and performance in CAD tasks vary with CAD task complexity?
To answer these RQs, the paper overviews related work in Section 2, describes the research methodology in Section 3, presents the results in Section 4, and discusses the findings, limitations, and future work in Section 5. Finally, Section 6 concludes the paper.
2. Research background
When investigating WL, scholars have explored how its variations influence design performance to potentially enhance the creativity of design outputs. For instance, Calpin and Menold (2023) investigated relationships between perceived WL (measured via the NASA Task Load Index (TLX) questionnaire) and design outputs in idea generation and prototyping tasks. They assessed design outputs through creativity, usefulness, uniqueness, elegance, quantity, and prototype quality. The results imply significant, moderately positive correlations between mental demand and uniqueness, effort and usefulness, and effort and idea elegance. All three relationships were monotonic. Furthermore, Nguyen and Zeng (2014) investigated how WL levels may lead to creative design solutions by redesigning methods and tools. They proposed a curvilinear relationship between WL and design creativity, implying that design outputs improve with increasing WL up to an optimal level, beyond which excessive WL leads to performance decline. The authors of that study quantified WL (operationalised through mental stress) and design creativity (indirectly measured through mental effort) using physiological measures, including heart rate variability and electroencephalography (EEG).
The authors found no prior studies investigating relationships between WL and performance specifically in CAD tasks. However, related research has explored WL and design performance in engineering tasks involving engineering information interpretation and transformation using digital or physical tools. For example, Zimmerer and Matthiesen (2021) examined WL and performance in six engineering design tasks, including technical drawing analysis, functional analysis, design for manufacturing, failure analysis, and load analysis. They reported a weak but significant negative monotonic relationship between WL (measured with NASA TLX, focusing on mental demand) and design performance (quantified as the ratio of correct items to the total available items per task). Similarly, Dadi et al. (2014) investigated WL and productivity in reconstructing structures based on different engineering information formats: 2D drawings, 3D CAD models, and 3D printed models. WL was measured using NASA TLX, and productivity was assessed by the percentage of time spent on effective work, non-effective work, and rework. Results indicated a negative monotonic relationship, with lower WL scores associated with higher productivity. The study concluded that formats requiring lower cognitive demand for understanding engineering information positively affect task performance.
While CAD performance was not a primary focus in these studies, other research has investigated it (without reference to WL) to identify factors through which it can be enhanced (e.g. Phadnis et al., 2021) or to improve CAD learning (e.g. Company et al., 2015). Scholars assess and describe CAD performance through modelling outputs and processes, using various metrics. CAD modelling outputs are often evaluated by model quality, though its definition and metrics remain underexplored. Company et al. (2015) proposed six quality dimensions (validity, completeness, consistency, conciseness, simplicity, and effectiveness in conveying design intent), mainly for educational purposes but also applied in research. For instance, Phadnis et al. (2021) used four dimensions (completeness, conciseness, consistency, and validity) to compare CAD model quality in individual and team setups. While this approach provides structured criteria, evaluating CAD quality remains manual, subjective, and reliant on independent raters. Furthermore, common metrics for describing CAD modelling processes include task duration, the number of sketches and CAD features, their types, and their application sequence. CAD logs have also been analysed to study engineering designers' behaviour in CAD activities. For example, Gopsill et al. (2016) investigated the use of CAD commands (such as creating, editing, and constraining), proportions of command types, and transitions between commands to better understand CAD processes.
Previous studies in engineering design suggest that the WL-performance relationship varies with the design task, with evidence supporting both negative and curvilinear (inverted-U) relationships. Scholars have used diverse methods and metrics to assess WL and design performance. Subjective methods, particularly questionnaires like the NASA TLX, dominate WL measurement, though physiological methods such as heart rate variability and EEG have also been explored. Moreover, CAD performance has been evaluated using various metrics focused on outcomes, with some studies also describing processes. However, these efforts have not accounted for WL. The reviewed literature underscores the need for studies connecting WL to CAD performance metrics, laying a foundation for understanding WL-performance relationships in CAD tasks. The experimental protocol presented in Section 3 integrates subjective metrics for WL with both output- and process-based metrics for CAD performance.
3. Research methodology
Twenty-four young professionals with a degree in mechanical engineering (2 females and 22 males) participated in the empirical study investigating the relationship between CAD performance and WL. Participants' age, working experience, CAD modelling skill and proficiency, and frequency of conducting (computer-aided) design activities were similar, as summarized in Table 1 using mean (M), standard deviation (SD), median (Med), median absolute deviation (MAD), and range.
Table 1. Participants' experience and expertise

3.1. Experimental procedure and setup
At the beginning of the experimental procedure, presented in Figure 1, participants signed an informed consent form. In the second step, they filled out the experience and expertise questionnaire (EEQ). After that, participants were given instructions about the NASA TLX questionnaire, which included a description of its components. Participants then proceeded to the CAD tasks, where they were instructed to generate a 3D CAD model of a component using SolidWorks (SW) as the CAD software. The duration of each CAD task was limited to 15 minutes. All participants completed both tasks, but they were divided into two groups with reversed task sequences. This division made it possible to control and mitigate the potential influence of task order on the results. After completing each CAD task, participants filled out the NASA TLX questionnaire.

Figure 1. Experimental procedure
The experiment was conducted on a single monitor (24", 1920 × 1080 pixels, 60 Hz), powered by a high-performance computer. The EEQ, instructions, stimuli, and SolidWorks were presented on the same monitor, one window at a time. Participants advanced through the experiment and switched between the application windows using a designated keyboard key. A standard keyboard and a mouse were used as interaction devices.
3.2. CAD tasks
The study contained two main experimental tasks in which the participants were asked to generate 3D CAD models of two components. The components were presented to the participants as technical drawings (see Figure 2). The two components differed in their complexity levels, defined based on the overall geometric complexity (approximated by the number of model surfaces and dimensions) and the complexity of their CAD models (approximated by the number of CAD features required to build the CAD models in SW) (Johnson et al., 2018). The low-complexity (LC) component (shown in Figure 2 on the left) consisted of 33 surfaces, 21 dimensions, and 7 features. The high-complexity (HC) component (see Figure 2 on the right) consisted of 49 surfaces, 31 dimensions, and 11 features. Hence, modelling the HC component involved simultaneous processing and generation of more interrelated information elements (design characteristics such as shape and dimensions) than modelling the LC component. Following the suggestions from Cognitive Load Theory (Sweller et al., 2011), it was assumed that performing the HC task imposed a higher WL than the LC task.

Figure 2. Technical drawing of the low-complexity (left) and high-complexity (right) component
3.3. NASA TLX questionnaire
Participants self-reported the perceived workload in CAD modelling tasks with the NASA TLX questionnaire (Hart and Staveland, 1988). The NASA TLX questionnaire was selected as the subjective method for measuring WL for two reasons. First, NASA TLX has been widely used as a “gold standard” for estimating WL across various research fields (Hart, 2006). Second, it considers several aspects of WL, captured through six dimensions described in Table 2. This capability to capture several WL aspects enables the comprehensive analysis required for investigating relationships between WL and performance in CAD tasks that require not only cognitive but also perceptual and motor resources.
Table 2. NASA TLX components; descriptions are adopted from Hart and Staveland (1988)

The WL experienced during each CAD task was reported by rating the magnitude of each of the six NASA TLX components on a twenty-point scale. In addition to providing these ratings, participants evaluated the contribution (weight) of each component to the overall WL. The overall weighted WL for each CAD task was calculated by summing the weighted ratings across the six components and dividing by 15, as instructed in Hart and Staveland (1988).
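The weighting step described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the component keys, the function name `weighted_workload`, and the example ratings and weights are hypothetical and not taken from the study, and the weights are assumed to come from the 15 pairwise comparisons of the standard NASA TLX procedure, so that they sum to 15.

```python
# Hypothetical component keys; the study's own labels are given in Table 2.
COMPONENTS = (
    "mental_demand", "physical_demand", "temporal_demand",
    "performance", "effort", "frustration",
)

def weighted_workload(ratings, weights):
    """Overall weighted WL: sum of (weight * rating) over six components, divided by 15."""
    if set(ratings) != set(COMPONENTS) or set(weights) != set(COMPONENTS):
        raise ValueError("ratings and weights must cover all six components")
    # Assumption: weights stem from 15 pairwise comparisons and therefore sum to 15.
    if sum(weights.values()) != 15:
        raise ValueError("weights are assumed to sum to 15")
    return sum(weights[c] * ratings[c] for c in COMPONENTS) / 15

# Made-up illustration values (ratings on a twenty-point scale), not study data.
example_ratings = {
    "mental_demand": 14, "physical_demand": 4, "temporal_demand": 12,
    "performance": 8, "effort": 13, "frustration": 6,
}
example_weights = {
    "mental_demand": 5, "physical_demand": 0, "temporal_demand": 3,
    "performance": 2, "effort": 4, "frustration": 1,
}

print(weighted_workload(example_ratings, example_weights))  # prints 12.0
```

With these invented values, the weighted sum is 180, which gives an overall weighted WL of 12 on the same twenty-point scale as the ratings.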
3.4. Data analysis
Data analysis was performed in R using RStudio version 2023.03.1+446. Descriptive statistics encompassed the calculation of the mean (M) and the median (Med) as measures of central tendency, as well as the standard deviation (SD) and the MAD as measures of variability. These measures were calculated for each of the six NASA TLX components (described in Table 2), as well as for the overall weighted WL. Additionally, descriptive statistics were calculated to describe and summarize two aspects of participants' performance in CAD tasks: CAD modelling output and process.
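As a rough illustration of these descriptive measures, the following Python sketch computes M, SD, Med, MAD, and range for one variable. It is not the authors' R code: the function name `describe`, the `mad_scale` parameter, and the example scores are hypothetical. Whether the reported MAD was scaled is not stated in the paper; R's `mad()` multiplies by 1.4826 by default, so the scale factor is left as an explicit assumption here.

```python
import statistics

def describe(values, mad_scale=1.0):
    """Return M, SD, Med, MAD, and range for a list of numbers.

    mad_scale is an assumption: 1.0 gives the raw median absolute deviation,
    1.4826 would mimic the default scaling of R's mad().
    """
    values = list(values)
    med = statistics.median(values)
    mad = mad_scale * statistics.median([abs(v - med) for v in values])
    return {
        "M": statistics.mean(values),
        "SD": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "Med": med,
        "MAD": mad,
        "range": (min(values), max(values)),
    }

# Made-up scores for illustration only, not study data.
scores = [10, 12, 12, 14, 20]
summary = describe(scores)
print(summary["Med"], summary["MAD"])  # prints 12 2.0
```

The median and MAD are less sensitive to a single extreme score (such as the 20 above) than the mean and SD, which is one common reason for reporting both pairs.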
CAD output was assessed through the completeness of the CAD models as a quality rubric. In particular, a CAD model was considered complete if it accurately replicated the size and the shape of the component (Company et al., 2015). To quantify completeness, the size and shape of the CAD models generated by participants were compared with the “gold standard” model created by the authors. Accuracy in replicating the size was calculated by dividing the number of accurate dimensions in the participants' models by the total number of dimensions defined in the technical drawings of the components (i.e. dimensional accuracy). In addition to this metric, the percentage of modelled volume (i.e. volume) and the percentage of modelled surface area (i.e. surface area) were calculated. Moreover, accuracy in replicating the shape was calculated using the D1 geometric similarity algorithm available within the Graderworks application (Garland and Grigg, 2019), and detailed in Renu and Mocko (2016) and Cardone et al. (2003).
Furthermore, the following output-based metrics were used to describe the CAD modelling process: number of sketch entities (i.e. sketch entity), number of sketch relations (i.e. sketch relation), number of sketches (i.e. sketch), number of CAD features (i.e. feature), average number of sketch entities per sketch (i.e. sketch entity average), average number of sketch relations per sketch (i.e. sketch relation average), and relation/dimension ratio. In addition to these output-based metrics, CAD modelling processes were described with task duration (i.e. duration). This metric was considered only in the LC task; it was omitted from the correlation analysis in the HC task because all participants in that condition used the full 15 minutes available for modelling.
Furthermore, Spearman's rank correlation was computed between NASA TLX components and CAD performance metrics as a measure of association to investigate their relationships and thus answer the first RQ. This type of correlation was selected for two reasons: 1) some variables were on an ordinal scale, and 2) normality was violated in some of the subsets (tested with the Shapiro-Wilk test; p < 0.05). Statistical significance of the correlations was tested using the R function “cor_pmat”, with “spearman” as the selected method. Given the assumptions of this type of correlation, the resulting coefficients indicate the strength and direction (positive or negative) of the monotonic relationship between the variables. Finally, the identified relationships were qualitatively compared between the LC and the HC task to answer the second RQ.
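The size-related completeness metrics can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in the study: the function names, the dimension labels, the tolerance `tol`, and all numeric values are hypothetical, and the paper does not state how a dimension was judged "accurate", so a simple absolute tolerance is assumed.

```python
def dimensional_accuracy(model_dims, reference_dims, tol=1e-3):
    """Share of reference dimensions reproduced in the model within tol.

    Assumption: a dimension counts as accurate if it exists in the model and
    differs from the reference value by at most tol (same units).
    """
    accurate = sum(
        1
        for name, target in reference_dims.items()
        if name in model_dims and abs(model_dims[name] - target) <= tol
    )
    return accurate / len(reference_dims)

def percent_of_reference(model_value, reference_value):
    """Percentage of a reference quantity (e.g. volume or surface area) that was modelled."""
    return 100.0 * model_value / reference_value

# Made-up values for illustration only.
reference = {"d1": 40.0, "d2": 25.0, "d3": 8.0, "d4": 12.5}
participant = {"d1": 40.0, "d2": 24.0, "d3": 8.0}  # d2 is wrong, d4 is missing

print(dimensional_accuracy(participant, reference))  # prints 0.5
print(percent_of_reference(900.0, 1200.0))           # prints 75.0
```

In this toy case, two of the four reference dimensions are reproduced, giving a dimensional accuracy of 0.5, and a model with three quarters of the reference volume scores 75%.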
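To make the correlation step more concrete, the sketch below computes Spearman's rho in plain Python as the Pearson correlation of average ranks. It does not reproduce the R workflow used in the study: the function names and example data are hypothetical, and the significance test shown is a permutation test swapped in for illustration, not the test performed by “cor_pmat”.

```python
import math
import random

def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for rho (illustrative substitute test)."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    hits = 0
    for _ in range(n_permutations):
        shuffled = ry[:]
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Made-up data: task duration in minutes and an overall WL score.
duration = [8.5, 10.0, 11.5, 12.0, 13.5, 15.0]
workload = [6.0, 9.0, 8.0, 11.0, 12.0, 14.0]
rho = spearman_rho(duration, workload)
p_value = permutation_p_value(duration, workload)
print(round(rho, 3), p_value < 0.05)
```

Because only ranks enter the calculation, rho captures monotonic rather than strictly linear association. A variable that takes the same value for every participant has no defined rank correlation, which the sketch signals by returning NaN.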
4. Results
Measures of descriptive statistics used to summarize CAD performance (Table 5) and scores assigned to NASA TLX components (Table 6) across both CAD tasks are detailed in Appendix A. The following subsections describe associations between these variables (presented in Tables 3 and 4) with correlation coefficients. Subsection 4.1 presents correlation coefficients for the LC task and Subsection 4.2 for the HC task.
4.1. LC task
The overall weighted WL showed a significant positive correlation with LC task duration (see Figure 3). This relationship indicates that participants who spent more time modelling in the LC task perceived a higher WL. Similarly, task duration was significantly and positively correlated with three NASA TLX components: mental demand, effort, and frustration. Additionally, two metrics related to CAD modelling process (number of generated sketches and number of features) were significantly correlated with perceived effort. Notably, these correlations were negative, suggesting that participants who generated fewer sketches and features felt they had to apply more effort during the LC task. Along the same lines, participants who generated fewer features and sketch relations experienced higher levels of frustration, as indicated by a significant negative correlation between these variables. Regarding CAD outputs, dimensional accuracy and modelled surface were significantly correlated with NASA TLX components. Specifically, frustration levels decreased as the dimensional accuracy of CAD models improved. Furthermore, participants reported greater satisfaction with their CAD performance when the percentage of modelled surface was closer to 100%.

Figure 3. Correlation matrix - LC task
Table 3. Statistically significant correlations between NASA TLX components and CAD performance aspects in the LC task

4.2. HC task
In the HC task, the overall weighted WL was significantly correlated with three CAD process metrics and one CAD output metric (see Figure 4). All four correlations were negative, indicating that participants experienced higher WL when the number of modelled features, sketches, and sketch entities, as well as the percentage of modelled volume, were lower. Furthermore, significant correlations were also observed between individual NASA TLX components and metrics describing the CAD process. Specifically, the number of sketches and features was negatively correlated with mental demand, effort, and frustration, and positively correlated with perceived performance. Similarly, the number of sketch entities and sketch relations showed significant correlations with frustration and perceived performance. For physical demand, the only significant correlation was with the relation/dimension ratio, where the relationship was positive. This suggests that physical demand increased for participants who used more relations when generating sketches. Moreover, three CAD output metrics significantly correlated with two different NASA TLX components. Modelled volume showed a negative correlation with mental demand, while dimensional accuracy and surface area were positively correlated with perceived performance.

Figure 4. Correlation matrix - HC task
Table 4. Statistically significant correlations between NASA TLX components and CAD performance aspects in the HC task

5. Discussion
This study represents the first attempt to investigate relationships between CAD performance of engineering designers (graduated young professionals) and WL they perceived in CAD tasks. Most prior research has focused solely on design outputs (often related to creativity) when examining the impact of WL on performance. This paper, however, considers two aspects of CAD performance, namely CAD modelling output and CAD modelling process. Additionally, the study evaluates all six individual NASA TLX components alongside overall weighted WL. Prior studies often concentrated only on cognitive aspects, such as mental demand, or overall WL. While CAD tasks are predominantly cognitive, perceptual and motor processes are integral to engineering designers' CAD performance (Ullman, 2002). Thus, it is important to consider all WL components in the CAD context. By addressing both performance aspects and individual WL components, this study provides a comprehensive analysis of the relationship between WL and CAD performance.
5.1. Relationship between performance and WL in CAD tasks
The experimental study revealed monotonic relationships between perceived WL and both aspects of CAD performance. These findings align with prior research on WL and performance in engineering design (e.g. Calpin and Menold, 2023; Zimmerer and Matthiesen, 2021). The results suggest consistent directional changes in WL and CAD performance; variables either increase together (positive relationship) or one increases while the other decreases (negative relationship). The strength and direction of the relationship varied depending on the metrics used, with most CAD performance metrics showing negative correlations with WL components. Notable exceptions include task duration and relation/dimension ratio, which were positively correlated with WL. Additionally, all significant correlations involving perceived performance (a NASA TLX component) were positive. This study only tested monotonic relationships due to the applied statistical methods. Future work will explore other relationship types, such as the commonly discussed inverted-U curve.
The majority of significant relationships were found between NASA TLX components and output-based metrics used to describe the CAD modelling process. However, as no prior studies have examined individual NASA TLX components in CAD modelling tasks, direct comparisons are limited. These results highlight the importance of analysing CAD processes in future research, which requires developing process-based metrics and incorporating dynamic WL assessments through physiological methods like EEG (Lukacevic et al., 2023) or eye tracking (Cass and Prabhu, 2024). A smaller number of significant correlations were found between NASA TLX components and CAD modelling outputs. For instance, perceived performance was positively correlated with the percentage of modelled surface area in both tasks and with dimensional accuracy in the HC task. In the LC task, dimensional accuracy negatively correlated with frustration. Volume was the only CAD output metric significantly associated with overall weighted WL and mental demand.
5.2. Differences in relationships between the LC and the HC task
The study compared relationships between WL and CAD performance across two tasks with different complexity levels. Overall weighted WL and five of six individual NASA TLX components (except physical demand) scored significantly higher in the HC task than in the LC task (see Table 6 in Appendix A). The HC task showed a greater number of significant correlations, which may align with Cognitive Load Theory (Sweller et al., 2011). According to this theory, cognitive load (mental demand) effects become more evident under high intrinsic load (resulting from element interactivity), while low intrinsic load may obscure such effects (Sweller et al., 2011). Consequently, the lower WL in the LC task may have limited the strength of its relationship with CAD performance.
The literature also suggests that the direction of WL-performance relationships can change with WL levels, as indicated by the inverted-U curve (Bruggen, 2015). However, the direction of most relationships in this study was consistent across both tasks, regardless of complexity. For instance, the directions of the relationships between weighted WL and all CAD performance metrics were the same in both tasks. Despite this, differences were observed in several relationships involving individual NASA TLX components. Temporal demand showed differences in its relationship with volume and surface area. Direction differed in correlations of physical demand with dimensional accuracy, surface area, sketch entity, and sketch relation. Mental demand, frustration, and effort also exhibited directional variation across specific metrics, including volume and relation/dimension ratio. Perceived performance changed the direction of its relationship with shape, sketch entity average, and sketch relation. These findings suggest that task complexity influences specific WL-performance relationships.
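The qualitative comparison of directions between the two tasks amounts to checking, for each pair of WL component and performance metric, whether the sign of the correlation coefficient differs. A minimal Python sketch of that check follows; the function name `direction_changes` and all coefficient values are invented placeholders for illustration and are not the coefficients reported in Tables 3 and 4.

```python
def direction_changes(lc, hc):
    """Return the (component, metric) pairs whose correlation sign differs between tasks."""
    changed = []
    for key in sorted(lc.keys() & hc.keys()):
        # A negative product means one coefficient is positive and the other negative.
        if lc[key] * hc[key] < 0:
            changed.append(key)
    return changed

# Invented placeholder coefficients, NOT results from the study.
lc_rho = {
    ("effort", "feature"): -0.45,
    ("performance", "shape"): 0.20,
    ("physical_demand", "sketch_entity"): 0.10,
}
hc_rho = {
    ("effort", "feature"): -0.60,
    ("performance", "shape"): -0.15,
    ("physical_demand", "sketch_entity"): -0.30,
}

print(direction_changes(lc_rho, hc_rho))
# prints [('performance', 'shape'), ('physical_demand', 'sketch_entity')]
```

A sign check of this kind ignores magnitude and significance, so a change of direction between two coefficients that are both close to zero should be interpreted with caution.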
6. Conclusions
This study explored the relationships between engineering designers' CAD performance and perceived WL during modelling tasks of varying complexity. By analysing both CAD outcomes and processes alongside all NASA TLX components, it offers a comprehensive understanding of WL-performance interactions. Significant monotonic relationships were observed, with a greater number of significant correlations in the high-complexity task, suggesting an effect of task complexity. The findings emphasize the need for process-based metrics and dynamic WL assessments using methods like EEG or eye tracking. This work lays the foundation for future studies to investigate more complex, non-monotonic relationships, advancing the understanding of WL and performance in CAD.
Acknowledgement
This work was supported by the Croatian Science Foundation under the project DATA-MATION, number [IP-2022-10-7775].
Appendix A
Tables 5 and 6 summarize the achieved CAD performance and the scores assigned to NASA TLX components in the LC and the HC CAD tasks.
Table 5. Metrics related to CAD performance

Table 6. Perceived workload in CAD tasks
