Introduction
With the advancement of artificial intelligence (AI), large language models (LLMs) have demonstrated powerful capabilities in data retrieval, computation, and natural language understanding (Zhai, Reference Zhai2022), revealing significant potential as collaborators in creative concept generation (Liu et al., Reference Liu, Han, Ma, Zhang, Yang, Tian, He, Li, He, Liu, Wu, Zhao, Zhu, Li, Qiang, Shen, Liu and Ge2023; Han et al., Reference Han, Gao, Liu, Zhang and Zhang2024). Studies suggest that compared to human–human collaboration (HHC), designer–LLM partnerships can reduce design steps, shorten task completion times (Zhou et al., Reference Zhou, Li, Zhang, Yu and Duh2024), and facilitate faster idea generation (Huang et al., Reference Huang, Zhang and Tang2023). Recent evidence further demonstrates that LLM collaboration can enhance cognitive flexibility and creative thinking (Koivisto and Grassini, Reference Koivisto and Grassini2023; Fabio et al., Reference Fabio, Plebe and Suriano2025). This emerging field of human–agent collaboration (HAC) has become a significant research focus in design (Li et al., Reference Li, Huang, Liu and Zheng2022).
To enhance the design reasoning capabilities of LLMs, researchers are actively exploring the integration of classic concept generation techniques (CGTs) with LLMs. CGTs can be classified according to their degree of structure, and researchers have employed CGTs with varying levels of structure in combination with LLMs. For unstructured techniques, Shaer et al. (Reference Shaer, Cooper, Mokryn, Kun and Ben Shoshan2024) have utilized brainstorming (Osborn, Reference Osborn1953), where designers and LLMs engage in free association and open discussion to stimulate creativity (Shaer et al., Reference Shaer, Cooper, Mokryn, Kun and Ben Shoshan2024). In the realm of semistructured techniques, Wang et al. (Reference Wang, Zuo, Cai, Yin, Childs, Sun and Chen2023) adopted the function–behavior–structure (F-B-S) (J. Gero and Milovanovic, Reference Gero and Milovanovic2021) framework, decomposing design tasks into three steps and enabling designers to guide the output of LLMs (Wang et al., Reference Wang, Zuo, Cai, Yin, Childs, Sun and Chen2023). For highly structured approaches, Lee et al. (Reference Lee, Liang, Yung and Keung2024) developed the “Ecoinnovate Assistant” based on Theory of Inventive Problem Solving (TRIZ) (Altshuler, Reference Altshuler1984), a systematic methodology for inventive problem-solving that identifies and resolves technical contradictions through established patterns. TRIZ employs core theoretical tools, including the contradiction matrix, which maps specific engineering conflicts to applicable solutions, and 40 inventive principles derived from patent analysis across diverse technological domains. This methodology provides a structured framework for converting abstract innovative concepts into concrete design solutions, reducing trial-and-error approaches in the innovation process. The LLM-driven assistant strictly follows specific steps to advance the design process, ensuring adherence to the systematic principles of TRIZ.
Despite the advantages of integrating CGTs with LLMs, effective collaboration between designers and LLMs poses a unique cognitive challenge. Appropriate prompt construction and cognitive adaptation are essential for success. Current research primarily focuses on final performance, while studies on cognitive processes largely rely on subjective methods such as questionnaires, protocol analysis, and qualitative interviews. These approaches make it difficult to objectively reveal the neurocognitive mechanisms of the designers. Neuroscience research has shown that different CGTs with varying levels of structure activate distinct neural patterns in designers (Shealy et al., Reference Shealy, Gero, Hu and Milovanovic2020). Designers experience a higher cognitive load when using brainstorming, while TRIZ demands greater cognitive coordination across brain regions (Shealy and Gero, Reference Shealy and Gero2019). However, the applicability of these findings to HAC contexts remains unexplored, creating a research gap. This gap limits the ability to optimize HAC design processes and tools from a neurocognitive perspective.
This study employed functional near-infrared spectroscopy (fNIRS) to objectively measure designers’ brain activity during the design process. Compared to physiological measurement techniques, such as heart rate variability (Guo et al., Reference Guo, Fang, Li, Ren and Zhang2024) and electrodermal activity (Hernandez Sibo et al., Reference Hernandez Sibo, Gomez Celis and Liou2024), brain imaging techniques provide a more direct means of measuring cognitive processes. fNIRS offers distinct advantages compared with electroencephalography and functional magnetic resonance imaging in certain design research contexts. It supports natural activity by providing a balance between temporal and spatial resolution, better cost-effectiveness, great portability, and reduced artifacts (Milovanovic et al., Reference Milovanovic, Hu, Shealy and Gero2021a). Furthermore, fNIRS effectively measures the prefrontal and parietal cortices (Chrysikou, Reference Chrysikou2019), key brain regions involved in design planning and execution. Given these advantages, fNIRS has been widely used to study creative and problem-solving tasks (Milovanovic et al., Reference Milovanovic, Hu, Shealy and Gero2021a, Reference Milovanovic, Hu, Shealy and Gero2021c).
fNIRS measures neural activity by detecting changes in oxyhemoglobin (Oxy-Hb) concentration within the cerebral cortex (Ekstrom, Reference Ekstrom2010). An increase in Oxy-Hb typically indicates enhanced neural activity. Among various analytical methods for fNIRS data, two prevailing approaches include: (1) the area under the curve (AUC) of Oxy-Hb, which serves as a key metric for cognitive load assessment. A larger positive AUC indicates higher cognitive demands (Verdière et al., Reference Verdière, Roy and Dehais2018), and (2) functional brain network analysis (Hosseini et al., Reference Hosseini, Peyrovi and Gohari2019), which includes two key components: effective connectivity and functional connectivity. Effective connectivity reveals the causal relationships between brain regions (Wu et al., Reference Wu, Zheng, Hua, Wei, Xue, Li, Xing, Ma, Shan and Xu2022), while degree centrality (DC) assesses the importance of each subregion as a node within the network. Functional connectivity quantifies neural synchronization between regions, with network density (ND) indicating the overall level of coordination across the brain network (Qu et al., Reference Qu, Cui, Guo, Ren and Bu2022).
To address this research gap, this study explores how CGTs with varying degrees of structure (brainstorming and TRIZ) influence cognitive processes and outputs of the designers when collaborating with LLMs. This study proposes the following research questions (RQs):
RQ1: How do different collaborators and CGTs impact designers’ cognitive load?
RQ2: How do different collaborators and CGTs affect the functional brain network of designers’ prefrontal cortex (PFC)?
RQ3: How do different collaborators and CGTs influence designers’ creative thinking performance?
To comprehensively investigate the effectiveness of combining LLMs with CGTs, this study uses HHC as a baseline group, adopting a 2 × 2 factorial design (collaborators: human/LLM-based agent × CGT: brainstorming/TRIZ). Before the experiment, two LLM-based agents were developed: IntelliStorm (based on brainstorming) and EvoluTRIZ (based on TRIZ). Thirty-two participants were randomly assigned to either the HAC or HHC groups for conceptual design tasks. Using fNIRS, Oxy-Hb data were recorded to analyze cognitive load and functional brain networks, while design experts evaluated the creativity of the design outcomes. The results showed that the LLM-based agents significantly reduced participants’ cognitive load compared to a human collaborator. The agents also enhanced the DC of the right ventrolateral PFC (R-VLPFC) subregion and the ND of the PFC’s functional connectivity while improving thinking fluency and flexibility. Regarding CGTs, brainstorming primarily activated the right dorsolateral PFC (R-DLPFC), affecting the fluency of ideas, while TRIZ mainly activated the left dorsolateral PFC (L-DLPFC), influencing the elaboration of ideas. Moreover, significant interaction effects were observed between collaborators and CGTs, jointly influencing cognitive load, originality of creativity, and flexibility of thinking. In summary, the research provides neuroscientific evidence for the effectiveness of HAC in conceptual design.
Methods
Collaborative agent
Based on OpenAI’s Generative Pre-trained Transformers (GPTs), two agents were developed: IntelliStorm, incorporating brainstorming techniques, and EvoluTRIZ, based on the TRIZ methodology. The CO-STAR framework guidelines (Sheila, Reference Sheila2024) were used to configure the GPTs’ initialization parameters; the dialog principles and opening prompts for both IntelliStorm and EvoluTRIZ are presented in Table 1.
Table 1. Parameter settings for collaborative agents

Participants
This study adhered to ethical standards and was approved by the relevant Ethics Committee. Thirty-two participants were recruited from the design school of a top university, including master’s students, doctoral students, and faculty members. All participants had over 5 years of product design experience and at least 6 months of frequent LLM usage experience. They were in good health with normal hearing abilities. Detailed participant information is presented in Table 2.
Table 2. Participant information

Experimental equipment
The experiment was performed in a neuroscience laboratory equipped with soundproofing and controlled lighting conditions. Two trained lab technicians supervised the experimental procedures. Table 3 lists the main equipment used in this study.
Table 3. The list of experimental equipment

Experimental design
The thirty-two participants were randomly assigned to either the HAC or the HHC group, with 16 participants in each group. To avoid order effects, a within-group Latin Square Design was implemented such that the participants would complete two parallel design tasks (see Task A and Task B in Figure 1). In both conditions, collaborators (agents or humans) actively contributed ideas during creative discussion, with participants retaining final decision-making authority.

Figure 1. Group assignment and experimental sequence.
For the HAC group, participants collaborated with IntelliStorm and EvoluTRIZ agents. In the HHC group, participants worked with a senior designer (8 years of experience) recruited from an external design company with no prior relationship to any participants. This avoided potential power relationships that might inhibit creative expression. The experimental setup for the HAC group is illustrated in Figure 2.

Figure 2. HAC scenario.
Experimental tasks
Task A: Cross-language communication challenge. Design an innovative product or service system to help tourists overcome language barriers during their international travels. The product or system should assist the traveler with essential daily interactions, such as asking for directions, shopping, and ordering food in countries where they do not speak the local language.
Task B: Beginner’s cooking challenge. Create an innovative product or service system that enables cooking beginners to successfully prepare delicious meals. The solution should make cooking accessible and enjoyable for beginners with zero experience in cooking.
All solutions must address real user pain points while being both innovative and practical. Designs should integrate three core elements: functionality, structure, and user behavior patterns (F-B-S) (J. S. Gero and Kannengiesser, Reference Gero and Kannengiesser2004).
Experimental procedure
The experiment involves three main phases: preparation, baseline data collection, and task execution. (1) In the preparation phase, participants receive a detailed briefing and are asked to sign consent forms. After putting on the fNIRS equipment, they are introduced to CGTs and practice communication through a sample task: designing an outdoor drinking product. (2) Participants sit quietly for 2 min for physiological baseline measurements. (3) During task execution, participants first tackle Task A, an 8-min natural conversation with either LLM-based agents or an experienced designer. They can sketch or take notes as needed to simulate real design conditions. After reviewing their solution, participants spend 1–3 min verbally describing the F-B-S patterns of their design. After a short break, they repeat the process for Task B. Figure 3 illustrates the complete procedure.

Figure 3. Procedure.
Data collection and analysis
fNIRS data
The electrode placement was designed to cover the entire PFC, as this region is critical for design planning and execution (Dietrich, Reference Dietrich2004; Chrysikou, Reference Chrysikou2019) and is central to design concept generation (Gilbert et al., Reference Gilbert, Zamenopoulos, Alexiou and Johnson2010). Based on Brodmann areas (Brodmann, Reference Brodmann and Garey1999), 16 sensors (8 emitters and 8 detectors) were positioned on the fNIRS cap using the 10/20 international system, creating 22 channels to cover the key subregions: L-DLPFC, left ventrolateral PFC (L-VLPFC), R-DLPFC, R-VLPFC, medial PFC (mPFC), and the orbitofrontal cortex (OFC). Each channel consists of a light source and a nearby receiver. Table 4 provides detailed channel distribution across brain subregions and their associated cognitive functions. The layout of the electrodes and channels is shown in Figure 4.
Table 4. Subregion and channels


Figure 4. Layout of electrodes and channels. (a) Sources and detectors: red and blue dots representing emitter and receiver, respectively. (b) Channels (from left to right): L-DLPFC, L-VLPFC, mPFC, R-DLPFC, and R-VLPFC.
In the initial preprocessing of the raw data, two participants were excluded due to weak signal quality. The remaining data were processed using MATLAB R2023b with the Homer3 plugin to remove physiological noise and motion artifacts. Specifically, any value above five, the threshold for abnormal values, was replaced with a moving average of the preceding five points. Motion artifacts were then eliminated via a regression-based temporal derivative distribution repair method (Fishburn et al., Reference Fishburn, Ludlum, Vaidya and Medvedev2019), which effectively removes baseline shifts and artificial spikes by iteratively reducing the weight of the noise. Finally, a sixth-order Butterworth filter with a passband of 0.01–0.08 Hz was applied to the data. Given that Oxy-Hb shows a better signal-to-noise ratio and sensitivity compared to deoxygenated hemoglobin (Shealy et al., Reference Shealy, Gero, Hu and Milovanovic2020), the analysis focused on Oxy-Hb data. After baseline correction, valid Oxy-Hb values were obtained from 30 participants. The preprocessed data were then analyzed using SciPy (v.1.11.0) and the HERMES Toolbox to calculate further metrics.
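The abnormal-value replacement step described above can be illustrated with a minimal sketch. It is not the Homer3 routine used in the study: the function name `replace_outliers`, the use of the absolute value when comparing against the threshold, and the handling of the first samples are assumptions made for illustration only.

```python
def replace_outliers(signal, threshold=5.0, window=5):
    """Replace abnormal samples with a trailing moving average.

    Assumption: a sample is abnormal when its magnitude exceeds `threshold`.
    The replacement is the mean of up to `window` preceding cleaned samples.
    """
    cleaned = []
    for value in signal:
        # An abnormal first sample is kept as-is because no history exists yet.
        if abs(value) > threshold and cleaned:
            recent = cleaned[-window:]
            value = sum(recent) / len(recent)
        cleaned.append(value)
    return cleaned


# Hypothetical signal with one spike at index 5.
raw = [0.1, 0.2, 0.3, 0.2, 0.1, 9.0, 0.2]
cleaned = replace_outliers(raw)
print(cleaned)
```

In this sketch the replacement uses already-cleaned history, so consecutive spikes do not contaminate one another's replacement values.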
Positive AUC: The AUC for positive Oxy-Hb signals was calculated using the trapezoidal rule. The AUC values from all 20 channels (excluding OFC channels) were summed to represent the overall cognitive load in the PFC, with higher AUC values indicating greater cognitive load.
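As a rough illustration of the positive AUC metric, the sketch below integrates the positive part of each channel with the trapezoidal rule and sums across channels. The function names, the sampling interval `dt`, and the choice to clip negative samples to zero before integrating (rather than interpolating exact zero crossings) are assumptions, not details taken from the study.

```python
def positive_auc(samples, dt):
    """Trapezoidal area under the positive part of one channel.

    Assumption: negative samples are clipped to zero before integration.
    """
    clipped = [max(s, 0.0) for s in samples]
    return sum((a + b) / 2.0 * dt for a, b in zip(clipped, clipped[1:]))


def total_positive_auc(channels, dt):
    """Sum the positive AUC over all channels (overall load index)."""
    return sum(positive_auc(ch, dt) for ch in channels)


# Two hypothetical Oxy-Hb channels sampled at 1-second intervals.
channels = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [-1.0, 0.0, 1.0, 0.0, -1.0],
]
print(total_positive_auc(channels, dt=1.0))
```

Clipping slightly misestimates the area in intervals where the signal crosses zero; with densely sampled fNIRS data the difference is expected to be small.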
DC of PFC subregions: Oxy-Hb data from five subregions were used to analyze information flow between PFC regions using Granger causality analysis, which can identify predictive relationships between variables without prior hypotheses (Hasanzadeh et al., Reference Hasanzadeh, Mohebbi and Rostami2022). A directed network matrix based on significant Granger causality (p < 0.05) was constructed, assigning a value of 1 for significant causal relationships and 0 otherwise. From this binary matrix, in-DC and out-DC for each node (PFC subregion) were calculated. The weighted sum of these metrics provided a comprehensive DC index, with higher values indicating greater importance of the subregion in the network.
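The DC index can be sketched from a binary directed matrix as follows. The text does not state the weights of the weighted sum or whether the result is normalized, so the equal weights (`w_in = w_out = 0.5`) and the division by the number of other nodes are assumptions, and the adjacency matrix is hypothetical.

```python
REGIONS = ["L-DLPFC", "L-VLPFC", "mPFC", "R-DLPFC", "R-VLPFC"]


def degree_centrality(adj, w_in=0.5, w_out=0.5):
    """Weighted in/out degree centrality per node.

    adj[i][j] == 1 means region i significantly Granger-causes region j.
    Assumption: equal weights and normalization by (n - 1).
    """
    n = len(adj)
    scores = []
    for i in range(n):
        out_deg = sum(adj[i][j] for j in range(n) if j != i)
        in_deg = sum(adj[j][i] for j in range(n) if j != i)
        scores.append((w_in * in_deg + w_out * out_deg) / (n - 1))
    return scores


# Hypothetical significance matrix (rows: source, columns: target).
adj = [
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
scores = degree_centrality(adj)
for name, score in zip(REGIONS, scores):
    print(name, score)
```

Under these assumptions a node connected in both directions to every other node would score 1, and an isolated node would score 0.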
ND of PFC functional connectivity: Functional connectivity was calculated using Pearson’s correlation coefficients based on Oxy-Hb data from five PFC subregions. Previous research has established that a threshold of 0.6 for correlation coefficients indicates significant synchronous activation between brain regions (Bressler and Menon, Reference Bressler and Menon2010). A binary connectivity matrix was created, with a value of 1 assigned to region pairs with correlation coefficients exceeding 0.6 and 0 otherwise. The ND was then calculated from this binary matrix, with higher values indicating stronger synchronization of cognitive resources across the network.
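A minimal sketch of the ND calculation is given below, assuming that density is the number of suprathreshold region pairs divided by the number of possible pairs and that the signed (not absolute) correlation is compared with 0.6. The function names and the example time series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def network_density(series, threshold=0.6):
    """Fraction of region pairs whose correlation exceeds the threshold."""
    n = len(series)
    edges = 0
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(series[i], series[j]) > threshold:
                edges += 1
    return edges / (n * (n - 1) / 2)


# Three hypothetical subregion signals: the first two move together.
series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
density = network_density(series)
print(density)
```

With five subregions there are ten possible pairs, so under this definition ND moves in steps of 0.1 for a single participant.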
Design outputs data
To evaluate participants’ creative thinking abilities, the audio-formatted final solution was transformed into text and scored by two experts using standardized criteria. The scoring system is based on Torrance’s four dimensions of creative thinking (Torrance, Reference Torrance1974): fluency, flexibility, originality, and elaboration. The detailed scoring criteria are presented in Table 5.
Table 5. Creative thinking assessment criteria

Results
To analyze the main effects and interaction effects of the two factors (n = 30 for neuroimaging, n = 32 for performance), data normality was first tested using the Shapiro–Wilk test. For data meeting normality and homogeneity of variance assumptions, the two-way analysis of variance (ANOVA) was applied; for data violating these assumptions, the Scheirer–Ray–Hare test was used. For AUC data specifically, which involved calculations across 20 channels, a two-step analysis strategy was employed. First, independent two-way ANOVAs were conducted for each channel. Then, Brown’s method (David, Reference David1975) was used to combine p-values across channels, accounting for inter-channel correlations while preserving the integrity of multidimensional data. Table 6 presents the means (SD) and p-values for all metrics.
Table 6. The mean values, standard deviations (SDs), and p-values for each measurement under different RQs

Cognitive load analysis based on AUC (RQ1)
As shown in Figure 5, there was a significant main effect of collaborators on cognitive load (F = 17.271, p < 0.001, ηp² = 0.365). Participants collaborating with agents showed significantly lower AUC values (M = 4.66 × 10⁻³, SD = 3.22 × 10⁻³) compared to collaboration with humans (M = 6.66 × 10⁻³, SD = 4.84 × 10⁻³), indicating that collaboration with agents effectively reduced participants’ cognitive load.
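For readers who wish to relate the reported F statistics to the partial eta squared effect sizes, the standard conversion is ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the collaborator main effect. The degrees of freedom are not reported in the text, so `df_effect = 1` and `df_error = 30` are hypothetical values chosen only for illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumed degrees of freedom (not reported in the text).
effect_size = partial_eta_squared(17.271, df_effect=1, df_error=30)
print(round(effect_size, 3))
```

This conversion holds for any fixed-effects F test; only the degrees of freedom need to be taken from the full ANOVA table.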

Figure 5. AUC results. (a) The main effect of collaborators. (b) Interaction effect.
While CGTs did not show a significant main effect (F = 3.015, p = 0.088, ηp² = 0.091), there was a significant interaction effect between collaborators and CGTs (F = 8.115, p < 0.004, ηp² = 0.213). Specifically, in the HHC group, participants using TRIZ exhibited significantly lower cognitive load (M = 5.01 × 10⁻³, SD = 3.32 × 10⁻³) than those using brainstorming (M = 8.31 × 10⁻³, SD = 5.99 × 10⁻³). However, in HAC, participants using EvoluTRIZ showed higher cognitive load (M = 4.52 × 10⁻³, SD = 3.26 × 10⁻³) compared to those using IntelliStorm (M = 4.01 × 10⁻³, SD = 3.17 × 10⁻³). This interaction pattern suggests that the impact of CGTs on cognitive load varies depending on the collaborators.
DC analysis of subregions (RQ2)
Table 7 and Figure 6 present the DC results across subregions. For the main effect of collaborator type, only R-VLPFC showed significant differences in DC (F = 10.212, p = 0.002, ηp² = 0.201). The HAC group demonstrated significantly higher DC (M = 0.56, SD = 0.29) compared to the HHC group (M = 0.38, SD = 0.15), suggesting that agents enhanced participants’ R-VLPFC centrality within the PFC network more effectively than humans. The other four subregions showed no significant differences in DC between collaborators.
Table 7. Two-way ANOVA results for DC of subregions


Figure 6. The main effect of collaborators and CGTs on DC of PFC subregions.
For the main effect of CGTs, significant differences were observed in both L-DLPFC (F = 59.572, p < 0.001, ηp² = 0.665) and R-DLPFC (F = 4.377, p = 0.041, ηp² = 0.127). Specifically, when using TRIZ, participants exhibited significantly higher DC in L-DLPFC (M = 0.92, SD = 0.43) compared to brainstorming (M = 0.14, SD = 0.18), indicating that TRIZ enhances L-DLPFC’s centrality in the network. Conversely, brainstorming led to higher DC in R-DLPFC (M = 0.77, SD = 0.40) compared to TRIZ (M = 0.55, SD = 0.39), suggesting that brainstorming strengthens R-DLPFC’s central role in the network. The remaining three subregions showed no significant differences between the two techniques. Notably, no significant interaction effects between collaborators and CGTs were observed across all five subregions.
To provide more detailed insights, p-values from individual Granger causality analyses were combined using Fisher’s method, and directed graphs of effective connectivity were constructed from the statistically significant results. In Figure 7, blue nodes represent the five PFC subregions, while arrows indicate significant connections, with arrow directions showing causality. The graph illustrates the effects of collaborators and CGTs on subregions: in HAC conditions, R-VLPFC showed a higher number of effective connections compared to HHC. Under TRIZ influence, participants’ L-DLPFC demonstrated the most central cognitive role, whereas in the brainstorming task, participants’ R-DLPFC exhibited the highest centrality.
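Fisher’s method for combining p-values can be sketched as follows. The statistic is −2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis when the k tests are independent. The function name and the example p-values are hypothetical; the closed-form tail probability used here is valid because the degrees of freedom are always even.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values (each > 0) with Fisher's method.

    Returns the chi-square statistic (df = 2k) and the combined p-value.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical per-participant p-values for one directed connection.
statistic, combined = fisher_combined_p([0.05, 0.05])
print(statistic, combined)
```

A combined p-value below the chosen significance level would mark the corresponding directed edge as significant at the group level.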

Figure 7. Effective connections comparison.
ND analysis based on functional connectivity (RQ2)
As shown in Figure 8, there was a significant main effect of collaborators on functional connectivity ND (F = 83.820, p < 0.001, ηp² = 0.736). Participants collaborating with agents demonstrated significantly higher ND (M = 0.550, SD = 0.10) compared to those collaborating with humans (M = 0.16, SD = 0.14). This indicates stronger cognitive synchronization among PFC subregions and enhanced coordination of cognitive resources during HAC.

Figure 8. The main effect of collaborators and CGTs on ND.
CGTs also showed a significant main effect (F = 12.511, p < 0.001, ηp² = 0.294). When using TRIZ, participants exhibited significantly higher ND (M = 0.450, SD = 0.278) compared to brainstorming (M = 0.250, SD = 0.186), suggesting that TRIZ enhances cognitive synchronization across PFC regions. However, no significant interaction effect was observed (F = 4.231, p = 0.452, ηp² = 0.124).
To illustrate detailed results, functional connectivity matrices were constructed by averaging Pearson’s correlation coefficients across participants. In Figure 9, red nodes represent the five PFC subregions, while green undirected edges indicate connections with correlation coefficients above the 0.6 threshold, representing significant synchronous activation. The visualization reveals that participants in the human–agent group exhibited more extensive functional connectivity throughout the PFC, including increased bilateral and inter-hemispheric connections, while the human–human group primarily showed localized connections in the right PFC (R-PFC). Additionally, TRIZ demonstrated higher functional connectivity ND compared to brainstorming in both HAC and HHC conditions.

Figure 9. Functional connectivity comparison.
Creative thinking performance (RQ3)
To assess the reliability of performance scores, the scoring consistency between two experts across four dimensions was first evaluated using the intraclass correlation coefficient (ICC). The results demonstrated good scoring consistency across all dimensions: fluency (ICC = 0.982), originality (ICC = 0.849), flexibility (ICC = 0.770), and elaboration (ICC = 0.870), providing a reliable foundation for the subsequent analyses.
Table 8 and Figure 10 present the performance evaluation results. Significant main effects of collaborator type were found in fluency (F = 8.556, p = 0.005, ηp² = 0.222) and flexibility (F = 6.738, p = 0.022, ηp² = 0.183). Participants collaborating with agents generated more solutions (M = 3.84, SD = 1.61) and demonstrated greater perspective shifting (M = 2.25, SD = 1.00) compared to those collaborating with humans (M = 1.94, SD = 0.66 and M = 1.66, SD = 0.57, respectively). These findings indicate that HAC enhances participants’ ability to generate diverse design solutions and explore problems from multiple perspectives. However, no significant effects were observed in originality or elaboration dimensions.
Table 8. Two-way ANOVA results for performance


Figure 10. The main and interaction effects on performance.
In terms of CGT, significant differences were observed in fluency (F = 10.756, p = 0.002, ηp² = 0.264) and elaboration (F = 5.235, p = 0.026, ηp² = 0.148). Designers applying brainstorming (M = 3.44, SD = 1.49) produced a higher number of design solutions than those using TRIZ (M = 2.34, SD = 0.90). However, in terms of solution depth, TRIZ (M = 2.25, SD = 0.75) outperformed brainstorming (M = 1.81, SD = 0.71). These findings indicate that brainstorming promotes divergent thinking, whereas TRIZ leads to more detailed creative expression. No significant differences were observed between the techniques in terms of originality and flexibility.
Significant interaction effects between collaborators and CGTs emerged for originality (F = 12.209, p < 0.001, ηp² = 0.289) and flexibility (F = 8.649, p = 0.005, ηp² = 0.224). In terms of originality, HHC with brainstorming yielded the highest creative uniqueness (M = 3.56, SD = 1.35), while TRIZ produced the lowest (M = 1.20, SD = 0.41). For flexibility, HAC using TRIZ demonstrated the strongest multiperspective thinking (M = 2.43, SD = 1.18), while human collaboration with TRIZ showed the weakest (M = 1.13, SD = 0.25). No interaction effects were found for fluency and elaboration.
Performance results were visualized through a radar chart using averaged and normalized data. As shown in Figure 11, participants collaborating with IntelliStorm showed superior fluency, while those working with EvoluTRIZ demonstrated strengths in flexibility and elaboration. In HHC, participants using brainstorming maintained their distinctive advantage in originality, whereas those using TRIZ exhibited relatively weak overall performance.

Figure 11. Performance comparison.
Discussions
Reduced cognitive load (RQ1)
The dual-process theory proposed by Evans posits two distinct cognitive systems underlying human reasoning: System 1 (intuitive) operates rapidly and automatically, while System 2 (reflective) operates slowly, requires conscious control, and permits abstract reasoning and hypothetical thinking (Evans, Reference Evans2003). System 2 operations are constrained by working memory capacity (Evans and Stanovich, Reference Evans and Stanovich2013). When cognitive load increases, the limited working memory resources become overconsumed, forcing individuals to shift from System 2’s controlled processing to System 1’s automatic processing (Dominiak and Duersch, Reference Dominiak and Duersch2024). The findings of this study are interpreted within this theoretical framework.
Significant impact of collaborators: Participants in the HAC group showed significantly lower cognitive load than those in the HHC group due to two factors: agents handled information storage, retrieval, and processing tasks (Zhu and Luo, Reference Zhu and Luo2022; Zhou et al., Reference Zhou, Li, Zhang, Yu and Duh2024), reducing working memory demands, while eliminating social cognitive burdens like emotional regulation and conflict resolution present in human collaboration (Fink et al., Reference Fink, Grabner, Gebauer, Reishofer, Koschutnig and Ebner2010). This cognitive load reduction freed working memory resources for System 2, allowing participants to engage in deeper abstract reasoning and logical processing without working memory constraints (Evans and Stanovich, Reference Evans and Stanovich2013), ultimately improving overall efficiency.
A significant interaction between collaborators and CGT revealed a complex pattern. For HHC, TRIZ was more effective than brainstorming in reducing cognitive load. This advantage is likely a result of TRIZ being a structured method, which provides clear cues and organized information retrieval, connecting short-term and long-term memory systems (Belski and Belski, Reference Belski and Belski2015) and reducing cognitive load (Lara and Wallis, Reference Lara and Wallis2015), thereby providing structured processing pathways for System 2. In contrast, brainstorming simultaneously engages multiple cognitive systems: System 1’s intuitive associations, working memory storage, and System 2’s creative evaluation and abstract reasoning. The frequent switching between these cognitive processes leads to information overload (Kohn and Smith, Reference Kohn and Smith2011), consuming limited cognitive resources in short-term memory (Kirschner, Reference Kirschner2002; Artino, Reference Artino2008). When cognitive load becomes excessive, System 2 cannot be adequately utilized, causing individuals to revert to System 1’s automatic processing.
This pattern, however, is reversed in HAC. Participants working with EvoluTRIZ showed higher cognitive load than those using IntelliStorm. IntelliStorm continuously provides diverse creative solutions. This allows participants’ System 1 and System 2 to process information in parallel. However, EvoluTRIZ’s logical matching and analogical reasoning suffer from accuracy issues (Cong-Lem et al., Reference Cong-Lem, Soyoof and Tsering2025). This forces participants to frequently activate System 2 for verification, correction, and abstract reasoning. The working memory-intensive serial processing increases overall cognitive load.
In summary, reduced cognitive load in HAC does not indicate a shift toward System 1’s automatic processing. Instead, agents handle information processing tasks and eliminate social cognitive burdens. This creates a more optimal operating environment for System 2, enabling complex concept generation tasks to be completed more efficiently under System 2’s dominance.
Enhanced R-VLPFC centrality and improved PFC coordination (RQ2)
Cognitive network theory emphasizes that the brain dynamically adjusts functional connectivity patterns when executing different cognitive tasks. Through dynamic network reorganization, the network topology adapts to task demands (Bassett and Sporns, Reference Bassett and Sporns2017). Based on this theory, we analyzed network relationship characteristics among PFC subregions to investigate how collaborators and CGTs affect relationships between PFC subregions.
The collaborators had significant effects on network centrality. In particular, HAC enhanced DC in the R-VLPFC, a region crucial for similarity detection (Garcin et al., Reference Garcin, Volle, Dubois and Levy2012) and maintaining hypothesis generation and divergent thinking (Goel and Grafman, Reference Goel and Grafman2000). This reflects the brain’s adaptive configuration of similarity detection and hypothesis generation functions as core network nodes according to the specific demands of HAC. Such dynamic reorganization optimizes information integration efficiency. This network topology adaptation aligns with the task characteristics of continuously evaluating diverse information during agent collaboration.
The two CGTs had significant effects on network centrality. Brainstorming enhanced the centrality of R-DLPFC in the PFC network, a region critical for divergent thinking (Goel and Grafman, Reference Goel and Grafman2000). This enhancement may be due to brainstorming requiring rapid and judgment-free idea generation, promoting conceptual associations and creative expansion, and strengthening R-DLPFC’s network coordination (Kohn and Smith, Reference Kohn and Smith2011). Conversely, TRIZ enhanced L-DLPFC’s network centrality, a region associated with logical reasoning (Birdi et al., Reference Birdi, Leach and Magadley2012) and convergent evaluation of R-hemisphere creative ideas (Luft et al., Reference Luft, Zioga, Banissy and Bhattacharya2017). This effect emerged from TRIZ’s focus on systematic thinking and logical analysis, where structured idea evaluation enhanced L-DLPFC’s cognitive control function (Milovanovic et al., Reference Milovanovic, Hu, Shealy and Gero2021b).
Functional connectivity analysis revealed the influence of collaborators and CGTs on synchronized activation across cognitive regions. Participants in HAC showed higher FC across PFC regions, reflecting increased cognitive resource coordination. This enhanced connectivity emerged from participants simultaneously engaging in multiple cognitive tasks (analysis, evaluation, retrieval, and divergent thinking), which promoted tighter coordination across the PFC network. In contrast, HHC showed FC primarily in R-PFC regions, indicating a distinct creative mechanism linking divergent thinking and problem exploration with the mPFC’s experiential memory retrieval (Euston et al., 2012). Additionally, TRIZ elicited higher FC than brainstorming, likely because its systematic problem-solving framework (Altshuler, 1984) coordinates multiple cognitive resources. These findings demonstrate that HAC enables flexible integration of cognitive resources, while structured CGTs facilitate their synchronized activation.
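To make the two network measures discussed above concrete, the following is a minimal sketch of how FC and DC can be derived from regional time series. It is illustrative only and rests on assumptions not stated in this section: FC is taken as the Pearson correlation between two regions' signals, DC as the fraction of other regions whose correlation with a node meets a binary threshold, and the threshold value (0.85) and the synthetic sinusoidal signals are hypothetical. It does not reproduce the study's actual fNIRS pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def functional_connectivity(signals):
    """Pairwise correlation for every unordered pair of regions."""
    names = list(signals)
    return {(a, b): pearson(signals[a], signals[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

def degree_centrality(names, fc, threshold):
    """Share of other regions connected to each node at or above threshold."""
    degree = {name: 0 for name in names}
    for (a, b), r in fc.items():
        if r >= threshold:
            degree[a] += 1
            degree[b] += 1
    return {name: d / (len(names) - 1) for name, d in degree.items()}

# Synthetic signals (hypothetical): sinusoids completing a whole number of
# cycles over the window, so different frequencies are uncorrelated.
N = 200
def wave(cycles):
    return [math.sin(2 * math.pi * cycles * t / N) for t in range(N)]

shared = wave(4)
signals = {
    "R-VLPFC": shared,
    "R-DLPFC": [s + 0.5 * w for s, w in zip(shared, wave(10))],
    "L-DLPFC": [s + 0.5 * w for s, w in zip(shared, wave(15))],
    "mPFC": wave(25),
}

fc = functional_connectivity(signals)
dc = degree_centrality(list(signals), fc, threshold=0.85)  # threshold is an assumption
```

In this toy network the region whose signal is shared by the others ends up with the highest DC, which is the sense in which a subregion can be described as a network hub. Weighted or differently thresholded definitions of centrality would give different values.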
These findings are consistent with the core tenets of cognitive network theory: both collaborators and CGTs, as important factors of the task situation, significantly influence central nodes and functional connectivity patterns in the PFC network. Combined with the cognitive load findings discussed earlier, this further suggests that efficient System 2 operation may require broader functional connectivity and more flexible cognitive resource coordination. This provides preliminary insights for understanding the neural basis of dual-process theory.
Improved fluency and flexibility in creative thinking (RQ3)
The agents significantly enhanced participants’ cognitive fluency and flexibility. From a phenomenological perspective, this enhancement likely resulted from several factors: agents’ immediate feedback, a diverse knowledge base offering multiple perspectives (Cong-Lem et al., 2025), and the ability to build upon participants’ ideas and create effective interaction loops (Huang et al., 2023). From a neural perspective, HAC reduced cognitive load, freeing more working memory resources for System 2 to support complex creative generation processes. Meanwhile, stronger functional connectivity in PFC regions promoted effective cognitive resource coordination, enabling participants to better engage in abstract reasoning, hypothesis generation, and concept transformation.
CGTs differentially affected participants’ cognitive fluency and elaboration. From a phenomenological perspective, brainstorming’s principles of prioritizing quantity and deferring judgment (Osborn, 1953) provide participants with a free cognitive environment. This encourages rapid associative memory retrieval and free connections between concepts, benefiting fluency. TRIZ’s structured steps and principles guide participants toward deeper logical analysis and evaluative thinking, promoting the elaboration of ideas (Birdi et al., 2012). From a neural perspective, brainstorming activates the R-DLPFC as a network hub, providing a foundation for divergent thinking and promoting idea quantity (Milovanovic et al., 2021c). TRIZ activates the L-DLPFC as the network core, providing structured processing pathways for System 2 and supporting deeper logical reasoning and elaborated thinking processes.
The interaction effects revealed that different collaborators excelled with specific generation techniques. HHC with brainstorming produced the highest originality, leveraging human emotional empathy, experience, and intuitive thinking (Shealy et al., 2020). In contrast, HAC with TRIZ enhanced cognitive flexibility. Agents rapidly generated solutions by matching TRIZ principles to their case databases, whereas humans evaluated feasibility and guided multiperspective solution generation. From a neural perspective, the high cognitive load environment in HHC constrained System 2’s full operation, forcing participants to shift toward System 1’s rapid association. The high activation in the right hemisphere and relatively simplified functional connectivity patterns provided a suitable neural foundation for originality by reducing excessive logical constraints and allowing intuition and association to operate freely. In contrast, HAC’s low cognitive load created an optimal operating environment for System 2. TRIZ’s structured characteristics further activated the L-DLPFC as the network core, accompanied by stronger PFC functional connectivity. This network configuration supports the cross-conceptual category switching and multidimensional thinking required for flexibility. The findings suggest that HAC complements, rather than replaces, HHC, such that each offers unique strengths.
Summary
This study examined the effects of collaborators and CGTs on participants’ cognitive processes and performance, exploring links between neural responses and design outputs. The results revealed advantages in combining LLMs with different techniques: brainstorming enhanced divergent thinking and solution quantity, whereas TRIZ improved logical reasoning and solution depth. Although HHC maintained unique creative advantages, the integration of the agent enhanced other aspects of performance. In particular, HAC eased the learning curve for TRIZ and improved overall performance.
These experimental findings reframe how LLM-based agents function in creative collaboration. Agents differ from both traditional design tools and human collaborators, serving as creative inspiration partners that reshape human cognitive neural activation patterns. By handling information storage, retrieval, and solution generation, they significantly reduce cognitive load and create better operating conditions for System 2. They also optimize cognitive resource coordination for abstract reasoning, hypothesis generation, and concept transformation. These advantages particularly benefit fluency and flexibility.
LLM-based agents also have clear limitations: HHC maintains irreplaceable advantages in originality. Genuine creative breakthroughs still require uniquely human intuitive thinking, emotional resonance, and experiential knowledge. This establishes AI as a cognitive enhancement tool rather than a creative substitute, clarifying the nature of HAC. The future of human–AI collaboration is not replacement but a complementary partnership based on respective cognitive strengths.
Limitations and future research
While this study provides valuable insights, several limitations should be acknowledged. Our sample was limited to Chinese university students and faculty, restricting generalizability. The fNIRS technology captured only PFC activity, excluding other brain regions involved in creativity. Additionally, our evaluation focused on creative dimensions while overlooking practical feasibility and commercial viability.
Future research should address these limitations by including diverse participants, especially professional designers from different cultural backgrounds. More comprehensive evaluation frameworks should balance creativity with practical applicability.
Beyond addressing current limitations, future work will explore integrating traditional creative generation techniques, such as morphological analysis (Zwicky, 1967) and SCAMPER (Eberle, 2023), with LLMs. Developing specialized agents for different design tasks and investigating agent synergies could enhance personalization. Insights from design cognition neuroscience can inform algorithmic optimization to improve innovation efficiency. More specifically, future research should explore how to develop LLM-based agents that integrate structured and unstructured thinking processes, achieving a dynamic balance between divergent and convergent approaches to enhance HAC design effectiveness.
Conclusion
This study used neuroscience methods and expert evaluation to examine how combining CGTs with LLMs influences design processes and outputs. Results showed that designers partnering with LLM-based agents exhibited lower cognitive load and better cognitive coordination, enhancing problem exploration (as indicated by increased DC in the R-VLPFC) and improving fluency and flexibility of thinking. Different techniques activated distinct subregions: brainstorming engaged the R-DLPFC, associated with divergent thinking and enhanced ideation fluency, whereas TRIZ activated the L-DLPFC, related to logical reasoning and improved idea elaboration. However, HHC combined with brainstorming retained unique advantages in creative originality.
This research makes several contributions. It extends dual-process theory and cognitive network theory into design creativity, demonstrating their effectiveness for understanding HAC design. The study creates a comprehensive “cognition-neural-performance” framework that offers methodological guidance for creativity research in design. It provides empirical evidence of how emerging AI design tools influence designers’ cognitive neural processes. The identified interaction effects between collaborators and CGTs offer scientific guidance for optimizing HAC and developing intelligent design tools.
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding statement
This work was supported by the Beijing Educational Science Project (CDAA22036), the Fundamental Research Funds for the Central Universities (2024CX06121), and the Guangdong Province Educational Science Planning Project (2024GXJE108).
Competing interests
The authors declare none.