Nomenclature
-
$A$
-
rotor disk area (m2)
-
$ABMS$
-
agent-based modelling and simulation
-
$ABS$
-
agent-based simulation
-
$ACD$
-
aircraft conceptual design
-
$AI$
-
artificial intelligence
-
${C_L}$
-
lift coefficient
-
${C_T}$
-
thrust coefficient
-
${C_P}$
-
power coefficient
-
$ConOps$
-
concept of operations
-
$IC$
-
Incident Commander
-
$L$
-
lift force (N)
-
$LCES$
-
lookouts, communications, escape routes, safety zones
-
$LLM$
-
large language model
-
$R$
-
rotor radius (m)
-
$V$
-
flight velocity (m/s)
-
${V_{{\rm{stall}}}}$
-
stalling velocity (m/s)
-
$S$
-
wing area (m²)
-
$SoS$
-
system of systems
-
$T$
-
thrust (N)
-
$TRDP$
-
technical requirements definition process
-
${\rm{tanh}}\left( x \right)$
-
hyperbolic tangent function
-
$UAV$
-
unmanned aerial vehicle
-
$W$
-
aircraft weight (N)
Greek symbol
-
${\lambda _h}$
-
hover inflow ratio
-
${\rm{\Omega }}$
-
rotor rotational velocity (rad/s)
-
$\rho $
-
flow density (kg/m3)
-
${\rho _\infty }$
-
free-stream air density (kg/m³)
-
$\sigma\!\left(x\right)$
-
sigmoid activation function
1.0 Introduction
Over the last decade, wildfires have increased in number and size, earning the designation of megafires. For this reason, more resources are being allocated to preventing and fighting wildfires [Reference Wintle, Legge and Woinarski1]. These resources may include increased funding for firefighting efforts and improved technology for early detection and monitoring. Additionally, collaboration between government agencies, environmental organisations, and local communities has become increasingly vital to address the growing threat of wildfires and minimise their devastating impact on both the environment and human lives [Reference Gallagher, Allen, Mackenzie, Yates, Gosper, Keith, Merow, White, Wenk and Maitner2]. To achieve this, developing an SoS framework for performing comprehensive and holistic analyses is a logical approach [Reference Staack, Amadori and Jouannet3]. By considering all the constituent systems involved in a mission and assessing the resulting operational outcomes, it becomes possible to identify emergent behaviours that arise from collaboration, unmet needs, and the desired capabilities. These insights will provide a data-driven foundation for ACD.
However, despite multiple technological advancements, there is still a gap in the conceptualisation phase of aircraft design, particularly in modelling realistic operational scenarios and collaborative dynamics to fully understand the outcome of different sets of capabilities [Reference Raymer4]. For such scenarios and dynamics, the use of ABMS is generally recognised as adequate and capable of capturing emergent behaviours of collaborative agents, which are very difficult to predict [Reference Lovaco, Staack and Krus5]. Agents represent constituent systems and are defined by sets of behaviours. These behaviours traditionally rely on decision trees such as the one shown in Fig. 1, which, while structured, constrain the fluidity and collaborative essence of real-life situations, such as those encountered in wildfire suppression operations. This limitation not only distorts the design process but also potentially impacts the operational efficacy of the designed system.

Figure 1. Example of a decision tree for a firefighting agent.
1.1 Objectives of the paper
The present study aims to conduct an agent-based simulation (ABS) whose agents’ behaviour will be driven by an LLM. The goal is to evaluate the combination of both for future SoSs analyses with decision trees removed, offering a more dynamic and realistic representation of decision-making. The feasibility and potential benefits of utilising LLMs to model human decision-making within an ABS will be evaluated. The unique challenges of ACD specifically for wildfire fighting, such as the need for rapid response, operational versatility and coordination with other systems, make the application of ABS and LLMs to the study of SoSs particularly relevant if a more accurate, holistic, and operationally relevant analysis is to be achieved.
2.0 Related work
Wildfire modelling has been developed to provide support during incidents in terms of operations management, planning and even scientific analyses, with FARSITE being one such tool [Reference Finney6]. It was developed by the U.S. Forest Service, and the fire propagates its perimeter as expanding ellipses, with the rate of spread and intensity governed by the Rothermel model [Reference Rothermel7]. The fire simulator includes effects such as spotting and fuel moisture. Prometheus, a wildfire simulator, adopts a similar perimeter expansion methodology but is calibrated for Canadian forests [8, Reference Tymstra, Bryce, Wotton, Taylor and Armitage9].
These tools are semi-empirical and, while robust, simplify processes such as spotting, assume continuous fuels and are highly sensitive to uncertainties in wind, fuel and moisture inputs, which can lead to incorrect predictions and biased judgements [Reference Alexander and Cruz10]. Consequently, there is increasing interest in complementing, rather than replacing, these established tools with systems based on artificial intelligence (AI). Examples of domain-specific LLMs, such as WildfireGPT, have begun to appear [Reference Ramesh, Sun, Li, Zhang, Annam, Fang and Tong11]. Although their language capabilities could be used for wildfire discussions, they would require very specialised training data and the algorithmic precision necessary for adequate wildfire behaviour predictions. However, LLMs can also serve as an additional layer to support resource allocation and communication under uncertainty, improving human decision-making [Reference Chen, Tao, Tang, Qin, Yang and Zhu12].
The potential of LLMs extends beyond wildfire scenarios. In healthcare, advanced reasoning models have demonstrated diagnostic performance comparable to human physicians in medical scenarios [Reference Vrdoljak, Boban, Males, Skrabic, Kumric, Ottosen, Clemencau, Bozic and Völker13]. Similarly, in disaster response, LLMs, such as disaster response GPT, have been able to translate brief situation reports into structured plans of action, although human validation was still required [Reference Odubola, Adeyemi, Olajuwon, Iduwet, Aniekan and Odubola14]. In other fields, such as logistics and supply chain management, LLMs and forms of generative AI have been used to find new ways to enhance decision making, automate operations, or reduce disruptions, but they were found to be prone to hallucination issues [Reference Boone, Fahimnia, Ganeshan, Herold and Sanders15].
The trend is to use generative AI and LLMs for enhancing the speed and adaptability of human reasoning, provided their outputs are properly supervised and evaluated. Their integration offers rapid decision-support systems for dynamic environments such as wildfires, clinical emergencies, disaster response and logistics.
3.0 Theory
This section introduces the theoretical basics to understand the various concepts needed to perform the present work.
3.1 System design
System design is one of the technical processes that are part of systems engineering [Reference Hirshorn, Voss and Bromley16]. The system design process is a structured approach comprising four fundamental stages: developing stakeholder expectations, defining technical requirements, performing logical decompositions, and creating design solutions. A thorough understanding of stakeholder expectations, which include mission objectives, constraints, operational needs, and mission success criteria, is necessary to satisfy them. These expectations translate into requirements that guide a project throughout its life cycle. The Technical Requirements Definition Process (TRDP) transforms stakeholder expectations into problem definition first and into validated technical requirements later on, which are expressed as ‘shall’ statements [Reference Hirshorn, Voss and Bromley16]. Inputs to TRDP include top-level requirements, expectations and a description of how a system can be operated. The TRDP focuses on defining functional and behavioural expectations, establishing performance requirements for each function. The outputs are comprehensive technical requirements and technical measures that guide the development process. Well-crafted requirements establish a clear agreement between stakeholders and developers regarding the product’s intended functions, reduce development effort, provide a reliable basis for cost estimation, serve as a baseline for validation and verification processes, and as a foundation for future enhancements, ensuring that the system remains effective and adaptable over time.
3.2 Agent-based modelling and simulation (ABMS)
Field-testing data is sometimes scarce or even impossible to obtain. Such is the case with wildfires, where creating a real fire situation is not feasible due to the obvious costs and disastrous side effects. Therefore, a model is necessary, but modelling complex systems in a dynamic and interactive environment is a challenging task [Reference Lane and Epstein17]. ABMS becomes particularly convenient in this context, thanks to its representation of singular entities, or ‘agents’, each with unique behaviours and decision-making capabilities, which is a very valuable aspect in systems engineering [Reference Acheson, Dagli and Kilicay-Ergin18]. However, it is particularly relevant for wildfire management scenarios, where diverse agents interact dynamically with the environment. ABMS enhances the exploration of emergent behaviours that arise from these interactions with the environment and the agents with one another, providing additional insights into the systems’ performance both individually and collectively [Reference Collins and Etemadidavan19].
3.3 System of systems (SoS)
An SoS is a set of independent constituent systems working together to achieve a common goal with certain specific characteristics [Reference Maier20]. During the life cycle of an SoS, different levels of centralisation might be seen and be used to categorise the type of SoS [Reference Franzén21], with the command centres, departments and assets operating independently yet coordinated towards a shared mission. Under this distributed and multi-agency configuration, emergent behaviours will be exhibited. Whether these new behaviours are improvements or conflicts, they are not found in any of the systems alone; thus, the communications, operating conditions and collaborations that affect the overall SoS need to be explored and understood.
3.4 Neural networks and large language models
An artificial neural network is a computational architecture based on biological neural networks. These artificial neural networks contain computational units, known as neurons, which are connected through weights that scale their inputs. These weights scale the computed function of the neuron, and the results are propagated through the neural network layers as outputs. Multilayer neural networks used for machine learning or artificial intelligence typically consist of:
-
• Input layer: Receives the inputs
-
• Hidden layers: Intermediate layers where the computations take place
-
• Output layer: Produces the final outputs
The chain structure of the layers forming a neural network consists of the input features
$x$
, the weights
$W$
, the bias
$b$
, the activation function
$g$
, and the output
${h_i}$
:
Based on advanced forms of statistical language models, which model the likelihood of a word appearing in a given sequence [Reference Shannon22], combined with machine learning methods, such as recurrent neural networks or reinforcement learning [Reference De Mulder, Bethard and Moens23] and transformer architectures [Reference Brown, Mann, Ryder, Subbiah, Kaplan, Dhariwal, Neelakantan, Shyam, Sastry and Askell24], LLMs are trained iteratively to learn how to provide an outcome expected by humans. This capability is relevant in many fields, such as SoS modelling, to obtain evolving models that better reflect real-world decision-making [Reference Luzeaux and Ruault25].
3.4.1 Universal approximation theorem
The universal approximation theorem states that a feed-forward neural network with enough hidden layers can approximate the linear output layer to a non-zero amount of error for any Borel measurable function from a finite-dimensional space to another one. A continuous function is Borel measurable if it is continuous for a closed and bounded subset of
${\mathbb{R}^n}$
[Reference Hornik, Stinchcombe and White26, Reference Hornik, Stinchcombe and White27]. Feed-forward networks with hidden layers require activation functions to introduce non-linearity into the model for the hidden layers [Reference Goodfellow, Bengio and Courville28], being some common examples of them:
-
• Sigmoid:
$\sigma\!\left(x \right) = \frac{1}{{1 + {e^{ - x}}}}$
-
• Hyperbolic tangent:
${\rm{tanh}}\left( x \right) = \frac{{{e^x} - {e^{ - x}}}}{{{e^x} + {e^{ - x}}}}$
This theorem implies that, with enough training, a neural network can be trained to obtain outputs with an error very close, but never equal, to zero.
3.5 Operational concepts
Planning for emergencies involves making assumptions about the conditions that might exist when such crises occur. This type of planning includes developing strategies for deploying, employing, sustaining and redeploying assets in response. When a crisis emerges, an existing plan with a similar scenario might be used as a starting point for further planning. The Concept of Operations (ConOps) is defined as a statement that clearly and concisely expresses what the joint force commander intends to accomplish, and how it will be done using available resources [29]. It outlines assumptions, phases of the operation, prioritised missions and requirements for deployment of the assets. Through detailed and simulated planning, the requirements are determined for likely operational needs, which allows for rapid adaptation by comparing hypothetical conditions with real-time needs [30]. The joint planning process is a method used for military planning. Planners utilise it to develop ConOps, force plans, deployment plans and supporting plans that include multiple courses of action. These provide flexible options for joint forces, enabling adaptation to changing operational environments, resource identification and risk mitigation. The organisation of joint forces and the establishment of command relationships depends on the complexity and control required by the operation or ConOps. This process involves determining the types of subordinate commands and the level of authority delegated to each. Commanders also define the authority and responsibilities of functional component commanders, adjusting them as necessary during operations [30].
3.5.1 Force planning and operational design
The goal of force planning is to identify all the assets needed to execute the ConOps. This process involves determining force requirements based on operational phase, mission priority and operating area. It includes reviewing asset requirements, integration planning and refining asset lists [30].
Operational design serves as an analytical framework for planning, helping commanders and planners understand and organise the operational environment as a complex system. Commanders must comprehend the audience and political environment to provide the best advice to civilian decision-makers. Planners ensure that the language is understood, with inter-agency partners being crucial to this process. Operational design is intertwined with the planning process, filling gaps in guidance and information, and providing a framework for addressing the complexity of the operational environment. It supports mission analysis, course of action development and the creation of ConOps with a high likelihood of success [30].
3.6 Basics of rotary-wing aerodynamics
Helicopter aerodynamics studies the numerous complex interactions during the different flight regimes, such as the main rotor wake structure and its effect on the tail rotor, transonic flows over the blades, stalled regions, etc. There are, however, several results based on momentum theory that are simple enough for quick estimations [Reference Glauert31, Reference Johnson32]. The thrust coefficient is one of the most relevant values and is defined as:
Where T is the thrust, A is the rotor disk area,
${\rm{\Omega }}$
is the rotor rotational velocity, R is the rotor radius and
$\rho $
is the flow density. In a similar manner, the power coefficient is defined as:
Being P power, A rotor disk area,
${\rm{\Omega }}$
, R rotor radius and
$\rho $
the flow density. Then, assuming one-dimensional flow for a hovering rotor case, the inflow ratio
${\lambda _h}$
can be defined based on momentum theory as:
3.6.1 Initial rotor sizing example in conceptual helicopter design
This section includes a small illustrative example of rotor sizing for a conceptual helicopter. It is common to start by considering the hover state [Reference Leishman33] and use Equation (3). Historical trends can be used together with weight build-up methods for an initial value of the thrust coefficient [Reference Raymer4], although the additional weight of a water bucket should be considered. An initial consideration for the process could be to keep the thrust coefficient value similar to other helicopters [Reference Kee34], and modify the rotor geometry:
From the additional weight
${\rm{\Delta }}T$
requirement related to the water bucket load, it is possible to use reference values and the relationship above to obtain the variation of the rotor radius
${\rm{\Delta }}R$
for the new geometry. After some manipulation, the increment can be computed using the new relationship:
\begin{align*} {R_{ref}}\left( {\sqrt[4]{{1 + \frac{{{\rm{\Delta }}T}}{{{T_{ref}}}}}} - 1} \right) = {\rm{\Delta }}R \end{align*}
The new rotor radius needs to be checked against operational constraints, such as the need to land in remote wooded terrains, where finding open areas with adequate blade clearance from trees will become more challenging for larger rotor blades. The velocity at the blade tip (
${V_{tip}} = {\rm{\Omega }}R$
) is another constraint, as values beyond
$M = 0.7$
will be in the transonic region, leading to issues such as additional noise.
It is possible to continue the sizing by using blade element analysis [Reference Leishman33], which provides the analytical result that relates the thrust coefficient to the mean lift coefficient of the entire blade:
With
${C_{{l_\alpha }}}$
defined as the blade profile lift-slope,
$\bar \alpha $
as the mean angle-of-attack of the blades,
${N_b}$
as the blade count and
$c$
as the blade chord. It is convenient to introduce the definition of solidity (
$\sigma $
) [Reference Leishman33] as:
The different values in Equation (4) show that if the goal is to keep the thrust coefficient constant at a specific value, the variations of the geometrical values and the aerodynamic properties are going to be correlated. To continue with the example for the initial considerations for the rotor, remaining equations (Equations (4) and (5)) can be combined with Equation (3) to relate the ideal power coefficient to the thrust coefficient:
This relationship is useful for computing power, but it is more convenient for generating a figure of merit (FM) to relate the ideal power to hover and an estimation of a more realistic value of the non-ideal power required [Reference Leishman33]:
The power needed can be approximated without measurements by considering losses. If only induced and profile power losses are considered, it is possible to write:
Where the induced power losses are considered by adding a correction factor
$\kappa $
with a typical value of 1.15 [Reference Leishman33], then:
Then, for a rectangular blade with an assumed constant blade section profile drag coefficient (
${C_{{d_0}}}$
), the profile power coefficient can be found to be [Reference Leishman33]:
It is possible now to plot the figure of merit with respect to thrust coefficient variations. Helicopter typical values [Reference Kee34] for solidity and blade section profile drag coefficient are, respectively,
$\sigma = 0.1$
and
${C_{{d_0}}} = 0.01$
, resulting in Fig. 2. A good performance for hovering rotors is between 0.7 and 0.8 in the figure of merit [Reference Leishman33, Reference Johnson, Withrow-Maser, Young, Malpica, Koning, Kuang, Fehler, Tuano, Chan and Datta35]. The analysis is not extended further for this example, as the process would continue with different considerations and discussions for going deeper into the new concepts’ specifics [Reference Johnson and Sinsay36], which is beyond the scope. Variations in the number of blades, their chord length, aerodynamic profiles, etc., should be considered then.

Figure 2. Figure of merit for a rotor during hover.
3.7 Basics of fixed-wing aerodynamics
For the case of fixed-wing aircraft, there are some useful closed-form solutions under the assumption of inviscid and incompressible flow, especially for finite wings [Reference Anderson37]. An important value is the lift coefficient:
Where L is the lift force,
$\rho $
is the flow density, V is the flight velocity, and S is the wing area. From Equation (11), using the maximum lift coefficient achievable, the stalling velocity can be obtained, which is particularly relevant when designing air tankers that will drop water or retardant:
\begin{align} {V_{stall}} = \sqrt {\frac{{2W}}{{{\rho _\infty }S{C_{L,max}}}}} \end{align}
3.8 System capabilities
Stakeholders define the desired capabilities for the systems to be designed. These capabilities will constrain the requirements for new system concepts, whether they are for aircraft or any other kind. Several of the capabilities that will define the system requirements can be shared between concepts. Some capabilities to satisfy operational needs are defined and listed here, alongside the proposed requirements that will be common to all systems proposed:
-
1. Adaptability: Ability to respond to dynamic and unforeseen changes in the environment. Common requirements for all systems include modular design for new technologies’ integration and easier manufacturing [Reference Desai and Mital38]; capable of handling various weather conditions and terrains.
-
2. Communication: Effective use of communication protocols for coordination between different units. Common requirements for all systems include advanced communication systems for interoperability with ground units and other aircraft; encrypted communication.
-
3. Environmental Assessment: Ability to assess and respond to environmental factors like fire magnitude and weather conditions. Common requirements for all systems include onboard sensors and cameras for monitoring in real-time fire and weather; assessment of terrain features, vegetation types and weather patterns.
-
4. Operational Efficiency: Ability to perform its intended functions by minimising resources needed and maximising output.
-
5. Safety and Emergency Response: Ability to respond to medical emergencies and ensure safety protocols. Common requirements for all systems include fire-resistant features to avoid degradation [Reference Ellis, Bojdo, Filippone and Clarkson39].
-
6. Autonomy: Ability for autonomous operation with minimal human intervention.
3.8.1 Capability and requirement engineering
Capability engineering begins with an understanding of a need, and an exploration of the multiple possible functionalities to achieve a capability, which fulfils the aforementioned need [Reference Lane40]. Available resources are related to existing functions that can contribute to obtaining the new capability. Each potential solution is evaluated to measure performance and cost-effectiveness. Analyses of the SoS are used for the translation of capabilities into requirements in a repeatable and traceable manner. A clear understanding of SoS dynamics is necessary, as it will evolve, resulting in variable performance expectations. The process of translating capabilities into requirements that will support trade-off decisions for the expected level of performance against cost. The process is shown in Fig. 3.

Figure 3. Workflow of capability into requirements translation (adapted from (Reference Lane40)).
Requirements engineering is a systematic process that ensures stakeholder needs are accurately captured and translated into system specifications. To start this process, the necessary capabilities were previously translated into high-level requirements. These are decomposed into detailed functional and performance requirements. This decomposition process involves:
-
• Decomposition into lower-level requirements that specify what the system must do to achieve functional and performance requirements.
-
• Allocation of these requirements to specific system elements, subsystems, or components.
-
• Validation of each level of requirements against stakeholder expectations.
This iterative process continues until a comprehensive set of designs to requirements is established at all levels of the system hierarchy. Traceability is crucial and necessary throughout this process, ensuring that every requirement can be linked back to a stakeholder need or higher-level capability, and that the rationale for each requirement is documented.
Managing requirements effectively involves capturing not only the requirements statements, but also associated metadata from the outset, which is necessary for clarification and linkage among requirements. A general guideline of what each should include is:
-
• Requirement Statements: Clear expressions of what the system must do to achieve the desired capabilities.
-
• Metadata: Information such as the requirement’s unique identifier, source, rationale, owner, verification method and verification level. These points ensure that the intent and context of each requirement are clear, enhancing communication among project stakeholders and facilitating changes when necessary.
The verification methods are beyond the scope of the present work; however, when they are documented early in the process, traceability is improved and the inclusion of any unforeseen requirements found during a verification stage is easier.
3.8.2 Logical decomposition and system architecture development
Logical decomposition is the process of breaking down a system’s required functionalities into discrete elements [Reference Hirshorn, Voss and Bromley16]. The steps in the process are:
-
• Functional Analysis: Identifying and defining the functions to meet any set of desired capabilities.
-
• Functional Allocation: Assigning these functions to system elements and their respective subsystems/components.
-
• Interface Definition: Identifying the interfaces between system elements and/or the outside world.
This process is iterative and relies on its connection with the ConOps to guide the understanding of how a system will be used. Logical decomposition ensures that all required system behaviours are identified and that the role of each subsystem is clearly defined. With this decomposition, the system design concepts are refined through system architecture development. It is a creative and iterative process whose aspects include:
-
• Concept Generation: Propose multiple design concepts that meet the project’s goals.
-
• Stakeholder Involvement: Engaging stakeholders to provide feedback and ensure that the architecture and needs are aligned.
-
• Analysis and Trade-off Studies: Using modelling, simulation and trade-off analyses to evaluate different design options and select the most suitable design.
The process is recursive, with feedback loops necessary to refine requirements, design solutions and provide a robust concept that meets all requirements and is feasible.
3.8.3 Technical standards considerations
Technical standards contribute to compatibility, interoperability, safety and quality while often reducing costs and development time. They provide the normalisations and guidelines that inform various aspects of system development, which, depending on the case, may include specifications for performance, safety, reliability, materials used in manufacturing, verification procedures and interfaces. Appropriate standards reduce risk and increase interoperability with other systems.
4.0 Method
The present work was performed following the design research methodology [Reference Blessing and Chakrabarti41]. The objective is to improve product development by employing simulation as a means to refine requirement definition based on operational needs and system interactions of aircraft. To this end, an evaluative artefact must be generated to assess the outcomes of the simulated operations. This artefact should support the examination of whether the captured results are sufficient for decisions on required functionalities, and whether the identified uncertainties originate from limitations in the simulation model itself.
For a qualitative evaluation of the simulation and the decisions, a questionnaire was sent to fire suppression operations pilots. The current goal is to generate a sacrificial concept to understand the behaviour of the model and gain knowledge to point out the direction to be taken [Reference McGowan, Bakula and Castner42]. The iterative loop in Fig. 4 shows how the design research methodology helps in generating prototypes, gaining knowledge in the process and finding an agreement on the complexity needed with a holistic view of what is necessary for obtaining a satisfactory product.

Figure 4. Creativity focused prototyping loop for sacrificial concepts.
4.1 Interviews with firefighting pilots
The authors provided the following questions to firefighter pilots, and this section collects the summarised answers.
-
1. Mission: When is a wildfire incident considered under control and/or successfully concluded? Both interviewees converge on the view that wildfires are seldom ‘extinguished’ in an absolute sense; instead, operational closure is achieved through phases and gradual demobilisation. Firefighter-1 defines ‘under control’ as the point at which reignition is no longer expected after assets are withdrawn, beginning with higher-cost aerial assets while ground crews continue to secure. Firefighter-2 provides a formal phase taxonomy: stabilised, controlled, finished and observation, clarifying that interior hot spots may persist through the first two phases without implying loss of control. Together, these perspectives indicate the need for both practical simulation termination criteria and post-incident monitoring.
-
2. Situational awareness: What information must be visible before any deployment? Both respondents agreed on a minimum situational awareness set comprising precise fire location, active assets, and communications routing to the Incident Commander. Firefighter-1 stresses unambiguous identification via coordinates and a unique name, plus required frequencies and IC contact. Firefighter-2 adds operationally critical details: presence and frequency of an aerial coordinator; confirmation of ground crews to prevent hazardous drops; availability of nearby loading sites; and environmental/operational checks (visibility, on-scene wind verification, terrain escape planning via reconnaissance). The combination suggests a pre-launch information (coordinates, communications, assets, ground-crew status, visibility) and a mission-planning tier (loading sites, coordinator info, route/egress planning).
-
3. Realism: Which environmental factors must the model represent to be considered reliable? Firefighter-1 considered the question too broad and preferred not to answer. Firefighter-2, on the other hand, considered the proper evolution of the fire and the local orography particularly relevant: the factors that shape a fire’s evolution are usually the wind (changes in both direction and intensity) and its ability to carry embers that start spot fires. Temperature is another critical element. So is the relative humidity in the area, both in the air and in the vegetation. Orography is very important: slopes, valleys, gullies, etc. This affects not only how the fire develops, but also often limits the capabilities of aerial resources, especially the heavier ones.
-
4. Topography: What level of detail in maps/fuels is necessary to plan firelines, strategies, attack plans, etc.? Both firefighters considered that this question should be addressed and answered by a forestry engineer.
-
5. Communication: Which communication behaviours are most important to simulate? Both emphasise the centrality of an aerial coordinator to reduce workload and deconflict traffic, and both describe multi-frequency operations. Firefighter-1 specifies a three-frequency architecture (air-to-air/coordination, ground/incident-management-authority, and company/formation) and notes that the absence of a coordinator forces aircraft-to-aircraft self-coordination with a substantial workload increase. Firefighter-2 highlights typical medium-to-high radio saturation, and the need to model standard phraseology, radio discipline and local-language constraints; an emergency ‘radio-silence’ control is identified as essential for critical events. Jointly, these inputs motivate simulations that involve coordinator presence, explicitly meter channel occupancy, and enforce procedure-aware message flows.
-
6. New systems: What unmanned aerial vehicle (UAV) roles and information cadence would be necessary? Both interviewees cautioned against UAV presence within active manned airspace due to collision and distraction risks. Firefighter-1 proposes a vertical separation convention, manned aircraft if concurrent, while acknowledging high-altitude UAV sensing as a compromise. Firefighter-2 argues for excluding UAVs during manned operations entirely, citing poor UAV situational awareness (no transponder, limited communications) and recommending sensing to be performed from a coordinating aircraft; UAVs are endorsed for night or post-ops mapping. This comparison supports a conservative policy baseline (no mixed ops), with a clearly parametrised alternative allowing strictly controlled vertical separation when mixed operations are unavoidable.
-
7. Operational emergencies: What steps should be taken when a team member is injured? Firefighter-1 considers this question as a ground concern. Firefighter-2 answers based on events experienced and provides an anecdote: if crews are in severe danger – due to entrapment, a fire involving a pumping unit, or similar events – the best course is for the aerial operations coordinator to order radio silence on the channel and issue highly directive instructions about what is needed. As example, during an operation in Portugal, a ground crew member became trapped; radio silence was imposed, and all Canadair aircraft were directed to the crew’s coordinates to drop water, wet the area, and lower temperatures to avert a tragedy. The key in such cases, if aerial support is required, is radio silence, clearing the area of resources that will not participate, and giving very clear instructions on what is required.
-
8. New events: How should the chain of command behave in the simulation when a second ignition appears or priorities change? Both agree that secondary ignitions require immediate attention to prevent rapid spread, and note that aerial assets frequently provide first detection. Firefighter-1 stresses rapid redirection, highlighting the higher success probabilities when spots are small at first engagement. Firefighter-2 frames prioritisation in terms of area sensitivity, implying a risk-weighted allocation rather than a purely first-come response. Together, these inputs argue for a detection-to-tasking pipeline that minimises latency and ranks emergent targets by criticality.
-
9. Uncertainties: What types of uncertainty should be present? Firefighter-1 emphasises logistical and infrastructural constraints such as fuel resupply limitations, airport congestion that decouples aircraft count from drop throughput, and accommodation/transport frictions (diminishing returns from additional aircraft). Firefighter-2 focuses on environmental and obstacle uncertainties such as wind shifts, adverse weather and clouds, and evolving hazards such as new power lines or reservoir modifications, and incomplete information on loading sites.
-
10. For this type of simulation: What would make it useful for training or for an after action review or debriefing? Both endorse standardisation and realism as prerequisites for training value. Firefighter-1 calls for internationally standardised procedures (sectoring, drop types, priorities, formation management) and highlights the off-season as an opportunity for cross-regional training. Firefighter-2 specifies training objectives – resource management, workload handling, prioritisation and safety under changing meteorology and terrain – and identifies comprehensive recording (multi-frequency comms and operational timelines) as essential for debriefs that surface errors and lessons learned. Collectively, these elements define a training feature set that would combine standards-based scenario templates, stress-inducing variability and telemetry capture to enable procedure adherence checks and evidence-based feedback.
4.2 Responsibilities and actions of an Incident Commander (IC)
The main points extracted from ConOps found in the reviewed literature are presented in this section [43, 44]. They will be used as prompts for the selected state-of-the-art LLM [Reference Achiam, Adler, Agarwal, Ahmad, Akkaya, Aleman, Almeida, Altenschmidt, Altman and Anadkat45] and constrain its responses to the required operational concepts. Figure 5 shows a sequence diagram for a directed SoS driven by the LLM-commander. The prompts given follow a latex-based structure, which the authors have found to be easier to interpret by the LLM due to its clear structure.

Figure 5. Sequence diagram for wildfire incident management.
The IC guidelines in terms of responsibilities and communication are listed in this section. All the following points listed here are used as a prompt for the LLM.
-
• Provide complete briefings:
-
– Clearly communicate the task, purpose, and end state of the assignment to subordinates.
-
– Ensure subordinates have a clear understanding of the assignment.
-
-
• Evacuation procedures:
-
– Plan how to get injured personnel to a road or helispot.
-
– Identify the required personnel and equipment for evacuation.
-
– Calculate the time needed to get the injured to a hospital and determine the closest hospital.
-
– Decide on air or ground transportation and consider factors that could affect the timeline.
-
-
• Recognise common denominators of fire behaviour in tragic fires:
-
– Be aware of the conditions under which fatal or near-fatal fires often occur, such as:
-
* Small or quiet areas of large fires.
-
* Light fuels like grass and light brush.
-
* Unexpected shifts in wind direction or speed.
-
* Critical burn period between 1400 and 1700 hours.
-
-
-
• Radio traffic and discipline:
-
– Maintain disciplined and concise radio communication.
-
– Use clear text for all operations.
-
– Follow established procedures for radio calls, identifying the resource being called and the frequency used.
-
-
• Issuing air traffic information and advisories:
-
– Provide clear air traffic information and ensure pilots acknowledge it.
-
– Pass on new information and ensure no missions are launched in conflict areas.
-
– Follow standard scripts for communication to ensure clarity and consistency.
-
-
• Risk management principles:
-
– Accept no unnecessary risk and make risk decisions at the appropriate level.
-
– Balance risk against benefits and integrate risk management into planning and execution at all levels.
-
4.3 Responsibilities and duties of helicopter personnel
The helicopter personnel guidelines in terms of responsibilities and communication are listed in this section. It is assumed that the pilot is responsible for ensuring that all of them are followed; thus, only the pilot is considered for the communication. All the points listed here were used as a prompt for the LLM.
-
• General responsibilities:
-
– Adhere to Federal Aviation Regulations (FAR) and agency-specific regulations.
-
– Coordinate with dispatchers, helicopter managers and/or helibase managers.
-
– Ensure aircraft and communication equipment are in good condition and operable.
-
-
• Flight operations:
-
– Do not deviate from the flight plan without notifying the appropriate dispatch office.
-
– Do not descend below 500 feet above ground level without prior authorisation or for high-level reconnaissance.
-
-
• Risk management:
-
– Accept no unnecessary risk.
-
– Make risk decisions at the appropriate level.
-
– Accept risk only when the benefits outweigh the costs.
-
– Integrate risk management into the planning and execution phases.
-
4.3.1 Helicopter-specific procedures
-
• Landing area selection:
-
– Choose a flat area free of obstructions (trees, poles, wires, etc.).
-
– Ensure the area is clear of stumps, brush, rocks, and any objects over 18 inches high.
-
– Consider wind direction for landing and takeoff.
-
-
• Aerial delivery of retardant/water drop considerations:
-
– Maintain communication with aerial resources.
-
– Identify and communicate flight hazards to aerial supervisors.
-
– Provide specific target information using clear descriptions and signal markers.
-
– Clear the area to avoid direct flights over ground personnel and equipment.
-
-
• Aircraft mishap response actions:
-
– Prioritise life preservation and secure the area.
-
– Exercise caution due to potential hazards at the wreckage site.
-
4.3.2 Communication protocols
-
• Radio discipline and clear text usage:
-
– Ensure all aircraft and ground personnel have compatible radios and frequencies.
-
– Maintain disciplined and concise radio traffic.
-
– Use clear text for all operations, keeping messages brief and to the point.
-
-
• Standard scripts for communication:
-
– Flight following departure script:
-
* Identify the tail number or other designated identifier.
-
* State the frequency being used.
-
* Provide departure location, number onboard, fuel onboard, estimated time enroute (ETE), and destination.
-
* Confirm automated flight following (AFF).
-
-
– FTA (flight traffic advisory) calls in the blind:
-
* Identify the receiving unit.
-
* State the tail number or designated identifier.
-
* Provide distance and direction from the incident, altitude, intent, and frequency.
-
-
-
• Landing and departure communications:
-
– When helicopter contacts helibase:
-
* Provide helicopter number and helibase identification.
-
* Report wind speed and direction.
-
* Notify about traffic status (inbound/outbound aircraft).
-
-
– Before helicopter departs helibase:
-
* Provide helicopter number and helibase identification.
-
* Report wind speed and direction.
-
* Notify about traffic status (inbound/outbound aircraft).
-
-
4.4 Responsibilities and actions of firefighters in firefighting operations
The guidelines for the firefighters responsibilities and communications are listed in this section. It is assumed that the team leader is responsible for ensuring that all of these steps are followed and performing communication when needed. All the following points listed here are used as a prompt for the LLM.
4.4.1 Fireline construction and safety
-
• Establishing Firelines:
-
– Use direct attack whenever possible, constructing a fireline as close to the fire edge as conditions safely permit.
-
– If an indirect attack is required, locate the fireline at an adequate distance from the main fire to allow for completion, firing and holding, considering the predicted rate of fire spread.
-
– Make the fireline as short and straight as practical, using topography to your advantage.
-
– Use existing natural and human-made barriers.
-
-
• Communication Requirements and Procedures:
-
– Assign experienced and competent lookouts at good vantage points with knowledge of crew locations, escape and safety locations, and trigger points.
-
– Confirm radio frequencies and establish backup procedures and check-in times.
-
– Provide updates on any changes in the situation and sound alarms early.
-
-
• Radio Traffic and Discipline:
-
– Use clear and concise radio communication.
-
– Identify yourself and the resource you are calling first, followed by the message.
-
– Follow established procedures for radio calls and ensure disciplined radio traffic.
-
– Confirm receipt of critical information and acknowledge messages.
-
4.4.2 Reporting issues
-
• Reporting Injuries:
-
– Immediately report any injuries to the supervisor or IC.
-
– Provide details about the injury, including the nature and severity of the injury, and the number of injured personnel.
-
– Communicate the need for medical support and transportation.
-
-
• Reporting Stress:
-
– Report signs of fatigue or stress among crew members to the supervisor.
-
– Monitor crew members for symptoms of overexposure to smoke, such as headaches, visual impairment and impaired decision-making.
-
4.4.3 Specific communication scripts
-
• Initial Contact Script:
-
– Identify the resource being called and your identification.
-
– Provide a brief message regarding the situation.
-
-
• Requesting Medical Support Script:
-
– Identify the resource being called and your identification.
-
– State the nature of the injury and the need for medical support.
-
– Provide your location and any relevant details.
-
4.5 Agent-based simulation setup
The simulation is performed using NetLogo [Reference Tisue and Wilensky46], a software designed for modelling and simulation of complex systems using agents. Advanced models for simulating wildfires can be built to include features such as compatibility with Geographic Information System data, dynamic changes in weather conditions, and smoke [Reference Lovaco, Staack and Krus5]. For the present work, however, a simplified version is used and shown in Fig. 6. The model includes a variety of agents, each assigned with a specific role in wildfire suppression scenarios: helicopters, UAVs and firefighters. Among these, the agents representing aircraft are of particular significance for this study, more so than those performing ground operations. The reason for this is the specific interest of the authors in generating requirements for performing ACD and systems engineering. Firefighters will remove material around the fire to stop the spread. The helicopters will assist with water drops, but they can also drop or rescue firefighters at designated points. The UAVs can fly around the map and report areas on fire or loiter over the firefighters. Both firefighters and helicopters can report random emergency calls during the simulation, such as engine problems for the helicopters or health issues for the firefighters. The map limits are (−50,−50) to (50,50). Water areas (shown in blue) and wildfire starting points are generated randomly during the initialisation. It is also possible to start additional fires during the simulation to represent spotting. Time steps are taken dimensionless to simplify, and all agents move 1 position in the grid per time step when they decide to do so. A wildfire spreads every 500 steps in all directions where trees are found. Such a difference in time steps is to ensure the spread of the fire due to the lack of wind in the model, which generally drives the spread and can be approximated using a 10% rule of thumb [Reference Cruz and Alexander47]. In the simulation, everything coloured in yellow is the information the commander can request: UAV’s vision and the locations of firefighters and helicopters. The only information the agents can see or report is provided if and only if an event is triggered or the LLM-commander requests it, and given through the GPT-4 web-based interface.

Figure 6. Agent-based model user interface.
4.6 Incorporation of large language models
The decision tree that the present work aims to replace for an LLM IC was shown in Fig. 1. For the present proof of concept, a web-based chat interface is used for the communication between LLM and the simulation. The roles of the commander and the different agents involved are taken by several instances of the LLM, whereas the information regarding the simulation situation is provided as prompts by the user.
5.0 A use case scenario implementation and evaluation
This section presents the results from the interaction with the LLM and its interactions with the wildfire simulation. Due to the length of the LLM outputs, the most relevant part of the conversation has been extracted and presented here. The ConOps from Section 4.2 are given as an initial context prompt for the commander-LLM, whereas Sections 4.3 and 4.4 are given respectively to the helicopter pilot-LLM and the firefighters-LLM. The simulation driver is the human in the loop, providing prompts to the different LLMs that respectively play the role of IC, helicopter pilot, and firefighter teams and using the output to reconfigure the simulation.
LLM-Incident Commander Reply – 0
‘Initial Briefing and Intent
Task: Our primary task is to contain and control the wildfire while ensuring the safety of all personnel involved.
Purpose: The purpose of this mission is to prevent wildfire from spreading to populated areas, protect natural resources and minimise damage to property and infrastructure.
End State: The end state of this operation is to have the wildfire fully contained with no injuries to personnel, minimal damage to property and a thorough plan for monitoring and managing hotspots.
Summary of Communication Requirements:
-
• Clear and complete briefings.
-
• Pre-operational medical emergency planning.
-
• Tactical pauses and reassessments.
-
• Establish and communicate LCES (lookouts, communications, escape routes, safety zones).
-
• Safe transitions between shifts.
-
• Disciplined radio communication.
-
• Standardised air traffic advisories and acknowledgements.
-
• Integration of risk management into all phases of operations’.
Simulation Driver Prompt – 1
‘The map of a remote area is divided into coordinates from −50 to 50 in the X axis and −50 to 50 in the Y axis. A UAV is patrolling over the area, and a wildfire is reported. It is unknown how long it has been burning and its actual size. One helicopter and two firefighter teams are available in the local agency. Please state your decision and keep consistency with the IC CONOPS’.
LLM-Incident Commander Reply – 1
‘[…] Summary of Actions
-
• UAV: Conduct initial fire location and assessment.
-
• Brief teams: Ensure all personnel are aware of their roles, LCES, and medical plans.
-
• Medical preparedness: Confirm medical personnel, equipment, and evacuation routes.
-
• Deploy helicopter: For reconnaissance or water drops as needed.
-
• Deploy firefighter teams: From strategic direction to create containment lines.
-
• Tactical pause: Reassess the situation around 1400 hours.
-
• Continuous communication: Maintain disciplined and clear radio communication’.
Simulation Driver Prompt – 2
‘The wildfire coordinates are reported to be: (−17.88, 36.07). Upon checking the map of the area, the following are the coordinates of the lakes with water available for water drops: 1) Lake A: (7, −11); 2) Lake B: (−31, 47). The entry point to the map from the helicopter is (−31,−50). Please state your decisions and how you would communicate them’.
LLM-Incident Commander Reply – 2
‘[…] Communication Plan:
-
• To Helicopter Crew: ‘Helicopter team, this is Incident Command. Enter the map at (−31, −50) and proceed directly to Lake B at (−31, 47) to fill the water tanks. After filling up, head to the wildfire at coordinates (−17.88, 36.07) for water drops and aerial reconnaissance. Maintain communication on primary frequency Alpha and report any significant changes. Over’.
-
• To Firefighter Team A: ‘Team A, this is Incident Command. Proceed to coordinates (−17.88, 50) and establish a line approaching the wildfire from the north. Ensure LCES is established and ready for a tactical pause at 1400 hours. Maintain communication on primary frequency Bravo and update status every 30 minutes. Over’.
-
• To Firefighter Team B: “Team B, this is Incident Command. Proceed to coordinates (−17.88, −50) and establish a line approaching the wildfire from the south. Ensure LCES is established and ready for a tactical pause at 1400 hours. Maintain communication on primary frequency Bravo and update status every 30 minutes. Over’.
-
• To UAV Operator: “UAV Operator, continue monitoring the wildfire and provide real-time updates on fire behaviour, spread, and intensity. Report any critical changes immediately on primary frequency, Charlie. Over’.
[…]Summary
-
• Helicopter: Fill water at Lake B, perform water drops, and conduct aerial reconnaissance.
-
• Firefighter Teams A and B: Approach wildfire from the north and south, respectively, ensuring safety protocols (LCES) are in place.
-
• UAV: Provide continuous monitoring and updates.
-
• Communication: Maintain disciplined, clear, and regular communication with all teams, ensuring updates and adjustments are communicated promptly.
Figure 7 shows the situation after following the commander’s orders. Notice the helicopter flying towards the required destination with water.

Figure 7. UAV detecting fires.
Simulation Driver Prompt – 3
The simulation continues. The helicopter pilot reports this upon reaching the wildfire: ‘Alpha frequency, Helicopter six eight x-ray (68X) arrived at coordinates (−17.88, 36.07) for water drops and aerial reconnaissance. Beginning operations now. Over’. At the same time, another fire spot has been reported by the UAV: (−19, 36)
LLM-Incident Commander Reply – 3
‘Acknowledgement and Instructions to Helicopter Pilot:
-
• To Helicopter Pilot (68X): ‘Alpha frequency, this is Incident Command. Acknowledged, Helicopter 68X. Begin water drops at coordinates (−17.88, 36.07) and continue aerial reconnaissance. Be advised that a new fire spot has been reported by a UAV at coordinates (−19, 36). After initial water drops, proceed to assess and report on new fire spots. Maintain visuals on both spots and update on progress. Over’.
-
• To Firefighter Team A: ‘Bravo frequency, this is Incident Command. Be advised that a new fire spot has been reported at coordinates (−19, 36) in proximity to your current target. Adjust your approach to include containment of both fire spots. Maintain LCES protocols and ensure regular status updates. Over’.
-
• To Firefighter Team B: ‘Bravo frequency, this is Incident Command. Be advised that a new fire spot has been reported at coordinates (−19, 36) near the initial fire location. Adjust your approach accordingly to assist Team A in containing both fire spots. Maintain LCES protocols and ensure regular status updates. Over’.
[…]Summary of Actions
-
• Helicopter 68X: Perform initial water drops, then assess and report on both fire spots.
-
• Firefighter Teams A and B: Coordinate to contain both fire spots with adjusted strategies.
-
• UAV: Provide detailed real-time updates and monitoring of fire behaviour for both spots.
-
• Tactical Pause: Reassess the situation at 1400 hours to ensure effectiveness and safety.”
Simulation Driver Prompt – 4
Team B reports the following: ‘Command, Team B, reporting a medical emergency. Firefighter experiencing heat stroke symptoms. We need immediate medical assistance at our current location. Over’.
LLM-Incident Commander Reply – 4
‘Acknowledgement and Instructions to Helicopter Pilot:
-
• To Helicopter Pilot (68X): ‘Alpha frequency, this is Incident Command. We have a medical emergency at Team B’s current location: a firefighter experiencing a heat stroke. Requesting immediate MEDEVAC (medical evacuation). Proceed to Team B’s coordinates. Secure a safe landing zone and prepare for transport. Over’.
-
• To Firefighter Team A: ‘Bravo frequency, this is Incident Command. Be advised that Team B is experiencing a medical emergency. A firefighter with heat stroke symptoms. Continue current operations, but prepare to assist with medical evacuation if required. Over’.
-
• To Firefighter Team B: ‘Bravo frequency, this is Incident Command. Acknowledged, Team B. Medical emergency confirmed. Immediate medical assistance will be dispatched to your current location. Ensure the firefighter is moved to a shaded area and start cooling measures immediately. Maintain hydration and monitor vital signs. Stand by for further instructions. Over’.
[…]Summary of Immediate Actions
-
• Team B: Administer first aid for heat stroke, move the firefighter to a shaded area and prepare for MEDEVAC.
-
• Helicopter 68X: Divert to Team B’s location for MEDEVAC, ensure safe landing and transport.
-
• UAV: Provide aerial monitoring and support for MEDEVAC operations.
-
• Team A: Continue firefighting operations, and remain on standby to assist with evacuation if needed.
-
• All Units: Maintain clear communication and follow updated instructions from Incident Command.
Simulation Driver Prompt – 5
The helicopter is en route to Team B for evacuation. Team A is alone without aerial support from the helicopter 68X. It is possible to ask the UAVs to loiter over the firefighters and help other agencies with an additional two helicopters and three firefighter teams.
LLM-Incident Commander Reply – 5
Summary of Updated Plan of Action
-
• UAV Support: Redirect UAV to provide continuous aerial support and monitoring for Team A.
-
• Request Additional Resources: Contact other agencies for two additional helicopters and three firefighter teams.
-
• Team A Operations: Continue firefighting efforts with UAV support, maintain LCES and report status every 15 minutes.
-
• Helicopter 68X: Complete MEDEVAC, then refill water at Lake B and resume firefighting support.
-
• Continuous Communication: Ensure all units maintain clear and disciplined communication, updating on status and any changes.
Simulation Driver Prompt – 6
Before this prompt, during the conversation, the LLM is informed that it can request help from other agencies. After doing so, it is informed of the following: You receive the following over the radio: ‘Command, Team A, reporting signs of severe exhaustion among crew members. Requesting immediate relief and rotation of resources. Over’. ‘Command, Team C, entering the area from point (−1,50). Over’. ‘Command, Team D, we are entering from point (−1,50). Over’. ‘Command, Team E, now entering from point (−1,50). Over’. ‘Incident Command, this is Helicopter Five Four Zulu (54Z). Entering from point (50, 8). Ready for further instructions. Over’. ‘Incident Command, this is Helicopter two one tango (21T). Entering from point (50, 8). Ready for further instructions. Over’.
LLM-Incident Commander Reply – 6
[…]To all units: ‘All units, this is Incident Command. Be advised that Team A is experiencing severe exhaustion. Teams C, D and E are now entering the area for relief and rotation. Helicopters 54Z and 21T are refilling at lakes A and B, respectively, and will provide aerial support and water drops. Maintain LCES protocols and update status regularly. Over’.
Monitor and Adjust Operations as Necessary
-
• Team A: Provide immediate relief and ensure safe rotation with incoming teams.
-
• Teams C, D and E: Establish LCES, focus on their assigned fire spots and ensure a smooth transition.
-
• Helicopters 54Z and 21T: Conduct water drops and provide aerial reconnaissance.
-
• UAV: Monitor the situation and provide real-time updates.
Summary of Updated Plan of Action
-
• Team A: Immediate relief by Teams C, D and E. Ensure safe rotation and rest.
-
• Teams C, D and E: Take over firefighting operations with clear focus areas and support.
-
• Helicopter 54Z: Refill at Lake A, conduct water drops and provide reconnaissance.
-
• Helicopter 21T: Refill at Lake B, conduct water drops and provide reconnaissance.
-
• UAV Support: Monitor relief operations and fire behaviour, providing real-time updates.
-
• Continuous Communication: Maintain clear and regular communication with all units, updating on the status and any changes.
Figure 8 show the final stages of the operation. The wildfire is being reduced with the different agents around it, continuing until the end of the simulation when the wildfire is completely extinguished. Notice the area where the vegetation has been removed by either fire or the work of the agents.

Figure 8. Wildfire under control.
6.0 Discussion
In this section, the results are discussed in terms of the performance of the LLM as IC, and in terms of operational capabilities for the generation of system requirements.
6.1 Performance of the LLM-IC
The simulation results highlight several key aspects of the LLM’s behaviour and decision-making in wildfire scenarios. There are, however, issues to be addressed using the information from Section 4.1. The LLM effectively tracked the agents’ actions and whereabouts, and it produced coherent and consistent intent, and disciplined radio formats, indicating that an LLM could replace, at least partially, some decision trees while preserving a correct operational structure for an SoS. For the UAV assets, coexistence rules in manned airspace were left implicit, as one of the goals of SoS analysis is to study the impact and conflict of new systems towards product development. However, as the interviewees stated, they populate a reserved airspace. Thus, an initial draft of both possible conflicts and ConOps should be created and included in the initial prompts for the UAV coexistence with manned aircraft. A conservative baseline is to have no mixed operations unless unavoidable, with strict vertical separation in such cases. The current use of the airspace without explicit rules is dangerous and should be corrected via new policies and explicit rules in the prompts. An example in the results was the IC decision of assigning a UAV from surveillance to loitering over Team A for aerial support. It also demonstrated adaptability, but there was no consideration of the potential risks.
The situational awareness was not sufficient: current wind conditions and visibility, or the presence of ground units before drops, were neglected, or even the fire’s magnitude. Even though the LLM adhered to scheduled weather checks, confirmation of the availability of water loading sites or terrain escape routes was neglected. These checks should be explicit preconditions in the LLM context prompts and checks for the SoS state.
During the emergency evacuation, radio silence and clearing the airspace were omitted. Interviews emphasised the figure of a central aerial coordinator. Aerial communications should be organised around this figure for the aerial units, differentiating clearly from the IC. This figure should have multi-frequency capability, standardised phraseology, and the ability to impose radio silence and clear airspace. An unexpected behaviour was the LLM’s use of the NATO phonetic alphabet (‘Tango’, etc.) for naming, demonstrating the value of diverse training for enhancing interoperability and communication across contexts. However, LLM directly utilised all resources available, indicating a focus on maximising effectiveness, possibly due to a lack of cost information. Although it is considered acceptable, since literature suggests that undermanned fire operations are more costly than overmanned ones [Reference Parks48], some reasoning was expected.
Finally, additional realism is necessary in the model. Beyond spotting, the interviewees identified as important: wind shifts and their intensity, temperature variations, the relative humidity of the air and fuel and orography (steep slopes or valleys) that both shape fire behaviour and constrain heavier aircraft manoeuvres. These should be treated as stochastic sources that constrain aircraft operational modes (e.g. approach or hover with external loads).
6.2 Hallucination issues
An emergent by-product of using LLMs is the apparition of hallucinations in the generated replies. An accepted definition of hallucination is a generated content that deviates from the real facts, resulting in unfaithful outputs, with some categories proposed to classify these hallucinations [Reference Rawte, Chakraborty, Pathak, Sarkar, Islam, Tonmoy, Chadha, Sheth and Das49]:
-
• Numeric Nuisance: numeric values are generated that are inconsistent with facts.
-
• Acronym Ambiguity: incorrect expansion of acronyms.
-
• Generated Golem: an imaginary personality is fabricated.
-
• Geographic Erratum: generated locations do not correspond to the associated event.
-
• Time Wrap: a generated text combining two different timelines.
The obtained results presented hallucinations that should be addressed:
-
• Considered a Numeric Nuisance was the tactical pause scheduled at 1400 hours. The period between 1400 and 1700 hours is considered critical, and a re-evaluation of the situation is indeed a good idea, but no tactical pause was ever requested, nor was there any prompt whatsoever. This is considered a numeric detail that can lead to wrong planning of the SoS tasks or life cycle.
-
• Frequencies used (Alpha/Bravo/Charlie) are assigned arbitrarily, which drifts from the protocols given, being close to Acronym Ambiguity. This may result into misleading or confusing communications.
-
• The helicopter assigned for medical evacuation is considered a Generated Golem, as the capability was fabricated since the status of the crew or equipment was not clearly declared nor requested.
-
• The instant availability of the assets added (two helicopters and three ground teams) is considered a Time Wrap hallucination, as they are integrated into the SoS without proper inclusion into the operation planning.
6.3 Operational capabilities
The simulation results must also be discussed in terms of ACD. From the operational capabilities used to fulfil the mission, the top-level requirements are to be extracted for generating new system concepts and adding them to those that could have been given by stakeholders. Four different concepts of aircraft could be proposed, as shown in Tables 1–4, using the results of the simulation. The vehicles need to be able to handle different weather conditions and terrains, since the location of a wildfire is going to be unknown a priori. In terms of collaboration, the various constituent systems must include communication subsystems, which are crucial from an SoS point of view. The communication capabilities must safely ensure interoperability, whether for requesting evacuation or informing about new issues, as observed in the simulation when one team was in distress or the new fire spot started. Another requirement identified is the inclusion of different types of sensors and cameras to provide enhanced situational awareness throughout the mission, which was the main task of the UAV when patrolling the perimeter of the map and supporting the firefighters.
Table 1. Helicopter Concept 1: Firefighting helicopter

Table 2. Helicopter Concept 2: Multi-role support helicopter

Table 3. UAV Concept 1: Fixed-wing configuration

Table 4. UAV Concept 2: Tilt-rotor configuration

Specific requirements extracted for helicopters, shown in Tables 1 and 2. The amount of water to be carried will depend on the system being specific for water drops or for a multi-role one. The requirements for a multi-role will reduce the size of the water bucket in exchange for being able to land in forestry areas to transport or evacuate people, as happened during the simulation. Equation (3) can be used for initial predictions related to the helicopter rotor thrust needed for these different operational situations. For example, a smaller rotor radius will be convenient to be able to land in forestry areas and avoid collisions, but the performance will differ from a helicopter with a bigger radius and specifically for water drops.
For the case of UAVs, Equations (11) and (12) can be used together with common weight estimation methods during the phase of generating the concept of an aircraft [Reference Horvath and Wells50]. These predictions are relevant because, if the UAV is requested to change its mode from surveillance to support, it will fly at a slower velocity and, depending on the needs, being able to either loiter or hover could make a difference. Hence, considering fixed-wing or tilt-rotor configurations is relevant for proposing new concepts, as well as the prediction from Equation (12). The reason is that the fixed-wing concept can be designed with endurance as the main goal, but when in loitering mode, its capabilities to support firefighters could be insufficient. Whereas a tilt-rotor concept will trade an enhanced loitering or hovering capability for reduced endurance. The process of requirements decomposition involves breaking down high-level stakeholder expectations into detailed requirements across the system hierarchy. In this context, the LLM’s performance as an IC in the simulation adds valuable insights for the decomposition process. Its inclusion in a simulation allowed for the dynamic generation of operational scenarios that reflect real-world complexities better than predefined decision trees. By observing the LLM’s decisions, it is possible to derive functional requirements that may not have been explicitly identified by stakeholders. Some of the LLM’s decisions, such as diverting a helicopter for evacuation or requesting reinforcements during the operation, came from certain capabilities that are directly related to the operational requirements of the different aircraft. These operational requirements are satisfied by functionalities that emerge at the aircraft level and are accounted for when generating the concepts [Reference Drego and Steinkellner51]. Since the procedure for including an LLM in a simulation is still under development, a proposed workflow for the analysis could be:
-
1. Capture the episode such as a concise statement of the operational context, agents involved and possible constraints; for example, emergency evacuation of ground units over a wooded area.
-
2. Express as an emergent function using a function/means structure (e.g., MEDEVAC).
-
3. Formulate requirements in capability terms, such as adaptability, communication, environmental assessment, operational efficiency, safety and emergency response, autonomy.
It is important to note that, for avoiding conflictive functionalities, the different modes or states must be analysed at the operational level for both the SoS and single systems. For the SoS, the operation changed from a single into a multi-aircraft operation, whereas for the single system, the firefighting mode of a helicopter switched into evacuation. This leads to conflict analysis, hover margins, etc. In addition, the simulated conditions should change to be able to consider multiple scenarios, such as having many fire spots, different elevations, no water in the vicinity, etc.
6.4 Functional analysis
The results obtained provided interactions that are also useful for logical decomposition and functional analysis, which can be done with the use of function/means trees [Reference Vigolo, Muñoz and De Negri52] and ontologies [Reference Franzén, Staack, Krus, Jouannet and Amadori53]. The process involves translating actions into system functions and sub-functions and the means for achieving them, as shown in Fig. 9. Among the different interactions, as an example of the decomposition process, it can be taken the high-level functionality: helicopter wildfire operations. This can be broken into:
-
• Sub-Function 1: Facilitate emergency evacuation when required.
-
• Means: Rescue hoist, space for a stretcher, first aid resources.

Figure 9. Example of function/means tree for a firefighting helicopter.
Each sub-function can be allocated to specific subsystems, and their interfaces and interactions can be defined. This hierarchical breakdown provides an understanding of all system functions and their relationships, necessary for developing a coherent system architecture. At each level of decomposition, the derived requirements must be validated against higher-level requirements and stakeholder expectations. For example, the LLM’s oversight in assessing the fire’s magnitude indicates a gap in situational awareness capabilities, which addresses the need for real-time fire monitoring and data integration.
7.0 Conclusion
The present work has shown that behaviours modelled with long decision trees can be replaced, at least partially, with an LLM in an ABS. This provides more degrees of freedom for the simulation and extends the analysis of SoS. The ABS was used to evaluate the LLM’s decision-making and its capability to follow operational guidelines as an IC, with a satisfactory outcome. Moreover, from the decisions taken, it was possible to evaluate what kind of capabilities would be required from the constituent systems that are part of the SoS. The operational capabilities found have been related to requirements that can be used for ACD and generate different aircraft concepts, such as helicopters or UAVs.

Figure 10. Incident Command structure for wildfire suppression. Adapted from (56).
Future work will be divided into two different paths for different goals:
-
• First the issues related to the inclusion of an LLM. Hallucination was already discussed, but other gaps need to be researched. Literature has proposed methods for benchmarking [Reference Rawte, Chakraborty, Pathak, Sarkar, Islam, Tonmoy, Chadha, Sheth and Das49], or mitigating hallucinations [Reference Verspoor54], being especially interesting is the addition of another LLMs as a means for mitigation. Since the figure of an aerial commander was mentioned in the interviews, it seems particularly relevant as a means of achieving both an improved simulation in terms of realism and in terms of reduced issues related to hallucinations. A particularly intriguing configuration would be in such a way that both have access to the same information, similar to radio communication, but with the added benefit of hallucination checking. For this setup in the simulation, it is distinctly interesting to use lightweight LLMs, as they were reported to be less prone to hallucination issues [Reference Rawte, Chakraborty, Pathak, Sarkar, Islam, Tonmoy, Chadha, Sheth and Das49], they require less computational power and are easier to fine-tune [Reference Zhalgasbayev, Khauazkhan and Sarsenova55]. As another long-term goal, it is important to study several configurations of the LLMs and their temperature parameter (which adjusts the level of determinism in the answers). Being able to reproduce results is important, especially when designing experiments, and certain parameters must be kept constant. However, the authors consider highly relevant the variability in the answers for the analysis of an SoS, since the resilience of it to errors or misunderstandings is considered very important for finding an optimal configuration. Human mistakes can always happen during an operation, and they provide valuable information for understanding an SoS, designing the ConOps, or finding the configuration with the best performance.
-
• The initial concept proposed for the case study has brought important insights, but it was clearly insufficient. The future work for the model needs to focus on improving the model fidelity and the overall architecture of the SoS. The feedback from the interviewees emphasised the importance of capturing not only fire dynamics but also wind conditions, terrain and smoke. This low fidelity affects the operation and does not allow for proper analysis, especially in terms of commander decisions. No escape routes for ground units could be decided upon, nor could dropping paths be planned for avoiding smoke and mountains. It was also found important the role of an air operation chief [56], as shown in the structure in Fig. 10, particularly relevant when designing the SoS architecture. Once a proper model is achieved, an evaluation can be done by using performance indicators. However, before considering any indicators for the SoS or the systems in it, experts should grade the decisions and behaviour of the agents. In doing so, it would be important for the evaluation of the performance to consider decisions related to tactics, fire behaviour evaluation, water drop planning, resource distribution, ground-aircraft collaboration and radio communication.
Acknowledgements
The research presented in this paper has been performed in the framework of the COLOSSUS project (Collaborative System of Systems Exploration of Aviation Products, Services and Business Models) and has received funding from the European Union Horizon Europe programme under grant agreement No. 101097120. The Swiss participation in the COLOSSUS project is supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number 22.00609. The authors would also like to acknowledge the Swedish Innovation Agency (VINNOVA) and The Swedish Defence Materiel Administration (FMV) for financial support through the grant 2019-05371. The authors also wish to acknowledge the help of the interviewed special operation pilots J. M. García and D. Moreno Gonzalez. ChatGPT is a language model developed by OpenAI in San Francisco, CA, USA, and is used to partly generate the conversation between the Incident Commander and the agents in the simulation.





