1. Introduction
Analogy is a powerful strategy for creative problem-solving, enabling designers to draw insights from similar or cross-domain contexts (Holyoak & Thagard, 1996; Singh, Casakin, et al., 2015). Effective Design by Analogy (DbA) requires representing source and target knowledge in a unified framework, typically stored in databases for systematic retrieval and mapping. However, maintaining these databases is resource-intensive, especially for cross-domain analogies, which face linguistic and representational challenges.
DbA relies on access (retrieving relevant analogies) and mapping (aligning source and target elements), both of which are cognitively and computationally demanding. Recent advances in Artificial Intelligence (AI), particularly Large Language Models (LLMs), enhance these processes by efficiently storing, retrieving, and reasoning about knowledge across domains.
This paper proposes a systematic DbA pipeline integrating LLMs with graph algorithms to automate and scale analogy-driven design. The framework follows five stages: Retrieval, Mapping, Transfer, Evaluation, and Storage (Ball & Christensen, 2022), leveraging LLMs for semantic understanding and graph algorithms for structural organization. This approach aims to improve efficiency, scalability, and cross-domain applicability, addressing limitations in traditional DbA methods.
2. Literature review
Given its importance in creative problem-solving, numerous DbA approaches have emerged, including biomimetic design and analogical reasoning in engineering and design. These methods can be categorized based on their methodological principles and focus areas.
Natural language processing (NLP) and text mining techniques are often used to bridge terminological gaps between biological and engineering domains. For instance, Chiu and Shu (2007) proposed a bridging method using NLP to uncover less-obvious connections between engineering and biological terminology. Similarly, Verhaegen et al. (2011) applied word co-occurrence and principal component analysis (PCA) to analyze patent data for identifying DbA candidates. Vandevenne et al. (2016) developed SEABIRD, which maps technical systems described in patents to biological systems referenced in academic literature.
Function-based methodologies represent another key category, focusing on the functional characteristics of designs. Stone and Wood (1999) introduced a functional basis framework for representing design. Fu et al. (2015) proposed a patent-based analogy search using functional vector approaches, while Briana et al. (2015) presented tools such as D-APPS and DRACULA, which integrate functional models with resources like WordNet and the AskNature repository. Sanaei et al. (2017) devised a text-based system leveraging engineering ontologies and hierarchical function representations to retrieve design analogies.
In addition, computational models have been developed to facilitate biomimetic design and analogical reasoning. For example, Grace et al. (2015) introduced Idiom, a computational model for analogical mapping that reinterprets object representations. Oriakhi et al. (2011) created the WordTree method and its associated tool, WordTree Express (WTE), which visually represent word relationships based on functional design principles. Tools such as that of Vattam et al. (2011) and VISION (Song et al., 2020) provide approaches to structure-behavior-function modeling and visual interaction for analogical inspiration, respectively.
Model-based analogy approaches offer deeper insights by capturing structural and functional similarities. Goel and Bhatta (2004) explored model-based analogy (MBA), which transfers generic teleological mechanisms (GTMs) between contexts. Other notable contributions include the work of Goel et al. (1997), which uses functional basis models for design modification and verification, and IDEAL (Bhatta & Goel, 1996), which extracts generic teleological mechanisms for analogical mapping.
Case-based reasoning (CBR), rooted in analogical reasoning, is another influential method (Hybs & Gero, 1992). CBR relies on previously encountered cases, using similarity measures to retrieve relevant instances that inform new problem-solving scenarios (Perner, 2014). The CBR process includes four stages (retrieve, reuse, revise, and retain) to emulate human reasoning (Aamodt & Plaza, 1994). Applications of CBR span various design domains, including architectural design (Mubarak, 2004) and mechanical device development (Qin & Regli, 2003).
Overall, most DbA-inspired approaches focus on ideation and solution recommendation within predefined problem contexts. Earlier DbA approaches faced significant challenges in managing cross-domain analogies due to linguistic and representational differences, requiring extensive manual effort or rigid databases for analogy retrieval and mapping. These methods often struggled with scalability and flexibility, limiting their applicability in diverse contexts. In contrast, LLMs combine linguistic, contextual, and semantic reasoning in a single model, offering greater adaptability and efficiency in retrieving and mapping analogies across varied domains.
3. Methodology
3.1. Overall architecture
The proposed framework implements a systematic pipeline for automating and scaling DbA through the integration of LLMs and graph-theoretic algorithms. The framework employs the Function-Behavior-Structure (FBS) ontology (Gero & Kannengiesser, 2004) as its foundational representational paradigm (Goel & Bhatta, 2004; Vandevenne et al., 2016), operating through five distinct computational phases: Retrieval, Mapping, Transfer, Evaluation, and Storage, as illustrated in Figure 1.

Figure 1. Sequence of operations
During the Retrieval phase, LLMs perform systematic extraction of structural and relational information from the design problem specification, encoding it within the FBS ontological framework as a directed dependency graph G(V,E). The vertices V represent functional, behavioral, and structural entities, while edges E encode their interdependencies. Graph-theoretic algorithms, specifically union-find operations, facilitate the identification of functional clusters within G(V,E), enabling the abstraction of these clusters into higher-order functional representations optimized for analogical retrieval.
The Mapping phase implements established analogical reasoning principles (Bhatta & Goel, 1996), wherein LLMs execute cross-domain structural mapping operations through the lens of FBS relationships. This approach is intended to preserve functional isomorphisms despite potential structural variations between domains. The resultant mappings undergo bidirectional projection while maintaining topological consistency with the source design's dependency structure.
The framework employs iterative solution refinement during the Evaluation phase, utilizing quantitative FBS-based metrics to assess functional coherence and problem relevance. The Storage phase culminates in the systematic archival of validated solutions in a normalized FBS representation format, facilitating the development of a comprehensive design analogy repository with robust cross-domain applicability.
3.2. Retrieval phase
The retrieval phase involves leveraging LLMs to extract structural and relational information from the input design problem. This process is divided into two sub-steps:
1. Identification of Structures and Relationships: The LLM analyzes the input design context to identify key structural components $$S = \left\{ {{s_1},{s_2}, \ldots ,{s_n}} \right\}$$ and the relationships or behaviors $$R = \left\{ {{r_1},{r_2}, \ldots ,{r_m}} \right\}$$ between these structures. Each relationship $${r_{ij}} \in R$$ is modeled as a dependency between structures $${s_i}$$ and $${s_j}$$.
2. Graph Construction: The extracted structures and relationships are represented as a directed graph $$G = \left( {V,E} \right)$$, where:
• $$V$$ is the set of vertices, corresponding to the identified structures $$S$$,
• $$E$$ is the set of directed edges, representing relationships $$R$$.
Formally, an edge $${e_{ij}} \in E$$ is defined as:
$${e_{ij}} = \left( {{s_i},{s_j},{r_{ij}}} \right)$$
where $${s_i},{s_j} \in V$$ and $${r_{ij}}$$ encodes the nature of the dependency.
This structured graph $$G$$ serves as the foundation for downstream reasoning and analogy generation tasks. Figure 2 shows an example graph generated for the components of a motorcycle.

Figure 2. Directed graph depicting the structures and relationships for motorcycle design example
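The graph construction step can be sketched in a few lines of Python. This is a minimal illustration, assuming each extracted relationship arrives as a (source, target, label) triple; the function name `build_dependency_graph` and the motorcycle fragment are hypothetical and are not taken from Figure 2.

```python
from collections import defaultdict


def build_dependency_graph(structures, relationships):
    """Build G = (V, E) as a vertex set plus a nested adjacency map.

    `relationships` is an iterable of (s_i, s_j, r_ij) triples, where r_ij
    is a free-text label describing the dependency (an assumed encoding).
    """
    vertices = set(structures)
    edges = defaultdict(dict)
    for source, target, label in relationships:
        # Reject edges whose endpoints were never identified as structures.
        if source not in vertices or target not in vertices:
            raise ValueError(f"unknown structure in edge {source!r} -> {target!r}")
        edges[source][target] = label
    return vertices, dict(edges)


# Hypothetical fragment of a motorcycle drivetrain (illustrative names only).
structures = ["engine", "clutch", "gearbox", "rear_wheel"]
relationships = [
    ("engine", "clutch", "transmits torque to"),
    ("clutch", "gearbox", "engages"),
    ("gearbox", "rear_wheel", "drives"),
]

V, E = build_dependency_graph(structures, relationships)
print(E["engine"])
```

Storing the label on the edge keeps the behavior information next to the structural dependency, so later stages can read both from the same object.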
3.3. Component formation using union-find
After constructing the dependency graph $$G = \left( {V,E} \right)$$, we perform a union-find operation to partition the graph into connected components. Each connected component represents a unique function within the design problem.
3.3.1. Union-Find algorithm
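The union-find algorithm is applied to efficiently group nodes into disjoint sets based on their connectivity in $$G$$. This process consists of two primary operations:
• Find: Determines the representative element (or root) of the set to which a node $$v \in V$$ belongs. Formally:
$${\rm{Find}}\left( v \right) = {\rm{root}}\left( v \right)$$
• Union: Merges two sets containing nodes $${v_i}$$ and $${v_j}$$ if there exists an edge $${e_{ij}} \in E$$ between them. This operation is defined as:
$${\rm{Union}}\left( {{v_i},{v_j}} \right):\;{\rm{Set}}\left( {{v_i}} \right) \cup {\rm{Set}}\left( {{v_j}} \right)$$
where $${\rm{Set}}\left( v \right)$$ denotes the set currently containing $$v$$.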
3.3.2. Component formation
Using the union-find operations, the nodes $$V$$ are grouped into disjoint sets $$C = \left\{ {{C_1},{C_2}, \ldots ,{C_k}} \right\}$$, where each set $${C_i}$$ corresponds to a connected component. Formally, a connected component $${C_i}$$ is defined as:
$${C_i} = \left\{ {v \in V\;|\;{\rm{Find}}\left( v \right) = {\rho _i}} \right\}$$
where $${\rho _i}$$ is the representative element of the $$i$$-th set.
Each component $${C_i}$$ represents a unique function, encapsulating the structural elements and their relationships that contribute to that specific functionality. This step mirrors methods for function-based clustering (Fu et al., 2015).
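The Find and Union operations and the resulting components can be sketched as follows. This is a minimal illustration using the standard disjoint-set structure with path compression and union by size; the class name, helper name, and example nodes are assumptions made for the example, and edge direction is ignored when grouping.

```python
class UnionFind:
    """Disjoint-set structure with path compression and union by size."""

    def __init__(self, nodes):
        self.parent = {v: v for v in nodes}
        self.size = {v: 1 for v in nodes}

    def find(self, v):
        # First pass: walk up to the root of v's set.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def connected_components(nodes, edges):
    """Group nodes into components; each edge (a, b) merges two sets."""
    uf = UnionFind(nodes)
    for a, b in edges:
        uf.union(a, b)
    groups = {}
    for v in nodes:
        groups.setdefault(uf.find(v), set()).add(v)
    return list(groups.values())


# Hypothetical structures and dependencies (illustrative names only).
nodes = ["engine", "clutch", "gearbox", "fork", "spring", "headlamp"]
edges = [("engine", "clutch"), ("clutch", "gearbox"), ("fork", "spring")]

components = connected_components(nodes, edges)
print(sorted(sorted(c) for c in components))
```

A node with no edges, such as the headlamp here, ends up as a singleton component.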
3.4. Abstract function creation
After grouping the nodes into connected components $$C = \left\{ {{C_1},{C_2}, \ldots ,{C_k}} \right\}$$ using the union-find operation, we assign an abstract function to each component. These abstract functions encapsulate the collective behavior and structural dependencies of the nodes within their respective groups while preserving the original graph topology.
3.4.1. Abstract function representation
Each connected component $${C_i}$$ is mapped to an abstract function $${F_i}$$, where:
$${F_i} = {\rm{Abstract}}\left( {{C_i}} \right)$$
and $${C_i}$$ represents the set of nodes $$\left\{ {{v_1},{v_2}, \ldots ,{v_n}} \right\}$$ and their internal dependencies. The abstract function $${F_i}$$ is designed to generalize the behavior of the component while hiding low-level implementation details. Figure 3 shows the identified abstract functions for each connected component from Figure 2.

Figure 3. Functions graph preserving the topology of the structure graph - motorcycle design example
3.4.2. Graph transformation
The dependency graph $$G = \left( {V,E} \right)$$ is transformed into a higher-level abstract graph $$G' = \left( {V',E'} \right)$$, where:
• $$V' = \left\{ {{F_1},{F_2}, \ldots ,{F_k}} \right\}$$ represents the set of abstract functions,
• $$E'$$ represents the dependencies between abstract functions, derived from the original graph $$G$$.
An edge is created between $${F_i}$$ and $${F_j}$$ if there exists at least one edge $${e_{xy}} \in E$$, where $${v_x} \in {C_i}$$ and $${v_y} \in {C_j}$$. Formally:
$$E' = \left\{ {\left( {{F_i},{F_j}} \right)\;|\;\exists \,{e_{xy}} \in E:\;{v_x} \in {C_i},\;{v_y} \in {C_j}} \right\}$$
3.4.3. Preserving topology
The abstract graph $$G'$$ retains the original graph's topology, ensuring that the hierarchical structure of functions and their dependencies remains consistent. This abstraction facilitates the application of reasoning and design analogy in subsequent phases.
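The graph transformation can be sketched as a contraction of the structure graph. The example below is a minimal illustration that takes the node-to-function assignment as given, however it was obtained. The function name `abstract_graph` and the labels are hypothetical, and dropping edges that stay inside one component is an assumption of this sketch.

```python
def abstract_graph(edges, component_of):
    """Contract G into G' = (V', E').

    `edges` is an iterable of directed (v_x, v_y) pairs from G.
    `component_of` maps every node to the label of its abstract function.
    """
    abstract_nodes = set(component_of.values())
    abstract_edges = set()
    for x, y in edges:
        fi, fj = component_of[x], component_of[y]
        # Assumption: edges internal to one component do not create a
        # self-loop in G'; only cross-component edges are lifted.
        if fi != fj:
            abstract_edges.add((fi, fj))
    return abstract_nodes, abstract_edges


# Hypothetical structure-level edges and function labels.
edges = [
    ("engine", "clutch"),
    ("clutch", "gearbox"),
    ("gearbox", "rear_wheel"),
    ("rear_wheel", "swingarm"),
]
component_of = {
    "engine": "generate_power",
    "clutch": "transmit_power",
    "gearbox": "transmit_power",
    "rear_wheel": "propel",
    "swingarm": "propel",
}

nodes_abs, edges_abs = abstract_graph(edges, component_of)
print(sorted(edges_abs))
```

Using a set for the lifted edges means several structure-level dependencies between the same two functions collapse into one edge of the abstract graph.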
3.5. Retrieval of analogical structures
With the abstract functions $$F = \left\{ {{F_1},{F_2}, \ldots ,{F_k}} \right\}$$ defined for each component, the next step involves leveraging an LLM to retrieve analogical structures from various domains that exhibit similar functionality.
3.5.1. Analogical retrieval framework
For each abstract function $${F_i}$$, the LLM is queried to identify structures $${A_i} = \left\{ {{a_1},{a_2}, \ldots ,{a_m}} \right\}$$ from diverse domains that are analogous to $${F_i}$$ in terms of functionality. The retrieval process can be formalized as:
$${A_i} = \text{LLM-Retrieve}\left( {{F_i},D} \right)$$
where $$D$$ represents the set of available domains, and $$\text{LLM-Retrieve}$$ is the retrieval mechanism powered by the LLM. The retrieved structures $${A_i}$$ are ranked based on their functional similarity to $${F_i}$$.
3.5.2. Functional similarity metric
To ensure the retrieved structures are relevant, a functional similarity metric $${\rm{Sim}}\left( {{F_i},{a_j}} \right)$$ is computed for each candidate $${a_j} \in {A_i}$$. The similarity score is derived from the LLM's embeddings and is defined as:
$${\rm{Sim}}\left( {{F_i},{a_j}} \right) = {\rm{cos}}\left( {{\rm{Embed}}\left( {{F_i}} \right),{\rm{Embed}}\left( {{a_j}} \right)} \right)$$
where $${\rm{Embed}}\left( \cdot \right)$$ denotes the LLM-generated embedding of the input, and $${\rm{cos}}\left( { \cdot , \cdot } \right)$$ is the cosine similarity function. This is reminiscent of techniques such as those used by Chiu and Shu (2007).
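The cosine similarity underlying the metric can be written out directly. The sketch below is a minimal illustration in plain Python; the three-dimensional vectors are toy stand-ins for embeddings, since no particular embedding model is assumed here.

```python
import math


def cosine_similarity(u, v):
    """Return cos(u, v) = (u . v) / (|u| |v|) for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Convention chosen for this sketch: a zero vector is similar to nothing.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for Embed(F_i) and Embed(a_j).
function_vec = [1.0, 0.0, 1.0]
parallel_candidate = [2.0, 0.0, 2.0]    # same direction, different length
orthogonal_candidate = [0.0, 1.0, 0.0]  # shares no component

print(cosine_similarity(function_vec, parallel_candidate))
print(cosine_similarity(function_vec, orthogonal_candidate))
```

Because the score depends only on direction, a candidate whose embedding is a scaled copy of the function's embedding scores as highly as an identical one.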
3.5.3. Selection of analogical structures
Based on the similarity scores, the top $$p$$ structures $$A_i^{{\rm{top}}} \subseteq {A_i}$$ are selected for each abstract function $${F_i}$$:
$$A_i^{{\rm{top}}} = {\rm{to}}{{\rm{p}}_p}\left\{ {{a_j} \in {A_i}\;|\;{\rm{Sim}}\left( {{F_i},{a_j}} \right) \ge \tau } \right\}$$
where $$\tau $$ is the similarity threshold. These selected structures represent the most relevant analogies for $${F_i}$$ across domains, similar to PCA-based methods in Verhaegen et al. (2011).
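The selection step combines a threshold with a cap on the number of candidates. The sketch below is a minimal illustration, assuming the threshold is applied first and the cap second; the function name `select_top`, the candidate names, and the scores are invented for the example.

```python
def select_top(scores, p, tau):
    """Keep candidates with score >= tau, then return the p best by score.

    `scores` maps each candidate name to its similarity with F_i.
    """
    kept = [(name, score) for name, score in scores.items() if score >= tau]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept[:p]]


# Hypothetical similarity scores for a shock-absorbing function.
scores = {
    "kangaroo tendon": 0.91,
    "leaf spring": 0.84,
    "bamboo stalk": 0.62,
    "sea sponge": 0.35,
}

print(select_top(scores, p=2, tau=0.6))
print(select_top(scores, p=5, tau=0.6))
```

When fewer than p candidates clear the threshold, the result is simply shorter than p, which is one reasonable reading of the selection rule.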
3.5.4. Preservation of abstract graph topology
The retrieved analogical structures $${A^{{\rm{top}}}} = \left\{ {A_1^{{\rm{top}}},A_2^{{\rm{top}}}, \ldots ,A_k^{{\rm{top}}}} \right\}$$ are integrated back into the abstract graph $$G'$$ while preserving its topology. Each node $${F_i}$$ in $$G'$$ is replaced by its corresponding analogical structure $$A_i^{{\rm{top}}}$$, maintaining the edge connections $$E'$$ between abstract functions. Figure 4 shows an example of the analogous structures for each abstract function identified in Figure 3.

Figure 4. Graph with analogous structures for motorcycle design example
3.6. Mapping and transfer
In the mapping and transfer phase, we proceed in topological order to transfer the function, behavior, and structure from the retrieved analogical structures to the source design problem. The goal is to ensure that the subsequent nodes in the analogy graph remain compatible with each other by respecting the topological dependencies.
3.6.1. Topological ordering
Given the abstract graph $$G' = \left( {V',E'} \right)$$ of the source design problem, we first compute a topological order of the abstract functions $$F = \left\{ {{F_1},{F_2}, \ldots ,{F_k}} \right\}$$. The topological order $$\pi $$ is defined as a sequence of functions such that for every directed edge $$e'_{ij} = \left( {{F_i},{F_j}} \right) \in E'$$, $${F_i}$$ appears before $${F_j}$$ in $$\pi $$. Formally:
$$\forall \left( {{F_i},{F_j}} \right) \in E':\;{\rm{po}}{{\rm{s}}_\pi }\left( {{F_i}} \right) < {\rm{po}}{{\rm{s}}_\pi }\left( {{F_j}} \right)$$
where $${\rm{po}}{{\rm{s}}_\pi }\left( F \right)$$ denotes the position of $$F$$ in $$\pi $$, and $$\pi = \left\{ {{F_{\pi \left( 1 \right)}},{F_{\pi \left( 2 \right)}}, \ldots ,{F_{\pi \left( k \right)}}} \right\}$$ is a valid topological order.
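One common way to compute such an order is Kahn's algorithm, sketched below. The source does not say which algorithm is used, so this is an illustrative choice; the function name and the abstract function labels are hypothetical.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return the nodes so that every edge (a, b) has a before b.

    Uses Kahn's algorithm; raises ValueError if the graph contains a cycle.
    """
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1

    # Start from the functions that depend on nothing.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)

    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; no topological order exists")
    return order


# Hypothetical abstract functions and their dependencies.
functions = ["propel", "transmit_power", "generate_power", "absorb_shock"]
dependencies = [
    ("generate_power", "transmit_power"),
    ("transmit_power", "propel"),
    ("propel", "absorb_shock"),
]

order = topological_order(functions, dependencies)
print(order)
```

A topological order exists only for acyclic graphs, so an abstract graph with mutual dependencies between functions would need its cycles resolved before this step.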
3.6.2. Function, behavior, and structure mapping
For each function $${F_i}$$ in the topologically ordered sequence $$\pi $$, we map the function $${F_i}$$, its associated behavior $${b_i}$$, and its structure $${s_i}$$ to the corresponding components $$A_i^{{\rm{top}}}$$ of the retrieved analogical structure. The mapping process is formally defined as:
$${\rm{Map}}:\;\left( {{F_i},{b_i},{s_i}} \right) \to A_i^{{\rm{top}}}$$
This ensures that the function $${F_i}$$ from the source design is transferred to the analogous structure, preserving both its behavior and structure.
3.6.3. Ensuring compatibility
By following the topological order, we ensure that each node $${F_i}$$ is mapped before its dependent nodes $${F_j}$$ (where $${F_i} \to {F_j}$$ in $$E'$$). This guarantees that the behavior and structure of $${F_i}$$ are compatible with those of $${F_j}$$, ensuring that the subsequent mappings do not violate any functional dependencies within the analogy graph.
3.6.4. Transfer process
The transfer is performed iteratively as follows:
• For each function $${F_i}$$ in the topological order $$\pi $$, retrieve its corresponding analogical structure $$A_i^{{\rm{top}}}$$ from the analogy graph.
• Transfer the structure $${s_i}$$, behavior $${b_i}$$, and function $${F_i}$$ from $$A_i^{{\rm{top}}}$$ to the source design problem, ensuring that the analogical structure aligns with the source.
• This process continues until all functions, behaviors, and structures are mapped and transferred to the source.
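The iteration above can be sketched as a loop over the topological order in which each step sees the results already produced for its upstream functions. This is a minimal illustration: the `adapt` callable is a hypothetical placeholder for the LLM-driven mapping step, and all names and analogies are invented for the example.

```python
def transfer_in_order(order, predecessors, analogies, adapt):
    """Apply `adapt` to each function in topological order.

    `order` is a topological order of the abstract functions.
    `predecessors` maps a function to the functions it depends on.
    `analogies` maps a function to its selected analogical structure.
    `adapt(function, analogy, upstream)` returns the transferred solution.
    """
    solution = {}
    for function in order:
        # Upstream results are available because of the ordering; a
        # KeyError here means `order` was not a valid topological order.
        upstream = {p: solution[p] for p in predecessors.get(function, [])}
        solution[function] = adapt(function, analogies[function], upstream)
    return solution


def stub_adapt(function, analogy, upstream):
    """Placeholder for the mapping step: records what it was given."""
    return {"function": function, "structure": analogy, "inputs": sorted(upstream)}


# Hypothetical inputs.
order = ["generate_power", "transmit_power", "propel"]
predecessors = {"transmit_power": ["generate_power"], "propel": ["transmit_power"]}
analogies = {
    "generate_power": "piezoelectric harvester",
    "transmit_power": "tendon-like linkage",
    "propel": "shape-memory actuator",
}

result = transfer_in_order(order, predecessors, analogies, stub_adapt)
print(result["propel"])
```

Passing the upstream solutions into each step is what lets a later mapping be checked against the choices already made for the functions it depends on.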
3.7. Solution storage and future retrieval
The mapping and transfer phase yields a set of design solutions by analogy, each of which encapsulates a functional, behavioral, and structural mapping from the analogical structures to the source design problem. These solutions are then stored in a vector database for efficient future retrieval.
3.7.1. Solution representation
Each solution $${S_i}$$ is represented as a vector in a high-dimensional embedding space. The vector $${{\bf{S}}_{\bf{i}}}$$ encodes the combined function, behavior, and structure of the analogical solution:
$${{\bf{S}}_{\bf{i}}} = {\rm{Embed}}\left( {{F_i},{b_i},{s_i}} \right)$$
where $${F_i}$$ is the function, $${b_i}$$ is the behavior, and $${s_i}$$ is the structure of the mapped solution. The embedding function $${\rm{Embed}}\left( \cdot \right)$$ generates a vector representation for each solution that captures its key characteristics in the design space.
3.7.2. Vector database storage
All solutions $$S = \left\{ {{{\bf{S}}_1},{{\bf{S}}_2}, \ldots ,{{\bf{S}}_{\bf{n}}}} \right\}$$ are stored in a vector database, such as FAISS or Pinecone, which supports efficient similarity search operations. The vector database allows for fast retrieval of solutions based on their functional similarity to new design problems. Formally, for a new query vector $${\bf{Q}}$$ representing a new design problem, the closest solutions $${{\bf{S}}_{\bf{i}}}$$ can be retrieved using a similarity metric:
$${{\bf{S}}_{\bf{i}}}^{{\rm{best}}} = {\rm{Retrieve}}\left( {{\bf{Q}},S} \right)$$
where $${{\bf{S}}_{\bf{i}}}^{{\rm{best}}}$$ represents the solution most similar to the query vector $${\bf{Q}}$$, and $${\rm{Retrieve}}\left( \cdot \right)$$ is the retrieval function based on cosine similarity or another distance metric.
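The retrieval operation can be illustrated with a brute-force in-memory store. This sketch does not use FAISS or Pinecone and does not reproduce their APIs; it only shows the cosine-based lookup that such a database would accelerate. The class name, solution identifiers, and vectors are hypothetical.

```python
import math


def _cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class InMemorySolutionStore:
    """Toy stand-in for a vector database: linear scan over stored vectors."""

    def __init__(self):
        self._items = []

    def add(self, solution_id, vector):
        self._items.append((solution_id, list(vector)))

    def retrieve(self, query, k=1):
        """Return the ids of the k stored solutions most similar to `query`."""
        ranked = sorted(
            self._items,
            key=lambda item: _cosine(query, item[1]),
            reverse=True,
        )
        return [solution_id for solution_id, _ in ranked[:k]]


# Hypothetical stored solutions with toy three-dimensional embeddings.
store = InMemorySolutionStore()
store.add("suspension_v1", [0.9, 0.1, 0.0])
store.add("drivetrain_v1", [0.1, 0.9, 0.2])
store.add("lighting_v1", [0.0, 0.2, 0.9])

query = [0.2, 0.8, 0.1]  # stands in for the embedding Q of a new problem
print(store.retrieve(query, k=2))
```

A linear scan costs time proportional to the number of stored solutions per query, which is the cost that approximate nearest-neighbor indexes are designed to reduce.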
3.7.3. Future retrieval
By storing the solutions in a vector database, future design problems can be efficiently matched with the most relevant analogical solutions. This retrieval process enables quick adaptation of previous analogical designs to new contexts, facilitating faster design iterations and improving design efficiency over time.
4. Results
4.1. Case study - motorcycle design
The result obtained from the proposed framework for 'Motorcycle Design' is presented in Table 1 (edited to tabular format for clarity).
Table 1. Motorcycle System Overview

The results of the proposed framework present a structured, systematic, and modular approach to motorcycle design. By breaking down the vehicle into distinct subsystems and then solving each by analogy, the framework addresses every aspect of the motorcycle's functionality (energy generation, conversion, translator, shock absorber, etc.) individually. The proposed solutions draw on advanced engineering concepts such as piezoelectric energy harvesting, thermophotonic devices, shape-memory alloys, and graphene-based systems. This modularity makes the approach well suited to engineering applications where individual subsystems can be optimized independently.
5. Discussion
This work presents a theoretical framework integrating LLMs and graph algorithms within the FBS ontology for DbA. The framework systematically addresses analogical mapping while maintaining functional consistency through structured graph representations. The utilization of FBS ontology as a common semantic framework enables cross-domain translation, addressing a fundamental challenge in analogical reasoning.
The incorporation of LLMs represents a methodological shift from conventional approaches dependent on curated databases or domain-specific knowledge bases. This enables exploration of a broader solution space without extensive data collection requirements. However, this approach necessitates careful consideration of prompt engineering methodologies, potential memorization artifacts in LLM outputs, and development of quantitative metrics for evaluating analogical relevance.
The implementation of graph algorithms, specifically union-find operations and topological ordering, provides a formal mechanism for preserving structural consistency in compound analogical structures. While this approach establishes mathematical rigor in maintaining functional relationships, further investigation is required regarding optimal graph representations for diverse design problems and computational scalability for complex dependency structures.
Several theoretical and methodological considerations emerge from this conceptual framework:
1. Computational complexity: The scalability of graph operations and LLM query optimization requires systematic evaluation, particularly for extensive dependency networks.
2. Ontological framework: The efficacy of FBS as a universal translation mechanism across heterogeneous domains demands rigorous investigation.
3. Current development: Ongoing research focuses on optimizing LLM query formulation and graph transformation operations, including the development of quantitative metrics for analogical relevance and implementation of vector-based solution storage systems.
As a conceptual framework, this research contributes to the DbA domain through a theoretical foundation for automated analogical reasoning. The synthesis of FBS ontology, LLMs, and graph algorithms establishes a systematic methodology for cross-domain analogical mapping while maintaining functional consistency. However, substantial research remains in empirical validation, metric development, and addressing technical constraints in LLM reliability and graph representations.
Future research directions include systematic framework evaluation and theoretical refinement. Additionally, investigation of framework behavior across diverse design domains will provide insights into generalizability constraints.
6. Conclusion
This paper presents a theoretical framework for DbA that combines LLMs with graph algorithms. The framework employs the FBS ontology as a basis for cross-domain translation, supported by a mathematical formulation for maintaining structural dependencies. By representing design problems as dependency graphs and utilizing union-find operations for functional clustering, the framework provides a systematic approach to handling compound analogical structures.
The integration of LLMs with graph-theoretic operations offers a mechanism for exploring cross-domain analogies while preserving functional relationships. The framework’s formalization of the retrieval, mapping, and transfer processes establishes a theoretical foundation for systematically generating and evaluating design analogies.
While the framework demonstrates potential in automating DbA, certain limitations should be acknowledged. The quality of analogical retrieval depends significantly on the LLM’s training and its ability to understand domain-specific technical concepts. Additionally, the framework’s current formulation assumes that functional relationships can be effectively captured through graph structures, which may not hold true for all design scenarios.
Future research directions include the development of robust evaluation metrics for assessing the quality of retrieved analogies, investigation of methods to incorporate domain-specific constraints into the mapping process, and exploration of techniques to handle temporal and dynamic aspects of design problems. The framework could also benefit from integration with existing design tools and methodologies to enhance its practical applicability in real-world design scenarios.