1. Introduction
Since its introduction in the 1970s, Virtual Product Development (VPD) has evolved into a standard procedure across various industries (Horváth et al., 2010), ranging from automotive and aerospace (Pfouga & Stjepandic, 2018) to personal electronics (Kerttula, 2006) and clothing (Papahristou, 2016). Several factors have contributed to this essential role of VPD within the product development process (Stark et al., 2010). One major driver is globalization and the rise of distributed teams, which has led companies to collaborate across multiple regions, necessitating virtual platforms for design, simulation, and testing (Montoya et al., 2009). In such environments, collaboration across departments and teams is crucial for ensuring that VPD processes are efficient and successful (Adegbola et al., 2024). Computer-aided design (CAD) is a fundamental tool in collaborative product development workflows, enabling engineers to virtually design and test concepts prior to manufacturing (Azemi et al., 2018).
In large organizations, it is not uncommon for thousands of individuals to collaborate on complex CAD assemblies. The successful coordination and integration of different components are essential to ensure all elements function seamlessly within the final design. Achieving this requires careful oversight and synchronization of activities across various teams and phases of development. To support such complex workflows, maintaining transparency and efficient information flow is crucial.
Research indicates that, despite advancements in CAD tools, significant challenges remain in knowledge sharing, information accessibility (Cheng et al., 2023) and coordination between teams (Larsson et al., 2003). These issues often result in fragmented knowledge, where important information is scattered among various teams and systems, leading to miscommunication and operational inefficiencies (Cheng et al., 2023).
Furthermore, Cheng et al. (2023) highlight the lack of designer awareness regarding model dependencies and collaborators' changes to the model. This issue becomes particularly challenging in large-scale teams working on complex and highly interconnected virtual designs, where even small modifications can have an extensive impact on the entire system.
Knowledge graphs, a semantic representation method for organizing information, have gained increasing attention from both industry and academia, particularly in their application within CAD environments (Xiao et al., 2023). These knowledge-graph-based representations have demonstrated effectiveness in resolving issues related to product development, particularly by facilitating the extraction and recommendation of multidisciplinary knowledge (Bharadwaj & Starly, 2022). Approaches for combining knowledge graphs and CAD vary widely. For instance, Huet et al. (2021) introduce a context-aware design assistant that helps the engineer select and apply design rules during the construction of CAD models. Bharadwaj and Starly (2022) propose using knowledge graphs to analyze shape similarities of components, enabling design reuse of frequently co-occurring parts.
Despite the growing popularity of knowledge graphs in the CAD setting, current applications still fall short in incorporating specific engineering groups into the graph for enhanced collaboration. Existing approaches primarily focus on technical aspects, such as design reuse and rule enforcement. There is a significant gap in connecting these specialized engineering teams within the knowledge graph, as well as in incorporating CAD-relevant information such as component dependencies, version histories, and design requirements. Bridging this gap could greatly enhance cross-functional collaboration by providing a view of both human expertise and technical design data. This could result in better communication and coordination across teams, improve the overall design integrity, and ultimately accelerate the development process.
In this study, we introduce a methodical approach for constructing a knowledge graph to explore the collaborative dynamics of virtual product development. We specifically investigate the potential of a knowledge graph that integrates geometrical and CAD data alongside user- and task-related information. In doing so, we address key challenges such as information retrieval, identifying appropriate experts for resolving issues at model interfaces, and ensuring an efficient flow of information to relevant stakeholders.
The research question we address is: What characterizes the steps required to construct a knowledge graph for virtual product development and how can it improve collaboration and knowledge sharing among diverse engineering teams?
To answer this, we adopt a bottom-up approach, which allows us to derive insights from our case study and determine what is necessary for broader application across various industries.
This research was conducted within the BMW Group, where we investigate the concept of knowledge graphs within the virtual product development process.
The remainder of the paper is structured as follows: Chapter 2 investigates the methodology and relevant data to be incorporated into the knowledge graph. Chapter 3 addresses the defined use-cases and user roles, while Chapter 4 presents the resulting ontology and data format, which form the foundation for the graph implementation discussed in Chapter 5. Chapter 6 provides a discussion of the results, and the paper concludes with Chapter 7.
2. Methodology
A knowledge graph represents a suitable method for efficiently retrieving information and knowledge from a growing amount of data (Tiwari et al., 2021). In this chapter, we explore which types of information are most crucial for development engineers and must be efficiently accessed to enhance their workflow, thereby determining what should be incorporated into the knowledge graph. Furthermore, we examine the significance of this information and identify gaps in the current data landscape that hinder its effective utilization.
This investigation involved engaging with multiple stakeholders within the company, applying an iterative process that encompassed workshops and unstructured interviews over a timeline of several months. The predominant group engaged in the workshops and interviews consisted of engineers from the case company, as they were identified as the primary end-users of the framework under development. This iterative process was essential for gaining a comprehensive understanding of the individual workflows and the challenges in virtual product development, particularly in determining how various information sources are interrelated. Understanding these pain points was crucial for identifying inefficiencies and areas for improvement in their data management and collaboration processes. This analysis results in the delineation of the use-cases and user roles, as described in Chapter 3.
The primary dataset consists of the CAD data, which comprises CADParts, CADProducts, geometrical sets, surfaces, lines, curves, planes, points, axis systems and parameters, each defined by a unique identifier. This data is highly interconnected across neighboring models. However, as Cheng et al. (2023) noted, it is often challenging for engineers to maintain full awareness of these connections, largely because most CAD software lacks the capability to visualize these relationships or provide reminders that they exist. Furthermore, these connections are often lost when working across different CAD platforms and models. Additionally, the consistency of these links can depend heavily on individual working styles, for example whether information is simply copied or copied with an associated link. These dependency relationships are crucial to ensure that the final product and its submodules work seamlessly together. The working hypothesis is that addressing this requires a knowledge graph database capable of tracking these connections and validating that shared parameters or geometrical objects remain consistent across all models that incorporate this information.
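To illustrate the kind of consistency validation this hypothesis refers to, the following minimal Python sketch groups the values of shared objects by identifier and reports those that differ between models. It is only an illustration under stated assumptions: the model names, identifiers and values are hypothetical, and the function `find_inconsistencies` is not part of the approach described in this paper.

```python
from collections import defaultdict

# Hypothetical extract: (model, shared object identifier, value) triples.
shared_values = [
    ("ModelA", "PARAM_LENGTH", 4800.0),
    ("ModelB", "PARAM_LENGTH", 4800.0),
    ("ModelC", "PARAM_LENGTH", 4750.0),  # copied without a link, now outdated
    ("ModelA", "PARAM_WIDTH", 1900.0),
    ("ModelB", "PARAM_WIDTH", 1900.0),
]


def find_inconsistencies(records):
    """Return {identifier: {model: value}} for identifiers whose values differ."""
    by_id = defaultdict(dict)
    for model, identifier, value in records:
        by_id[identifier][model] = value
    return {
        identifier: models
        for identifier, models in by_id.items()
        if len(set(models.values())) > 1
    }


inconsistent = find_inconsistencies(shared_values)
print(inconsistent)
```

In a graph database the same check would be expressed as a query over nodes sharing an identifier; the dictionary-based version above only shows the underlying logic.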
Additionally, it is essential to consider data originating outside the CAD system that serves as requirements for CAD models, such as legal regulations or customer preferences. These requirements significantly influence the CAD development process. However, in industries like automotive and aerospace, where development cycles often span several years (Chacko, 2007), these requirements can evolve during the development process. Furthermore, globalization has introduced products to new markets, each characterized by distinct region-specific requirements, thereby adding more complexity to the CAD development process (Nehuis et al., 2013). Currently, this external data is not directly integrated into the CAD environment, meaning that changes in the requirements could go unnoticed. In our industry scenario, such updates are typically communicated through meetings or emails. However, they can be overlooked due to the large volume of information. Therefore, a link between the requirements and CAD models is needed. Conversations with engineers highlight the importance of being immediately informed when modifications to their models are necessary due to new requirements. Additionally, it is crucial for them to understand why a model update is required and to trace the source of the requirement change.
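A link between requirements and CAD models could, for example, be used to derive change notices that state which model is affected, what changed, and where the change originated. The sketch below is a hedged illustration only: the `Requirement` class, the requirement and model identifiers, and the `change_notices` helper are assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    value: float
    source: str  # origin of the requirement, e.g. a regulation (hypothetical)


# Hypothetical links from requirement IDs to the CAD models that depend on them.
requirement_links = {
    "REQ_MAX_LENGTH": ["CADPart_Body", "CADPart_Bumper"],
    "REQ_MAX_WIDTH": ["CADPart_Body"],
}


def change_notices(old, new, links):
    """Return one notice per linked model if the requirement value changed."""
    if old.req_id != new.req_id:
        raise ValueError("old and new must describe the same requirement")
    if old.value == new.value:
        return []
    return [
        {
            "model": model,
            "requirement": new.req_id,
            "old_value": old.value,
            "new_value": new.value,
            "source": new.source,  # lets the engineer trace why the update is needed
        }
        for model in links.get(new.req_id, [])
    ]


old = Requirement("REQ_MAX_LENGTH", 4800.0, "Market spec v1")
new = Requirement("REQ_MAX_LENGTH", 4750.0, "Market spec v2")
notices = change_notices(old, new, requirement_links)
for notice in notices:
    print(notice)
```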
One critical aspect often overlooked by other approaches, as mentioned in the introduction, is user-related data, which is essential for capturing the collaborative aspects of product development. In our approach, we integrate information about the specific engineers working on CAD models, including email address, name, department, and tasks/responsibilities. This data is currently absent from most CAD software, which hinders effective resolution of issues involving shared objects between neighboring models, as it is not always clear who is responsible. Furthermore, when requirements change, additional stakeholders may become involved, further complicating the issue resolution process. For clarity, in the following sections, we will refer to the engineer as the user, as the engineer is the primary user of the developed approach.
In summary, the three primary categories integrated into our knowledge graph are CAD data, including specific geometrical information, requirements, and user-related information. The precise details of what is encompassed within each category, along with additional elements incorporated into the graph, will be further elaborated in the subsequent chapters.
3. Use-cases and user roles
In this chapter, we present the main use-cases and the various user roles, along with the specific information each role is interested in. Following an investigation of process structures and workflows, we identified three different user roles. Additionally, we defined three primary use-cases for the proposed knowledge graph approach:
- Information Retrieval,
- User and Issue Management,
- Tracking Changes.
These use-cases reflect the key functions of the approach, which aims to improve collaboration, information access, and efficiency in virtual product development. Each of the three use-cases is explained in more detail in the following.
The Information Retrieval use-case focuses on improving the management of complex CAD models by providing users with access to relevant information, such as design specifications, regulatory requirements, and customer preferences, directly linked to their models. By integrating and organizing data from multiple sources on one platform, the system enhances data availability and operational efficiency. Engineers are informed of evolving requirements and can easily track how these changes impact their designs. This supports more informed decision-making and streamlined workflows.
The User and Issue Management use-case is designed to build user groups and to facilitate cross-departmental communication when inconsistencies or issues arise in CAD models. By integrating user-related information, such as responsibilities, contact details, and department affiliations, the knowledge graph enables quick identification of the relevant stakeholders. This ensures that model inconsistencies are promptly addressed by the appropriate person, improving response times and reducing bottlenecks in resolving issues across complex projects.
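The stakeholder lookup described above can be sketched as a simple filter over role edges. The example below is illustrative only: the user records, object identifiers and the relation spellings `Owned_By`, `Modified_By` and `Consumed_By` are assumptions, and `stakeholders` is a hypothetical helper rather than part of the presented framework.

```python
# Hypothetical user records with the attributes named in the text.
users = {
    "U1": {"name": "A. Example", "email": "a.example@example.com", "department": "Body"},
    "U2": {"name": "B. Sample", "email": "b.sample@example.com", "department": "Body"},
    "U3": {"name": "C. Placeholder", "email": "c.placeholder@example.com", "department": "Chassis"},
}

# Hypothetical role edges: (object identifier, relation, user identifier).
role_edges = [
    ("Line1", "Owned_By", "U1"),
    ("Line1", "Modified_By", "U2"),
    ("Line1", "Consumed_By", "U3"),
    ("Point7", "Owned_By", "U2"),
]


def stakeholders(object_id, edges, user_table):
    """Return {relation: [user record, ...]} for one object."""
    result = {}
    for obj, relation, user_id in edges:
        if obj == object_id:
            result.setdefault(relation, []).append(user_table[user_id])
    return result


line1_contacts = stakeholders("Line1", role_edges, users)
for relation, contacts in line1_contacts.items():
    print(relation, [c["email"] for c in contacts])
```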
The use-case of Tracking Changes focuses on the ability to monitor and trace changes throughout the development lifecycle of CAD models. The knowledge graph provides a comprehensive view of all model dependencies and modifications, including design updates and requirement shifts, ensuring that engineers are aware of how each change affects interconnected components. By maintaining a clear history of changes, the system enables users to identify the source of updates and assess their impact on the overall design.
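One way to assess how a change affects interconnected components is a breadth-first traversal along dependency edges. The following sketch assumes a small, hypothetical set of dependency edges pointing from an input to the objects derived from it; the object names and the `downstream` function are introduced for illustration only.

```python
from collections import deque

# Hypothetical dependency edges: input object -> objects derived from it.
defines = {
    "VehicleLength": ["HalfLength"],
    "HalfLength": ["Point1"],
    "Point1": ["Line1"],
    "Line1": ["Surface1"],
    "VehicleWidth": ["Point2"],
    "Point2": ["Line1"],
}


def downstream(start, edges):
    """Return all objects reachable from start, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


affected = downstream("VehicleLength", defines)
print(affected)
```

Combined with user information, the resulting list could be used to determine which users need to be informed about a change.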
After introducing the use-cases we now discuss the different roles a user can assume within our knowledge graph approach. These three user roles are: Owner, Modifier and Consumer.
The Owner is responsible for a specific virtual model and holds the authority to publish information and updates related to that model. The Owner's primary tasks involve constructing and updating the model, facilitating communication with relevant stakeholders, and ensuring that the model remains consistent with neighboring models and conforms to the requirements. The Owner is interested in information that supports the model's construction and helps in resolving issues in different development stages. Relevant information includes the identification of key stakeholders, evolving requirements, and changes in neighboring models.
The Modifier may belong to the same department as the Owner and is authorized to make changes to the model. The Modifier can also create alternative versions of the model for testing purposes. Their primary interest is in information that helps identify potential solutions to an issue, which may include data from other models or insights from relevant stakeholders. Owner and Modifier collaborate closely on the model, with only the Owner being allowed to publish the model for broader access. In the event of the Owner's absence, due to vacation or sickness, the Modifier temporarily assumes the responsibilities of the Owner.
The Consumer is typically an engineer who utilizes information from other CAD models, for example, to properly position their model or part, or to reference dimensions from other parts. Consumers may also include non-CAD engineers or other stakeholders, such as people from the management level, who seek specific information from a model, such as whether it complies with established requirements.
After introducing the use-cases and user roles in our approach, the next chapter delves into the knowledge graph ontology and its respective entities and relations.
4. Ontology
After establishing the use-cases and user roles, the next step involves identifying the entities required in our knowledge graph based on these use-cases and roles, followed by defining the relations between them. This process involves constructing an ontology, where knowledge instances are organized in a predefined schema. This method of creating knowledge graphs is referred to as the top-down approach (Zhao et al., 2018).
From an automotive perspective, the first entity in the graph corresponds to the project, which represents a specific vehicle architecture. This architecture is configured differently for various markets, such as the European or Chinese market. These configurations are represented by a virtual vehicle model (referred to as the CADProduct), which consists of multiple CADParts. Each CADPart is further defined by geometrical objects and geometrical sets. A geometrical set is a collection of multiple geometrical objects, while a geometrical object refers to fundamental design elements such as lines, points, axis systems, surfaces, solids, curves, and planes. Geometrical objects are defined by parameters or calculated through parameter-driven functions. Parameters can often represent specific requirements, such as the vehicle's length, ensuring that design elements comply with predefined specifications. When multiple parameters are involved in defining the properties of a geometrical object, a function is used for the calculation. For instance, a point may be defined as half the vehicle's length by applying a factor of 0.5 to the length parameter. In more complex cases, geometrical objects are derived from a combination of several parameters, reflecting intricate design dependencies within the model.
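The half-length example can be expressed as a small parameter-driven function. The sketch below is a minimal illustration; the class names `Parameter` and `Function` mirror the entity names of the ontology, but their fields, the numeric values and the unit are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Parameter:
    name: str
    value: float


@dataclass
class Function:
    name: str
    inputs: Sequence[Parameter]
    formula: Callable[..., float]

    def evaluate(self) -> float:
        """Apply the formula to the current values of the input parameters."""
        return self.formula(*(p.value for p in self.inputs))


vehicle_length = Parameter("VehicleLength", 4800.0)  # hypothetical value in mm
half_length = Function("HalfLength", [vehicle_length], lambda length: 0.5 * length)

point_x = half_length.evaluate()
print(point_x)  # 2400.0

# A changed requirement propagates to the derived position on re-evaluation.
vehicle_length.value = 5000.0
updated_point_x = half_length.evaluate()
print(updated_point_x)  # 2500.0
```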
Another crucial entity is the user, who interacts with CADParts, geometrical objects, and parameters. The relationships “Owned_By,” “Modified_By,” and “Consumed_By” define the roles of Owner, Modifier, and Consumer for each user, respectively. Additionally, we introduce an “Issue” entity, which can be reported by and assigned to a user. Figure 1 displays the complete structure illustrating how these components relate to each other.

Figure 1. Ontology structure of the knowledge graph for collaborative virtual product development
The “Issue” entity has an additional relationship, termed “Occurs_In,” which connects it to other entities such as CADPart, Parameter, and GeoObject. For clarity and improved visibility, this relation is displayed separately in Figure 2 as a subset of the ontology presented in Figure 1.

Figure 2. Ontology of the issue sub-context
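An ontology of this kind can serve as a schema against which candidate edges are checked. The sketch below encodes only the relations mentioned in the text as (source label, relation, target label) triples. Several points are assumptions: the edge direction of the role relations, the spelling `Modified_By`, and the relation names `Reported_By` and `Assigned_To`, which the text does not name explicitly. The set is also not claimed to be the complete ontology of Figures 1 and 2.

```python
ROLE_RELATIONS = ("Owned_By", "Modified_By", "Consumed_By")
USER_LINKED = ("CADPart", "GeoObject", "Parameter")
ISSUE_TARGETS = ("CADPart", "Parameter", "GeoObject")

# Allowed (source label, relation, target label) triples.
allowed = {(src, rel, "User") for src in USER_LINKED for rel in ROLE_RELATIONS}
allowed |= {("Issue", "Occurs_In", target) for target in ISSUE_TARGETS}
# Placeholder names: the text only states that an issue is reported by
# and assigned to a user, without naming the relations.
allowed |= {("Issue", "Reported_By", "User"), ("Issue", "Assigned_To", "User")}


def invalid_edges(edges):
    """Return the edges that do not match any allowed triple."""
    return [edge for edge in edges if edge not in allowed]


candidate_edges = [
    ("CADPart", "Owned_By", "User"),
    ("Issue", "Occurs_In", "Parameter"),
    ("Issue", "Occurs_In", "User"),  # not covered by the schema above
]
rejected = invalid_edges(candidate_edges)
print(rejected)
```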
Table 1 outlines the properties and descriptions of all nodes depicted in the ontology in Figure 1.
Table 1. Nodes with properties of the ontology schematic

4.1. Application of the RFLP method
The RFLP method is a product definition approach in systems engineering that represents the requirements (R), functional (F), logical (L) and physical (P) structures (Horváth & Rudas, 2014). The requirements component outlines the specific criteria and constraints that guide the product design process. The functional perspective outlines how the system operates to achieve its objectives (Haberfellner et al., 2019). Technical functions describe the relationships between input and output variables. Together, these technical functions form the functional architecture of the complete vehicle, illustrating the system's interrelations and functional dependencies (Krog et al., 2022). The logical aspect represents the logical connections within the model (Horváth & Rudas, 2014). The physical perspective describes the technical implementation of the system architecture (Krog et al., 2022).
The RFLP approach has proven effective for enhancing transparency and traceability, promoting collaboration across engineering disciplines, and improving process efficiency, ultimately ensuring that the target system meets the requirements while reducing development time (Kleiner & Kramer, 2013).
This method can be adapted to our virtual product development knowledge graph approach. We define the RFLP for our system as follows: Requirements, such as length of the vehicle, are represented by the parameters within the graph (see Figure 1). We then implement mathematical functions to capture the system's functional dependencies, corresponding to the entity “Function”. The logical structure reflects the logical connections between entities in the graph, such as how different geometrical objects interact to form a part. Finally, the physical layer represents the implementation of the system architecture at the virtual component and geometry level, corresponding to the entity CADProduct in the ontology in Figure 1.
4.2. Standard data extraction format
Based on the defined ontology, a standardized data format can be established, enabling the creation of a knowledge graph for every virtual vehicle model within the company. This approach is adaptable to other CAD assemblies as well. Specifically, requirements and geometrical data are extracted from the CAD data.
An Excel table serves as a practical tool for collecting the extracted data, which can later be converted into knowledge graph-compatible formats. One such format is the JavaScript Object Notation (JSON), supporting the construction of a labeled property graph. Additionally, Excel data can also be converted into the Resource Description Framework (RDF) format (Han et al., 2008). The structure of the Excel table is as follows: The first column contains the node name, and the subsequent columns capture the node properties, as outlined in Table 1. Additionally, information regarding the Owner, Consumer, and Modifier (all corresponding to the user entity) is collected. The table also captures attributes that define the entities CADProduct, CADPart, GeoSet, GeoObject, and Functions, specifically through a column titled DefinedBy. For instance, a parameter may define a specific function, or a point may be used to define a line. Additionally, a CADProduct can be defined by multiple CADParts. An example of the data extraction from the virtual models is presented in Table 2, which illustrates the data extraction scheme for a GeoObject, specifically a line.
Table 2. Extraction schema for data from virtual models

The context (Path) specifies the location of Line1 within the specification tree of the CAD model. Definitions for Node, Identifier, Value, and Timestamp are provided in Table 1. The type is specifically collected for GeoObjects, as they can represent different elements, as previously mentioned. DefinedBy refers to the IDs of the objects that define Line1, while the headings Owner, Consumer and Modifier refer to the UserIDs of the respective users. Each node type listed in Table 1 is recorded in a separate Excel table, and this format can be customized and expanded to meet the specific needs of individual use-cases. Following the definition of the ontology for our virtual product development knowledge graph and the format for the data extraction in this chapter, the next chapter will focus on the implementation of the knowledge graph.
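The conversion from such a table into a JSON structure could look roughly like the following sketch. It reads a CSV export as a stand-in for an Excel sheet, so no third-party library is needed. The column headings follow the description above, whereas the cell values, the path, the semicolon used to separate multiple DefinedBy identifiers, and the layout of the resulting JSON are assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical CSV export of one Excel sheet; all values are illustrative.
sheet = """Node,Identifier,Value,Timestamp,Type,Path,DefinedBy,Owner,Consumer,Modifier
Line1,GEO-001,,2024-01-15T10:00:00,Line,Product/Part/GeoSet1,GEO-010;GEO-011,U1,U3,U2
"""

PROPERTY_COLUMNS = ("Identifier", "Value", "Timestamp", "Type", "Path")


def rows_to_nodes(text, label):
    """Convert the rows of one sheet into a list of JSON-serializable nodes."""
    nodes = []
    for row in csv.DictReader(io.StringIO(text)):
        nodes.append(
            {
                "label": label,
                "name": row["Node"],
                "properties": {column: row[column] for column in PROPERTY_COLUMNS},
                # Assumed convention: several defining IDs are separated by ";".
                "defined_by": [i for i in row["DefinedBy"].split(";") if i],
                "owner": row["Owner"],
                "consumer": row["Consumer"],
                "modifier": row["Modifier"],
            }
        )
    return nodes


nodes = rows_to_nodes(sheet, "GeoObject")
as_json = json.dumps(nodes, indent=2)
print(as_json)
```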
5. Knowledge graph implementation
After defining the ontology and introducing the specific data extraction format, this chapter focuses on the graph implementation. The previously introduced data format enables efficient graph construction. By leveraging internal interfaces, the labeled property graph shown in Figure 3 was generated from the Excel tables in under five minutes. In Table 1, we outlined the different groups of nodes along with their respective properties. While these properties are stored within the nodes and can be accessed using simple graph queries, they do not contribute to building the graph's relationships. In contrast, the columns DefinedBy, Owner, Consumer, and Modifier in Table 2 are used to map the relationships in the graph. During the construction of the graph from the Excel dataset, we invert the directionality of the DefinedBy relationship to Defines. This inversion is necessary because an object within the CAD environment is aware of its inputs (parents) but remains unaware of how its outputs are used by other objects (children). Therefore, we extract the relationship from the CAD objects based on the DefinedBy relation and reverse it for the graph representation. With this reversal, edges point from an input to the objects derived from it, so the graph follows the direction of information flow and provides a clearer view of dependencies and interactions within the system. Additionally, the columns Consumer, Owner, and Modifier define relationships between users and other node types, mapping user roles through the corresponding owned-by, consumed-by, and modified-by relationships.
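The inversion of DefinedBy into Defines can be sketched as follows. This is a simplified illustration and not the internal interface used in the case study: the record layout, the identifiers and the spelling of the role relations are assumptions.

```python
# Hypothetical extracted records; each object knows only its own inputs.
records = [
    {"id": "Line1", "defined_by": ["Point1", "Point2"],
     "owner": "U1", "consumer": "U3", "modifier": "U2"},
    {"id": "Point1", "defined_by": ["HalfLength"],
     "owner": "U1", "consumer": "", "modifier": ""},
]

ROLE_COLUMNS = (
    ("owner", "Owned_By"),
    ("modifier", "Modified_By"),
    ("consumer", "Consumed_By"),
)


def build_edges(rows):
    """Return (source, relation, target) edges for the labeled property graph."""
    edges = []
    for row in rows:
        for parent in row["defined_by"]:
            # The table stores child -DefinedBy-> parent;
            # the graph stores the reverse, parent -Defines-> child.
            edges.append((parent, "Defines", row["id"]))
        for column, relation in ROLE_COLUMNS:
            if row[column]:  # skip empty role cells
                edges.append((row["id"], relation, row[column]))
    return edges


edges = build_edges(records)
for edge in edges:
    print(edge)
```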

Figure 3. Reduced example of the virtual product and collaboration-based knowledge graph
Figure 3 presents a simplified example of the virtual product and collaboration knowledge graph, where only a subset of the nodes is displayed to improve overall visibility. The graph displays the requirements (green), functions (blue), geometrical objects and sets (orange), CAD parts and products (yellow), and users (magenta). It is important to note that this example originates from the environment of our industry partner; consequently, the graph features abbreviations and nomenclature in German. Furthermore, the user names are fictional.
In this representation, the issue node is omitted, as the figure focuses on the initial creation of the graph. The issue node is introduced when misalignments occur between entities, which can result from differing values, changing functions, or mismatched object versions. Additionally, the configuration and project nodes (depicted in light blue in Figure 1) are also hidden. These nodes typically serve to categorize the knowledge graph in relation to the company's structure and processes, but they fall outside the scope of this virtual product development and collaboration point of view and are therefore not further discussed.
Despite this reduced representation, Figure 3 highlights the key nodes, including parameters, functions, users, geometrical objects, CAD products and parts. A notable observation is the numerous internal relations exhibited by the geometrical objects, with requirements and functions serving as their initial inputs. The number of internal dependencies underscores the complexity of the virtual vehicle, necessitating a data management system such as a knowledge graph to effectively map and highlight these connections. The interaction between users and other data objects within the knowledge graph is visible, allowing for the analysis of information flow to identify and establish collaboration groups.
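As a simple illustration of how collaboration groups might be derived from such a graph, the sketch below treats all edges as undirected and groups users that end up in the same connected component, using a union-find structure. This is a deliberately naive stand-in chosen for brevity, not the analysis method of this study; the edges and user identifiers are hypothetical.

```python
# Hypothetical edges in (source, relation, target) form.
edges = [
    ("Point1", "Defines", "Line1"),
    ("Line1", "Owned_By", "U1"),
    ("Point1", "Owned_By", "U2"),
    ("Line1", "Consumed_By", "U3"),
    ("Plane9", "Owned_By", "U4"),
    ("Plane9", "Defines", "Curve4"),
    ("Curve4", "Consumed_By", "U5"),
]
users = {"U1", "U2", "U3", "U4", "U5"}


def collaboration_groups(graph_edges, user_ids):
    """Group users that are connected through shared or dependent objects."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, _, target in graph_edges:
        parent[find(source)] = find(target)  # merge the two components

    groups = {}
    for user in user_ids:
        groups.setdefault(find(user), set()).add(user)
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])


groups = collaboration_groups(edges, users)
print(groups)
```

In a densely connected vehicle model, most users would likely fall into a single component, so a practical analysis would need a finer-grained criterion than plain connectivity.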
This outcome paves the way for several potential avenues of future research, which will be discussed in the following chapters.
6. Discussion
The findings of this study provide valuable insights into constructing a knowledge graph tailored to virtual product development, offering a foundation for further exploration of its potential to improve collaboration among users. The proposed knowledge graph serves as a foundational tool for analyzing and retrieving information related to users, as well as the requirements, geometrical connections, and model interdependencies involved in the virtual product development process. Following the initial interviews and workshops conducted to define the problem statement and use-cases, the knowledge graph concept was developed. Once the development was completed and the framework was established, the participants from the workshops and interviews were reintroduced to the concept, with all of them recognizing the relevance and applicability of the framework. The methodological approach outlined demonstrates the potential to effectively analyze both current and future virtual vehicle models, while supporting the collaborative aspects of the development process.
The knowledge graph presented in this study is constructed using a reduced dataset from a virtual vehicle at the case company. In the future, the knowledge graph should directly reference the most up-to-date data. Integrated with version control mechanisms, the knowledge graph will then be developed simultaneously with the virtual vehicle, as it directly references the vehicle's evolving data. As the virtual vehicle progresses through its development stages, the knowledge graph will continuously accumulate and structure information. Furthermore, by expanding the knowledge graph across multiple vehicle architectures over time, the collected data can be archived, enabling the application of artificial intelligence for historical analysis and data-driven insights. Moreover, this approach is adaptable to other virtual models and use-cases beyond the automotive industry. However, further evaluation is required to assess its effectiveness in managing diverse use-cases, particularly when applied at larger scales and to more complex projects.
Additionally, it is important to note that an interface is required for both the extraction of the data from the virtual models and the conversion of the Excel table into a format compatible with the knowledge graph. Such interfaces would facilitate the automation of the graph creation from the unstructured data.
Based on findings from the case company, we recommend implementing a user interface, potentially supported by a large language model, alongside the knowledge graph. Integrating this interface is expected to accelerate information retrieval and enhance user interconnectivity, fostering collaboration. This hypothesis will be investigated in future research.
7. Conclusion and outlook
In this paper, we presented a method for constructing a knowledge graph to support virtual product development by enhancing data management and collaboration among diverse engineering teams. The process of building this knowledge graph was outlined, including the investigation of relevant data, the definition of specific user roles and use-cases, the creation of the ontology, and the implementation of the graph with a specific data format. The knowledge graph organizes and connects different elements such as CAD models, requirements, geometrical objects and user information, offering a structured foundation for retrieving and analyzing information across the virtual product development process. While the proposed approach demonstrates potential for improving collaboration, further evaluation is needed to validate its effectiveness in practical applications. The method provides a scalable framework, adaptable to various industries and use-cases, with the potential to enhance workflows and enable more informed decision-making.
Future steps involve applying machine learning algorithms, such as community detection or content recommendation tailored to specific users, to the graph to generate deeper insights. Other possibilities include link prediction and enhancing user-graph interaction through large language models and a user interface. Additionally, future work will focus on integrating the issue node into the graph to enable machine learning applications in issue resolution. This knowledge graph approach also holds potential beyond improving collaboration, offering increased data availability, flexibility, and transparency within virtual vehicle models, though this still requires evaluation in future research.
In conclusion, this approach presents itself as a promising tool for improving collaboration and overall supporting the virtual product development process, with the potential to ultimately reduce both development times and costs.