
Creating design catalog from patent documents with large language model for design concept generation

Published online by Cambridge University Press:  27 August 2025

Yutaka Nomaguchi*
Affiliation:
Osaka University, Japan
Kei Kuroishi
Affiliation:
Osaka University, Japan
Aiza Syamimi
Affiliation:
Yamaguchi University, Japan
Daichi Tanaka
Affiliation:
Osaka University, Japan
Kazuya Okamoto
Affiliation:
Yamaguchi University, Japan; Nippon Institute of Technology, Japan
Kikuo Fujita
Affiliation:
Osaka University, Japan

Abstract:

A design catalog is a repository of design problems and their solutions, enabling designers to explore and discover applicable solutions for their specific design challenges. Creating such catalogs has depended on human knowledge and implicit judgment, with no systematic approach established. This study aims to develop a systematic method to create a design catalog from patent documents. We utilize a large language model (LLM) to extract problem-solution pairs described in the documents, presenting them as general purpose-means pairs. Subsequently, we create a design catalog by classifying the problems using similarity-based clustering, enhanced by the LLM’s semantic text similarity capabilities. We demonstrate a case study of creating a design catalog for martial arts devices and generating new design concepts based on the catalog to verify the effectiveness of the proposed method.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s) 2025

1. Introduction

Design problems are often solved by adapting similar existing problems and their solutions from different fields (Gassmann, 2006). To support this, many prior studies have extracted and generalized problems and their solutions from existing design cases so that the knowledge can be reused (Roth et al., 1971; Roth, 2002; Chen et al., 2024; Mann, 2002). For example, in TRIZ (Altshuller, 1956; Mann, 2002), Altshuller analyzed and abstracted many patents and organized the generic solution patterns as inventive principles. Roth’s Design Catalog (Roth et al., 1971; Roth, 2002) is a pioneer of such efforts in mechanical design. It focuses on the behavior and functions of mechanical elements, abstracts them, and classifies their functional solutions according to the design problems they solve, making it possible for designers to consider potential solutions. However, in these approaches humans must extract the knowledge from design cases and define the classification criteria, which requires considerable effort.

This study aims to propose a systematic method to create a design catalog with quantitative classification criteria and search indexes. We use patent documents from various fields as information sources and extract the purpose-means pairs claimed in them as more general problem-solution pairs, which this study calls design knowledge. We use a large language model (LLM) for the extraction and cluster the extracted pairs by semantic similarity computed from the LLM’s distributed representations. Patent documents contain a wealth of design information, such as the functions and structures of past design cases. In addition, patents are classified by technical field using the International Patent Classification (IPC), which makes them suitable for searching design cases and analyzing them by computer (Okamoto, 2023). The method proposed in this study makes it possible to systematically create a design catalog based on the latest patent information for any target. This study takes the design concept generation of sports equipment as an example and verifies the effectiveness of the proposed method by creating a design catalog and retrieving knowledge for design concept generation.

2. Theoretical backgrounds and our approach

2.1. Approaches of design catalog

The design catalog by Roth et al. (1971) was created to support mechanical design by structurally classifying the problems (functions) of existing mechanical designs and their solutions. It takes the form of a table, as shown in Figure 1, with each row representing a pair of a problem and a solution. Multiple solutions are summarized for one problem, making it possible to compare and consider alternative solutions to the target problem. Solutions are expressed using sketches, examples, and conditional expressions.

Figure 1. A part of Roth’s design catalog for the function “replicate mechanical force without supplementary energy” (Roth, 2002)

The contents of Roth’s design catalog are mainly limited to mechanical engineering and are not versatile (Inkermann, 2021). In addition, since it is created manually, much effort is required to maintain the contents and update them to reflect the latest trends. Even when using it, it can be difficult for designers to search for the solutions they seek because the classification criteria are unclear (Inkermann, 2021).

Although there are some limitations, Roth’s design catalog provides a general form of design knowledge representation: a pair of problems (the classifying criteria or function) and solutions (design principles). This form can be seen in many approaches. The 40 principles in TRIZ enhance versatility by abstracting problem-solving in various fields and consolidating them into a pair of contradiction (design problem) and inventive principles (Mann, 2002). Rosen and Choi (2024) overview the approaches of mechanical metamaterial design and propose a configuration design method with a design library: a pair of physical principles and subtypes of metamaterial structure. The knowledge graphs of Chen et al. (2024) are support tools that extract keywords from patents and documents about the ecology of living organisms and enable the search for similar knowledge based on their relevance. They provide benefits and applications, but the extracted keywords do not necessarily correspond to problem-solution pairs. These approaches support designers in exploring potential solutions suitable to design problems. However, creating the catalog requires human effort.

2.2. Large language model and distributional representation

With the development of deep learning, various natural language processing tasks have become possible through learning from large corpora (Radford et al., 2019). One of the basic principles is distributed representation (Mikolov et al., 2013), which represents text as vectors with hundreds to thousands of dimensions. It is based on the distributional hypothesis (Harris, 1954), which states that words with similar meanings appear in similar contexts and that the meaning of a word is determined by the distribution of the words around it. Based on this principle, embedding is a method of learning, from a large corpus, the contexts in which text appears and representing the text as vectors in a multidimensional space (Cer et al., 2018). The semantic similarity between texts can then be quantified by the cosine similarity between their vectors, defined by Equation 1.

(1) $$\cos (u,v) = \frac{u \cdot v}{\|u\| \, \|v\|}$$

Here, u and v are vectors in distributed representation.
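As a minimal sketch of Equation 1, the similarity can be computed with NumPy as follows. The three-dimensional vectors are toy values standing in for real distributed representations, and the helper `cosine_distance` (one minus the similarity) is a common convention assumed here for illustration, not a definition taken from this paper.

```python
import numpy as np

def cosine_similarity(u, v):
    """Equation 1: cos(u, v) = (u . v) / (||u|| ||v||)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_distance(u, v):
    """Assumed convention: dissimilarity as 1 - cos(u, v)."""
    return 1.0 - cosine_similarity(u, v)

# Toy vectors (hypothetical values, not real embeddings).
a = [1.0, 0.0, 0.0]
b = [1.0, 1.0, 0.0]
c = [0.0, 0.0, 2.0]

sim_ab = cosine_similarity(a, b)  # vectors 45 degrees apart
sim_ac = cosine_similarity(a, c)  # orthogonal vectors
```

Because the measure depends only on the angle between the vectors, scaling a vector (as with `c`) does not change the result.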

2.3. Our approach

This study uses patent documents as information sources and extracts the purpose-means pairs claimed therein as design knowledge. An LLM is used to automate the extraction of design knowledge and its classification based on quantitative indicators using distributed representation. This study also proposes a retrieval method using the design catalog for design concept generation, which retrieves design knowledge based on its similarity to the design problem.

Prior works have attempted to extract knowledge graphs from patents with natural language processing to generate design concepts (e.g., Huang et al., 2023). The novelty of this study is to focus on a knowledge representation form of a design catalog and to create it automatically with an LLM.

3. Method of creating and using design catalog

Figure 2 provides an overview of the proposed method. We define design knowledge as a pair of a purpose and a means, a general form of problem-solution claimed in patents. An abstract purpose is a set of similar purposes. An abstract means is a set of similar means.

Figure 2. Overview of creating design catalog and design knowledge retrieval

3.1. Method of creating design catalog

Creating a design catalog consists of three steps: 1) collecting patent documents that the design catalog will cover, 2) extracting design knowledge from the patent documents, and 3) clustering pieces of design knowledge into abstract purpose and abstract means. Hereinafter, we use the following symbols for simplicity: P for purpose, M for means, and F for design problem.

3.1.1. Collecting patent documents

Collecting all patents in the database would be desirable, but resource constraints limit the number of patents that can be collected. We therefore set search keywords in fields related to the design problem. Since a design catalog must present knowledge other than that the designer has already acquired (Inkermann, 2021), it is also vital to search for patents in fields that seem unrelated at first glance in the hope that the knowledge of different fields would be applicable.

3.1.2. Extracting design knowledge

Pieces of design knowledge described in a patent are extracted from the title and summary of the patent document with OpenAI’s GPT-4 (OpenAI, 2023). GPT-4 can perform various tasks through few-shot learning, given appropriate examples and instructions in the input prompt, instead of reinforcement learning. Our prompt consists of an instruction statement that gives an overview of the task of extracting P-M pairs, an example sentence that indicates the format of the output, and, below that, the title and summary of the patent. Because LLMs can hallucinate, incorrect design knowledge that does not exist in the original text may appear. To mitigate this, we prompt the LLM to include in the output the original text (“Source”) from which the design knowledge was extracted, allowing a human to judge the validity of the results.
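The prompt structure described above (instruction, output-format example, then title and summary) might be assembled as in the following sketch. The instruction wording, the JSON output format, and the function names `build_prompt` and `check_sources` are all assumptions made for illustration; the paper does not give the actual prompt. The verbatim check on “Source” is likewise only a possible pre-filter to help the human reviewer, not a replacement for human judgment. No LLM is called here; the response is a hand-written stand-in.

```python
import json

# Hypothetical instruction and format example (not the authors' prompt).
INSTRUCTION = (
    "Extract every purpose-means pair claimed in the patent below. "
    "For each pair, quote the sentence it was taken from as 'Source'."
)
EXAMPLE = '[{"Purpose": "...", "Means": "...", "Source": "..."}]'

def build_prompt(title, summary):
    """Instruction, then output-format example, then the patent text."""
    return (
        f"{INSTRUCTION}\n\nOutput format (JSON):\n{EXAMPLE}\n\n"
        f"Title: {title}\nSummary: {summary}"
    )

def check_sources(llm_output, title, summary):
    """Split pairs by whether their Source appears verbatim in the patent text."""
    text = " ".join(f"{title} {summary}".split()).lower()
    grounded, suspect = [], []
    for item in json.loads(llm_output):
        source = " ".join(item.get("Source", "").split()).lower()
        (grounded if source and source in text else suspect).append(item)
    return grounded, suspect

# Hypothetical patent text and a hand-written stand-in for an LLM response.
title = "Electronic foot guard"
summary = ("A proximity sensor mounted on the foot guard detects a kick. "
           "A display shows the score.")
fake_output = json.dumps([
    {"Purpose": "detect a kick",
     "Means": "use a proximity sensor mounted on the foot guard",
     "Source": "A proximity sensor mounted on the foot guard detects a kick."},
    {"Purpose": "store match data",
     "Means": "use a cloud server",
     "Source": "A cloud server stores the match data."},
])
grounded, suspect = check_sources(fake_output, title, summary)
```

In this toy run the second pair quotes a sentence that is absent from the patent text, so it lands in `suspect` and would be flagged for closer human inspection.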

3.1.3. Clustering design knowledge

The extracted P and M in pieces of design knowledge are clustered through the following steps.

  • Step 1: Using GPT-4, we obtain the distributed representations p and m of the texts representing the P and M of each piece of design knowledge. Each distributed representation is a 1532-dimensional vector.

  • Step 2: We obtain abstract purposes by clustering the set of p. A set of pieces of design knowledge with an abstract purpose is called a p-cluster. Since the distributed representation represents the meaning of each text, the text of the distributed representation closest to the center of gravity of the cluster is considered to represent the meaning of all the texts contained in the cluster. The text of P closest to the center of gravity of the p-cluster is taken as the representative P for the abstract purpose.

  • Step 3: We cluster the pieces of design knowledge in each p-cluster by m. Each resulting cluster represents an abstract means in the p-cluster and is called an m-cluster. The M closest to the center of gravity of the m-cluster is taken as the representative M of the abstract means.

This study uses Ward’s method, an agglomerative hierarchical clustering method, for Steps 2 and 3. The distance used for classification is the cosine distance between the distributed representations, based on Equation 1. The design catalog creator sets the cluster threshold t to obtain a suitable granularity of the clusters.
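The clustering steps might be sketched with SciPy as below. This is an illustration under stated assumptions rather than the authors' implementation: SciPy's Ward linkage operates on Euclidean distances, so the sketch first scales every vector to unit length, where the squared Euclidean distance equals 2(1 − cos(u, v)) and therefore orders pairs the same way as the cosine distance. The function name, the toy two-dimensional vectors, the texts, and the threshold value are hypothetical, and thresholds in this sketch are not necessarily comparable with the values of t reported in the case study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_with_representatives(vectors, texts, t):
    """Ward clustering cut at threshold t, plus one representative text per cluster."""
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumption: unit-length vectors
    Z = linkage(X, method="ward")                     # Ward's method (Euclidean)
    labels = fcluster(Z, t=t, criterion="distance")   # cut the dendrogram at height t
    reps = {}
    for label in np.unique(labels):
        idx = np.where(labels == label)[0]
        centroid = X[idx].mean(axis=0)                # center of gravity of the cluster
        sims = X[idx] @ centroid / np.linalg.norm(centroid)
        reps[int(label)] = texts[idx[np.argmax(sims)]]  # member closest to the centroid
    return labels, reps

# Toy 2-D vectors standing in for the embeddings of six purposes (hypothetical).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
           [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
texts = ["sense a kick", "sense the motion of the gesture", "track body posture",
         "compress waste", "reduce waste volume", "sort recyclable waste"]
labels, reps = cluster_with_representatives(vectors, texts, t=0.5)
```

Applying the same function first to the vectors of P and then, within each resulting p-cluster, to the vectors of M would reproduce the two-level structure of Steps 2 and 3.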

3.2. Design knowledge retrieval

The design catalog provides a designer with multiple alternative Ms corresponding to the F by retrieving design knowledge that has an abstract purpose corresponding to the F.

  • Step 1: The similarity between F and the representative P of each p-cluster is evaluated using Equation 1. A designer can choose an abstract purpose from the Ps corresponding to F. The chosen abstract purpose is denoted by P_A.

  • Step 2: The list of the representative Ms of the m-clusters in P_A gives a designer an overview of the multiple alternative Ms corresponding to F. By evaluating the similarity between F and each representative M of the m-clusters using Equation 1, a designer can choose an abstract means from the list. The chosen abstract means is denoted by M_AB. If a designer wants to explore more specific means, they proceed to Step 3.

  • Step 3: The list of Ms in M_AB shows the specific means. A designer can choose the M corresponding to F by evaluating the similarity between F and each M using Equation 1.
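Each of the three retrieval steps amounts to ranking a list of candidate texts by their Equation 1 similarity to the design problem F. A minimal sketch of that ranking follows; the function name `rank_by_similarity`, the toy vectors, and the candidate texts other than the one quoted from this paper are hypothetical.

```python
import numpy as np

def rank_by_similarity(query_vec, candidates):
    """Rank (text, vector) candidates by cosine similarity to the query, highest first."""
    q = np.asarray(query_vec, dtype=float)
    scored = []
    for text, vec in candidates:
        v = np.asarray(vec, dtype=float)
        sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))  # Equation 1
        scored.append((text, sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embedding of the design problem F and of three representative Ps (hypothetical).
f_vec = [1.0, 0.2, 0.0]
representative_ps = [
    ("reduce waste volume", [0.0, 0.1, 1.0]),
    ("sense the motion of the gesture", [0.9, 0.3, 0.1]),
    ("display the score", [0.3, 1.0, 0.0]),
]
ranked = rank_by_similarity(f_vec, representative_ps)
chosen_purpose = ranked[0][0]  # Step 1: the designer may pick the top-ranked P
```

The same call would be repeated in Step 2 with the representative Ms of the chosen p-cluster and in Step 3 with the individual Ms of the chosen m-cluster. The final choice remains with the designer, who is not obliged to take the top-ranked item.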

4. Case study

4.1. Overview of case study

To analyze the effectiveness of the proposed method, we take the design concept generation for “capturing the movements of athletes in order to convey the dynamic movements of Taekwondo, a martial art, to audiences” as an example and conduct a case study of creating a design catalog and retrieving design knowledge. Among martial arts sports, Taekwondo is one of the most notable in terms of patents: it has the most patents, and their number shows an upward trend with comparatively high growth (Syamimi et al., 2022). To verify the effectiveness of the proposed method, we compare the patent categories covered by the pieces of design knowledge retrieved with the created design catalog against the patent categories obtained by keyword search.

4.2. Creating a design catalog

4.2.1. Collecting patent documents

Patent documents were collected from English-language patent databases. In this case study, 3,421 patents were collected in four categories: “Audience Behavior (ab)” and “Taekwondo (tk)” as keywords relatively close to the design problem, “Digital Transformation (dx)” because it is thought to be related to digital technology, and “Waste Management (wm)”, which relates to sustainability and was included in the hope of bringing in new ideas. The symbols in parentheses are abbreviations and represent the “Patent Category” in the following explanation. Table 1 shows the keywords and search conditions used to search for patents in each category. Patents with application dates from 2000 to 2019 were collected. The “Patents” row of Table 2 shows the breakdown of the collected patents.

Table 1. Using keywords search in patent database: Application year 2000-2019

Table 2. The number of patents and the number of design knowledge

4.2.2. Extraction of design knowledge

Examples of extracted design knowledge are shown in Table 3. From the left, “ID of design knowledge” represents three symbols connected by a hyphen, i.e., 1) the category of the patent document from which it was extracted, 2) the serial number of the patent document in the category, and 3) the serial number of the design knowledge extracted from the patent document. “Purpose” and “Means” represent the P and M of design knowledge, respectively. “Source” represents the original text in which “Purpose” and “Means” are described. The breakdown of the extracted design knowledge for each category is shown in the “Design Knowledge” row of Table 2.
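The three-part identifier described above can be split mechanically, which is convenient when tracing a piece of design knowledge back to its patent. The following is a small sketch; the class and field names are hypothetical, and the example ID is one that appears in Section 4.3.

```python
from typing import NamedTuple

class KnowledgeId(NamedTuple):
    category: str   # patent category abbreviation: ab, tk, dx, or wm
    patent_no: int  # serial number of the patent document within the category
    item_no: int    # serial number of the design knowledge within the patent

def parse_knowledge_id(text):
    """Split an ID such as 'tk-517-6' into its three hyphen-separated parts."""
    category, patent_no, item_no = text.split("-")
    return KnowledgeId(category, int(patent_no), int(item_no))

example_id = parse_knowledge_id("tk-517-6")
```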

Table 3. An example of the extracted design knowledge

4.2.3. Clustering design knowledge

In Step 1, we used the GPT-4 model to obtain distributed representations of the text of each P and M of the 17,564 pieces of design knowledge as 1532-dimensional vectors. In Step 2, we clustered the distributed representations of P with a clustering threshold of t = 1.2 and created 65 p-clusters. Table 4 shows some of the p-clusters. In Table 4, “ID of representative design knowledge” and “Purpose (P)” are information about the representative Ps. “p-Cluster ID” is the ID of the p-Cluster, “Size” is the number of pieces of design knowledge in each p-cluster, and “Sim. of P & F” is the cosine similarity between P and F defined by Equation 1. In Step 3, the design knowledge included in each p-cluster was clustered with the distributed representation of M. The clustering threshold was uniformly set to t = 0.3. Table 5 shows the m-clusters of p-cluster 41, shown in the first row of Table 4, as an example. In Table 5, “ID of representative design knowledge,” “Purpose,” and “Size” have the same meaning as in Table 4. “Means” is the representative M. The first number in “m-Cluster ID” indicates the ID of the p-Cluster, and the second number indicates the serial number of the m-Cluster. “Sim of M & F” is the cosine similarity between each M and F defined by Equation 1.

As a result, 17,564 pieces of design knowledge were extracted from 3,421 patents, and a design catalog with 65 p-clusters and 1,719 m-clusters was created.

Table 4. Examples of the representative Ps of p-Clusters and the similarity between each P and design problem F (Top 3) : t=1.2

Table 5. m-Clusters in p-Cluster 41 : t=0.3

4.3. Design knowledge retrieval using design catalog

Using the design catalog, we search for knowledge related to the design problem F of “capturing the athlete’s motion.”

First, we calculated the similarity between F and the representative Ps of 65 p-clusters. The results are shown in the rightmost column of Table 4. We selected p-cluster 41, which has the highest similarity and whose representative P is “sense the motion of the gesture.”

We overviewed the list of Ms of m-clusters in p-cluster 41, as shown in Table 5. In order of similarity between the F and the representative M of each m-Cluster, the list shows “use of a motion capture device” (41-6), “use a motion posture measuring unit that compares the skeleton data with preset reference motion data” (41-10), “use a proximity sensor mounted on the foot guard” (41-5), and so on. Among them, we chose “use of a motion capture device” (41-6), which has the highest similarity.

We reviewed the list of Ms in m-Cluster 41-6 to obtain detailed knowledge. Table 6 shows the list. In Table 6, “ID of design knowledge”, “Purpose”, “Means”, “Size”, “Sim. of M & F” have the same meaning as in the tables of Section 4.2. As shown in Table 6, the items in descending order of similarity between F and M were “use of a motion capture device” (tk-517-6), “use a motion recorder” (tk-74-7), and “Photograph the learner posing for the displayed Taekwondo posture and perform data transformation” (tk-102-4), and so on. Those are possible alternative solutions for the design problem: “capturing the athlete’s motion.”

Table 6. The similarity of means (M) of m-Cluster 41-6’s design knowledge and design problem (F)

5. Discussion

5.1. Considerations on the creation of design catalog and design knowledge retrieval

Table 4 shows that p-Cluster 41 was found to be most similar to design problem F “capturing the athlete’s motion.” Table 5 shows that the Ps included in p-Cluster 41 are similar to the representative P of p-Cluster 41: “sense the motion of the gesture.” In addition, p-Cluster 41 contains various Ms, all of which are considered to provide suggestions for solving F. From the above, a practical design catalog was created with the proposed method. It can also be confirmed that some puzzling results are included in the created catalog. For example, the P of m-Cluster 41-15, “display different light effects to indicate playing state,” looks far from the representative P of p-Cluster 41. Because the size of the m-cluster is only 3, which is about 1.6% of the pieces of design knowledge of p-Cluster 41, there is no significant impact overall. Even if it is puzzling, it could be used as a novel and effective means, depending on the case.

According to the results of Table 6, the representative M of m-Cluster 41-6 “use of a motion capture device” does not express any specific device for motion capture. This would be because there are pieces of design knowledge in the cluster with different Ms, such as tk-471-5: “use a body action sensor,” and tk-74-2: “use an image receiver,” which are specific methods of acquiring physical information and image processing. The advantage of the design catalog is that it allows a wide variety of Ms to be listed at a glance. If a designer wants an M that is directly useful for solving the F, this can be improved by adjusting the threshold t of the m-Cluster for each p-Cluster. In this way, the method proposed in this study can flexibly adjust the criteria for abstracting pieces of design knowledge and reconstruct the design catalog according to the designer’s intentions.

5.2. Comparison of patent search methods using the design catalog and methods in patent databases

We compare the patents from which the pieces of design knowledge were extracted in Section 4.3 with the results of a patent search with keywords on patent databases. Figure 3 shows the breakdown of IPC classifiers contained in the patents from which the p-Cluster 41 was extracted.

Figure 3. IPC (sub-class) of patents from which design knowledge in p-Cluster 41 are extracted

Next, we searched for patents related to the F in Section 4.1 among the 3,421 patents in Section 4.2.1 using a keyword search method commonly available in the free patent databases of WIPO (World Intellectual Property Organization) and the patent offices of Japan, the United States, and Europe. The keyword search used the regular-expression condition (‘\bgesture’ OR ‘\bpositioning information’ OR ‘\bmotion’). The keywords were selected by finding a number of patents related to the F and referring to their text. Note that ‘\b’ matches a word boundary, such as the position after a space, comma, or hyphen, and prevents ‘emotion’ and other words from matching when searching for the word ‘motion.’
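The effect of the ‘\b’ condition can be checked with Python's `re` module, as in the sketch below. Combining the three keywords into one alternation and matching case-insensitively are assumptions made for illustration; the search syntax of the actual patent databases may differ.

```python
import re

# The three keywords joined by OR, each anchored at a word boundary on the left.
PATTERN = re.compile(r"\b(?:gesture|positioning information|motion)", re.IGNORECASE)

def matches_keywords(text):
    """True if any keyword starts at a word boundary somewhere in the text."""
    return PATTERN.search(text) is not None

hit_motion = matches_keywords("A motion sensor is attached to the glove.")
hit_emotion = matches_keywords("The system estimates audience emotion.")
hit_plural = matches_keywords("The camera recognizes hand gestures.")
hit_hyphen = matches_keywords("A slow-motion replay is displayed.")
```

Since the pattern has no boundary on the right, inflected forms such as “gestures” still match, while “emotion” does not because “motion” is not preceded by a boundary there.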

We compared the results of the keyword search with those of the design catalog retrieval. Figure 4 shows the breakdown of IPC classifiers for the searched patents. Table 7 shows the breakdown of the number of patents for each category.

Figure 4. IPC (sub-class) of patents retrieved by keyword search

Table 7. Breakdown of patents retrieved by keyword search and those in p-Cluster 41

Figures 3 and 4 show that the IPC classifiers with more than 5% of the total number of patents are generally the same, and there is no significant difference in the technical scope of the searched patents. Table 7 shows that many patents are in the ab and tk categories in both. However, the searched patents are different. Among the patents in p-Cluster 41, the proportion of patents only included in p-Cluster 41 is high, especially in categories other than ab. For example, all of the design knowledge included in the m-Cluster 41-14 in Table 5 was extracted from patents only in the design catalog. It shows novel design knowledge: a device that generates sound through physical contact rather than electronic sensors or image processing is used to capture players’ movements.
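The comparison of the two retrieval results reduces to set operations on patent identifiers, as sketched below. The function name, the dictionary keys, and the patent identifiers are hypothetical and do not reproduce the counts in Table 7.

```python
def compare_retrievals(catalog_ids, keyword_ids):
    """Compare patents found via the design catalog with those found by keyword search."""
    catalog_ids, keyword_ids = set(catalog_ids), set(keyword_ids)
    only_catalog = catalog_ids - keyword_ids
    return {
        "both": catalog_ids & keyword_ids,
        "only_catalog": only_catalog,
        "only_keyword": keyword_ids - catalog_ids,
        # Share of catalog patents that keyword search did not find.
        "share_only_catalog": len(only_catalog) / len(catalog_ids) if catalog_ids else 0.0,
    }

# Hypothetical patent identifiers (category and serial number).
from_catalog = ["tk-517", "tk-74", "dx-30", "ab-12"]
from_keywords = ["tk-517", "ab-12", "ab-40"]
result = compare_retrievals(from_catalog, from_keywords)
```

Running the same comparison per patent category would give a breakdown of the kind reported in Table 7.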

IPC classifiers that exist only in p-Cluster 41 include A43B, A61C, E01H, G01R, G08B, G10L, H01L, and H04B. We take G01R, a technology category for measuring electrical and magnetic variables, as an example. Table 8 shows the design knowledge dx-30-1 extracted from patents in G01R. This describes a method for measuring the displacement of an object with changes in a magnetic field. Among the 141 patents found by keyword search (Table 7), three patents contained the word “magnet” in the text, but none were used to measure the displacement of an object. In other words, design knowledge of magnetic measurement methods was obtained only from the design catalogs created with the proposed method.

It should be noted that keyword search results depend highly on the keyword set. However, the above discussion indicates the possibility that the proposed method using an LLM can discover relationships not easily obtained by keyword search and obtain knowledge that exists in patents in different fields.

Table 8. Design knowledge dx-30-1 extracted from the patent of IPC G01R

6. Conclusions

This study proposed a systematic method for creating a design catalog with quantitative classification criteria and indexes based on semantic similarity. The method uses an LLM to extract the pairs of purpose and means described in patent documents from fields corresponding to the design problem. We conducted a case study using this method and verified that a design catalog can be created. We also compared the patent retrieval results of keyword searches with those of the proposed design catalog. The comparison confirmed that the proposed method has the potential to retrieve design knowledge from various fields that raw keyword searches cannot obtain. Our future work includes the extensive evaluation of the quality and usefulness of the created design catalog through case studies and reviews from experts.

References

Altshuller, G. S., Shapiro, R. B. (1956). On the psychology of inventive creation, The Psychological Issues, 6, 37–39. (in Russian)
Cer, D., Yang, Y., Kong, S., Hua, N., Limtiaco, N., John, R. S., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, C., Sung, Y., Strope, B. and Kurzweil, R. (2018). Universal Sentence Encoder, arXiv:1803.11175v1.
Chen, L., Cai, Z., Jiang, Z., Sun, L., Childs, P., Zuo, H. (2024). A knowledge graph-based bio-inspired design approach for knowledge retrieval and reasoning, Journal of Engineering Design, 131. doi:10.1080/09544828.2024.2311065
Gassmann, O. (2006). Opening up the innovation process towards an agenda, R&D Management, 36(3), 223–228.
Harris, Z. (1954). Distributional structure, Word, 10(2-3), 146–162.
Huang, Z., Guo, X., Liu, Y., Zhao, W., Zhang, K. (2023). A smart conflict resolution model using multi-layer knowledge graph for conceptual design, Advanced Engineering Informatics, 55, 101887.
Inkermann, D. (2021). What Happened to Roth’s Design Catalogues? - A Review of Usage and Future Research, Proceedings of the ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, DETC2021-71746.
Mann, D. (2002). Hands-On Systematic Innovation, CREAX Press.
Mikolov, T., Chen, K., Corrado, G. and Dean, J. (2013). Efficient estimation of word representations in vector space, arXiv:1301.3781v3.
Okamoto, K. (2023). Introduction to R&D Management - Management of Technology -, Asakura Publishing.
OpenAI. (2023). GPT-4 Technical Report, arXiv:2303.08774.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. and Sutskever, I. (2019). Language models are unsupervised multitask learners. https://api.semanticscholar.org/CorpusID:160025533.
Roth, K., Franke, H.-J., Simonek, R. (1971). Algorithmisches Auswahlverfahren zur Konstruktion mit Katalogen, Feinwerktechnik, 75(8), 337–364.
Roth, K. (2002). Design catalogues and their usage. In: Chakrabarti, A. (ed.) Engineering Design Synthesis. Springer, London, 121–129.
Rosen, D. W., Choi, C. Y. (2024). Toward a configuration design method for mechanical metamaterials, Proceedings of Asia Design and Innovation Conference 2024, 148.
Syamimi, A., Tachibana, T., Nomaguchi, Y., Fujita, K., Okamoto, K. (2022). Generating business ideas in taekwondo using the novelty potential concept, Asian Sport Management Review, 16, 28–44.