1. Introduction
Applying network analysis and natural language processing to a large set of corpora has been widely explored for its potential to identify patterns and forecast trends. One common method is through embedding techniques, which transform texts into numerical representations and group similar ones together. Several studies have applied such techniques to find research trends (Reference Huang, Chen, Zhang, Wang, Cao and LiuHuang et al., 2022; Reference Lo and WangLo & Wang, 2023), classify texts (Reference Mehta, Bawa and SinghMehta et al., 2021; Reference Wang, Xu, Xu, Tian, Liu and HaoWang et al., 2016), and identify topics (Reference Mu, Lim, Liu, Karunasekera, Falzon and HarwoodMu et al., 2022). In our related work, we also used word embedding techniques and network analysis to visualize research trends within the design methodology community. Specifically, keywords extracted from the corpus are transformed into word vectors, which are then used to compute semantic similarities between keywords and construct networks based on these similarities. The networks are then divided into several communities where keywords with similar semantic meanings are grouped, and for each community, the most representative keyword is selected as a community label, which is considered a topic. Topic trends can subsequently be formed by calculating the similarities between topics in consecutive years (Xiao & McAdams, 2025).
While our related study provides a generalized approach to exploring trends within the engineering design methodology field, several challenges remain. The existing method does not eliminate all the noise data, causing general words to be chosen as a topic. Additionally, the current community detection method lacks robustness since the technique is highly sensitive to human-defined threshold values (Xiao & McAdams, 2025).
To address these gaps, this paper extends the previous study’s framework by introducing new techniques for community and topic detection. Furthermore, the communities generated by computers are compared to those created by humans. This advancement avoids problems caused by irrelevant words extracted. Furthermore, comparing human and computer-generated communities helps evaluate the validity of the proposed method and provides deeper insight into the strengths and limitations of automated techniques.
2. Literature review
Citation and co-citation analysis are two types of network analysis frequently used to uncover relationships and explore patterns within scholarly literature. Citation analysis determines the importance of a paper through the number of citations it receives; therefore, it is usually applied to find the most influential paper in a field. Co-citation analysis is used to study how often two publications co-occur in the same bibliography, which is useful for identifying the underlying themes within the literature. In addition to citation and co-citation analysis, co-word analysis is also commonly used to determine patterns in large datasets. Similar to co-citation analysis, co-word analysis is based on the co-occurrence frequency of keywords in a collection of texts. Co-word analysis assumes two keywords appearing together share a similar thematic relationship and groups these keywords. The ability of co-word analysis to forecast future trends is derived from the content, as authors typically discuss their research contribution and future research within the article. By extracting keywords from the content, co-word analysis provides a preview of research trends. In contrast, citation and co-citation analysis are unable to achieve this, as these citation-based methods do not consider the paper’s content. Since citation and co-citation analysis lack the ability to forecast future trends, researchers often combine co-word and co-citation analysis to enrich the findings and predict future research directions (Reference Donthu, Kumar, Mukherjee, Pandey and LimDonthu et al., 2021). However, co-word analysis has drawbacks, such as words having multiple meanings and some terms being too general to categorize into a specific cluster. Therefore, novel approaches that involve using word embeddings to capture semantic meaning and calculate the semantic similarity of words, rather than a co-word network that relies on the co-occurrence frequency of words, have been proposed (Reference Huang, Chen, Zhang, Wang, Cao and LiuHuang et al., 2022). A popular word embedding model, Word2Vec, is a multi-layer neural network that transforms words into vector representations. These word vectors can further be used to determine key themes within literature.
Similar to how network analysis integrates word embedding models for topic extraction, topic modeling can also leverage word embeddings to uncover latent structures in a corpus. Topic modeling is a technique based on the co-occurrence of keywords, however, instead of measuring the actual co-occurrence frequency of the words, the algorithm measures the probability of the co-occurrence (Reference Chen, Wang, Pan and XiongChen et al., 2021), and groups extracted words according to a probability distribution (Reference Leydesdorff and NerghesLeydesdorff & Nerghes, 2017). Latent Dirichlet Allocation (LDA) is one of the most well-known topic modeling techniques. The LDA model assumes each document contains a mix of underlying topics, which are determined by the “probability distributions over words” within the document (Reference Vayansky and KumarVayansky & Kumar, 2020, p.3). Since traditional topic modeling algorithms group frequently co-occurring words together, the meaning of the words is not considered. In addition, LDA has some drawbacks, such as inconsistent results and ineffective assumptions. Therefore, many studies have proposed other topic models that derive from LDA or combine other statistical models with LDA to achieve better results (Reference Vayansky and KumarVayansky & Kumar, 2020). Incorporating the word embedding technique with topic models is one of the methods. For example, Reference Zhang, Cui, Liu, Jiang and LiZhang et al. (2023) proposed an LDA2vec model where the LDA model was used to identify the topics first, and the topics were then transformed into vectors through the Word2Vec model (Reference Zhang, Cui, Liu, Jiang and LiZhang et al., 2023). With the multiplication of the topic vector and LDA topic probability value, a new topic vector is generated and used to compute the “semantic similarity among topics at adjacency stages, and evolution paths that directly reflect the topic relationships are constructed” (Reference Zhang, Cui, Liu, Jiang and LiZhang et al., 2023, p.1).
Topic modeling outputs a list of topics, where each topic consists of a mix of words. This nature of topic modeling reduces the interpretability of the results, as the output may not always be meaningful. Therefore, our study continues to use embedding techniques with network analysis as in our previous work. However, instead of transforming words into vectors, our new study transforms each abstract into a vector by using the Doc2Vec model. The Doc2Vec model, as an extension of the Word2Vec model, was developed by Le and Mikolov in 2014 to represent sentences and documents as vectors. In line with the Word2Vec model, the Doc2Vec algorithm also has two model architectures - the Distributed Memory Model of Paragraph Vectors (PV-DM) and the Distributed Bag of Words version of Paragraph Vector (PV-DBOW). The PV-DM model extends Word2Vec’s CBOW model by incorporating a paragraph vector into the training process, which is then concatenated with word vectors to make predictions. The PV-DBOW model is analogous to Word2Vec’s Skip-gram model but instead of predicting surrounding words given a word, the PV-DBOW model predicts words with a given paragraph vector (Le & Mikolov, 2014). To aid readers who are unfamiliar with the previously described techniques, Table 1 provides a summary of them. Word2Vec was employed in our previous study but demonstrated limited effectiveness. Therefore, Doc2Vec was chosen in this study to explore potential improvements.
Tabel 1. Summary of computational methods for topic analysis

3. Research approach
This study adopts and extends approaches by Reference Huang, Chen, Zhang, Wang, Cao and LiuHuang et al. (2022) (Reference Huang, Chen, Zhang, Wang, Cao and LiuHuang et al., 2022) and the research framework builds on our related work (Xiao & McAdams, 2025), with modifications in the network construction and topic selection steps.
3.1. Network construction
The dataset analyzed here consists of the American Society of Mechanical Engineers (ASME) International Design Engineering and Technical Conference proceeding abstracts spanning five years, from 2018 to 2022, comprising a total of 268 papers. Text preprocessing techniques are applied to the dataset to eliminate unnecessary punctuation and words. Initially, the same dataset was used to train a Doc2Vec model, which generates document vectors for each abstract in the dataset. However, the resulting vectors are not diverse, making it a challenge to detect distinct communities. Therefore, more abstracts from previous years of the conference proceedings are included in the training process to avoid data overfitting. Each abstract is considered a single document, and once the documents are transformed into vectors, the cosine similarity is calculated between pairs of vectors to form a square similarity matrix. The square matrix is then used to construct networks where each entry (i,j) represents the cosine similarity between vectors i and j. To construct a network, a threshold is defined by the user such that an edge is added to the network only when the cosine similarity exceeds this threshold. Community detection algorithms are then applied to the resulting networks to divide them into groups, with each group consisting of closely related abstracts. Two programs are involved during the network construction and community detection process - NetworkX and VOSviewer (Xiao & McAdams, 2025). NetworkX is a Python package for network construction (Reference Hagberg, Schult, Swart, Hagberg, Varoquaux, Vaught and MillmanHagberg et al. 2008), and VOSviewer is used for visualization and community detection (van Eck & Waltman, 2014).
3.2. Topic identification and similarity calculations
Previous studies determine a single keyword with the highest Z-Score as a community label (Xiao & McAdams, 2025; Reference Huang, Chen, Zhang, Wang, Cao and LiuHuang et al., 2022). Z-Score is an indication of node importance within a network, with a higher number representing higher importance. However, a single document (node) cannot adequately represent the entire document community, as each document is distinct and has its unique characteristics. Therefore, generalized topics are essential to better represent the entire community of documents. To obtain generalized topics, large language models (LLMs) are employed in our study. Specifically, ChatGPT determines the community labels based on the titles of the documents within each community, and these generated labels serve as the topics. The steps are as follows. First, a list of titles from a single community is input into ChatGPT with the prompt: “Based on the list of titles I have provided, give me a label for this community of titles. The label should be less than 6 words.” Secondly, the same procedures are repeated for the remaining communities within a single network until a list of LLM-generated topics is produced. Lastly, steps 1 and 2 are repeated for the remaining years of networks. However, to obtain the similarity between community labels in subsequent years and construct a Sankey diagram, it is still necessary to calculate the Z-Score for each node and the similarity scores for all pairwise combinations of communities from consecutive years. Equations 1 and , adopted from Reference Huang, Chen, Zhang, Wang, Cao and LiuHuang et al. (2022)’s work, demonstrate these two calculations.


In both equations, the letter z represents the Z-Score, and z’ represents the max-min normalized Z-Score.
${\rm{N}}_{\rm{M}}^{\rm{i}}$
represents the total edge weights between the ith document and all other documents in community M, where M° represents the total number of documents in community M. Additionally, H(Mt) represents the document set in community M at year t, and vWt denotes the vector representation of document Wt (Reference Huang, Chen, Zhang, Wang, Cao and LiuHuang et al., 2022). Once similarity scores are calculated for each pair of communities, a Sankey diagram can be created by forming paths between community pairs whose similarity scores exceed a user-defined threshold.
3.3. Human validation
Since categorization is generally a challenging task, this study also examines the algorithm’s performance by investigating how documents are categorized by humans and comparing these results with those produced by computer algorithms. This human effort serves to validate the categorization process. Once the accuracy of the automated method is confirmed against this benchmark, human involvement may no longer be necessary for future categorizations.
The document categorization is done by five design methodology research field experts. Each person completed the categorization exercise independently. There were no instructions on how to group these documents because the study also aims to explore the diversity of researchers’ decision-making processes. Each expert comes up with their number of clusters for the same dataset and their ways of grouping the documents. Table 2 summarizes each researcher’s strategy to develop the document clusters. According to the table, each researcher performs the classification task using a different approach. Some researchers base their judgments on specific criteria, such as design goals or design stages, while others rely entirely on their intuition. The varying classification methods can lead to very divergent results. However, one of the goals of this paper is to investigate the extent to which decision-making differences impact the results and provide future study directions.
Tabel 2. Categorization strategies from each researcher

4. Results
4.1. Community detection
Figure 1 shows an example network generated from the computer algorithms outlined above. There are a total of 6 communities within that network. A threshold of 0.85 ensures that edges in the network are formed only between nodes with a cosine similarity greater than 0.85. For more information and networks from other years, please refer to the GitHub page (link). Table 3 lists each community label along with the titles of three selected documents from that community.

Figure 1. The 2018 network uses Doc2Vec with a threshold of 0.85
According to Table 3, documents from 2018 are categorized into 6 distinct groups. The first group is labeled “Design Methods and Approaches”. The titles within the first group encapsulate a variety of innovative strategies and methodologies used in design processes. The label “Prototyping and User-Centered Design” is assigned to the second community which highlights the prototyping strategies in the design process. The third community is labeled “Creative Processes in Engineering Design” because most of the papers in that community focus on creativity and ideation within engineering design contexts. The fourth community is labeled “Influence of Human Behavior on Design” because this community includes papers that discuss how human actions, decisions, and social interactions impact the design process. The fifth community is labeled “Innovative Design Techniques” as this community reflects a focus on novel methodologies and strategies that enhance the design process. The last community is labeled “Function Analysis and Product Design” because papers in this community focus on how understanding functions can enhance design. However, the categorizations are not perfect, and some papers are misclassified. When the papers in a community lack a consistent theme, LLM may assign two topics to a single label to capture the diverse themes within that community. For example, “Prototyping and User-Centered Design” contain both prototyping and user-centered design which are not aligned.
Tabel 3. Community Labels and corresponding three selected document titles

4.2. Human classification of documents
The initial classification of the documents varied in both the number of communities and the corresponding community labels assigned by researchers. To enable a more accurate comparison, the classification results were post-processed, with all communities consolidated into six distinct groups. While the original labels defined by each individual differed, those labels referred to analogous categories, therefore, six standardized labels “design method/processes”, “prototyping”, “Ideation”, “designer’s behavior”, “design tools”, “design goal” were defined to be consistently applied across all communities.
In Figure 2, the cells are colored to indicate communities. The figure illustrates variations in decision-making across individuals. Overall, both congruence and divergence can be observed in the categorizing decisions. All researchers grouped four papers related to “designer’s behavior” together. There is also some agreement on prototyping groupings among researchers 2, 3, and 4. Researcher 1’s community configuration differs the most from the other researchers. All researchers show some level of disagreement across all classifications, even in the prototyping grouping, which has the highest level of agreement, the classifications are still not entirely identical.
The differences among humans are already significant, and the gap between computer algorithms and human is even greater. To facilitate a more consistent comparison, the computer-generated topics from 2018 have been further generalized and replaced by human-generated topics. “Design Methods and Approaches” is replaced by “design method/processes”. “Prototyping and User-Centered Design” is replaced by “prototyping”. “Creative Processes in Engineering Design” is replaced by “ideation”. “Influence of Human Behavior on Design” is replaced by “designer’s behavior”. “Innovative Design Techniques” is replaced by “design tools”, and “Function Analysis and Product Design” is replaced by “design goal”.

Figure 2. Comparison of document categorization results among five researchers and computer algorithms
4.3. Topic relationship path
Following the steps outlined in section 3.2, a list of LLM-generated community labels (topics) and their pathways are shown in Figure 3. A threshold of 0.825 indicates that paths are only formed between topics with a similarity score greater than 0.825. The threshold value is determined through iterative testing to better visualize the trends. The behaviors of the topics will be discussed in detail in section 5.

Figure 3. Sankey diagram from Doc2Vec model with a threshold of 0.825
5. Discussion
5.1. Comparison of communities identified by human and algorithm
The comparison between human and computer algorithms demonstrated a significant degree of variability in classification results. It is observed that only “ideation” and “designer’s behavior” show some level of agreement between computer-generated output and human-determined output. Such an agreement is predictable since papers that discuss ideation and designer’s behavior are quite differentiable than papers under other themes due to the unique terminologies they inherently involve. On the other hand, the differences may be due to the different focuses during the decision-making process. For example, while computers categorize documents entirely based on document similarity, which is a summation of word similarities within each document, the human classification process often involves intuition and is based on divergent reasoning. Therefore, documents containing identical words or words with similar meanings are likely to be grouped by computers, even if these papers focus on different aspects.
One limitation of the study is that parameters still have impacts on the results. A small change in the threshold can change the network structure significantly. A key reason for the non-robust behavior is that the cosine similarity across most vector pairs shows little variation. In such a case, the community detection algorithm does not work well. Future research that involves integrating more advanced machine learning such as graph-neutral networks to improve performance, and incorporating manual labor where algorithms improve with human feedback should be explored.
5.2. Research trends
There is some merging, splitting, and discontinuation observed in Figure 3. For example, the topics “Influence of human behavior on design”, “Prototyping and user-centered design”, “Innovative design techniques,” and “Design methods and approaches” from 2018 merge into a single topic, “Dynamic human-computer collaboration,” in 2019. This merging suggests that the research focus has shifted from understanding human behavior, developing new design methods, and studying prototyping strategies into a combined area that focuses on creating dynamic collaborations between human and computer. Additionally, these same topics from 2018 also influenced other areas in 2019. For instance, the same four topics also flow into the 2019 topic of “Team coordination”. These additional flows indicate that some topics from 2018 not only combine into new areas but also branch out, impacting the development of multiple research directions. Several discontinuations are also observed, which means that topics from one year do not connect to any topics in the subsequent year, such as “design cognition”. The discontinuity suggests that certain areas of research temporarily lost prominence.
There are also a few paths without merging or splitting behaviors, where a single, direct flow connects one topic to the next cross years. In these cases, the topics remain consistent over time. For instance, a path that starts with “Creative processes in engineering design” and ends with “Cognitive load and collaborative design,” and the path that starts with “Collaborative dynamics in engineering teams” and ends with “Team collaboration” maintain a steady focus on creativity and teaming, respectively, which suggests that creativity and teaming are stable, well-established areas with ongoing research interest and relatively little change in focus.
6. Conclusion
This study developed a framework for visualizing the research trends within the design methodology community. By applying additional models to the same dataset and employing LLM to determine topics, this study extends our previous research to gain deeper insights into the topic relationships. Although the Sankey diagram demonstrates limited effectiveness in visualizing the trend, it highlights several phenomena, such as the strong correlation between most topics, with only a few topics being relatively independent and minimally influenced by other topics. A limitation of our study arises from the poor performance of the document categorization method. Although our study involves human validation by comparing how humans and computer algorithms categorize documents, the difference is substantial. The disagreement between human and machine-generated results indicates the need for further investigation into the factors influencing community categorization and incorporating more advanced techniques to improve the results.