1. “Kill that Strange Bird”: culture, cognition and meaning
On September 1, 1959, in a village in the Taipei basin, a local research assistant, MC,Footnote 1 Miss Chen, hired by the American anthropologist Arthur Wolf to observe children’s social interactions, documented the following episode in great detail. In this observation, three boys and a seven-year-old girl (all in pseudonyms) played a game called “Kill that Strange Bird.”
Cheng Li-chu (6-year-old) & Cheng Mu-wen (6-year-old) were in front of House 104. They each had a bamboo stick.
Li-chu to Mu-wen: “Let’s go kill that Strange Bird [reference from a recent movie].”
Mu-wen: “I’m going to really kill it this time.” They walked toward the back of the house. As they crossed the ditch, Li-chu shot a stick through his hollow bamboo and said: “I’m going to really kill you this time, Strange Bird!” They walk over to Peng Ah-lien (7-year-old) who was playing house with some other children.
Li-chu lifted his “gun” and said: “Bang! Bang!” at Ah-lien. Mu-wen did the same thing. They repeated this several time, but Ah-lien ignored them.
Li-chu finally said: “Come on, Strange Bird.” Mu-wen repeated this.
Li-chu: “This time we’re going to catch you wherever you go.”
Ah-lien turned her head to them and said: “I’m not afraid of you. I’m going to hide under the water.”
Li-chu: “I’m going to get an anti-aircraft gun. I’ll shoot you through the water and kill you.”
Ah-lien: “No. You can’t.”
Cheng Liang-wen [Cheng Mu-wen’s brother, 8-year-old] came and said: “No, we’re going to use Nike missiles.”
Li-chu: “Yes. We’ll use Nike missiles.”
Liang-wen: “We’re going to shoot them into the water and the water will turn into fire and burn you!”
Mu-wen & Li-chu: “Yeh! Then you can’t hide in the water anymore!” All three of them yelled threats at her [Ah-lien].
Li-chu: “Let’s say you are hiding in a volcano in your nest and there is a fire in there and I’m going to use my missile to fire right into your nest and the mountain will blow you up and you are going to be killed.”
Ah-lien argues with them about where she will hide. They stopped for a while to watch something else.
The entire event unfolded spontaneously among four young children, without any top-down supervision or instructions, which was a very ordinary part of their mundane life. Yet such a simple text preserved incredibly rich information, not just about physical confrontations, or pretentious confrontations, but with psychological, social, cultural and historical significance, all pointing to the mystery of children’s play. The dazzling “fight” in the opening vignette – the quotation marks are necessary because it is a pretend-play – presents a maze of meaning: It weaves together layers of symbolism, metaphors and sentiments embedded in the very specific setting of Cold-War Taiwan. Some layers of meaning and certain intersubjective dimensions of behavior are far from transparent in the text itself. Children’s play seems trivial but contains enormous educational power (Gray, Reference Gray2017). Play is meta-communication, “communication about communication” and inherently paradoxical: Conveying the very message “this is play” signals that the actions that children are about to engage in “do not denote what those actions for which they stand would denote” (Bateson, Reference Bateson2000 [1972], 185). These playful dramas, spontaneously and creatively enacted by and for children, constitute an important platform for moral development, as children learn to cooperate, resolve conflict, read social minds and understand moral norms (Xu, Reference Xu2024). Due to the imaginative nature of play, in which the actions are governed by rules in the players’ minds, these dramas can easily confuse an outside observer (Gray, Reference Gray2017). To decipher children’s moral dramas, an anthropologist needs to grasp subtle cultural meanings and attune to meta-communicative cues. What about AI algorithms? How would they “read” these scenes? Situated at the intersection of anthropology, data science, digital humanities and developmental science, this article critically reflects on AI interdisciplinarity concerning the interpretation and translation of diverse lifeworlds and social experiences. It tells the story of how children, anthropologists and AI make sense of such layered social interactions, and by doing so, interrogates the nature of learning morality.
The opening vignette invites us to explore how, in the process of learning morality, e.g., the good killing the evil, or my side beats your side, human cognition operates within and through cultural reality. Human morality is “originated in our species’ natural history of cooperation and coordination and actualized in our holistic social history” (Xu, Reference Xu2019: 657). In cognitive science, learning is often defined as the process by which an agent forms an internal model of the world (Dehaene, Reference Dehaene2020:3). But learning is also a socially situated process where “agent, activity and the world mutually constitute each other” (Lave & Wenger, Reference Lave and Wenger1991: 33). In other words, children acquire their moral knowledge through active engagements in real-world contexts.
The episode contains complex cultural symbolism, fluid at the moment but also sedimented in history. “The Strange Bird” was not really a bird. It referred to a military aircraft, and it was played by a child, the only female actor, Ah-lien, in this grandiose drama. A bamboo stick was not really a stick, but a gun, which, in the children’s later imagination, upgraded into an anti-aircraft gun; “Nike missiles,” vivid in a six-year-old’s fantasy, is a poignant reminder that quotidian details of everyday life in this small community reflect geopolitical trends at the height of the Cold War era. In August 1958, just about a year before this observation took place, the Second Taiwan Strait Crisis broke out between the People’s Republic of China, ruled by the Chinese Communist Party, and the Republic of China (ROC) under the KMT’s military dictatorship. In October 1958, the United States Army sent out a Nike-Hercules Missile unit to Taipei County, Taiwan, at a site about 13 kilometers from the children’s village, and a year later, the Nike-Hercules Missile was handed over to ROC forces (see Figure 1).Footnote 2

Figure 1. The Nike-Hercules missile unit in Taipei County, Taiwan. Source: Government of the Republic of China, public domain, via Wikimedia Commons.
It is equally important to attend to the cognitive aspects of this episode, from its production, children learning together, to its interpretation, anthropologists learning about children. First of all, it was children’s “shared intentionality” that made their coordination possible. “Shared intentionality” refers to the human capacity to engage with the psychological states of others and enact collaborative interactions, something that human children are exceptionally good at and arguably a part of what makes our species unique (Tomasello & Carpenter, Reference Tomasello and Carpenter2007). No actor in the observation was confused about what “the Strange Bird” meant and everyone knew that to “kill” did not mean, literally and physically, to kill. They all had some shared knowledge about the situation, about their own goals and about what they knew that the others knew. Such shared knowledge also extends to the observer and local research assistant, MC, who was the only eyewitness to this event, and critically, a cultural insider and children’s trusted playmate. She inserted a short note, “reference from a recent movie,” right after the boy announced, “Let’s kill that Strange Bird.” The intended reader of this message was MC’s supervisor, the American anthropologist Arthur Wolf. Wolf was much less immersed in children’s lives than MC and, therefore, might not immediately understand what the boy actually meant.Footnote 3
This is just one among many episodes of children’s play that IFootnote 4 encountered while trying to re-analyze a rare and significant archive of fieldnotes left behind by Arthur Wolf, which I call “the Wolf archive.” These fieldnotes were collected over six decades ago (1958–60) as part of the first ethnographic research focused on ethnic Han children and childhood, but were buried in time until I was invited to bring them to light recently (Xu, Reference Xu2024). To re-interpret this unique and historically significant set of fieldnotes, my team built a text-to-data pipeline to digitize and organize these materials. We used various “distant-reading” Natural language processing (NLP) methods, including the state-of-the-art large language models (LLMs), to complement ethnographic close reading and make sense of children’s social life that we had never and could never have witnessed in person. As I lacked first-person fieldwork experience, my ethnographic interpretation also benefited from reading testimonies from faithful observers like MC and other mediators. I incorporated logical inferences and historical contextualization, ultimately resorting to commonsense – the practical intelligence in human agency that remains to be one of the hardest challenges for language AI (Choi, Reference Choi2022; Hasselberger & Lott, Reference Hasselberger and Lott2023).
I chose this hybrid approach both due to a top-down concern, my long-term interest in methodological pluralism and interdisciplinary work (Xu, Reference Xu2019), and a bottom-up reason, this archive’s dual nature of both “big data” and “thick data” by anthropology’s standard (Wang, Reference Wang2016). In this project, the methodological integration was not a simple addition in which each approach plays its own part without the influence from the other. Instead, it always involved an interpretive loop, where ethnographic interpretation and machine reading mutually informed each other to answer a given question, generate a new question and/or inspire a new analytical direction. For example, the core part of this archive, Child Observation (CO), contains over 1600 of episodes like the opening vignette. I used topic-modeling analysis to discover clusters of high-frequency and co-occurring words in this corpus, as such textual patterns were hard to detect by the human eye. But the ethnographic method remained critical for evaluating the validity and interpreting the meaning of such latent topics. One of such topics, or clusters of words, resembled children’s fighting, a salient theme that emerged in my close reading. To systematically examine fighting in comparison to other themes related to my conceptual interests, I collaborated with a data scientist, the co-author of this paper, through learning from each other, incorporating topic-modeling results and ethnographic insights into S-BERT semantic-similarity analysis. While this informed LLM efficiently discerned conflict (fighting) as a dominant theme in the CO corpus, ethnographic reading discovered a problem, that is, the LLM took children’s playful fights, such as the opening vignette, as real conflicts. This discovery prompted me to identify and analyze a particular subset of playful fights, use a generative LLM, generative pre-trained transformer (GPT), to examine the sentiments and themes of these fieldnotes, and compare GPT outputs with my ethnographic knowledge (for a schematic representation of this workflow, see Figure 2).

Figure 2. A schematic representation of a workflow chart.
Fieldnotes about children’s world are rare and intriguing; for example, children’s moral dramas encapsulate the mystery of human communication, which blends layered cultural knowledge into shared intentionality. Therefore, this human–machine hybrid approach became, simultaneously, a methodological experiment and an epistemological inquiry. Take “kill that Strange Bird” as an example, exploring the interpretive loops between machine learning, children’s social learning and anthropological meaning-making can shed valuable light on the nature of intelligence: How do children create such moral dramas and learn from them? How do algorithms make sense of words like “killing,” “gun” and “bird,” encrypted via layers of cultural symbolism, in the context of children’s play in rural Taiwan during the Cold War? How do we, machine or human, understand concepts like conflict, aggression and violence, evaluate the valence of behavior and deal with the ambiguity of meaning?
2. Sense-making from a human–AI hybrid approach
My engagement with language AI started out as a methodological exploration, taking advantage of new technologies to make sense of historical texts, aligned with “the logic of innovation” in reflections on interdisciplinarity. But due to the unique nature of these texts, systematic ethnographic fieldnotes of children’s socio-moral life, this human–AI collaboration blurs the presumed boundaries between methodological and theoretical interdisciplinarity (Klein, Reference Klein, Frodeman, Klein and Mitcham2010) and embodies a multi-layered interdisciplinary pursuit: Creating a dialogue on learning between children, ethnographers and AI, it not just examines the methodological problem of how to interpret behavioral meaning via texts, i.e., ethnography versus AI. Meanwhile, it examines the theoretical and epistemological question of how the socio-moral knowledge is acquired by human (or artificial) agents. To some extent, it also sheds light ontologically (Barry & Born, Reference Barry, Born, Barry and Born2013), because how we (children/ethnographer/AI) know what we know in the socio-moral domain is intimately connected with the nature of the very entity itself: what is socio-moral intelligence, or what are the different types of such intelligence?
All of these quests are entangled in my unexpected encounter with a rare archive, an encounter that bridges the archive’s past life and afterlife, testifies disciplinary and interdisciplinary developments, and signifies temporality and even serendipity. I encountered this archive, ironically, thanks to my marginal positionality within my discipline: Mainstream cultural anthropologists are skeptical of quantitative methods (Horowitz et al. Reference Horowitz, Yaworsky and Kickham2019) and do not focus on children or how the human mind develops in childhood (Hirschfeld, Reference Hirschfeld2002). But similar to my predecessor Arthur Wolf six decades ago, I advocate methodological pluralism in anthropology, emphasize child development and have additional training in psychology (Xu, Reference Xu2019). Arthur Wolf loved and believed in numbers, but at the heyday of typewriters and 3 × 5 notecards, no efficient computational technologies were available for analyzing his archive, and no anthropologist anticipated the advent of language AI. In a word, the story of my encounter defies a deterministic account of interdisciplinarity and prompts us to take seriously historical contours and contingencies (see also Barry & Born, Reference Barry, Born, Barry and Born2013; Suchman, Reference Suchman, Barry and Born2013).
2.1. The wolf archive: when old fieldnotes meet new lenses
Among the first Euro-American anthropologists to study postwar Taiwan, Arthur Wolf conducted more than two years of dissertation fieldwork (1958–60), together with his then wife Margery Wolf and a team of local research assistants, in a Hoklo Han village called Xia Xizhou. In 1958, the village had about 600 residents, and on average, a household had more than three children. On the eve of Taiwan’s rapid industrialization and economic development, most households were still poor, making a living through farming and factory work. In the context of the Martial Law era, most villagers were Taiwanese (benshengren), descendants of southern Fujian Chinese migrants who had settled in this area in the eighteenth and nineteenth centuries during the Qing dynasty.
Once known under the pseudonym Peihotien in Margery Wolf’s classic ethnographies, this village, portrayed in the Wolfs’ seminal works on marriage, kinship, women and gender, remains an iconic landmark in the study of Han societies. Many did not know, however, that the original purpose of the Wolfs’ trip to Taiwan was to study children, because the Wolfs did not publish any systematic analysis of their fieldnotes about children. Their original research was an improved replication of the Six Cultures Study of Child Socialization (hereafter “SCS”), a landmark project in the mid-twentieth-century American anthropology and interdisciplinary research on childhood, based on mixed-methods ethnographic fieldwork across six societies, led by a team of anthropologists and psychologists (LeVine, Reference LeVine2010). The Wolf archive contains several types of fieldnotes, including household demographic survey, naturalistic observations of children’s social life, standardized interviews with children and mothers and transcripts of projective tests that used culturally appropriate prompts to elicit children’s spontaneous storytelling. This archive, containing rich and systematic information about the everyday lives of over 200 children (ages 0–12 years) in rural Taiwan, was unprecedented at the time, because ethnographic materials on ChineseFootnote 7 childhood before the Wolfs’ research were rare and mostly anecdotal (Xu, Reference Xu2022). It also set up a benchmark difficult to match in anthropology today, due to its sheer volume, interdisciplinary methodology and comprehensive naturalistic observations.
My re-analysis bears witness to historical developments in data processing. Over six decades ago, before the era of personal computers, the demand for analyzing large amounts of fieldnotes was one reason why the publications of SCS data were delayed (LeVine, Reference LeVine2010) and why Arthur Wolf left these materials behind and turned to other projects.Footnote 5 Fortunately, these materials were well preserved in his private library in Northern California. In 2018, when Wolf’s widow, anthropologist Hill Gates, granted me unique permission to use this archive, I decided to use NLP methods to analyze these precious data.
Moreover, AI-interdisciplinary in this project has a unique bottom-up dimension, where the nature of knowledge infrastructure and the form of our knowledge mutually transform each other (Bowker, Reference Bowker, Anand, Gupta and Appel2018). While my predecessor lacked computational tools to process his data at that time, my predicament was how to write an ethnography without first-person fieldwork experience. Fortunately, this archive has a rare duality, combining systematicity and richness, which affords me the opportunity to apply computational analysis to complement close reading. My team built a text-as-data pipeline and incorporated computational methods to analyze these data. Using OCR software, we transcribed all the raw fieldnotes into machine-readable text files, copyedited the texts manually to ensure data accuracy and then built a database through Python programming language to organize these texts. The core part of this archive, CO, contains 1668 documents (about 250 words on average per document), with systematic information such as participants’ IDs, location, date and time. Our database used an indexing system based on chronological order, according to the date and time each observation took place. These observations provided quasi-randomized snapshots of children’s spontaneous interactions, not biased by pre-determined behavioral themes. New computational methods can extract textual patterns from CO that would have been obscured otherwise. Meanwhile, each document contains microscopic and meticulous observation of social interactions and sometimes the observer’s on-the-moment reflections too, which makes ethnographic interpretation possible.
NLP approaches have become increasingly important in many social sciences and humanities disciplines (Justin et al. Reference Justin, Roberts and Stewart2022; Terras et al. Reference Terras, Nyhan and Vanhoutte2016). Cognitive scientists also advocate using LLMs to annotate cultural materials (Dubourg et al. Reference Dubourg, Thouzeau and Baumard2024). In contrast, although texts, especially ethnographic fieldnotes, are essential in socio-cultural anthropology, applying NLP methods has not yet become popular in my discipline. Mainstream anthropological critique tends to see ethnography, qualitative interpretations based on immersive, long-term engagements in often messy social situations, as a competitor epistemology to the “big data” approach (Douglas-Jones et al. Reference Douglas-Jones, Walford and Seaver2021). At a fundamental level, however, ethnographers and language AI both face the challenge of interpreting the meaning of human experience through text. Doubtlessly, language AI such as LLMs embodies a very different set of techniques from ethnography. The very first step of any LLM is to tokenize texts and transform them into embeddings, numerical representations (vectors) in a high-dimensional space and meaning is inferred through patterns of statistical probabilities. This ontology of language as geometry and statistics does not seem natural to the human mind, but some of the ways computational methods tackle the question of meaning-in-context potentially align with, rather than contradict, hermeneutic approaches (Fuller, Reference Fuller2020; Widdows, Reference Widdows2004). Toward the shared goal of meaning-interpretation, ethnographic fieldnotes present a unique opportunity for applying language AI to understanding diverse human experience in cultural contexts.
My hybrid approach facilitates cross-fertilization between anthropology and data science, especially in the era of AI. At the beginning of this project, the affordance of this archive and the impact of NLP in social sciences prompted me to step out of my comfort zone to learn computational methods, with the help of online tutorials, courses and consultants. But at a later step, when I wanted to incorporate more advanced technologies like LLMs, I collaborated with a data scientist. Rather than a mechanistic application of NLP to ethnographic fieldnotes, effective interdisciplinarity should embrace mutually constitutive collaboration of “grasping” (ethnography) and “measuring” meaning (LLMs). Our collaboration involves understanding each other’s domain expertise, finding common ground related to this project – e.g., a specific goal in my project can be achieved through a particular manipulation of an LLM – and acknowledging, not just at an abstract level but through concrete practice, the strengths and limits of each approach. It includes revising our goals and plans to accommodate differences, contingencies and setbacks. It demands openness toward unexpected patterns in the data, updated technology, shifting conceptual interests and the entanglement of all these factors. Moreover, from the original production of the Wolf archive to my re-interpretation today, this journey involves multiple actors and chains of translation. My re-analysis reaffirms, rather than discounts, the inherently humanistic nature of fieldnotes: This is manifest “in the intersubjective experience that made the texts (fieldnotes) possible, in the human expertise essential for constructing and interpreting ‘machine-reading’ patterns and numbers and in the layers of human biases in knowledge production (Xu, Reference Xu2024: 24).”
2.2. Learning about and from children
The ethnography–AI dialogue in this project also benefited from, and in some sense, was a novel extension of my interdisciplinary work in child development, which is an inherently open and hybrid field. My re-analysis of the Wolf archive focuses on patterns of cooperation and conflict in young children’s social world, an important part of moral development, my long-term theoretical interest. Trained in anthropology and psychology, I adopt a theoretical approach of cognitive anthropology that takes into consideration both psychological processes and cultural histories. This basic theoretical premise is different from SCS’ and the Wolfs’ behaviorist paradigm in the 1950s, which focused on behavior itself and treated the mind as a black box. Their paradigm also bypassed the question of learning morality, as children’s behavior was assumed simply as the response to external reward–punishment. Decades of new research have since revolutionized our understanding, revealing that young children have much more complex social cognition than what behaviorists once assumed (Sommerville & Decety, Reference Sommerville and Decety2016), that they are actively motivated to learn about and explore the social world (Gopnik, Reference Gopnik2020) and that social cognition is intimately connected to how children learn, transmit and transform human culture (Hirschfeld, Reference Hirschfeld2002). In particular, the early development of morality, underpinned by multiple domains of social cognition, such as prosociality, empathy, hierarchy and Theory of Mind, has become a central topic in the new interdisciplinary synergy on what it means to be human (Tomasello, Reference Tomasello2019). In the past decade or so, cognitive scientists have advocated for studying children’s social cognition in cross-cultural fieldwork beyond Western laboratories (Barrett, Reference Barrett2020), partly inspired by the SCS’s legacy (Amir & McAuliffe, Reference Amir and McAuliffe2020). Addressing this new trend of largely quantitative, standardized studies, socio-cultural anthropologists emphasized the vital importance of ethnography in generating ecologically valid knowledge about moral development (Xu, Reference Xu2017, Reference Xu2019; Kajanus, Reference Kajanus2024). However, systematic historical records of children’s natural behavior in an entire community are rare. Re-discovering the Wolf Archive, my research brings children’s socio-moral life from the margins of history to the center stage of theorization.
Moreover, in the current moment of AI boom, children’s extraordinary learning capacities also inspired collaboration between cognitive and computer scientists to decipher the ultimate secret of human intelligence (Frank, Reference Frank2023). While cutting-edge AI has unprecedented computational efficiency to extract statistical patterns from enormous amounts of natural language data, its impressive linguistic competence is still dissociated from social-cognitive capacities that seem commonsense to humans (Mahowald et al. Reference Mahowald, Ivanova, Blank, Kanwisher, Tenenbaum and Fedorenko2024). Scientists have found that young children, sometimes even infants, have better “commonsense” in understanding and predicting social behavior than LLMs in similar experimental tasks (Stojnić et al. Reference Stojnić, Gandhi, Yasuda, Lake and Dillon2023; Ullman, Reference Ullman2023). Compared to LLMs, human children develop such commonsense through more flexible and diverse ways of learning (Yiu et al. Reference Yiu, Kosoy and Gopnik2023). Drawing from much smaller amounts of very special “training data,” lived experience with its own features, children can make efficient generalizations in novel situations (Smith & Karmazyn-Raz, Reference Smith and Karmazyn-Raz2022). This has important epistemological implications for anthropology, because the kind of social cognition that human children are exceptionally good at acquiring is precisely the foundation for ethnographic knowledge about society and culture (Xu, Reference Xu2024).
Therefore, motivated by interdisciplinary theoretical interests, I put children’s learning at the center to tackle the question of meaning in an integrative comparison of ethnographic knowing and AI. Such raw fieldnotes documenting children’s moral life in a non-Western context are rare and precious, very different from the kind of data derived from controlled experiments in psychology labs, or the types of texts current LLMs are trained on. The Wolf archive provides a significant opportunity for a natural experiment to traverse disciplinary boundaries at methodological, epistemological and to some extent, ontological levels, all of which intersect at the key nexus of children’s social cognition.
3. To fight or not to fight: navigating a maze of meaning
3.1. Fieldnotes as communicative products
Recently, evolutionary anthropologists have begun to use NLP techniques to analyze ethnographic databases, such as eHRAF (Human Relations Area Files), and examine human morality across cultures (Garfield et al. Reference Garfield, Schacht, Post, Ingram, Uehling and Macfarlan2021). However, this big-data approach rarely addresses how such textual records of human behavior are shaped by the actual process of fieldwork. The case of children’s fights in the Wolf Archive demonstrates that fieldnotes are the communicative products between social minds in a cultural context. Through analyzing interview transcripts with mothers, I discovered a cultural model of parenting in Xia Xizhou, that is, the prohibition of children’s fights via preventative preaching, timely intervention and harsh punishment. This aligns with the overarching ideal of maintaining neighborly harmony in close-knit Taiwanese communities. However, through close reading and comparing three types of children’s own narratives, I identified an opposite pattern: the prevalence of children’s fights.
First, in Child Interview, standardized interviews with children, respondents indicated that they would avenge if hit by another child. Second, responding to a similar physical aggression question in the School Questionnaire, children were hesitant to tell what they really wanted to do but chose “do nothing,” the appropriate answer deemed by adult standards. The striking difference speaks to the inherently intersubjective nature of fieldwork: the interviews were conducted by the Taiwanese research assistant/teen girl MC, children’s trusted playmate and confidante, inside the village and in Taiwanese. The questionnaire survey was administered by Arthur Wolf, a foreigner whom children felt distanced from, in their classroom and in Mandarin, the only approved language in an authoritarian school setting. Lastly, in a projective test that features ambiguous drawings of children’s interactions, conducted by a male Taiwanese research assistant, children would spontaneously tell stories of fighting and punishment. Actually, simple NLP techniques identified that the word “fighting” ranked as the highest frequency verb in this projective test corpus (Jing, Reference Xu2024: 85). These three fieldwork methods, by researchers with different positionalities and in different settings, tapped into different kinds of knowledge that children developed regarding the issue of fighting: Explicit attitude in a first-person scenario (Child Interview), impersonal and normative knowledge (School Questionnaire) and implicit attitude reflected in a third-person scenario (Projective Tests). This case revealed children’s acute sensitivity to communicative cues, including language, contexts, partners’ identity and intentions. Such complex and subtle social cognition is what made the communication between them and the researchers possible. It determines the nature of these fieldnotes.
3.2. From words to deeds
Beyond children’s narratives, I examined CO texts to understand their actual behaviors and motivations, combining top-down and bottom-up approaches, by putting my conceptual goals in dialogue with what emerges as salient form of the data itself. First, approaching these fieldnotes as communicative products, I closely read and manually coded each and every episode in the CO corpus to identify categories and patterns of social behaviors, taking into consideration the various communicative cues in the social context. Fortunately, all these observations were recorded by the research assistant MC, “Older Sister Chen,” a teenage girl whom children trusted and loved. Consistent with what children told MC in interviews, fighting emerged as a salient behavior in natural observations: over 80% of all children (ages 0–12 years) were involved in physical aggression. Unsupervised machine-learning analysis lends additional support to this finding. I applied LDA topic modeling to this corpus, and one of the topics it generated likely depicts physical conflict, manifest as a cluster of co-occurring, high-frequency words. The top ten words, from the highest to the tenth highest frequency, are “hit,” “mother,” “hard,” “angry,” “back,” “head,” “copulate” (cursing) “laugh,” “angrily” and “fight” (Xu, Reference Xu2024: 87; see also “Supplementary Information I”). To further combine ethnographic expertise with big data approaches and efficiently compare children’s fights with other topics in the CO corpus, I resorted to LLMs.
3.3. Semantic search via S-BERT
Before generative language AI became available, I started collaborating with a data scientist and our collaboration unfolded in a reiterative, cyclic process. Drawing on my conceptual interests, ethnographic insights and topic-modeling results, the data scientist fine-tuned S-BERT (Sentence-Transformers), an LLM that derives semantically meaningful sentence embeddings, and is therefore good at capturing contextual nuances of meaning (Reimers & Gurevych, Reference Reimers and Gurevych2019).Footnote 6 We computed six high-dimensional “theme vectors,” each representing an ethnographic theme developed for this corpus. In parallel, each episode of CO was transformed into a corresponding “field observation vector.” Rather than a traditional classification task, we approached these vectors as an asymmetric semantic search method. Together, we identified meaningful patterns in the text by integrating qualitative judgment with computational embedding techniques.
The underlying rationale for the formation of “theme vectors” is to emulate ethnographic epistemology and discern themes within field notes. I chose six themes, family, school, play, cooperation, conflict and shopping, through the combination of inductive and deductive reasoning. Family and school were major settings of children’s interactions and play was a prevalent scenario. Cooperation and conflict reflected my conceptual interest in moral development and occurred a lot in these fieldnotes. I also added the theme of shopping: There were three little shops in the village, common gathering places for children, who were either sent by adults to run errands for the family or buy things for themselves. Even though shopping was not a conceptual topic in my project, it provided a useful reference point to experiment with computational techniques: In contrast to more abstract themes such as cooperation and conflict, “shopping” was relatively easier to discern by the algorithms, associated with a more distinct set of words, i.e., “buy,” “sell,” “money,” “dollar,” “cents,” etc. Also, it makes intuitive sense that a given observation might express multiple themes. Imagine this scenario: a child was sent by his parents to buy some sugar or rice, but on the way to the shop, he got into a fight with another child, so the themes “family,” “shop” and “conflict” were all relevant in this one episode.
We integrated ethnographic close-reading, topic-modeling and word-similarity exploration to compile a list of keywords for each theme. First, I selected some words from topic modeling outputs as part of our keyword list, for example, words in the topic that resembled children’s fighting, as mentioned earlier, and supplied other words from the fieldnotes. We then used the S-BERT pretrained model to generate words similar to these selected words in context, expanded the list and inspected it. The list of corresponding keywords was then used in S-BERT to generate six “theme vectors.” To construct “field observation vectors,” we performed meticulous preprocessing of raw observational texts. Common stop-words were expunged to hone the clarity and pertinence of the data, but we preserved as many words as possible so that the processed texts still contained important information for semantic search tasks, such as negation words to infer behavioral valence, pronouns related to family and kinship, etc. We did not perform stemming or lemmatization for the same purpose of preserving semantic accuracy and complexity.
We then used S-BERT, specifically, multi-qa-mpnet-base-dot-v1, to generate dense vector representations of both the theme-related texts and the unlabeled field observations. Each text was embedded as a 768-dimensional vector using mean pooling. We embedded six “theme-vectors” as reference points in the same semantic space as our 1678 “field observation vectors.” We constructed this shared vector space to facilitate meaningful comparisons between the observed data and our qualitative themes (see Figure 3). To conduct semantic search, we employed FAISS, an open-source similarity search library (Douze et al. Reference Douze, Guzhva, Deng, Johnson, Szilvasy, Mazaré, Lomeli, Hosseini and Jégou2024).Footnote 8 For each field observation, we computed cosine similarity scores (Singhal, Reference Singhal2001) against each of the six theme vectors. This produced six similarity scores per observation, effectively capturing its proximity to each theme. We used the top semantic matches to trace subtle alignments and divergences across the dataset – blending computational distance with ethnographic interpretation. Scaled similarity scores indicate relative magnitude between different themes in a given observation; in other words, the sum of all six scores in a row equals one, providing an automated method to compare across themes and observations quantitatively. The data scientist selected a random subset of observations for validation, and the anthropologist ranked the six themes for each observation. The relative magnitude of these similarity scores largely matched with manual ranking.

Figure 3. A schematic representation of semantic similarity analysis.
3.4. Conflict as the dominant theme
In a given observation, the theme that has the highest similarity score can be considered the dominant theme. Rendered in a heatmap, the darkest color in a row represents the dominant theme in an observation (see Figure 4). In general, conflict and play figured more prominently than the other four themes, shedding light on the playful world of naughty children. The following excerpt from a raw fieldnote (CO #851) is a typical scenario of conflict. Note that in the Wolf archive, each person, adult or child, is indexed by a unique number. In the digitized CO corpus, I assigned each episode a unique ID according to chronological order.
At this point 150 (six-year-old boy) hit 51 (four-year-old boy). 51 started to cry. 49 (51’s older brother) went over and put his arms around 51.
156 (150’s older sister) yelled at 150: “Why did you hit him!” 156 hit 150.
150 looked like he was going to cry and said: “Who told him to hit me first?”
49 to 150: “You already hit him first once.”
153 (150’s grandmother) came out of the house. 156 to 153: “Your 150 hit someone.”
153 picked up a stick and sneaked up behind 150 and hit him with it and said: “Why do you hit people?” 150 jumped startled. Everyone else laughed. 150 looked very unhappy.
153 hit him again and said: “Why did you hit him? You dare to do it again?” (MC)

Figure 4. Heatmap representation of field observation scaled similarity scores (normalized according to the softmax function). For every field observation vector, the sum of similarity scores of all six themes equals 1.
This episode involves layers of moral actions, sentiments and reasoning: 1) an older boy bullied a younger one from another family; 2) the aggressor 150’s older sister 156 scolded and hit him, reflecting two common norms in this particular community: a) older children were supposed to yield to younger ones instead of bullying them; therefore, 150’s behavior was deemed wrong; and b) older children had some authority over their younger siblings therefore 156 could punish 150; 3) the aggressor tried to defend himself, citing the justification that the victim was the one who initiated the aggression; 4) The victim’s older brother stepped up to protect the victim, provided first-hand testimony to refute the aggressor, and tattled to the aggressor’s grandmother, a common tactic in this village; and 5) the grandmother, an adult authority figure, also punished the aggressor. Through such moral dramas of conflict and punishment, children not only learned about care and justice, moral responsibility and agency, authority and hierarchy, but also attempted to assert their voices in various ways.
Conflict being a salient topic in children’s peer interactions did align with my ethnographic close reading, and LLM correctly detected conflict as the dominant theme in many observations, such as #851, with a much higher score compared to the other themes (see Table 1). Intuitively, though, it is not hard to discern the nature of interactions in texts like #851. What about more ambiguous scenarios? Ranked by similarity scores, conflict is the dominant theme in nearly half of all CO (n = 827), even more than play (n = 759) and much more than cooperation (n = 37), shopping (n = 29), school (n = 25) and family (n = 1). Could the similarity scores under the theme of conflict be positively biased in some observations, for example, in children’s playful fights?
Table 1. Cosine similarity scores of selected observations(#851: actual physical conflict; #406: playful dueling; #179: “killing that Strange Bird” game; #468: playful dueling with a disagreement). Conflict is ranked as the dominant theme in all four observations

4. Pretend-play as meta-communication: when AI meets human children
Imagine, the word “hit,” a high frequency word in the CO corpus and a keyword in the conflict theme, can mean many different things: a child accidentally hitting something, a non-social behavior; a child hitting another child, a real fight; or a child pretending to hit another child, and they both knew it was just playful teasing, a meta-communicative act that requires shared intentionality and predicates upon cooperation. When I closely read the observational texts in which conflict was computed as the dominant theme, I found that a small portion of those texts were not about real fighting, but about playful games. In other words, our LLM seemed “confused” by the signature activity of human children, pretend play.
Dueling is a popular type of pretend fight. Take CO #406 as an example, about several boys happily dueling with sticks:
424 (a five-year-old boy) had a stick in his hand and was waving it at 415 (a three-year-old boy from another household).
424 to 415: “Do you dare or not? (Are you afraid or not?)”
415: “I dare.”
424: “Really?” 424 poked the stick straight at his stomach. 415 grabbed it.
They pulled at the stick. 424 got it and said: “Ha! Ha! I won!”
424 to 414 (415’s five-year-old brother): “It’s very easy to take it away from him. I only have to turn around.” 424 demonstrated. 424 started waving his stick in the air yelling: “Hey! Hey!” 414 ran over to his house and 424 followed him. 415 went, too. 414 & 415 stopped to watch 400 (their cousin, a nine-year-old girl) do her homework. 424 went over and dug in some mud with his stick. 424 put his stick away after a while.
414 climbed on a pile of dry wood and said: “I’m going to find myself a sword.”
424: “I want to find one, too.” He climbed on also.
424 to 414: “Let’s compete in a duel.”
414: “Alright.” 424 found a sword, but 414 hadn’t gotten one yet.
424 swished at 415 and said: “Yi! I kill you.” He did this also to MC and 414. They laughed. (MC)
Although our LLM computed conflict as the dominant theme in this observation, not play or cooperation (see Table 1), commonsense human reading can immediately tell the playful nature of this scenario, qualitatively different from the kind of real fights like the aforementioned #851. In this dueling scenario, the mischievous boy 424’s shout, “Yi! I kill you,” reminds me of the beginning vignette, “kill that Strange Bird.” In that episode, CO #179, conflict was computed as the dominant theme too (see Table 1). Perhaps verbs like “kill,” especially in a social context, biased S-BERT toward the theme of conflict or overshadowed subtle linguistic features of playful interactions? Perhaps the construction of theme-vectors did not put enough emphasis on meta-communicative cues of play (Bateson, Reference Bateson2000 [1972]) in the fieldnotes, meaning verbal and nonverbal signals about how a given utterance, gesture and action is meant to be interpreted, contingent upon the relationship between the communicators? To address these concerns and examine how LLMs “understand” children’s pretend-fight, I selected a subset of CO that features this meta-communicative genre and analyzed it via generative AI.
4.1. Evaluating pretend-fights via GPT
The “pretend-fight” subset of CO includes 24 episodes that “confused” S-BERT in the semantic search task. These 24 episodes feature a variety of settings, games, characters and actions: for example, playing “police-and-thief,” playing funeral and other rituals, dueling, playing “mom spanking and scolding children” and playing school. Together, they present a microcosm of children’s social world and contain rich information about moral norms in the local society. Despite the fact that all participants’ real names were replaced by numbers and no identifiable information was present in the data, I only fed a small portion of the CO data (14%) to GPT as a deliberate ethical choice to protect data privacy. I analyzed this sample corpus via GPT. Compared to S-BERT, which outputs embeddings/vectors and is recommended for similarity analysis, GPT is good at summarizing target texts and generating new text. While the complex algorithmic workings of LLMs are a “black box” to the human eye, GPT’s natural language output of explanations and justifications provides a window into its own reasoning (Dillion et al. Reference Dillion, Mondal, Tandon and Gray2025).
Using the OpenAI API, I prompted GPT-3.5-turbo to perform three tasks: 1) sentiment analysis to evaluate the sentiments in each episode and explain the evaluations; 2) topic models to describe the topics of each episode; 3) classification to classify the episode as conflict, cooperation and pretend-play, generate scores under each category (between zero and ten) and explain the scores (see “Supplementary Information II” for the model outputs).Footnote 9 Taken together, the GPT-generated texts resembled an anthropologist’s ethnographic interpretation to some extent, capturing the pretend-play theme, playful sentiment and contextual information of these observations. However, in some cases, the GPT model failed to read nuanced cultural meaning or grasp layered intentionality in children’s tactful coordination. Its outputs also disclosed adult-centered, Western-centered biases and misunderstandings. Of course, my evaluations of GPT biases did not come from a neutral stance, as there is no “culture-free” or “mind-free” human observer. Instead, I drew these conclusions based on my years of immersion in and comprehensive knowledge about the entire archive, not just this subset, my long-term expertise on child development in Chinese communities and cross-cultural comparative contexts, and my first-person experience as a Han Chinese daughter and mother.
For example, CO episode #468 was similar to the aforementioned #406, both about a dueling game and the protagonists overlapped. One difference is that in this episode, children had some disagreements despite the playful situation. Although no real fight occurred in this episode, S-BERT semantic search computed a very high similarity score to “conflict,” even higher than that in #851, where a real physical fight took place (see Table 1). CO #468 unfolds like this:
424 (8-year-old boy) & 475 (10-year-old boy from another household) were dueling with their hands. Each time 475, who is much bigger than 424, touched 424, he would say “You die!” 424 didn’t comment.
414 & 415 (two boys from another household, 5-year-old and 3-year-old) ran up to help 424. They were laughing.
424: “Whenever you touch me, you say I die, but you never say you die when I touch you.” 475 didn’t answer.
Finally, 424 & 414 ran into each other and 424 fell down.
424’s grandmother called him just then, but 424 ignored her. 424 looked like he hurt a little and stopped playing for a while and watched. Then he joined in again and repeated his earlier complaint to 475 again. 475 was too busy fighting to answer. (MC)
GPT generated this sentiment analysis output:
The sentiment in this text is mostly negative. There is aggression and conflict between the characters, with 475 repeatedly threatening 424 with death. 424 seems to be feeling frustrated and hurt by 475’s actions, as he points out the unfairness of the situation. The fact that 424’s grandmother calls him and he ignores her also suggests a lack of respect or concern for others. Overall, the tone of the text is tense and confrontational.
However, the “mostly negative” sentiment evaluation is contestable, as “aggression and conflict” is actually pretend-play and “repeatedly threatening … with death” is a misunderstanding of a game rule: If one touches another, one declares victory (“You die!”). The moral judgment, “a lack of respect or concern for others,” reflects adults’ assumption that children ought to obey their family elders. Besides sentiment, the topic models output defined this episode as “about a physical altercation,” which obscured its playful nature. In the next step, classification task, prompting the model with the meta-communicative category “pretend-play” improved its understanding to some extent: It returned a high pretend-play score (7): “The children are engaging in a playful activity of dueling with their hands, using imaginary scenarios of ‘You die!’ when touched,” and mentioned the laughing and playful manner. However, it also returned an equally high conflict score (7), framing the dueling game as “physical conflict,” and a low cooperation score (2), discounting the action of two little boys helping the child 424 as “no true cooperation.”
Similar to that of #468, the classification results of all 24 episodes reveal a general pattern of high pretend-play, high conflict and relatively low cooperation with larger variance (see Table 2). These results reflect the GPT model’s adult-biased moral stance toward children’s social world: Playful teasing in rough-and-tumble play is a common type of activity through which young children learn to read others’ minds, regulate their emotions and bond with others (Johanna et al. Reference Johanna, Winkler and Cartmill2020). Despite acknowledging these as pretend-play, the GPT model readily classified children’s physical interactions as conflict and aggression and obscured the cooperative dimensions. The asymmetry between cooperation and conflict manifests not only in the scores but also in the wording: GPT characterized several playful interactions as not “true cooperation,” but it never used the “true/not true” qualification for conflict.
Table 2. Descriptive stats of classification analysis outputs from GPT-3.5-turbo

Beyond adult-centric morality, GPT outputs also revealed Western-centric cultural and moral biases. In one episode, several children played the game “carrying and killing a pig.” GPT model evaluated this game as “cruel and violent,” which missed the cultural meaning: Children were mimicking the popular festive ritual in northern Taiwan, where pigs were slaughtered and displayed as sacred offerings for local temples. Another episode begins with this meta-communicative cue: “The children had evidently been playing ‘school,’ with 138 (a 7-year-old girl) as teacher.” In this game, a group of children creatively mimicked their classroom scenario, the teacher announcing grades to students and instructing them to stand in a line according to their grades. Then the teacher told the student with the best grade, a five-year-old girl, that she could hit the other students’ hands, as her reward and the others’ punishment. This part was probably children’s improvisation during the game, rather than an imitation of what really happened, and the “hit” was likely light, playful hitting. The GPT output judged these children in a negative moral tone: “While the children are playing a game together, there is a lot of competition and selfish behavior displayed,” “selfish” because children were “focused on their own grades and status.” The disapproval of academic competition and hierarchy was likely based on a Western standard, as the GPT’s training data on schooling mostly comes from English sources, whereas in Chinese societies individual merit in academic competition was considered a fundamental virtue (Xu, Reference Xu2019).
4.2. Killing what bird?
As a final example, let us go back to the opening vignette, “kill that Strange Bird” (CO #179). GPT did recognize children’s interactions as pretend-play, with a high score of 8; however, it took a literal stance in the evaluation of negative sentiment, “The children are making violent threats towards the Strange Bird character, showing aggression and a lack of empathy.” Its classification of conflict, with a high score of 8, resembles that of S-BERT similarity score (see Table 1) and discloses a superficial, or at least distanced, moral judgment that likely missed the protagonists’ (children’s) subjective experience: “The use of weapons and violent language indicates a high level of conflict in the interaction.” In terms of concrete meaning, GPT figured out that the “Strange Bird” was a character played by one child and the other children wanted to shoot it using weapons, as mentioned in the text. But it did not discern the bottom layer of children’s figurative speech, the bird as a metaphor for a military airplane. This metaphor was never explicitly stated in the observation, yet the participating children understood it well, probably as a slang term of children’s subculture in that particular spatial–temporal context. A careful ethnographer can also infer the meaning based on children’s meta-communicative cues, such as “talking about using antiaircraft guns and Nike missiles to kill the bird.”
Playful teasing in such pretend-fight scenarios seems trivial and mundane, but it is ubiquitous and prevalent in children’s world. It is intrinsically fun and rewarding for them. Such lived experience, simultaneously enacting and comprehending pretend-fights, likely constitutes an important type of “training data” that fuels children’s learning about morality, society and culture. These moral dramas, loaded with cultural meaning, require and demonstrate children’s extraordinary intelligence in reading other people’s mental and emotional states, entertaining counterfactuals, exploring causal reasoning and human agency and addressing misunderstanding and disagreement (Briggs, Reference Briggs1992; Buchsbaum et al. Reference Buchsbaum, Bridgers, Weisberg and Gopnik2012). Moreover, interpreting children’s pretend-play provides a unique window into ethnographic epistemology. To use a familiar analogy from Clifford Geertz (Reference Geertz1973), the key to good ethnography is to distinguish “winking” from “blinking,” that is, to interpret the precise intention of human communication, given the context, the social code, the sender and intended receiver of the message, as well as their relationship. Taken together, although prompting the GPT model generated improved knowledge about pretend-fights compared to our previous approach of semantic search, gaps still exist between such AI and children’s socio-moral sensibilities or ethnographic “thick description” (Geertz, Reference Geertz1973).
5. Concluding thoughts
This paper introduces a novel synergy between cutting-edge language AI and anthropology, through tracing children’s “mindsteps” (Briggs, Reference Briggs1992: 26) in their moral dramas. My quest began as a methodological experiment, combining ethnographic expertise and a variety of language AI technologies to analyze natural observation texts about children’s everyday life in rural Taiwan during the Cold War, as part of re-discovering a unique and historically significant set of fieldnotes. Due to the special nature of these fieldnotes, systematic texts with rich yet understated cultural meaning and human motives, as well as the special protagonists of these fieldnotes, young children – the most voracious learners, my quest led to epistemological and ontological reflections. Mundane fieldnotes about children’s “fighting,” as products of complex human communication, pose challenges of meaning interpretation for computationally powerful LLMs and prompted me to examine the similarities and differences between AI and ethnographic sense-making. Through deciphering moral dramas and interrogating how we know what we know, this layered encounter between children, ethnography and AI raises further questions on the nature of intelligence, especially socio-moral intelligence, a core part of humanity.
Language AI has become a new analytical tool across the humanities (Terras et al. Reference Terras, Nyhan and Vanhoutte2016) and social sciences (Grossmann et al. Reference Grossmann, Feinberg, Parker, Christakis, Tetlock and Cunningham2023), but it has yet to gain traction in socio-cultural anthropology. With our ambitious mission of examining human diversity in its fullest sense, rich theoretical traditions of meaning interpretation in cultural contexts, the ethnographic method attuned to first-person experience and local knowledge, and the centrality of texts – fieldnotes – anthropology can make critical and productive engagements with language AI. In particular, raw texts of naturalistic observations record the messiest human experiences and are mediated by intersubjective encounters; therefore, they encapsulate rich and often ambiguous behavioral information, suitable for exploring the power and limitations of AI. Using fieldnotes of children’s naturalistic play, I have demonstrated how ethnographic close-reading, machine-learning techniques (e.g., unsupervised topic modeling), transformer models (e.g., S-BERT) and generative models (e.g., GPT) can complement and augment each other’s value through iterative integration. Going beyond a superficial methodological interdisciplinarity, that is, adding a thin layer of qualitative analysis to computational results or vice versa, I engaged with different ways of knowing, integrating experience-near and experience-distant interpretation in thick contextualization. Discerning the meaning and sentiment of ambiguous texts, such as children’s pretend play, demands an in-depth understanding of the historical–cultural foundations and the social–cognitive mechanisms that make such moral dramas possible. I traced how texts came into being, drew attention to fieldnotes as communicative products, and highlighted meta-communicative cues to compare ethnographic and algorithmic epistemology.
Even though this project focused on a unique, unusually large and systematic corpus of data by the standard of anthropology, this kind of interdisciplinary vision is still valuable for analyzing more typical fieldnotes. On the one hand, despite its larger-scale and more substantial involvement of research mediators, the final products, English texts in this archive, were mostly written by one person, Margery Wolf, not too different from more conventional anthropological fieldnotes that are usually written by a single author. With such consistency in author and writing style, many techniques used in my project can be applied to typical fieldnotes, e.g., topic modeling, semantic search, sentiment analysis, etc. On the other hand, regardless of how exactly they were collected, all fieldnotes are written by and for ethnographers with certain contextual knowledge and expectations about a particular cultural community. However, most of such contextual knowledge is tacit, not explicitly stated in the text. Therefore, the end of the text itself marks the beginning of the fuzzy business of meaning interpretation, which requires anthropologists’ “context-revealing deep probing” to complement language AI’s “pattern-seeking deep learning” (Meng, Reference Meng2021).
Finally, this human–machine hybrid reading of children’s play prompts us to reflect on culturally loaded moral concepts such as conflict, aggression and violence: Where do we draw the line between “playful” and “serious” fights? What do all these abstract concepts mean in the concrete, experiential world, especially when concepts travel from one cultural context to another, i.e., from the contemporary Euro-American world to postwar Taiwan, from popular Internet documents to rare fieldnotes of naturalistic observations and from the adult world to the children’s world?
Aligning AI systems with human morality and values has generated intense debates. Although recent research has shown progress and promises of cutting-edge LLMs in moral reasoning, it remains a fundamental challenge to address cultural misalignments and group biases (Dillon et al. Reference Dillion, Mondal, Tandon and Gray2025; Jiang et al. Reference Jiang, Hwang, Bhagavatula, Bras, Liang, Levine, Dodge, Sakaguchi, Forbes, Hessel, Borchardt, Sorensen, Gabriel, Tsvetkov, Etzioni, Sap, Rini and Choi2025), not to mention the limitation of dissociation between linguistic competence and social cognition (Mahowald et al. Reference Mahowald, Ivanova, Blank, Kanwisher, Tenenbaum and Fedorenko2024). Also, current LLMs are mostly trained in texts produced by, about and for adults, and guided by adult moral knowledge. Children’s moral drama blends multiple actions, emotions and motives, much like a kaleidoscope with constantly shifting patterns and colors, full of mischief to confuse adults or LLMs. Latest research has demonstrated that even a small amount of young children’s experiential data can train AI systems to improve performance (Vong et al. Reference Vong, Wang, Orhan and Lake2024) and scholars have mapped out ways to systematically bridge the gap between children and LLMs (Frank, Reference Frank2023). If we further incorporate new modalities of data and new techniques to feed LLMs with more relevant information about children’s experience, culture and their developing moral cognition, what new insights would that generate, concerning the nature of humanity and AI, and where would that lead us or leave us? My quest to use new lenses to interpret old fieldnotes raises more questions than I set out to answer.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/cfc.2025.10008.
Acknowledgements
Our deepest gratitude goes to Dr. Hill Gates, the owner of Arthur Wolf’s archive of historical fieldnotes on children in rural Taiwan, for granting Jing Xu the unique permission to use this archive. We are also grateful to the participants in Arthur Wolf’s original research in the mid-twentieth century, as well as the research assistants who worked with Jing Xu to digitize Wolf’s fieldnotes decades later. At the University of Washington, we thank the eScience Institute for providing a unique platform to promote cross-disciplinary collaboration, and we appreciate the generous support from Dr. Stevan Harrell throughout this project. Special thanks to Dr. Xiaojun Zhang at Tsinghua University and Dr. Robert Weller at Boston University for their feedback and encouragement when Jing Xu presented part of this research. Jing Xu is also indebted to Constellate, a text analysis pedagogical platform, whose free courses informed portions of the analyses in this article. Last but not the least, we thank the editors and reviewers for their constructive comments and guidance that improved this article.
Funding statement
This research was supported by a Wenner-Gren Foundation Hunt Postdoctoral Fellowship, a National Academy of Education/Spencer Postdoctoral Fellowship and a Chiang Ching-kuo Foundation Research Grant, awarded to Jing Xu. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Competing interests
The authors declare that there are no conflicts of interest regarding the publication of this article.
Jing Xu is an anthropologist at the University of Washington. Her work examines the interplay of cultural and psychological processes in shaping the development of morality across spatial and temporal contexts, from an interdisciplinary and mixed-methods approach. Lately, her research has ventured into a new direction, the intersection of human and AI ethics from the perspective of learning. She is the author of The Good Child: Moral Development in a Chinese Preschool (Stanford University Press, 2017) and “Unruly” Children: Historical Fieldnotes and Learning Morality in a Taiwan Village (Cambridge University Press, 2024). She has published peer-reviewed articles in English and Chinese journals spanning several disciplines, including American Anthropologist, Scientific Reports, Developmental Psychology, Child Development Perspectives, Ethos, Feminist Anthropology, Evolutionary Human Sciences, Journal of Chinese History and The Sociological Review (《社会学评论》). She is currently an associate editor of American Anthropologist.
Jose Manuel Hernandez is a data scientist and statistician whose work bridges industry and academia. He is currently a Principal Data & Applied Scientist at Microsoft, where he leads statistical modeling and data science strategy on large-scale data challenges. Trained as a quantitative methodologist (PhD, Measurement and Statistics, University of Washington), his work is grounded in Bayesian modeling, causal inference and natural language processing, with a focus on questions at the intersection of technology, society and policy. He applies these methods across disciplines, with projects spanning eviction research, misinformation and labor conditions in the boxing economy. He has published and presented across fields including statistics, computer science and the social sciences, with a focus on blending methodological rigor and socially impactful research.





