Introduction
As news reports of highly pathogenic avian influenza (HPAI) infections in farm animals such as cows and chickens and of transmission to humans increased in the spring of 2024, so did the “bird flu” chatter on social media outlets such as X (formerly Twitter), Instagram, and TikTok. With its proliferation in the 21st century, social media has become an important tool to monitor disease outbreaks and users’ perceptions, allowing public health officials and researchers to determine public sentiment on emergent disease threats and potential interventions.Reference Wilson, Lehmann, Saleh, Hanna and Medford1 As seen in recent public health emergencies such as the coronavirus disease 2019 (COVID-19)Reference Medford, Saleh, Sumarsono, Perl and Lehmann2–Reference Lanier, Diaz, Saleh, Lehmann and Medford9 pandemic and outbreak of mpox,Reference Cooper, Radunsky and Hanna10,Reference Leslie, Okunromade and Sarker11 analyzing social media posts on outbreaks can provide vital insight into the public’s response to infection prevention methods such as testing and vaccinations, the handling of infections already present, the spread of mis- and dis-information, and how government and politics play into current and future public health emergencies.
On June 05, 2024, the World Health Organization (WHO) confirmed the first death of a man in Mexico with H5N2 avian influenza infection,Reference Organization12 although it was later confirmed that the death was not caused by the H5N2 virus but by existing co-morbidities.Reference Organization13 Using this event as a point of reference due to the increase in media coverage, we utilized X to collect tweets focused on the topic of HPAI. By analyzing the tweets using topic modeling, sentiment, and emotion analyses, we hypothesize that we can provide the public health community and government officials useful insights into the fears, concerns, reactions, and possible mis- and dis-information distributed by X users regarding the current HPAI outbreak.
Methods
Data collection and preprocessing
We collected all English-language original tweets using keywords indicative of HPAI including the following terms: “H5N1, H5N2, bird flu, bird influenza, avian flu, avian influenza, A(H5N1), A(H5N2), and influenza A virus” using the X application programming interface v214 and the Tweepy Python library v4.14.0.Reference Tweepy15 We chose X as the social media platform of choice due to the availability of numerous Python libraries for the platform as well as the use of short texts, compared to visual media such as images or video of other platforms. The tweets were collected from June 04, 2024, to June 08, 2024, a time span which included the news of the first human fatality associated with HPAI in Mexico along with numerous news outlets and media discussing the ongoing H5N1 HPAI variant outbreak in the United States. Various metadata were also collected such as the author’s username, self-reported location, whether the author was “verified,” the author’s user description, and counts of likes, retweets, impressions, and quotations for each tweet. All data collection, processing, and analyses were conducted using the Python programming language (version 3.10.5).16
To prepare the data for analysis, we preprocessed by removing duplicate tweets, embedded URLs, emojis, common symbols such as “#” and “@,” and expanding common contractions. The plain text of the tweets was then cleaned further using the natural language processing library spaCy.17 Next, we removed common stop words and created bigrams and trigrams using the Gensim libraryReference Řehůřek18 to prepare for topic modeling.
Analyses
Topic modeling
From the Gensim library, we utilized a latent Dirichlet allocation (LDA) model estimation algorithm to perform topic modeling. Using a corpus based on the trigrams derived from our dataset, we trained LDA models comprising topic numbers from 1 to 40. With Umass coherence scores used to quantitatively determine the optimal number of topics, we ultimately chose a model with 5 topics. Combining the top 20 keywords for each topic with the respective tweets, we utilized OpenAI’s ChatGPT-419 large language model (LLM) to determine appropriate descriptions for each of the 5 topics using the following prompt: “Using the uploaded csv file, can you please give a description of the 5 topics (0-4) using the given keywords, tweets, and percentage of contribution to each topic?”
Sentiment and emotion analysis
In addition to topic modeling, the sentiment and emotion of the tweets were analyzed to provide additional insight into the public opinions about HPAI. The sentiment of each tweet was determined using the SentiStrength library.Reference Thelwall, Buckley, Paltoglou, Cai and Kappas20,Reference Hung21 With values ranging from −4 for extremely negative sentiments to +4 for extremely positive sentiments, the SentiStrength library is optimized to perform sentiment analysis on short, informal text such as those in tweets and other forms of social media. To determine the emotion present in the tweets, we used the Text2Emotion library,Reference Aman Gupta, Sharma and Bilakhiya22 which uses natural language processing techniques to determine words in the text that express emotion. The text was then categorized, based on probability scores, into 5 emotions: happy, angry, surprise, sadness, and fear.
User demographics
Although demographics such as age, sex, race, and ethnicity are generally not readily available from tweets, these data can be inferred from user profiles, usernames, profiles, and user images. Using the machine learning-based M3-Inference library,Reference Wang, Hale and Adelani23,24 we determined a user’s age range (≤ 18, 19–29, 30–39, ≥ 40), likely binary gender (female or male), and whether a user was “an organization,” meaning the user is likely not an individual person, but rather a corporate entity such as a news organization or company. A user’s ethnicity (Hispanic, non-Hispanic White, non-Hispanic Black, or Asian) was inferred using the Ethnicolr25 library. This library uses the user’s first and last name to help determine an accurate ethnicity based on the state of Florida’s voting registration data and US census data.
Ethical approval
All data included in this study were publicly available and therefore patient consent or approval from an institutional review board were not required.
Results
User demographics
During our study period, we collected 14,796 English-language tweets. After preprocessing and the removal of duplicate tweets, we used a final dataset for analyses of 13,319 tweets from 10,421 unique X users. Of the users, 8,150 (77.8%) were individual users and not an organization. Of the users identified as individuals, 5,725 (70.6%) were identified as male, 2,966 (36.6%) were 18 years old or younger, and 2,189 (27%) were 40 years old or older. We were able to identify race in only 84% of individual users (6,846) using the Ethnicolr library, and of those, 78% (5,341) were labeled as non-Hispanic White (Table 1).
a We were able to identify race in only 6,846 individual users using the Ethnicolr library.
Sentiment and emotional analyses
The majority of tweets in our dataset were negative in sentiment. Out of the −4 to +4 scale, 50.4% of the tweets had sentiments less than 0, and the entire data set had a mean sentiment score of −0.68 (1.18). Neutral tweets, those with sentiment scores of 0, accounted for 36.0% of the tweets, while positive tweets (tweets with sentiments greater than 0) only accounted for 13.6% of the dataset (Figure 1).
Of the 5 emotions that could be identified by the Text2Emotion library: happiness, anger, surprise, sadness, and fear, most of the emotions identified in the tweets were negative. Anger (36.4%), fear (29.5%), and sadness (18.5%) comprised most of the tweets exhibited by language such as “F[censored] YOUR BIRD FLU PROPAGANDA,” “Biden et al. are allowing deadly H5N1 to spread unabated! H5N1 has a 50%–60% fatality rate in people-this is catastrophic!,” and “I’m very nervous about H5N1 (and now H5N2) and I’m having the same bad feeling I had before the lockdowns…” In contrast, surprise and happiness made up 9.6% and 5.9%, respectively (Figure 2) as shown by tweets such as “Grateful for health officials in Mexico for transparency,” “Really nice map showing where H5N1 has been detect[ed] in mammals in the U.S.,” and “San Francisco leads with advanced #H5N1 surveillance in wastewater. Commendable efforts!”
Topic modeling
Utilizing the LDA algorithm and coherence measures, we chose a model with 5 distinct topics (Table 2) to describe the overall themes shown in our data set. Included in these topics was the largest topic containing 3,290 (24.7%) tweets discussing issues with infection controls, spread, and infection reporting as shown by key terms including “confirm,” “infect,” and “spread” and tweets such as “US government enhances protective measures against H5N1 virus in dairy cattle @USDA…” One topic of note (2,326 tweets, 17.5%), entitled “Concerns about Health-Related Misinformation and the Need for Accurate Information,” included key terms such as “scam,” “hoax,” “plandemic,” and “election.” These tweets demonstrated dramatic reactions to the current avian influenza outbreak such as “I wonder when there will be dancing nurses for the upcoming Bird Flu hoax? DO NOT COMPLY” and “BIRD FLU IS NOT A THING! THEY JUST WANT TO TAKE AWAY YOUR FOOD, YOUR FARMS, YOUR STABILITY, YOUR JOBS, YOUR WAY OF LIFE, YOUR FREEDOM!” Other topics in the data set include “personal reactions to a possible avian influenza crisis,” “public sentiment and readiness towards avian influenza,” and “the public’s responses to health measures and policies by health authorities.”
Discussion
As social media becomes engrained into everyday life, it has become a source of news and information for many. An estimated 50% of US adults receive their news from social media at least some of the time,Reference Liedke and Wang26 the distribution of mis- and dis-information can have devastating effects. Anti-vaccination movements, pseudoscience-based treatments, and inferior infection control efforts for infectious diseases such as COVID-19 and human papillomavirus (HPV) have spread on social media in recent years.Reference Naeem, Bhatti and Khan27–Reference Gisondi, Barber and Faust29
In conjunction to the spread of health misinformation, the use of rhetoric such as “DO NOT COMPLY,” “lock us down again,”, the “weaponization” of HPAI, and that the HPAI outbreak is a “hoax”, “plandemic,” or “scamdemic” may be fueled by the upcoming US presidential election. As was shown in the emotion analysis of the tweets, 65.9% of the tweets had evidence of anger or fear. The emotional language used in these tweets provides fodder to further spread the mis- and dis-information using negative tones and expressive language reminiscent of political propaganda. The use of this language aims to discredit information presented by clinicians, public health organizations, and government officials, who hope to protect the public from further harm.
However, social media does have positive effects. During the COVID-19 pandemic, the use of social media was helpful to spread public health information and dispel misinformation.Reference Medford, Saleh, Sumarsono, Perl and Lehmann2 Additionally, one study demonstrated the use of social media to improve negative opinions on vaccinations for HPV stemming from mis- and dis-information.Reference Kim, Schiffelbein, Imset and Olson30 Our study did find evidence of the use of the X platform to spread important medical notifications on HPAI infections, news about where the infections are taking place, and measures being taken by health officials to prevent the spread of the disease. This is evident in tweets such as “While the risk of avian influenza infecting #dairy cattle in IL remains low, State Veterinarian Mark Ernst emphasizes the importance of vigilance and biosecurity among farmers,” “USDA: H5N1 now in 80 dairy herds across 9 states,” and “Concerning to see such a large increase in the number of herds infected with #H5N1 #birdflu Re-emphasizing the importance of testing in animals and humans as well as implementing protective measures.” The use of social media, such as X, allows for public health organizations on both global and national levels such as the US Centers for Disease Control and Prevention and the WHO as well as more local organizations such as local health departments and even individual clinician groups to provide the public with fact based, scientifically sound infection prevention and vaccination information in case of further spread of the avian influenza virus.
Our study has several limitations that could limit its ability to be generalized. The first of which is that we collected only English-language tweets because the natural language processing libraries used are trained specifically for English. With the news of the first death associated with the H5N2 variant of HPAI occurring in Mexico, a country in which many speak Spanish, our ability to gain a complete picture of the public’s reaction to this event is limited. Similarly, we did not limit our search to tweets only from Mexico. Location data for tweets are generally poorly collected, with precise locations being turned off in user accounts by default resulting in approximately 1.0% of tweets having a precise geotag and only 30%–40% of tweets having some location information presented in the user profile, according to X documentation.31
Additionally, the Ethnicolr library used to identify the race and ethnicities of the X users in the data set was only able to identify the race of 6,846 users, resulting in probable undercounting. This can be caused by a lack of discernable first and last names in the X user’s profile. In 2022, ownership of Twitter changed, and the company was transformed into X. With that, changes in policies and in the number of X users could have affected how representative our data set was to represent the general public. According to recent research by the Pew Research Center, after the transition to X, 20% of US adults on the platform create approximately 98% of all tweets,Reference Chapekis and Smith32 indicating that only a small portion of the public opinion is represented by this platform.
Through the analyses of over 13,000 tweets discussing the HPAI outbreak in the spring of 2024, we were able to show that a large portion of the tweets represented feelings of anger and fear or included the dissemination of mis- and dis-information about a “plandemic” and urged others “Do Not Comply” with the information presented by government officials about the outbreak. A bright spot in the analyses did show that information important to public health was also presented on X via tweets indicating locations of outbreaks among dairy and poultry farms and reminders urging workers to use preventative measures when working near affected livestock. Future monitoring of social media posts regarding outbreaks of infectious diseases will further deepen our knowledge of how the public will and does respond to public health emergencies.
Acknowledgments
All authors contributed to the design, review, and editing of the manuscript.
Financial Support
This study was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health (UL1 TR003163) (C.U.L.).
Competing Interests
All authors report no conflicts of interest relevant to this article.
Research Transparency and Reproducibility
Due to restrictions from X Corp and the X Developers license, the availability of our data set is limited. Tweet IDs/user IDs are available from the authors with written permission from X Corp.