4.1 Introduction
Issues that need to be considered when planning to conduct a linguistic study include the ethical implications of collecting and analysing data, the institutional requirements placed on the project and any legal requirements relevant to the proposed work. When the study involves a corpus, the aim of collecting, analysing and quoting from large amounts of data can give rise to difficult questions. Do we need to gain permission from every person involved in creating each text in a corpus? Should we anonymise all references to names and identities across millions of words of text? Is there any point in making a corpus text anonymous when a version of it may exist online which can easily be retrieved with a search tool?
In this chapter we begin with an overview of the ethical aspects of creating corpora from different kinds of data sources relevant to health(care) and then focus on two studies that involved different approaches to addressing ethical issues.
Broadly speaking, ethics in the context of research involves ensuring that no harm is caused to people when conducting a study. In healthcare research, harm does not just include negative physical consequences, as may be caused, for example, when testing a new drug. It may also include potential psychological and social harm (Demjén et al., 2023: 37).
For example, reporting or quoting from interviews or online forum posts can potentially lead to a loss of privacy or anonymity for the people involved, which can in turn cause distress, damage social relationships and reduce people’s willingness to engage with healthcare services.
In healthcare research, three main considerations, known as the ‘Belmont Principles’, are widely applied to protect the people involved in the research (Beauchamp and Childress, 2001): the first is respect for persons, the second is beneficence and the third is justice. Respect for persons involves treating people as autonomous agents and protecting anybody who may have reduced autonomy (e.g., children or prisoners). Beneficence involves avoiding harm to people and, as much as possible, maximising their well-being. Justice involves treating people equitably and making sure that the research benefits the groups who most need it or are most entitled to benefit.
Different types of data and analyses in corpus-based research on health and illness pose different degrees and kinds of ethical concerns. This will affect whether you need to apply to a relevant research ethics committee for approval for your study, and the complexity of the process involved (for a more detailed overview, see Demjén et al., 2023: 38ff.).
Minimal ethical issues are normally posed by material that is in the public domain and that was produced for wide audiences by people working in a professional capacity. This includes news reports, policy documents, websites of health-related organisations and so on. Historical materials, such as the nineteenth-century anti-vaccination literature described in Chapter 3, also fall into this category. While ethics approval from a relevant institutional committee may still be required to collect and analyse the data, the risk of any harm arising from the research is very low.
In contrast, major ethical issues are posed by material collected from private and confidential communicative settings involving individuals and their personal lives. This includes, for example, letters from consultants to patients and any recording of interactions between patients and healthcare professionals. In such cases the risks to patients in terms of loss of privacy and anonymity are very high, and obtaining ethics approval from the relevant bodies is likely to be a lengthy and complex process. As interactions from healthcare settings are also often time-consuming to collect, corpus-based studies of such material are rare. An exception is the study of a large corpus of transcripts of Emergency Department interactions discussed in Chapter 3.
User-generated content from social media platforms and online forums tends to fall somewhere in the middle with regard to ethical issues, depending on a variety of factors. In the next section, we consider health-related online forums specifically and explain how access and ethics approval were obtained for the purposes of a corpus-based study of online interactions about the experience of pain.
4.2 Ethical Considerations When Studying Online Forums: Research on a Forum Dedicated to Pain
Health-related online forums can play an important part in people’s experience of illness, as they make it possible to discuss one’s condition and concerns with others who share similar experiences while remaining anonymous. Anonymity in online interactions is associated with what Suler (2004) calls a ‘disinhibition effect’ and can be exploited for the purposes of spreading disinformation or creating conflict (Graham and Hardaker, 2017). However, anonymous online environments also make it possible for people to (i) share experiences that are too private or harrowing to discuss with family and friends; (ii) give and receive honest and disinterested information, advice and support; and (iii) create valuable bonds that improve people’s quality of life and make it easier to cope with the difficulties associated with illness (Eysenbach et al., 2004; Smith-Merry et al., 2019; Yip, 2020).
Interactions on online forums can be studied from a variety of disciplinary perspectives and by means of different qualitative and quantitative methods. Among these, corpus methods are particularly relevant, as online forums are typically sources of large volumes of machine-readable text. Analysing those texts systematically usually requires the ability to handle millions of words of data. On the other hand, the collection of large quantities of data from online forums makes it difficult, or even potentially disruptive to the forum itself, to attempt to obtain consent from individual contributors to collect forum posts for inclusion in a corpus (Hunt and Brookes, 2020: 70ff.). What are the implications of this for planning corpus-based studies of online forums?
4.2.1 Online Forums and Ethics
In order to decide whether and how to conduct a corpus study of online forum data, it is necessary to consider several interacting factors:
Access: does the forum require a members’ login to read contributions and to post contributions?
Terms and conditions of use: do participants give consent to their contributions being used for research purposes when they register on the forum and, if so, what is the nature of that consent?
Perceptions of privacy: is there evidence that participants perceive their interactions to be private?
Anonymity: to what extent and how will data collection, analysis and dissemination of results ensure the anonymity of forum contributors?
Risk of harm: to what extent could participants be harmed if their identity were to be (even partially) revealed as a result of the research?
Potential benefit: to what extent and how could participants or other related groups benefit from the research?
In addition to the academic literature (e.g., Mackenzie, 2017; Demjén et al., 2023: 38ff.), guidance on these ethical issues is provided and regularly updated by, for example, the Association of Internet Researchers (Franzke et al., 2020) and the British Association for Applied Linguistics (BAAL, 2021).
In the rest of this section, we present one solution to the ethical issues involved in analysing online forums dedicated to health: obtaining posts written by people who had previously consented to their contributions to the forum being used for research purposes. Our example involves an online forum dedicated to pain, but the same considerations would apply regardless of the health condition involved.
4.2.2 The Pain Concern Forum
The online forum we consider in this section is called the Pain Concern forum (Collins and Semino, 2024). It is run by the UK-based charity Pain Concern and is hosted by the private company HealthUnlocked. Pain Concern advocates for better services and a better understanding of the needs of people living with pain. As part of its activities, it runs a helpline and produces leaflets, a magazine and an internet radio programme. The corpus analysed by Collins and Semino consists of contributions posted on the forum between May 2012 and October 2020, for a total of 89,741 posts and 8,543,729 tokens (token count as retrieved via CQPweb).
The forum is one of more than 300 health-related online communities hosted on the website of HealthUnlocked, whose stated aim is to ‘transform individual health experiences into support, insight and understanding for others. We do this by enabling people to share personal health experiences and information online using our site (“Our Site”). In turn this provides support, aids self-management, and improves interactions with professionals, with the aim of improving day-to-day health and well-being’ (https://support.healthunlocked.com/article/148-privacy).
With regard to access, posts on HealthUnlocked are only partly public, in that non-members have a restricted view – enough to determine whether they are sufficiently interested to create an account, but not to read all the material posted on any community. Creating an account is free and gives access to all the communities hosted on the site.
As a business, HealthUnlocked operates by selling access to material posted on the site to commercial or research organisations. This is reflected in the terms and conditions for access to the site. When they sign up for an account, users are asked whether they are willing to give permission for their posts to be used for research purposes by HealthUnlocked partners. The preference expressed at first registration can subsequently be changed. The analysis of the Pain Concern online forum was carried out as part of a contractual arrangement between HealthUnlocked and the ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University, which also involved the online forum of the charity Anxiety Support (Collins and Baker, 2023; see Chapters 5, 7, and 9). As part of the contract, the researchers received an anonymised download of contributions from users who had consented to their contributions being used for research purposes at the point when the data was downloaded.
This kind of blanket consent reduces the need to consider the content of contributions for evidence of perceptions of privacy, even though researchers should be mindful of this in the analysis. As for anonymity, users create their own usernames as part of their profiles and can decide whether to share characteristics such as age, gender and ethnicity. The download from the forum that the researchers were provided with by HealthUnlocked associated each user with a numerical ID from which it was not possible to recover original usernames. Where potentially identifying information was accidentally retained in the download, the researchers removed it from any material quoted in their analyses. In addition, the use of corpus methods further reduces the chances that a forum user’s identity may be recoverable, as the findings of corpus linguistic analyses often involve aggregate information about frequencies of uses of words, phrases, collocational pairs and so on. Because of the settings of the HealthUnlocked site, it is also not possible to use search engines to trace back to the original post any verbatim quotation included in presentations or publications. Given that this corpus was obtained from HealthUnlocked, it cannot be redistributed, meaning that only researchers who paid HealthUnlocked for access to the same data could see beyond the examples and aggregate data that the CASS team is allowed to publish. Yet even where such access is paid for, the data from HealthUnlocked is pre-anonymised, as noted. While the probability that a user may be identified as a result of research on the forums run by HealthUnlocked is extremely small, it nonetheless needs to be considered. Here the topic of the forum is relevant. Pain is a highly sensitive and potentially distressing experience, but it is not strongly stigmatised.
In addition, potentially stigmatising statements (e.g., regarding suicidal ideation) were avoided in the selection of any verbatim quotations as part of the dissemination of the research. Conversely, as with all projects discussed in this book, Collins and Semino (2024) aimed to share relevant findings with Pain Concern and other relevant organisations and individuals, in order to maximise the potential benefits of the study, such as fostering better understanding of the experience of people affected by pain and improving language-based diagnostic questionnaires.
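To make the idea of a numerical ID 'from which it was not possible to recover original usernames' more concrete, the following minimal Python sketch shows one way a data provider might replace usernames before sharing a download. It is an illustration only: the source does not describe how HealthUnlocked generated its IDs, and the function names and the `username` and `text` fields are hypothetical. The sketch draws IDs at random rather than deriving them from the username (for example by hashing), on the assumption that a random ID reveals nothing about the name once the lookup table is discarded.

```python
import secrets


def assign_pseudonymous_ids(usernames):
    """Map each distinct username to a random, unique numeric ID.

    Hypothetical helper: IDs are random, not computed from the username,
    so an ID on its own carries no information about the original name.
    """
    mapping = {}
    used = set()
    for name in usernames:
        if name in mapping:
            continue  # same user keeps the same ID across posts
        new_id = secrets.randbelow(10**9)
        while new_id in used:  # redraw in the unlikely event of a clash
            new_id = secrets.randbelow(10**9)
        used.add(new_id)
        mapping[name] = new_id
    return mapping


def pseudonymise_posts(posts):
    """Return copies of the posts with 'username' replaced by 'user_id'.

    Assumes each post is a dict with 'username' and 'text' keys
    (an assumed layout, not the provider's real export format).
    """
    mapping = assign_pseudonymous_ids(p["username"] for p in posts)
    # The mapping is never returned or stored, so the link between
    # ID and username is lost once this function finishes.
    return [{"user_id": mapping[p["username"]], "text": p["text"]} for p in posts]
```

Note that a step of this kind only replaces the username field. Names or other identifying details that appear inside the text of a post would remain, which is why quoted material still needs to be checked by hand.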
When forum contributors are given the chance to opt in or out of having their data used for research purposes, as in the case of HealthUnlocked forums, a potential limitation of working with the data is that some contributions are missing. As a result, certain ‘conversations’ on the forum, where people respond to one another’s messages, may be incomplete, which makes it difficult to fully take into account the context of a forum post. Moreover, researchers have no way of knowing how many people opted out of having their data included. The findings of research are therefore limited to the posts of people who gave permission for their data to be used, as opposed to every person who posted to the forum, and this needs to be noted accordingly in any subsequent analysis.
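Although the number of people who opted out cannot be known, the extent to which conversations are incomplete can sometimes be estimated from the data itself. The following sketch assumes, hypothetically, that each post in the download carries its own identifier (`post_id`) and the identifier of the post it replies to (`reply_to`, or `None` for a thread-opening post); whether a given download includes such fields will depend on the provider. Under that assumption, a reply whose parent is absent from the corpus signals a gap in the thread.

```python
def find_orphaned_replies(posts):
    """Return the IDs of replies whose parent post is not in the corpus.

    Assumes each post is a dict with 'post_id' and 'reply_to' keys;
    these field names are illustrative, not a real export schema.
    """
    present = {p["post_id"] for p in posts}
    return [
        p["post_id"]
        for p in posts
        if p["reply_to"] is not None and p["reply_to"] not in present
    ]


# Example: post 4 replies to post 3, which is missing from the corpus.
sample = [
    {"post_id": 1, "reply_to": None},
    {"post_id": 2, "reply_to": 1},
    {"post_id": 4, "reply_to": 3},
]
orphans = find_orphaned_replies(sample)
```

Reporting the proportion of such orphaned replies alongside the findings is one way of noting the limitation described above. A missing parent may also reflect a deleted post rather than an opt-out, so the count is at best an indication.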
Another potential issue with this kind of pre-anonymised data is the lack of full access to user information. Researchers are presented with information about posters’ age, sex, and the country where they lived, for example, on the basis of what they provided when they signed up for an account. In the case of the Pain Concern forum, this information was useful in comparing the language use of different demographic groups, although researchers had no way of ascertaining the accuracy of the data provided about the posters. Additionally, as the analysis progressed, it became clear that some posters had created multiple accounts on the forum, sometimes leaving the forum, deleting their account and then rejoining later on. Because the data was pre-anonymised, it was not possible to identify every case where this occurred, so some posters would have been treated as more than one person, introducing an element of inaccuracy into our analysis. Again, such issues should be noted when reporting corpus findings.
To conclude on a practical note, the contractual arrangement that facilitated the handling of ethical issues with regard to the Pain Concern forum required the payment of a fee to HealthUnlocked, so such data is not equally accessible to all researchers. It is therefore necessary to consider whether such payment is an allowable expense in research-funding applications and whether it can be accommodated in the budget, as was the case for the grant that supported the project outlined in this section.
Overall, however, where funding is available, obtaining pre-anonymised forum data from contributors who had consented to their posts being used for research purposes makes it relatively easy to reconcile ethical considerations with the collection of large quantities of user-generated online data. With that said, it should be noted that this is not necessarily cheap. For the work undertaken by the team on two forums, a total of £17,000 was paid for access to forum data. While this is a good option for data access, its cost probably rules it out for many researchers.
4.3 Ethical Considerations When Working across Contexts and with Partners: Research on Public Discourses of Dementia
The work described previously provides, by way of a case study, insight into the kinds of ethical considerations that typically attend a study of online health-related support groups. Such ethical considerations have, as we have seen, posed particular challenges for researchers using corpus linguistic techniques who, as part of their deliberations, are required to make judgments regarding, for example, the degree to which such contexts represent public or private settings. However, corpus research on health communication of course addresses a wide range of contexts and genres, including but also going beyond online support groups (see Brookes et al., 2022). As a way of addressing some of the various ethical considerations linked to such contexts and genres, as well as to cases of research programmes that involve the participation of community and industry partners as research co-designers, we now consider the ethics underpinning the ‘Public Discourses of Dementia’ project.
4.3.1 Public Discourses of Dementia
The Public Discourses of Dementia project is a research programme which aims to identify and challenge stigma around dementia through analysis of the linguistic and visual representations of dementia, and people diagnosed with dementia, across a wide range of contexts of public communication. The contexts under examination include, among others, newspaper articles, social media texts (i.e., tweets/posts on Twitter/X), campaigns produced by charities, and interactions in online support groups. Importantly, the researchers involved in the project have also worked closely with different partner groups, including people with a dementia diagnosis, dementia charities, healthcare professionals, and representatives from the media. This partner engagement began with the design of the project, which the researchers and partners undertook together, and has continued throughout the research programme. Further work is planned that will allow this collaborative approach to continue beyond the current project. The goal of this partnership is the co-production of sets of guidelines intended to support public communicators in writing about (and otherwise representing) dementia, and people diagnosed with it, in ways that can help to reduce the stigma surrounding the syndrome while promoting genuine awareness of it.
In collaboration with these project partners the academic research team was able to determine the contexts to be studied as part of the research. As discussed earlier in the chapter, some of these contexts, as sites of public communication, posed minimal ethical issues. For example, newspaper articles and public health campaigns produced by dementia charities, as texts that are freely available and designed for public consumption, did not require informed consent to be obtained from the producers of such texts or copyright holders prior to collection for analysis. However, redistributing such data would be a different matter and might require permissions to be sought, as would sharing screenshots of copyright-protected websites or copyright-protected images featured in such texts.
On the other hand, online support groups and social media texts required greater ethical consideration. Regarding the collection of texts from online support groups, the research was, much like the project examining language around pain described previously, based on data supplied by HealthUnlocked, which, as noted, was covered by blanket consent. For the collection of data from Twitter/X, approval needed to be sought from the institutional review board of the researchers’ university and an application also had to be made for use of the Twitter/X API in order to enable programmatic access to posts on the site. Posts were then only collected from accounts that made their contributions publicly viewable (i.e., which did not require an account to access). In the case of both the HealthUnlocked data and the Twitter/X data, it was decided to remove identifying information from the corpus output (e.g., in concordance lines and extracts), to avoid supporting the easy identification of site users by readers of the academic publications in question. Our advice to researchers would be that while the underlying corpus data should remain inviolate, data presented from the corpus may, with issues of ethics in mind, be edited in a way which still allows the point illustrated to be made but does not allow for potentially sensitive information, irrelevant to that point, to be revealed. Where such editing of corpus data happens, the nature and motivation for it should be noted.
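The distinction between leaving the underlying corpus untouched and editing only what is presented can be illustrated with a short Python sketch. It is not the procedure used in the project, which is not described in that level of detail. The function name, the three patterns and the placeholder labels are all assumptions chosen for illustration, and pattern matching of this kind will not catch personal names or other identifying details in running text, which still require manual checking.

```python
import re

# Illustrative patterns only; order matters because URLs and e-mail
# addresses are replaced before bare @handles are looked for.
_PATTERNS = [
    (re.compile(r"https?://\S+"), "[URL]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
    (re.compile(r"(?<!\w)@\w+"), "[USER]"),
]


def redact_for_display(line):
    """Return an edited copy of a concordance line for publication.

    The input string is not modified, so the stored corpus stays intact;
    only the copy shown to readers is changed.
    """
    for pattern, placeholder in _PATTERNS:
        line = pattern.sub(placeholder, line)
    return line


original = "thanks @jane_doe see https://example.org/x or mail me at jd@example.com"
shown = redact_for_display(original)
# shown == "thanks [USER] see [URL] or mail me at [EMAIL]"
```

Using visible placeholders such as `[USER]` rather than silently deleting material makes the nature of the edit apparent to readers, in line with the advice that such editing should be noted.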
4.3.2 The Ethics of Co-design
Funding agencies and charitable organisations (in the United Kingdom, among other places) have increasingly required academic teams to clearly demonstrate how individuals with direct experience related to the research topic have been, or will be, actively engaged throughout the research process (Manafò et al., 2018). This involvement may cover various stages, from initial proposal development to more extensive roles in data collection, analysis and the dissemination of findings (Grand et al., 2015). In the context of health-related research, this practice is commonly known as Patient and Public Involvement (PPI). In recent years, as well as being involved in research in the aforementioned ways, these individuals have increasingly come to play essential roles in the design of research, including being part of academic teams applying for research funding (Swarbrick et al., 2019).
While such engagement can bring substantial benefits to the research process (Pizzo et al., 2015), and indeed to researchers themselves (Biggane et al., 2019), it does not come without challenges, including those of an ethical nature. Among other things, it is incumbent upon researchers to ensure the following: ‘(1) People with lived experience should feel enabled, not disabled, to take part; (2) Support and facilitation should be provided to meet the needs and abilities of the individual, not the condition; and (3) The relationship between academic researchers and those with lived experience should be based on a collaborative and reciprocal partnership’ (Swarbrick et al., 2019: 3166). The benefits of PPI can be maximised, for the research, the researchers and partners (including people with ‘lived experience’) when such involvement begins early in the research cycle (Varkonyi-Sepp et al., 2017). In the Public Discourses of Dementia project, the researchers were keen to involve partners in the research as soon as possible, beginning, as noted, with ideation and research design.
The researchers on this project therefore had to start by clarifying what was meant by ‘co-design’. They also had to set out which partners would be involved and how. Making these decisions early required initiating partner involvement early. It was also crucial for communicating openly with partners about what would be required of them. A key part of this communication related to ensuring that their needs and desires regarding their project involvement were both understood and met. Moreover, such decisions needed to be settled early for the purposes of applying for ethical approval through the institutional ethics review board. The process of making such decisions, and broadly mapping out the nature of partner involvement, was supported by consulting the work of others who had undertaken such engagement previously, to understand and potentially draw inspiration from how they did so in an ethical way (see also Sendra, 2024).
At this point, it was important to consider the particular needs of the partner groups that were to be involved in the work, especially for the individuals with a diagnosis of dementia. Rather than applying a generic approach to PPI, the project team was helpfully able to draw on a framework designed for, and on the basis of, engagement with people with dementia. In particular, they drew upon elements of Swarbrick and co-authors’ (2019) CO-researcher INvolvement and Engagement in Dementia (COINED) model. This model, which itself was co-designed with people with lived experience of dementia, sets out principles which should underpin the ways in which academics and partners (jointly termed ‘co-researchers’ within this model) collaborate for the purposes of designing and piloting materials, collecting data, understanding the findings, sharing the findings, translating findings into practice, evaluating the impacts of the research, and planning and undertaking future work. Furthermore, and importantly, the model is underpinned by ongoing consultation with co-researchers with lived experience. In the case of the Public Discourses of Dementia project, such ongoing consultation takes place through regular contact with a project advisory board comprising all key partner representatives. Where applicable, project funds are made available to board members to cover, for example, any reasonable travel and accommodation costs they might incur in travelling to meet in person with other project team members. Project funds are also made available to enable ongoing training and support for all co-researchers involved in the research.
Partner engagement in the Public Discourses of Dementia project has thus occurred throughout the life cycle of the research programme and is planned to continue beyond the end of the project too. As noted, partners actively contributed to the design of the project, including determining the shape of the planned impact programme (i.e., the co-development of communication guidelines). This consultation also helped determine which contexts of public communication would be researched, ensuring that the findings, and the resultant guidelines based on them, would represent key communicative contexts in terms of the formats that partners engaged with and which might be said to have a significant influence on shaping public attitudes towards dementia. In this way, the development of the advisory board was an iterative process, with partner ambassadors being invited on the recommendations of people with lived experience of dementia (e.g., in particular charities and advocacy groups), as well as in order to reflect the kinds of data to be studied (e.g., with the involvement of a journalist working with the BBC and another with a local newspaper).
In a practical sense, identifying and reaching out to partners may represent a challenge for the academic research team. As noted, some of the partners were approached on the recommendation of other advisory board members. In some of these cases, the partner representatives in question had offered to act as a point of contact and to facilitate initial engagement, which was arguably likely to be more effective than the academic research team simply getting in contact ‘out of the blue’. In other cases, the academic research team were able to make contact on the basis of longer-term relationships with the partners in question, often developed over many years through meetings at events such as conferences and through participation in (dementia-themed) research groups. A particular challenge that has been raised in relation to research involving people with a dementia diagnosis concerns ensuring that the individuals in question have the appropriate capacity for consent (Cacchione, 2011). In this case, care was taken to approach people with lived experience of dementia through existing networks and infrastructure through which they had already regularly participated in research, and indeed continued to participate in other research projects right up to and during the Public Discourses of Dementia programme.
The approach taken to engagement and co-design with the project partners in this programme of work was thus intended to ensure that the work was carried out in accordance with the Belmont Principles described at the beginning of this chapter. The project team respected the autonomy of the partners but was also mindful that some of the partners may have reduced autonomy (e.g., regarding issues surrounding capacity for consent in people with a diagnosis of dementia) and took special steps in this case, including by drawing on existing infrastructure at the university to make sure that the research process did not risk becoming exploitative of such individuals. This is also connected to the issue of beneficence – which, as noted, involves avoiding harm to people and maximising their well-being. In fact, this issue is at the very heart of the Public Discourses of Dementia project, which through linguistic analysis and the provision of evidence-based support for public communicators has sought to reduce the harmful stigma that surrounds dementia in public life, and to diversify the discourse and help facilitate more person-centred, life-affirming discourses in the process. Finally, the project design was also underpinned at all of its stages by the principle of justice. This, as mentioned, involves treating people equitably and making sure that the research benefits those most in need or most entitled to benefit. As the process of research co-design importantly involved people with a dementia diagnosis, the research team was able to design the project in a way that meant its aims and benefits were, as noted, responsive to the needs and perspectives of those living with dementia. 
Moreover, by continuing this engagement throughout the course of the project (and, importantly, beyond it), the team was able to ensure that such needs and perspectives also shaped how the project was carried out, while also confirming that the ‘impact’ activities undertaken at the conclusion of the academic research reached those most in need.
4.4 Conclusion
This chapter has, we hope, underscored the importance of ethical considerations in (corpus) linguistic research on healthcare communication. Ethical considerations are not merely procedural requirements but are fundamental to the integrity and societal value of research. The two case studies discussed in this chapter – the corpus-based study of an online pain forum and the Public Discourses of Dementia project – exemplify, among other things, the practical application of these ethical principles. They illustrate how ethical challenges can be navigated through, for example, careful planning, obtaining informed consent, ensuring anonymity and involving partners in the research process. These studies also emphasise the significance of patient and public involvement in enhancing the relevance and impact of research. Indeed, the importance and benefit of involving partners in the research design process cannot be overstated. Ethics in the social sciences often focuses on not doing harm. However, as we note in this chapter, it is also crucial to weigh this against the potential benefits of carrying out the research. Analysts need to ensure that their research findings can reach those who can benefit from them most, a topic we discuss in Chapter 12, which considers dissemination and impact.
Due to the larger scale necessitated by a corpus analysis, ethical considerations can appear especially daunting, particularly when collecting data related to sensitive health topics. By considering different types of linguistic data, ranging from material in the public domain to highly sensitive private communication, this chapter has also highlighted the contextual sensitivity that is required of such ethical scrutiny. While there is no one-size-fits-all solution to ethics, we hope that the case studies outlined in this chapter have helped provide some ideas about best practices while also pointing readers towards existing ethics guidelines that can be consulted. As an example, the aforementioned Belmont Principles can provide a useful framework for such reflexive ethical decision-making in research.