
Statistical consulting guidelines for new researchers in psychiatry and mental health – beyond ChatGPT

Published online by Cambridge University Press:  28 November 2024


Summary

Until recently, statistical consultants did not have to worry about being replaced by artificial intelligence. There was no statistical analogue to ‘Dr Google’ before ChatGPT arrived on the scene. Although ChatGPT (most of the time) adequately responds to basic queries such as the assumptions of different statistical tests or summarises relevant manuals on statistical software providing clear instructions with point-and-click software such as SPSS, there are many important aspects of statistical consulting that ChatGPT does not cover. This tutorial article is about these aspects: a summary of what statistical consulting is, its purpose and possible settings during the empirical research cycle, the role and responsibilities of the consultant and the client, how to ensure a good consulting experience, how to prepare for a consulting session, typical questions and more. The article was written for researchers who are considering contacting a statistician for the first time and aims to facilitate a good and fruitful consulting experience for all parties involved.

Type
Research Methods
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press on behalf of Royal College of Psychiatrists

LEARNING OBJECTIVES

After reading this article you will be able to:

  • explain the various roles of statistical consultation in the research process

  • identify the points in the research timeline at which to consult a statistician and prepare the right questions for each consultation (initial/progress/results meeting)

  • discuss the connection between design, analysis and results to obtain a valid conclusion.

Although luckily there is no stigma in asking for statistical or methodological advice, statistical consultants often come across new clients who are hesitant and insecure about seeking help. As these negative feelings could be caused by not knowing what to expect from statistical consulting – to our knowledge, there are no publications available for prospective clients of statistical consulting – the purpose of this article is to introduce statistical consulting to a broad audience. Specifically, our aim is to provide guidance on why, when and on what to consult, with a particular focus on junior researchers and clinicians in psychiatry, clinical psychology, psychiatric epidemiology and related fields who want to set up a research project.

In the following sections, we will give a comprehensive overview of different aspects and topics related to statistical consulting, including what statistical consulting is, how to choose a consultant, when to schedule meetings and how to prepare for the first meeting, which administrative, ethical and organisational topics to consider for a mutually satisfactory working relationship, and some typical questions and issues related to statistical consulting.

What is statistical consulting?

A statistical consultant can be described as a person who, usually in exchange for payment, offers specialised advice in the field of statistics to clients who are not working in this field. Statistical issues that clients bring in for consultation can pertain to any of the three stages of the empirical research cycle in which a statistician is typically involved: (1) the problem refining and study design phase, (2) the data analysis phase and (3) the results, interpretation and discussion phase (Fig. 1). In summary, one could think of a statistical consultant as a problem solver (Chatfield 1995). The core process of statistical consulting can be described as understanding and translating a client's research problem into a statistical model, performing the analysis to obtain a statistical solution and finally translating the statistical solution into a solution that the client understands (Fig. 2).

FIG 1 The empirical research cycle, highlighting the stages at which the statistical consultant typically is involved.

FIG 2 Derr's model of the process of statistical consulting (based on Derr, 2000).

What is a good result from the consultation?

Generally speaking, the aim of statistical consulting is to obtain from the consultant a satisfactory solution that is as good as possible given the constraints of the project. This involves two aspects, which may conflict with each other: (a) a satisfactory solution from the client's perspective and (b) a good solution from the consultant's perspective. Therefore, these two aspects should be balanced in most consultations. The optimal solution should be:

  • timely: a good solution within the given time period is preferred to a very good solution that takes far longer than the project deadline allows; the solution should be tailored to the needs of a client;

  • high quality: a solution of high scientific quality is usually preferred to one of lower scientific quality, but it is important that the statistician is able to explain the solution in the client's language; if the client does not understand the solution sufficiently, it will not be used and the client may not be satisfied;

  • not too complex: in contrast to a complex statistical model, the analyses for a simple model are easier to perform and results might be better generalisable to new data – analyses for a complex model might be more prone to errors, and results may overfit the data at hand: ‘keep it smart and simple’ (KISS) is a principle that is highly recommended for a consultation process; and

  • directional, offering clear next steps for the client: to develop a good solution, the statistician and the client need to have an idea which steps the client will perform after obtaining the solution; for example, based on the prediction results provided by a regression model, the client may want to assign patients to either treatment A or treatment B. If this is the goal of the analyses, the statistician may recommend validation of the results in a new sample before the model is applied to real-world situations.

Roles and responsibilities of statistical consultants

A statistical consultant's responsibilities might range from being a helper, to acting as a coach on statistics, to taking the lead on the statistical issues of a project.

As a ‘pair of hands’, the statistician receives clear instructions from the client on what has to be done; collaboration is usually not required and communication between the client and statistician is restricted during the project (Bisgaard 2005). In such a situation, the tasks are typically simple and may involve executing ‘dirty’ work that the client finds unpleasant (such as constructing a large table with descriptive data or performing numerous cross-tabulations).

As a statistical coach, the consultant takes on a collaborative role: the problems they must address require teamwork; the technical and human aspects of problem-solving and implementation of solutions are frequently given equal weight; the consultant uses their expertise to assist clients in solving problems rather than solving them themselves.

As an expert consultant, the statistician brings special skills to the client's problem: the client or other members of the organisation provide the consultant with data and information; the consultant develops a solution to the problem, analyses the data and formulates a conclusion; at the end of the project, the consultant hands over the ‘deliverable’ (e.g. report or presentation – see ‘The deliverables’ section below).

Some of these roles can be filled by either a human consultant or a chatbot such as ChatGPT and we explore this further in the next section.

From project to project, a statistician's role (be it human or not) can change. It is crucial that the consultant is informed of the client's needs and expectations (Kenett 2006), preferably before the project officially starts. Knowing these roles can help clients to formulate their needs and expectations clearly and communicate them to the statistical consultant as early as possible in the consulting process.

The human consultant versus ChatGPT

Recently, some of our clients have come to our statistical consulting service after admittedly having first tried to solve the problem with ChatGPT. Perhaps not coincidentally, we have also seen a decrease in basic questions from students (such as how to perform a t-test or what the assumptions are of a linear regression). Clearly, ChatGPT is filling a need for (basic) statistical assistance. In Table 1 we compare ChatGPT with a human consultant, summarising what each can do, their strengths and weaknesses, and ethical aspects. The table was a collaboration between us and ChatGPT, and it turned out that both parties came up with many of the same ideas. The prompt given to ChatGPT (https://chatgpt.com) was ‘What is the difference between ChatGPT and a statistical consultant when answering research questions?’. The entire conversation can be found in the Supplementary Material available at https://doi.org/10.1192/bja.2024.61.

TABLE 1 Differences between ChatGPT and a human statistical consultant

a The order of most of the strengths of ChatGPT corresponds to the weaknesses of the human consultants and vice versa.

In summary, we can see from Table 1 that for some basic queries and instructions it may be helpful to turn to ChatGPT, but we recommend caution and double-checking the answers obtained. We do not recommend providing ChatGPT with sensitive data or using it to try to solve complex questions. In ChatGPT's words, ‘ChatGPT is a powerful tool for quick information and general guidance, while a statistical consultant provides the expertise, personalised attention, and detailed analysis required for complex and high-stakes research projects. Using both resources strategically can maximise the efficiency and quality of your research’ (see the Supplementary Material available at https://doi.org/10.1192/bja.2024.61).

How to choose a consultant (when a choice can be made)

Questions to ask

In many ways, consulting a statistician is not unlike a first meeting with a therapist/mental health professional. For this reason, when a choice of different consultants is available, there are several angles to consider by asking the following three main questions.

‘Do I have good and open communication with the statistical consultant?’

Unfortunately, although this is one of the most important aspects of successful statistical consultations, it is impossible to know in advance. Nevertheless, good communication is key in statistical consulting and both parties must make an effort to communicate clearly. In statistical consulting courses, students are taught that a consultant should be available, affable and able, in that order (Hand 2008). For example, a less experienced statistician who nevertheless can solve the client's problem and communicates well with the client may be a better choice than an expert who could only meet with the client in 2 months’ time and has difficulties in relaying information on the client's level. However, as with all professions, large individual differences exist between statisticians regarding communication preferences and style.

In the case of poor communication with the statistician in a ‘multiple appointments’ setting, there are a few things a client can do. First, informing the statistician at the first meeting (or when making the appointment) about the client's own expertise with methodology, statistics and research in general helps the statistical consultant to set the level of difficulty in explaining statistical terminology. Second, providing feedback after the appointment regarding communication and level of clarity can resolve initial misunderstandings and miscommunication issues. Third, if the problem persists beyond the first meeting, sometimes referral to a different consultant is the best solution. Unless there is a complete breakdown of communication, we recommend discussing this openly with the original statistical consultant. They could recommend a different consultant with relevant expertise and a different communication style.

‘Does the statistical consultant have experience with the chosen methodology?’

This is relevant if the research question is already well-defined and it involves non-standard statistical methods. Looking up the statistical consultant's expertise or previous work may be helpful in choosing a consultant. Although a common core of statistical concepts and methods exists, there is such an abundance of statistical methods and software that on occasion a statistical consultant cannot give an immediate answer or refers the client to a different statistician. It is also not uncommon that statisticians consult their colleagues regarding complex methodological questions encountered during consulting. Sometimes, complex questions in consultation sessions pave the way for new methodological research or methods, often in collaboration with the client. For example, in Pazmino et al's (2021) study, the clinicians wanted to include patient-reported outcomes in the standard disease activity measure of patients with rheumatoid arthritis. For this an exploratory factor analysis (a dimension reduction method that allows a larger number of variables to be dealt with simultaneously) was needed that could also deal with missing data and multiple measurements from the patients. Such a method did not exist and was developed after changing an ad hoc consultation session into long-term collaboration between the statistician and the rheumatology team. In addition to the clinical paper (Pazmino 2021) a methodological paper was published (Lovik 2018), and the collaboration resulted in several other conference papers and publications.

‘Does the statistical consultant have experience with research in my field or topic?’

It is not uncommon for statisticians to specialise to some extent. A typical example could be a statistician specialising in cancer research. This could happen because of personal interest, institutional or company requirements or other reasons. Expertise in the client's topic can help with communication and provide better support for the project, since the consultant may be aware of typical research designs, problems and questions related to the field or topic. However, by preparing the meeting well (as described below in ‘Why is it important to consult a statistician before data have been collected?’), the need for knowing the topic can be mitigated.

Additionally, it should be noted that statistics is a broad field with numerous applications, therefore the approach of statisticians may differ depending on their orientation. For example, a biostatistician who works mainly with medical doctors and is trained in methods applicable to randomised controlled trials may consider a different approach compared with a psychometrician who works with social scientists and focuses on survey methodology and latent variable models. These differences influence their (a) vocabulary (for example, outcome or response in medical research is often called a dependent variable in psychology), (b) preferred choice of variable (a continuous variable versus a binary variable based on a specified cut-off), (c) preferred methodological toolbox (psychiatric epidemiologists might prefer stratified regression analyses whereas a psychologist might choose a moderation analysis) and (d) commonly used statistical software (for example, in medicine SAS and Stata are very common whereas psychologists often prefer SPSS). Experienced statisticians usually are at least somewhat familiar with the main methods outside their subfield and will adapt to the client.

The importance of the choice

For one-off consultations these three considerations may be of lesser importance, but for longer projects and settings such as the ‘ideal scenario’ we describe in the ‘Multiple appointments’ section below, the choice of the right collaborators can ensure a more efficient and pleasant working experience.

Additionally, for ‘return customers’, that is, those who have had previous experience with consulting, earlier experience with the same consultant could be helpful in making a choice. It should be noted that if the questions relate to the same project, some consulting groups prefer to keep the project with the same consultant. However, if the meeting is about a new project, switching consultants is relatively straightforward.

When to make a first appointment and how to prepare

When to plan the first appointment

We strongly recommend that the client consults the statistician for a first appointment in the design phase of the project, that is, after the research problem has been defined but before data collection begins. The statistician can then be consulted to assist in:

  • refining or reformulating the research question in such a way that it can be statistically answered

  • choosing an appropriate study design, including the number of measurements

  • choosing between several types of measurement instrument

  • calculating the minimum sample size needed.

In addition, the statistician can assist in writing an ethics application, especially the statistical analysis plan.

It is important that the research problem is clear before contacting the statistical consultant. In addition, it is the responsibility of the client to check whether the research problem is innovative and adds to the literature in the field (note that this is not needed for replication studies). If the client has not yet performed a literature search or the problem to be studied is not clear, it is better to wait and spend more time on problem definition.

Why is it important to consult a statistician before data have been collected?

Based on our experience, there are instances when the design or measures utilised in a study make it more challenging or even impossible to provide an answer to a research question. Examples that we have encountered in practice are:

  • in a single-group pre-test/post-test design, it was not possible to assess the efficacy of a treatment

  • in a cross-sectional design measuring different age groups, it was not possible to assess the influence of age over time

  • a randomised controlled trial with 15 persons in each treatment group was underpowered to assess a medium-sized treatment–covariate interaction effect

  • in a within-person design without varying the order of three different treatments, it was impossible to disentangle the effect of the order from the effect of each treatment

  • when a continuous variable (e.g. age) was measured only in categories (e.g. between 8–11 years; 12–15 years; etc.) it was often more difficult to assess the influence of the variable than when age was measured numerically (e.g. in years or days)

  • when measuring a biomarker (such as cortisol) only once a day and at different time points, it was very difficult to assess the effect of the biomarker.
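The underpowered-trial example in the list above can be made concrete with a rough calculation. The sketch below is a simplification under stated assumptions: it uses a normal approximation for a plain two-group comparison of means (not the treatment–covariate interaction mentioned above, which would typically need a larger sample still), it takes ‘medium-sized’ to mean a standardised difference of d = 0.5 (Cohen's convention), and the function names are our own illustrations. A statistician would normally use dedicated power software and tailor the calculation to the actual design.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation; an exact t-test calculation gives a slightly
    lower value when the groups are small.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def approx_n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate group size needed to reach the requested power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


# Assumed inputs: d = 0.5 ("medium"), 15 participants per group, alpha = 0.05.
power_15 = approx_power(d=0.5, n_per_group=15)
n_needed = approx_n_per_group(d=0.5)
print(f"Approximate power with 15 per group: {power_15:.2f}")
print(f"Approximate n per group for 80% power: {n_needed}")
```

Under these assumptions the power with 15 participants per group stays below 30%, whereas reaching 80% power requires roughly four times as many participants per group. This is the kind of gap that is cheap to discover in the design phase and impossible to repair after data collection.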

Most frequently, a first appointment is scheduled in the data analysis phase of a study. The statistician advises on the analysis strategy and (optionally) performs the actual data analyses. Sometimes, the statistician advises the client to reformulate the research question or hypotheses, for example because of a mismatch between hypotheses and study design (Box 1: Case A). In addition, a statistician may adapt the analysis strategy to answer the research question (Box 1: Case B).

BOX 1 Reformulating a question or analysis strategy in a 2 × 2 factorial design: case examples

Both cases are based on real situations that I (Elise Dusseldorp) encountered as a consultant in practice. I have changed the research topic, so it is not possible to trace back to the actual clients. The topic described is inspired by Lader & Bond (1998).

Case A

An early-career researcher came to me with a question about how to perform an analysis with predefined contrasts. She explained that her data were from a randomised controlled experiment with 4 conditions, and she was interested in the difference in the level of anxiety between the conditions. She hypothesised that the average anxiety in conditions 2, 3 and 4 (with some intervention) would be lower than in condition 1 (no intervention); and that condition 4 (with a combined intervention) would result in the lowest level of anxiety. She asked me how she could test this with predefined contrasts. When I asked her what type of interventions the patients received, she explained that it was a pharmacological intervention, a psychological intervention, both or nothing. It turned out she had a perfectly balanced 2 × 2 factorial design, varying the factor ‘pharmacological intervention’ and the factor ‘psychological intervention’. I asked her if she wanted to know the answers to the following questions:

  • Does the pharmacological intervention have an effect on anxiety?

  • Does the psychological intervention have an effect on anxiety?

  • Does the effect of one intervention depend on the effect of the other (in other words, is the combination of both interventions more effective than the single interventions)?

Indeed, those were the very questions she wanted to answer, and she changed her research questions and hypotheses accordingly. For a visualisation of the change in design, see the top part of Fig. 3.

Case B

An experienced researcher performed a randomised experiment with a 2 × 2 factorial design resulting in 4 conditions (groups). He wanted to know whether (a) the pharmacological intervention (INT A) is effective (condition 2 v. 1), (b) the psychological intervention (INT B) is effective (condition 3 v. 1) and (c) the combination of both interventions (INT A + INT B) is effective in reducing anxiety (condition 4 v. 1). He came to me to check his analysis plan. He proposed performing an analysis of variance (ANOVA) with condition as a factor with 4 levels (Fig. 3, left-hand side: 1-variable setting). He wanted to use post hoc pairwise comparisons to test whether the average anxiety level is different (using the comparisons above).

It took some time to explain to him that this analysis strategy does not answer his question: when the test of condition 4 v. 1 turns out to be statistically significant, he cannot conclude that the combination is better than a single treatment (Fig. 3, right-hand side). I proposed an alternative strategy that matched the design better (Fig. 3, left-hand side): I suggested he perform an ANOVA with 2 factors: one representing ‘pharmacological intervention’ (INT A, yes or no) and one representing ‘psychological intervention’ (INT B, yes or no). In the analyses, the two main effects and their interaction effect are included (i.e. a full factorial design), which allows testing of the hypotheses described above. With the interaction effect, he could test whether a synergistic effect was present: whether the combination had a larger effect than a single intervention would have.

FIG 3 One- and two-variable setting and analytic strategies for each. Cond., condition; INT, intervention; ANOVA, analysis of variance.
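The difference between the pairwise comparisons and the factorial contrasts in Case B can be illustrated with a few lines of arithmetic. The sketch below is not the analysis from the actual consultation: the cell means are invented anxiety scores, the function name is hypothetical, and only the contrasts between means are computed (a real two-way ANOVA in a statistical package would also use the within-group variability to test them).

```python
def factorial_contrasts(m_none, m_a, m_b, m_both):
    """Simple and factorial contrasts for the cell means of a balanced 2 x 2 design.

    m_none: condition 1 (no intervention)
    m_a:    condition 2 (INT A only)
    m_b:    condition 3 (INT B only)
    m_both: condition 4 (INT A + INT B)
    """
    return {
        "cond2_vs_1": m_a - m_none,
        "cond3_vs_1": m_b - m_none,
        "cond4_vs_1": m_both - m_none,
        # Main effect: effect of one factor averaged over the levels of the other.
        "main_effect_A": ((m_a - m_none) + (m_both - m_b)) / 2,
        "main_effect_B": ((m_b - m_none) + (m_both - m_a)) / 2,
        # Interaction: does the effect of A differ with and without B?
        "interaction": (m_both - m_b) - (m_a - m_none),
    }


# Hypothetical mean anxiety scores (lower is better).
additive = factorial_contrasts(m_none=20, m_a=16, m_b=17, m_both=13)
synergistic = factorial_contrasts(m_none=20, m_a=16, m_b=17, m_both=9)

for label, result in (("additive", additive), ("synergistic", synergistic)):
    print(label, "| 4 v. 1:", result["cond4_vs_1"],
          "| interaction:", result["interaction"])
```

In both invented scenarios condition 4 is clearly better than condition 1, so the comparison of condition 4 v. 1 cannot distinguish them. Only the interaction contrast differs: it is zero when the two effects simply add up and non-zero when the combination does more than the sum of the single interventions.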

In rare instances, the first appointment is made when a review of a manuscript has been received that criticises the analysis strategy. Then, the statistician may advise a different strategy, which may have serious consequences for the study: redoing the analyses, rewriting the results, new interpretation of the results and new discussion. Such situations often can be avoided if a statistician is involved earlier in the project.

How to prepare for the first meeting

Typically, after introductions and small talk to put a potentially nervous client (or consultant) at ease, the first thing statistical consultants usually ask about is the background and aims of the project in broad outline, to grasp the context of the study. We suggest preparing information on the following topics:

  • background: what the project is about in broad outline and without jargon

  • setting: type and setting of the study, to assess whether the aim is to answer a causal question or establish associations

  • research questions and/or hypotheses

  • variables: not just ‘depression’ but which instrument (e.g. Patient Health Questionnaire (PHQ-9)) and variable type (e.g. with the PHQ-9: discrete continuous, using the total score; binary, with a set cut-off score, such as ≥10; or ordinal, using the five categories of symptom severity, from ‘absent’ to ‘severe’), perhaps also some descriptives or visualisations such as graphs of the main variables

  • planned statistical analyses (if any)

  • specific questions for the consultant.

Note that not all these topics are always applicable or even available, and some of them could form the goal of the initial meeting. For example, if the goal of the meeting is to design the study, the variables to be collected may not have been defined yet. If the variables are not yet selected, including how the outcome is measured, the statistical analyses also cannot be specified. Nonetheless, a short summary of the background is always appreciated. Some clients bring graphs and other illustrations, or even a brief presentation with slides to explain their research.
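To illustrate why the ‘variables’ item above asks for the variable type and not just the instrument, the sketch below derives three different variables from the same PHQ-9 item scores. The ≥10 cut-off comes from the text; the five severity bands (0–4, 5–9, 10–14, 15–19, 20–27) are the commonly used PHQ-9 bands and are an assumption here, as are the middle category labels, the function names and the example respondent.

```python
# Upper bound of each band and its label; the bands and the middle labels
# are assumptions, only 'absent' and 'severe' are named in the text.
SEVERITY_BANDS = [
    (4, "absent"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


def phq9_total(item_scores):
    """Total score (0-27): treated as a discrete, roughly continuous variable."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 needs nine item scores, each 0, 1, 2 or 3")
    return sum(item_scores)


def phq9_binary(total, cutoff=10):
    """Binary variable: True when the total reaches the chosen cut-off."""
    return total >= cutoff


def phq9_severity(total):
    """Ordinal variable: one of five ordered severity categories."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return label


# Hypothetical respondent.
items = [2, 1, 2, 1, 1, 2, 1, 1, 0]
total = phq9_total(items)
print(total, phq9_binary(total), phq9_severity(total))
```

Each of the three versions calls for a different type of model (for example, linear, logistic or ordinal regression), which is why the consultant will want to know which one the research question refers to.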

These preparations can greatly enhance the success of a first consultation. In subsequent meetings a short summary to remind the consultant about the project and what was discussed can be quite useful. An overview of what has happened since the last meeting and what the researcher did in between meetings also makes the collaboration run more smoothly.

Multiple appointments

When the client has the opportunity to plan multiple appointments with the statistical consultant, a logical timeline is that these take place during the following three phases of the project:

  1. the design and/or refining the problem phase

  2. the data analysis phase, and

  3. the results, interpretation and discussion phase,

which is in line with the empirical research cycle shown in Fig. 1. Meetings during the three phases can be limited to one appointment only or a series of appointments, depending on the project and the agreement made between the client and the consultant. For example, if the client is new to data analysis, several meetings during the data analysis phase for the consultant to guide the client through the analysis could be convenient. Such arrangements are not uncommon when the research group has either a long-term collaboration with or access to a (staff) statistician within the same institution/company.

An important aspect of planning a research project with multiple appointments with the statistical consultant is the optimal use of the project time span and making the most of the time between appointments. For the former, good planning during the study design phase and open discussion of the client's level of expertise and where assistance may be needed are essential. This leads to the latter, i.e. the time between appointments, which could be used to learn statistics (for example, if the client is new to research). To spend this time most usefully, the client and the statistician could devise a strategy together outlining which aspects, methods and software the client could best focus on. For example, if the client is working with online surveys, it is less productive (in the short term) to spend time on learning methods for analysing clinical trials. Another example of using waiting time optimally is to plan the initial data analysis (when inspection of the data occurs) before the consultation during the data analysis phase takes place. This way, data quality issues (such as missing data), data management (such as the creation of new variables) and other relevant problems can be anticipated and discussed more easily. A good starting point and framework for initial data analysis is a paper by Huebner et al (2018). This paper guides the reader through the steps of initial data analysis, providing clear explanations and a ready-to-use toolbox, also explaining principles and ethical issues and pointing out pitfalls for first-time data analysts.
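As a very small illustration of what inspecting the data before the data analysis meeting can involve, the sketch below counts missing values and flags implausible values in a toy data-set. It is not the toolbox from the paper cited above: the records, variable names and plausible ranges are all hypothetical, and a real initial data analysis covers considerably more ground.

```python
# Hypothetical toy records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "phq9_total": 12},
    {"id": 2, "age": None, "phq9_total": 7},
    {"id": 3, "age": 51, "phq9_total": 31},  # impossible: PHQ-9 totals run 0-27
    {"id": 4, "age": 29, "phq9_total": None},
]

# Plausible ranges per variable (assumptions for this sketch).
VALID_RANGES = {"age": (18, 100), "phq9_total": (0, 27)}


def missing_counts(rows, variables):
    """Count missing (None) values per variable."""
    return {v: sum(1 for row in rows if row[v] is None) for v in variables}


def out_of_range(rows, ranges):
    """List (id, variable, value) for every non-missing value outside its range."""
    problems = []
    for row in rows:
        for variable, (low, high) in ranges.items():
            value = row[variable]
            if value is not None and not low <= value <= high:
                problems.append((row["id"], variable, value))
    return problems


missing = missing_counts(records, VALID_RANGES)
flagged = out_of_range(records, VALID_RANGES)
print("Missing values per variable:", missing)
print("Out-of-range values:", flagged)
```

Bringing a summary like this to the meeting lets the client and the statistician spend the appointment on how to handle the problems rather than on discovering them.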

In Fig. 4, we provide a scheme for an ideal scenario with three meetings planned before, during and at the end of the project's life cycle, with the relevant phase as well as the aim and scheduling of each meeting.

FIG 4 The three-meetings scenario.

Beyond the usual purpose, the objective of the initial meeting is also for the consultant and the client to get to know each other, make agreements about the long-term partnership and create a shared vocabulary incorporating elements from both the primary field of the client (e.g. psychiatry) and statistics. For this reason, we recommend having this meeting in person if possible.

Negotiating a satisfactory exchange

One of the most difficult parts in the consultation process is the aligning of expectations (Derr 2000). It is important to establish early on the roles and responsibilities in the research project of the client, of the statistical consultant and possibly of other persons from the organisation who will be involved. Issues that need to be discussed concern: the division of tasks, authority and participation in decision-making; the mode and frequency of communication; the deliverables; the timeline of the project (including go/no-go decisions within this timeline); and ownership rights. For each of these issues, we provide some recommendations or questions that need to be clarified and an example.

Division of tasks, authority, participation in decisions

  • Will the consultant perform the analyses? If so, see the next section for which steps to take.

  • Does the client need to be able to perform the analyses? If so, which software will be used?

  • Is the client empowered to make decisions or does the client need to discuss this with people higher up in the organisation?

  • Will the consultant be involved in the writing of a manuscript? If so, which parts?

Mode and frequency of communication

  • Are meetings in person or online? We recommend having at least one meeting in person (see ‘Multiple appointments’ above).

  • Are follow-up questions possible by e-mail?

  • Are regular meetings needed? Planning a series of meetings ahead can be very efficient in a longer project with multiple parties involved.

The deliverables

Examples of deliverables are:

  • a PowerPoint presentation for a conference

  • a manuscript for a peer-reviewed journal

  • supplementary material containing the data analysis code, so that the analyses can be reproduced

  • a read-me file with the input and output data of the analyses

  • an interactive web-based application of a statistical model (e.g. a Shiny app).

The timeline of the project

Ideally, the client has well-defined timelines for all project activities and deliverables, along with leeway in case something goes wrong. Sometimes a larger project is divided into smaller parts, each having a certain time frame. The outcome of one part of the project may be needed for a later part of the project. For example, in one project a statistical consultant was involved in creating a prediction model for job reuptake. The predictors used in the model were created from a newly developed questionnaire. The first part of the project concerned validating the questionnaire using confirmatory factor analysis. The factors resulting from this analysis were used as predictors in the model of the second part of the project. Before starting, the client was uncertain about the reliability and validity of the questionnaire. The client realised that if the outcome of the confirmatory factor analysis was disappointing, it would not be wise to continue with the second part. Therefore, a go/no-go decision was planned after the first part of the project. Only if the outcome of the first part was satisfactory would the consultant continue with the second part.

Ownership rights, acknowledgement and order of authors

Academic institutions ensure that intellectual property is properly organised. Each university has its own legal department and rules for research projects. In the context of inter-university cooperation, ownership rights to data, code, analysis results and final reports are frequently outlined in contracts. If this is lacking, researchers (both clients and consultants) may encounter unexpected situations. For example, a consultant might share the results of a project with others while the client still wants to publish about them.

In the case of a manuscript for a peer-reviewed journal, the ownership may be transferred to the journal. However, the order of the authors often reflects their contribution to the project. It is advantageous if the client decides ahead of time what the order of the authors will be, and bases that order on each author's (anticipated) commitment to the project. It is crucial to distribute authorship fairly. In the hierarchical structure of most universities this may be difficult to achieve. For example, in one project the client (a PhD candidate) will write the paper and be listed first. The promotor expects to be listed last, but the co-promotor wants to be listed second. This leaves the statistical consultant listed third, after the co-promotor, even though the client expects weekly guidance from the consultant. We encourage the current practice of many journals to specify explicitly the contribution of each author. If a consultant has given only incidental advice to the client, the consultant can instead be credited in the paper's acknowledgements section.

A statistical consultant may serve as a ‘quality gatekeeper’. In some situations, the project has already been finished, the paper has been written and the statistician is asked only to briefly review the paper and approve it. To show that a statistician has approved the work, the consultant is also added as an author. Obviously, we do not recommend this practice. There is little room for the consultant to really contribute and even if the consultant sees a flaw in the analysis, there is often no time left to re-analyse the data and rewrite the paper.

How to prepare the data if the statistical consultant will perform the data analysis

Assuming an agreement has been reached between the client and the statistician that the latter will perform some (or all) of the data analysis, there are a few steps that can make the collaboration go more smoothly.

First, before any data are transferred, a clear (and preferably written) agreement is needed about who has access to what data, for what purposes and during what time period. It should be noted that in many institutions such a data transfer agreement is mandatory, not optional. However, as such agreements are usually between institutions, an agreement may not be compulsory if the consultant works for the same institution or company (although it is still recommended). Before agreeing to share the data, the client should make sure that transferring the data is allowed (and legal) and feasible for all involved parties. Note that drafting and signing such an agreement can take time, and this time can be used to prepare the data and the relevant documentation for the data transfer.

Once all administration has been completed, data can be transferred. The client should ensure that all unnecessary information is removed. In most cases, data should also be anonymised or at least pseudonymised. Consent or confidentiality forms and ethical approvals could supplement the data, together with a separate document explaining each variable (variable codes, names and labels) and the coding of each value. For example, if ‘sex’ is coded with 0 and 1, the accompanying document should state which value denotes men and which denotes women. Such a document can also provide information about how missing values are coded.
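The preparation steps described above can be sketched in a few lines of code. The following is a minimal illustration in Python, not a prescribed workflow: the column names, the example records, the missing-value code (-99) and the helper functions `pseudonymise` and `undocumented_variables` are all hypothetical and invented for this example. It removes a directly identifying column, replaces the original identifier with a random pseudonym (the key table stays with the client) and checks that every variable that will be shared is described in the codebook.

```python
import csv
import io
import secrets

# Hypothetical raw export; all names and values are invented for illustration.
RAW_CSV = """patient_name,patient_id,sex,questionnaire_total
A. Jansen,1001,0,12
B. de Vries,1002,1,-99
"""

# Codebook: one entry per variable that will be shared with the consultant,
# giving its label, the meaning of each coded value and the missing-value code.
CODEBOOK = {
    "pseudo_id": {"label": "Pseudonymised participant identifier",
                  "values": None, "missing": None},
    "sex": {"label": "Sex",
            "values": {"0": "male", "1": "female"}, "missing": None},
    "questionnaire_total": {"label": "Questionnaire total score",
                            "values": None, "missing": "-99"},
}


def pseudonymise(rows, id_column, drop_columns):
    """Replace the identifier with a random pseudonym and drop identifying columns.

    Returns the rows to be shared and the key table (original id -> pseudonym).
    The key table must stay with the client and is never transferred.
    """
    key_table = {}
    used = set()
    shared = []
    for row in rows:
        original_id = row[id_column]
        if original_id not in key_table:
            token = secrets.token_hex(4)
            while token in used:  # guard against the (unlikely) duplicate token
                token = secrets.token_hex(4)
            used.add(token)
            key_table[original_id] = token
        new_row = {"pseudo_id": key_table[original_id]}
        for name, value in row.items():
            if name != id_column and name not in drop_columns:
                new_row[name] = value
        shared.append(new_row)
    return shared, key_table


def undocumented_variables(rows, codebook):
    """List the variables present in the data but absent from the codebook."""
    present = {name for row in rows for name in row}
    return sorted(present - set(codebook))


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
shared_rows, key_table = pseudonymise(rows, "patient_id", {"patient_name"})

# An empty list means that every shared variable is documented.
print(undocumented_variables(shared_rows, CODEBOOK))
```

Note that pseudonymisation of this kind does not make the data anonymous: whether the remaining variables could still identify a participant has to be judged separately, in line with the applicable data protection rules.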

An exception to this setting can occur with biobank and register-based studies, where data are kept in a secure environment in which all data analyses are performed. In such situations, after completing the necessary administrative work, the statistician is granted access to the secure environment to perform the (often predefined) analyses. The advantage of such settings is that ethical and data protection guidelines are followed strictly, data are often already cleaned and of high quality and working in the secure environment requires good preparation of the data analysis.

Some typical topics and questions

Although the reasons for statistical consultations are many and even experienced consultants are regularly confronted with surprising questions and topics, there are some typical questions, topics and reasons for seeking statistical consultation. Many of these are time-dependent, related to increased awareness of certain methodological issues (such as multiple comparison corrections or dealing with missing data) and/or driven by new reporting mandates. For example, several journals now ask authors to explicitly describe how missing data were handled, which has stimulated questions and requests for statistical advice on this topic.

Nevertheless, some questions, topics and objectives are timeless. One such typical reason for ad hoc statistical consultations originates from reviewers’ comments and how to respond to them. These queries occur both after the review for obtaining ethical approval for the study before data are collected and after peer review of the final manuscript. It is not uncommon for reviewers to ask for a specific statistical method to be performed or to point out methodological issues, leaving researchers eagerly looking for advice in a hurry. The questions often pertain to the feasibility of additional analyses, arguments for choosing a specific method and fact-checking or help with responses to the reviewers.

Another client favourite is the problem of ‘how to make the results statistically significant (to be publishable)’. If the data collection and statistical analysis were done correctly, this is often impossible without embarking on questionable research practices or fraud and therefore such conversations rarely end with a happy client. With more and more journals open to publishing null results (Pocock 2016) and advocating pre-registration (Harrington 2019), hopefully this quandary will soon cease to exist. Related to this issue, more and more questions are about writing a statistical analysis plan for an ethics application, a grant proposal or pre-registration.

A selection of typical topics and questions for each study phase is provided in Table 2. In some cases, we give both general (broader) and specific questions related to a topic.

TABLE 2 Typical topics and questions from statistical consulting sessions by study phase

Troubleshooting – statistical consultation when something has gone wrong

Sometimes, statistical consultants are (only) contacted when something goes wrong. In such situations, the main reason for the consultation is troubleshooting and finding emergency solutions. Depending on the phase in which the root of the problem lies (refining the research problem and study design; data analysis; or results, interpretation and discussion), substantial revision of the project may be needed. For example, if the problem concerns only the interpretation of the results, only a small revision of the results and the discussion section may be needed, whereas if the problem lies in the study design, new analyses may be needed (if the project can be saved at all).

If a troubleshooting consultation is needed, it is best that everyone involved in decision-making is present (see also the section above, on division of tasks).

Conclusion and further reading

Clients hold many misconceptions about statistical consulting (a few of which are listed in Table 3): we hope that this article has dispelled these and informed readers of the process – and value – of involving a statistical consultant in their research projects.

TABLE 3 A few misconceptions about statistical consulting

For those interested in further reading, there are many outstanding introductory statistics books available focusing on applying statistical methods, but most do not discuss statistical consulting and/or collaboration with a statistician. Here we list a few of the resources that elaborate on the topics explored in this article.

  • Culliford (2022) offers an intermediate level book with a strong focus on – and advocating – collaboration between clinical researchers and statisticians. The book covers some more advanced methods in a non-technical way and has an entire chapter dedicated to a fictional (but realistic) scenario describing the conversations and collaboration between a clinical researcher and a statistician.

  • Concerning typical consulting questions, Allison & Gorman (1992) published a nice overview with the most common statistical questions and answers. Although their overview is old, the collected questions are still very relevant, ranging from questions about sample size to the number of predictors in an interaction, and from reporting results to data visualisation.

  • One of the most common consulting topics is sample size calculations. The curious reader might want to look at Chow et al (2020) or Julious (2023), both providing a comprehensive overview of many commonly employed methods in an accessible way.

  • A thorough, domain-independent overview of what to expect when consulting a statistician has been published by the American Statistical Association's Section on Statistical Consulting (American Statistical Association 2013). Updated several times since their first publication in 2005, these guidelines address topics covered in our article and also discuss financial terms, confidentiality agreements and ethics.

  • More and more journals have their own statistical reporting guidelines, which also advise on how to report statistical analyses. One example we referred to above is Harrington et al (2019). These might be a good starting point for topics to be discussed with the statistical consultant.
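To give a flavour of the sample size calculations mentioned in the list above, the following minimal Python sketch computes the number of participants per group needed to compare two group means. It rests on assumptions that are ours and not taken from the cited books: a two-sided test, equal group sizes, a standardised effect size (Cohen's d) and the normal approximation, under which n per group = 2(z_{1-α/2} + z_{1-β})² / d². The function name `n_per_group` is hypothetical. A calculation based on the t-distribution gives slightly larger numbers, and a real study will usually need adjustments (for example, for drop-out) that are best discussed with the statistical consultant.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    Assumes a two-sided test, equal group sizes and the normal approximation;
    effect_size is the standardised mean difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Medium effect (d = 0.5), 5% significance level, 80% power
print(n_per_group(0.5))  # 63 per group under the normal approximation
```

The sketch also shows why the expected effect size is among the first things a consultant asks about: halving the effect size roughly quadruples the required sample size.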

Supplementary material

Supplementary material is available online at https://doi.org/10.1192/bja.2024.61.

Data availability

Data availability is not applicable to this article as no new data were created or analysed in this study.

Acknowledgements

We thank Dr Jing Zhou and a colleague who prefers not to be named for their insightful comments and feedback on an earlier draft of this article. We are also grateful to the three anonymous reviewers for their feedback and suggestions for earlier versions of the article.

Author contributions

Both authors contributed to the conception, design, drafting and revising of the manuscript.

Funding

This research received no specific grant from any funding agency, commercial or not-for-profit sectors.

Declaration of interest

None.

MCQs

Select the single best option for each question stem

  1 In which part of the empirical research cycle is the statistician least likely to be involved?

    a Study design

    b Data collection

    c Hypothesis testing

    d Data analysis

    e Study design and data collection.

  2 Which of the following topics is unlikely to be discussed in the data analysis phase?

    a Assumptions of planned methods not being met

    b Outliers and how to deal with them

    c Power analysis

    d Dealing with missing data

    e Multiple testing correction.

  3 Which of the following statements is false concerning a good solution?

    a A good solution can be obtained in a timely fashion

    b In a good solution, it is clear to both the statistician and the client which steps the client will perform after obtaining the solution

    c A good solution should be of high scientific quality but use language that the client understands

    d A good solution always uses the newest, state-of-the-art method even if the client does not understand it

    e None of the above statements is false.

  4 Which topics should the client and consultant discuss when working in collaboration?

    a Only division of tasks, authority and participation in decisions and deliverables

    b Only time frame of the project, frequency of communication and deliverables

    c Only division of tasks, authority, participation in decisions and deliverables

    d Only mode and frequency of communication, deliverables and who owns the end product

    e None of the above responses is correct.

  5 Which of the following statements is true?

    a Ensuring good communication is primarily the responsibility of the statistical consultant

    b Statistical consulting is only for those who have limited experience with statistics

    c Ensuring good communication is primarily the responsibility of the client

    d The client should never give feedback to the statistical consultant about their communication style

    e There is no question that is ‘too basic’: you can ask what a standard deviation is if you need to.

MCQ answers

1 b 2 c 3 d 4 e 5 e

References

Allison, DB, Gorman, BS (1992) Some of the most common questions asked of statisticians: our favorite answers and recommended readings. Genetic, Social & General Psychology Monographs, 119: 2.
American Statistical Association (2013) What to Expect When Consulting a Statistician. ASA (https://community.amstat.org/cnsl/forclients/expect-content). Accessed 30 Nov 2023.
Bisgaard, S, Bisgaard, SE (2005) ENBIS workshop on statistical consulting and change management (ENBIS Annual Conference, Newcastle, 14–16 September 2005). European Network for Business and Industrial Statistics.
Chatfield, C (1995) Problem Solving: A Statistician's Guide. CRC Press.
Chow, SC, Shao, J, Wang, H, et al (2020) Sample Size Calculations in Clinical Research (3rd edn). Chapman & Hall/CRC Press.
Culliford, D (2022) Applied Statistical Considerations for Clinical Researchers. Springer.
Derr, JA (2000) Statistical Consulting: A Guide to Effective Communication. Duxbury Press.
Gelman, A (2019) Don't calculate post-hoc power using observed estimate of effect size. Annals of Surgery, 269: e9–10.
Hand, DJ, Adèr, HJ, Mellenbergh, GJ (2008) Advising on Research Methods: A Consultant's Companion. Johannes van Kessel.
Harrington, D, D'Agostino, RB Sr, Gatsonis, C, et al (2019) New guidelines for statistical reporting in the Journal. New England Journal of Medicine, 381: 285–6.
Huebner, M, Le Cessie, S, Schmidt, CO, et al (2018) A contemporary conceptual framework for initial data analysis. Observational Studies, 4: 171–92.
Julious, SA (2023) Sample Sizes for Clinical Trials (2nd edn). Chapman & Hall/CRC Press.
Kenett, R, Thyregod, P (2006) Aspects of statistical consulting not taught by academia. Statistica Neerlandica, 60: 396–411.
Lader, MH, Bond, AJ (1998) Interaction of pharmacological and psychological treatments of anxiety. British Journal of Psychiatry, 173(suppl 34): 42–8.
Lovik, A, Nassiri, V, Verbeke, G, et al (2018) Combining factors from different factor analyses based on factor congruence. In Quantitative Psychology: The 82nd Annual Meeting of the Psychometric Society, Zurich, Switzerland, 2017 (eds Wiberg, M, Culpepper, SA, Janssen, R, et al): pp. 211–7. Springer.
Pazmino, S, Lovik, A, Boonen, A, et al (2021) Does including pain, fatigue, and physical function when assessing patients with early rheumatoid arthritis provide a comprehensive picture of disease burden? Journal of Rheumatology, 48: 174–8.
Pocock, SJ, Stone, GW (2016) The primary outcome is positive – is that good enough? New England Journal of Medicine, 375: 971–9.

FIG 1 The empirical research cycle, highlighting the stages at which the statistical consultant typically is involved.

FIG 2 Derr's model of the process of statistical consulting (based on Derr, 2000).

TABLE 1 Differences between ChatGPT and a human statistical consultant

FIG 3 One- and two-variable setting and analytic strategies for each. Cond., condition; INT, intervention; ANOVA, analysis of variance.

FIG 4 The three-meetings scenario.
