
Measurement of long-term outcomes in observational and randomised controlled trials

Published online by Cambridge University Press:  02 January 2018

Richard Hodgson*
Affiliation:
Lyme Brook Centre, Stoke on Trent
Chris Bushe
Affiliation:
Eli Lilly, Lilly House, Basingstoke
Robert Hunter
Affiliation:
University Department of Psychological Medicine, Gartnavel Royal Hospital, Glasgow, UK
* Dr Richard Hodgson, Lyme Brook Centre, Bradwell Hospital, Talke Road, Stoke-on-Trent, Staffordshire ST5 7TL, UK. Email: richarde.hodgson@northstaffs.nhs.uk

Abstract

Background

Randomised controlled trials (RCTs) are the gold standard for evaluating treatment efficacy. However, the outcomes of RCTs often lack clinical utility and usually do not address real-world effectiveness

Aims

To review how traditional RCTs may be triangulated with other methodologies such as observational studies and pragmatic trials by highlighting recently reported studies, outcomes used and their respective merits

Method

Literature review focusing on drug treatment

Results

Recently reported observational and some pragmatic studies show a degree of consistency in reported results and use outcomes that have face validity for clinicians

Conclusions

No single experimental paradigm or outcome provides the necessary data to optimise treatment of mental illness in the clinical setting

Type
Review Articles
Copyright
Copyright © Royal College of Psychiatrists, 2007 

Evaluating treatment outcomes in mental illness presents unique and formidable challenges. The natural course of many psychiatric disorders is cyclical with spontaneous remission a distinct possibility (Ciompi, 1980). Environmental factors are important but poorly understood. Mental illness continues to be characterised in terms of symptoms despite advances in understanding pathogenesis. Currently, most published pharmacotherapy clinical trial data derive from trials performed to prove efficacy and safety to regulatory authorities. Thus clinicians making treatment decisions are commonly presented with a series of randomised controlled trials (RCTs) undertaken to meet regulatory requirements, with outcomes that are neither pragmatic nor easily transferable to clinical practice.

It is assumed that psychiatrists will base their treatment on the best available evidence but what is the best available evidence for a given clinician? Many factors are relevant and include personal experience, the literature, anecdote, opinion leaders, the pharmaceutical industry, guidelines and cost. However, little is known about actual prescribing and other treatment decisions (Hoblyn et al, 2006). Clinicians, purchasers and user advocates are also demanding more pragmatic end-points, and longer trials have shown the utility of relapse rates, hospitalisation and discharge rates as outcome measures (Csernansky et al, 2002).

Thus in 2007 ‘best available evidence’ is generally accepted as the RCT, but the available RCT evidence is at best incomplete, and at worst, flawed (Black, 1996). The aim of this paper is to show practising clinicians the spectrum of quantitative evidence and pragmatic outcomes.

EVOLUTION OF CLINICAL TRIALS

Since the 1940s the RCT has been the principal method of comparing the efficacy of all forms of medical treatment, and the basic concept has been developed and refined to further reduce bias. This has been evident in psychiatry with the development of rating scales and classification systems which enhance reliability, if not always validity. The RCT has informed the development of evidence-based medicine, meta-analysis and the Cochrane Collaboration. Evidence-based medicine resulted in part from the realisation that clinical practice is often poorly informed by the best available evidence, and that many widely used treatments are either untested or have been shown to be ineffective (Lenzer, 2004). Evidence-based medicine has also been seen as a means by which policy makers, sometimes with academic support, control clinical freedom (Williams & Garner, 2002). Although RCTs have resulted in the discontinuation of fashionable but ineffective treatments such as insulin coma therapy (Ackner & Oldham, 1960), they are not without problems (Thornley & Adams, 1998). More recently other paradigms, including observational and pragmatic studies (Roland & Torgerson, 1998), have gained in acceptance and been recommended as having a useful role in evaluation of treatment by the National Institute for Health and Clinical Excellence (National Institute for Clinical Excellence, 2002).

RANDOMISED CONTROLLED TRIALS

In general an RCT assesses efficacy – whether the treatment works in a controlled environment – not whether it works in the real world (effectiveness) (Table 1). Many factors affect the relationship between efficacy and effectiveness. This is acknowledged in the CONSORT criteria for RCTs by the need to assess the generalisability of the results, although a framework for assessing and reporting this is lacking (Bonell et al, 2006). Trials have been criticised for not adhering to CONSORT guidelines, but even apparent adherence can lead to challenges (El-Sayeh et al, 2006).

Table 1 Comparison of key features of randomised controlled trials and observational studies

| Randomised controlled trial | Observational study |
|---|---|
| Modest numbers of patients | Large number of patients |
| Modest duration | Longer duration |
| High drop-out rate | Lower drop-out rate |
| Statistically significant results | Clinically meaningful results |
| Structured dosing regimen | Naturalistically selected dosing |
| Randomisation | Naturalistic treatment selection |
| Maximises internal validity | Maximises external validity |
| Minimal bias and variability | Generalisability |
| Homogeneous patient population | Heterogeneous patient population |
| Artificial adherence and population | Adherence not mandated, ‘real’ patients |
| Demonstrates efficacy | Assesses effectiveness |
| Excludes confounding treatments | Concomitant treatments allowed |
| Complex applied scales | Outcomes used in everyday clinical practice |
| Outcomes generally symptom focused | Outcomes include cost, adherence, resource use |

Patient recruitment and selection bias

Whether clinically significant selection bias occurs during recruitment to clinical trials is contentious. Although Burns (2006) reported that the basic demography of patients in a large naturalistic study was similar to that of a widely reported RCT, other authors have noted that the more chaotic patient who is difficult to manage will not be entered into a clinical trial as, even if they consent, they will undoubtedly drop out of follow-up (Lester & Wilson, 1999; Harrison-Read et al, 2002). Trials rarely report the number of patients considered or screened for a trial who are never included. Although this is a CONSORT requirement, clinicians will make prescreening decisions regarding eligibility that are never reported. This is a potential source of bias and might limit extrapolation of results. It is likely that these difficulties are a serious unreported bias in published RCTs for psychological treatments. For example, reviews of the impact of day hospital treatment have failed to take entry criteria into account, leading to potentially erroneous conclusions (Thornicroft & Strathdee, 1994). The need for informed consent might inadvertently affect the generalisability of data from RCTs. All trials of intramuscular olanzapine (Meehan et al, 2001; Wright et al, 2001) were conducted in patients who gave informed consent and, although positive, the results cannot be interpreted as indicating that the drug will be as effective in patients who are highly disturbed.

Although biases are reduced in RCTs they are not eliminated, and indeed specific biases may even be created. Aside from the increased practical difficulties of including older adults in clinical trials, only 4.2% of older patients with major depression meet the increasingly rigorous inclusion and exclusion criteria of phase 3 studies (Yastrubetskaya et al, 1997). Women have sometimes been underrepresented in RCTs primarily because of concerns regarding conception while on trial medication, although this may be changing.

Patients with comorbid disorders are usually excluded from RCTs and this does not allow trials to reflect the rate of substance misuse and physical ill health in people with mental illness (Phelan et al, 2001). Previous exposure to trial medication is often unreported, but McQuade et al (2004) reported that 25% of patients in their randomised trial had prior exposure to one of the evaluated drugs. Generally, RCTs do not control for previous number of admissions or other markers of ‘difficult to treat’ patients (Hodgson et al, 2005). This might lead to newer treatments being tried in patients who are more difficult to treat, which may lead to suboptimal results for newer treatments (Davis et al, 2003).

Rating scale outcomes

The outcome measures used in RCTs affect the generalisability of the results. Although these outcome measures have been refined over decades to improve reliability, in studies their use may affect the face validity of the results. Clinicians would have difficulties in understanding what a fall of 20% in score on the Positive and Negative Syndrome Scale (PANSS; von Knorring & Lindstrom, 1995) means in clinical practice. Indeed Kane et al (1988) suggested this as an outcome only for treatment-resistant patients and a recent analysis (Leucht et al, 2005) has shown that a drop of 50% in PANSS score may better equate to a Clinical Global Impression Scale (CGI; Haro et al, 2003) rating of ‘much improved’.

Pragmatic outcomes

Rating scales might not reflect clinical reality and there may be dissonance between rating scale response and a pragmatic clinical end-point such as discharge from hospital (McCue et al, 2006). Pragmatic research and outcomes focus on whether an intervention works under real-life conditions and whether it works in terms that matter to the patient. However, if broader concepts are used, such as remission, relapse or rehospitalisation, then other problems emerge. Rehospitalisation is easily measured, but in an individual trial may be mediated by other variables such as admission criteria. Remission or response rates might have more clinical utility but have been criticised on the grounds of variability of results if an arbitrary cut-off is used, although sensitivity analysis can be used to assess the effect of changing parameters (Linden et al, 2006; van Os et al, 2006).
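The dependence of a response rate on its cut-off can be illustrated with a minimal sketch. The scores below are invented for illustration and come from no cited study; the function name `response_rates` and the `floor` parameter (for analyses that subtract the scale minimum before calculating the percentage change) are assumptions of this example, not part of any published method.

```python
def response_rates(baseline, endpoint, cutoffs, floor=0):
    """Return, for each cut-off, the proportion of patients whose score
    fell by at least that fraction of the (floor-adjusted) baseline."""
    reductions = [(b - e) / (b - floor) for b, e in zip(baseline, endpoint)]
    return {c: sum(r >= c for r in reductions) / len(reductions)
            for c in cutoffs}


# Hypothetical total scores for five patients (illustrative only).
baseline = [100, 90, 80, 120, 110]
endpoint = [75, 60, 70, 50, 100]

# Sensitivity analysis: the same data under three response definitions.
rates = response_rates(baseline, endpoint, cutoffs=[0.2, 0.3, 0.5])
for cutoff, rate in rates.items():
    print(f"reduction of at least {cutoff:.0%}: response rate {rate:.0%}")
```

In this invented sample the apparent response rate falls from 60% to 20% as the definition tightens from a 20% to a 50% reduction, which is the kind of variability a sensitivity analysis is intended to expose.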

Rates of discontinuation of treatment may be a proxy for treatment effectiveness (Hodgson, 2005; Lieberman et al, 2005; Kinon et al, 2006). Kinon et al (2006) undertook a meta-analysis of RCTs of atypical antipsychotics using reported discontinuation as an outcome and found far more variability between drugs than might have been anticipated from the headline results, which usually (marginally) favour the sponsor's product (Heres et al, 2006). Further exploration of these pragmatic end-points in long-term studies would facilitate a better understanding of the face and predictive validity of rating scales. Any dissonance between comparator drugs using varied end-points might be cause for concern. A recent non-inferiority RCT comparing two atypical antipsychotics at 1 year showed consistency of superiority for one in parameters ranging from PANSS score to discontinuation and hospitalisation rates (www.clinicalstudyresults.org/drugdetails/?drug_name_id=187&sort-c.company_name&page=1&drug_id=509). However, use of outcomes such as hospitalisation might preclude cross-service comparisons. Quality of life has also been used as an outcome but although such measures are laudable, in practice the outcomes are difficult to measure and may not be amenable to change (Boardman et al, 1999).

Tolerability

Published RCTs have been criticised for inadequate reporting of side-effects and adverse events (Ioannidis & Lau, 2001; Papanikolaou et al, 2004). The incidence is usually reported but duration and severity are not. These are important variables and may make the difference between persevering with medication or abandoning a therapeutic trial. For data such as prolactin levels RCTs often report mean cohort values rather than pragmatically useful categorical rates (Bushe & Shaw, 2007).

Study length and drop out

Typically patients in secondary services receive treatment for periods of time that far exceed those of RCTs, which are often as short as 4 weeks. The Schizophrenia Outpatient Health Outcomes (SOHO) study (Haro et al, 2006) demonstrated continued improvement over 3 years. Short RCTs will not assess all tolerability issues and whether improvement is maintained. However, RCTs are getting longer (Lieberman et al, 2003; McQuade et al, 2004). The corollary of longer study periods is lower follow-up rates and, paradoxically, high follow-up rates might be an indicator of a biased study population. Drop-out rates over 6 weeks are on average 35% and at 6 months can be around 72% (Leucht et al, 2003; McQuade et al, 2004), making interpretation of data complex.

Randomised controlled trials are designed to minimise bias and in creating this artificial environment treatment effects may be obviated. Although the true masking of many trials has been debated (Moncrieff, 1997), clinicians cannot intervene in trials in a timely or appropriate manner. Doses and visits are predetermined, as is the ability to respond to potential side-effects. These issues are relevant to the placebo arm, as often placebo group patients are receiving a psychoactive drug such as lorazepam (Meehan et al, 2001; Wright et al, 2001). Randomised controlled trials are often designed to fulfil regulatory requirements to obtain marketing authorisations for a new drug. There will be significant delays between study conception, recruitment, follow-up and publication of results. Clinicians often anticipate this with off-label prescribing (Hodgson & Belgamwar, 2006). The reality is that few RCTs are ever undertaken by pharmaceutical companies after launch. This is for many reasons, including the relatively short patent life. Thus, when such RCTs are performed there is often a perceived need for the data to be available quickly. Rarely are these trials long term.

Evolution of the RCT paradigm is seen in the CATIE trial (Lieberman et al, 2005; Table 2). In addition to traditional outcome measures, continuation on an antipsychotic was used as an outcome. Such an outcome should resonate with clinicians as medication is most commonly discontinued owing to lack of effectiveness or side-effects (Hodgson, 2005). Meta-analysis shows that lack of effectiveness is the major reason for discontinuation and differentiates between atypical antipsychotics in RCTs. In contrast, discontinuation for side-effects is relatively uniform (Kinon et al, 2006).

Table 2 Key recent observational and pragmatic studies and randomised controlled trials in schizophrenia

| Reference | Methodology | Study size and follow-up | Setting | Key outcome measures | Key findings | Funding source |
|---|---|---|---|---|---|---|
| Hodgson et al (2005) | Observational | 502 patients up to 7 years | England | Medication discontinuation | Lowest discontinuation rate with clozapine, then olanzapine, then risperidone | Unrestricted grant from pharmaceutical industry |
| Haro et al (2006) | Observational | 10 000 patients for 3 years | 10 European countries | Medication discontinuation and remission | Lowest discontinuation rate and highest remission rate with clozapine, then olanzapine, then risperidone | Pharmaceutical industry |
| Taylor et al (2006) | Observational | 958 patients for up to 3 years | Scotland | Duration of treatment | Duration of treatment longest with clozapine, then (in rank order) olanzapine, risperidone, amisulpride and quetiapine | Independent |
| Tiihonen et al (2006) | Observational | 2230 first-episode patients up to 7 years | Finland | Discontinuation and hospitalisation rates | Lowest relapse with oral medication for clozapine, then (in rank order) olanzapine, thioridazine, perphenazine, risperidone and chlorpromazine | Government |
| Jones et al (2006) | RCT | 227 for 56 weeks | England | Quality of life and symptoms | No difference between first- and second-generation antipsychotics | Government |
| Lieberman et al (2005) | RCT | 1493 patients up to 18 months | USA | Medication discontinuation | Olanzapine most effective. No difference between other study medication | Government |
| McEvoy et al (2006) | RCT | 400 first-episode patients for 1 year | USA | Duration of treatment | No difference between olanzapine, quetiapine and risperidone | Pharmaceutical industry |
| McCue et al (2006) | Pragmatic | Hospitalised patients for at least 3 weeks | USA | Hospital discharge and BPRS | Haloperidol, olanzapine and risperidone more effective than aripiprazole, quetiapine and ziprasidone | Independent |

RCT, randomised controlled trial; BPRS, Brief Psychiatric Rating Scale

For the reasons above, RCTs fail to provide the clinician with all the necessary information to prescribe confidently. In order to prescribe a new product the clinician uses previous experience, critical review of early results and the experience of others. In other words the clinician is in effect, albeit informally, undertaking a naturalistic/observational study. The definition of an observational study can be problematic, but in the context of this paper we have identified the key element as a research design where the allocation of treatment is not fully under the control of the researcher (Table 1).

OBSERVATIONAL STUDIES

Limitations

There are notable long-term observational follow-up studies in psychiatry (Ciompi, 1980; Harding, 1988) which illustrate the natural history of schizophrenia over decades. Given this expertise, it is perhaps surprising that there are so few studies looking at treatment effects over the longer term, especially as many potential outcome measures could be collected routinely. Observational studies have design faults that limit their interpretation (Table 1). Most importantly, true randomisation cannot occur in an observational study. However, the strengths of observational studies mirror the weaknesses of RCTs, and it is for this reason that the National Institute for Health and Clinical Excellence (NICE) has argued for well-conducted observational studies to demonstrate effectiveness. Observational studies might also represent the only method for studying certain aspects of treatment when masking is not possible or ethical concerns preclude randomisation (Cook & Campbell, 1979). Indeed, in service evaluation studies randomisation may interfere with the dependent variable and observational studies often exploit service inequalities (Dean et al, 1993). Another potential bias in observational studies is rating bias, although the SOHO study has shown high correlations between clinician and patient ratings. With end-points such as hospitalisation, bias is minimised, especially if these data are collected routinely (Hodgson et al, 2001).

Observational studies have been criticised because they are believed to overestimate treatment effects. However, recent comparisons between RCTs and observational studies do not support this view (Benson & Hartz, 2000; Concato et al, 2000; Kasper et al, 2001). Concato et al (2000) challenge the accepted hierarchy of clinical designs by reviewing outcomes from various methodologies in a variety of study areas and conclude that observational studies neither over- nor underestimate treatment effects to any significant degree. They opine that observational studies are more likely to produce homogeneous results as they include a broad spectrum of the population at risk. In addition, there is less chance of systematic treatment biases because of the broad treatment population.

Recent observational studies

The CATIE study (Lieberman et al, 2005), an RCT sponsored by the National Institute of Mental Health, compared the outcome of atypical antipsychotics with the typical antipsychotic perphenazine and also incorporated a switching strategy to evaluate clozapine. The results mirror those of Tiihonen et al (2006) in that clozapine and olanzapine were the only oral atypical antipsychotics to demonstrate lower discontinuation rates when compared with oral first-generation and other second-generation antipsychotics. The study reported by Tiihonen et al (2006) is particularly noteworthy as it follows a nationwide cohort of over 2000 people with first-episode schizophrenia for up to 7 years. In addition to showing differences in rehospitalisation and relapse rates between commonly available antipsychotics in Finland, it also shows the effectiveness of medication in reducing suicide and physical morbidity (adjusted relative risk 37.4, 95% CI 5.1–276 and 12.3, 95% CI 6.0–24.1 respectively). The relative therapeutic effects of the drugs studied did not vary whether discontinuation or rehospitalisation was considered, and this is echoed in the SOHO study (Haro et al, 2006). Another long-term study of over 500 patients in England (Hodgson et al, 2005) demonstrated the same rank order of effectiveness of oral atypicals using medication discontinuation as an outcome. In this study it was apparent that clozapine was being used for a treatment-resistant cohort. Taylor et al (2006) studied duration of treatment as a proxy in a Scottish population over 3 years and reported similar results to Tiihonen et al (2006) and Hodgson et al (2005).

McCue et al (2006) in a randomised open-label study of atypical antipsychotics and haloperidol in in-patients using the Brief Psychiatric Rating Scale (BPRS; Overall & Gorham, 1962) and time to discharge as outcome measures found similar effectiveness between haloperidol, olanzapine and risperidone and that these drugs were significantly better than aripiprazole and quetiapine. However, there was a dissonance between time to discharge and the BPRS outcomes, which might suggest that rating instruments are not sensitive to important changes that influence management, at least in the short term. Although haloperidol was equal to risperidone and olanzapine it was associated with more extrapyramidal side-effects. Jones et al (2006) failed to detect any differences in effectiveness between first- and second-generation antipsychotics and reported no difference in extrapyramidal-type side-effects, in stark contrast to many other RCTs. A recent RCT of 400 first-episode patients (McEvoy et al, 2006) compared olanzapine, quetiapine and risperidone over 1 year and failed to detect a difference in discontinuation rates between these drugs although olanzapine had a significantly greater effect on positive symptoms. Discontinuation was associated with poor response (P<0.001) and poor medication adherence (P=0.02).

In general, RCTs are powered for one primary outcome which does not always reflect primary clinical concern (McQuade et al, 2004). As observational studies are larger, there is more scope for legitimate subgroup analysis, such as treatment effect on those with comorbid disorder. The 3-year results of the SOHO study provide insights into social function and factors associated with relapse and remission. These are consonant with other independent studies and increase the face validity of this study. Although the SOHO study demonstrates relatively high switching rates for some medications, 65% of patients achieved remission, which resonates with the results of other long-term studies (Ciompi, 1980; Harding, 1988).

Observational studies and safety

Although often not acknowledged as such, post-marketing surveillance is essentially an observational study, albeit often poorly conducted (Vray et al, 2005). However, post-marketing surveillance often reports important safety information that was not apparent from RCTs. The associations of blood dyscrasias with clozapine and with remoxipride are prime examples. In general, RCTs provide useful information on common adverse events, but identifying the relative risk of uncommon adverse events is realistically possible only in observational trials. In this regard, adverse event reporting in observational trials has been shown to enhance safety during the trial and facilitate the role of data monitoring committees and institutional review boards confronted with multiple reports of adverse events (Califf & Lee, 2001).

COMMON METHODOLOGICAL ISSUES

Analysis

Both RCTs and observational studies present difficulties in analysis. In RCTs high attrition rates have led to intention-to-treat analyses, with a variety of statistical techniques evolving to accommodate these drop-outs. These include last-observation-carried-forward (LOCF) analysis and mixed model repeated measures (MMRM). LOCF assumes that data are missing completely at random and that the patient's condition would remain constant; both assumptions are unlikely to hold. MMRM is valid under less restrictive assumptions, allowing missingness to depend on other measured factors (Mallinckrodt et al, 2003).
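The mechanics of LOCF, and why its assumption of a constant condition matters, can be seen in a minimal sketch. The visit scores are invented for illustration and the function name `locf` is an assumption of this example; no trial's actual analysis code is implied.

```python
def locf(visits):
    """Last observation carried forward: replace each missing visit (None)
    with the most recent observed value. Visits that are missing before
    any observation has been made stay as None."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical symptom scores at five visits; the patient drops out
# after the third visit (illustrative data only).
observed = [95, 88, 80, None, None]
imputed = locf(observed)
print(imputed)
```

The end-point entered into the analysis for this patient is the week-3 score, whatever happened clinically after drop-out. If patients tend to leave a trial because they are deteriorating, carrying the last score forward will flatter the treatment; if they leave early in a course of continuing improvement, it will understate it.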

Randomised controlled trials have highlighted relatively high switching rates between therapies and potentially confounding baseline variation, with lower rates measured in observational studies. Baseline variation can be accommodated in analysis but, as with drop out from RCTs, it cannot be assumed that this variation is random; it may reflect clinical practice. For example, in the study reported by Hodgson et al (2005) and the SOHO study (Haro et al, 2006) young men with multiple illness episodes were more likely to receive clozapine.

Switching treatments within an observational study can be studied using marginal structural models (MSM), a new class of causal models that allow for improved adjustment for confounding in longitudinal data analysis in naturalistic settings; their parameters are estimated consistently using inverse-probability-of-treatment weighted estimators (Mortimer et al, 2005). MSM are an extension of propensity scoring to longitudinal data. Whereas propensity scoring controls for selection bias by reweighting observations to produce ‘balance’ between groups, MSM do the same but in a longitudinal fashion. They allow estimation of the causal effect of treatments in longitudinal naturalistic data when patients switch or stop treatment, even in the presence of missing (at random) data and time-varying confounding variables.
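The reweighting idea can be sketched for the simplest case of a single time point and a single measured confounder. This is not a full marginal structural model, which would multiply such weights across successive visits, and it is not the method of any cited study. The data, the function names and the use of stratum frequencies as the propensity estimate are all assumptions made for illustration.

```python
from collections import defaultdict


def naive_effect(records):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, a, y in records if a == 1]
    untreated = [y for _, a, y in records if a == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def iptw_effect(records):
    """Inverse-probability-of-treatment weighted difference in mean outcome.
    The propensity is estimated as the treated fraction within each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n treated, n total]
    for stratum, treated, _ in records:
        counts[stratum][0] += treated
        counts[stratum][1] += 1
    sums = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # arm -> [sum of w*y, sum of w]
    for stratum, treated, outcome in records:
        p = counts[stratum][0] / counts[stratum][1]
        weight = 1 / p if treated else 1 / (1 - p)
        sums[treated][0] += weight * outcome
        sums[treated][1] += weight
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


# Hypothetical records: (severity stratum, treated 1/0, remission 1/0).
# Severe patients are more likely to be given the treatment.
records = [
    ("severe", 1, 0), ("severe", 1, 0), ("severe", 1, 1), ("severe", 0, 0),
    ("mild", 1, 1), ("mild", 0, 1), ("mild", 0, 1), ("mild", 0, 0),
]
print(naive_effect(records), iptw_effect(records))
```

In this invented sample the unadjusted comparison shows no difference, because the treatment is given preferentially to severe patients, whereas the weighted comparison recovers the benefit of one-third that is present within each severity stratum. Weighting can only balance confounders that have been measured.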

Patient concordance and sample size

In estimating treatment effects both RCTs and observational studies are challenged by patient concordance. Drug levels, which are highly variable for many psychotropics, are not routinely used, with pill counting being a common concordance measure in RCTs. However, poor adherence may lead to underestimation of treatment effects. Patient and clinician choice is important in determining outcome (Black, 1996) and controlling for these variables in RCTs limits the exploration of these factors. Zelen (1979) has advocated a methodology that has the advantage that, before providing consent, a patient will know whether an experimental treatment is to be used. Further development of patient and clinician preference trials has been described (Korn & Baumrind, 1991; Wennberg et al, 1993). McCue et al (2006) demonstrate that physician knowledge of a treatment might enhance optimum treatment dosing.

The nature of observational studies allows large sample sizes that add to the power of the study, facilitate subgroup analysis and provide data for robust sample size estimates for RCTs. Although in general appropriate sample sizes are important in RCTs, the superiority of those with large sample sizes over those with smaller samples has been challenged with regard to overestimating treatment effects (Contopoulos-Ioannidis et al, 2005).

Publication bias and sponsorship

Publication bias might also affect the two methodologies. Given the hierarchy of evidence, journals may be less willing to accept observational studies (Barton, 2000). Journals are less likely to publish negative studies and both methodologies are potentially biased by the study sponsor, with positive results often being associated with the vested interest of the sponsor (Als-Nielsen et al, 2003). However, a review of atypical antipsychotic trials and funding sources indicates that this is not invariably so (Heres et al, 2006). Moreover, government-funded trials cannot be assumed to be unbiased (Coyne, 2006).

THE WAY FORWARD

The pre-eminence of RCTs and regulatory requirements has led to maintenance of the status quo in clinical drug trial development. Once a drug receives its marketing authorisation then further trial work is often aimed at developing markets rather than ascertaining whether the drug is effective. These concerns are just as relevant to psychotherapy and other non-pharmacological interventions. Making the trials as much like routine practice as possible may help to make RCTs more feasible and enhance external validity (so-called pragmatic trials; Hotopf, 2002). Although pragmatic trials may eschew some features of RCTs, such as double blinding, careful consideration may significantly reduce bias (Schulz et al, 1995). Patient recruitment is broad and may not be diagnostically driven (e.g. frequent attendees at a general practitioner surgery or people who self-harm). Outcomes, such as a reduction in suicide or episodes of violence, are clinically significant. Patient preference is an important variable in treatment choice which is negated in a traditional RCT, but patient preference trials have been reported (Ward et al, 2000) and may be particularly relevant when masking is not possible. The CATIE study (Lieberman et al, 2005) has many features of a pragmatic trial, such as narrow exclusion criteria and medication discontinuation as an outcome.

Randomised controlled trials and observational studies are not mutually exclusive, and there are examples from other areas of medicine of the two designs running in parallel. For example, several studies in coronary artery disease quoted in Benson & Hartz (2000) illustrate the merits of enhancing an RCT by the addition of observational data from a concurrent registry of all non-randomised patients in the same centres. This approach improves the quality of observational research, since the same rigorous attention to detail in defining eligible patients, maintaining follow-up and recording outcomes is applied in both the randomised and the observational cohorts. The observational cohort may still suffer from selection bias, but there is a greater likelihood that its causes can be identified. The corollary also applies, in that the observational cohort informs on the typicality of the experimental group.

Rapid changes in methodologies without bridging links with older methodologies may preclude legitimate comparison and subsequent meta-analysis. However, advances in the understanding of the biological and psychological mechanisms of mental illness will also dictate the evolution of relevant end-points. This is typified by the increasing interest in cognitive outcomes (Stroup et al, 2003) for which NICE recommends audits and provides standardised templates. This is another potential for supplementing treatment information and should facilitate the collection of data pools that inform treatment practice. The introduction of a new treatment presents the possibility of mirror image studies (Hodgson et al, 2002) that allow some measure of utility, although regression towards the mean precludes overinterpretation of the results.
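The caution about regression towards the mean in mirror image studies can be illustrated with a small simulation. The following is a minimal sketch in Python, not an analysis of any study cited here: the function name, the outcome scale, the threshold and all numerical values are hypothetical assumptions chosen only for illustration. Each simulated patient has a stable long-run level on some outcome (for example, bed-days per year), measured with random year-to-year variation. Patients are switched to the new treatment only if their pre-switch measurement is high, and the treatment is assumed to have no effect at all.

```python
import random
from statistics import mean


def simulate_mirror_image(n_patients=10_000, threshold=65.0, seed=0):
    """Simulate a mirror image comparison with NO true treatment effect.

    All parameters are hypothetical illustration values, not estimates
    from any real cohort.
    """
    rng = random.Random(seed)
    pre_scores, post_scores = [], []
    for _ in range(n_patients):
        # Patient's stable long-run level on the outcome (arbitrary units).
        stable_level = rng.gauss(50.0, 10.0)
        # Observed values before and after the switch: same stable level,
        # independent random variation, and no treatment effect added.
        pre = stable_level + rng.gauss(0.0, 10.0)
        post = stable_level + rng.gauss(0.0, 10.0)
        # Only patients doing badly in the pre-period are switched,
        # which mirrors how new treatments are often introduced.
        if pre > threshold:
            pre_scores.append(pre)
            post_scores.append(post)
    return mean(pre_scores), mean(post_scores), len(pre_scores)


pre_mean, post_mean, n_selected = simulate_mirror_image()
print(f"patients switched: {n_selected}")
print(f"mean before switch: {pre_mean:.1f}")
print(f"mean after switch:  {post_mean:.1f}")
```

Under these assumptions the mean after the switch is lower than the mean before it, even though the simulated treatment does nothing. An apparent improvement of this kind in a mirror image study therefore cannot, on its own, be attributed to the treatment.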

CONCLUSIONS

The RCT has served medicine well but evaluation of treatment needs reviewing for the 21st century. Outcomes need to be more clinically relevant and comparable with those from other trial methodologies. Biases in recruitment need to be addressed and post-marketing surveillance needs a more robust approach, as does monitoring of fidelity to treatment or service delivery models. In part this could be achieved with naturalistic studies, audits and mirror image studies. Without such additional information, treatments cannot be tailored effectively to the patient. Dogma should not be allowed to drive the experimental paradigm agenda as no current research design provides comprehensive clinical information.

Acknowledgements

R. Hunter has received funding from NHS Quality Improvement Scotland and the Chief Scientist Office, Edinburgh.

Footnotes

Declaration of interest

R. H. and R. H. have received funding from several pharmaceutical companies. C. B. is an employee of Eli Lilly UK. Funding detailed in Acknowledgements.

References

Ackner, B. & Oldham, J. A. (1960) Insulin treatment of schizophrenia: a controlled study. Lancet, i, 711.
Als-Nielsen, B., Chen, W., Gluud, C., et al (2003) Association of funding and conclusions in randomized drug trials: a reflection of treatment effect or adverse events? JAMA, 290, 921–928.
Barton, S. (2000) Which clinical studies provide the best evidence? The best RCT still trumps the best observational study. BMJ, 321, 255–256.
Benson, K. & Hartz, A. J. (2000) A comparison of observational studies and randomized, controlled trials. New England Journal of Medicine, 342, 1878–1886.
Black, N. (1996) Why we need observational studies to evaluate the effectiveness of health care. BMJ, 312, 1215–1218.
Boardman, A. P., Hodgson, R. E., Lewis, M., et al (1999) The North Staffordshire Community Beds Study: longitudinal evaluation of psychiatric in-patient units attached to community mental health centres. I. Methods, outcome and patient satisfaction. British Journal of Psychiatry, 175, 70–78.
Bonell, C., Oakley, A. & Hargreaves, J. (2006) Assessment of generalisability in trials of health interventions: suggested framework and systematic review. BMJ, 333, 346–349.
Burns, T. (2006) NICE guidance in schizophrenia: how generalisable are drug trials? Psychiatric Bulletin, 30, 210–212.
Bushe, C. J. & Shaw, M. (2007) Prevalence of hyperprolactinaemia in a naturalistic cohort of schizophrenia and bipolar outpatients during treatment with typical and atypical antipsychotics. Journal of Psychopharmacology, in press. doi: 10.1177/0269881107078281.
Califf, R. M. & Lee, K. L. (2001) Data and safety monitoring committees: philosophy and practice. American Heart Journal, 141, 154–155.
Ciompi, L. (1980) The natural history of schizophrenia in the long term. British Journal of Psychiatry, 136, 413–420.
Concato, J., Shah, N. & Horwitz, R. I. (2000) Randomized, controlled trials, observational studies, and the hierarchy of research designs. New England Journal of Medicine, 342, 1887–1892.
Contopoulos-Ioannidis, D. G., Gilbody, S. M., Trikalinos, T. A., et al (2005) Comparison of large versus smaller randomized trials for mental health-related interventions. American Journal of Psychiatry, 162, 578–584.
Cook, T. D. & Campbell, D. T. (1979) Quasi-Experimentation: Design and Analysis for Field Settings. Houghton Mifflin.
Coyne, J. C. (2006) Cochrane reviews v industry supported meta-analyses. We should read all reviews with caution. BMJ, 333, 916.
Csernansky, J. G., Mahmoud, R. & Brenner, R. (2002) A comparison of risperidone and haloperidol for the prevention of relapse in patients with schizophrenia. New England Journal of Medicine, 346, 16–22.
Davis, J. M., Chen, N. & Glick, I. D. (2003) A meta analysis of the efficacy of second generation antipsychotics. Archives of General Psychiatry, 60, 553–564.
Dean, C., Phillips, J., Gadd, E. M., et al (1993) A comprehensive community based service for people with acute severe episodes of illness. BMJ, 307, 473–476.
El-Sayeh, H. G., Morganti, C. & Adams, C. E. (2006) Aripiprazole for schizophrenia. Systematic review. British Journal of Psychiatry, 189, 102–108.
Harding, C. M. (1988) Course type in schizophrenia: an analysis of European and American studies. Schizophrenia Bulletin, 14, 633–643.
Haro, J. M., Kamath, S. A., Ochoa, S., et al (2003) The Clinical Global Impression-Schizophrenia scale: a simple instrument to measure the diversity of symptoms present in schizophrenia. Acta Psychiatrica Scandinavica, 107, 16–23.
Haro, J. M., Novick, D. & Suarez, D. (2006) Remission and relapse in the outpatient care of schizophrenia. Journal of Clinical Psychopharmacology, 26, 571–578.
Harrison-Read, P., Lucas, B., Tyrer, P., et al (2002) Heavy users of acute psychiatric beds: randomized controlled trial of enhanced community management in an outer London borough. Psychological Medicine, 32, 403–416.
Heres, S., Davis, J., Maino, K., et al (2006) Why olanzapine beats risperidone, risperidone beats quetiapine, and quetiapine beats olanzapine: an exploratory analysis of head-to-head comparison studies of second-generation antipsychotics. American Journal of Psychiatry, 163, 185–194.
Hoblyn, J., Noda, A., Yesavage, J. A., et al (2006) Factors in choosing atypical antipsychotics: toward understanding the bases of physicians' prescribing decisions. Journal of Psychiatric Research, 40, 160–166.
Hodgson, R. E. (2005) Long term outcomes and atypical antipsychotic discontinuation rates. Do different methodologies give similar results? European Neuropsychopharmacology, 15 (suppl. 3), 467.
Hodgson, R. E. & Belgamwar, R. (2006) Off-label prescribing by psychiatrists. Psychiatric Bulletin, 30, 55–57.
Hodgson, R. E., Lewis, M. & Boardman, A. P. (2001) The prediction of readmission to acute psychiatric wards. Social Psychiatry and Psychiatric Epidemiology, 36, 304–309.
Hodgson, R. E., Carr, D. & Wealleans, L. (2002) Brunswick House: a weekend crisis house in North Staffordshire. Psychiatric Bulletin, 26, 453–455.
Hodgson, R. E., Belgamwar, R., Al-tawarah, Y., et al (2005) The use of atypical antipsychotics in the treatment of schizophrenia in North Staffordshire. Human Psychopharmacology: Clinical and Experimental, 20, 141–147.
Hotopf, M. (2002) The pragmatic randomised controlled trial. Advances in Psychiatric Treatment, 8, 326–333.
Ioannidis, J. P. & Lau, J. (2001) Completeness of safety reporting in randomized trials: an evaluation of 7 medical areas. JAMA, 285, 437–443.
Jones, P. B., Barnes, T. R. E., Davies, L., et al (2006) Randomized controlled trial of the effect on quality of life of second- vs first-generation antipsychotic drugs in schizophrenia: cost utility of the latest antipsychotic drugs in schizophrenia study (CUtLASS1). Archives of General Psychiatry, 63, 1079–1087.
Kane, J. M., Honigfeld, G. & Singer, J. (1988) Clozapine in the treatment-resistant schizophrenic: a double-blind comparison with chlorpromazine. Archives of General Psychiatry, 45, 789–796.
Kasper, S., Rosillon, D. & Duchesne, I. (2001) Risperidone Olanzapine Drug Outcomes studies in Schizophrenia (RODOS): efficacy and tolerability results of an international naturalistic study. International Clinical Psychopharmacology, 16, 179–187.
Kinon, B. K., Liu-Seifert, H., Adams, D. H., et al (2006) Differential rates of treatment discontinuation as a measure of treatment effectiveness for olanzapine and comparator antipsychotics for schizophrenia. Journal of Clinical Psychopharmacology, 26, 632–637.
Korn, E. L. & Baumrind, S. (1991) Randomised clinical trials with clinician-preferred treatment. Lancet, 337, 149–152.
Lenzer, J. (2004) Pfizer pleads guilty, but drug sales continue to soar. BMJ, 328, 1217.
Lester, H. & Wilson, S. (1999) Practical problems in recruiting patients with schizophrenia into randomised controlled trials. BMJ, 318, 1075.
Leucht, S., Barnes, T. R. E., Kissling, W., et al (2003) Relapse prevention in schizophrenia with new-generation antipsychotics: a systematic review and exploratory meta-analysis of randomized, controlled trials. American Journal of Psychiatry, 160, 1209–1222.
Leucht, S., Kane, J. M., Kissling, W., et al (2005) What does the PANSS mean? Schizophrenia Research, 79, 231–238.
Lieberman, J. A., Phillips, M., Gu, H., et al (2003) Atypical and conventional antipsychotic drugs in treatment-naive first-episode schizophrenia: a 52-week randomized trial of clozapine vs chlorpromazine. Neuropsychopharmacology, 28, 995–1003.
Lieberman, J. A., Stroup, T. S., McEvoy, J. P., et al (2005) Clinical antipsychotic trials of intervention effectiveness (CATIE) Investigators: Effectiveness of antipsychotic drugs in patients with chronic schizophrenia. New England Journal of Medicine, 353, 1209–1223.
Linden, A., Adams, J. L. & Roberts, N. (2006) Strengthening the case for disease management effectiveness: un-hiding the hidden bias. Journal of Evaluation in Clinical Practice, 12, 140–147.
Mallinckrodt, C. H., Sanger, T. M., Dube, S., et al (2003) Assessing and interpreting treatment effects in longitudinal clinical trials with missing data. Biological Psychiatry, 15, 754–760.
McCue, R. E., Waheed, R., Urcuyo, L., et al (2006) Comparative effectiveness of second-generation antipsychotics and haloperidol in acute schizophrenia. British Journal of Psychiatry, 189, 433–440.
McEvoy, J. P., Perkins, D. O., Gu, H., et al (2006) Olanzapine, quetiapine, and risperidone in the treatment of first-episode psychosis: effectiveness and factors influencing adherence to treatment. European Neuropsychopharmacology, 16 (suppl. 4), S425–426.
McQuade, R. D., Stock, E., Marcus, R., et al (2004) A comparison of weight change during treatment with olanzapine or aripiprazole: results from a randomized, double-blind study. Journal of Clinical Psychiatry, 65 (suppl. 18), 47–56.
Meehan, K., David, S., Tohen, M., et al (2001) A double blind randomised comparison of the efficacy and safety of intramuscular injections of olanzapine, lorazepam or placebo in treating acutely agitated patients diagnosed with bipolar mania. Journal of Clinical Psychopharmacology, 21, 389–397.
Moncrieff, J. (1997) Lithium: evidence reconsidered. British Journal of Psychiatry, 171, 113–119.
Mortimer, K. M., Neugebauer, R., van der Laan, M., et al (2005) An application of model-fitting procedures for marginal structural models. American Journal of Epidemiology, 15, 382–388.
National Institute for Clinical Excellence (2002) Guidance on the Use of Newer (Atypical) Antipsychotic Drugs for the Treatment of Schizophrenia. NICE.
Overall, J. E. & Gorham, D. R. (1962) The Brief Psychiatric Rating Scale. Psychological Report, 10, 799–812.
Papanikolaou, P. N., Churchill, R., Wahlbeck, K., et al (2004) Safety reporting in randomized trials of mental health interventions. American Journal of Psychiatry, 161, 1692–1697.
Phelan, M., Stradins, L. & Morrison, S. (2001) Physical health of people with severe mental illness. BMJ, 322, 443–444.
Roland, M. & Torgerson, D. J. (1998) What are pragmatic trials? BMJ, 316, 285.
Schulz, K. F., Chalmers, I., Hayes, R. J., et al (1995) Empirical evidence of bias. JAMA, 273, 408–412.
Stroup, T. S., McEvoy, J. P. & Swartz, M. S. (2003) The National Institute of Mental Health Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) Project: schizophrenia trial design and protocol development. Schizophrenia Bulletin, 29, 15–31.
Taylor, A. M., Shajahan, P. & Carleton, R. (2006) Comparing the use and outcomes of antipsychotics in the real world. European Neuropsychopharmacology, 16 (suppl. 4), S414–415.
Thornicroft, G. & Strathdee, G. (1994) How many psychiatric beds. BMJ, 309, 970–971.
Thornley, B. & Adams, C. (1998) Content and quality of 2000 controlled trials in schizophrenia over 50 years. BMJ, 317, 1181–1184.
Tiihonen, J., Wahlbeck, K., Lönnqvist, J., et al (2006) Effectiveness of antipsychotic treatments in a nationwide cohort of patients in community care after first hospitalisation due to schizophrenia and schizoaffective disorder: observational follow-up study. BMJ, 333, 224.
van Os, J., Drukker, M. A., Campo, J., et al (2006) Validation of remission criteria for schizophrenia. American Journal of Psychiatry, 163, 2000–2002.
von Knorring, L. & Lindstrom, E. (1995) Principal components and further possibilities with the PANSS. Acta Psychiatrica Scandinavica Supplementum, 388, 5–10.
Vray, M., Hamelin, B. & Jaillon, P. (2005) The respective roles of controlled clinical trials and cohort monitoring studies in the pre- and postmarketing assessment of drugs. Therapie, 60, 339–349.
Ward, E., King, M., Lloyd, M., et al (2000) Randomised controlled trial of non-directive counselling, cognitive–behaviour therapy, and usual general practitioner care for patients with depression I: clinical effectiveness. BMJ, 321, 1393–1399.
Wennberg, J. E., Barry, M. J., Fowler, F. J., et al (1993) Outcomes research, PORTs, and health care reform. Doing more good than harm: the evaluation of health care interventions. Annals of the New York Academy of Science, 703, 56–62.
Williams, D. D. R. & Garner, J. (2002) The case against 'the evidence': a different perspective on evidence-based medicine. British Journal of Psychiatry, 180, 8–12.
Wright, P., Birkett, M., David, S., et al (2001) Double blind, placebo-controlled comparison of intramuscular olanzapine and intramuscular haloperidol in the treatment of acute agitation in schizophrenia. American Journal of Psychiatry, 158, 1149–1151.
Yastrubetskaya, O., Chiu, E. O. & Connell, S. (1997) Is good clinical research practice for clinical trials good for clinical practice? International Journal of Geriatric Psychiatry, 12, 227–231.
Zelen, M. (1979) A new design for randomized clinical trials. New England Journal of Medicine, 300, 1242–1245.

Table 1 Comparison of key features of randomised controlled trials and observational studies

Table 2 Key recent observational and pragmatic studies and randomised controlled trials in schizophrenia
