The trial the field thought couldn't happen
In December 2023, Scott Aaronson and colleagues at the Sheppard Pratt Institute for Advanced Diagnostics and Treatment in Baltimore published online — and in June 2024 in print — the first peer-reviewed clinical trial of psilocybin in bipolar disorder. The paper appeared in JAMA Psychiatry under the title "Single-Dose Synthetic Psilocybin With Psychotherapy for Treatment-Resistant Bipolar Type II Major Depressive Episodes: A Nonrandomized Open-Label Trial" (Aaronson et al., 2024). Fifteen adults with DSM-5 bipolar II disorder in a current depressive episode of at least three months' duration, with a mean of 4.27 prior failed pharmacotherapies, received a single 25 mg dose of synthetic psilocybin (COMP360, COMPASS Pathways) after a two-week medication washout, embedded in a protocol of three preparation sessions, an eight-hour dosing day with two therapists present, and three integration sessions over the following fortnight.
At week three, the Montgomery–Åsberg Depression Rating Scale (MADRS) had fallen by 24.0 points (SD 9.23) from baseline, a 76.3 percent mean reduction. The Cohen's *d* was 4.08, a figure roughly five times the conventional threshold of 0.8 for a "large" effect. All fifteen participants had a lower MADRS score at week three than at baseline. Twelve of fifteen (80 percent) met response criteria (≥50 percent MADRS reduction); eleven (73 percent) met full remission (MADRS ≤10). At week twelve, twelve of fifteen still met both response and remission criteria simultaneously. The Young Mania Rating Scale was administered at every study visit. It did not increase. The trend across the twelve weeks was a slight decrease that reached nominal significance overall (F₃,₄₂ = 2.85; p = .049) but showed no significant pairwise differences between visits. The Columbia Suicide Severity Rating Scale did not rise. Zero participants developed a treatment-emergent hypomanic or manic episode. Zero suicide attempts occurred during the study window.
This is the first published peer-reviewed psilocybin trial that did not exclude bipolar disorder from enrolment. The exclusion has been a structural feature of the modern psychedelic regulatory science literature since the contemporary trial programmes began in the mid-2000s. The decision to test it belongs to Aaronson and the Sheppard Pratt group, not to a marquee Stanford or Hopkins programme. The drug was supplied by COMPASS Pathways. The trial registration is NCT04433845. The result was published before its critique was — the April 2024 letter to JAMA Psychiatry (PMID 38598200) raised exactly the concerns about open-label expectancy that the paper's own discussion section had already conceded. What follows in the rest of this article is what the trial showed, what it cannot show, why the field's two-decade exclusion was reasonable in 2006 and is no longer settled, and where the empirical, mechanistic, and ethical lines now sit.
Why bipolar patients were excluded for two decades
The exclusion is older than the modern trials. Cohen, in his 1960 survey of forty-four LSD investigators reporting on roughly five thousand individuals receiving LSD or mescaline, estimated the incidence of psychotic reactions persisting beyond forty-eight hours at 0.8 per 1,000 in experimental subjects and 1.8 per 1,000 in patients, and the suicide-attempt rate in patients at 1.2 per 1,000 (Cohen, 1960). Schaefer and Greenberg's 1974 case literature added documented manic reactions to LSD. Frecska and colleagues (2007) reviewed the historical record of psychedelic-induced affective switches. The signal was sparse and never quantified denominator-first, but it was there. When the modern trial programmes began (Johns Hopkins under Roland Griffiths in the early 2000s, Imperial College under David Nutt and Robin Carhart-Harris later that decade, Usona, COMPASS), the asymmetric harm profile of a manic episode (which can take months to recover from and can destroy lives) against an absence of trial-grade safety data justified universal exclusion. In the 2006 design ethos, this was decision-theoretically correct. It has been reproduced in every major published trial since.
The case-report literature is what the field has been relying on. Honyiglo and colleagues' 2019 review documented multiple psilocybin- and ayahuasca-induced manic episodes, often in individuals with previously unrecognised bipolar diatheses. Gard and colleagues' 2021 medRxiv preprint, a systematic review of published case studies, sifted 541 hits down to forty-three non-duplicate adverse-reaction reports, with fifteen describing mania or manic-like behaviour persisting beyond acute intoxication. Schweizer and colleagues' developing 2022 meta-analytic work and the subsequent 2026 Molecular Psychiatry systematic review and meta-analysis (doi:10.1038/s41380-026-03657-6) place the total documented case count of psychedelic-induced mania across sixty years of literature in the tens to low hundreds — small in absolute terms, but emerging from a denominator that is not, and may never be, fully knowable. Hinkle and colleagues' 2024 systematic review and meta-analysis in JAMA Psychiatry (214 studies, 3,504 participants) made the structural point explicitly: bipolar disorder was an exclusion criterion in nearly every included study, so the existing trial safety data are by construction silent on the question.
The decision-theoretic logic of the original exclusion was sound. In the absence of prospective trial data, an asymmetric harm profile justifies precaution. The data have now begun to change. The reasoning has not. The point of this article is to read carefully what has changed and what has not.
The observational dataset reshaping the question
The empirical bridge between the case-report literature and the prospective trial is observational. The single largest dataset is Morton and colleagues' 2023 paper in the Journal of Psychopharmacology — an international web-based survey of 541 adults with self-reported bipolar disorder who reported having used psilocybin (Morton et al., 2023, J Psychopharmacol 37(1):49–60). One-third (32.2 percent) reported new or increasing symptoms after psilocybin use. The specific adverse events were: mania symptoms in 14.2 percent, sleep disturbance in 10.4 percent, anxiety in 9.4 percent, depression in 8.9 percent. Emergency medical services were required in 3.3 percent. Overall, respondents reported the drug as more helpful than harmful, with themes of decreased depression, increased emotional processing, and novel perspective dominating the qualitative reports. DellaCrosse and colleagues' 2022 PLOS One qualitative follow-up of fifteen of these respondents added the contextual texture: structured intention, low-arousal setting, and supportive integration were the variables that distinguished good outcomes from bad in the self-administering bipolar population.
What observational self-report data can and cannot tell us is the central interpretive question. They cannot establish causality. They cannot adjudicate between psilocybin-induced mania and naturally occurring manic episodes in a population whose disease causes mania at baseline. They cannot capture the people who chose not to respond to the survey. What they do establish is a bound: the rate of self-reported mania in bipolar individuals who self-administer psilocybin is on the order of fourteen percent, not eighty percent. The field's catastrophising of universal mania induction — the implicit prior that justified the universal exclusion — has been, at minimum, overcalibrated. A 14.2 percent self-reported rate, with a 3.3 percent emergency-services rate, is a real signal that demands respect; it is not a fingerprint of a drug that reliably destabilises bipolar patients.
Aaronson and colleagues address this directly in their JAMA Psychiatry discussion. The adverse outcomes seen in Morton's recreational-use sample — mania, sleep disruption, hospitalisation — were not seen in their fifteen-person clinical trial. "Administration of a psychedelic agent under carefully controlled and supportive conditions," the authors write, "may yield distinct effects compared to self-report surveys on recreational use of psychedelics by people with BD." This is the central interpretive crux of the contemporary literature. The set-and-setting argument, developed in the OOTW Journal piece on set and setting science, is not a soft framing claim about ambience. In bipolar disorder it appears to be a structural mediator of who develops mania and who does not.
The Aaronson trial — what it actually showed
The Sheppard Pratt protocol is worth describing in detail, because what it generalises to is exactly what it specified. Inclusions: DSM-5 bipolar II disorder, current depressive episode ≥3 months, Hamilton Rating Scale for Depression score ≥18, YMRS <10, at least two prior failed adequate pharmacotherapies. The mean baseline duration of the current depressive episode was 30.2 months — strikingly chronic. The mean age at bipolar II onset was 29.1 years; mean illness duration 9.4 years; lifetime suicide attempts in five of fifteen; lifetime psychiatric hospitalisations in seven of fifteen. Exclusions: bipolar I disorder, any psychotic features, schizoaffective or paranoid or borderline personality disorder, substance use disorder in the prior twelve months, concurrent lithium, and prior psychedelic-induced mood elevation. Each criterion was load-bearing.
The dose was a single 25 mg of synthetic psilocybin (COMP360), administered after a two-week medication washout. Three preparation sessions preceded the dosing day. Two therapists were present throughout the eight-hour dosing session. Three integration sessions followed over the next fortnight. The primary outcome was the change in MADRS from baseline to week three. Mean change: −24.00 points (SD 9.23); 95 percent CI −29.11 to −18.89; p < .001. Cohen's *d* = 4.08. At week three, 80 percent of participants met response criteria and 73 percent met remission criteria; at week twelve, 80 percent met both. The Quality of Life Enjoyment and Satisfaction Questionnaire–Short Form improved with a Cohen's *d* of 2.30 at week twelve. The 5D Altered States of Consciousness scale's global intensity at dosing correlated with week-twelve MADRS at r = −0.69 (p = .005); the Visionary Restructuralisation subscale correlated at r = −0.79, the strongest single predictor of clinical benefit in the trial, paralleling the mystical-experience-as-mediator literature that has characterised the unipolar depression trials.
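The reported confidence interval can be checked from the summary statistics alone. The sketch below assumes the interval is a standard two-sided *t* interval on the mean change with 14 degrees of freedom; that assumption, the hard-coded critical value, and the function name are illustrative and are not taken from the paper's methods.

```python
import math

# Summary statistics quoted above for the week-three MADRS change.
MEAN_CHANGE = -24.00
SD_CHANGE = 9.23
N = 15

# Two-sided 95% critical value of Student's t with N - 1 = 14 degrees of
# freedom, hard-coded to keep the sketch dependency-free.
T_CRIT_DF14 = 2.1448


def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for a t-based confidence interval on a mean."""
    standard_error = sd / math.sqrt(n)
    margin = t_crit * standard_error
    return mean - margin, mean + margin


lower, upper = t_interval(MEAN_CHANGE, SD_CHANGE, N, T_CRIT_DF14)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Under these assumptions the sketch returns −29.11 to −18.89, matching the published bounds, which suggests the interval reflects sampling uncertainty in fifteen change scores and nothing more elaborate.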
The YMRS did not rise at any visit. The CSSRS did not rise. Zero hypomanic switches were recorded. Zero suicide attempts. The most common adverse event was day-of-dosing headache (four of fifteen), resolving within twenty-four hours. Six of fifteen restarted at least one medication during the twelve-week follow-up; nine did not. The three participants who restarted before week three or after relapse were coded as nonresponders in the conservative analysis.
The effect size deserves a sentence on its own. Cohen's *d* = 4.08 places this trial's signal among the largest ever recorded in a depression intervention study. The original Carhart-Harris and Goodwin unipolar psilocybin trials in treatment-resistant depression, which OOTW Journal covered in the depression trials piece, returned effect sizes in the *d* = 2–3 range, themselves among the largest in modern psychiatry. Aaronson 2024 exceeds even those, in a population the field had explicitly believed would generate a worse signal because of mania risk. That this is *n* = 15 and open-label, with all the inflation that implies, is the topic of section ten. That it is the signal we now have to interpret is the topic of the rest of this article.
The TrkB and BDNF hero mechanism
The mechanistic case for psilocybin in bipolar disorder is, paradoxically, cleaner than the case for unipolar major depression. The reason sits in two papers, one from 2015 and one from 2023, whose findings converge.
The first is Fernandes, Molendijk, Köhler and colleagues' 2015 meta-analysis in BMC Medicine. The authors aggregated fifty-two studies of peripheral brain-derived neurotrophic factor (BDNF) levels in 6,481 participants with bipolar disorder and healthy controls. The finding: peripheral BDNF is reduced in bipolar disorder in mania and in depression, to similar magnitudes, relative to healthy controls. BDNF levels correlate negatively with symptom severity in both poles. BDNF increases after successful treatment of acute mania. Munkholm, Vinberg, and Kessing's 2016 meta-analytic update confirmed the result. Lin's earlier 2009 Neuroscience Letters meta-analysis had pointed at the depression-pole signal first. Polyakova and colleagues' 2015 work extended the picture. Across these independent meta-analyses, reduced peripheral BDNF as a biomarker of bipolar disease activity is among the most reproducible biological findings in the disorder's pathophysiology. Bipolar disorder, at the molecular level, is in significant part a BDNF-deficit disease, and the deficit is not pole-specific. It is present in the substrate. It is not consequent on mood state alone.
The second is Moliner, Girych, Brunello and colleagues' 2023 paper in Nature Neuroscience. The authors demonstrated that psychedelics — psilocin, LSD, DMT — bind directly to a transmembrane site on the TrkB receptor, the receptor for BDNF, and act as positive allosteric modulators of BDNF signalling. The binding is independent of 5-HT2A. The affinity, as reported, is approximately one thousand-fold greater than ketamine, the prior canonical TrkB-engaging plastogen. Conformational coupling between psilocin and TrkB increases the receptor's responsiveness to ambient BDNF and drives the structural plasticity signature — dendritic spine growth, dendritic arbour elaboration, synaptogenesis — that anchors the broader BDNF and neuroplasticity literature. Psilocin, in other words, is a BDNF-receptor agent acting near-directly on the receptor whose downstream pathway is constitutively underactive in bipolar disorder.
The convergence is the same shape the OOTW Journal piece on Parkinson's disease laid out — and structurally parallel. In Parkinson's, α-synuclein binds the kinase domain of TrkB and shuts it down (Kang et al., 2017); psilocin binds the same receptor with the opposite valence. In bipolar disorder, the deficit is not a single aggregated protein driving TrkB suppression but a constitutive lower set-point in BDNF-pathway signalling. The biology differs — α-synuclein aggregation in one disease versus dysregulated BDNF signalling in the other — but the downstream consequence (impaired TrkB activity) and the intervention logic (psilocin's TrkB engagement) are the same. The pharmacological case for psilocybin in bipolar II depression is, mechanistically, more direct than the case for unipolar major depressive disorder. The molecule's primary downstream effect maps onto the disease's primary molecular signature.
This is the central reason to take Aaronson 2024 seriously rather than to dismiss it as an *n* = 15 anomaly. The effect size makes sense if the molecule is doing what the receptor binding study says it is doing in the population the meta-analysis says has the deficit. It must also be said plainly: BDNF restoration alone does not explain why psilocybin should fail to trigger mania. That is the question of the next two sections.
5-HT2A modulation and the mood-elevation axis
The straightforward pharmacological prediction is that psilocybin should worsen mania in bipolar disorder. Atypical antipsychotics (risperidone, olanzapine, quetiapine, asenapine, aripiprazole) are the first-line pharmacological mood stabilisers in mania. The mechanism they share is 5-HT2A antagonism or inverse agonism. The 2024 Molecular Psychiatry pharmacological fingerprint review (doi:10.1038/s41380-024-02531-7) showed that the anti-manic efficacy of antipsychotics correlates with their 5-HT2A binding affinity. Psilocybin is a 5-HT2A agonist. The naïve receptor prediction is that 5-HT2A agonism should be pro-manic. Cannon and colleagues' 2006 post-mortem work and Tsuji and colleagues' 2014 receptor-binding studies further document elevated 5-HT2A density in the frontal cortex of bipolar patients, additional substrate for the receptor-level worry.
Vargas, Dunlap and colleagues' 2023 Science paper resolved the upstream question of how 5-HT2A agonism produces durable plasticity. Serotonin itself is polar and does not readily cross neuronal membranes. Psychedelics are lipophilic and access intracellular 5-HT2A receptors, and it is the intracellular pool that drives the structural plasticity signature. The pharmacology is not "more serotonin" but access to a receptor pool the endogenous ligand cannot reach — the distinction underpinning the broader 5-HT2A receptor literature. For bipolar disorder, however, the question is not whether intracellular 5-HT2A engagement drives plasticity; it is whether the same receptor engagement that produces benefit in unipolar depression should be expected to drive mania in bipolar substrate. Three reconciliations apply.
First, time scale. Atypical antipsychotics silence chronic constitutive 5-HT2A tone over weeks of daily dosing. Psilocybin produces an acute four-to-six-hour pulse of agonism on a single occasion. The pharmacology is qualitatively different. Acute pulsed agonism at 5-HT2A is followed by rapid receptor internalisation, functional desensitisation, and post-acute reductions in receptor density that persist for days to weeks. The acute occupancy state and the post-acute state differ in direction. The therapeutic window — the days-to-weeks period in which the antidepressant effect emerges — corresponds to a 5-HT2A hypoactive phase that resembles the receptor signature of mood stabilisers more than it resembles the receptor signature of the acute drug. Second, the downstream TrkB engagement described in section five dominates the post-acute phase. The clinical signature looks more like a plasticity-promoting agent than a stimulant. Third, and most consequential for the empirical result: the Aaronson protocol restricted enrolment to bipolar II without psychotic features, excluded bipolar I, excluded prior psychedelic-induced mood elevation, and excluded lithium. The trial population was the lowest baseline-mania-risk slice of the bipolar spectrum. The receptor-level paradox resolves into a stratification problem. Agonism and antagonism at 5-HT2A on different schedules in different populations are not opposite interventions on the same biology.
This is structurally the same reconciliation that the OOTW Journal article on Parkinson's offered for the pimavanserin paradox: a 5-HT2A inverse agonist treats Parkinson's psychosis while a 5-HT2A agonist appears safe in non-psychotic Parkinson's. The receptor-level prediction is not wrong; it is operating on a different population and time scale than the clinical trial. The same logic, applied to bipolar disorder, lands the same way.
The cognitive-flexibility and default-mode-network case
The second mechanistic angle approaches bipolar pathology from the network level rather than the receptor level. Bora, Yücel and Pantelis's 2009 meta-analysis in the Journal of Affective Disorders aggregated forty-five studies of euthymic bipolar patients and found medium-to-large impairments in cognitive flexibility, set-shifting, and reversal learning. Set-shifting: Cohen's *d* = 0.83. Sustained visual attention: *d* = 0.86. Conceptual flexibility and perseveration: *d* = 0.59–0.77. First-degree relatives also show impairment (Stroop *d* = 0.51), implicating the deficits as endophenotypic rather than state-dependent. The pathology persists into mood-stable periods. The deficits are not artefacts of the acute episode.
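One way to read these standardised differences is to convert Cohen's *d* into a probability of superiority: the chance that a randomly chosen control outperforms a randomly chosen patient on the task. The sketch below assumes both groups are normally distributed with equal variance, an idealisation the meta-analysis does not claim; the function name is illustrative.

```python
from statistics import NormalDist


def probability_of_superiority(d):
    """Convert Cohen's d to P(random control > random patient).

    Assumes two normal distributions with equal variance, in which case the
    probability is Phi(d / sqrt(2)).
    """
    return NormalDist().cdf(d / 2 ** 0.5)


# Effect sizes quoted in the paragraph above.
effect_sizes = {
    "set-shifting": 0.83,
    "sustained visual attention": 0.86,
    "Stroop, first-degree relatives": 0.51,
}

for label, d in effect_sizes.items():
    print(f"{label}: d = {d:.2f} -> {probability_of_superiority(d):.2f}")
```

On this reading a *d* near 0.8 corresponds to a control outperforming a patient a little over seven times in ten, against five in ten if there were no difference.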
Doss, Považan, and Rosenberg's 2021 Translational Psychiatry paper showed that psilocybin therapy increases cognitive and neural flexibility in major depressive disorder, with the cognitive-flexibility gain persisting at follow-up. The mechanistic case converges from a second direction: psilocybin moves the dimension that bipolar pathology has flattened. Öngür and colleagues' 2010 work on default-mode-network abnormalities in bipolar disorder and schizophrenia and Martino and colleagues' 2016 PNAS paper on contrasting DMN variability patterns in bipolar depression versus mania document that the bipolar depressive state is characterised by DMN hyperconnectivity and medial prefrontal cortex hyperactivity — the same network signature that psilocybin specifically attenuates (Carhart-Harris and colleagues' 2017 Scientific Reports paper and Daws, Timmermann and Giribaldi's 2022 Nature Medicine paper on increased global integration after psilocybin therapy). The convergence is not metaphorical. The network pathology in bipolar depression and the network signature of psilocybin's therapeutic action point at each other.
Bipolar I and bipolar II diverge here too. The Martino 2016 work shows that the manic state is DMN-hypoactive — the opposite direction from the depressive state. A network-rebalancing intervention applied to bipolar II depression engages a target the disease is producing. Applied to a bipolar I manic state, the same intervention would engage a target in the wrong direction. This is one more reason the trial enrolment criteria matter: psilocybin's network-level signature aligns with the pathology of bipolar II depression specifically, and the alignment cannot be assumed to hold for the bipolar I manic substrate.
What happened to the people who did get manic on psychedelics
The case reports that justified the original exclusion are not dismissible. They are diagnosable. Schweizer and colleagues' 2022 systematic review and the subsequent 2026 Molecular Psychiatry meta-analysis (doi:10.1038/s41380-026-03657-6) catalogue what the documented cases of psychedelic-induced mania have in common. The pattern is consistent. The cases tend to involve: bipolar I diagnosis or undiagnosed bipolar I; absence of mood stabiliser or use of lithium specifically; unsupervised recreational use; high or repeated doses; recent rapid cycling or mixed-state episodes preceding use; polysubstance combinations. Gard and colleagues' 2021 review of forty-three case studies returned the same fingerprint: most documented mania cases involved high recreational doses, polysubstance use, or pre-existing undiagnosed bipolar diatheses.
These are exactly the criteria the Aaronson trial used to exclude participants. Bipolar I excluded. Lithium excluded. Substance use disorder in the prior twelve months excluded. Psychotic features excluded. Prior psychedelic-induced mood elevation excluded. The trial population was the lowest-risk slice of the bipolar spectrum, and the trial was supervised, single-dose, low-arousal, and embedded in psychotherapy. Aaronson 2024 is not evidence that psilocybin is safe in bipolar I. It is not evidence that psilocybin is safe in bipolar II patients with rapid cycling, recent mixed states, psychotic features, or active substance use. It is evidence that the field's universal exclusion was overcalibrated for the specific subpopulation of bipolar II without those features. In the authors' own words, quoted verbatim from the discussion section of Aaronson et al. 2024: "Care should be taken in expanding this paradigm to BDI disorder given the higher potential risk… It is premature to extrapolate these data to the BDI population, who are at higher risk of mania and psychosis."
The asymmetry of harm has not been removed by the trial. The asymmetry has been re-located. Bipolar II without lithium, without psychotic features, without recent rapid cycling, with full medication washout, under therapeutic support, on a single 25 mg dose: this is a population in which the trial-grade signal now exists. Outside that envelope, the data are silent.
Lithium: the hard interaction
Nayak, Bradley, Kettner and colleagues' 2021 paper in Pharmacopsychiatry — Johns Hopkins, lead-authored by Sandeep Nayak — analysed the largest available self-report psychedelic database (Erowid) for evidence of drug interactions between classic psychedelics and bipolar medications. The lithium signal is severe. Lithium combined with LSD: twenty-seven of fifty-five reports (49.1 percent) described seizures. Lithium combined with psilocybin: two of six reports (33.3 percent) described seizures. Lamotrigine, in contrast, showed no seizure signal across the available reports. The lithium combination also produced disproportionate rates of severe physical adverse events including loss of consciousness.
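How much weight each of these proportions can bear depends on the number of reports behind it. The sketch below computes Wilson score intervals for the two counts quoted above. It is an illustration, not an analysis from the Nayak paper, and it captures only sampling noise: it says nothing about the self-selection of who posts an experience report. The function name is an assumption.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Report counts quoted in the paragraph above.
reports = {
    "lithium + LSD": (27, 55),
    "lithium + psilocybin": (2, 6),
}

for label, (events, n) in reports.items():
    low, high = wilson_interval(events, n)
    print(f"{label}: {events}/{n} = {events / n:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The psilocybin interval is far wider than the LSD interval because it rests on six reports, so the one-in-three figure is best read as a warning signal rather than a rate estimate.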
The mechanism is plausible. Psychedelics increase cortical excitability via 5-HT2A agonism and reduce tonic locus coeruleus activity. Lithium has complex effects including modulation of serotonergic neurotransmission, lowering of seizure threshold in predisposed individuals, increase in calcium-dependent excitability, and direct neuronal-membrane effects. The combination appears to be synergistically pro-convulsant. Older literature on lithium plus serotonergic agents (Goldstein 1988 and others) documented serotonin-syndrome features and reduced seizure threshold; the contemporary classic-psychedelic-specific data confirm the signal at scale.
The clinical implication is unambiguous. Lithium is an absolute exclusion criterion in every modern psilocybin clinical trial in bipolar disorder. Aaronson 2024 required full medication washout including discontinuation of lithium for at least two weeks before dosing. The forthcoming UCSF trial (NCT05065294), the University of Texas Health Science Center trial (NCT06706232), the University of British Columbia randomised controlled trial (NCT06943573), and any COMPASS Pathways bipolar II expansion will all exclude concurrent lithium. Self-administering psilocybin while taking lithium carries a published seizure-rate signal on the order of one-in-three to one-in-two of identified reports and should be treated as a hard contraindication, not a relative one. The Nayak data are not subtle. They are not contested in the contemporary literature. They are the single most consequential safety finding in the psilocybin-bipolar interaction profile.
The corollary creates the trial-design paradox. Lithium is the most evidence-based long-term mood stabiliser for bipolar disorder. Excluding lithium-treated patients restricts trial enrolment to those who failed lithium, never tolerated lithium, or were never offered lithium — a meaningful slice of the bipolar II population but not its centre of mass. Lamotrigine appears safe to co-administer per Nayak, and is independently the strongest evidence-based agent for bipolar II depressive maintenance. The empirical trial-eligibility population, in practice, is bipolar II patients on lamotrigine, on no mood stabiliser, or in medication washout — not bipolar II patients on lithium. The frame within which "psilocybin works in bipolar disorder" can be tested is narrower than the population the disorder describes.
What we don't know
The case for psilocybin in bipolar II depression rests on a single open-label fifteen-person pilot, a coherent BDNF/TrkB mechanism, a network-level convergence with the underlying pathology, and observational survey data bounding the mania risk on the order of ten percent rather than a majority. It does not yet rest on what the field would require to call the intervention established. The gaps are real and worth stating one by one.
*N* = 15. The Aaronson trial is, statistically, a pilot. The confidence interval on any rate the trial reports, including the zero hypomanic switches, is wide. By the rule of three, the upper 95 percent confidence bound on zero events in fifteen participants is roughly a 20 percent population rate of switches. The trial is not powered to detect rare adverse events. It is also open-label and unblinded: there is no placebo arm. The placebo response rate in bipolar depression trials is well documented as high; Sysko and colleagues (2007) demonstrated placebo response rates as high as 50 percent in some bipolar depression trial cohorts. The April 2024 letter to JAMA Psychiatry (PMID 38598200) raised this critique formally, and Aaronson's own discussion section concedes it. A Cohen's *d* of 4.08 is genuinely extraordinary; it is also genuinely, in part, an artefact of the open-label expectancy environment. The eventual randomised controlled trial, most likely NCT06943573 at the University of British Columbia, with first results not expected until 2028, will refine the true effect size downward by some amount the field cannot estimate in advance.
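The rule-of-three figure for the zero hypomanic switches can be set beside the exact calculation it approximates. The sketch below is a minimal illustration; the function names are assumptions, and the exact bound is the one-sided value obtained by solving (1 − p)^n = 0.05 for p.

```python
def rule_of_three_upper(n):
    """Approximate upper 95% bound on an event rate when 0 of n had the event."""
    return 3 / n


def exact_upper_zero_events(n, alpha=0.05):
    """Exact one-sided upper bound: the rate p at which (1 - p)**n == alpha."""
    return 1 - alpha ** (1 / n)


n_participants = 15
print(f"rule of three: {rule_of_three_upper(n_participants):.1%}")
print(f"exact one-sided 95% bound: {exact_upper_zero_events(n_participants):.1%}")
```

The two bounds come out at 20.0 percent and about 18.1 percent, both near one in five, which is why zero switches in fifteen participants cannot by itself exclude a clinically important switch rate.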
Twelve-week follow-up is the published window. The natural cycle length of bipolar II is approximately twelve to twenty-four months. Whether psilocybin's mood elevation persists, or instead accelerates the underlying cycling architecture and produces a hypomanic phase six or twelve months out, cannot be determined at the published time horizon. Antidepressant-induced rapid cycling is well-described in bipolar II (Ghaemi, Goldberg, and others); psilocybin's documented six-month mood-elevation effects in the unipolar major depressive disorder trials, applied in bipolar substrate, could in principle manifest as hypomanic phase advancement.
The diagnostic problem is structural. Hirschfeld and colleagues' work with the Mood Disorder Questionnaire established that roughly 21 percent of patients on antidepressants for "unipolar" depression screen positive for bipolar disorder (Hirschfeld et al., 2003; 2005). The Mood Disorder Questionnaire has a sensitivity of 73 percent and specificity of 90 percent for bipolar disorder. Aggregate literature estimates place 50–75 percent of bipolar disorder patients as initially diagnosed with unipolar major depression, with an average ten-year diagnostic delay. The Goodwin 2022 NEJM psilocybin trial in treatment-resistant depression, the major trial that anchored the field's MDD evidence base, excluded bipolar disorder by the MINI structured interview, but the MINI does not fully replicate gold-standard SCID assessment. Some published treatment-resistant depression responses to psilocybin, particularly the most durable multi-month elevations, may represent treatment effects in covertly bipolar patients. The "psilocybin in MDD" literature contains, by construction, some unknown fraction of psilocybin-in-undiagnosed-bipolar-II data already.
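The gap between a positive screen and a true diagnosis can be made concrete with the figures above. The sketch below is a back-of-envelope calculation, not an analysis from Hirschfeld's papers: it assumes the quoted sensitivity and specificity apply unchanged to the antidepressant-treated population that produced the 21 percent screen-positive rate, which may not hold. The function names are illustrative.

```python
def implied_prevalence(screen_positive_rate, sensitivity, specificity):
    """Back out true prevalence from an observed screen-positive rate.

    Solves positive_rate = sens * prev + (1 - spec) * (1 - prev) for prev
    (the Rogan-Gladen correction).
    """
    false_positive_rate = 1 - specificity
    return (screen_positive_rate - false_positive_rate) / (
        sensitivity - false_positive_rate
    )


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive screens that are true cases, by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Figures quoted in the paragraph above.
SCREEN_POSITIVE = 0.21
SENSITIVITY = 0.73
SPECIFICITY = 0.90

prevalence = implied_prevalence(SCREEN_POSITIVE, SENSITIVITY, SPECIFICITY)
ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
print(f"implied prevalence: {prevalence:.1%}")
print(f"positive predictive value: {ppv:.1%}")
```

Under those assumptions the implied prevalence is about 17.5 percent and roughly six in ten positive screens would reflect true bipolar disorder, so the 21 percent figure is a screening signal rather than a count of misdiagnosed patients.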
Bipolar I has not been tested. The "do not give psychedelics to bipolar I patients" standard of care remains the field's position, and the evidence to change it does not exist. The mechanistic prediction — 5-HT2A agonism against a backdrop of dopaminergic hyperarousal, plasticity surge in a population whose mania risk involves cortical hyperexcitability — runs in the wrong direction. No prospective trial has been registered in bipolar I.
Suicidality during mood elevation has not been characterised at scale. The Aaronson trial saw no completed suicides and no CSSRS rise. With *n* = 15 the confidence interval on a low-base-rate event is wide. Bipolar II suicide risk is concentrated in mixed states — dysphoric energy meeting cognitive flexibility. Whether psilocybin's acute increase in flexibility is, in a small subset, a vector for completion behaviour rather than for benefit, is a question the field cannot yet answer.
Repeat dosing is untested. Aaronson 2024 used a single dose. Most therapeutic protocols anticipate two or more sessions. The NCT06706232 University of Texas trial, currently recruiting with primary completion projected for January 2027, is the first to test sequential doses in bipolar II and is also the first to include a suicidality endpoint. The unrecognised hypomania-as-response problem is real: when remission persists for six months in a chronically depressed patient with bipolar II substrate, the differential diagnosis includes covert hypomanic phase elevation. The trial methodology to distinguish sustained antidepressant response from emergent hypomania has not yet been adequately defined for psychedelic-assisted therapy.
The selection bias among trial participants is enormous. Aaronson's fifteen participants were highly motivated, predominantly White, free of substance use disorder, free of lithium use, willing to undergo medication washout, and willing to participate in an open-label single-arm protocol. Most bipolar II patients in the real-world population would not meet these criteria. The therapeutic context (three preparation sessions, eight-hour dosing with two therapists, three integration sessions) is intensive. Whether the same drug administered without the same scaffold produces the same outcome is essentially untested in bipolar disorder. The field does not yet know whether the signal comes from the drug or from the scaffold.
Where this sits in the OOTW arc
The contemporary psychedelic literature is moving (across diseases, across articles in this journal, and across the published trial corpus) from population-level claims to subtype-level claims. "Psychedelics work for depression" was a defensible claim ten years ago; it is now an imprecise one. The contemporary evidence supports a stratified statement. Psilocybin in bipolar II depression, in patients without psychotic features, without lithium co-administration, without active substance use, without recent rapid cycling, in a single 25 mg dose with full preparation and integration support, in a controlled clinical setting, may produce one of the largest effect sizes in the psychiatric trial literature for a non-bipolar-specific intervention. That is what Aaronson 2024 supports. It does not support the simpler claim that "psilocybin works for bipolar disorder," and the trial authors themselves write the constraints into their discussion section.
The same pattern shows up across the recent neuroscience pieces in this journal. The PTSD piece made the case at the subtype level — combat-trauma versus non-combat-trauma, with-medication-history versus without, with-moral-injury versus without. The Parkinson's piece made the case at the disease-stage and exclusion-criterion level: mild-to-moderate disease, no psychotic features, no concurrent pimavanserin, with the disease-modifying biomarker question still unanswered. The ketamine versus psilocybin comparison sits in the same shape. Every neuroscience-grounded case for psychedelic-assisted therapy gets more interesting and harder when read at the subtype level. It gets less catchy. It gets less marketable. It gets more empirically honest.
The field's two-decade exclusion of bipolar disorder from the psilocybin trial corpus was reasonable in 2006 and is no longer settled in 2026. Aaronson 2024 has weakened the strongest prior without refuting it. The trial result lives in tension with the receptor-level prediction; the receptor-level prediction lives in tension with the mechanism-and-network-level case; the mechanism case lives in tension with the *n* = 15 sample size; the sample size lives in tension with the unambiguous lithium-interaction data; and the lithium-interaction data lives in tension with the disease's standard of care. At the bipolar II subtype level, the literature now amounts to a single inflection point awaiting a confirmatory trial. The University of British Columbia randomised controlled trial (NCT06943573) and the University of Texas sequential-dose trial (NCT06706232) will, between roughly 2027 and 2029, generate the next layer of empirical resolution. The argument here is not that bipolar II is the next indication. The argument is that bipolar II is the next hypothesis, and the rigour with which the field treats the hypothesis will determine whether the cost of being wrong is paid by patients.
The trial happened. The signal is real. The exclusion was overcalibrated for one specific subpopulation. None of those statements collapses into the simpler statement that the field has wanted to be able to make for twenty years. The next pillar in this arc is approaching, and the work of distinguishing what has been demonstrated from what has been hypothesised has never been more load-bearing.
---