Aetion | Jun 5, 2025 | 8 min read

When Clinical Trials Stall, Evidence Shouldn't: How Generative Models Restore Power to Under-Enrolled Studies

Lucy Mosquera, MS, and Natalie Schibell, MPH

Understanding the Impact of Incomplete Trials

A growing share of clinical trials are stalling, not for lack of innovation but because of insufficient patient recruitment. In oncology and rare diseases, studies often fail to accrue the participants needed to power statistically valid conclusions. Budgetary constraints, complex protocols, and limited patient populations only heighten the risk.

The implications are profound. Each stalled trial represents lost investment and, more importantly, lost scientific insight. For participants, it may mean exposure to investigational risk without benefit. And for the broader population, it delays the development of new treatments and slows progress toward market access.

But what if stalled trials didn’t have to be lost opportunities—for science, sponsors, or patients?

A new peer-reviewed study published in JMIR, co-authored by Lucy Mosquera of Aetion, offers a potential solution: using generative models to simulate missing patients and restore trial power. The study validates a technique known as sequential synthesis, as implemented in Aetion®’s Generate: Enhance solution. It provides a scientifically rigorous method to rescue incomplete evidence and prioritize the next steps in development.

Simulating Trial Completion with Generative Models

Aetion, in collaboration with academic partners including the University of Ottawa and the CHEO Research Institute, evaluated how generative models could augment incomplete clinical trials. The study was led by Lucy Mosquera, Senior Director of Data Science at Aetion, and Dr. Khaled El Emam, along with a broader team of clinical and methodological experts from institutions across Canada, Austria, and the U.S.

The team analyzed 10 datasets from 9 completed oncology trials. For each, they simulated a common real-world challenge: insufficient accrual. Between 10% and 50% of patients were removed to mimic early trial closure. The remaining data were then used to train generative models that simulated the missing patients. The full original trial results served as the benchmark to evaluate whether the partially synthetic datasets could replicate the actual treatment effects.
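
The truncation step described above can be sketched in a few lines. This is a minimal illustration, not the study's code: it assumes the removed patients are the latest enrolled, and the record fields (`enroll_order`, `arm`, `outcome`) and the function name are hypothetical.

```python
import random

def simulate_early_closure(patients, fraction_removed):
    """Keep the earliest-enrolled patients and hold out the rest.

    `patients` is a list of dicts with an "enroll_order" key (hypothetical field).
    Returns (observed, held_out).
    """
    if not 0 <= fraction_removed < 1:
        raise ValueError("fraction_removed must be in [0, 1)")
    ordered = sorted(patients, key=lambda p: p["enroll_order"])
    n_keep = len(ordered) - round(len(ordered) * fraction_removed)
    return ordered[:n_keep], ordered[n_keep:]

# Toy trial of 200 patients with made-up values (not data from the study).
rng = random.Random(0)
trial = [
    {"enroll_order": i,
     "arm": rng.choice(["treatment", "control"]),
     "outcome": rng.gauss(0, 1)}
    for i in range(200)
]

# The study removed between 10% and 50% of patients.
for fraction in (0.1, 0.2, 0.3, 0.4, 0.5):
    observed, held_out = simulate_early_closure(trial, fraction)
    print(f"removed {fraction:.0%}: train on {len(observed)}, simulate {len(held_out)}")
```

The observed portion is what a generative model would be trained on; the held-out portion is only used afterwards, as part of the full-trial benchmark.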

Four classes of generative models were evaluated based on their ability to produce high-utility synthetic health data:

  • Sequential synthesis: Constructs records variable by variable using decision trees trained on real data.
  • Bayesian networks: Model probabilistic relationships among variables using directed acyclic graphs.
  • Conditional GANs: Use adversarial training of neural networks with conditional inputs to generate synthetic data.
  • Variational autoencoders (VAEs): Employ deep learning to encode and reconstruct data through a latent space representation.

Each generative model was benchmarked against a bootstrap baseline, which involved creating synthetic datasets by randomly sampling real records with replacement. This approach preserves the statistical properties of the original dataset without generating truly new data, serving as a reference point for evaluating the added value of more advanced generative methods.
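
A bootstrap baseline of this kind is simple enough to sketch directly. The example below is an illustration under assumptions: the function name and the `patient_id` field are hypothetical, and the study's exact resampling procedure is not described here.

```python
import random

def bootstrap_augment(observed, n_missing, seed=None):
    """Fill in n_missing patients by resampling observed records with replacement."""
    rng = random.Random(seed)
    # random.choices samples with replacement, so records can repeat.
    resampled = [dict(rec) for rec in rng.choices(observed, k=n_missing)]
    return observed + resampled

# Toy cohort with made-up values: 60 enrolled patients, 40 missing.
observed = [
    {"patient_id": i, "arm": "treatment" if i % 2 else "control"}
    for i in range(60)
]
augmented = bootstrap_augment(observed, n_missing=40, seed=7)
unique_ids = {rec["patient_id"] for rec in augmented}
print(len(augmented), "records,", len(unique_ids), "distinct real patients")
```

The augmented dataset reaches the target size, but every added record is a copy of a real one, which is why this serves as a reference point rather than a generative method.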

Inside the Sequential Synthesis Method

Sequential synthesis is a generative method tailored to clinical and structured health data. It works by training on observed data and generating synthetic patient records that retain the statistical properties of the original cohort.

Sequential Synthesis Framework for Simulating Incomplete Clinical Trials

| Step | Description |
| --- | --- |
| Training | The model is trained on an incomplete dataset, learning covariate and outcome patterns in both treated and control patients. |
| Sequential data generation | Synthetic patients are created iteratively, preserving inter-variable relationships and assigning treatment arms. |
| Outcome simulation | Clinical outcomes are generated conditional on covariates and treatment arms. |
| Data augmentation | Incomplete real trial data is joined with the synthesized patients. |
| Validation | Augmented datasets are compared to the full original datasets for treatment effect recovery. |

This method supports transparency, reproducibility, and analytic rigor. It does not replace randomized controlled trials—but when recruitment stops short, it offers a scientifically grounded way to extract more value from limited data.
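
The training, generation, and augmentation steps in the framework above can be sketched as follows. This is a deliberately simplified stand-in, not the Generate: Enhance implementation: where sequential synthesis fits a decision tree per variable, this sketch uses conditional frequency tables over categorical variables, backing off to a shorter context when a combination was never observed. All names and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_sequential(records, order):
    """Learn the distribution of each variable given the variables before it."""
    tables = []
    for j, var in enumerate(order):
        table = defaultdict(Counter)
        for rec in records:
            # Count under every prefix length so sampling can back off later.
            for k in range(j + 1):
                context = tuple(rec[v] for v in order[:k])
                table[context][rec[var]] += 1
        tables.append(table)
    return tables

def sample_record(tables, order, rng):
    """Generate one synthetic patient, one variable at a time."""
    rec = {}
    for j, var in enumerate(order):
        table = tables[j]
        for k in range(j, -1, -1):  # try the longest context first
            context = tuple(rec[v] for v in order[:k])
            if context in table:
                counts = table[context]
                break
        values = list(counts)
        rec[var] = rng.choices(values, weights=[counts[v] for v in values])[0]
    return rec

def augment(observed, n_missing, order, seed=0):
    """Join the incomplete real data with synthesized patients."""
    rng = random.Random(seed)
    tables = fit_sequential(observed, order)
    synthetic = [sample_record(tables, order, rng) for _ in range(n_missing)]
    return observed + synthetic

# Toy incomplete trial (made-up values): covariate, then arm, then outcome.
ORDER = ["stage", "arm", "response"]
data_rng = random.Random(42)
observed = []
for _ in range(120):
    stage = data_rng.choice(["II", "III"])
    arm = data_rng.choice(["control", "treatment"])
    p = 0.3 + (0.25 if arm == "treatment" else 0) - (0.1 if stage == "III" else 0)
    observed.append({"stage": stage, "arm": arm, "response": int(data_rng.random() < p)})

augmented = augment(observed, n_missing=80, order=ORDER)
print(len(augmented), "patients after augmentation")
```

The variable order mirrors the table: covariates first, then treatment arm, then the outcome conditional on both. Continuous variables would need binning or a tree-based model, which this sketch omits.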

Scientific Results That Validate the Approach

The study first tested whether early-enrolled patients can validly be used to train generative models that produce late-enrolled patients. This assumption underpins the generative approach regardless of the modeling solution used. Across the nine clinical trials, no statistically significant interaction was found between recruitment order and treatment effect (e.g., REaCT-ILIAD: interaction estimate −3.71×10⁻⁶; 95% CI: −1.16×10⁻⁵ to 4.16×10⁻⁶). In other words, treatment effects did not differ significantly between early and late recruits in these oncology trials.
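
One simple way to probe this assumption is to compare the treatment effect among early and late enrollees. The sketch below is a simplified stand-in for the study's interaction analysis, whose exact model is not described here: it splits the cohort at the midpoint of enrollment order and uses a normal approximation. Field and function names are hypothetical.

```python
from statistics import NormalDist, mean, variance

def arm_effect(patients):
    """Difference in mean outcome (treatment minus control) and its variance."""
    treated = [p["outcome"] for p in patients if p["arm"] == "treatment"]
    control = [p["outcome"] for p in patients if p["arm"] == "control"]
    effect = mean(treated) - mean(control)
    var = variance(treated) / len(treated) + variance(control) / len(control)
    return effect, var

def early_late_interaction(patients, alpha=0.05):
    """Late-minus-early difference in treatment effect with a normal-approximation CI."""
    ordered = sorted(patients, key=lambda p: p["enroll_order"])
    half = len(ordered) // 2
    early_effect, early_var = arm_effect(ordered[:half])
    late_effect, late_var = arm_effect(ordered[half:])
    diff = late_effect - early_effect
    se = (early_var + late_var) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, (diff - z * se, diff + z * se)
```

If the confidence interval for the difference includes zero, there is no evidence at that level that the treatment effect shifted over the enrollment period.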

Following this validation, the study evaluated specific performance metrics for sequential synthesis, revealing consistent performance across key criteria. Sequential synthesis often outperformed other generative methods—including Bayesian networks, conditional GANs, and variational autoencoders—particularly regarding stability and interpretability for structured clinical data.

Key performance metrics included:

  • Decision agreement: Synthetic augmentation preserved the direction and significance of treatment effects in 88–100% of trials, even with up to 40% of patients missing.
  • Estimate accuracy: In all cases, effect estimates from augmented data fell within the 95% confidence interval of the original.
  • Confidence interval overlap: The CI overlap between augmented and original results was consistently greater than 80%.
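
The three metrics above can be written as small functions. These are sketches under stated assumptions rather than the study's definitions: results are `(estimate, ci_low, ci_high)` tuples on a scale where 0 means no effect (for example a mean difference or log hazard ratio), and the overlap formula is one commonly used in the synthetic data literature. The example numbers are made up.

```python
def decision_agreement(original, augmented):
    """True if both analyses reach the same conclusion on significance and direction."""
    def decision(est, lo, hi):
        significant = lo > 0 or hi < 0          # CI excludes the null value 0
        return (est > 0) - (est < 0) if significant else 0
    return decision(*original) == decision(*augmented)

def estimate_within_ci(original, augmented):
    """True if the augmented estimate falls inside the original CI."""
    _, lo, hi = original
    return lo <= augmented[0] <= hi

def ci_overlap(original, augmented):
    """Average share of each interval covered by the other (negative if disjoint)."""
    _, lo_o, hi_o = original
    _, lo_a, hi_a = augmented
    shared = min(hi_o, hi_a) - max(lo_o, lo_a)
    return 0.5 * (shared / (hi_o - lo_o) + shared / (hi_a - lo_a))

# Illustrative values only, not results from the study.
original = (-0.40, -0.70, -0.10)
augmented = (-0.35, -0.68, -0.02)
print(decision_agreement(original, augmented))
print(estimate_within_ci(original, augmented))
print(round(ci_overlap(original, augmented), 3))
```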

Sequential synthesis emerged as one of the more consistently reliable techniques among the four evaluated methods. It delivered robust performance across large and small datasets, even when trial outcomes were marginal or underpowered. In such cases, synthetic augmentation helped stabilize conclusions and preserve the treatment signal without introducing bias.

The evaluation metrics used in this study will be familiar to those working with real-world data (RWD). They closely align with those used in Aetion’s RCT DUPLICATE initiative, which rigorously assessed whether real-world evidence (RWE) could reproduce findings from randomized controlled trials. Just as RCT DUPLICATE helped establish conditions under which RWE can provide valid clinical answers, this new research shows that generative models can play a similar role—credibly augmenting underpowered trials and reinforcing evidence when traditional designs fall short.

Use Cases for Virtual Patient Modeling in Clinical Trials

| Use Case | Why It Matters |
| --- | --- |
| Underpowered Phase 2 studies | Enables recovery of analytic power to support go/no-go decisions |
| Rare diseases or small populations | Extends insights in trials where patient accrual is inherently limited |
| Grant-funded or academic trials | Salvages value from studies where funding may not support full recruitment |
| Pandemic- or disruption-affected trials | Recovers datasets from interruptions without restarting enrollment |
| Early signal validation | Supports hypothesis testing and prioritization of next-step research |
| Cross-study learning | Augments existing data to support simulation-based protocol refinement |

From Peer-Reviewed Research to Practical Application

This peer-reviewed study, conducted across completed Phase 3 oncology trials, affirms the methodology behind Aetion® Generate: Enhance, a component of Aetion Evidence Platform®. The research, a scientifically rigorous collaboration with academic partners, aligns with Aetion’s approach to virtual patient modeling using RWD.

While the study used Phase 3 data for validation, this method is especially relevant in Phase 2 trials, where under-enrollment is more common and consequential. These earlier-stage studies often face tighter budgets, limited populations, and more exploratory endpoints, which makes recovering insight and preserving decision power even more critical.
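
To see why under-enrollment is so consequential, it helps to look at how quickly statistical power erodes as accrual falls short. The sketch below uses the standard normal approximation for a two-arm comparison of means; the effect size, standard deviation, and planned sample size are illustrative assumptions, not figures from the study.

```python
from statistics import NormalDist

def two_arm_power(effect, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    se = sd * (2 / n_per_arm) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(effect) / se - z_crit)

# Hypothetical design: 64 patients per arm, effect of half a standard deviation.
planned_n = 64
for enrolled_fraction in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5):
    n = round(planned_n * enrolled_fraction)
    power = two_arm_power(effect=0.5, sd=1.0, n_per_arm=n)
    print(f"{enrolled_fraction:.0%} enrolled ({n} per arm): power = {power:.2f}")
```

Under these assumptions, a design planned for roughly 80% power falls to about a coin flip when only half the patients are enrolled.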

By simulating missing patients based on real trial dynamics, Generate: Enhance enables teams to extract evidence that informs high-stakes development decisions in a range of scenarios:

  • Underpowered Phase 2 or 3 studies: Restore statistical confidence when recruitment falls short
  • Rare disease and precision oncology: Simulate additional patients in trials with limited populations
  • Academic and grant-funded research: Provide a defensible way to salvage scientific value from early closures
  • Pandemic- or disruption-affected trials: Recover insights when sites shut down or funding gaps emerge

This ability to extend trial data with scientifically grounded simulation offers sponsors a powerful lever to de-risk development and make more confident go/no-go decisions.

A Growing Body of Evidence Supports the Methodology

Aetion’s Generate: Enhance translates the peer-reviewed methodology validated in the JMIR study—sequential synthesis—into an operational solution built for real-world research. It expands on the study’s foundation with RWD integration, transparent documentation, and regulatory-aligned workflows, offering a purpose-built approach to recover signals from incomplete trials, accelerate early insights, and de-risk development decisions.

This research adds to a growing body of peer-reviewed evidence supporting sequential synthesis as a credible and reproducible tool for data augmentation in clinical development. Aetion and its collaborators have contributed to 12 peer-reviewed publications applying this method across various domains—including oncology, pharmacology, RWD, cardiovascular disease, public health (including COVID-19), and health survey research. These studies reinforce the scientific integrity and practical relevance of generative approaches. As the evidence base grows, so too does confidence in the responsible use of sequential synthesis to extend the value of incomplete or underpowered trials.

Where Generate: Enhance Adds Value

| Capability | Description | Scientific Impact |
| --- | --- | --- |
| Validated methodology | Built on sequential synthesis, which has been shown to replicate treatment effects in oncology trials | Preserves signal and reduces the risk of Type II errors in underpowered trials |
| Synthetic patient traceability | Every virtual patient is fully auditable and linked to the input logic | Enhances transparency for internal review and external regulatory engagement |
| Regulatory-aligned outputs | Documentation supports exploratory and early-phase development use | Enables inclusion in go/no-go planning, adaptive trial design, and grant support |
| Flexible integration | Compatible with trial data or real-world data sources | Adapts to sponsor needs across research, clinical, and HEOR functions |
| Computational efficiency | Lightweight model deployment and reproducible workflows | Speeds up analysis cycles without compromising scientific rigor |
| Equity and inclusion | Supports modeling for underrepresented groups or rare disease populations | Expands the generalizability of findings and informs health equity-focused programs |

These capabilities do not replace traditional evidence generation—but they equip teams to do more with their data, especially in resource-constrained or disrupted research settings. Generate: Enhance allows organizations to extend the life and impact of every dataset, making every patient count.

Reframing the Possibility of Evidence Recovery

Too often, the science stops short when clinical trials underperform on enrollment. This study shows that generative models—if applied responsibly and rigorously—can extend the utility of incomplete datasets.

For clinical development teams, this is a shift in mindset. With validated approaches like sequential synthesis, the question becomes: How can we continue learning from the data we do have?

At Aetion, we’re building tools that allow you to do just that. Not to replace trials—but to recover insight, refocus strategy, and accelerate decision-making when traditional evidence generation runs into real-world limits.

Explore how Aetion® Generate can support trial recovery and accelerate insight.
