A growing share of clinical trials are stalling, not for lack of innovation but because they cannot recruit enough patients. In oncology and rare diseases in particular, studies often fail to accrue the participants needed to support statistically valid conclusions. Budgetary constraints, complex protocols, and limited patient populations only heighten the risk.
The implications are profound. Each stalled trial represents lost investment and, more importantly, lost scientific insight. For participants, it may mean exposure to investigational risk without benefit. And for the broader population, it delays the development of new treatments and slows progress toward market access.
But what if stalled trials didn’t have to be lost opportunities—for science, sponsors, or patients?
A new peer-reviewed study published in JMIR, co-authored by Lucy Mosquera of Aetion, offers a potential solution: using generative models to simulate missing patients and restore trial power. The study validates a technique known as sequential synthesis, as implemented in Aetion®’s Generate: Enhance solution. It provides a scientifically rigorous method to rescue incomplete evidence and prioritize the next steps in development.
Aetion, in collaboration with academic partners including the University of Ottawa and the CHEO Research Institute, evaluated how generative models could augment incomplete clinical trials. The study was led by Lucy Mosquera, Senior Director of Data Science at Aetion, and Dr. Khaled El Emam, along with a broader team of clinical and methodological experts from institutions across Canada, Austria, and the U.S.
The team analyzed 10 datasets from 9 completed oncology trials. For each, they simulated a common real-world challenge: insufficient accrual. Between 10% and 50% of patients were removed to mimic early trial closure. The remaining data were then used to train generative models that simulated the missing patients. The full original trial results served as the benchmark to evaluate whether the partially synthetic datasets could replicate the actual treatment effects.
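The truncation step can be sketched in a few lines of Python. This is a minimal illustration, assuming early closure is mimicked by dropping the last-enrolled patients (consistent with the study's framing of training on early-enrolled patients to produce late-enrolled ones). The `Patient` record, the `simulate_early_closure` helper, and the toy trial are hypothetical, not the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical minimal record; a real trial dataset has many more covariates.
    patient_id: int
    enroll_order: int  # position in the recruitment sequence (1 = first enrolled)
    arm: int           # 1 = treatment, 0 = control
    outcome: float


def simulate_early_closure(patients, removal_fraction):
    """Split a completed trial into (observed, held_out) to mimic early closure.

    Assumes the last-enrolled patients are the ones "never recruited".
    The observed set trains the generative model; the held-out late
    enrollees are kept only for benchmarking against the full trial.
    """
    if not 0.0 <= removal_fraction < 1.0:
        raise ValueError("removal_fraction must be in [0, 1)")
    ordered = sorted(patients, key=lambda p: p.enroll_order)
    n_removed = round(len(ordered) * removal_fraction)
    n_kept = len(ordered) - n_removed
    return ordered[:n_kept], ordered[n_kept:]


# Toy 200-patient trial truncated at the 10% to 50% levels used in the study.
trial = [Patient(i, i, i % 2, float(i % 7)) for i in range(1, 201)]
for frac in (0.1, 0.2, 0.3, 0.4, 0.5):
    observed, held_out = simulate_early_closure(trial, frac)
    print(f"removed {frac:.0%}: {len(observed)} observed, {len(held_out)} held out")
```

Sorting by enrollment order, rather than sampling at random, is what makes the held-out patients resemble the late enrollees a stalled trial never recruits.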
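Four classes of generative models were evaluated on their ability to produce high-utility synthetic health data: sequential synthesis, Bayesian networks, conditional generative adversarial networks (GANs), and variational autoencoders (VAEs).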
Each generative model was benchmarked against a bootstrap baseline, which involved creating synthetic datasets by randomly sampling real records with replacement. This approach preserves the statistical properties of the original dataset without generating truly new data, serving as a reference point for evaluating the added value of more advanced generative methods.
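For comparison, the bootstrap baseline can be sketched as follows. This is a minimal illustration, assuming the resampled records are simply appended to the truncated trial to restore its planned size; the helper name `bootstrap_augment` and the toy records are hypothetical, and the study's exact resampling scheme (for example, whether sampling was stratified by arm) is not described here.

```python
import random


def bootstrap_augment(observed, n_missing, seed=None):
    """Fill n_missing "patients" by resampling observed records with replacement.

    Every added record is a copy of a real one, so this baseline adds no new
    patient profiles; it only re-weights the patients already observed.
    """
    if not observed:
        raise ValueError("need at least one observed record to resample")
    rng = random.Random(seed)  # local RNG so runs are reproducible
    resampled = rng.choices(observed, k=n_missing)
    return list(observed) + resampled


# Restore a 6-patient toy dataset to a planned size of 10.
observed = [
    {"id": 1, "arm": 1, "outcome": 0.8},
    {"id": 2, "arm": 0, "outcome": 0.3},
    {"id": 3, "arm": 1, "outcome": 0.9},
    {"id": 4, "arm": 0, "outcome": 0.4},
    {"id": 5, "arm": 1, "outcome": 0.7},
    {"id": 6, "arm": 0, "outcome": 0.2},
]
augmented = bootstrap_augment(observed, n_missing=4, seed=42)
print(len(augmented))  # 10
```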
Sequential synthesis is a generative method tailored to clinical and structured health data. It works by training on observed data and generating synthetic patient records that retain the statistical properties of the original cohort.
| Step | Description |
| --- | --- |
| Training | The model is trained on an incomplete dataset, learning covariate and outcome patterns in both treated and control patients. |
| Sequential data generation | Synthetic patients are created iteratively, preserving inter-variable relationships and assigning treatment arms. |
| Outcome simulation | Clinical outcomes are generated conditional on covariates and treatment arms. |
| Data augmentation | Incomplete real trial data is joined with the synthesized patients. |
| Validation | Augmented datasets are compared to the full original datasets for treatment effect recovery. |
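The first four steps in the table can be illustrated with a toy sequential synthesizer. This is a minimal sketch under stated assumptions, not the Generate: Enhance implementation. The variables (`stage`, `age`, `arm`, `outcome`), their order, and the simple per-step models (an empirical marginal, a per-stage normal, the observed allocation ratio, and a per-arm linear regression) are hypothetical stand-ins; published sequential synthesis work uses more flexible per-variable models, with tree-based models being a common choice.

```python
import math
import random
import statistics
from collections import Counter


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sd = math.sqrt(sse / (len(xs) - 2)) if len(xs) > 2 else 0.0
    return a, b, sd


def fit_sequential(rows):
    """Training: fit one conditional model per variable, in a fixed order."""
    if {r["arm"] for r in rows} != {0, 1}:
        raise ValueError("both arms must be present in the observed data")
    stage_counts = Counter(r["stage"] for r in rows)           # p(stage)
    age_given_stage = {}                                        # p(age | stage)
    for s in stage_counts:
        ages = [r["age"] for r in rows if r["stage"] == s]
        sd = statistics.stdev(ages) if len(ages) > 1 else 0.0
        age_given_stage[s] = (statistics.fmean(ages), sd)
    p_treated = sum(r["arm"] for r in rows) / len(rows)        # allocation ratio
    outcome_given_arm = {                                       # p(outcome | age, arm)
        arm: fit_line([r["age"] for r in rows if r["arm"] == arm],
                      [r["outcome"] for r in rows if r["arm"] == arm])
        for arm in (0, 1)
    }
    return stage_counts, age_given_stage, p_treated, outcome_given_arm


def generate(model, n, seed=None):
    """Sequential generation and outcome simulation for n synthetic patients."""
    stage_counts, age_given_stage, p_treated, outcome_given_arm = model
    rng = random.Random(seed)
    stages, weights = list(stage_counts), list(stage_counts.values())
    synthetic = []
    for _ in range(n):
        stage = rng.choices(stages, weights=weights)[0]
        mu, sd = age_given_stage[stage]
        age = rng.gauss(mu, sd)                        # conditioned on stage
        arm = 1 if rng.random() < p_treated else 0     # randomized, so not covariate-driven
        a, b, sd_y = outcome_given_arm[arm]
        outcome = rng.gauss(a + b * age, sd_y)         # conditioned on age and arm
        synthetic.append({"stage": stage, "age": age, "arm": arm,
                          "outcome": outcome, "synthetic": True})
    return synthetic


def augment(observed, synthetic):
    """Data augmentation: append synthetic patients to the real ones, flagged for traceability."""
    return [dict(r, synthetic=False) for r in observed] + synthetic
```

Each variable is drawn conditional on the variables generated before it, which is how the approach carries inter-variable relationships into the synthetic records. The validation step is a separate comparison of effect estimates against the full trial.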
This method supports transparency, reproducibility, and analytic rigor. It does not replace randomized controlled trials—but when recruitment stops short, it offers a scientifically grounded way to extract more value from limited data.
The study first assessed whether it is valid to train generative models on early-enrolled patients and use them to produce late-enrolled patients. This assumption underpins the generative approach, regardless of the modeling technique used. Across nine clinical trials, no statistically significant interaction was found between recruitment order and treatment effect (e.g., REaCT-ILIAD: −3.71×10⁻⁶; 95% CI: −1.16×10⁻⁵ to 4.16×10⁻⁶). In other words, treatment effects did not differ significantly between early- and late-recruited patients in these oncology trials.
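The early-versus-late check can be illustrated with a simplified version of the test. This sketch assumes a continuous outcome, splits the trial into early and late halves by enrollment order, and compares the treatment effect between halves using a normal approximation. The study tested an interaction between recruitment order and treatment, but its exact model is not described here, so the binary split, the helper names, and the toy data are illustrative assumptions.

```python
import math
import statistics


def mean_difference(rows):
    """Treatment-minus-control mean outcome and its (Welch-style) variance."""
    treated = [r["outcome"] for r in rows if r["arm"] == 1]
    control = [r["outcome"] for r in rows if r["arm"] == 0]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    var = (statistics.variance(treated) / len(treated)
           + statistics.variance(control) / len(control))
    return diff, var


def recruitment_interaction(rows, z=1.96):
    """Compare the treatment effect in early- vs late-enrolled halves.

    Returns the late-minus-early difference in treatment effect and an
    approximate 95% CI; a CI spanning 0 means no significant interaction
    between recruitment order and treatment effect.
    """
    ordered = sorted(rows, key=lambda r: r["enroll_order"])
    half = len(ordered) // 2
    early_effect, early_var = mean_difference(ordered[:half])
    late_effect, late_var = mean_difference(ordered[half:])
    interaction = late_effect - early_effect
    se = math.sqrt(early_var + late_var)
    return interaction, (interaction - z * se, interaction + z * se)


# Toy trial with the same treatment effect (2.0) throughout enrollment.
trial = [{"enroll_order": i, "arm": i % 2, "outcome": 2.0 * (i % 2) + 0.1 * (i % 5)}
         for i in range(80)]
estimate, (lo, hi) = recruitment_interaction(trial)
print(f"interaction {estimate:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```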
Following this validation, the study evaluated specific performance metrics for sequential synthesis, revealing consistent performance across key criteria. Sequential synthesis often outperformed other generative methods—including Bayesian networks, conditional GANs, and variational autoencoders—particularly regarding stability and interpretability for structured clinical data.
Key performance metrics included:
Sequential synthesis emerged as one of the more consistently reliable techniques among the four evaluated methods. It delivered robust performance across large and small datasets, even when trial outcomes were marginal or underpowered. In such cases, synthetic augmentation helped stabilize conclusions and preserve the treatment signal without introducing bias.
The evaluation metrics used in this study will be familiar to those working with real-world data (RWD). They closely align with those used in Aetion’s RCT DUPLICATE initiative, which rigorously assessed whether real-world evidence (RWE) could reproduce findings from randomized controlled trials. Just as RCT DUPLICATE helped establish conditions under which RWE can provide valid clinical answers, this new research shows that generative models can play a similar role—credibly augmenting underpowered trials and reinforcing evidence when traditional designs fall short.
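The agreement checks mentioned above can be expressed compactly. This sketch assumes effects are reported as log hazard ratios with standard errors. It uses the three agreement definitions associated with RCT DUPLICATE (decision or regulatory agreement, estimate agreement, and standardized-difference agreement), plus the confidence-interval overlap measure common in synthetic-data utility work. The study's exact metric definitions may differ, and all names and numbers here are hypothetical.

```python
import math
from dataclasses import dataclass

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


@dataclass(frozen=True)
class Estimate:
    log_hr: float  # treatment effect on the log hazard-ratio scale
    se: float      # standard error of log_hr

    def ci(self, z=Z95):
        return self.log_hr - z * self.se, self.log_hr + z * self.se


def _direction(est):
    """+1 or -1 if the 95% CI excludes the null (log HR = 0), else 0."""
    lo, hi = est.ci()
    return 1 if lo > 0 else (-1 if hi < 0 else 0)


def decision_agreement(augmented, full):
    # Both significant in the same direction, or both non-significant.
    return _direction(augmented) == _direction(full)


def estimate_agreement(augmented, full):
    # Augmented point estimate falls inside the full-trial 95% CI.
    lo, hi = full.ci()
    return lo <= augmented.log_hr <= hi


def standardized_difference(augmented, full):
    # |d| < 1.96 is the usual threshold for standardized-difference agreement.
    return (augmented.log_hr - full.log_hr) / math.sqrt(augmented.se ** 2 + full.se ** 2)


def ci_overlap(augmented, full):
    # Average share of each interval covered by their intersection (0 = disjoint, 1 = identical).
    (l1, u1), (l2, u2) = augmented.ci(), full.ci()
    overlap = max(0.0, min(u1, u2) - max(l1, l2))
    return 0.5 * (overlap / (u1 - l1) + overlap / (u2 - l2))


full = Estimate(math.log(0.70), 0.10)       # hypothetical full-trial result
augmented = Estimate(math.log(0.75), 0.12)  # hypothetical augmented-trial result
print(decision_agreement(augmented, full), estimate_agreement(augmented, full))
print(round(standardized_difference(augmented, full), 2), round(ci_overlap(augmented, full), 2))
```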
| Use Case | Why It Matters |
| --- | --- |
| Underpowered Phase 2 studies | Enables recovery of analytic power to support go/no-go decisions |
| Rare diseases or small populations | Extends insights in trials where patient accrual is inherently limited |
| Grant-funded or academic trials | Salvages value from studies where funding may not support full recruitment |
| Pandemic- or disruption-affected trials | Recovers datasets from interruptions without restarting enrollment |
| Early signal validation | Supports hypothesis testing and prioritization of next-step research |
| Cross-study learning | Augments existing data to support simulation-based protocol refinement |
This peer-reviewed study, based on completed Phase 3 oncology trials, affirms the methodology behind Aetion® Generate: Enhance, a component of Aetion Evidence Platform®. Carried out with academic collaborators and held to a rigorous design, the research aligns with Aetion’s approach to virtual patient modeling using RWD.
While the study used Phase 3 data for validation, the method is especially relevant to Phase 2 trials, where under-enrollment is more common and more consequential. These earlier-stage studies often face tighter budgets, smaller patient populations, and more exploratory endpoints, which makes recovering insight and preserving decision-making power even more critical.
By simulating missing patients based on real trial dynamics, Generate: Enhance enables teams to extract evidence that informs high-stakes development decisions in a range of scenarios:
This ability to extend trial data with scientifically grounded simulation offers sponsors a powerful lever to de-risk development and make more confident go/no-go decisions.
Aetion’s Generate: Enhance translates the peer-reviewed methodology validated in the JMIR study—sequential synthesis—into an operational solution built for real-world research. It expands on the study’s foundation with RWD integration, transparent documentation, and regulatory-aligned workflows, offering a purpose-built approach to recover signals from incomplete trials, accelerate early insights, and de-risk development decisions.
This research adds to a growing body of peer-reviewed evidence supporting sequential synthesis as a credible and reproducible tool for data augmentation in clinical development. Aetion and its collaborators have contributed to 12 peer-reviewed publications applying this method across various domains—including oncology, pharmacology, RWD, cardiovascular disease, public health (including COVID-19), and health survey research. These studies reinforce the scientific integrity and practical relevance of generative approaches. As the evidence base grows, so too does confidence in the responsible use of sequential synthesis to extend the value of incomplete or underpowered trials.
| Capability | Description | Scientific Impact |
| --- | --- | --- |
| Validated methodology | Built on sequential synthesis, which has been shown to replicate treatment effects in oncology trials | Preserves signal and reduces the risk of Type II errors in underpowered trials |
| Synthetic patient traceability | Every virtual patient is fully auditable and linked to the input logic | Enhances transparency for internal review and external regulatory engagement |
| Regulatory-aligned outputs | Documentation supports exploratory and early-phase development use | Enables inclusion in go/no-go planning, adaptive trial design, and grant support |
| Flexible integration | Compatible with trial data or real-world data sources | Adapts to sponsor needs across research, clinical, and HEOR functions |
| Computational efficiency | Lightweight model deployment and reproducible workflows | Speeds up analysis cycles without compromising scientific rigor |
| Equity and inclusion | Supports modeling for underrepresented groups or rare disease populations | Expands the generalizability of findings and informs health equity-focused programs |
These capabilities do not replace traditional evidence generation—but they equip teams to do more with their data, especially in resource-constrained or disrupted research settings. Generate: Enhance allows organizations to extend the life and impact of every dataset, making every patient count.
Too often, the science stops short when clinical trials underperform on enrollment. This study shows that generative models—if applied responsibly and rigorously—can extend the utility of incomplete datasets.
For clinical development teams, this is a shift in mindset. With validated approaches like sequential synthesis, the question becomes: How can we continue learning from the data we do have?
At Aetion, we’re building tools that allow you to do just that. Not to replace trials—but to recover insight, refocus strategy, and accelerate decision-making when traditional evidence generation runs into real-world limits.
Explore how Aetion® Generate can support trial recovery and accelerate insight.