Nataile Schibell, MPH and Pippa Hodgkins
Addressing Confounding in Real-World Treatment Comparisons
In real-world settings, treatment decisions aren’t randomized—they reflect clinical judgment, patient characteristics, and systemic factors. These underlying differences introduce confounding, making it difficult to isolate true treatment effects.
Propensity score (PS) methods, such as matching, weighting, and high-dimensional PS, help address this bias by estimating the probability of treatment based on baseline covariates. When applied correctly, these techniques can balance treatment groups and support more credible comparisons in non-randomized studies. High-dimensional PS, as described in recent peer-reviewed research by Aetion co-founders Jeremy Rassen, Sc.D., and Sebastian Schneeweiss, M.D., Sc.D., offers a scalable, data-driven approach to covariate selection in complex datasets such as claims and EHRs.¹
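As a rough illustration of the estimation step, the sketch below fits a logistic regression of treatment on two baseline covariates and returns each patient's predicted probability of treatment. The toy data, the covariates, the function names, and the plain gradient-descent fit are all assumptions made for illustration; they do not represent how Substantiate or any particular library estimates propensity scores.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(beta, row):
    """Intercept plus the dot product of coefficients and covariates."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def fit_propensity_model(X, treated, lr=0.1, n_iter=5000):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, t in zip(X, treated):
            err = sigmoid(linear_predictor(beta, row)) - t
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for each patient."""
    return [sigmoid(linear_predictor(beta, row)) for row in X]


# Hypothetical baseline covariates: (age in decades, centered at 60; diabetes flag)
X = [[-1.0, 0], [-0.5, 1], [0.0, 1], [0.5, 0],
     [1.0, 1], [1.5, 1], [-1.5, 0], [0.2, 0]]
treated = [0, 0, 1, 0, 1, 1, 0, 1]

beta = fit_propensity_model(X, treated)
scores = propensity_scores(X, beta)

mean_ps_treated = sum(s for s, t in zip(scores, treated) if t) / sum(treated)
mean_ps_comparator = (
    sum(s for s, t in zip(scores, treated) if not t) / (len(treated) - sum(treated))
)
```

In practice the model would usually be fit with an established statistics package; the hand-written fit here only keeps the example self-contained.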
Aetion® Substantiate operationalizes these methods within a structured, transparent workflow, enabling teams to adjust for confounding with scientific rigor and day-to-day efficiency. From study design through reproducible outputs, Substantiate helps ensure that PS implementation is consistent, scalable, and aligned with regulatory expectations.
Why Use Substantiate for Propensity Score Adjustment
Propensity score methods are foundational to confounding adjustment in real-world evidence. But their value depends on consistent, transparent execution. Substantiate enables teams to implement these methods—matching, weighting, and high-dimensional PS—within a defined, audit-ready workflow that aligns with study protocols.
Rather than stitching together manual steps or relying on custom code, users can apply PS adjustments end-to-end within the platform’s Comparative Effectiveness Plan. Every step is linked—from cohort construction through covariate selection, model configuration, diagnostics, and output—ensuring alignment across studies and teams.
Image 1: Getting started with the Comparative Effectiveness Analysis Plan
Substantiate allows users to:
- Select covariates and define baseline windows: Configure variables and time frames directly within the study design interface, anchored to key analytic milestones.
- Choose the adjustment method: Apply PS matching, inverse probability of treatment weighting (IPTW), ATT/SMR weighting, overlap weighting, or high-dimensional PS—based on study requirements. Each analysis plan supports one matching and one weighting method, with additional methods available through sensitivity analyses.
- Configure method-specific parameters: Adjust caliper width, matching ratios, trimming thresholds, or variable ranking strategy as needed.
- Evaluate covariate balance: Review diagnostics, including standardized mean differences and distribution plots, to confirm baseline alignment.
- Export study specifications and version-controlled outputs: Generate traceable documentation for internal governance or external submission, with every decision captured in-platform.
This framework supports reproducibility and operational consistency while giving teams the flexibility to tailor methods to the complexity of their data. Substantiate brings structure to real-world analytics—so that scientifically sound methods scale reliably across studies, datasets, and therapeutic areas.
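The standardized mean difference named among the balance diagnostics above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the optional weights argument, and the toy age values are hypothetical, and the formula shown (difference in means divided by the square root of the average of the two group variances) is one common definition rather than a statement of what Substantiate reports.

```python
import math


def _weighted_mean_var(values, weights):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, var


def standardized_mean_difference(treated_vals, comparator_vals,
                                 treated_w=None, comparator_w=None):
    """SMD = (mean_treated - mean_comparator) / sqrt((var_t + var_c) / 2).

    Weights default to 1 for every patient (the unadjusted case).
    """
    treated_w = treated_w or [1.0] * len(treated_vals)
    comparator_w = comparator_w or [1.0] * len(comparator_vals)
    m1, v1 = _weighted_mean_var(treated_vals, treated_w)
    m0, v0 = _weighted_mean_var(comparator_vals, comparator_w)
    pooled_sd = math.sqrt((v1 + v0) / 2.0)
    if pooled_sd == 0.0:
        # No variation in either group: balanced only if the means coincide.
        return 0.0 if m1 == m0 else float("inf")
    return (m1 - m0) / pooled_sd


# Hypothetical baseline ages before any adjustment
age_treated = [70, 65, 72, 68]
age_comparator = [60, 58, 66, 62]

smd_age = standardized_mean_difference(age_treated, age_comparator)
imbalanced = abs(smd_age) > 0.1  # 0.1 is a commonly cited rule of thumb
```

Passing matching or weighting weights through `treated_w` and `comparator_w` gives the adjusted SMD, which can then be compared with the unadjusted value.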
PS Matching in Substantiate
Propensity score matching is a widely used method for reducing confounding in real-world comparative studies. The goal is to create two groups with similar observed characteristics, so any outcome differences are more likely due to treatment than baseline differences.
In Substantiate, propensity score matching is built into the core study workflow. Analysts define the covariates and specify the time window for baseline measurement, and the platform generates the propensity scores. Based on the configuration, Substantiate then matches patients across treatment arms using one of several matching algorithms, including 1:1 and variable-ratio methods.
Matching methods available in Substantiate:
- 1:1 Nearest Neighbor Matching (without replacement): Creates tightly aligned patient pairs for precise comparability.
- Variable-Ratio Matching (Parallel or Sequential): Expands matching flexibility by allowing each treated patient to be matched with multiple referents, preserving more sample size.
- Caliper (optional parameter): Can be applied across all matching methods to restrict matches to a specified score distance, improving balance and match quality.
Matching is particularly useful when constructing trial-like cohorts—such as in external control arms, trial emulation studies, or regulatory-aligned analyses. Substantiate tracks and retains all produced study outputs, providing traceability of matching parameters and patient selection logic throughout the study lifecycle.
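To make the idea of 1:1 nearest-neighbor matching without replacement and an optional caliper concrete, here is a minimal greedy sketch. The function name, the patient identifiers, the scores, and the choice to measure distance on the raw propensity score scale are assumptions for illustration; the source does not specify the algorithm Substantiate uses, and greedy matching results depend on the order in which treated patients are processed.

```python
def greedy_match(treated_ps, comparator_ps, caliper=None):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated_ps, comparator_ps: dicts mapping patient id -> propensity score.
    caliper: optional maximum allowed absolute score distance for a match.
    Returns a list of (treated_id, comparator_id) pairs.
    """
    available = dict(comparator_ps)  # comparators not yet used
    pairs = []
    for t_id, t_score in treated_ps.items():
        if not available:
            break
        # Closest remaining comparator by absolute score distance
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        distance = abs(available[c_id] - t_score)
        if caliper is None or distance <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
        # Otherwise the treated patient stays unmatched and is excluded.
    return pairs


# Hypothetical propensity scores
treated_ps = {"T1": 0.62, "T2": 0.35, "T3": 0.90}
comparator_ps = {"C1": 0.60, "C2": 0.33, "C3": 0.40, "C4": 0.10}

pairs_with_caliper = greedy_match(treated_ps, comparator_ps, caliper=0.05)
pairs_no_caliper = greedy_match(treated_ps, comparator_ps)
```

With the caliper, T3 has no comparator within 0.05 and drops out of the matched cohort; without it, T3 is paired with a distant comparator, which illustrates the trade-off between sample size and match quality.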
Image 2: Propensity score method selection options in the Comparative Effectiveness Analysis Plan
PS Weighting in Substantiate
Propensity score weighting adjusts for baseline differences by scaling how much each patient contributes to the analysis. Unlike matching, all patients are retained; their influence is weighted based on the probability of receiving treatment, rebalancing the population to improve comparability. This approach is particularly useful when preserving sample size is important or when treatment arms have sufficient overlap but full matching isn’t feasible. For a detailed explanation of these methods, see Understanding Propensity Score Weighting Methods.
In Substantiate, weighting is integrated directly into the comparative effectiveness workflow. Users can select from several weighting strategies and configure method-specific parameters—all within a structured, transparent interface that supports consistent application across studies.
Image 3: Propensity score overlap diagram showing patient density across treatment arms
Weighting options available in Substantiate:
- Inverse Probability of Treatment Weighting (IPTW): Estimates the average treatment effect (ATE) across the full population.
- ATT/SMR Weighting: Estimates the treatment effect among treated patients; commonly used when comparing real-world data to clinical trial cohorts.
- Overlap Weighting: Emphasizes patients whose propensity scores are close to 0.5 (those with a realistic chance of receiving either treatment), reducing the influence of extreme scores and improving balance.
Truncation and trimming can be applied to any of the above methods to reduce the impact of extreme weights. This is particularly important in studies with low overlap or high-dimensional covariate sets.
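The three weighting schemes listed above follow standard formulas, which the sketch below applies to a single patient's propensity score. The function names and the fixed upper cap used for truncation are hypothetical choices for illustration; the source does not state how Substantiate parameterizes truncation or trimming.

```python
def ps_weight(ps, is_treated, method):
    """Weight for one patient given propensity score `ps` (0 < ps < 1).

    method: "iptw"    -> 1/ps for treated, 1/(1-ps) for comparators (ATE)
            "att"     -> 1 for treated, ps/(1-ps) for comparators (ATT/SMR)
            "overlap" -> 1-ps for treated, ps for comparators
    """
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if method == "iptw":
        return 1.0 / ps if is_treated else 1.0 / (1.0 - ps)
    if method == "att":
        return 1.0 if is_treated else ps / (1.0 - ps)
    if method == "overlap":
        return 1.0 - ps if is_treated else ps
    raise ValueError(f"unknown method: {method}")


def truncate_weights(weights, upper):
    """Cap extreme weights at `upper` (a hypothetical fixed threshold)."""
    return [min(w, upper) for w in weights]


# A treated patient who was unlikely to be treated (ps = 0.25)
w_iptw = ps_weight(0.25, True, "iptw")
w_att = ps_weight(0.25, True, "att")
w_overlap = ps_weight(0.25, True, "overlap")

# A comparator with the same score
w_iptw_c = ps_weight(0.25, False, "iptw")
w_att_c = ps_weight(0.25, False, "att")
w_overlap_c = ps_weight(0.25, False, "overlap")

capped = truncate_weights([1.2, 4.0, 25.0], upper=10.0)
```

Note that IPTW weights grow without bound as scores approach 0 or 1, which is why truncation matters most for that method, while overlap weights are bounded between 0 and 1 by construction.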
High-Dimensional Propensity Scores in Substantiate
In real-world datasets—especially claims and EHRs—the number of potential covariates can be extensive. Manually selecting covariates can be time-intensive and may require iterative clinical and methodological input. While investigator-defined models remain standard, they can be limited when working with unfamiliar therapeutic areas or large, exploratory datasets.
High-dimensional propensity score (hdPS) methods were developed to help address this complexity. Instead of relying solely on predefined covariate lists, hdPS uses structured, data-driven algorithms to systematically identify and rank covariates most likely to influence both treatment and outcome. This is especially useful when confounders are not well-characterized or when covariate definitions vary across datasets. Published validation has shown that hdPS can perform comparably to—or better than—investigator-defined models in high-dimensional settings.
Image 4: Patient Characteristics for patients with propensity scores generated by the high-dimensional propensity score model
In Substantiate, high-dimensional propensity score (hdPS) modeling is integrated directly into the workflow. The platform enables users to:
- Specify data attributes: Choose from key data attributes, such as diagnoses, procedures, or medications, to define the covariate space used for automated selection in high-dimensional propensity score modeling.
- Surface candidate covariates automatically: Identify potential confounders across the selected domains without manual extraction or coding.
- Rank covariates based on user-selected logic: Choose from predefined options to prioritize covariates by prevalence (frequency) or bias potential (association with both treatment and outcome).
- Generate the propensity score model: Build the PS model using the ranked covariates—entirely within the platform, with no external preprocessing required.
The process is fully transparent and easy to adjust. Users can control which data attributes are included, how far back in time to look for baseline information, and how covariates are prioritized—while relying on automation to identify those most likely to influence treatment and outcome. This helps ensure that studies remain consistent and repeatable across teams and datasets.
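Ranking by bias potential can be illustrated with the Bross bias multiplier, the measure used in the originally published hdPS algorithm. The sketch below is an assumption-laden illustration: the candidate names, prevalences, and relative risks are invented, the function names are hypothetical, and the source does not say that Substantiate computes bias potential in exactly this way.

```python
import math


def bross_bias_multiplier(p_exposed, p_unexposed, rr_outcome):
    """Multiplicative bias from leaving one binary covariate unadjusted.

    p_exposed / p_unexposed: covariate prevalence in each treatment arm.
    rr_outcome: covariate-outcome relative risk; values below 1 are
    inverted so protective and harmful covariates are treated symmetrically.
    """
    rr = rr_outcome if rr_outcome >= 1.0 else 1.0 / rr_outcome
    return (p_exposed * (rr - 1.0) + 1.0) / (p_unexposed * (rr - 1.0) + 1.0)


def rank_by_bias(candidates, top_k):
    """Return the names of the top_k candidates by |log(bias multiplier)|."""
    def score(c):
        return abs(math.log(bross_bias_multiplier(
            c["p_exposed"], c["p_unexposed"], c["rr_outcome"])))
    ranked = sorted(candidates, key=score, reverse=True)
    return [c["name"] for c in ranked[:top_k]]


# Hypothetical candidate covariates surfaced from the selected data attributes
candidates = [
    {"name": "dx_A", "p_exposed": 0.30, "p_unexposed": 0.10, "rr_outcome": 2.0},
    {"name": "dx_B", "p_exposed": 0.50, "p_unexposed": 0.50, "rr_outcome": 3.0},
    {"name": "rx_C", "p_exposed": 0.05, "p_unexposed": 0.20, "rr_outcome": 1.5},
]

selected = rank_by_bias(candidates, top_k=2)
```

Here dx_B is strongly related to the outcome but equally common in both arms, so its multiplier is 1 and it ranks last; a covariate has to be associated with both treatment and outcome to rank highly.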
Choose the Right PS Strategy—All in One Platform
Different study designs call for different adjustment strategies. Some require tightly matched cohorts for interpretability; others prioritize preserving the whole sample or minimizing variance. Substantiate supports multiple propensity score methods within a unified platform, giving researchers the flexibility to choose the right approach based on the question, the data, and the analytic constraints.
Propensity Score Methods and Their Application Across RWE Teams
| Method | Purpose | What It Does | Best Used When |
| --- | --- | --- | --- |
| Matching | Create balanced comparison groups | Matches patients with similar propensity scores; supports both 1:1 and variable-ratio configurations | Used by RWE teams conducting comparative effectiveness studies, regulatory-aligned analyses, and trial emulations |
| IPTW (Weighting) | Estimate the treatment effect across the full population | Applies weights to all patients to balance covariates between groups | Preferred by HEOR teams conducting population-level analyses where generalizability and full sample retention matter |
| ATT/SMR Weighting | Estimate the effect among those treated | Reweights the comparator arm to resemble the treated group | Common in safety or outcomes studies comparing real-world cohorts to trial populations or registry-based controls |
| Overlap Weighting | Focus on the most comparable subset of patients | Prioritizes patients with similar treatment probabilities; minimizes the influence of outliers | Ideal when treatment groups differ substantially; often used by methods teams and comparative safety researchers |
| High-Dimensional PS | Empirically identify key confounders in large datasets | Algorithmically selects and ranks covariates based on bias or prevalence to build the PS model | Used by data science teams working with large claims or EHR data when covariate selection is complex or uncertain |
Structure Matters: Substantiate Makes It Work
Substantiate equips research teams to implement propensity score methods—matching, weighting, and high-dimensional PS—within a consistent, transparent framework. The platform guides users from cohort construction through covariate selection, model configuration, and diagnostics, supporting both methodological rigor and day-to-day workflow efficiency.
All study inputs and outputs are version-controlled and fully documented, supporting internal reproducibility and external defensibility. Whether designing early feasibility analyses or generating comparative evidence for regulatory or payer decision-making, teams can rely on Substantiate to deliver consistency across studies and datasets.
Explore the Evidence Hub or contact our team to see how Substantiate powers reliable, scalable implementation of PS methods.