Abstract: Synthetic data generation is an emerging technology that offers several advantages over traditional “real” data. These advantages include preserving privacy, augmenting data to improve machine learning model accuracy, simulating scenarios for algorithm testing, and even mitigating bias. This presentation will share insights gained from using synthetic data to support the creation of healthcare data products and analytics in the context of Real World Evidence. It will also discuss the applicability, opportunities, and challenges of synthetic data, including resistance to adoption.
Abstract: In clinical research, obtaining data of adequate quantity and quality is often a challenge. Synthetic data, which possesses the same statistical properties as a specific real patient population, can help address these issues. When used correctly, this type of data can serve as a valid representation of the target population, offering various analytical and scientific benefits. Although data synthesis shares similarities with simulation methods, significant differences indicate that it should be viewed as a complement to, rather than a component of, such traditional techniques. Synthetic data can not only enhance the statistical power of analyses by including additional information but can also enrich specific patient subgroups, for instance extreme or rare cases. Furthermore, synthetic methods can be employed to extrapolate to similar yet distinct real patient populations. Since no actual patient information is involved, synthetic data can be shared with others, promoting the communication of results, increasing confidence in findings, and enhancing the knowledge gained from clinical analyses. Various concepts and approaches exist for generating this type of data, notably from Bayesian statistics and, more recently, generative Machine Learning. This talk will provide an overview of both the potential and the pitfalls of synthetic data applications in clinical development.