Evidence Hub

An evaluation of the replicability of analyses using synthetic health data

Written by Admin | Mar 24, 2024

This paper, published in Scientific Reports, describes a study evaluating the replicability of analyses using synthetic health data. Synthetic data generation is increasingly used as a privacy-preserving approach to sharing health data. Beyond protecting privacy, the generated data must also have high utility. A common way to assess utility is to check whether analyses of the synthetic data replicate the results obtained from the real data. Replicability has been defined using two criteria: (a) replicating the results of the analyses performed on the real data, and (b) ensuring valid population inferences from the synthetic data. A simulation study using three heterogeneous real-world datasets evaluated the replicability of logistic regression workloads.
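To make the two criteria concrete, here is a minimal sketch that compares one logistic regression coefficient (a log-odds ratio with its standard error) fitted on real data against the same coefficient fitted on synthetic data. It assumes Wald-type 95% confidence intervals and uses three checks that are common in the synthetic data literature: decision agreement and confidence-interval overlap (in the style of Karr et al., truncated at 0) as one reading of criterion (a), and coverage of a known population parameter, which is only available in a simulation, as one reading of criterion (b). The names `Estimate`, `decision_agreement`, `ci_overlap` and `covers` are hypothetical, and these are not necessarily the exact metrics used in the study.

```python
from dataclasses import dataclass
from statistics import NormalDist

# Two-sided 95% critical value (about 1.96); assumes Wald-type intervals.
Z = NormalDist().inv_cdf(0.975)


@dataclass
class Estimate:
    """Hypothetical container: one regression coefficient and its standard error."""
    beta: float
    se: float

    def ci(self, z: float = Z) -> tuple[float, float]:
        # Wald confidence interval: beta +/- z * se
        return (self.beta - z * self.se, self.beta + z * self.se)

    def significant(self, z: float = Z) -> bool:
        # Significant at the chosen level if the interval excludes zero
        lo, hi = self.ci(z)
        return lo > 0 or hi < 0


def decision_agreement(real: Estimate, synth: Estimate) -> bool:
    """Criterion (a), one possible reading: the same significance conclusion,
    and the same direction of effect when both estimates are significant."""
    if real.significant() != synth.significant():
        return False
    if real.significant():
        return (real.beta > 0) == (synth.beta > 0)
    return True


def ci_overlap(real: Estimate, synth: Estimate) -> float:
    """Criterion (a), another reading: average fraction of each interval covered
    by their intersection (Karr et al. style), truncated at 0 for disjoint intervals."""
    (rl, ru), (sl, su) = real.ci(), synth.ci()
    inter = min(ru, su) - max(rl, sl)
    if inter <= 0:
        return 0.0
    return 0.5 * (inter / (ru - rl) + inter / (su - sl))


def covers(synth: Estimate, true_beta: float) -> bool:
    """Criterion (b), one possible reading: in a simulation where the population
    parameter is known, the synthetic-data interval should contain it."""
    lo, hi = synth.ci()
    return lo <= true_beta <= hi
```

For example, with a real-data estimate `Estimate(0.50, 0.10)` and a synthetic-data estimate `Estimate(0.45, 0.12)`, both intervals exclude zero with a positive sign, so `decision_agreement` returns `True`, and `ci_overlap` returns a value between 0 and 1 reflecting how much the intervals coincide. In a full simulation these checks would be repeated across many synthetic datasets and model coefficients and then averaged.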