Admin, Mar 24, 2024

An evaluation of the replicability of analyses using synthetic health data

This paper, published in Scientific Reports, describes a study evaluating the replicability of analyses using synthetic health data. Synthetic data generation is increasingly being used as a privacy-preserving approach for sharing health data. In addition to protecting privacy, the generated data must also have high utility. A common way to assess utility is to check whether analyses of the synthetic data replicate the results obtained from the real data. Replicability has been defined using two criteria: (a) the synthetic data replicate the results of the same analyses performed on the real data, and (b) the synthetic data support valid inferences about the population. A simulation study using three heterogeneous real-world datasets evaluated the replicability of logistic regression workloads.
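The two criteria can be made concrete for a single logistic regression workload. The sketch below is a minimal illustration, assuming NumPy and a logistic fit with Wald confidence intervals written from scratch. The agreement metrics it computes (confidence-interval overlap, decision agreement, estimate agreement for criterion (a), and coverage of a known population parameter for criterion (b)) are commonly used in the synthetic-data literature, but they are not necessarily the exact metrics used in the paper. `toy_synthesizer` is a hypothetical placeholder, not a real generator: it resamples real rows, so it would not protect privacy.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns the coefficient
    vector and its Wald standard errors (from the inverse Fisher information).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step on the log-likelihood
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def wald_ci(beta, se, z=1.96):
    """Approximate 95% Wald confidence interval by default."""
    return beta - z * se, beta + z * se


def ci_overlap(lo_r, hi_r, lo_s, hi_s):
    """Average share of the real and synthetic CIs covered by their intersection (1 = identical, 0 = disjoint)."""
    inter = np.maximum(0.0, np.minimum(hi_r, hi_s) - np.maximum(lo_r, lo_s))
    return 0.5 * (inter / (hi_r - lo_r) + inter / (hi_s - lo_s))


def _direction(lo, hi):
    # +1: significantly positive, -1: significantly negative, 0: CI contains zero
    return np.where(lo > 0, 1, np.where(hi < 0, -1, 0))


def decision_agreement(lo_r, hi_r, lo_s, hi_s):
    """Criterion (a): True where real and synthetic CIs give the same significance conclusion."""
    return _direction(lo_r, hi_r) == _direction(lo_s, hi_s)


def estimate_agreement(beta_s, lo_r, hi_r):
    """Criterion (a): True where the synthetic point estimate lies inside the real-data CI."""
    return (lo_r <= beta_s) & (beta_s <= hi_r)


def covers(lo, hi, true_beta):
    """Criterion (b): True where a CI computed on synthetic data contains the population parameter."""
    return (lo <= true_beta) & (true_beta <= hi)


def toy_synthesizer(X, y, rng):
    """HYPOTHETICAL stand-in for a real generator (not privacy preserving):
    resample rows of X and redraw outcomes from a logistic model fitted to the real data."""
    beta, _ = fit_logistic(X, y)
    Xs = X[rng.integers(0, len(X), size=len(X))]
    ys = rng.binomial(1, 1.0 / (1.0 + np.exp(-Xs @ beta)))
    return Xs, ys.astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_beta = np.array([-0.5, 1.0, 0.0])  # known population parameters (simulation only)
    n = 2000
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

    b_r, se_r = fit_logistic(X, y)
    lo_r, hi_r = wald_ci(b_r, se_r)
    Xs, ys = toy_synthesizer(X, y, rng)
    b_s, se_s = fit_logistic(Xs, ys)
    lo_s, hi_s = wald_ci(b_s, se_s)

    print("CI overlap:        ", np.round(ci_overlap(lo_r, hi_r, lo_s, hi_s), 2))
    print("decision agreement:", decision_agreement(lo_r, hi_r, lo_s, hi_s))
    print("estimate agreement:", estimate_agreement(b_s, lo_r, hi_r))
    print("covers true beta:  ", covers(lo_s, hi_s, true_beta))
```

In a simulation study, the sampling, synthesis, and checks would be repeated many times. Criterion (b) would then be summarized as the proportion of replicates whose synthetic-data CI contains `true_beta`, which should ideally sit close to the nominal 95%.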