The topic area of synthetic data covers the creation and use of artificially generated data that mimics real-world data. Synthetic data is designed to reproduce the statistical properties and patterns of the original data without containing sensitive or personally identifiable information (PII). It is used in fields such as machine learning, data analysis, application development and data privacy, across industries including healthcare, finance, automotive, robotics and insurance.
Synthetic data serves as a privacy-preserving alternative to real data in situations where privacy concerns or data protection regulations prohibit or restrict the use of actual personal or sensitive data. By creating synthetic data that closely resembles the original dataset, organizations can carry out analysis, testing and development without exposing sensitive information.
Generating synthetic data involves applying statistical models, algorithms or machine learning techniques to the original data to create new data points that are statistically similar but do not correspond to any real individuals or entities. Common approaches include generative adversarial networks (GANs), variational autoencoders (VAEs), differential privacy techniques and rule-based algorithms.
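To make the statistical-model approach concrete, here is a minimal Python sketch (standard library only, not taken from any particular tool) of one of the simplest generators: fit a multivariate Gaussian, that is, a mean vector and covariance matrix, to numeric records and then sample new rows from it. The function names `fit_gaussian`, `cholesky` and `sample_synthetic` are hypothetical, and the sketch assumes all columns are numeric and the covariance matrix is positive definite. Such a model preserves only means and linear correlations, and on its own it offers no formal privacy guarantee; GANs, VAEs or differentially private methods are typically used when richer structure or formal guarantees are needed.

```python
import math
import random
from statistics import fmean


def fit_gaussian(rows):
    """Estimate the mean vector and sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    mean = [fmean(r[j] for r in rows) for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Return lower-triangular L with L @ L^T == a.

    Assumes a is symmetric positive definite; math.sqrt raises
    ValueError if a negative pivot shows it is not.
    """
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_synthetic(mean, cov, n_samples, seed=None):
    """Draw new rows x = mean + L z, where z ~ N(0, I) and L L^T = cov.

    The rows are fresh random draws from the fitted model, not copies
    of the records the model was fitted on.
    """
    rng = random.Random(seed)
    L = cholesky(cov)
    d = len(mean)
    out = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # L is lower-triangular, so row i only needs z[0..i].
        out.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


if __name__ == "__main__":
    # Hypothetical toy "real" data: two correlated numeric columns
    # (for example age and income), generated here for illustration only.
    rng = random.Random(0)
    real = []
    for _ in range(1000):
        age = rng.gauss(40, 10)
        real.append([age, 800 * age + rng.gauss(0, 5000)])
    mean, cov = fit_gaussian(real)
    synthetic = sample_synthetic(mean, cov, n_samples=1000, seed=1)
    print("real mean:     ", [round(m, 1) for m in mean])
    print("synthetic mean:", [round(fmean(r[j] for r in synthetic), 1)
                              for j in range(2)])
```

The Cholesky factor is used because multiplying independent standard normal draws by it yields samples with exactly the target covariance, which is the standard way to sample a multivariate Gaussian without external libraries.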
The advantages of synthetic data include privacy protection, a reduced risk of data breaches and the ability to share data more freely for research and development. It also allows organizations to create larger datasets that capture rare events or edge cases, which may be difficult to obtain from real data alone. However, synthetic data must be carefully validated to ensure that it preserves the desired statistical properties and accurately represents the characteristics of the original data, and that it does not inadvertently reveal information about real individuals.
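Validation can start with simple per-column comparisons. The Python sketch below (standard library only) assumes numeric tabular data, and its function names are hypothetical rather than taken from any particular tool. For each column it reports the absolute mean difference, the ratio of standard deviations and the two-sample Kolmogorov-Smirnov statistic; it also computes the smallest distance from any synthetic row to any real row as a rough check for copied records. These are common heuristics, not a complete validation: the statistical checks look at one column at a time, and none of them provides a formal privacy guarantee.

```python
import math
from bisect import bisect_right
from statistics import fmean, stdev


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of samples a and b. 0 means identical empirical
    distributions; 1 means one sample lies entirely below the other."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so evaluating at every observed value finds the maximum gap.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


def compare_columns(real_rows, synth_rows, names):
    """Per-column fidelity report: absolute mean difference, ratio of
    standard deviations (synthetic / real) and KS statistic."""
    report = {}
    for j, name in enumerate(names):
        r = [row[j] for row in real_rows]
        s = [row[j] for row in synth_rows]
        report[name] = {
            "mean_diff": abs(fmean(r) - fmean(s)),
            "stdev_ratio": stdev(s) / stdev(r),
            "ks": ks_statistic(r, s),
        }
    return report


def min_distance_to_real(real_rows, synth_rows):
    """Smallest Euclidean distance between any synthetic row and any real
    row. A value at or near zero suggests a real record was reproduced.
    Brute force, O(len(real) * len(synth)), so only for small tables."""
    return min(math.dist(s, r) for s in synth_rows for r in real_rows)


if __name__ == "__main__":
    # Hypothetical toy tables; in practice these would be the original
    # dataset and the output of a synthetic-data generator.
    real = [[34, 52000], [45, 61000], [29, 39000], [52, 75000]]
    synth = [[36, 50500], [44, 63000], [31, 41000], [50, 72000]]
    print(compare_columns(real, synth, ["age", "income"]))
    print(min_distance_to_real(real, synth))
```

Acceptable thresholds depend on the use case. Columns on very different scales should be normalized before the distance check, and comparing pairwise correlations, or training a model on synthetic data and testing it on real data, are common next steps.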
The objective is to expose students to the current research literature on the topic, combined with a range of use cases from industry or the public sector for experimentation and illustration. Below we list the knowledge, skills and competences that may be acquired through working on this topic area:
A good primer on the topic can be found here.
From the health sector perspective, we recommend the following articles as a starting point: