2) Data-sensitive industries turn to synthetic data to ensure privacy and regulatory compliance
For example, in a recent interview with CB Insights, the head of AI at a Fortune 500 company described synthetic data’s privacy benefits as a key differentiator among model training platforms:
“You have synthetic data generation models or vendors that create completely new datasets off of the dataset that you have internally, but it’s statistically identical, but has completely generated and made up data, so you don’t have to use any customer or employee data, and it reduces your risk of PII.”
3) Data generation startups will face stiff competition from big tech
Tech giants like Microsoft, Google, and Meta are all generating synthetic data to train their models.
Others are building synthetic data tools for developers.
In July 2024, IBM and Red Hat announced the release of a new open-source tool, InstructLab, which generates synthetic data for model tuning.
“We also recently launched InstructLab, a tool for more rapid model tuning through synthetic data generation, allowing our clients to more efficiently customize models using their own data and expertise.” — Arvind Krishna, CEO of IBM, Q2’24 Earnings Call