
Using Synthetic Data to Improve Machine Learning

Justin Bornais

Artificial intelligence is becoming an increasingly prominent field of computer science. The idea of computers being able to think for themselves is amazing, and it is what motivates me to learn more about programming.

AI (Artificial Intelligence) has one major drawback: a model needs data to learn and mature. A lot of data, in fact, especially for sophisticated models. For example, Inception V3 (an image classification model developed by Google) was trained on ImageNet, a dataset of over a million labeled images.

That is a lot of data! Collecting it can be expensive and incredibly time-consuming. If only it were possible to generate your own data instead of having to collect it organically.

That is where synthetic data comes in. Synthetic data can solve many of the data problems in AI. For one thing, it lets developers work with more data while sidestepping obstacles such as the cost of collecting information and potential privacy concerns.

Well-made synthetic data can also preserve the relationships between variables, since it is generated from the patterns found in real data rather than purely at random. Not only that, but synthetic data can simulate conditions not yet encountered in your organic dataset. This can be highly beneficial for the health industry, robotics, security and other areas.
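To make "created intelligently instead of randomly" a bit more concrete, here is a minimal sketch using only Python's standard library. It assumes a made-up organic dataset of heights and weights, fits a simple two-variable Gaussian to it, and samples new rows from that fit. The dataset, the function names `fit_gaussian_2d` and `sample_synthetic`, and the choice of a Gaussian are all my own assumptions for illustration; real synthetic data generators are far more sophisticated.

```python
import math
import random


def fit_gaussian_2d(rows):
    """Estimate means, standard deviations and correlation of (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    std_x, std_y = math.sqrt(var_x), math.sqrt(var_y)
    return mean_x, mean_y, std_x, std_y, cov / (std_x * std_y)


def sample_synthetic(params, count, rng):
    """Draw new (x, y) pairs that keep the fitted correlation."""
    mean_x, mean_y, std_x, std_y, rho = params
    rows = []
    for _ in range(count):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + std_x * z1
        # Mixing z1 into y is what carries the x-y relationship over.
        y = mean_y + std_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Hypothetical "organic" data: height in cm and a loosely related weight in kg.
organic = []
for _ in range(500):
    height = rng.gauss(170, 10)
    weight = 70 + 0.9 * (height - 170) + rng.gauss(0, 6)
    organic.append((height, weight))

organic_params = fit_gaussian_2d(organic)
synthetic = sample_synthetic(organic_params, 5000, rng)
synthetic_params = fit_gaussian_2d(synthetic)

print("organic correlation:  ", round(organic_params[4], 3))
print("synthetic correlation:", round(synthetic_params[4], 3))
```

The two printed correlations should land close together, which is the point: the 5,000 synthetic rows are new, but taller people still tend to be heavier in them. Sampling each column independently at random would lose that relationship.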

Of course, synthetic data generated from a model of existing data can only go so far. It can only simulate general trends and reproduce specific properties derived from its organic counterpart. Even so, a 2017 MIT study found no significant performance difference between models trained on real data and models trained on artificial data 70% of the time.
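The kind of comparison described above can be tried on a toy scale. The sketch below is not the MIT study's method; it is a simplified setup I am assuming for illustration. It trains one straight-line model on hypothetical real data and another on synthetic data generated from that real data, then scores both on held-out real data. All names (`make_real`, `fit_line`, `mse`) and the underlying relationship `y = 2x + 1` are made up.

```python
import math
import random

rng = random.Random(1)


def make_real(n):
    """Hypothetical organic data: y = 2x + 1 plus random noise."""
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 2 * x + 1 + rng.gauss(0, 1)))
    return rows


def fit_line(rows):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, rows):
    """Mean squared error of a (slope, intercept) model on rows."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)


train_real = make_real(300)
test_real = make_real(300)

# Build a synthetic training set from the trend and noise level of the real one.
slope, intercept = fit_line(train_real)
noise = math.sqrt(mse((slope, intercept), train_real))
xs = [x for x, _ in train_real]
train_synth = []
for _ in range(300):
    x = rng.uniform(min(xs), max(xs))
    train_synth.append((x, slope * x + intercept + rng.gauss(0, noise)))

# Train on real vs. synthetic, but always evaluate on held-out real data.
mse_real = mse(fit_line(train_real), test_real)
mse_synth = mse(fit_line(train_synth), test_real)

print("trained on real data:     ", round(mse_real, 3))
print("trained on synthetic data:", round(mse_synth, 3))
```

In a setup this simple the two errors should come out nearly identical, because the synthetic data only needs to capture one straight-line trend. With messier real-world data the gap can be larger, which matches the caveat that synthetic data only reproduces the properties its generator learned.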

Synthetic data is a growing reality in the field of artificial intelligence. As developers discover new and improved ways to generate data, we will see an increasing amount of synthetic data in various machine learning applications.

I hope this excites you as much as it excites me!