One of the main issues with training machine-learning models for image identification is the huge amount of data typically required.
A dataset may contain millions of images to enable an algorithm to recognise and classify new images, items and objects.

These massive datasets can be very expensive and time-consuming to assemble, but researchers have now developed machine-learning models that are trained entirely on synthetic datasets.
A large dataset of real images is still required, but only to train the generative machine-learning model at the top of the chain; every model downstream of it is trained on generated data alone.
Once it understands what a particular thing should look like, the generative model is able to produce its own realistic synthetic images for other machine-learning models to train with.
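The researchers' exact pipeline is not reproduced in this article, but the two-stage idea can be sketched in a few lines of PyTorch. Everything below is illustrative: `ToyConditionalGenerator` is a random-weight stand-in for a generative model that has already been trained on real images, and the classifier beneath it never touches a real image.

```python
# Sketch of the two-stage pipeline: a pretrained generative model
# (stand-in below) supplies synthetic images, and a downstream
# classifier is trained on that synthetic data alone.
import torch
import torch.nn as nn

LATENT_DIM, N_CLASSES = 64, 10

class ToyConditionalGenerator(nn.Module):
    """Random-weight stand-in for a trained conditional generative model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, LATENT_DIM)
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM * 2, 256), nn.ReLU(),
            nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
        )

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1)).view(-1, 3, 32, 32)

generator = ToyConditionalGenerator().eval()   # assumed pretrained on real images
classifier = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, N_CLASSES),
)
optimiser = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                        # no real images below this line
    y = torch.randint(0, N_CLASSES, (64,))     # pick classes to synthesise
    z = torch.randn(64, LATENT_DIM)            # random latent codes
    with torch.no_grad():
        x_synth = generator(z, y)              # synthetic training batch on demand
    loss = loss_fn(classifier(x_synth), y)
    optimiser.zero_grad(); loss.backward(); optimiser.step()
```

In practice the stand-in would be replaced by a real trained generator, which is the only stage that ever needs the original dataset.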
According to researchers at MIT, this technique produced results that rival, and in some cases outperform, those of models trained on real datasets.
The team added that working with synthetic data also had a number of other benefits.
Synthetic datasets have a number of advantages
Because the images are produced on demand by the model rather than stored individually, a generated dataset takes up far less storage space than a comparably sized set of real images.
Synthetic data could also help avoid some of the privacy and rights issues around how the images in a dataset are used and stored, and it could even help to reduce some of the biases found in traditional datasets.
The generative model can not only create images realistic enough to be practically indistinguishable from the real thing, but also edit out attributes, such as race or gender, that could give rise to bias.
Generative models can theoretically produce infinitely large datasets for training other models, and they learn how to transform the underlying data behind the original images they are trained on.
This is important as it allows them to ‘imagine’ the image or object in a different setting.
If the model were trained on a dataset of cars, for example, it could create a realistic image of a car viewed from an angle, or in a colour or size, that it had not encountered in the original data.
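The article does not say how such transformations are implemented, but one common approach is to move a sample along a learned direction in the generator's latent space. The sketch below is purely illustrative: `viewpoint_dir` and `colour_dir` are hypothetical directions assumed to have been discovered beforehand (for example by fitting linear 'walks' in latent space), and the generator is again a random-weight stand-in.

```python
# Sketch of latent-space 'steering': decoding the same latent code at
# several points along one direction yields transformed views of the
# same object (different angle, colour, and so on).
import torch
import torch.nn as nn

LATENT_DIM = 64
# Random-weight stand-in for a generator trained on car images;
# output is a flattened 32x32 RGB image.
generator = nn.Sequential(nn.Linear(LATENT_DIM, 3 * 32 * 32), nn.Tanh())

# Hypothetical, pre-discovered latent directions.
viewpoint_dir = torch.randn(1, LATENT_DIM)
colour_dir = torch.randn(1, LATENT_DIM)

def edited_views(z, direction, strengths=(-1.0, 0.0, 1.0)):
    """Decode one latent code at several points along a single direction."""
    with torch.no_grad():
        return [generator(z + a * direction) for a in strengths]

z = torch.randn(1, LATENT_DIM)                # latent code for one synthetic car
rotated = edited_views(z, viewpoint_dir)      # same car, varied 'viewpoint'
recoloured = edited_views(z, colour_dir)      # same car, varied 'colour'
```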
Using multiple views of the same object or image is important in contrastive learning, where a model learns which pairs of unlabelled images are similar and which are different.
Lead researcher Ali Jahanian said that the generative model helped the contrastive model to learn by feeding it different views of the same object.
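No implementation details are given, but a contrastive objective of this kind is often written as an NT-Xent (SimCLR-style) loss. The minimal sketch below assumes its inputs are embedding vectors for two generated views of the same objects; none of the names come from the MIT work itself.

```python
# NT-Xent contrastive loss: embeddings of two views of the same object
# are pulled together, while all other pairings are pushed apart.
import torch
import torch.nn.functional as F

def nt_xent_loss(view_a, view_b, temperature=0.1):
    n = view_a.size(0)
    z = F.normalize(torch.cat([view_a, view_b], dim=0), dim=1)
    sim = z @ z.t() / temperature          # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))      # a view is not its own positive
    # For row i, the positive example is the other view of the same sample.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Usage: embeddings of two generated views of the same eight latent codes.
emb_a, emb_b = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent_loss(emb_a, emb_b)
```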