Synthetic data: handle with care

March 16, 2023

When using synthetic data to train AI models, first understand the risks amid the many benefits.

<h4>In the news</h4>
Many business leaders would agree that data is now the world’s most valuable commodity. What if, as <a href="https://www.forbes.com/sites/robtoews/2022/06/12/synthetic-data-is-about-to-transform-artificial-intelligence/?sh=6ff0375b7523" target="_blank" rel="noopener noreferrer">this piece</a> posits, it were possible to create infinite amounts of it, “cheaply and quickly”? Welcome to the world of synthetic data. As the term implies, this data is created digitally rather than gathered from real-world events.

<a href="https://arxiv.org/abs/1809.10790" target="_blank" rel="noopener noreferrer">Some research suggests</a> that synthetic data may be superior to actual data as a training tool for artificial intelligence (AI) systems. While synthetic data contains none of the original data from which it was derived, it retains the statistical qualities of that data. From a statistical standpoint, therefore, anything you do with it (such as building a predictive model) should produce much the same results as if the original data were used. In healthcare, for example, synthetic data wouldn’t contain any actual patient data, so privacy regulations like HIPAA or GDPR would, in principle, not apply to it.

Synthetic data is already being used heavily in the <a href="https://digitally.cognizant.com/heres-where-self-driving-vehicles-are-becoming-a-reality-wf1423950" target="_blank" rel="noopener noreferrer">autonomous vehicle sector</a>, where it’s essentially impossible to gather enough real-world data to cover all potential driving situations.
Two additional use cases are reducing bias in <a href="https://www.biometricupdate.com/202302/innovatrics-ceo-advises-careful-use-of-synthetic-data-to-improve-biometrics-cut-bias" target="_blank" rel="noopener noreferrer">biometric algorithm training</a> by increasing the demographic diversity of the data set, and <a href="https://www.healthcareitnews.com/news/how-synthetic-data-can-boost-efficiency-clinical-researchers-and-it-leaders" target="_blank" rel="noopener noreferrer">boosting clinical researchers’ efficiency</a> by using non-patient (and thus more shareable) data.

<h4>The Cognizant take</h4>
Overall, synthetic data “offers a flexible, cost-effective way to generate high-quality training data for machine learning models,” says Aakash Shirodkar, a Senior Director in Cognizant’s AI & Analytics Practice. “By using synthetic data, companies can address privacy concerns, overcome data scarcity and accelerate the development of AI applications across various industries.”

When advising clients on synthetic data, Aakash urges them to consider its risks and constraints, including:
<ul>
<li>The more closely the synthetic data resembles the actual underlying data, the more likely it is that it can be reverse-engineered to uncover actual sensitive data.</li>
<li>Outliers can pose a problem for the final model output when synthetic data is scaled.</li>
<li>The biases <a href="https://digitally.cognizant.com/built-in-bias-ais-impact-on-hiring-comes-under-scrutiny-wf1150121" target="_blank" rel="noopener noreferrer">contained in real-world data</a> could also cause issues. After all, Aakash notes, “Most real-world data is biased in some way. You run the danger of replicating and magnifying that, skewing your synthetic data accordingly.”</li>
<li>When a field is new, real data sets may be too small to synthesize effectively. As an example, Aakash points to new and emerging <a href="https://digitally.cognizant.com/get-ready-for-new-ways-to-pay-wf1230105" target="_blank" rel="noopener noreferrer">payment methods</a>.</li>
</ul>
“We have to be very responsible, keep an eye on where we are going, and have checks and balances everywhere,” Aakash says. “Of course, this is true of any new technology, but the stakes are very high here.” He recommends starting with one use case, examining the numerous techniques for synthesizing data and selecting the one that best matches the business’s needs.
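
To make the statistical idea concrete, here is a minimal sketch of one very simple way to synthesize data. It is an illustration under stated assumptions, not the method used by Cognizant or by any particular tool. It assumes a single numeric column that is roughly normally distributed, and the ages and function names are made up for the example. Real synthesizers use much richer models.

```python
import random
import statistics

def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of the real values."""
    return statistics.mean(values), statistics.stdev(values)

def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to the real values.

    Assumption: a normal distribution is a reasonable model for this column.
    """
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical "real" data: made-up ages, not taken from any actual data set.
real_ages = [34, 45, 29, 61, 52, 38, 47, 55, 41, 33]

# The synthetic values are sampled from the fitted model, not copied from real_ages.
synthetic_ages = synthesize(real_ages, n=1000)

print("real mean / stdev:     ", statistics.mean(real_ages), statistics.stdev(real_ages))
print("synthetic mean / stdev:", statistics.mean(synthetic_ages), statistics.stdev(synthetic_ages))
```

The synthetic column tracks the mean and spread of the real one, which is the sense in which it “retains the qualities” of the original. The same property explains the bias risk in the list above: if the real ages were skewed toward one group, the fitted model, and everything sampled from it, would carry that skew too.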

Tech to Watch Blog

Cognizant’s weekly blog

Understand the transformative impact of emerging technologies on the world around us as they address our most significant global challenges. <a href="mailto:editorialboard@cognizant.com">editorialboard@cognizant.com</a>
