Financial Synthetic Data is the New Oil for Financial Services

May 26, 2020 Andre van Heerden

Financial data is tightly constrained by customer privacy regulations such as GDPR. This hampers collaboration between stakeholders on financial problems such as optimising Anti-Money Laundering (AML) tools and reducing financial crime. Solutions based on Machine Learning (ML) are on the way, but unfortunately the quality data required to train the models is just not available.

The three biggest drawbacks of using ML for AML are the lack of ‘labelled data’, the ‘class imbalance’ between legitimate and criminal financial activity, and the evolving threat of financial crime (finCrime), which makes ‘training datasets obsolete’. All three stem from the problem of unknown, ‘hidden’ crime.

We address these problems using advanced financial simulation. By creating enriched synthetic digital twins of financial data, we can now develop and benchmark advanced solutions based on machine learning. Our models include the known dynamics of both normal and criminal customers, tailored to match realistic crime scenarios in our financial institutions.
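As a rough illustration of the idea, the following minimal Python sketch simulates a population of customers in which a small fraction behave criminally, and emits transactions with ground-truth labels attached. It is not the simulator described in this post. The function name `simulate`, the `Transaction` fields, the 1% crime rate, the amount distributions and the "deposits just under 10,000" pattern are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    step: int          # simulated day
    customer_id: int
    amount: float
    is_crime: bool     # ground-truth label, known because we generated the data


def simulate(n_customers=1000, crime_rate=0.01, n_steps=30, seed=42):
    """Generate labelled synthetic transactions (illustrative assumptions only)."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible

    # Pick which customers follow the (hypothetical) criminal behaviour.
    n_crime = max(1, round(n_customers * crime_rate))
    crime_ids = set(rng.sample(range(n_customers), n_crime))

    transactions = []
    for step in range(n_steps):
        for cid in range(n_customers):
            if cid in crime_ids:
                # Assumed pattern: several deposits just below an assumed
                # 10,000 reporting threshold on every simulated day.
                for _ in range(rng.randint(2, 4)):
                    amount = round(rng.uniform(9000, 9999), 2)
                    transactions.append(Transaction(step, cid, amount, True))
            elif rng.random() < 0.3:
                # Assumed normal behaviour: occasional log-normal payments.
                amount = round(rng.lognormvariate(4.0, 1.0), 2)
                transactions.append(Transaction(step, cid, amount, False))
    return transactions


if __name__ == "__main__":
    data = simulate()
    n_crime_tx = sum(t.is_crime for t in data)
    print(f"{len(data)} transactions, {n_crime_tx} labelled as crime")
```

Because every record is generated, every record carries a label, the share of criminal customers is a parameter rather than an unknown, and a new crime pattern can be added by changing the behaviour rules and regenerating the data.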

These synthetic datasets are the new oil for ML, fuelling efforts to tackle complex problems and improve our AML controls. Our simulators output augmented, non-confidential synthetic data: trustworthy, enriched financial data that is ready for providers of advanced analytics solutions.

To hear more, watch this short talk given at the Security and Privacy 2020 conference:

https://www.linkedin.com/pulse/financial-synthetic-data-new-oil-fincrime-analytics-edgar-lopez-rojas/?published=t